
Analyzing the offspring of 11 Quercus muehlenbergii individuals at the Morton Arboretum for hybridization and predictors of pollination success.


HobanLab/USBG_Hybrid_Acorns

 
 


Project Description

This GitHub repository contains the analyses used to detect hybrids in the offspring of 9 Quercus muehlenbergii trees sampled at the Morton Arboretum in Fall of 2022. These analyses were performed from 2022 through 2024 by Ash Hamilton, Mikaely Evans, and Emily Schumacher. Our study was mostly concerned with quantifying the levels of hybridization within the botanic garden collections of white oaks at the Morton Arboretum, to provide a model system for protecting oaks in living collections without producing hybrid offspring.

We analyzed the parentage of acorns produced by maternal Quercus muehlenbergii individuals at the Morton Arboretum to identify (1) whether any offspring were hybrids and (2) which factors contribute to the parentage of offspring produced in living collections. We collected a total of 385 seeds, grew them to seedling stage, and sampled leaf tissue from the seedlings. Using 13 microsatellite loci, we performed parentage analysis in CERVUS. Following this analysis, we produced multiple figures relating the distance between mothers and candidate fathers to hybrid production.

Project Workflow

This project went through several stages, so the workflow for the analyses and data files is described below:

Geographic analyses: These preliminary analyses were performed to determine which botanic garden was a suitable model for our experimental question. They were conducted using the R scripts stored in the Geographic_Analysis folder within the Analysis folder, with input data files from the Data_Files/Geographic_Files pathway. The resulting CSV files are stored in the Results/Geographic_Analyses pathway.

  • We examined 3 different botanic garden model systems: the UC Davis campus, Starhill Arboretum, and the Morton Arboretum. This stage of the process examined these sites for a few different traits, including:
    • At least 10 individuals of a specific oak species within breeding distance (350 m) that were producing acorns in the summer of 2022.
  • Following this stage of the analysis, we determined that the Morton Arboretum was best suited to our study design.
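
The 350 m breeding-distance criterion above can be illustrated with a small base R sketch. This is not the code in the repository's geographic scripts; the column names (`id`, `lon`, `lat`), the coordinates, and both function names are hypothetical, and the data frame is assumed to already be subset to a single oak species.

```r
# Great-circle (haversine) distance in metres between lon/lat points
# given in decimal degrees. Vectorised over lon2/lat2.
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# For each tree, count the OTHER trees lying within max_dist metres.
count_neighbours <- function(trees, max_dist = 350) {
  vapply(seq_len(nrow(trees)), function(i) {
    d <- haversine_m(trees$lon[i], trees$lat[i], trees$lon, trees$lat)
    sum(d <= max_dist) - 1L  # subtract the tree itself (distance 0)
  }, integer(1))
}

# Made-up coordinates for illustration only (not real tree locations).
trees <- data.frame(
  id  = c("A", "B", "C"),
  lon = c(-88.0600, -88.0600, -88.0600),
  lat = c(41.8160, 41.8170, 41.8260),
  stringsAsFactors = FALSE
)
trees$n_within_350m <- count_neighbours(trees)
print(trees)
```

A site could then be screened by checking whether enough acorn-producing individuals of one species fall within the distance threshold; how that count is compared against the minimum of 10 individuals is left to the analyst.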

Parentage analyses: After selecting the Morton Arboretum as the best site for this analysis, we sampled Quercus muehlenbergii individuals for acorns and leaf tissue. The acorns were grown into seedlings, and leaf tissue was sampled from the seedlings, from all maternal individuals, and from any white oak within a 350 m radius of each maternal individual. These individuals were genotyped using 13 microsatellite loci and analyzed in CERVUS parentage software. The data processing used to prepare files for parentage analysis and to process the outputs is described below.

  • Preparing files for CERVUS: Genotype files generated from Geneious were processed in the Analysis/Parentage_Analysis/RScripts/01_data_cleaning_for_parentage.R script and then used to generate the input files for parentage analysis in CERVUS. The inputs to this script were a genepop file (.genepop) and a CSV file generated from Geneious files. These were used to generate "clean" genotype files, which removed any individuals with 25% missing genotypes. We also tested loci for linkage disequilibrium and null allele frequency. Null allele analysis identified 4 loci with high frequencies of null alleles (>15%), so we created data files with and without these loci. "all_loci" files contain all 13 loci, whereas "red_loci" files have the 4 loci with high null allele frequencies removed. The results of this script are stored in the Analysis/CERVUS_Files pathway.
  • CERVUS_Files: All files used to run parentage analysis in CERVUS are stored in the Analysis/CERVUS_Files folder, which has separate folders for "all_loci" and "red_loci" data files, as these parentage runs were done separately. Each scenario has an "Input_Files" and an "Output_Files" folder. The "Input_Files" folder stores all of the files needed to run parentage analysis in CERVUS: a cleaned genotype_df CSV file, an offspring file, an allele frequency file (.alf) generated in CERVUS, and a simulation file (.sim) generated in CERVUS. The genotype data file is identical to the cleaned score genotype file generated by the 01_data_cleaning_for_parentage.R script, with all loci or reduced loci. The results of CERVUS runs are stored in the Results/Parentage_Results/CSV_Files pathway and are identified by the suffix par_sum.
  • Parentage analysis result generation: The other R scripts in the Analysis/Parentage_Analysis/RScripts folder process the results of parentage analysis. The steps are detailed below:
    • The 02_data_cleaning_post_parentage.R script processes the results of the CERVUS parentage runs (par_sum files). This script first creates data files with the designation HCF. HCF stands for "high confidence father" and refers to candidate father assignments made with pairwise and trio LOD scores > 0. This resulted in four par_sum data files, i.e. four scenarios overall, used to determine the impact of null alleles and of parentage assignment confidence on the final figures.
    • The 02_data_cleaning_post_parentage.R script then processes the data files for all scenarios to create organized results data files with the suffix analysis_df, which hold all the information on the mother and assigned father of each offspring individual. These CSV files are stored in the Results/Parentage_Results/CSV_Files folder.
    • Next, the 03_parentage_figures_results.R script uses the analysis_df CSV files to analyze the distance between mother and candidate father, the half-sibling status of mother and candidate father in relation to distance, and whether or not the mother and candidate father are the same species. These results are summarized in data files with the suffix sum_stat_df in the Results/Parentage_Results/CSV_Files folder. This pathway also holds a data file with the non-exclusion probabilities for father assignments. The figures generated by this script are stored in the Results/Parentage_Results/Figures pathway.
    • Finally, the 04_dist_analysis.R script simulates successful pollination events in relation to distance, and its figures are stored in the Results/Parentage_Results/Figures pathway.
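
The two filtering rules described above (dropping individuals with too many missing genotypes, and keeping only "high confidence father" assignments) can be sketched in base R as follows. This is an illustrative sketch, not the repository's code. It assumes that missing alleles are coded as NA or 0, that the 25% threshold is inclusive, and that HCF requires both LOD scores to be positive. The column names `pair_lod` and `trio_lod`, the function names, and the example values are all hypothetical; real CERVUS output uses different column headers.

```r
# Proportion of missing allele calls per individual (row).
# Assumption: a missing allele is coded as NA or 0.
prop_missing <- function(genotypes) {
  m <- as.matrix(genotypes)
  rowMeans(is.na(m) | m == 0)
}

# Keep individuals whose missing proportion is below max_missing,
# so individuals at or above the threshold are dropped (assumed).
drop_incomplete <- function(df, allele_cols, max_missing = 0.25) {
  keep <- prop_missing(df[, allele_cols, drop = FALSE]) < max_missing
  df[keep, , drop = FALSE]
}

# Keep father assignments whose pairwise AND trio LOD scores are both > 0.
# Rows with a missing LOD score are treated as not high confidence.
keep_high_confidence <- function(par_sum,
                                 pair_col = "pair_lod",
                                 trio_col = "trio_lod") {
  pair <- par_sum[[pair_col]]
  trio <- par_sum[[trio_col]]
  ok <- !is.na(pair) & !is.na(trio) & pair > 0 & trio > 0
  par_sum[ok, , drop = FALSE]
}

# Made-up genotypes at two loci (two allele columns per locus).
geno <- data.frame(
  id   = c("off_1", "off_2", "off_3"),
  L1_a = c(150, NA, 152), L1_b = c(154, NA, 152),
  L2_a = c(201, 0, 205),  L2_b = c(203, 0, 205),
  stringsAsFactors = FALSE
)
clean <- drop_incomplete(geno, allele_cols = 2:5)

# Made-up parentage summary.
par_sum <- data.frame(
  offspring        = c("off_1", "off_2", "off_3"),
  candidate_father = c("F1", "F2", "F3"),
  pair_lod         = c(3.2, -1.1, 0.8),
  trio_lod         = c(5.0, 2.4, -0.3),
  stringsAsFactors = FALSE
)
hcf <- keep_high_confidence(par_sum)
```

In this toy example, `clean` retains only the two fully genotyped individuals and `hcf` retains only the assignment whose two LOD scores are both positive. Running the same filters on the "all_loci" and "red_loci" inputs would give one way to arrive at four scenarios.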

Folder Descriptions

Analysis:

This folder is divided into 3 sections, each covering a different set of analyses.

  • Geographic_Analysis
    • 01_geographic_analysis_prep.R
      • Description: This R script was used to visualize oak species in three botanic gardens: Starhill Arboretum, the UC Davis campus, and the Morton Arboretum. The resulting data files were used to generate color-coded maps stored in the project guide for this project.
    • 02_garden_summary_dfs.R
      • Description: This R script was used to generate overview data frames of the oak species (oak individuals above 10 years of age) in Starhill Arboretum, the UC Davis campus, and the Morton Arboretum, to visualize candidate sampling areas.
    • 03_TMA_all_trees.R
      • Description: This script was used to visualize candidate sites for acorn sampling once the Morton Arboretum was chosen as the best site for this project.
  • Parentage_Analysis
    • Description: All of the files used to run parentage analysis are stored in this folder.
    • CERVUS_Files
      • All_Loci
        • Input_Files
          • all_loci_allfreq.alf
          • all_loci_sim.sim
          • UHA_all_loci_clean_genotype_df.csv
          • UHA_offspring.csv
        • Output_Files
          • all_loci_par_sum.csv
      • Red_Loci
        • Input_Files
          • red_loci_allfreq.alf
          • red_loci_sim.sim
          • UHA_offspring.csv
          • UHA_red_loci_clean_genotype.csv
        • Output_Files
          • red_loci_par_sum.csv
    • RScripts
      • 01_data_cleaning_for_parentage.R
      • 02_data_cleaning_post_parentage.R
      • 03_parentage_figures_results.R
      • 04_dist_analysis.R
  • STRUCTURE
    • STRUCTURE results are stored here.

Archive:

Files that are no longer in use for this project are stored in this folder - these are from previous iterations of parentage analysis when there were fewer individuals genotyped.

Data_Files:

There are three types of data files - CSV files, genotype files (.genepop), and geographic files (CSV files and Excel files with coordinates of trees).

  • CSV_Files
  • Genotype_Files
  • Geographic_Files

Results:

This folder stores the results from each stage of the analysis: preliminary geographic analyses, parentage results, and genotyping.

  • Geographic_Analyses
    • Description: Initial maps to make decisions on sampling and surveying are stored here.
  • Parentage_Results
    • CSV_Files
    • Figures
  • Preliminary_Genotyping_Analysis
