Master - Balancing selection on a supergene controlling social organisation in the ant Formica selysi
This pipeline processes whole-genome re-sequencing data to look for signatures of balancing selection in a socially polymorphic ant. Colonies of the Alpine silver ant (Formica selysi) can be monogynous (Sm allele) or polygynous (Sp allele), a trait that is genetically determined by a supergene.
To detect signs of balancing selection, the pipeline performs the following steps:
- Pre-processing of the reads to obtain a good-quality alignment of the reads from Sm/Sm and Sp/Sp individuals to the reference genome
- Fst / Diversity / Tajima's D analysis
- Whole-genome McDonald-Kreitman test to find genes under positive selection in both Sm and Sp
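
As an illustration of the Tajima's D step listed above, here is a minimal sketch of the statistic computed from a set of aligned haplotypes. It is not the tool this pipeline uses (none is named here); the function name `tajimas_d` and the toy sequences are hypothetical, and the code assumes equal-length sequences with no missing data.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for equal-length aligned haplotype strings (no missing data)."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    if len(set(map(len, haplotypes))) != 1:
        raise ValueError("haplotypes must have equal length")

    # S: number of segregating sites (alignment columns with more than one allele)
    seg_sites = sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)
    if seg_sites == 0:
        return float("nan")  # D is undefined without polymorphism

    # pi: mean number of pairwise differences between sequences
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# Toy data: an excess of intermediate-frequency variants gives D > 0,
# the pattern expected under balancing selection.
print(tajimas_d(["AA", "AA", "CC", "CC"]))
```

An excess of rare variants (for example, only singletons) gives a negative value instead, so the sign of D is what the analysis would inspect along the supergene region.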
- Raw reads quality control
- Trim the adapters
- Map the reads to their respective (M or P) reference genome
- Mark duplicates
- Realign indels
- Obtain one clean SAM file for M and one for P to use in the analyses
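
What counts as a "clean" file is not defined here. As a sketch only, assuming it means dropping reads that are unmapped or flagged as duplicates, the filter below reads the FLAG field of each SAM record. The flag bits (0x4 for unmapped, 0x400 for duplicate) come from the SAM specification; the function names and example records are hypothetical.

```python
UNMAPPED = 0x4      # SAM FLAG bit: segment unmapped
DUPLICATE = 0x400   # SAM FLAG bit: PCR or optical duplicate


def keep_record(sam_line):
    """Return True for header lines and for mapped, non-duplicate records."""
    if sam_line.startswith("@"):
        return True  # header lines are always kept
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # FLAG is the second column of a SAM record
    return not (flag & (UNMAPPED | DUPLICATE))


def clean_sam(lines):
    """Filter an iterable of SAM lines, preserving their order."""
    return [line for line in lines if keep_record(line)]


# Hypothetical records, truncated to the first columns for brevity
example = [
    "@HD\tVN:1.6",
    "read1\t99\tchr3\t100",    # properly paired, mapped: kept
    "read2\t1123\tchr3\t100",  # same as 99 plus the duplicate bit: dropped
    "read3\t4\t*\t0",          # unmapped: dropped
]
print(clean_sam(example))
```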
- Raw reads quality control
- Adapter trimming
- Remove leading and trailing low-quality bases
- Cut low-quality 4-mers (windows of 4 bases)
- Drop reads below the minimal length threshold
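
The quality-trimming steps above can be sketched as follows. This is a simplified illustration, not the trimming tool itself (which is not named here); all thresholds are hypothetical defaults, and the window step is assumed to mean cutting the read at the first 4-base window whose mean quality is too low.

```python
def trim_read(seq, quals, leading=3, trailing=3, window=4, min_avg=15, min_len=36):
    """Quality-trim one read; return (seq, quals) or None if the read is dropped.

    seq is the base string, quals the matching list of Phred scores.
    """
    # 1. Remove leading and trailing bases below the quality thresholds
    start = 0
    while start < len(quals) and quals[start] < leading:
        start += 1
    end = len(quals)
    while end > start and quals[end - 1] < trailing:
        end -= 1
    seq, quals = seq[start:end], quals[start:end]

    # 2. Scan 4-base windows from the 5' end; cut at the first window
    #    whose mean quality falls below min_avg
    cut = len(quals)
    for i in range(max(len(quals) - window + 1, 0)):
        if sum(quals[i:i + window]) / window < min_avg:
            cut = i
            break
    seq, quals = seq[:cut], quals[:cut]

    # 3. Drop reads shorter than the minimal length
    if len(seq) < min_len:
        return None
    return seq, quals


# Low-quality ends are removed, then the read passes the length filter
print(trim_read("ACGTACGTAC", [2, 2, 30, 30, 30, 30, 30, 30, 2, 2], min_len=5))
```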
- Index the reference genome
- Map the reads against it
- Output in .sam format
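
A minimal sketch of how the three mapping steps above could be driven from Python. The mapper is not named here, so `bwa` is used purely as an assumption (`bwa index` builds the index, and `bwa mem` writes SAM to standard output); the function names and file names are hypothetical.

```python
import subprocess


def build_mapping_commands(reference, reads_1, reads_2):
    """Return the (index, map) command lines without running them."""
    index_cmd = ["bwa", "index", reference]               # index the reference genome
    map_cmd = ["bwa", "mem", reference, reads_1, reads_2]  # map paired-end reads
    return index_cmd, map_cmd


def run_mapping(reference, reads_1, reads_2, out_sam):
    """Run both steps, redirecting the mapper's SAM output to out_sam."""
    index_cmd, map_cmd = build_mapping_commands(reference, reads_1, reads_2)
    subprocess.run(index_cmd, check=True)
    with open(out_sam, "w") as sam:
        subprocess.run(map_cmd, stdout=sam, check=True)


# Inspect the commands for a hypothetical M sample without executing anything
for cmd in build_mapping_commands("M_reference.fa", "M_R1.fq.gz", "M_R2.fq.gz"):
    print(" ".join(cmd))
```

Keeping command construction separate from execution makes it easy to log or dry-run the same commands for the M and the P reference.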
https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_indels_IndelRealigner.php
https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_variantutils_GenotypeGVCFs.php
- Use the annotation to extract the coding regions of the genome from the alignment
- Calculate dN, dS, pN, pS and the other metrics needed by SnIPRE to find genes under positive selection
- Create the SnIPRE input file and launch the R code
- Compare the results to find genes under positive selection only in M or only in P
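
To make the counts in the steps above concrete, the sketch below computes the classic McDonald-Kreitman summary alpha = 1 - (dS * pN) / (dN * pS) for one gene and then compares two gene lists. It does not reproduce SnIPRE, which fits its own model across genes; the function names, counts, and gene identifiers are hypothetical.

```python
def mk_alpha(dn, ds, pn, ps):
    """Proportion of nonsynonymous divergence attributed to positive selection.

    dn, ds: nonsynonymous / synonymous fixed differences
    pn, ps: nonsynonymous / synonymous polymorphisms
    Returns None when the ratio is undefined.
    """
    if dn == 0 or ps == 0:
        return None
    return 1 - (ds * pn) / (dn * ps)


def exclusive_genes(selected_m, selected_p):
    """Genes flagged only in M and only in P, as two sorted lists."""
    m, p = set(selected_m), set(selected_p)
    return sorted(m - p), sorted(p - m)


# Hypothetical counts for one gene: alpha > 0 suggests an excess of
# nonsynonymous fixed differences relative to polymorphism
print(mk_alpha(dn=7, ds=17, pn=2, ps=42))

# Hypothetical gene lists from the M and P analyses
only_m, only_p = exclusive_genes(["gene1", "gene2"], ["gene2", "gene3"])
print(only_m, only_p)
```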