Mass23/Master

Master - Balancing selection on a supergene controlling social organisation in the ant Formica selysi

Massimo Bourquin

Summary

This pipeline processes whole-genome re-sequencing data to look for signatures of balancing selection in a socially polymorphic ant. The Alpine silver ant (Formica selysi) can be monogynous (Sm allele) or polygynous (Sp allele), and this trait is controlled by a supergene.

The pipeline has three main steps:

  1. Pre-processing of the reads to obtain a high-quality alignment of reads from Sm/Sm and Sp/Sp individuals to the reference genome

  2. Fst / Diversity / Tajima's D analysis

  3. Whole-genome McDonald-Kreitman test to find genes under positive selection in both Sm and Sp


1. Pre-processing

  • Raw reads quality control
  • Trim the adapters
  • Map the reads to their respective (M or P) reference genome
  • Mark duplicates
  • Realign indels
  • Produce one clean SAM file for M and one for P to use in the analyses

1.1 Quality control - FastQC

  • Raw reads quality control

1.2 Reads trimming - Trimmomatic

  • Trim the adapters
  • Remove leading and trailing low-quality bases
  • Cut the read where the average quality of a 4-base sliding window is too low
  • Drop reads shorter than the minimum length threshold
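
The quality-trimming steps above can be sketched in a few lines of Python. This is only a simplified illustration of the idea, not Trimmomatic's actual implementation: adapter removal is left out, and the function name `trim_read` and all threshold values are hypothetical defaults chosen for the example, not the settings used in this pipeline.

```python
def trim_read(seq, quals, leading=3, trailing=3, window=4, min_avg=15, min_len=36):
    """Return the trimmed (seq, quals), or None if the read is dropped.

    seq   -- read sequence as a string
    quals -- list of Phred quality scores, one per base
    All thresholds are illustrative assumptions.
    """
    # Remove low-quality bases from the start of the read.
    start = 0
    while start < len(quals) and quals[start] < leading:
        start += 1

    # Remove low-quality bases from the end of the read.
    end = len(quals)
    while end > start and quals[end - 1] < trailing:
        end -= 1

    seq, quals = seq[start:end], quals[start:end]

    # Slide a window along the read and cut at the first window
    # whose mean quality falls below the threshold.
    for i in range(len(quals) - window + 1):
        if sum(quals[i:i + window]) / window < min_avg:
            seq, quals = seq[:i], quals[:i]
            break

    # Drop reads that became too short.
    if len(seq) < min_len:
        return None
    return seq, quals
```

For instance, `trim_read("ACGTACGTAC", [2, 30, 30, 30, 30, 30, 30, 5, 5, 5], min_len=4)` removes the first base and cuts the low-quality tail, whereas the same call with `min_len=6` drops the read.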

1.3 Read mapping - Burrows-Wheeler Aligner (BWA)

  • Index the reference genome
  • Map the reads against it
  • Output in .sam format
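
As a rough sketch, the three steps above could be driven from Python as follows. The README does not say which BWA algorithm is used, so `bwa mem` on paired-end reads is an assumption here, and the function names and file paths are hypothetical.

```python
import subprocess


def bwa_commands(reference, reads_1, reads_2):
    """Build the argument lists for indexing and mapping.

    Assumes paired-end reads and the `bwa mem` algorithm.
    """
    index_cmd = ["bwa", "index", reference]
    map_cmd = ["bwa", "mem", reference, reads_1, reads_2]
    return index_cmd, map_cmd


def run_bwa(reference, reads_1, reads_2, out_sam):
    """Index the reference, map the reads, and write the SAM output."""
    index_cmd, map_cmd = bwa_commands(reference, reads_1, reads_2)
    subprocess.run(index_cmd, check=True)
    # bwa mem writes SAM records to standard output.
    with open(out_sam, "w") as sam:
        subprocess.run(map_cmd, stdout=sam, check=True)
```

A call such as `run_bwa("M_reference.fa", "ind1_R1.fq", "ind1_R2.fq", "ind1.sam")` would then be repeated for each individual against its respective (M or P) reference; the file names are placeholders.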

1.4 Duplicates marking - Picard


1.5 Indels realignment - GATK


1.6 Genotyping - GATK


1.7 Variant filtration - BCFtools filter


2. Fst / Diversity / Tajima's D analysis
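
To make the diversity statistics concrete, here is a minimal sketch of how nucleotide diversity (pi), Watterson's theta and Tajima's D can be computed from a set of aligned haplotypes, using the standard formulas from Tajima (1989). It is not the code used in this repository: the function name `diversity_stats` is hypothetical, the values are per locus rather than per site, and missing data are not handled. Fst is not covered.

```python
from itertools import combinations
from math import sqrt


def diversity_stats(haplotypes):
    """Return (S, pi, theta_w, tajima_d) for aligned, equal-length sequences.

    tajima_d is None when it is undefined (no segregating sites
    or zero variance, e.g. with only two sequences).
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("sequences must have the same length")

    # S: number of alignment columns with more than one allele.
    seg_sites = sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)

    # pi: mean number of pairwise differences.
    pairs = list(combinations(haplotypes, 2))
    total_diffs = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    pi = total_diffs / len(pairs)

    # Watterson's theta: S divided by the harmonic number a1.
    a1 = sum(1 / i for i in range(1, n))
    theta_w = seg_sites / a1
    if seg_sites == 0:
        return seg_sites, pi, theta_w, None

    # Constants for the variance of (pi - theta_w).
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    if variance <= 0:
        return seg_sites, pi, theta_w, None
    return seg_sites, pi, theta_w, (pi - theta_w) / sqrt(variance)
```

Under balancing selection, alleles are maintained at intermediate frequencies, which inflates pi relative to Watterson's theta and pushes Tajima's D towards positive values.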


3. Positive selection inferences

  • Use the annotation to extract the coding regions of the genome from the alignment
  • Calculate dN, dS, pN, pS and the other metrics needed by Snipre to find genes under positive selection
  • Create the Snipre input file and run the R code
  • Compare the results to find genes under positive selection only in M or only in P

3.1 Create per-individual FASTA files and extract the annotation


3.2 Calculate the dN, dS, pN, pS for each gene (Snipre input) - fasta2snipre.py
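
All four counts rest on the same basic operation: deciding whether a codon change is synonymous or non-synonymous. Applied to fixed differences between species it gives dN and dS, and applied to polymorphisms within a species it gives pN and pS. The sketch below illustrates that classification only and is not the content of fasta2snipre.py. It assumes the standard genetic code, compares two aligned in-frame coding sequences, and skips codons that differ at more than one position or contain ambiguous bases; the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_change(codon_a, codon_b):
    """Return 'synonymous', 'nonsynonymous', or None if not classified."""
    if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
        return None  # gaps or ambiguous bases
    n_diffs = sum(x != y for x, y in zip(codon_a, codon_b))
    if n_diffs != 1:
        return None  # identical, or several changes (path is ambiguous)
    if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
        return "synonymous"
    return "nonsynonymous"


def count_changes(cds_a, cds_b):
    """Count (synonymous, nonsynonymous) single-base codon changes."""
    syn = nonsyn = 0
    usable = min(len(cds_a), len(cds_b))
    for i in range(0, usable - 2, 3):
        kind = classify_codon_change(cds_a[i:i + 3].upper(), cds_b[i:i + 3].upper())
        if kind == "synonymous":
            syn += 1
        elif kind == "nonsynonymous":
            nonsyn += 1
    return syn, nonsyn
```

For example, `count_changes("ATGGCTAAA", "ATGGCCAGA")` finds one synonymous change (GCT to GCC, both alanine) and one non-synonymous change (AAA to AGA, lysine to arginine).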


3.3 Bayesian method for McDonald-Kreitman test - Snipre
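
Snipre fits a Bayesian model to the per-gene counts, which is beyond the scope of a short example. For orientation, the classic McDonald-Kreitman summaries that use the same four counts can be sketched as follows. This is the plain per-gene calculation, not Snipre's method, and the function name `mk_summary` is hypothetical.

```python
def mk_summary(dn, ds, pn, ps):
    """Return (neutrality_index, alpha) from the four MK counts.

    dn, ds -- non-synonymous and synonymous fixed differences
    pn, ps -- non-synonymous and synonymous polymorphisms
    Returns (None, None) when a count needed as a denominator is zero.
    """
    if dn == 0 or ds == 0 or ps == 0:
        return None, None
    # Neutrality index: (pN / pS) / (dN / dS).
    neutrality_index = (pn / ps) / (dn / ds)
    # alpha: estimated proportion of adaptive substitutions.
    alpha = 1 - neutrality_index
    return neutrality_index, alpha
```

A neutrality index below 1 (alpha above 0) indicates an excess of non-synonymous divergence, which is the pattern expected under positive selection. With `mk_summary(20, 10, 5, 10)` the neutrality index is 0.25 and alpha is 0.75.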


3.4 Ontology term analysis - Fisher exact test - OG_fisher.py
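
A one-sided Fisher exact test for term enrichment can be written with the standard library alone. The sketch below is not OG_fisher.py itself; it assumes a 2x2 table of gene counts (selected or not, annotated with the term or not) and tests only for over-representation. The function name `fisher_enrichment_p` is hypothetical.

```python
from math import comb


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher exact p-value for over-representation of a term.

    a -- selected genes annotated with the term
    b -- selected genes without the term
    c -- other genes annotated with the term
    d -- other genes without the term
    """
    n_term = a + c       # all genes with the term
    n_other = b + d      # all genes without the term
    n_selected = a + b   # all selected genes

    # Probability of seeing a or more annotated genes among the selected
    # ones when drawing without replacement (hypergeometric upper tail).
    total = comb(n_term + n_other, n_selected)
    k_max = min(n_term, n_selected)
    tail = sum(
        comb(n_term, k) * comb(n_other, n_selected - k)
        for k in range(a, k_max + 1)
    )
    return tail / total
```

For the table a=3, b=1, c=1, d=3 the p-value is 17/70. When many ontology terms are tested, the resulting p-values should be corrected for multiple testing.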