Samuel Hamann edited this page May 5, 2021 · 4 revisions

Fst Estimations


NOTE: There are two methods for estimating FST with ANGSD-wrapper:
The first method uses ANGSD. It has no visualization component.
The second method uses ngsPopGen and can be visualized with the Shiny graphing interface.
To use ANGSD FST, run the following command from within the ANGSD-wrapper directory:

git checkout master

This method estimates FST using ANGSD. Please see ANGSD's tutorial page for full details. When using ngsPopGen for calculating FST, please see ngsPopGen's documentation for full details.

Basic Usage

To run this method, use the following command:

angsd-wrapper Fst FST_Config

where FST_Config is the full path to the configuration file for the FST estimations.
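Because the configuration file must be given as a full path, it can help to check the path before launching the run. The following is a minimal sketch, not part of ANGSD-wrapper: the helper name check_config and the example path are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not provided by ANGSD-wrapper): succeed only if the
# configuration path is absolute and points to a readable file.
check_config() {
    local config="$1"
    case "$config" in
        /*) ;;  # starts with "/", so it is a full path
        *) echo "Not a full path: $config" >&2; return 1 ;;
    esac
    [ -r "$config" ] || { echo "Cannot read: $config" >&2; return 1; }
}

# Example usage (the path is a placeholder):
# check_config /home/user/configs/FST_Config && angsd-wrapper Fst /home/user/configs/FST_Config
```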

Input files

All inputs should be specified in FST_Config or 2DSFS_Fst_Config.

Common Variables

This method uses Common_Config. Variables and their functions are listed below:

Variable Function
ANC_SEQ Path to ancestral sequence
REF_SEQ Path to reference sequence
PROJECT Name given to all outputs in ANGSD-wrapper
SCRATCH Place to store files; the full output path is SCRATCH/PROJECT/Fst
REGIONS Limit the scope of ANGSD-wrapper to certain genomic regions
UNIQUE_ONLY Use uniquely mapped reads only
MIN_BASEQUAL Minimum base quality score
BAQ Adjust Q scores around indels
MIN_IND1 Minimum number of individuals in GROUP_1 needed to use this site
MIN_IND2 Minimum number of individuals in GROUP_2 needed to use this site
GT_LIKELIHOOD Estimates genotype likelihoods
MIN_MAPQ Minimum mapping quality
N_CORES Number of cores to use, please do not set above the limits of your system
DO_MAJORMINOR Estimate major/minor alleles
DO_MAF Calculate per-site frequencies
DO_POST Calculate the posterior probability using per-site frequencies
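As a small illustration of how SCRATCH and PROJECT combine into the output location described in the table above, the sketch below builds the path SCRATCH/PROJECT/Fst. The function name fst_outdir and the example values are hypothetical; the real values come from Common_Config.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compose the Fst output directory from SCRATCH and PROJECT.
fst_outdir() {
    local scratch="$1" project="$2"
    printf '%s/%s/Fst\n' "$scratch" "$project"
}

# Example values (placeholders, not defaults of ANGSD-wrapper):
fst_outdir "/scratch/user" "Maize"   # prints /scratch/user/Maize/Fst
```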

Method-Specific Variables

These variables are specific to this method:

Variable Function
GROUP_1 The name of the first population
G1_SAMPLE_LIST (G1_SAMPLES on dev) A list with the full file paths to all BAM files in GROUP_1
G1_INBREEDING A list of inbreeding coefficients for GROUP_1
GROUP_2 The name of the second population
G2_SAMPLE_LIST (G2_SAMPLES on dev) A list with the full file paths to all BAM files in GROUP_2
G2_INBREEDING A list of inbreeding coefficients for GROUP_2
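The sample lists must contain full file paths to the BAM files. One way to build such a list is sketched below, assuming the list is a plain text file with one path per line and that all BAM files for a group sit under a single directory. The helper name make_sample_list and the example paths are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the full path of every .bam file found under a
# directory to a list file, one path per line, sorted for reproducibility.
make_sample_list() {
    local bam_dir="$1" out_list="$2"
    local abs_dir
    abs_dir="$(cd "$bam_dir" && pwd)" || return 1   # resolve to an absolute path
    find "$abs_dir" -type f -name '*.bam' | sort > "$out_list"
}

# Example usage (placeholder paths):
# make_sample_list /data/group1_bams /data/group1_samples.txt
```

The resulting file can then be given as G1_SAMPLE_LIST (or G2_SAMPLE_LIST) in the configuration file.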

Method Parameters

The parameters for this method can be adjusted as necessary. The defaults have been chosen to work well in general use:

Parameter Function
DO_SAF Creates a site frequency spectrum
BLOCKSIZE Number of sites in each chunk
OVERRIDE If true, will recalculate files that already exist
RELATIVE Whether the input is absolute counts or relative frequencies; used with ngsPopGen only
MAX_LIKE Method for computing the maximum likelihood estimate; used with ngsPopGen only

Output files

Naming Scheme Contents
GROUP_1_Intergenic.arg Details of arguments for GROUP_1
GROUP_1_Intergenic.saf[.gz] Site frequency data for GROUP_1 in old (.saf) and new (.saf.gz) formats
GROUP_1_Intergenic.saf.idx Index of site frequency data for GROUP_1
GROUP_1_Intergenic.mafs.gz Minor allele frequencies for GROUP_1
GROUP_1_Intergenic.saf.pos.gz Position data of the saf file for GROUP_1
GROUP_2_Intergenic.arg Details of arguments for GROUP_2
GROUP_2_Intergenic.saf[.gz] Site frequency data for GROUP_2 in old (.saf) and new (.saf.gz) formats
GROUP_2_Intergenic.saf.idx Index of site frequency data for GROUP_2
GROUP_2_Intergenic.mafs.gz Minor allele frequencies for GROUP_2
GROUP_2_Intergenic.saf.pos.gz Position data of the saf file for GROUP_2
shared.pos Overlapping sites of both groups
2DSFS_Intergenic.GROUP_1.GROUP_2.sfs 2DSFS results
GROUP_1.GROUP_2.spectrum.txt The prior spectrum data for FST estimations
GROUP_1.GROUP_2.fst Fst results in a tab-delimited table

The 2DSFS_Intergenic.GROUP_1.GROUP_2.sfs file will contain the 2DSFS results in matrix form. More recent versions of ANGSD automatically calculate 2DSFS on intersecting regions.

The .fst outfile will contain results in a tab-delimited file with columns "A, AB, f, FST, Pvar; where A is the expectation of genetic variance between populations, AB is the expectation of the total genetic variance, f is the correcting factor for the ratio of expectations, FST is the per-site FST value, Pvar is the probability for the site of being variable" (mfumagalli). There are no headers in the .fst outfile.
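Since the .fst file has no header row, labeling the columns can make it easier to inspect. The sketch below relies only on the layout described above (five tab-delimited columns in the order A, AB, f, FST, Pvar). The helper names and the Pvar cutoff in the usage example are hypothetical choices for illustration, not recommendations from ANGSD or ngsPopGen.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the .fst table with a header row prepended.
add_fst_header() {
    printf 'A\tAB\tf\tFST\tPvar\n'
    cat "$1"
}

# Hypothetical helper: keep only sites whose probability of being variable
# (column 5, Pvar) is at least the given cutoff.
filter_variable_sites() {
    local fst_file="$1" min_pvar="$2"
    awk -F'\t' -v min="$min_pvar" '$5 + 0 >= min + 0' "$fst_file"
}

# Example usage (the file name follows the naming scheme; 0.9 is an arbitrary cutoff):
# add_fst_header GROUP_1.GROUP_2.fst | head
# filter_variable_sites GROUP_1.GROUP_2.fst 0.9 > variable_sites.fst
```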

Visualization

ANGSD FST does not have visualization.


ngsFST

This method estimates FST with ngsPopGen. This method includes a Shiny graphing interface. To use this method, please run the following command:

git checkout ngsPopGen_Fst

You will have to run FST from this branch to generate files that can be visualized with the Shiny graphing interface.

Output files

Naming Scheme Contents
GROUP_1_Intergenic.arg Details of arguments for GROUP_1
GROUP_1_Intergenic.saf[.gz] Site frequency data for GROUP_1 in old (.saf) and new (.saf.gz) formats
GROUP_1_Intergenic.saf.idx Index of site frequency data for GROUP_1
GROUP_1_Intergenic.mafs.gz Minor allele frequencies for GROUP_1
GROUP_1_Intergenic.saf.pos.gz Position data of the saf file for GROUP_1
GROUP_2_Intergenic.arg Details of arguments for GROUP_2
GROUP_2_Intergenic.saf[.gz] Site frequency data for GROUP_2 in old (.saf) and new (.saf.gz) formats
GROUP_2_Intergenic.saf.idx Index of site frequency data for GROUP_2
GROUP_2_Intergenic.mafs.gz Minor allele frequencies for GROUP_2
GROUP_2_Intergenic.saf.pos.gz Position data of the saf file for GROUP_2
shared.pos Overlapping sites of both groups
2DSFS_Intergenic.GROUP_1.GROUP_2.sfs 2DSFS results
GROUP_1.GROUP_2.spectrum.txt The prior spectrum data for FST estimations
GROUP_1.GROUP_2.fst Fst results in a tab-delimited table

The 2DSFS_Intergenic.GROUP_1.GROUP_2.sfs file will contain the 2DSFS results in matrix form. More recent versions of ANGSD automatically calculate 2DSFS on intersecting regions.

The .fst outfile will contain results in a tab-delimited file with columns "A, AB, f, FST, Pvar; where A is the expectation of genetic variance between populations, AB is the expectation of the total genetic variance, f is the correcting factor for the ratio of expectations, FST is the per-site FST value, Pvar is the probability for the site of being variable" (mfumagalli). There are no headers in the .fst outfile.

Visualization

GROUP_1.GROUP_2.fst can be visualized with the Shiny graphing interface. A web browser with a graphical user interface is required.