FST

Fst Estimations

NOTE: There are two different methods for estimating F_ST with ANGSD-wrapper.
The first method uses ANGSD and has no visualization component.
The second method uses ngsPopGen and can be visualized with the Shiny graphing interface.
To use ANGSD F_ST, run the following command from within the ANGSD-wrapper directory:

git checkout master

This method estimates F_ST using ANGSD. Please see ANGSD's tutorial page for full details. When using ngsPopGen for calculating F_ST, please see ngsPopGen's documentation for full details.

Basic Usage

To run this method, use the following command:

angsd-wrapper Fst FST_Config

where FST_Config is the full path to the configuration file for the F_ST estimations.

Input files

All inputs should be specified in FST_Config or 2DSFS_Fst_Config.

Common Variables

This method uses Common_Config. Variables and their functions are listed below:

Variable	Function
`ANC_SEQ`	Path to ancestral sequence
`REF_SEQ`	Path to reference sequence
`PROJECT`	Name given to all outputs in ANGSD-wrapper
`SCRATCH`	Place to store files, the full path is `SCRATCH/PROJECT/Fst`
`REGIONS`	Limit the scope of ANGSD-wrapper to certain genomic regions
`UNIQUE_ONLY`	Use uniquely mapped reads only
`MIN_BASEQUAL`	Minimum base quality score
`BAQ`	Adjust Q scores around indels
`MIN_IND1`	Minimum number of individuals in `GROUP_1` needed to use this site
`MIN_IND2`	Minimum number of individuals in `GROUP_2` needed to use this site
`GT_LIKELIHOOD`	Estimates genotype likelihoods
`MIN_MAPQ`	Minimum read mapping quality
`N_CORES`	Number of cores to use, please do not set above the limits of your system
`DO_MAJORMINOR`	Estimate major/minor alleles
`DO_MAF`	Calculate per-site frequencies
`DO_POST`	Calculate the posterior probability using per-site frequencies

Method-Specific Variables

These variables are specific to this method:

Variable	Function
`GROUP_1`	The name of the first population
`G1_SAMPLE_LIST` (`G1_SAMPLES` on the `dev` branch)	A list with the full file paths to all BAM files in `GROUP_1`
`G1_INBREEDING`	A list of inbreeding coefficients for `GROUP_1`
`GROUP_2`	The name of the second population
`G2_SAMPLE_LIST` (`G2_SAMPLES` on the `dev` branch)	A list with the full file paths to all BAM files in `GROUP_2`
`G2_INBREEDING`	A list of inbreeding coefficients for `GROUP_2`
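
Before starting a run, it can help to confirm that the method-specific variables listed above are set. The sketch below is a hypothetical helper, not part of ANGSD-wrapper. It assumes the configuration file consists of simple shell-style `NAME=value` lines, which this page does not state, and it assumes `G2_SAMPLE_LIST` is the `GROUP_2` counterpart of `G1_SAMPLE_LIST` (on the `dev` branch the names would be `G1_SAMPLES` and `G2_SAMPLES`). All values in the example are made up.

```python
# Hypothetical pre-flight check; not part of ANGSD-wrapper.
# Assumes the config file uses shell-style NAME=value lines.

REQUIRED = (
    "GROUP_1", "G1_SAMPLE_LIST", "G1_INBREEDING",
    "GROUP_2", "G2_SAMPLE_LIST", "G2_INBREEDING",
)


def parse_config(text):
    """Return a dict of NAME -> value from shell-style assignment lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and non-assignments
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip("\"'")
    return settings


def missing_variables(settings, required=REQUIRED):
    """List required variables that are absent or set to an empty value."""
    return [name for name in required if not settings.get(name)]


# Illustrative values only; paths and population names are invented.
example = """
GROUP_1=PopA
G1_SAMPLE_LIST=/path/to/popA_bams.txt
GROUP_2=PopB
G2_SAMPLE_LIST=
"""
print(missing_variables(parse_config(example)))
```

Any name the helper reports should be filled in within FST_Config before running `angsd-wrapper Fst`.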

Method Parameters

The parameters for this method can be tweaked as necessary; the defaults have been chosen to work well for most data sets:

Parameter	Function
`DO_SAF`	Creates a site frequency spectrum
`BLOCKSIZE`	Number of sites in each chunk
`OVERRIDE`	If `true`, will recalculate files that already exist
`RELATIVE`	Whether the input is absolute counts or relative frequencies; used with ngsPopGen only
`MAX_LIKE`	Method for computing the maximum likelihood estimate; used with ngsPopGen only

Output files

Naming Scheme	Contents
`GROUP_1_Intergenic.arg`	Details of arguments for `GROUP_1`
`GROUP_1_Intergenic.saf[.gz]`	Site frequency data for `GROUP_1` in old (`.saf`) and new (`.saf.gz`) formats
`GROUP_1_Intergenic.saf.idx`	Index of site frequency data for `GROUP_1`
`GROUP_1_Intergenic.mafs.gz`	Minor allele frequencies for `GROUP_1`
`GROUP_1_Intergenic.saf.pos.gz`	Position data of the saf file for `GROUP_1`
`GROUP_2_Intergenic.arg`	Details of arguments for `GROUP_2`
`GROUP_2_Intergenic.saf[.gz]`	Site frequency data for `GROUP_2` in old (`.saf`) and new (`.saf.gz`) formats
`GROUP_2_Intergenic.saf.idx`	Index of site frequency data for `GROUP_2`
`GROUP_2_Intergenic.mafs.gz`	Minor allele frequencies for `GROUP_2`
`GROUP_2_Intergenic.saf.pos.gz`	Position data of the saf file for `GROUP_2`
`shared.pos`	Overlapping sites of both groups
`2DSFS_Intergenic.GROUP_1.GROUP_2.sfs`	2DSFS results
`GROUP_1.GROUP_2.spectrum.txt`	The prior spectrum data for F_ST estimations
`GROUP_1.GROUP_2.fst`	Fst results in a tab-delimited table
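
The naming scheme above can be turned into a checklist of files to look for after a run. This is a minimal sketch with a hypothetical function name. It assumes that outputs are written to `SCRATCH/PROJECT/Fst`, as described under Common Variables, and that `GROUP_1` and `GROUP_2` in the file names are replaced by the configured population names. The `.saf[.gz]` entry is left out because which of the two forms exists depends on the ANGSD version.

```python
from pathlib import PurePosixPath

# Per-group suffixes taken from the naming scheme table.
PER_GROUP_SUFFIXES = (".arg", ".saf.idx", ".mafs.gz", ".saf.pos.gz")


def expected_outputs(scratch, project, group_1, group_2):
    """Build the list of output paths implied by the naming scheme."""
    out_dir = PurePosixPath(scratch) / project / "Fst"
    names = [
        f"{group}_Intergenic{suffix}"
        for group in (group_1, group_2)
        for suffix in PER_GROUP_SUFFIXES
    ]
    # Files shared by both groups.
    names += [
        "shared.pos",
        f"2DSFS_Intergenic.{group_1}.{group_2}.sfs",
        f"{group_1}.{group_2}.spectrum.txt",
        f"{group_1}.{group_2}.fst",
    ]
    return [str(out_dir / name) for name in names]


# Illustrative values only.
for path in expected_outputs("/scratch/user", "Demo", "PopA", "PopB"):
    print(path)
```

Each returned path can then be checked with `os.path.exists` to spot a run that stopped partway through.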

The 2DSFS_Intergenic.GROUP_1.GROUP_2.sfs file will contain the 2DSFS results in matrix form. More recent versions of ANGSD automatically calculate 2DSFS on intersecting regions.

The .fst outfile will contain results in a tab-delimited file with columns "A, AB, f, FST, Pvar; where A is the expectation of genetic variance between populations, AB is the expectation of the total genetic variance, f is the correcting factor for the ratio of expectations, F_ST is the per-site F_ST value, Pvar is the probability for the site of being variable" (mfumagalli). There are no headers in the .fst outfile.
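
Because the .fst file has no header row, the column names have to be supplied when reading it. The sketch below attaches the five names from the description above and then combines sites as sum(A) / sum(AB). That ratio-of-sums summary is an assumption on our part about how to get a multi-site value; it is not stated on this page, so check ngsPopGen's documentation before relying on it. The function names and the sample values are invented for illustration.

```python
import csv
import io

# Column order as described for the .fst output file.
COLUMNS = ("A", "AB", "f", "FST", "Pvar")


def read_fst(handle):
    """Yield one dict per site from a headerless, tab-delimited .fst file."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # ignore blank lines
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(row)}")
        yield dict(zip(COLUMNS, map(float, row)))


def ratio_of_sums(sites):
    """Combine sites as sum(A) / sum(AB); an assumed multi-site summary."""
    total_a = total_ab = 0.0
    for site in sites:
        total_a += site["A"]
        total_ab += site["AB"]
    return total_a / total_ab if total_ab else float("nan")


# Made-up values standing in for a real GROUP_1.GROUP_2.fst file.
demo = io.StringIO("0.01\t0.10\t0.0\t0.10\t0.90\n0.03\t0.10\t0.0\t0.30\t0.95\n")
sites = list(read_fst(demo))
print(ratio_of_sums(sites))
```

For a real file, replace the `io.StringIO` object with `open(path, newline="")`.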

Visualization

ANGSD F_ST does not have visualization.

ngsFST

This method estimates F_ST with ngsPopGen. This method includes a Shiny graphing interface. To use this method, please run the following command:

git checkout ngsPopGen_Fst

You will have to run F_ST from this branch to generate files that can be visualized with the Shiny graphing interface.

Output files

Naming Scheme	Contents
`GROUP_1_Intergenic.arg`	Details of arguments for `GROUP_1`
`GROUP_1_Intergenic.saf[.gz]`	Site frequency data for `GROUP_1` in old (`.saf`) and new (`.saf.gz`) formats
`GROUP_1_Intergenic.saf.idx`	Index of site frequency data for `GROUP_1`
`GROUP_1_Intergenic.mafs.gz`	Minor allele frequencies for `GROUP_1`
`GROUP_1_Intergenic.saf.pos.gz`	Position data of the saf file for `GROUP_1`
`GROUP_2_Intergenic.arg`	Details of arguments for `GROUP_2`
`GROUP_2_Intergenic.saf[.gz]`	Site frequency data for `GROUP_2` in old (`.saf`) and new (`.saf.gz`) formats
`GROUP_2_Intergenic.saf.idx`	Index of site frequency data for `GROUP_2`
`GROUP_2_Intergenic.mafs.gz`	Minor allele frequencies for `GROUP_2`
`GROUP_2_Intergenic.saf.pos.gz`	Position data of the saf file for `GROUP_2`
`shared.pos`	Overlapping sites of both groups
`2DSFS_Intergenic.GROUP_1.GROUP_2.sfs`	2DSFS results
`GROUP_1.GROUP_2.spectrum.txt`	The prior spectrum data for F_ST estimations
`GROUP_1.GROUP_2.fst`	Fst results in a tab-delimited table

The 2DSFS_Intergenic.GROUP_1.GROUP_2.sfs file will contain the 2DSFS results in matrix form. More recent versions of ANGSD automatically calculate 2DSFS on intersecting regions.

The .fst outfile will contain results in a tab-delimited file with columns "A, AB, f, FST, Pvar; where A is the expectation of genetic variance between populations, AB is the expectation of the total genetic variance, f is the correcting factor for the ratio of expectations, F_ST is the per-site F_ST value, Pvar is the probability for the site of being variable" (mfumagalli). There are no headers in the .fst outfile.

Visualization

GROUP_1.GROUP_2.fst can be visualized with the Shiny graphing interface. A web browser with a graphical user interface is required.
