FST
NOTE: There are two methods for estimating FST with ANGSD-wrapper.
The first method uses ANGSD and has no visualization component.
The second method uses ngsPopGen and can be visualized with the Shiny graphing interface.
To use ANGSD FST, please use the following command from within the ANGSD-wrapper directory:
git checkout master
This method estimates FST using ANGSD. Please see ANGSD's tutorial page for full details. When using ngsPopGen for calculating FST, please see ngsPopGen's documentation for full details.
To run this method, use the following command:
angsd-wrapper Fst FST_Config
where `FST_Config` is the full path to the configuration file for the FST estimations.
All inputs should be specified in `FST_Config` or `2DSFS_Fst_Config`.
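Because the configuration file must be given as a full path, it can help to validate the path before launching a run. The following is a minimal sketch, not part of ANGSD-wrapper: `check_config` is a hypothetical helper, and `/path/to/FST_Config` is a placeholder for your own file.

```shell
#!/bin/sh
# check_config is a hypothetical helper, not part of ANGSD-wrapper.
# It succeeds only if its argument is an absolute path to a readable file.
check_config() {
    case "$1" in
        /*) ;;  # absolute path, keep going
        *)  echo "error: config path must be absolute: $1" >&2
            return 1 ;;
    esac
    if [ ! -r "$1" ]; then
        echo "error: cannot read config file: $1" >&2
        return 1
    fi
    return 0
}

# Usage (the path is a placeholder):
#   check_config /path/to/FST_Config && angsd-wrapper Fst /path/to/FST_Config
```

Chaining with `&&` means `angsd-wrapper` only starts when the check passes.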
This method uses `Common_Config`. Variables and their functions are listed below:
Variable | Function |
---|---|
`ANC_SEQ` | Path to ancestral sequence |
`REF_SEQ` | Path to reference sequence |
`PROJECT` | Name given to all outputs in ANGSD-wrapper |
`SCRATCH` | Place to store files, the full path is `SCRATCH/PROJECT/Fst` |
`REGIONS` | Limit the scope of ANGSD-wrapper to certain genomic regions |
`UNIQUE_ONLY` | Use uniquely mapped reads only |
`MIN_BASEQUAL` | Minimum base quality score |
`BAQ` | Adjust Q scores around indels |
`MIN_IND1` | Minimum number of individuals in `GROUP_1` needed to use this site |
`MIN_IND2` | Minimum number of individuals in `GROUP_2` needed to use this site |
`GT_LIKELIHOOD` | Estimates genotype likelihoods |
`MIN_MAPQ` | Minimum base mapping quality |
`N_CORES` | Number of cores to use, please do not set above the limits of your system |
`DO_MAJORMINOR` | Estimate major/minor alleles |
`DO_MAF` | Calculate per-site frequencies |
`DO_POST` | Calculate the posterior probability using per-site frequencies |
These variables are specific to this method:
Variable | Function |
---|---|
`GROUP_1` | The name of the first population |
`G1_SAMPLE_LIST` (`G1_SAMPLES` on dev) | A list with the full file path to all BAM files in `GROUP_1` |
`G1_INBREEDING` | A list of inbreeding coefficients for `GROUP_1` |
`GROUP_2` | The name of the second population |
`G2_SAMPLE_LIST` (`G2_SAMPLES` on dev) | A list with the full file path to all BAM files in `GROUP_2` |
`G2_INBREEDING` | A list of inbreeding coefficients for `GROUP_2` |
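The sample lists above need the full path to every BAM file in a group. As a rough sketch, such a list could be generated rather than typed by hand. `make_sample_list` is a hypothetical helper, not part of ANGSD-wrapper, and the one-path-per-line layout is an assumption about what the list should look like.

```shell
#!/bin/sh
# make_sample_list is a hypothetical helper, not part of ANGSD-wrapper.
# $1: directory holding the BAM files for one group
# $2: output file, written with one absolute BAM path per line (assumed layout)
make_sample_list() {
    bam_dir=$(cd "$1" && pwd) || return 1   # resolve to an absolute path
    find "$bam_dir" -maxdepth 1 -type f -name '*.bam' | sort > "$2"
    # Fail if no BAM files were found, so an empty list is not used by accident
    [ -s "$2" ]
}

# Usage (both paths are placeholders):
#   make_sample_list /path/to/group1_bams /path/to/group1_samples.txt
```

The resulting file path is what would then be assigned to the sample list variable for that group.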
The parameters for this method can be adjusted as necessary; they have been set to values that work well in general use:
Parameter | Function |
---|---|
`DO_SAF` | Creates a site frequency spectrum |
`BLOCKSIZE` | Number of sites in each chunk |
`OVERRIDE` | If `true`, will recalculate files that already exist |
`RELATIVE` | Set if input are absolute counts or relative frequencies, used with ngsPopGen only |
`MAX_LIKE` | Method for computing maximum likelihood estimate, used with ngsPopGen only |
Naming Scheme | Contents |
---|---|
`GROUP_1_Intergenic.arg` | Details of arguments for `GROUP_1` |
`GROUP_1_Intergenic.saf[.gz]` | Site frequency data for `GROUP_1` in old (`.saf`) and new (`.saf.gz`) formats |
`GROUP_1_Intergenic.saf.idx` | Index of site frequency data for `GROUP_1` |
`GROUP_1_Intergenic.mafs.gz` | Minor allele frequencies for `GROUP_1` |
`GROUP_1_Intergenic.saf.pos.gz` | Position data of the saf file for `GROUP_1` |
`GROUP_2_Intergenic.arg` | Details of arguments for `GROUP_2` |
`GROUP_2_Intergenic.saf[.gz]` | Site frequency data for `GROUP_2` in old (`.saf`) and new (`.saf.gz`) formats |
`GROUP_2_Intergenic.saf.idx` | Index of site frequency data for `GROUP_2` |
`GROUP_2_Intergenic.mafs.gz` | Minor allele frequencies for `GROUP_2` |
`GROUP_2_Intergenic.saf.pos.gz` | Position data of the saf file for `GROUP_2` |
`shared.pos` | Overlapping sites of both groups |
`2DSFS_Intergenic.GROUP_1.GROUP_2.sfs` | 2DSFS results |
`GROUP_1.GROUP_2.spectrum.txt` | The prior spectrum data for FST estimations |
`GROUP_1.GROUP_2.fst` | Fst results in a tab-delimited table |
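After a run, a quick check that the key outputs exist can catch a failed step early. The sketch below is illustrative only: `missing_outputs` is a hypothetical helper, not part of ANGSD-wrapper, and it assumes that `GROUP_1` and `GROUP_2` in the naming scheme are replaced by the group names set in the configuration file.

```shell
#!/bin/sh
# missing_outputs is a hypothetical helper, not part of ANGSD-wrapper.
# $1: output directory (SCRATCH/PROJECT/Fst); $2, $3: the two group names.
# Assumes GROUP_1/GROUP_2 in the naming scheme stand for the group names.
# Prints each expected file that is absent; returns non-zero if any is missing.
missing_outputs() {
    out_dir=$1; g1=$2; g2=$3; status=0
    for f in \
        "${g1}_Intergenic.saf.idx" \
        "${g2}_Intergenic.saf.idx" \
        "2DSFS_Intergenic.${g1}.${g2}.sfs" \
        "${g1}.${g2}.fst"
    do
        if [ ! -e "$out_dir/$f" ]; then
            echo "$f"
            status=1
        fi
    done
    return "$status"
}

# Usage (directory and group names are placeholders):
#   missing_outputs /scratch/MyProject/Fst PopA PopB
```

Only a subset of the files in the table is checked here; extend the list as needed.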
The `2DSFS_Intergenic.GROUP_1.GROUP_2.sfs` file will contain the 2DSFS results in matrix form. More recent versions of ANGSD automatically calculate 2DSFS on intersecting regions.
The `.fst` outfile will contain results in a tab-delimited file with columns "A, AB, f, FST, Pvar; where A is the expectation of genetic variance between populations, AB is the expectation of the total genetic variance, f is the correcting factor for the ratio of expectations, FST is the per-site FST value, Pvar is the probability for the site of being variable" (mfumagalli). There are no headers in the `.fst` outfile.
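Since the `.fst` outfile has no header, columns have to be addressed by position. As a minimal sketch, the per-site values can be summarized into one overall estimate by dividing the sum of column A by the sum of column AB. `global_fst` is a hypothetical helper, not part of ANGSD-wrapper or ngsPopGen, and taking the ratio of sums as the summary is an assumption about what you want to report.

```shell
#!/bin/sh
# global_fst is a hypothetical helper, not part of ANGSD-wrapper or ngsPopGen.
# It reads a headerless, tab-delimited .fst file whose columns are
# A, AB, f, FST, Pvar and prints sum(A) / sum(AB), or NA if sum(AB) is zero.
global_fst() {
    awk -F '\t' '
        { sum_a += $1; sum_ab += $2 }
        END {
            if (sum_ab == 0) { print "NA"; exit }
            printf "%.6f\n", sum_a / sum_ab
        }
    ' "$1"
}

# Usage (the file name is a placeholder):
#   global_fst /path/to/GROUP_1.GROUP_2.fst
```

A ratio of sums weights each site by its total variance, so it generally differs from the plain mean of the per-site FST column.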
ANGSD FST does not have visualization.
This method estimates FST with ngsPopGen. This method includes a Shiny graphing interface. To use this method, please run the following command:
git checkout ngsPopGen_Fst
You will have to run FST from this branch to generate files that can be visualized with the Shiny graphing interface.
Naming Scheme | Contents |
---|---|
`GROUP_1_Intergenic.arg` | Details of arguments for `GROUP_1` |
`GROUP_1_Intergenic.saf[.gz]` | Site frequency data for `GROUP_1` in old (`.saf`) and new (`.saf.gz`) formats |
`GROUP_1_Intergenic.saf.idx` | Index of site frequency data for `GROUP_1` |
`GROUP_1_Intergenic.mafs.gz` | Minor allele frequencies for `GROUP_1` |
`GROUP_1_Intergenic.saf.pos.gz` | Position data of the saf file for `GROUP_1` |
`GROUP_2_Intergenic.arg` | Details of arguments for `GROUP_2` |
`GROUP_2_Intergenic.saf[.gz]` | Site frequency data for `GROUP_2` in old (`.saf`) and new (`.saf.gz`) formats |
`GROUP_2_Intergenic.saf.idx` | Index of site frequency data for `GROUP_2` |
`GROUP_2_Intergenic.mafs.gz` | Minor allele frequencies for `GROUP_2` |
`GROUP_2_Intergenic.saf.pos.gz` | Position data of the saf file for `GROUP_2` |
`shared.pos` | Overlapping sites of both groups |
`2DSFS_Intergenic.GROUP_1.GROUP_2.sfs` | 2DSFS results |
`GROUP_1.GROUP_2.spectrum.txt` | The prior spectrum data for FST estimations |
`GROUP_1.GROUP_2.fst` | Fst results in a tab-delimited table |
The `2DSFS_Intergenic.GROUP_1.GROUP_2.sfs` file will contain the 2DSFS results in matrix form. More recent versions of ANGSD automatically calculate 2DSFS on intersecting regions.
The `.fst` outfile will contain results in a tab-delimited file with columns "A, AB, f, FST, Pvar; where A is the expectation of genetic variance between populations, AB is the expectation of the total genetic variance, f is the correcting factor for the ratio of expectations, FST is the per-site FST value, Pvar is the probability for the site of being variable" (mfumagalli). There are no headers in the `.fst` outfile.
`GROUP_1.GROUP_2.fst` can be visualized with the Shiny graphing interface. A web browser with a graphical user interface is required.