Principal Component Analysis
This method performs a principal component analysis (PCA), using ANGSD and ngsPopGen for the PCA calculation. Please see ngsPopGen for full details on this method.
To run this method, use the following command:

`angsd-wrapper PCA Principal_Component_Analysis_Config`

where `Principal_Component_Analysis_Config` is the full path to the configuration file for the PCA. All inputs should be specified in `Principal_Component_Analysis_Config`.
This method uses variables from `Common_Config`. The ones it uses are listed below:

| Variable | Function |
|---|---|
| `SAMPLE_LIST` (`GROUP_SAMPLES` on the `dev` branch) | A list of samples to be used in calculations |
| `PROJECT` | Name given to all outputs in ANGSD-wrapper |
| `SCRATCH` | Place to store files; the full path is `SCRATCH/PROJECT/PCA` |
| `REGIONS` | Limit the scope of ANGSD-wrapper to certain regions |
| `GT_LIKELIHOOD` | Estimates genotype likelihoods |
| `N_CORES` | Number of cores to use; please do not set above the limits of your system |
| `DO_MAJORMINOR` | Estimate major/minor alleles |
| `DO_GENO` | Call genotypes and set up the output |
| `DO_MAF` | Calculate per-site frequencies |
| `DO_POST` | Calculate the posterior probability using per-site frequencies |

This method has no method-specific variables.
The parameters for this method can be adjusted as necessary. Their defaults have been chosen to work well in the general case:

| Parameter | Function |
|---|---|
| `CALL` | Call genotype from maximum probability |
| `N_SITES` | Set the maximum number of sites to use. If left undefined, the wrapper uses all valid sites. |

| Naming Scheme | Contents |
|---|---|
| `PROJECT_PCA.arg` | Details of arguments |
| `PROJECT_PCA.covar` | Results of the principal component analysis |
| `PROJECT_PCA.geno` | Genotype calls |
| `PROJECT_PCA.mafs.gz` | Per-site frequencies |
| `PROJECT_PCA.backup.graph.me.[timestamp]` | ngsPopGen does not overwrite existing data files, so ANGSD-wrapper moves any existing `*.graph.me` file to a uniquely named backup in the same directory, so that repeated executions are not blocked by the earlier file. |

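
As a quick check after a run, the naming scheme above can be turned into a list of expected file paths. The following is a minimal sketch, not part of ANGSD-wrapper: the function names are hypothetical, and it assumes the outputs sit directly in `SCRATCH/PROJECT/PCA` with `PROJECT` substituted literally into each file name.

```python
from pathlib import Path

# Suffixes taken from the naming-scheme table; the backup file is left out
# because its name depends on a timestamp.
OUTPUT_SUFFIXES = ("arg", "covar", "geno", "mafs.gz")


def expected_pca_outputs(scratch: str, project: str) -> list[Path]:
    """Build the paths the PCA step is expected to write (assumed layout)."""
    out_dir = Path(scratch) / project / "PCA"
    return [out_dir / f"{project}_PCA.{suffix}" for suffix in OUTPUT_SUFFIXES]


def missing_outputs(scratch: str, project: str) -> list[Path]:
    """Return the expected output files that are not present on disk."""
    return [p for p in expected_pca_outputs(scratch, project) if not p.is_file()]


if __name__ == "__main__":
    # Hypothetical values; use the SCRATCH and PROJECT set in Common_Config.
    for path in missing_outputs("/tmp/scratch", "Example"):
        print(f"missing: {path}")
```

An empty result from `missing_outputs` suggests that all four named outputs were produced. It does not say anything about whether their contents are valid.
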
`PROJECT_PCA.covar` (renamed to `PROJECT_PCA.graph.me` during processing) can be visualized with the Shiny graphing interface. A web browser with a graphical user interface is required.
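
Where the Shiny interface is not available, the covariance matrix can still be inspected by hand. The sketch below is an illustration only and is not what the Shiny interface does: it assumes the `.covar` file is a plain-text, whitespace-delimited square matrix with one row per sample, which this page does not confirm. It uses power iteration, a generic technique, to estimate the first principal component and its share of the matrix trace. The function names are hypothetical.

```python
import math


def parse_covar(text: str) -> list[list[float]]:
    """Parse a whitespace-delimited square matrix (assumed .covar layout)."""
    rows = [[float(x) for x in line.split()] for line in text.splitlines() if line.strip()]
    if any(len(row) != len(rows) for row in rows):
        raise ValueError("covariance matrix must be square")
    return rows


def _matvec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def _normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_component(matrix, iterations=500):
    """Estimate the leading eigenvalue/eigenvector by power iteration.

    Returns (eigenvalue, eigenvalue / trace, unit eigenvector). The ratio is
    a variance share only if the matrix is positive semi-definite.
    """
    # Deterministic, non-uniform start vector.
    vec = _normalize([1.0 / (i + 1) for i in range(len(matrix))])
    for _ in range(iterations):
        product = _matvec(matrix, vec)
        if not any(product):  # zero vector: nothing left to iterate on
            break
        vec = _normalize(product)
    # Rayleigh quotient of the unit vector gives the eigenvalue estimate.
    eigenvalue = sum(v * p for v, p in zip(vec, _matvec(matrix, vec)))
    trace = sum(matrix[i][i] for i in range(len(matrix)))
    share = eigenvalue / trace if trace else 0.0
    return eigenvalue, share, vec
```

Each entry of the returned vector is the coordinate of one sample on the first component, in the same order as the rows of the matrix.
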
To compare clusters of known subgroups in the samples, create a `.clst` clusters file that contains a label for each sample. The `.clst` file should be formatted as follows, with the `CLUSTER` value determining the color of the data point in the final plot. The header line must also be included exactly as shown.
FID IID CLUSTER
[Sample Name] 1 [Group Name]
WBDC_348 1 wild
Morex_jh100 1 elite
... ... ...
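
For more than a handful of samples, the clusters file can be generated from a sample-to-group mapping. This is a minimal sketch under stated assumptions: columns are separated by single spaces and the `IID` column is always `1`, as in the example above. The helper names `format_clst` and `write_clst` are hypothetical and not part of ANGSD-wrapper.

```python
from pathlib import Path

HEADER = "FID IID CLUSTER"  # header line copied from the format description


def format_clst(sample_groups: dict[str, str]) -> str:
    """Return .clst file contents: the header, then 'sample 1 group' per line."""
    lines = [HEADER]
    for sample, group in sample_groups.items():
        # Whitespace inside a name would shift the columns, so reject it.
        if not sample or not group or any(ch.isspace() for ch in sample + group):
            raise ValueError(f"invalid sample or group name: {sample!r}, {group!r}")
        lines.append(f"{sample} 1 {group}")
    return "\n".join(lines) + "\n"


def write_clst(path: str, sample_groups: dict[str, str]) -> None:
    """Write the formatted clusters file to the given path."""
    Path(path).write_text(format_clst(sample_groups))


if __name__ == "__main__":
    # Sample names taken from the example above.
    print(format_clst({"WBDC_348": "wild", "Morex_jh100": "elite"}), end="")
```

Samples are written in the insertion order of the mapping. Whether that order must match the order of the sample list is not stated here, so it is safest to keep the two consistent.
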