
Configuration

Anton Ligterink edited this page Jun 25, 2021 · 5 revisions

Below is the full list of customizable parameters in the configuration file. Before running the toolkit, you will also need to adjust the SLURM settings at the top of the prstoolkit.sh file. In addition, make sure that no directory variable ends with a '/' (forward slash).
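The trailing-slash rule is easy to check before submitting a job. The Python sketch below assumes that the configuration file consists of shell-style `KEY="value"` assignments without inline comments, and that `PROJECT_DIR` and `VALIDATIONDATA` are the directory variables to check. Both are assumptions, so verify them against your own copy of the file. `parse_config`, `trailing_slash_errors` and `DIRECTORY_VARIABLES` are hypothetical helpers and are not part of the toolkit.

```python
import re

# Hypothetical: configuration variables that hold directory paths.
# Adjust this set to match your own configuration file.
DIRECTORY_VARIABLES = {"PROJECT_DIR", "VALIDATIONDATA"}

# Assumes shell-style assignments such as PROJECT_DIR="/path/to/project"
# (optional single or double quotes, no inline comments).
ASSIGNMENT = re.compile(r'^\s*([A-Z_][A-Z0-9_]*)=["\']?(.*?)["\']?\s*$')


def parse_config(text: str) -> dict[str, str]:
    """Parse KEY=value / KEY="value" lines, skipping blank and comment lines."""
    config = {}
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        match = ASSIGNMENT.match(line)
        if match:
            config[match.group(1)] = match.group(2)
    return config


def trailing_slash_errors(config: dict[str, str]) -> list[str]:
    """Return the names of directory variables whose value ends with '/'."""
    return sorted(
        name
        for name in DIRECTORY_VARIABLES
        if config.get(name, "").endswith("/")
    )


if __name__ == "__main__":
    example = (
        'PROJECT_DIR="/hpc/projects/prs/"\n'
        'VALIDATIONDATA="/hpc/data/_ae_originals"\n'
    )
    print(trailing_slash_errors(parse_config(example)))  # ['PROJECT_DIR']
```

Running a check like this before `sbatch` avoids discovering a malformed path only after the job has been queued.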

General settings

Parameter Description
PRSMETHOD Indicate which method to use [PLINK/RAPIDOPGS/PRSCS/PRSICE/NONE]. Pick NONE if you only wish to perform quality control.
PROJECTNAME Name of the project.
PROJECT_DIR Path to where the main analysis directory resides.
OUTPUT_DIRNAME Name of the output directory within the PROJECT_DIR directory.
SUBPROJECT_DIR_NAME Name of the (sub)project; this is used to create subfolders within the output directory (OUTPUT_DIRNAME).
MAIN_WORKDIR_NAME Name of the working directory within the main analysis directory, used for temporary files.
LOG_DIRNAME Name of the subdirectory of the PROJECT_DIR directory used for storing log files.
QC Indicate whether quality control should be applied according to the MAF and INFO parameters. [YES/NO]
MAF Minimum minor allele frequency to keep variants, e.g. "0.005".
INFO Minimum imputation quality score to keep variants, e.g. "0.3".
KEEP_TEMP_FILES Keep the temporary files generated by the toolkit after the job finishes. [TRUE/FALSE]
SAVE_CONFIG Save a copy of this configuration file along with the results. [TRUE/FALSE]
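Several general settings only accept a fixed set of values, and QC keeps variants based on the MAF and INFO minimums. The Python sketch below checks the documented options and shows the filtering rule as described in the table. It assumes exact, case-sensitive matching of the option values and inclusive thresholds (a variant at exactly the minimum is kept); the toolkit itself may behave differently. `invalid_settings`, `Variant` and `passes_qc` are hypothetical names used for illustration only.

```python
from dataclasses import dataclass

# Allowed values as listed in the General settings table.
ALLOWED_VALUES = {
    "PRSMETHOD": {"PLINK", "RAPIDOPGS", "PRSCS", "PRSICE", "NONE"},
    "QC": {"YES", "NO"},
    "KEEP_TEMP_FILES": {"TRUE", "FALSE"},
    "SAVE_CONFIG": {"TRUE", "FALSE"},
}


def invalid_settings(config: dict[str, str]) -> dict[str, str]:
    """Return every setting whose value is missing or not a documented option."""
    return {
        name: config.get(name, "")
        for name, allowed in ALLOWED_VALUES.items()
        if config.get(name, "") not in allowed
    }


@dataclass
class Variant:
    snp_id: str
    maf: float   # minor allele frequency
    info: float  # imputation quality score


def passes_qc(variant: Variant, min_maf: float, min_info: float) -> bool:
    """Keep a variant only if both MAF and INFO reach their minimums.

    Assumption: thresholds are inclusive (>=). Per the table, this filter
    only applies when QC is set to YES.
    """
    return variant.maf >= min_maf and variant.info >= min_info


if __name__ == "__main__":
    config = {"PRSMETHOD": "PRSCS", "QC": "YES",
              "KEEP_TEMP_FILES": "FALSE", "SAVE_CONFIG": "maybe"}
    print(invalid_settings(config))  # {'SAVE_CONFIG': 'maybe'}

    variants = [Variant("rs1", 0.20, 0.95),   # kept
                Variant("rs2", 0.001, 0.99),  # MAF below 0.005
                Variant("rs3", 0.10, 0.25)]   # INFO below 0.3
    kept = [v.snp_id for v in variants if passes_qc(v, 0.005, 0.3)]
    print(kept)  # ['rs1']
```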

Input settings

Parameter Description RapidoPGS PRS-CS PRSice PLINK
BASEDATA Path to the file containing the base data. R R R R
BF_BUILD Build of the base file, e.g. "hg19" or "hg38". R
BF_ID_COL Name of the SNP ID column in the base file. R R R R
BF_CHR_COL Name of the chromosome column in the base file. R R
BF_POS_COL Name of the position column in the base file. R R
BF_EFFECT_COL Name of the effect allele column in the base file. R R R R
BF_NON_EFFECT_COL Name of the non-effect allele column in the base file. R R R
BF_STAT Type of measure in the BF_STAT_COL, either "beta" or "or". * R R
BF_STAT_COL Name of the beta/OR/effect size column in the base file. R R R R
BF_FRQ_COL Name of the effect allele frequency column in the base file. R/O**
BF_SE_COL Name of the column of the standard error of the beta/OR value. R
BF_PVALUE_COL Name of the column containing the P-values of the association test. R R R
BF_SBJ_COL Name of the column containing the sample size for each variant. R/O***
BF_SAMPLE_SIZE Sample size of the GWAS R/O*** R
BF_TARGET_TYPE "cc" for a case-control trait, "quant" for a quantitative trait. R
LDDATA Path to the linkage disequilibrium reference data. PRS-CS and PRSice each require a different format. R**** O*****
VALIDATIONDATA Path to the directory containing the validation data, e.g. /hpc/data/_ae_originals. R R R R
VALIDATIONPREFIX Prefix of the validation files in BGEN format v1.2, excluding the chr-number and extension, e.g. aegs_combo_1kGp3GoNL5_RAW_chr. R R R R
VAL_REF_POS Position of the reference allele in the BGEN files relative to the alternative allele: ref-first, ref-last or ref-unknown. R R R
SAMPLE_FILE Path to the sample file. A description of the sample file format can be found here. R R R R
PRSICE_PHENOTYPE Phenotype that PRSice uses to find the best-fitting set of polygenic scores; this phenotype must be present in the sample file. R
PRSICE_PHENOTYPE_BINARY [TRUE/FALSE] indicating whether PRSICE_PHENOTYPE contains a binary phenotype. R
STATS_FILE Path to the stats file. O O O O
STATS_ID_COL Name of the stats file column containing the SNP IDs; these IDs must match the IDs in the base file. O O O O
STATS_MAF_COL Name of the stats file column containing the minor allele frequency. O O O O
STATS_INFO_COL Name of the stats file column containing the imputation score. O O O O
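Two of the input settings imply small transformations that are easy to get wrong. The Python sketch below shows the standard conversion from an odds ratio to a beta on the log-odds scale (beta = ln(OR)), which is the distinction between `BF_STAT="or"` and `BF_STAT="beta"`, and one way to assemble per-chromosome validation file paths from `VALIDATIONDATA` and `VALIDATIONPREFIX`. Whether the toolkit performs this conversion internally, the ".bgen" extension and the autosome range 1 to 22 are assumptions, and `effect_to_beta` and `validation_bgen_files` are hypothetical helpers.

```python
import math
from collections.abc import Iterable


def effect_to_beta(value: float, bf_stat: str) -> float:
    """Express an effect size on the beta (log-odds) scale.

    BF_STAT "beta": returned unchanged.
    BF_STAT "or":   natural logarithm of the odds ratio.
    """
    if bf_stat == "beta":
        return value
    if bf_stat == "or":
        if value <= 0:
            raise ValueError("an odds ratio must be positive")
        return math.log(value)
    raise ValueError(f'BF_STAT must be "beta" or "or", got {bf_stat!r}')


def validation_bgen_files(validation_dir: str, prefix: str,
                          chromosomes: Iterable[int] = range(1, 23)) -> list[str]:
    """Build <VALIDATIONDATA>/<VALIDATIONPREFIX><chr>.bgen for each chromosome.

    Assumptions: autosomes 1-22 and a ".bgen" extension. The directory and
    prefix are joined with a single '/', matching the rule that directory
    variables carry no trailing slash.
    """
    return [f"{validation_dir}/{prefix}{chrom}.bgen" for chrom in chromosomes]


if __name__ == "__main__":
    print(effect_to_beta(1.0, "or"))  # 0.0 (an OR of 1 means no effect)
    print(validation_bgen_files("/hpc/data/_ae_originals",
                                "aegs_combo_1kGp3GoNL5_RAW_chr", [1, 2]))
```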