Home
Welcome to the py-icare wiki!
Example datasets are provided in the data/ directory of this repository. Users can use them to explore the different features available in iCARE and examine the outputs that they generate.
- bc_model_covariates_info.json: a JSON file containing metadata about the risk factors present in the covariate dataset (reference_covariate_data.csv). For each risk factor, the metadata includes the data type (type; either "continuous" or "discrete"), a list of categories if the variable is discrete (levels), and optionally the reference category of the discrete variable (ref). If a reference category is not provided for a discrete variable, the first category listed under levels is assumed to be the reference category.
- bc_model_formula.txt: a patsy formula, which is a symbolic description of the covariate model to be fitted. Patsy is a Python substitute for R's formula class objects. If you are an R programmer, please read the patsy manual, since patsy is not a perfect drop-in replacement for R's formula syntax.
- bc_72_snps.csv: published SNP information (odds ratios and SNP frequencies). Reference: Michailidou, Kyriaki, et al. "Association analysis identifies 65 new breast cancer risk loci." Nature 551.7678 (2017): 92-94.
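The reference-category rule described for bc_model_covariates_info.json can be sketched as follows. This is a minimal illustration, not the library's own code: the top-level list layout, the "name" key, and the example_continuous entry are assumptions made for the example; only the type, levels, and ref fields come from the description above.

```python
import json

# Hypothetical excerpt. The list-of-objects layout and the "name" key are
# assumptions; only "type", "levels", and "ref" are described in this wiki.
covariates_info_json = """
[
  {"name": "famhist", "type": "discrete", "levels": [0, 1], "ref": 0},
  {"name": "parity", "type": "discrete", "levels": [0, 1, 2, 3, 4]},
  {"name": "example_continuous", "type": "continuous"}
]
"""


def reference_category(entry):
    """Return the reference category of a discrete risk factor.

    Continuous risk factors have no reference category, so None is returned.
    """
    if entry["type"] != "discrete":
        return None
    # Use the explicit reference category when one is given.
    if "ref" in entry:
        return entry["ref"]
    # Otherwise fall back to the first listed category.
    return entry["levels"][0]


covariates_info = json.loads(covariates_info_json)
references = {e["name"]: reference_category(e) for e in covariates_info}
print(references)
```

Here parity has no ref field, so its first listed level is used as the reference category.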
Variable name | Description | Value encoding |
---|---|---|
id | Subject ID | A unique identifier for each individual. |
famhist | Family history (of breast cancer among first degree relatives) | {0: "absence" (reference), 1: "presence"} |
menarche_dec | Age at menarche (years) | {1: <=11 (reference), 2: 11-11.5, 3: 11.5-12, 5: 12-13, 8: 13-14, 9: 14-15, 10: >=15} |
parity | Parity (number of full-term pregnancies) | {0: nulliparous (reference), 1: 1, 2: 2, 3: 3, 4: >=4} |
birth_dec | Age at first child birth (years) | {1: <=19 (reference), 2: 19-22, 3: 22-23, 4: 23-25, 7: 25-27, 8: 27-30, 9: 30-34, 10: 34-38, 11: >=38} |
agemeno_dec | Age at menopause (years) | {1: <=40 (reference), 2: 40-45, 3: 45-47, 4: 47-48, 5: 48-50, 6: 50-51, 7: 51-52, 8: 52-53, 9: 53-55, 10: >=55} |
height_dec | Height (meters) | {1: <=1.55 (reference), 2: 1.55-1.57, 3: 1.57-1.60, 4: 1.60-1.61, 5: 1.61-1.63, 6: 1.63-1.65, 7: 1.65-1.66, 8: 1.66-1.68, 9: 1.68-1.71, 10: >=1.71} |
bmi_dec | Body mass index (kg/m²) | {1: <=21.5 (reference), 2: 21.5-23, 3: 23-24.2, 4: 24.2-25.3, 5: 25.3-26.5, 6: 26.5-27.8, 7: 27.8-29.3, 8: 29.3-31.4, 9: 31.4-34.6, 10: >=34.6} |
rd_menohrt | Use of Hormone Replacement Therapy (HRT) | {0: "pre-menopausal" (reference), 1: "post-menopausal and never HRT user", 2: "post-menopausal and ever HRT user"} |
rd2_everhrt_e | Use of estrogen-only therapy | {0: "otherwise" (reference), 1: "post-menopausal and ever user of estrogen-only therapy"} |
rd2_everhrt_c | Use of estrogen + progesterone combined therapy | {0: "otherwise" (reference), 1: "post-menopausal and ever user of combined therapy"} |
rd2_currhrt | Current use of HRT | {0: "otherwise" (reference), 1: "post-menopausal and current HRT user"} |
alcoholdweek_dec | Alcohol (drinks/week) | {1: "none" (reference), 4: 0-0.4, 5: 0.4-0.8, 6: 0.8-1.5, 7: 1.5-3.2, 8: 3.2-5.7, 9: 5.7-9.8, 10: >9.8} |
ever_smoke | Smoking status | {0: "never" (reference), 1: "ever"} |
- new_covariates_profile.csv: a test dataset specifying the risk factors (same variables as in the reference covariate dataset reference_covariate_data.csv) of three hypothetical individuals.
- new_snp_profile.csv: a test dataset specifying the allele dosages for the breast cancer-associated SNPs (same SNPs as in the bc_72_snps.csv file) of three hypothetical individuals. Note that some of the SNPs for some individuals are missing.
- validation_cohort_data.csv: a simulated dataset of a full cohort study of 50,000 individuals. This dataset helps illustrate the model validation capabilities of iCARE. The variables in this dataset are as follows.
Variable name | Description | Value encoding |
---|---|---|
study_entry_age | Age at study entry (years) | continuous (integer) |
study_exit_age | Age at study exit (years) | continuous (integer) |
observed_outcome | Disease status | {0: "normal", 1: "case"} |
time_of_onset | Time (in years) from study entry to the development of the disease. Set to Inf if the subject did not develop the disease during the follow-up period. | continuous (float) |
observed_followup | Number of years that the subject was followed up in the study, i.e., the difference between the age at study exit and the age at study entry. | continuous (integer) |
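The relationships among the cohort variables described in the table can be checked with a short script. This is a sketch under stated assumptions: the two rows below are made up for illustration and are not taken from validation_cohort_data.csv, and check_row is a hypothetical helper, not part of iCARE.

```python
import csv
import io
import math

# Made-up rows for illustration only; they are not from the real dataset.
sample_csv = """study_entry_age,study_exit_age,observed_outcome,time_of_onset,observed_followup
50,60,0,Inf,10
55,62,1,3.4,7
"""


def check_row(row):
    """Return a list of inconsistencies found in one cohort record."""
    entry_age = int(row["study_entry_age"])
    exit_age = int(row["study_exit_age"])
    outcome = int(row["observed_outcome"])
    onset = float(row["time_of_onset"])  # float("Inf") parses to math.inf
    followup = int(row["observed_followup"])

    problems = []
    # Follow-up is described as the difference between exit and entry ages.
    if followup != exit_age - entry_age:
        problems.append("observed_followup != study_exit_age - study_entry_age")
    # Subjects who did not develop the disease should have time_of_onset = Inf.
    if outcome == 0 and not math.isinf(onset):
        problems.append("non-case with finite time_of_onset")
    # Cases developed the disease, so their onset time should be finite.
    if outcome == 1 and math.isinf(onset):
        problems.append("case with infinite time_of_onset")
    return problems


rows = list(csv.DictReader(io.StringIO(sample_csv)))
report = [check_row(r) for r in rows]
print(report)
```

To run the same checks on the real file, replace io.StringIO(sample_csv) with an open file handle for validation_cohort_data.csv.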
- validation_nested_case_control_data.csv: a simulated dataset of a case-control study of 5,285 individuals, nested within a cohort study. Inclusion?