Jeya Balaji Balasubramanian edited this page Jan 17, 2023 · 11 revisions

Welcome to the py-icare wiki!

Example data

Example datasets are provided in the data/ directory of this repository. Users can use them to explore the different features of iCARE and examine the outputs that they generate.

| Variable name | Description | Value encoding |
| --- | --- | --- |
| id | Subject ID | A unique identifier for each individual. |
| family_history | Family history of breast cancer among first degree relatives. | {0: "absence" (reference), 1: "presence"} |
| age_at_menarche | Age at menarche (years) | {<=11 (reference), 11-11.5, 11.5-12, 12-13, 13-14, 14-15, >=15} |
| parity | Parity (number of full-term pregnancies) | {nulliparous (reference), 1, 2, 3, >=4} |
| age_at_first_child_birth | Age at first child birth (years) | {<=19 (reference), 19-22, 22-23, 23-25, 25-27, 27-30, 30-34, 34-38, >=38} |
| age_at_menopause | Age at menopause (years) | {<=40 (reference), 40-45, 45-47, 47-48, 48-50, 50-51, 51-52, 52-53, 53-55, >=55} |
| height | Height (meters) | {<=1.55 (reference), 1.55-1.57, 1.57-1.60, 1.60-1.61, 1.61-1.63, 1.63-1.65, 1.65-1.66, 1.66-1.68, 1.68-1.71, >=1.71} |
| bmi | Body mass index (kg/m²) | {21.5 (reference), 21.5-23, 23-24.2, 24.2-25.3, 25.3-26.5, 26.5-27.8, 27.8-29.3, 29.3-31.4, 31.4-34.6, >=34.6} |
| menopause_hrt | Use of Hormone Replacement Therapy (HRT) | {0: "pre-menopausal" (reference), 1: "post-menopausal and never HRT user", 2: "post-menopausal and ever HRT user"} |
| menopause_hrt_e | Use of estrogen-only therapy | {0: "otherwise" (reference), 1: "post-menopausal and ever user of estrogen-only therapy"} |
| menopause_hrt_c | Use of estrogen + progesterone combined therapy | {0: "otherwise" (reference), 1: "post-menopausal and ever user of combined therapy"} |
| current_hrt | Current use of HRT | {0: "otherwise" (reference), 1: "post-menopausal and current HRT user"} |
| alcohol_consumption | Alcohol (drinks/week) | {"none" (reference), 0-0.4, 0.4-0.8, 0.8-1.5, 1.5-3.2, 3.2-5.7, 5.7-9.8, >9.8} |
| smoking_status | Smoking status | {"never" (reference), "ever"} |
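
The integer-coded variables in the table above can be kept in a small lookup so that coded values are readable when inspecting a record. The following is a minimal sketch, not part of py-icare: the names `VALUE_LABELS` and `decode` are hypothetical, the sample record is made up, and it assumes the codes are stored as integers as the table suggests.

```python
# Code-to-label mappings copied from the table above.
# VALUE_LABELS and decode() are hypothetical helpers, not py-icare functions.
VALUE_LABELS = {
    "family_history": {0: "absence", 1: "presence"},
    "menopause_hrt": {
        0: "pre-menopausal",
        1: "post-menopausal and never HRT user",
        2: "post-menopausal and ever HRT user",
    },
    "menopause_hrt_e": {
        0: "otherwise",
        1: "post-menopausal and ever user of estrogen-only therapy",
    },
    "menopause_hrt_c": {
        0: "otherwise",
        1: "post-menopausal and ever user of combined therapy",
    },
    "current_hrt": {
        0: "otherwise",
        1: "post-menopausal and current HRT user",
    },
}


def decode(variable: str, code: int) -> str:
    """Return the label for an integer code, or raise if it is not in the table."""
    try:
        return VALUE_LABELS[variable][code]
    except KeyError:
        raise ValueError(f"unknown code {code!r} for variable {variable!r}") from None


# Made-up record, for illustration only.
record = {"family_history": 1, "menopause_hrt": 2}
decoded = {name: decode(name, code) for name, code in record.items()}
print(decoded)
```

Raising on an unknown code, rather than returning a default, makes an unexpected encoding visible early.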
| Variable name | Description | Value encoding |
| --- | --- | --- |
| study_entry_age | Age at study entry (years) | continuous (integer) |
| study_exit_age | Age at study exit (years) | continuous (integer) |
| observed_outcome | Disease status | {0: "normal", 1: "case"} |
| time_of_onset | Time (in years) from study entry to the development of the disease. Set to Inf if the subject did not develop the disease during the follow-up period. | continuous (float) |
| observed_followup | Number of years that the subject was followed up in the study, i.e., the difference between the age at study entry and the age at study exit. | continuous (integer) |
| inclusion | Whether the individual was selected for the nested case-control study. If so, the individual is included in validation_nested_case_control_data.csv. | {0: "no", 1: "yes"} |
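
The relationships described in the table above can be checked row by row. The sketch below uses only the Python standard library and a made-up in-memory CSV with the same column names. The rows are not real py-icare data, and `check_row` is a hypothetical helper. It assumes that `observed_outcome` of 0 means the subject did not develop the disease during follow-up, and that the ages are written as plain integers.

```python
import csv
import io
import math

# Made-up rows that reuse the column names from the table above.
SAMPLE_CSV = """id,study_entry_age,study_exit_age,observed_outcome,time_of_onset,observed_followup,inclusion
a1,50,60,0,Inf,10,0
a2,55,62,1,3.5,7,1
a3,48,58,0,Inf,10,1
"""


def check_row(row: dict) -> list[str]:
    """Return the inconsistencies found in one cohort row (empty if none)."""
    problems = []
    entry_age = int(row["study_entry_age"])
    exit_age = int(row["study_exit_age"])
    followup = int(row["observed_followup"])
    outcome = int(row["observed_outcome"])
    onset = float(row["time_of_onset"])  # float("Inf") parses to math.inf

    # Follow-up is defined as the difference between exit age and entry age.
    if followup != exit_age - entry_age:
        problems.append("observed_followup != study_exit_age - study_entry_age")
    # Subjects who did not develop the disease should have time_of_onset = Inf.
    if outcome == 0 and not math.isinf(onset):
        problems.append("non-case has a finite time_of_onset")
    return problems


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
issues = {row["id"]: check_row(row) for row in rows}
print(issues)
```

To try this on the actual file, `io.StringIO(SAMPLE_CSV)` could be replaced with an opened file handle; the parsing may need adjusting if the file formats numbers differently.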
  • validation_nested_case_control_data.csv: a simulated dataset from a case-control study of 5,285 individuals, nested within a cohort study (see validation_cohort_data.csv). In addition to the variables in the cohort study, this dataset contains the allele dosages of the 72 breast cancer-associated SNPs (see breast_cancer_72_snps_info.csv).
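
The link between the cohort and the nested case-control data can be illustrated by selecting the cohort rows flagged with `inclusion` equal to 1. This is a minimal sketch on made-up rows. `selected_ids` is a hypothetical helper, and it is an assumption here that the `id` column identifies the same subject in both files.

```python
import csv
import io

# Made-up cohort rows; only the columns needed for the selection are shown.
COHORT_CSV = """id,observed_outcome,inclusion
s1,0,0
s2,1,1
s3,0,1
s4,1,0
"""


def selected_ids(csv_text: str) -> list[str]:
    """Return the ids of cohort subjects flagged for the nested case-control study."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # int(float(...)) tolerates both "1" and "1.0" as encodings of the flag.
    return [row["id"] for row in reader if int(float(row["inclusion"])) == 1]


cohort_selected = selected_ids(COHORT_CSV)

# Stand-in for the ids that would be read from the nested case-control file.
nested_ids = {"s2", "s3"}
print(cohort_selected, set(cohort_selected) == nested_ids)
```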