Jeya Balaji Balasubramanian edited this page Jan 16, 2023 · 11 revisions

Welcome to the py-icare wiki!

Example data

Example datasets are provided in the data/ directory of this repository. Use them to explore the features available in iCARE and to examine the outputs they generate.

| Variable name | Description | Value encoding |
| --- | --- | --- |
| id | Subject ID | A unique identifier for each individual. |
| famhist | Family history (of breast cancer among first degree relatives) | {0: "absence" (reference), 1: "presence"} |
| menarche_dec | Age at menarche (years) | {1: <=11 (reference), 2: 11-11.5, 3: 11.5-12, 5: 12-13, 8: 13-14, 9: 14-15, 10: >=15} |
| parity | Parity (number of full-term pregnancies) | {0: nulliparous (reference), 1: 1, 2: 2, 3: 3, 4: >=4} |
| birth_dec | Age at first child birth (years) | {1: <=19 (reference), 2: 19-22, 3: 22-23, 4: 23-25, 7: 25-27, 8: 27-30, 9: 30-34, 10: 34-38, 11: >=38} |
| agemeno_dec | Age at menopause (years) | {1: <=40 (reference), 2: 40-45, 3: 45-47, 4: 47-48, 5: 48-50, 6: 50-51, 7: 51-52, 8: 52-53, 9: 53-55, 10: >=55} |
| height_dec | Height (meters) | {1: <=1.55 (reference), 2: 1.55-1.57, 3: 1.57-1.60, 4: 1.60-1.61, 5: 1.61-1.63, 6: 1.63-1.65, 7: 1.65-1.66, 8: 1.66-1.68, 9: 1.68-1.71, 10: >=1.71} |
| bmi_dec | Body mass index (kg/m²) | {1: <=21.5 (reference), 2: 21.5-23, 3: 23-24.2, 4: 24.2-25.3, 5: 25.3-26.5, 6: 26.5-27.8, 7: 27.8-29.3, 8: 29.3-31.4, 9: 31.4-34.6, 10: >=34.6} |
| rd_menohrt | Use of Hormone Replacement Therapy (HRT) | {0: "pre-menopausal" (reference), 1: "post-menopausal and never HRT user", 2: "post-menopausal and ever HRT user"} |
| rd2_everhrt_e | Use of estrogen-only therapy | {0: "otherwise" (reference), 1: "post-menopausal and ever user of estrogen-only therapy"} |
| rd2_everhrt_c | Use of estrogen + progesterone combined therapy | {0: "otherwise" (reference), 1: "post-menopausal and ever user of combined therapy"} |
| rd2_currhrt | Current use of HRT | {0: "otherwise" (reference), 1: "post-menopausal and current HRT user"} |
| alcoholdweek_dec | Alcohol (drinks/week) | {1: "none" (reference), 4: 0-0.4, 5: 0.4-0.8, 6: 0.8-1.5, 7: 1.5-3.2, 8: 3.2-5.7, 9: 5.7-9.8, 10: >9.8} |
| ever_smoke | Smoking status | {0: "never" (reference), 1: "ever"} |
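
The integer codes in the encoding table above can be mapped back to readable labels when inspecting a record. The following is a minimal sketch, not part of py-icare: the `ENCODINGS` dictionary and the `decode` helper are hypothetical names introduced here, only three of the variables are covered, and the example subject is invented for illustration.

```python
# Labels copied from the encoding table above for three of the variables.
# ENCODINGS and decode() are illustrative helpers, not part of py-icare.
ENCODINGS = {
    "famhist": {0: "absence", 1: "presence"},
    "parity": {0: "nulliparous", 1: "1", 2: "2", 3: "3", 4: ">=4"},
    "ever_smoke": {0: "never", 1: "ever"},
}


def decode(record):
    """Replace integer codes with labels for the variables listed in ENCODINGS.

    Variables without an entry in ENCODINGS (for example, id) are returned unchanged.
    """
    return {
        name: ENCODINGS.get(name, {}).get(value, value)
        for name, value in record.items()
    }


# Hypothetical subject, invented for illustration (not taken from the example data).
subject = {"id": 1, "famhist": 1, "parity": 4, "ever_smoke": 0}
decoded = decode(subject)
print(decoded)
```

The remaining variables could be added to `ENCODINGS` in the same way, keeping the code values exactly as listed in the table.
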
  • validation_cohort_data.csv: a simulated dataset of a full cohort study of 50,000 women. This dataset helps illustrate the model validation capabilities of iCARE. The variables in this dataset are as follows.
| Variable name | Description | Value encoding |
| --- | --- | --- |
| study_entry_age | Age at study entry (years) | continuous (integer) |
| study_exit_age | Age at study exit (years) | continuous (integer) |
| observed_outcome | Disease status | {0: "normal", 1: "case"} |
| time_of_onset | Time (in years) from study entry to the development of the disease. Set to Inf if the subject did not develop the disease during the follow-up period. | continuous (float) |
| observed_followup | Number of years that the subject was followed up in the study, i.e. the difference between the age at study exit and the age at study entry. | continuous (integer) |
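
The relationships described for the validation cohort variables can be checked row by row. The sketch below uses only the Python standard library and a few toy rows invented for illustration (they are not taken from validation_cohort_data.csv). The `check_row` helper is a hypothetical name, and the rule that a non-case always has `time_of_onset` equal to Inf is an assumption based on the description above.

```python
import csv
import io
import math

# Toy rows invented for illustration; NOT taken from validation_cohort_data.csv.
# The third row has an inconsistent observed_followup on purpose.
SAMPLE_CSV = """study_entry_age,study_exit_age,observed_outcome,time_of_onset,observed_followup
50,60,0,Inf,10
45,52,1,3.5,7
40,48,0,Inf,9
"""


def check_row(row):
    """Return a list of problems found in one cohort record (empty if none)."""
    problems = []
    entry_age = float(row["study_entry_age"])
    exit_age = float(row["study_exit_age"])
    is_case = int(row["observed_outcome"]) == 1
    onset = float(row["time_of_onset"])  # float() parses "Inf" as infinity
    followup = float(row["observed_followup"])

    # Follow-up should equal age at exit minus age at entry.
    if followup != exit_age - entry_age:
        problems.append("observed_followup != study_exit_age - study_entry_age")

    # Assumption: cases have a finite time_of_onset, non-cases have Inf.
    if is_case == math.isinf(onset):
        problems.append("observed_outcome and time_of_onset disagree")

    return problems


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = [check_row(row) for row in rows]
for line_number, problems in enumerate(report, start=1):
    print(line_number, problems or "ok")
```

To run the same checks on the real file, the `io.StringIO(SAMPLE_CSV)` argument could be replaced with an open file handle, assuming the CSV uses the column names listed in the table.
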