TyG-er: an Ensemble Regression Forest Approach for Identification of Clinical Factors related to Insulin Resistance Condition using Electronic Health Records
published in Computers in Biology and Medicine by M. Bernardini, M. Morettini, L. Romeo, E. Frontoni and L. Burattini:
@article{bernardini2019tyg,
  title={TyG-er: An ensemble Regression Forest approach for identification of clinical factors related to insulin resistance condition using Electronic Health Records},
  author={Bernardini, Michele and Morettini, Micaela and Romeo, Luca and Frontoni, Emanuele and Burattini, Laura},
  journal={Computers in biology and medicine},
  volume={112},
  pages={103358},
  year={2019},
  publisher={Elsevier}
}
The aim of this work was to propose a highly interpretable ensemble Regression Forest model, combined with data imputation strategies, that extracts clinical factors from EHR data to provide early, preventive knowledge of glucose tolerance deterioration, which represents a risk condition for type 2 diabetes.
We tested the reliability of the proposed approach, named TyG-er, on the Italian Federation of General Practitioners dataset, named FIMMG_obs dataset, publicly available at the following link: http://vrai.dii.univpm.it/content/fimmgobs-dataset
We tested the TyG-er approach under 3 different experimental procedures:
- Tenfold cross validation (CV-10);
- Tenfold cross validation over subjects (CVOS-10);
- Leave Last Records Out (LLRO).
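The three procedures differ in how records are assigned to the training and test sets. The Python sketch below illustrates one plausible reading of each split and is not the authors' Matlab code. The function names, the `subject_ids` and `timestamps` inputs, and the `n_last` parameter are hypothetical, and the interpretation of LLRO as "hold out each subject's most recent records" is an assumption.

```python
import random
from collections import defaultdict


def record_folds(n_records, k=10, seed=0):
    """Record-wise k-fold: shuffle record indices and deal them into k folds.

    Records of the same subject may end up in both training and test folds.
    """
    idx = list(range(n_records))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def subject_folds(subject_ids, k=10, seed=0):
    """Subject-wise k-fold: all records of one subject land in the same fold."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    fold_of = {s: i % k for i, s in enumerate(subjects)}
    folds = [[] for _ in range(k)]
    for i, s in enumerate(subject_ids):
        folds[fold_of[s]].append(i)
    return folds


def leave_last_records_out(subject_ids, timestamps, n_last=1):
    """Hold out each subject's n_last most recent records as the test set.

    Assumed reading of LLRO; n_last must be at least 1.
    """
    by_subject = defaultdict(list)
    for i, s in enumerate(subject_ids):
        by_subject[s].append(i)
    train, test = [], []
    for rows in by_subject.values():
        rows.sort(key=lambda i: timestamps[i])  # oldest record first
        train.extend(rows[:-n_last])
        test.extend(rows[-n_last:])
    return train, test
```

Under these assumptions, the subject-wise split measures generalization to unseen subjects, while the leave-last-records-out split measures prediction of a known subject's later records from the earlier ones.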
Each experimental procedure consists of 4 different experiments:
a) Baseline; b) Extra values imputation; c) Median imputation; d) KNN imputation.
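The three imputation variants can be illustrated with a minimal Python sketch. It is not the authors' Matlab implementation. The function names, the use of `None` for a missing entry, the sentinel `extra=-1.0`, and the neighbor count `k=2` are all assumptions made for illustration, as is the particular KNN rule (mean of the k nearest donors, with distance computed over the features both rows have observed). The sketch also assumes every column has at least one observed value.

```python
import math
from statistics import median


def impute_extra_value(rows, extra=-1.0):
    """Replace each missing entry with a sentinel outside the observed range."""
    return [[extra if v is None else v for v in row] for row in rows]


def impute_median(rows):
    """Replace each missing entry with the median of its column."""
    n_cols = len(rows[0])
    med = [median(r[j] for r in rows if r[j] is not None) for j in range(n_cols)]
    return [[med[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def _distance(a, b):
    """Root mean squared difference over the features observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def impute_knn(rows, k=2):
    """Fill each missing entry with the mean of that feature over the k nearest donors."""
    out = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, v in enumerate(row):
            if v is not None:
                continue
            # Donors are the other rows in which feature j is observed.
            donors = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )[:k]
            out[i][j] = sum(val for _, val in donors) / len(donors)
    return out
```

The extra-value variant keeps missingness visible to a tree-based model as a separate value, whereas the median and KNN variants replace it with an estimate drawn from the observed data.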
Matlab code to replicate all the experiments is provided by the authors.