From 8d46e5e22f09066c1e493942d80c25bcb4aef3d7 Mon Sep 17 00:00:00 2001
From: Ashok Palaniappan <8831741+apalania@users.noreply.github.com>
Date: Thu, 14 Mar 2024 18:23:24 +0530
Subject: [PATCH] Update README.md

---
 README.md | 67 ++++++++++++++++++++++++++++++-------------------------
 1 file changed, 37 insertions(+), 30 deletions(-)

diff --git a/README.md b/README.md
index 8e411fb..d8519a4 100644
--- a/README.md
+++ b/README.md
@@ -1,45 +1,52 @@
 # BC-Predict_Histological
-Machine learning models for predicting the histological subtype of breast cancers
+## Machine learning models for predicting the histological subtype of breast cancers
 
-BC-Predict: Mining of signal biomarkers and multilevel validation of cascade classifier for early-stage breast cancer subtyping and prognosis
+A resource to accompany:
+Muthamilselvan S && Palaniappan A. [BC-Predict](https://apalania.shinyapps.io/BC-Predict): Mining of signal biomarkers and multilevel validation of cascade classifier for early-stage breast cancer subtyping and prognosis. 2024 (submitted)
 
-BC-Predict_Histological
-The BC-Predict web-server is built on Rshiny and deployed for academic research at https://apalania.shinyapps.io/BC-Predict. All predictions are accompanied by prediction probabilities to provide confidence for the predicted class. BC-Predict is written in R and meant only for academic use. For any commercial use, please contact: Authors (Dr Ashok Palaniappan).
-
-Confusion Matrix:
-This is the performance of the ensemble model for the external validation described in our manuscript. The inconclusive events from the two models XGBoost and neural network (1 layer) were omitted. 11 such instances were ignored in constructing the confusion matrix.
+## [BC-Predict](https://apalania.shinyapps.io/BC-Predict)
+[BC-Predict](https://apalania.shinyapps.io/BC-Predict) is the primary resource translating the results from the above cited study into a unified predictive model of multiple problems in breast cancer heterogeneity. It provides access to all the models developed in the study. All predictions are accompanied by prediction probabilities to provide confidence for the predicted class. BC-Predict is available for purely academic research. For any use not indicated above, please contact: [Authors](mailto:apalania@scbt.sastra.edu).
+[BC-Predict_Histological](https://github.com/apalania/BC-Predict_Histological) is a command-line interface to one of the models in the BC-Predict architecture, namely the Invasive Ductal v/s Invasive Lobular carcinoma. Since this problem was the least tractable of the different problems addressed, we are sharing the source code and the model objects developed, with a view to accelerating research in this area. The standalone interface is a refinement over the webserver
 
-Confusion matrix
-Ground Truth
-D
-L
-Predicted
-D
-91
-6
-L
-0
-7
+### Histological_subtype Model Performance
+#### Confusion Matrix:
+This is the performance of the ensemble model for the external validation described in our manuscript. The inconclusive events from the two models XGBoost and neural network (1 layer) were omitted. 11 such instances were ignored in constructing the confusion matrix.
+| *Ref/Pred* |D |L |
+|:---:|---|---|
+| __D__ |91 |6 |
+| __L__ | 0 |7 |
 
 where Reference (ground truth) in columns & Predicted class in rows; D: Ductal, L: Lobular. This yields a balanced accuracy of ~ 0.76.
 
-Histology_subtype.R
-> source Histology_subtype.R
-Requests sample input from user, containing gene expression values of selected biomarkers (for a sample dataset, please see 'Datasets' below).
-Loads the XGB.rds and model_neuralNet_1layer.rds model object and predicts the sample class (Ductal and lobular) along with the probability of the predicted class. The inconclusive events
-
-If the prediction class is not same from two model, then the predicted class is deemed 'Inconclusive'.
-Provides a refined command-line interface for: BC-Predict webserver for histology subtype classification.
+EnsembleClassifier_HistologicalSubtype.R
+-----------
+
+ > source EnsembleClassifier_HistologicalSubtype.R
+
+* Requests sample input from user, containing gene expression values of selected biomarkers (for a sample dataset, please see 'Datasets' below).
+* Loads the model_XGB.rds and model_neuralNet_1layer.rds objects, and predicts the sample class (Ductal or Lobular) along with the probability of the predicted class.
+ - If the two models do not agree on the prediction class, then the prediction is deemed '_Inconclusive_'.
+* Provides a refined command-line interface for the Histological Subtype model in: [BC-Predict](https://apalania.shinyapps.io/BC-Predict) webserver.
+ * suitable for further investigations and model improvement.
+ * 
 
 Models
-We provide the RDS objects of the best-performing models from our work (refer the Citation). These could be used in an Ensemble Classifier model for academic purposes.
+-----
+We provide the RDS objects of the best-performing Histological Subtyping models from our work (refer the Citation). These could be used in an Ensemble Classifier model for academic purposes (as implemented in [BC-Predict](https://apalania.shinyapps.io/BC-Predict) webserver). Both models were trained on the TCGA BRCA dataset.
 
-1. XGB.rds: The XGBoost model built on the full TCGA BRCA dataset that is at the heart of [BC-Predict](https://apalania.shinyapps.io/BC-Predict)
-2. model_neuralNet_1layer.rds: One of the other best-performing models based on neural network(refer the Citation)
+1. model_XGB.rds: Optimized XGBoost model.
+2. model_neuralNet_1layer.rds: Optimized 1-layer neural network model.
 
-Test.csv: File format used as input to BC-Predict, both the web-server and command-line tool. Expression values of the biomarkers are provided one sample per line, in a comma-separated format, with a header line indicating the order of the biomarkers.
+### Sample Datasets
 
-Citing us:
+
+1. Test.csv: File format used as input to BC-Predict, both the web-server and command-line HistologicalSubtype. Expression values of the biomarkers are provided one sample per line, in a comma-separated format, with a header line indicating the order of the biomarkers.
+
+## Citing us:
 Muthamilselvan S, Palaniappan A. BC-Predict: Mining of signal biomarkers and multilevel validation of cascade classifier for early-stage breast cancer subtyping and prognosis (2024). Submitted
 
+## Copyright & License
+
+
+Copyright (c) 2024, the Authors @ [SASTRA University](https://www.sastra.edu). GPL-3.0 License (only this repo).
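
Note on the figures in the README text above: a minimal base-R sketch, not taken from the repository's scripts, of how a balanced accuracy can be derived from the confusion matrix in the patch, and of the agreement rule the README describes for the two models. The names `cm`, `balanced_accuracy`, and `combine_predictions` are hypothetical, and the example labels passed to `combine_predictions` are made up for illustration; `EnsembleClassifier_HistologicalSubtype.R` may well be implemented differently.

```r
# Confusion matrix as given in the README: rows = predicted, columns = reference.
# matrix() fills column by column, so column D is (91, 0) and column L is (6, 7).
cm <- matrix(c(91, 0, 6, 7), nrow = 2,
             dimnames = list(Predicted = c("D", "L"), Reference = c("D", "L")))

# Per-class recall: correctly predicted samples divided by the reference total.
recall <- diag(cm) / colSums(cm)

# Balanced accuracy is the unweighted mean of the per-class recalls.
balanced_accuracy <- mean(recall)
print(round(balanced_accuracy, 3))  # 0.769

# Hypothetical helper: keep the class when both models agree,
# otherwise report "Inconclusive" (the rule stated in the README).
combine_predictions <- function(xgb_class, nn_class) {
  xgb_class <- as.character(xgb_class)
  nn_class <- as.character(nn_class)
  ifelse(xgb_class == nn_class, xgb_class, "Inconclusive")
}

# Illustrative labels only, not real model output.
print(combine_predictions(c("D", "L", "D"), c("D", "L", "L")))
```

With the matrix as printed, the recall is 91/91 for Ductal and 7/13 for Lobular, which gives a mean of about 0.769, close to the ~ 0.76 quoted in the README.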