This repository contains Pytorch implementation of training image representations and performance evaluation of the approach introduced in I. Zingman, B. Stierstorfer, C. Lempp, F. Heinemann. "Learning image representations for anomaly detection: application to discovery of histological alterations in drug development", Medical Image Analysis, 2024. It is also available on ArXiv and temporary has a free access on Elsevier.
The paper develops a method for anomaly detection in whole slide images of stained tissue samples in order to routinely screen histopathological data for abnormal alterations in tissue.
Figure above shows detection of adverse drug reactions by the Boehinger Ingelheim Histological Network (BIHN) based anomaly detection. A: The developed Anomaly Detection (AD) method detects induced tissue alterations in the liver of mouse after administration an experimental compound. The fraction of abnormal tiles increases with the the dosage of the compound. The compound was previously found to have toxic side effects in toxicological screening by pathologists. Each dot corresponds to a single Whole Slide Image (WSI). Three arrows correspond to three WSI examples given in B. Stars on the top of the graph show statistical significance of the change compared to the mean of control group. B: Examples of detected anomalies. In the control group (left image) blood and a few other not pathological structures result in a low level of false positives. Detections in compound treated groups (two right images) correspond to pathological alterations and were confirmed by a pathologist.
Requirements
PyTorch
, NumPy
, Pillow
, scikit-learn
The code in the repository was tested under Python 3.9
with GPU 11GB and packages' listed in the requirements.txt
.
It, however, should also run with earlier Python versions and smaller GPU memory.
Experiments (training image representations and performance evaluation)
Setting up dataset
-
The training dataset with normal tissue of different species, organs, and staining can be downloaded from
data/train/
folder from https://osf.io/gqutd/. This dataset was used for training image representations. -
The evaluation dataset with normal mouse liver tissue and mouse tissue with Non-Alcoholic Fatty Liver Disease (NAFLD) can be downloaded from
data/test/
folder from https://osf.io/gqutd/ -
Due to large sizes of zip files it is recommended to download each zip file separately.
-
Create the folder structure shown below under the root folder of your repository with the cloned code or in any other location. In the last case set
_prj_root
variable to the chosen location inconfigs/cfg_training_cnn.py
andconfigs/cfg_anomaly_detector.py
configuration files. We use *.py configuration files, not e.g. yaml, which allows more flexibility and is convenient for prototyping. -
Unzip downloaded data files to the corresponding folders within the created folders structure
-
If you want to use pre-trained models (instead of training yourself)
- download them from
trained models/
folder from https://osf.io/gqutd/. - unzip and save
EfficientNet_B0_320_HE_Liver_Mouse_acc0.9762.pt
,EfficientNet_B0_320_Masson_Liver_Mouse_acc0.9755.pt
CNN models, the correspondingEfficientNet_B0_320_Masson_Liver_Mouse_acc0.9755.pkl
andEfficientNet_B0_320_HE_Liver_Mouse_acc0.9762.pkl
anomaly detection models (One-cass SVM classifiers), and the correspondingEfficientNet_B0_320_HE_Liver_Mouse_acc0.9762_training_configuration.pkl
andEfficientNet_B0_320_Masson_Liver_Mouse_acc0.9755_training_configuration.pkl
configuration files into e.g.BIHN_models
folder under the project root.
- download them from
Folders structure for project's input
.
├── data
├── test
│ ├── NAFLD_anomaly_he_mouse_liver
│ ├── NAFLD_anomaly_mt_mouse_liver
│ ├── normal_he_mouse_liver
│ └── normal_mt_mouse_liver
└── train
├── he_mouse_brain
├── he_mouse_heart
├── he_mouse_kidney
├── he_mouse_liver
├── he_mouse_lung
├── he_mouse_pancreas
├── he_mouse_spleen
├── he_rat_liver
├── mt_mouse_brain
├── mt_mouse_heart
├── mt_mouse_kidney
├── mt_mouse_liver
├── mt_mouse_lung
├── mt_mouse_pancreas
├── mt_mouse_spleen
└── mt_rat_liver
Training
- Set variable
data_staining
inconfigs/cfg_training_cnn.py
to eitherMasson
(Massosn's Trichrome staining) orHE
(H&E staining) values, which will adjust training image representations for anomaly detection in images of tissue stained correspondingly. If you store the training data in your own location, updatepath_to_data
accordingly. - Run
python train_cnn.py --config configs/cfg_training_cnn.py
- The code generates
train_results/stamp
folder with trained models (models for each epoch and the best one), confusion matrix, configuration and log files, where stamp is a unique number that is set for each run. You can redefine the output folder in the configuration fileconfigs/cfg_training_cnn.py
, if needed, by updatingpath_to_results
.
- The code generates
Evaluation
- Set
cnn_model
variable inconfigs/cfg_anomaly_detector.py
to the relative to root path to the trained CNN model, which was generated in foldertrain_results/stamp/model_name.pt
during the training step above. Alternatively, you can set an arbitrary path to the downloaded from https://osf.io/gqutd pre-trained CNN model*.pt
. - If you've downloaded an anomaly model
*.pkl
from https://osf.io/gqutd/, setad_model
to its location. Alternatively, if you want to train anomaly model on your own (once-class classifier), setad_model
to empty string""
or to"CNN_location"
. - Run
python anomaly_detector.py --config configs/cfg_anomaly_detector.py
. The code will output evaluation results totest_results
folder. If anomaly model (once-class classfier) was trained, it will be saved to the folder where CNN model is.
Expected performance of anomaly detection with BIHN models
Staining | Balanced accuracy | AU-ROC | F1 score |
---|---|---|---|
H&E | 94.20% | 97.33% | 94.09% |
Masson Trichrome | 97.51% | 99.03% | 97.51% |
- To evaluate other algorithms from Anomalib library on our dataset with NAFLD pathology,
please consult Anomalib section Custom Dataset. Particularly, one needs to set appropriate paths in yaml configuration files of the chosen method located at
anomalib_root/anomalib/models/method/config_file.yaml
. The paths fields to be set in yaml arenormal_dir
,abnormal_dir
,normal_test_dir
, which should point to./data/train/*mouse_liver/
,./data/test/NAFLD_anomaly_*_mouse_liver
,./data/test/normal_*_mouse_liver
data paths correspondingly. The star in paths refers to a particular staining type,mt
orhe
you want to experiment with. Thetask
field should be set to "classification". - To evaluate DPA appraoch we adapted
Camalyon16Dataset
class, reading images from NAFLD dataset. We obtained our best results for DPA usingcamelyon16
wo_pg_unsupervsed
default configuration with the following parameters tunedinner_dims: 16, latent_dim:16
(for both decoder and encoder, same values for all layers as in the default configuration),initial_image_res:256, max_image_res:256, crop_size: 256
. Batch size was reduced to 64 to be able to run on 256x256 size images.
Use of pretrained BIHN models in your own projects
In oder to use pretrained BIHN models (*.pt files that can be downloaded from https://osf.io/gqutd/) to generate
feature representations of histopathological images (Masson or H&E) for your own tasks, you can consult the code example in model_use_example.py
.
Citing
@article{zingman2022anomaly,
title={Learning image representations for anomaly detection: application to discovery of histological alterations in drug development},
author={Igor Zingman and Birgit Stierstorfer and Charlotte Lempp and Fabian Heinemann},
year={2022},
journal={CoRR},
volume={abs/2210.07675},
eprinttype = {arXiv},
url = {https://arxiv.org/abs/2210.07675}
}
@online{NAFLD_dataset,
author = {Igor Zingman and Birgit Stierstofer and Fabian Heinemann},
title = {{NAFLD} pathology and healthy tissue samples},
year = {2022},
url = {https://osf.io/gqutd/},
}