VLSA: Interpretable Vision-Language Survival Analysis with Ordinal Inductive Bias for Computational Pathology

[Preprint] | [VLSA Walkthrough] | [Awesome Papers of Pathology VLMs] | [Zhihu (中文)] | [WSI Preprocessing] | [Acknowledgements] | [Citation]

Abstract: Histopathology Whole-Slide Images (WSIs) provide an important tool to assess cancer prognosis in computational pathology (CPATH). While existing survival analysis (SA) approaches have made exciting progress, they are generally limited to adopting highly expressive architectures and only coarse-grained patient-level labels to learn prognostic visual representations from gigapixel WSIs. Such a learning paradigm suffers from significant performance bottlenecks when facing the scarce training data and the standard multi-instance learning (MIL) framework in CPATH. To overcome this, this paper proposes, for the first time, a new Vision-Language-based SA (VLSA) paradigm. Concretely, (1) VLSA is driven by pathology VL foundation models; it no longer relies on high-capability networks and shows the advantage of data efficiency. (2) On the vision end, VLSA encodes prognostic language priors and then employs them as auxiliary signals to guide the aggregation of prognostic visual features at the instance level, thereby compensating for the weak supervision in MIL. Moreover, given the characteristics of SA, we propose i) ordinal survival prompt learning to transform continuous survival labels into textual prompts; and ii) an ordinal incidence function as the prediction target to make SA compatible with VL-based prediction. Notably, VLSA's predictions can be interpreted intuitively by our Shapley values-based method. Extensive experiments on five datasets confirm the effectiveness of our scheme. Our VLSA could pave a new way for SA in CPATH by offering weakly-supervised MIL an effective means to learn valuable prognostic clues from gigapixel WSIs.


📚 Recent updates:

This section is still being updated. Stay tuned.

VLSA Walkthrough

Please refer to our Notebook - VLSA Walkthrough. It details:

  • individual incidence function prediction in VLSA models (a simplified sketch follows this list);
  • prediction interpretation using our Shapley values-based method.
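For a quick feel of the first point before opening the notebook, here is a minimal, self-contained sketch (not the repository's actual implementation) of how an incidence function can be read off vision-language similarity scores: the aggregated WSI embedding is compared with K ordinal survival prompts, the softmax over the resulting logits gives per-interval incidence probabilities, and their cumulative sum yields the incidence function. The embeddings below are random placeholders for what the notebook computes with CONCH.

```python
import torch
import torch.nn.functional as F

# Placeholder inputs (the walkthrough derives these with CONCH):
#   wsi_embedding:     aggregated WSI-level visual feature, shape (d,)
#   prompt_embeddings: K ordinal survival prompt embeddings, shape (K, d)
d, K = 512, 4
wsi_embedding = torch.randn(d)
prompt_embeddings = torch.randn(K, d)

# Cosine similarity between the WSI embedding and each ordinal prompt.
wsi_norm = F.normalize(wsi_embedding, dim=-1)
prompt_norm = F.normalize(prompt_embeddings, dim=-1)
logits = prompt_norm @ wsi_norm                                # shape (K,)

# Per-interval incidence probabilities, cumulative incidence, and survival.
incidence_probs = torch.softmax(logits, dim=-1)                # p_k, sums to 1
incidence_function = torch.cumsum(incidence_probs, dim=-1)     # F(t_k)
survival_function = 1.0 - incidence_function                   # S(t_k)

print("Incidence function:", incidence_function.tolist())
print("Survival function: ", survival_function.tolist())
```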

👩‍💻 Running the Code

Pre-requisites

All experiments are run on a machine with

  • one NVIDIA GeForce RTX 3090 GPU
  • python 3.8 and pytorch==1.11.0+cu113

Detailed package requirements:

  • For pip or conda users, full requirements are provided in requirements.txt.
  • For Docker users, pull our base Docker image via docker pull yuukilp/deepath:py38-torch1.11.0-cuda11.3-cudnn8-devel and then install the additional Python packages listed in requirements.txt inside the container.
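As an optional sanity check, the short snippet below (just a sketch) prints the detected PyTorch/CUDA setup so you can compare it against the versions listed above:

```python
import torch

# Quick environment check for the expected setup
# (Python 3.8, PyTorch 1.11.0 + CUDA 11.3, one RTX 3090).
print("PyTorch:", torch.__version__)              # expect 1.11.0+cu113
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA version:", torch.version.cuda)    # expect 11.3
    print("GPU:", torch.cuda.get_device_name(0))
```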

Training models

Use the following command to load an experiment configuration and train the VLSA model (5-fold cross-validation):

python3 main.py --config config/IFMLE/tcga_blca/cfg_vlsa_conch.yaml --handler VLSA --multi_run

All important arguments are explained in config/IFMLE/tcga_blca/cfg_vlsa_conch.yaml.
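If you want to inspect the resolved arguments before launching a run, a minimal sketch (assuming PyYAML is installed in your environment) is:

```python
import yaml

# Inspect the experiment configuration before training.
# Path taken from the command above; adjust to your local checkout.
cfg_path = "config/IFMLE/tcga_blca/cfg_vlsa_conch.yaml"

with open(cfg_path, "r") as f:
    cfg = yaml.safe_load(f)

# Print each top-level argument and its value for a quick sanity check.
for key, value in sorted(cfg.items()):
    print(f"{key}: {value}")
```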

For traditional SA models that use only visual features, run:

python3 main.py --config config/IFMLE/tcga_blca/cfg_sa_base_conch.yaml --handler SA --multi_run

Training Logs

We advocate open-source research. Our full training logs for VLSA models can be accessed at Google Drive.

🔥 Awesome Papers of Pathology VLMs

Foundational VLMs for computational pathology:

| Model | Architecture | Paper | Code | Data |
| --- | --- | --- | --- | --- |
| PLIP (NatMed'23) | CLIP | A visual language foundation model for pathology image analysis using medical Twitter | Github | 208,414 pathology images paired with natural language descriptions from Twitter |
| Quilt-Net (NeurIPS'23) | CLIP | Quilt-1M: One million image-text pairs for histopathology | Github | 802,148 image and text pairs from YouTube |
| CONCH (NatMed'24) | CoCa | A Vision-Language Foundation Model for Computational Pathology | Github | over 1.17 million image-caption pairs |
| CPLIP (CVPR'24) | CLIP | CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment | Github | Many-to-many VL alignment on the ARCH dataset |
| PathAlign (arXiv'24) | BLIP-2 | PathAlign: A vision-language model for whole slide images in histopathology | - | over 350,000 WSIs paired with diagnostic text |

VLM-driven computational pathology tasks:

| Model | Subfield | Paper | Code | Base |
| --- | --- | --- | --- | --- |
| TOP (NeurIPS'23) | WSI Classification | The rise of AI language pathologists: Exploring two-level prompt learning for few-shot weakly-supervised whole slide image classification | Github | Few-shot WSI classification |
| FiVE (CVPR'24) | WSI Classification | Generalizable whole slide image classification with fine-grained visual-semantic interaction | Github | VLM pretraining for WSI classification |
| ViLa-MIL (CVPR'24) | WSI Classification | ViLa-MIL: Dual-scale vision language multiple instance learning for whole slide image classification | Github | Dual-scale features for WSI classification |
| VLSA (arXiv'24) | WSI Survival Analysis | Interpretable Vision-Language Survival Analysis with Ordinal Inductive Bias for Computational Pathology | Github | VLM-driven vision-language survival analysis |
| QPMIL-VL (arXiv'24) | WSI Classification | Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification | Github | VLM-driven incremental learning for WSIs |

NOTE: please open a new PR if you want to add your work to these tables.

WSI Preprocessing

Following CONCH, we first divide each WSI into patches of 448 × 448 pixels at 20× magnification. We then use the image encoder of CONCH to extract patch features.

Our complete WSI preprocessing procedure follows Pipeline-Processing-TCGA-Slides-for-MIL; please refer to it for a detailed tutorial.
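For illustration only, below is a simplified sketch of the tiling-and-encoding step. It assumes openslide-python is installed and that image_encoder and preprocess are the CONCH image encoder and its preprocessing transform, loaded as described in the CONCH repository; tissue segmentation, background filtering, and magnification handling are covered in the linked tutorial.

```python
import openslide
import torch

PATCH_SIZE = 448  # pixels at 20x magnification, following CONCH

def extract_patch_features(slide_path, image_encoder, preprocess, device="cuda"):
    """Tile a WSI into non-overlapping 448x448 patches and encode each patch.

    Level 0 is assumed to be ~20x here for simplicity; a real pipeline selects
    the appropriate pyramid level and skips background/non-tissue regions.
    """
    slide = openslide.OpenSlide(slide_path)
    width, height = slide.level_dimensions[0]

    features = []
    with torch.no_grad():
        for y in range(0, height - PATCH_SIZE + 1, PATCH_SIZE):
            for x in range(0, width - PATCH_SIZE + 1, PATCH_SIZE):
                patch = slide.read_region((x, y), 0, (PATCH_SIZE, PATCH_SIZE)).convert("RGB")
                image = preprocess(patch).unsqueeze(0).to(device)  # (1, 3, H, W)
                feat = image_encoder(image)                        # (1, d) patch embedding
                features.append(feat.cpu())
    return torch.cat(features, dim=0)  # (num_patches, d) feature bag for one slide
```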

Acknowledgements

Parts of the code in this repo are adapted from the following excellent works. We thank the authors and developers for their generous contributions.

  • CONCH: our VLSA is driven by this great pathology VLM.
  • OrdinalCLIP: adapted for survival prompt learning.
  • SurvivalEVAL: used for performance evaluation (D-cal and MAE computation).
  • Patch-GCN: we follow all of its data splits for 5-fold cross-validation.

License and Terms of Use

ⓒ UESTC. The models and associated code are released under the CC-BY-NC-ND 4.0 license and may only be used for non-commercial, academic research purposes with proper attribution. Any commercial use, sale, or other monetization of the VLSA model and its derivatives is prohibited and requires prior approval. If you are a commercial entity, please contact the corresponding author.

📝 Citation

If you find this work helpful for your research, please consider citing our paper:

@misc{liu2024interpretablevisionlanguagesurvivalanalysis,
    title={Interpretable Vision-Language Survival Analysis with Ordinal Inductive Bias for Computational Pathology}, 
    author={Pei Liu and Luping Ji and Jiaxiang Gou and Bo Fu and Mao Ye},
    year={2024},
    eprint={2409.09369},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2409.09369}, 
}
