GitHub - BioX-NKU/scBackdoor: Backdoor attacks in single-cell pretrained models

Unveiling potential threats: backdoor attacks in single-cell pretrained models

🙋 Please let us know if you find a mistake or have any suggestions!

🌟 If you find this resource helpful, please consider starring this repository and citing our research:

Sicheng Feng, Siyu Li, Luonan Chen, Shengquan Chen. Unveiling potential threats: backdoor attacks in single-cell pretrained models. 2024.

Requirements and Installation

We use Python 3.9 from Anaconda and provide two conda environments for the experiments: base.yml for the scGPT and scBERT experiments, and geneformer.yml for the GeneFormer experiments.

To install all dependencies:

conda env create -f base.yml

# or
conda env create -f geneformer.yml
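
The name of each environment is taken from the `name:` field inside the yml file, which this README does not list. A minimal sketch for looking it up before activating, assuming each file has a top-level `name:` line with an unquoted value (the helper `env_name_from_yml` is hypothetical and not part of the repository):

```shell
# Hypothetical helper: print the environment name declared in a conda yml file.
# Assumes a top-level line of the form "name: <env-name>" with no quotes.
env_name_from_yml() {
    # Keep only the first whitespace-free token after "name:".
    sed -n 's/^name:[[:space:]]*\([^[:space:]]*\).*/\1/p' "$1" | head -n 1
}

# Usage (after `conda env create -f base.yml` has finished):
#   conda activate "$(env_name_from_yml base.yml)"
```

If the lookup prints nothing, open the yml file and check how the name is declared, or run `conda env list` to see which environments exist.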

Datasets

Example datasets from [scGPT]
Example datasets from [GeneFormer]
Datasets from [Tabula Sapiens Single-Cell Dataset]

Place the downloaded contents under Yourpath4Dataset to reproduce the experiments.

Pretrained Models

You can download the pretrained models from [scGPT] (whole-human), [scBERT] and [GeneFormer], then place the downloaded contents under Yourpath4PretrainedModels to reproduce the experiments.
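
Once both downloads are in place, a quick check that the directories exist can save a failed long run. The sketch below is illustrative only: `check_dirs`, `DATASET_DIR`, and `MODEL_DIR` are hypothetical names standing in for your own Yourpath4Dataset and Yourpath4PretrainedModels locations, and nothing here is provided by the repository.

```shell
# Hypothetical pre-flight check: confirm that every directory passed as an
# argument exists. Returns 0 if all exist, 1 if any is missing.
check_dirs() {
    missing=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing directory: $dir" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (placeholders for your own paths):
#   DATASET_DIR=/path/to/Yourpath4Dataset
#   MODEL_DIR=/path/to/Yourpath4PretrainedModels
#   check_dirs "$DATASET_DIR" "$MODEL_DIR" && echo "paths look fine"
```

The check only verifies that the directories exist; it does not validate their contents.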

Quick Demos

Download the datasets and pretrained models, place them under the paths described above (Yourpath4Dataset and Yourpath4PretrainedModels), and adjust the path parameters in the scripts accordingly.
You can then reproduce the experiments with the provided scripts. For example, to evaluate on the Human Pancreas dataset:

nohup ./run.sh & # for scGPT_Exp (run from scGPT_Exp/test)

Details of Experiments

The commands to run the experiments are as follows:

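nohup ./run.sh & # for scGPT_Exp (run from scGPT_Exp/test)
nohup ./run.sh & # for scBERT_Exp (run from scBERT_Exp)
nohup ./run.sh & # for GeneFormer_Exp (run from GeneFormer_Exp)
...

# or you can run the experiments in tmux or screen
./run_diff_batch.sh # for scGPT_Exp (run from scGPT_Exp/test)
./run_diff_feature.sh # for scGPT_Exp (run from scGPT_Exp/test)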
...
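
When started with `nohup` from a terminal and without a redirect, output is appended to `nohup.out` in the current directory. A minimal sketch for sending the output to a named log instead, so that several experiments do not share one file. The wrapper `launch_bg` and the log name are hypothetical, and the repository's scripts may already handle logging on their own, which has not been checked here.

```shell
# Hypothetical wrapper: start a script in the background with nohup and write
# stdout and stderr to the given log file. Prints the PID of the started job.
launch_bg() {
    script=$1
    log=$2
    nohup "$script" > "$log" 2>&1 &
    echo $!
}

# Usage (from the folder that contains run.sh):
#   pid=$(launch_bg ./run.sh run.log)
#   tail -f run.log      # follow progress
#   ps -p "$pid"         # check whether the script is still running
```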

The poisoning code lives in poison_utils.py (under GeneFormer_Exp/geneformer and scBERT_Exp) and poison_trigger.py (under scGPT_Exp/utils).

The folder tree is as follows:

├── LICENSE
├── README.md                             -- introduction about the project
├── figures                               -- figures for display
│   └── fig1.png
├── requirements.txt                      -- requirements for installation
├── scGPT_Exp
│   ├── test                              -- the attack pipeline
│   │   ├── run.sh
│   │   ├── run_diff_batch.sh             -- explore the impact of batch effects
│   │   ├── run_diff_feature.sh           -- explore the impact of feature selection
│   │   ├── run_3datasets.sh              
│   │   └── scBackdoor.py
│   └── utils                             -- the scGPT items
│       ├── detect_tools.py
│       ├── poison_trigger.py
│       ├── preprocess.py
│       ├── print_tools.py
│       └── tools.py
├── GeneFormer_Exp 
│   ├── geneformer                        -- the GeneFormer items
│   │   ├── __init__.py
│   │   ├── classifier.py
│   │   ├── classifier_utils.py
│   │   ├── collator_for_classification.py
│   │   ├── emb_extractor.py
│   │   ├── evaluation_utils.py
│   │   ├── gene_median_dictionary.pkl
│   │   ├── gene_name_id_dict.pkl
│   │   ├── in_silico_perturber.py
│   │   ├── in_silico_perturber_stats.py
│   │   ├── perturber_utils.py
│   │   ├── poison_utils.py
│   │   ├── pretrainer.py
│   │   ├── token_dictionary.pkl
│   │   └── tokenizer.py
│   ├── run.sh                            -- the attack pipeline
│   └── geneformer_scBackdoor.py          
└── scBERT_Exp
    ├── attn_sum_save.py
    ├── finetune.py
    ├── lr_baseline_crossorgan.py
    ├── performer_pytorch                 -- the scBERT items
    │   ├── __init__.py
    │   ├── performer_pytorch.py
    │   └── reversible.py
    ├── poison_utils.py
    ├── predict.py
    ├── preprocess.py
    ├── pretrain.py
    ├── run.sh                            -- the attack pipeline
    ├── run_3datasets.sh
    └── utils.py

Acknowledgement

We sincerely thank the authors of the following open-source projects:
