TCRBagger is an efficient tool based on multi-instance learning that identifies immunogenic neoantigens presented by HLA-I molecules by bagging the TCR profile.
The model combines a CNN extraction layer with a gate-attention mechanism, which makes it possible to visualize the importance of each TCR.
An overview of the construction and application of the TCRBagger algorithm is given as follows:
- python == 3.6.2
- numpy == 1.18.5
- tensorflow == 2.3.0
- pandas == 1.1.5
- scipy == 1.4.1
* Note: to use a GPU, you must install CUDA and cuDNN versions compatible with your TensorFlow version (see Version Searching).
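As a quick sanity check before running the scripts, the pinned versions listed above can be compared with what is installed. The sketch below is not part of TCRBagger; the helper names `installed_versions` and `find_mismatches` are hypothetical, and it assumes each package exposes a `__version__` attribute (which numpy, tensorflow, pandas and scipy do).

```python
import importlib

# Versions pinned in the dependency list above.
PINNED = {
    "numpy": "1.18.5",
    "tensorflow": "2.3.0",
    "pandas": "1.1.5",
    "scipy": "1.4.1",
}


def installed_versions(names):
    """Return {package: version string, or None if it cannot be imported}."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
        else:
            found[name] = getattr(module, "__version__", None)
    return found


def find_mismatches(pinned, installed):
    """Return {package: (expected, actual)} for every package that differs."""
    return {
        name: (expected, installed.get(name))
        for name, expected in pinned.items()
        if installed.get(name) != expected
    }
```

Calling `find_mismatches(PINNED, installed_versions(PINNED))` returns an empty dictionary when the environment matches the list; note that importing tensorflow this way can take several seconds.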
Usage: ./run_TCRBagger.py [OPTIONS]
Required:
-o --output STRING: the directory path to the output files
Optional:
-b --bags STRING: the path to individual's constructed bags (*.pkl) (default: not used). (recommended)
-p --peplist STRING: the path to input individual's peptide list file (*.txt) (default: not used)
-t --tcrlist STRING: the path to input individual's tcr list file (*.txt) (default: not used)
-r1 --rnaseq1 STRING: the path to individual's RNAseq data 1 (like *.fastq.gz) (default: not used)
-r2 --rnaseq2 STRING: the path to individual's RNAseq data 2 (like *.fastq.gz) (default: not used)
-v --vcf STRING: the path to individual's vcf data (*.vcf) (default: not used)
-a --alleles STRING: the individual's HLA alleles, comma separated (default: not used)
-c --cthread INT: the number of threads used for bag embedding (default: 1)
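For readers who want to wrap or script the tool, the option list above can be expressed as an `argparse` parser. This is a minimal sketch that only mirrors the documented flags; it is not the parser actually used in run_TCRBagger.py, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Parser sketch that mirrors the options documented above."""
    parser = argparse.ArgumentParser(prog="run_TCRBagger.py")
    parser.add_argument("-o", "--output", required=True,
                        help="directory path to the output files")
    parser.add_argument("-b", "--bags", help="constructed bags (*.pkl)")
    parser.add_argument("-p", "--peplist", help="peptide list file (*.txt)")
    parser.add_argument("-t", "--tcrlist", help="TCR list file (*.txt)")
    parser.add_argument("-r1", "--rnaseq1", help="RNA-seq data 1 (*.fastq.gz)")
    parser.add_argument("-r2", "--rnaseq2", help="RNA-seq data 2 (*.fastq.gz)")
    parser.add_argument("-v", "--vcf", help="VCF data (*.vcf)")
    parser.add_argument("-a", "--alleles", help="HLA alleles, comma separated")
    parser.add_argument("-c", "--cthread", type=int, default=1,
                        help="number of threads used for bag embedding")
    return parser
```

Options that are not supplied are left as `None`, which corresponds to "default: not used" in the list above; the comma-separated alleles can be split afterwards with `args.alleles.split(",")`.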
We provide some examples showing how to use our trained TCRBagger model to predict the probability that a peptide is a neo-epitope.
* Note: The trained TCRBagger model can be downloaded from this Google Drive. Please place the downloaded model in the ./Models directory.
We recommend this type of input, because users can design their own bags using their own algorithm.
If you have constructed test bags yourself, the model can directly predict the probability score for each peptide. The bag format, as a Python data structure, should look like this:
We also provide example test bag data: the independent experiment 5 validation data set used in our paper.
python ./Scripts/run_TCRBagger.py -b ./Data/ExampleBags.pkl -o ./Outputs/Condition1
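Because bags are stored as pickle files, it can help to look at the top-level structure of a bag file before passing it to the model. The sketch below makes no assumption about the bag layout itself; `summarize_pickle` is a hypothetical helper, and it only reports the type, length, and (for dictionaries) the first few keys of whatever the file contains.

```python
import pickle


def summarize_pickle(path):
    """Load a pickle file and describe its top-level object."""
    with open(path, "rb") as handle:
        obj = pickle.load(handle)
    summary = {"type": type(obj).__name__}
    if hasattr(obj, "__len__"):
        summary["length"] = len(obj)
    if isinstance(obj, dict):
        # Show a few keys only, so large files stay readable.
        summary["first_keys"] = list(obj)[:3]
    return summary
```

For example, `summarize_pickle("./Data/ExampleBags.pkl")` could be used to inspect the example bags. Unpickling can execute arbitrary code, so only load files from sources you trust; files that contain numpy or pandas objects also need those packages installed to load.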
If you have a peptide list and a TCR profile, then TCRBagger can help you construct the bag for each peptide and predict the probability score for each peptide.
We provide example files that illustrate the format of the peptide list and the TCR profile.
python ./Scripts/run_TCRBagger.py -p ./Data/ExamplePeptideList.txt -t ./Data/ExampleTcrList.txt -o ./Outputs/Condition2
If you only have a peptide list without a TCR profile, TCRBagger is compatible with MiXCR, an existing tool that predicts the TCR profile from an individual's RNA-seq data. The example RNA-seq data can be obtained from SRR5811748; we downloaded the SRR5811748 fastq files into the ./Data/ directory. The epitope probability score can then also be calculated from the predicted TCR profile.
python ./Scripts/run_TCRBagger.py -p ./Data/ExamplePeptideList.txt -r1 ./Data/SRR5811748_1.fastq.gz -r2 ./Data/SRR5811748_2.fastq.gz -o ./Outputs/Condition3
If you have neither a peptide list nor a TCR profile but individual sequencing data is available, TCRBagger can serve as a calibration model for the candidate peptide list generated by a peptide prediction model such as MuPeXI or Neopepsee. The peptide list can be predicted from each individual's VCF file, and the TCR profile can be predicted from the RNA-seq data. TCRBagger is compatible with these tools and gives each predicted peptide a probability score of being a neo-epitope.
python ./Scripts/run_TCRBagger.py -v ./Data/soma.vcf.gz -r1 ./Data/SRR5811748_1.fastq.gz -r2 ./Data/SRR5811748_2.fastq.gz -a HLA-A01:01,HLA-B08:01 -o ./Outputs/Condition4
* Note: in this version of TCRBagger, MuPeXI is the only supported neoantigen prediction tool.
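The four example commands correspond to four combinations of inputs. The sketch below shows one way to decide which combination a set of arguments matches; the function name, the returned labels, and the precedence order (bags first) are assumptions made for illustration and are not taken from run_TCRBagger.py.

```python
def select_input_mode(bags=None, peplist=None, tcrlist=None,
                      rnaseq1=None, rnaseq2=None, vcf=None, alleles=None):
    """Return a label for the input combination, following the four examples."""
    # Paired-end RNA-seq needs both files.
    has_rnaseq = bool(rnaseq1 and rnaseq2)
    if bags:
        return "bags"                 # pre-constructed bags (recommended)
    if peplist and tcrlist:
        return "peptides+tcrs"        # bags are built from the two lists
    if peplist and has_rnaseq:
        return "peptides+rnaseq"      # TCR profile predicted from RNA-seq
    if vcf and has_rnaseq and alleles:
        return "vcf+rnaseq"           # peptides and TCR profile both predicted
    raise ValueError("no complete input combination was provided")
```

Raising an error for incomplete combinations (for example a VCF file without HLA alleles) makes a missing input visible before any long-running step starts.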
Based on the probability score of each peptide, a deeper analysis can be performed to interpret the bags a user is interested in. The attention block in TCRBagger can be extracted, and the attention weight of each TCR can be calculated.
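To give an intuition for what a per-TCR attention weight is, the sketch below computes weights with a common formulation of gated attention in multi-instance learning: each instance embedding is scored by a tanh branch multiplied element-wise by a sigmoid gate, and the scores are normalized with a softmax over the bag. Whether TCRBagger's attention block matches this formulation exactly is an assumption; the matrices `V`, `U`, `w` and both function names are hypothetical stand-ins for the trained parameters.

```python
import numpy as np


def gated_attention_weights(H, V, U, w):
    """Gated-attention weights for one bag.

    H: (n_instances, d) instance embeddings, one row per TCR.
    V, U: (d, k) projection matrices; w: (k,) scoring vector.
    Returns a length-n vector of positive weights that sums to 1.
    """
    gate = 1.0 / (1.0 + np.exp(-(H @ U)))      # sigmoid gate, shape (n, k)
    scores = (np.tanh(H @ V) * gate) @ w       # one score per instance, (n,)
    scores = scores - scores.max()             # stabilise the softmax
    weights = np.exp(scores)
    return weights / weights.sum()


def rank_instances(weights, names):
    """Pair each instance name with its weight, highest weight first."""
    order = np.argsort(weights)[::-1]
    return [(names[i], float(weights[i])) for i in order]
```

Because the weights within a bag sum to 1, they can be read as the relative contribution of each TCR to the bag-level prediction, and `rank_instances` lists the most influential TCRs first.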
Usage: ./TCRBaggerVisual.py [OPTIONS]
Required:
-i --input STRING: the path to one constructed bag (*.pkl) (default: not used)
-o --output STRING: the directory path to output files
We also provide two simple bag examples for illustration.
python ./Scripts/TCRBaggerVisual.py -i ./Data/ExampleOneBag2.pkl -o ./Outputs/TCRAttention
We also provide a tutorial for interested readers who want to train their own TCRBagger from scratch or fine-tune our model on larger data sets of their own.
Usage: ./TCRBagger.py [OPTIONS]
Required:
-i1 --training STRING: the path to constructed training bags (*.pkl) (default: not used)
-i2 --testing STRING: the path to constructed testing bags (*.pkl) (default: not used)
-l1 --traininglabels STRING: the path to constructed training labels (*.pkl) (default: not used)
-l2 --testinglabels STRING: the path to constructed testing labels (*.pkl) (default: not used)
-o --output STRING: the directory path to output new TCRBagger model
Optional:
-c --cthread INT: the number of threads used for bag embedding (default: 1)
We provide simple example data to show how to train TCRBagger.
python ./Scripts/TCRBagger.py -i1 ./Data/ExampleTraining.pkl -i2 ./Data/ExampleTesting.pkl -l1 ./Data/ExampleTrainingLabels.pkl -l2 ./Data/ExampleTestingLabels.pkl -o ./Outputs/NewModel