For questions about input file format, see example folder.
For test runs with files in example folder, users will need to modify directories of fin_matrices
, etc.
When starting from standard output of rMATS, users should use this step to 1) reformat splice junction counts into a PSI (percent-spliced-in) value matrix, and 2) index and 3) move the PSI matrix for IRIS screening (when -d is enabled).
usage: IRIS format [-h] -t {SE,RI,A3SS,A5SS} -n DATA_NAME -s {1,2}
[-c COV_CUTOFF] [-i] [-e] [-d IRIS_DB_PATH] [--novelSS]
[--gtf GTF]
rmats_mat_path_manifest rmats_sample_order
required arguments:
rmats_mat_path_manifest
txt manifest of path(s) to rMATS output folder(s)
rmats_sample_order TXT file manifest of corresponding rMATS input sample
order file(s). Required input for rMATS
-t {SE,RI,A3SS,A5SS}, --splicing-event-type {SE,RI,A3SS,A5SS}
String of splicing event types based on rMATS
definition (SE,RI,A3SS,A5SS).Used to name output file
-n DATA_NAME, --data-name DATA_NAME
Defines dataset name (disease state, study name, group
name etc.). Used during IRIS screening
-s {1,2}, --sample-name-field {1,2}
Specifies sample name field (1- SJ count file name, 2-
SJ count folder name), for each sample the name should
match their name in "rmats_sample_order"
optional arguments:
-h, --help show this help message and exit
-c COV_CUTOFF, --cov-cutoff COV_CUTOFF
Average coverage filter for merged matrix (Default is
10)
-i, --sample-based-filter
Coverage filter by individual sample not by entire
input group. (Default is disabled)
-e, --merge-events-only
Do not perform matrix merge, only merge events list
-d IRIS_DB_PATH, --iris-db-path IRIS_DB_PATH
Path to store the formatted/indexed AS matrix.
Strongly recommend to store the AS matrix to the IRIS
db by setting the path to the directory containing
folders of pre-index AS reference
("full_path/IRIS_data.vX/db"). Default is current
location.
--novelSS Enable formatting events with splice junctions
containing novelSS. (Different and a subset of rMATS
novelSS definition. Default is False)
--gtf GTF Path to the Genome annotation GTF file. Required input
when novelSS is enabled.
This step takes a user-defined screening parameter file (example/NEPC_test.para), and performs comparisons against reference databases, and returns tumor-associated, tumor-recurrent, and tumor-specific AS events based on user-defined criteria.
When the -t option is enabled, the screening step translates identified tumor AS events into peptide sequences that can be used in the prediction step.
usage: IRIS screen [-h] -p PARAMETER_FIN
[--splicing-event-type {SE,RI,A3SS,A5SS}] -o OUTDIR [-t]
[-g GTF] [--all-orf] [--ignore-annotation]
[--remove-early-stop] [--min-sample-count MIN_SAMPLE_COUNT]
[--use-existing-test-result]
required arguments:
-p PARAMETER_FIN, --parameter-fin PARAMETER_FIN
File of 'IRIS screen' parameters
--splicing-event-type {SE,RI,A3SS,A5SS}
String of splicing event types based on rMATS
definition (SE,RI,A3SS,A5SS).Used to name output file.
(Default is SE event)
-o OUTDIR, --outdir OUTDIR
Directory of IRIS screening results
optional arguments:
-h, --help show this help message and exit
-t, --translating Translates IRIS-screened tumor splice junctions into
peptides
-g GTF, --gtf GTF The Genome annotation GTF file. Required by IRIS
translate option.
--all-orf Perform the 3 ORF translation. ORF known in the
UniProtKB will be labeled as uniprotFrame in the bed
file (Default is to use the known ORF ONLY)
--ignore-annotation Perform 3 ORF translation without annotating known ORF
from the UniProtKB (Default is disabled)
--remove-early-stop Discard the peptide if containing early stop codon
(Default is keep the truncated peptide)
--min-sample-count MIN_SAMPLE_COUNT
The minimum number of non-missing sample in the input
group for an event to be considered for testiing. Once
specified, removed events will be written to "notest"
file. (Default is no minimum)
--use-existing-test-result
Skip testing and use existing testing result (Default
is run full testing steps)
Additionally, screen_sjc
can be performed as part of the 'tumor-specificity' screen.
usage: IRIS screen_sjc [-h] -p PARAMETER_FIN
--splicing-event-type {SE,RI,A3SS,A5SS}
-e EVENT_LIST_FILE -o OUTDIR
[--use-existing-test-result]
[--tumor-read-cov-cutoff TUMOR_READ_COV_CUTOFF]
[--normal-read-cov-cutoff NORMAL_READ_COV_CUTOFF]
Optionally, screen_cpm
can be performed to as a higher stringent 'tumor-association' screen or less stringent 'tumor-specificity' by using normalized splice junction counts (in CPM).
usage: IRIS screen_cpm [-h] -p PARAMETER_FIN
--splicing-event-type {SE,RI,A3SS,A5SS}
-e EVENT_LIST_FILE -o OUTDIR
[--use-existing-test-result]
This step takes the screening result and performs annotation of extracellular and HLA-binding epitope predictions to discover immunotherapy targets.
IRIS prediction of HLA-binding epitopes is a massive prediction job that can utilize a compute cluster. The prediction
step will create scripts to perform subtasks. If properly configured, those subtask scripts can be executed concurrently by snakemake.
usage: IRIS predict [-h] --task-dir TASK_DIR -p PARAMETER_FIN
[-t {SE,RI,A3SS,A5SS}] [--iedb-local IEDB_LOCAL]
[-m MHC_LIST] [--extracellular-only] [--tier3-only]
[--gene-exp-matrix GENE_EXP_MATRIX] [-c DELTAPSI_COLUMN]
[-d DELTAPSI_CUT_OFF] [-e EPITOPE_LEN_LIST] [--all-orf]
[--extracellular-anno-by-junction]
IRIS_screening_result_path
required arguments:
IRIS_screening_result_path
Directory of IRIS screening results
--task-dir TASK_DIR Directory to write individual task scripts
-p PARAMETER_FIN, --parameter-fin PARAMETER_FIN
File of parameters used in 'IRIS screen'
-t {SE,RI,A3SS,A5SS}, --splicing-event-type {SE,RI,A3SS,A5SS}
String of splicing event types based on rMATS
definition (SE,RI,A3SS,A5SS).Used to name output file.
(Default is SE event)
optional arguments:
-h, --help show this help message and exit
--iedb-local IEDB_LOCAL
Specify local IEDB location (if installed)
-m MHC_LIST, --mhc-list MHC_LIST
List of HLA/MHC types among samples. HLA type follows
seq2HLA format
--extracellular-only Only predict CAR-T Targets. Will not predict HLA
binding.
--tier3-only To only run predict on events passing all screen
tiers, which is the tier3 output. Will be much faster
when both the tier1 and tier3 were used.
--gene-exp-matrix GENE_EXP_MATRIX
Tab-delimited matrix of gene expression vs. samples
-c DELTAPSI_COLUMN, --deltaPSI-column DELTAPSI_COLUMN
Column of deltaPSI value in matrix, 1-based (Default
is 5th column)
-d DELTAPSI_CUT_OFF, --deltaPSI-cut-off DELTAPSI_CUT_OFF
Defines cutoff of deltaPSI (or other metric) to select
tumor-enriched splice form (Default is 0)
-e EPITOPE_LEN_LIST, --epitope-len-list EPITOPE_LEN_LIST
Epitope length for prediction (Default is 9,10,11)
--all-orf Perform prediction based on 3 ORF translation
peptides. Enable this if translation/screening used
this option (Default is False)
--extracellular-anno-by-junction
By default, CAR-T targets are annotated by association
of event with extracellular domain. This option
annotates target based on a junction (not recommended)
usage: IRIS epitope_post [-h] -p PARAMETER_FIN -o OUTDIR
[-t {SE,RI,A3SS,A5SS}] -m MHC_BY_SAMPLE
[-e GENE_EXP_MATRIX] [--tier3-only] [--keep-exist]
[--epitope-len-list EPITOPE_LEN_LIST]
[--no-match-to-canonical-proteome]
[--no-uniqueness-annotation]
[--ic50-cut-off IC50_CUT_OFF]
required arguments:
-p PARAMETER_FIN, --parameter-fin PARAMETER_FIN
File of parameters used in IRIS screen
-o OUTDIR, --outdir OUTDIR
Directory of IRIS screening results
-t {SE,RI,A3SS,A5SS}, --splicing-event-type {SE,RI,A3SS,A5SS}
String of splicing event types based on rMATS
definition (SE,RI,A3SS,A5SS).Used to name output file
(Default is SE event)
-m MHC_BY_SAMPLE, --mhc-by-sample MHC_BY_SAMPLE
Tab-delimited matrix of HLA/MHC type vs. samples. HLA
type follows seq2HLA format
-e GENE_EXP_MATRIX, --gene-exp-matrix GENE_EXP_MATRIX
Tab-delimited matrix of gene expression vs. samples
optional arguments:
-h, --help show this help message and exit
--tier3-only Only predict tier3 events. Will be much faster.
--keep-exist Do not rewrite a new postive prediction file when the
file existed. Default is False
--epitope-len-list EPITOPE_LEN_LIST
Epitope length for prediction (Default is 9,10,11)
--no-match-to-canonical-proteome
Matches epitopes to UniProt canonical protein
sequences as an annotation.
--no-uniqueness-annotation
Matches epitopes to all IRIS translated junction
peptides in the same analysis as an annotation.
--ic50-cut-off IC50_CUT_OFF
Specifies IC50 cut-off to define HLA-binding epitopes
(Default is 500)
When starting from a fastq file, users should use this step to perform RNA-Seq alignment and quantification. This module uses STAR and cufflinks. This module only takes one sample (can be multiple fastq files) for each run. Users are recommended to run this module in parallel (use makesubsh_mapping
for snakemake).
usage: IRIS process_rnaseq [-h] --starGenomeDir STARGENOMEDIR --gtf GTF -p
SAMPLEID_OUTDIR [--db-length DB_LENGTH] [--mapping]
[--quant] [--sort]
readsFilesRNA
required arguments:
--starGenomeDir STARGENOMEDIR
The path to the STAR indexed reference genome. Pass to
the "genomeDir" parameter in STAR
--gtf GTF Path to the Genome annotation GTF file
-p SAMPLEID_OUTDIR, --sampleID-outdir SAMPLEID_OUTDIR
Output directory where sample ID will be used as the
output folder name
--db-length DB_LENGTH
Pass to the "sjdbOverhang" parameter in STAR. Default
is 100
readsFilesRNA Specify the path to the paired-end FASTQ files for the
sample. Files are seperated eperated by ",".
optional arguments:
-h, --help show this help message and exit
--mapping Only perform reads mapping
--quant Only perform gene expression and AS quantification
--sort Only perform BAM file sorting
Run process_rnaseq
jobs in parallel on HPC or cloud based on snakemake.
usage: IRIS makesubsh_mapping [-h] [--fastq-folder-dir FASTQ_FOLDER_DIR]
--starGenomeDir STARGENOMEDIR --gtf GTF
--data-name DATA_NAME --outdir OUTDIR
--label-string LABEL_STRING --task-dir TASK_DIR
required arguments:
--fastq-folder-dir FASTQ_FOLDER_DIR
Specify the path to the higher level of all folders
containing FASTQ files
--starGenomeDir STARGENOMEDIR
The path to the STAR indexed reference genome. Pass to
the "genomeDir" parameter in STAR
--gtf GTF Path to the Genome annotation GTF file
--data-name DATA_NAME
Data set name used to name submission shell scripts
files.
--outdir OUTDIR Output directory for folders of aligned BAM files
--label-string LABEL_STRING
String in the fastq file name between the reads pair
number and "fastq/fq". This is used to recognize
paired-end reads. e.g. For FASTQ_file_L1_R2.fastq.gz,
the label string is the "." between "2" and "fastq".
--task-dir TASK_DIR Directory to write individual task scripts
optional arguments:
-h, --help show this help message and exit
After running process_rnaseq
, this step can be used to prepare files to run rMATS-turbo in parallel.
usage: IRIS makesubsh_rmats [-h] --rMATS-path RMATS_PATH --bam-dir BAM_DIR
[--bam-prefix BAM_PREFIX] --gtf GTF --data-name
DATA_NAME --task-dir TASK_DIR [--novelSS]
[--read-length READ_LENGTH]
required arguments:
--rMATS-path RMATS_PATH
Path to the rMATS-turbo script.
--bam-dir BAM_DIR The path one level higher to folders containing BAM
file generated by "process_rnaseq".
--bam-prefix BAM_PREFIX
BAM file prefix (Default is
"Aligned.sortedByCoord.out")
--gtf GTF Path to the Genome annotation GTF file
--data-name DATA_NAME
Data set name used to name submission shell scripts
--task-dir TASK_DIR Directory to write individual task scripts
optional arguments:
-h, --help show this help message and exit
--novelSS Enable rMATS novelSS option to include novel splice
site detected from the RNA-seq data (Default is False)
--read-length READ_LENGTH
User defined read length instead of using STAR maaping
log file to define automatically.
After running makesubsh_rmats
, this step can be used to merge files to generate final rMATS-turbo results.
usage: IRIS makesubsh_rmatspost [-h] --rMATS-path RMATS_PATH --bam-dir BAM_DIR
--gtf GTF --data-name DATA_NAME [--novelSS]
--task-dir TASK_DIR
required arguments:
--rMATS-path RMATS_PATH
Path to the rMATS-turbo scripte
--bam-dir BAM_DIR The path one level higher to folders containing BAM
file generated by "process_rnaseq".
--gtf GTF Path to the Genome annotation GTF file
--data-name DATA_NAME
Data set name used to name submission shell scripts
--task-dir TASK_DIR Directory to write individual task scripts
optional arguments:
-h, --help show this help message and exit
--novelSS Enable rMATS novelSS option to include novel splice
site detected from the RNA-seq data (Default is False)
After running process_rnaseq
, if samples of interest are all processed, users can use this script to generate a gene expression matrix, which will be used as annotations in downstream IRIS prediction and/or proteomics reports.
usage: IRIS exp_matrix [-h] [--exp-cutoff EXP_CUTOFF] -o OUTDIR -n DATA_NAME
gene_exp_file_list
required arguments:
gene_exp_file_list A txt manifest of path(s) of cufflinks gene expression
output(s).
-n DATA_NAME, --data-name DATA_NAME
Name of the dataset (disease state, study name, group
name etc.).
optional arguments:
-h, --help show this help message and exit
--exp-cutoff EXP_CUTOFF
Gene expression cut-off based on FPKM (Default is 1)
-o OUTDIR, --outdir OUTDIR
Output directory for IRIS exp_matrix
This step is incorporated by formatting. For users who already have a matrix of AS PSI values (generated by rMATS or another tool), this command could finish the indexing and other steps to prepare for IRIS screening.
usage: IRIS index [-h] -t {SE,RI,A3SS,A5SS} -n DATA_NAME [-c COV_CUTOFF]
[-o OUTDIR]
splicing_matrix
required arguments:
splicing_matrix Tab-delimited matrix of splicing events (row) vs.
sample IDs (col)
-t {SE,RI,A3SS,A5SS}, --splicing-event-type {SE,RI,A3SS,A5SS}
String of splicing event types based on rMATS
definition (SE,RI,A3SS,A5SS).Used to name output file
-n DATA_NAME, --data-name DATA_NAME
Name of data matrix (disease state, study name, group
name, etc.) being indexed. Used by IRIS during
screening
optional arguments:
-h, --help show this help message and exit
-c COV_CUTOFF, --cov-cutoff COV_CUTOFF
For the naming purpose, Input average coverage cutoff
used when generating the PSI matrix (Default is 10)
-o OUTDIR, --outdir OUTDIR
Output directory for IRIS database
usage: IRIS translate [-h] -g REF_GENOME -t {SE,RI,A3SS,A5SS} --gtf GTF -o
OUTDIR [--all-orf] [--ignore-annotation]
[--remove-early-stop] [-c DELTAPSI_COLUMN]
[-d DELTAPSI_CUT_OFF] [--no-tumor-form-selection]
[--check-novel]
as_input
required arguments:
as_input Inputs AS event coordinates and delta PSI values
-g REF_GENOME, --ref-genome REF_GENOME
Specifies reference genome (FASTA format) location
-t {SE,RI,A3SS,A5SS}, --splicing-event-type {SE,RI,A3SS,A5SS}
String of splicing event types based on rMATS
definition (SE,RI,A3SS,A5SS).Used to name output file
--gtf GTF Path to the Genome annotation GTF file. Used to define
exon ends for microexons
-o OUTDIR, --outdir OUTDIR
Defines IRIS translation output directory
optional arguments:
-h, --help show this help message and exit
--all-orf Perform the 3 ORF translation. ORF known in the
UniProtKB will be labeled as uniprotFrame in the bed
file (Default is to use the known ORF ONLY)
--ignore-annotation Perform 3 ORF translation without annotating known ORF
from the UniProtKB (Default is disabled)
--remove-early-stop Discard the peptide if containing early stop codon
(Default is keep the truncated peptide)
-c DELTAPSI_COLUMN, --deltaPSI-column DELTAPSI_COLUMN
Column of deltaPSI value in matrix, 1-based (Default
is 5th column)
-d DELTAPSI_CUT_OFF, --deltaPSI-cut-off DELTAPSI_CUT_OFF
Defines cutoff of deltaPSI (or other metric) used to
select tumor-enriched splice form (Default is 0)
--no-tumor-form-selection
Translates splicing junctions derived from both
skipping and inclusion forms (Default is False)
--check-novel Translates splicing junctions derived from novel
splice sites only using information passed from
screen_novelss (Default is False)
This step uses the RNA-Seq FASTQ file to infer the HLA type of a sample.
usage: IRIS makesubsh_hla [-h] [--fastq-folder-dir FASTQ_FOLDER_DIR]
--data-name DATA_NAME -o OUTDIR --label-string
LABEL_STRING --task-dir TASK_DIR
required arguments:
--fastq-folder-dir FASTQ_FOLDER_DIR
Specify the path to the higher level of all folders
containing FASTQ files
--data-name DATA_NAME
Data set name used to name submission shell scripts.
-o OUTDIR, --outdir OUTDIR
Output directory for folders of seq2hla result
--label-string LABEL_STRING
String in the fastq file name between the reads pair
number and "fastq/fq". This is used to recognize
paired-end reads. e.g. For FASTQ_file_L1_R2.fastq.gz,
the label string is the "." between "2" and "fastq".
--task-dir TASK_DIR Directory to write individual task scripts
optional arguments:
-h, --help show this help message and exit
This module is a wrapper of prediction tools (IEDB) for predicting peptide-HLA binding. The prediction
and epitope_post
modules can generate scripts to run this module in parallel and summarize the result into one TCR target report.
usage: IRIS pep2epitope [-h] [-e EPITOPE_LEN_LIST] [-a HLA_ALLELE_LIST] -o
OUTDIR [--iedb-local IEDB_LOCAL]
[--ic50-cut-off IC50_CUT_OFF]
junction_pep_input
required arguments:
junction_pep_input Inputs junction peptides
-e EPITOPE_LEN_LIST, --epitope-len-list EPITOPE_LEN_LIST
Epitope length for prediction (Default is 9,10,11)
-a HLA_ALLELE_LIST, --hla-allele-list HLA_ALLELE_LIST
List of HLA types (Default is HLA-A*01:01,
HLA-B*08:01, HLA-C*07:01)
-o OUTDIR, --outdir OUTDIR
Define output directory of pep2epitope
--iedb-local IEDB_LOCAL
Specify local IEDB location (if installed)
--ic50-cut-off IC50_CUT_OFF
Cut-off based on median value of consensus-predicted
IC50 values (Default is 500)
optional arguments:
-h, --help show this help message and exit