IgemRNA is transcriptome analysis MATLAB software with a graphical user interface (GUI) designed for the analysis of transcriptome data directly or the analysis of context-specific models generated from the provided model reconstruction, transciptome and optional medium composition data files. IgemRNA facilitates some of the Cobra Toolbox 3.0 constraint-based modelling functionalities for context-specific model generation and performing optimisation methods like FBA or FVA on them. Furthermore, IgemRNA can be used to validate transcriptomics data taking into account the interconnectivity of biochemical networks, steady state assumptions and Gene-Protein-Reaction (GPR) relationship. The result context-specific models can then be further analysed and compared to other phenotypes. The main advantage of IgemRNA software is that no previos programming skills are required due to the integrated user-friendly GUI which allows to select data files and data processing options in the IgemRNA form (see fig. 1).
fig. 1 - Full IgemRNA form
The IgemRNA tool initially consists of 2 root folders (Data, Scripts), additional result folders are created when specific analysis tasks have been performed (Results non-optimization, Results post-optimization) and an IgemRNA.m file which calls the user graphical interface form. Data folder stores all input data files used for this tutorial and Scripts folder consists of all the script files that are being executed by the IgemRNA software according user-selected options in the IgemRNA form as well as the test cases provided in this demonstration. The Results non-optimization (results of direct transcriptome analysis) and Results post-optimization (results of context-specific model analysis) are folders where all the result files are saved.
In order to run IgemRNA the user must first have the MATLAB environment installed and started as well as have the IgemRNA software downloaded and the files extracted. Then the user can navigate to the root folder of IgemRNA in MATLAB where the IgemRNA.m file is located and run it.
To access all options in the IgemRNA form, the user must supply input data files. This can be done by pressing the Open button in the corresponding file upload row and selecting the files via File Explorer. Transcriptomics data file (see fig. 2B) is required to run non-optimization tasks but an additional model reconstruction file is required to access the post-optimization tasks. Medium composition data file (see fig. 2A) is optional, the selection of this data file does not extend the form, but specifies the provided exchange reaction constraints (upper and lower bounds) in the model for post-optimization tasks analysis.
Transcriptomics data and medium composition data can be provided in .xls or .xlsx formats, where columns are named respectively (see fig. 2) and sheet names correspond to dataset and phenotype names (dataSetName_phenotypeName). The model reconstruction file can be provided in .xls, .xml or other formats supported by Cobra Toolbox 3.0. In case the transcriptomics data consists of multiple datasets, the genes listed in each dataset must be in the same order!
fig. 2 - Input data file structure; A - Medium data file structure; B - Transcriptomics data file structure
Data files used for this tutorial can be found in the Data folder:
- MediumData.xlsx (medium composition data)
- Yeast_8_4_0.xls (the yeast consensus genome-scale model reconstruction)
- TranscriptomicsData.xlsx (RNA-seq measurements), source here.
Non-optimization tasks include several transcriptomics data analysis tasks:
- filter highly and lowly expressed genes,
- filter lowly expressed genes,
- filter up/down regulated genes between different phenotypes or data sets.
The results of each task are stored in a different folder within the Results non-optimization folder: Gene expression level comparison, Highly-lowly expressed genes, Lowly expressed genes (see fig. 3).
fig. 3 - Non-optimization results folder
Non-optimization task Filter highly and lowly expressed genes generates result Excel files for each provided transcriptomics dataset. File names are assigned based on the provided dataset and phenotype names (from transcriptomics data), the applied thresholding approach (GT1, LT1, LT2, more on thresholding: ThresholdingGeneMappingOptions.docx) and provided global thresholds values (see fig. 4). Each result Excel file contains one sheet with the list of genes provided by transcriptomics data and 4 columns: GeneId, Data (expression value), ExpressionLevel and ThApplied. The ExpressionLevel column contains the expression levels determined based on the selected thresholding approach, provided global and for thresholding approaches LT1 and LT2 calculated local thresholds. Column ThApplied displays whether a local or a global threshold for a specific gene was applied (see fig. 5).
fig. 4 - Filter highly and lowly expressed genes folder | fig. 5 - Filter highly and lowly expressed genes result file |
There are two ways to perform this test case:
1. Using GUI - upload transciptomics data file, select a thresholding approach, input threshold value/s, select non-optimization tasks option Filter highly and lowly expressed genes and press OK.
2. Run test case files from the Scripts folder of IgemRNA tool:
- TestCase_findHighlyLowlyExpressedGenesGT1.m;
- TestCase_findHighlyLowlyExpressedGenesLT1.m;
- TestCase_findHighlyLowlyExpressedGenesLT2.m.
Non-optimization task Filter lowly expressed genes generates separate Excel result files for each dataset provided in transcriptomics data file. These result files contain filtered gene lists including genes with expression value below the given threshold value/s based on the selected thresholding approach. File names include dataset and phenotype names (from transcriptomics data file), the applied thresholding approach (GT1, LT1, LT2, more on thresholding: ThresholdingGeneMappingOptions.docx) and provided global threshold value/s (see fig. 6). Each result file consists of 4 columns GeneId, Data (expression value), ExpressionLevel (in this case, only Low) and ThApplied to show whether a global or a local threshold was applied for a specific gene. Only those genes with expression values below the given threshold (depending on which thresholding approach is applied) are listed in the result files (see fig. 7).
fig. 6 - Filter lowly expressed genes folder | fig. 7 - Filter lowly expressed genes result file |
There are two ways to perform this test case:
1. Using GUI - upload transciptomics data file, select a thresholding approach, input threshold value/s, select non-optimization tasks option Filter lowly expressed genes and press OK.
2. Run test case files from the Scripts folder of IgemRNA tool:
- TestCase_findGenesBelowThresholdGT1.m;
- TestCase_findGenesBelowThresholdLocal1.m;
- TestCase_findGenesBelowThresholdLocal2.m.
Non-optimization task Filter up/down regulated genes between phenotypes generates result Excel files in the Gene expression level comparison folder. Result file name contains dataset and phenotype names for both transcriptomics datasets that have been compared (see fig. 8). Result Excel data files contain a full gene list from the target dataset, expression values for both target and source datasets as well as the determined up/down regulation status (see fig. 9).
fig. 8 - Filter up/down regulated genes between phenotypes folder | fig. 9 - Filter up/down regulated genes between phenotypes result file |
There are two ways to perform this test case:
1. Using GUI - upload transciptomics data file with multiple datasets, select a thresholding approach, input threshold value/s, select non-optimization tasks option Filter up/down regulated genes between phenotypes, choose phenotypes to compare, and press OK.
2. Run test case file from the Scripts folder of IgemRNA tool: TestCase_findUpDownRegulatedGenes.m.
Context-specific models generated by IgemRNA post-optimization tasks as well as the results of the analysis performed on these models are saved in the Results post-optimization folder (see fig. 10). The results of post-optimization tasks are saved in folders with corresponding names: Flux-shifts, Non-flux reactions and Rate limiting reactions.
fig. 10 - Post-optimization results folder
There are two ways to generate models used in this tutorial:
1. Using GUI - upload transciptomics data file, model reconstrucion file and optionally a medium composition file, select a thresholding approach, input threshold value/s, select gene mapping approach and constraining options, choose one or more post-optimization tasks options and press OK.
2. Run test case file from the Scripts folder of IgemRNA tool: TestCase_createContextSpecificModel.m
Since model generation and optimisation takes some time, especially for multiple transcriptomics datasets, context-specific model files used for this demonstration have already been provided in the Results post-optimization/Context-specific models folder.
Post-optimization task Filter non-flux reactions performs an analysis on created context-specific models of the same phenotype, the name of the phenotype is included in the result file name (see fig. 11). Each result Excel file contains a list of reactions that carry no flux in the result context-specific model (lower and upper bound equals 0). An additional sheet for all the common non-flux reactions of the same phenotype is also provided in the sheet Common_(phenotypeName) (see fig. 12).
fig. 11 - Filter non-flux reactions folder | fig. 12 - Filter non-flux reactions result file |
There are two ways to perform this test case:
1. Using GUI - upload transciptomics data file, model reconstrucion file and optionally a medium composition file, select a thresholding approach, input threshold value/s, select gene mapping approach and constraining options and choose post-optimization task Filter non-flux reactions options. If context-specific models have already been generated, choose the Use existing context-specific models option. Press OK. More on thresholding and gene mapping: ThresholdingGeneMappingOptions.docx.
2. Run test case file from the Scripts folder of IgemRNA tool: TestCase_filterNonFluxReactions.m
Post-optimization task Filter rate limiting reactions performs analysis on the generated context-specific models of the same phenotype, the phenotype name is included in the result files (see fig. 13).
fig. 13 - Filter rate limiting reactions folder
An analysis of these context-specific models have been performed in order to filter reactions that have reached the maximal flux value (MaxFlux calculated by FVA) based on the upper bound constraint set according to transcriptomics data and GPR associations. An additional sheet Common_(phenotypeName) for common rate limiting reactions has also been added to the result file where rate limiting reactions that are present in all context-specific models of the same phenotype (see fig. 14).
fig. 14 - Filter rate limiting reactions result file
There are two ways to perform this test case:
1. Using GUI - upload transciptomics data file, model reconstrucion file and optionally a medium composition file, select a thresholding approach, input threshold value/s, select gene mapping approach and constraining options and choose post-optimization task Filter rate limiting reactions options. If context-specific models have already been generated, choose the Use existing context-specific models option. Press OK. More on thresholding and gene mapping: ThresholdingGeneMappingOptions.docx.
2. Run test case file from the Scripts folder of IgemRNA tool: TestCase_filterRateLimittingReactions.m.
Post-optimization task Calculate flux shifts between phenotypes compares flux values (calculated by FVA on the context-specific models) between two different phenotypes. In this demonstration flux shifts analysis task was performed on the S47D phenotype datasets using GT1 thresholding approach with the lower global threshold value of 0, phenotype SRR8994358_WT was compared to the wild type dataset SRR8994357_WT of the same thresholding approach and threshold values (see fig. 15).
fig. 15 - Calculate flux shifts between phenotypes folder
Each result file contains a full reaction list that corresponds to the Reaction List sheet in the provided model reconstruction file as well as additional columns for the calculation results: MinFlux and MaxFlux values (phenotype that is compared, fig. 16 L, M columns), MinFlux/MaxFlux(dataset name)_(phenotype name) phenotype that is used for comparison (fig. 16 N, O columns) and the MinFlux/MaxFluxRatio between these two phenotypes (fig. 16. P, Q columns).
fig. 16 - Calculate flux shifts between phenotypes result file
There are two ways to perform this test case:
1. Using GUI - upload transciptomics data file, model reconstrucion file and optionally a medium composition file, select a thresholding approach, input threshold value/s, select gene mapping approach and constraining options and choose post-optimization task Calculate flux shifts between phenotypes options. If context-specific models have already been generated, choose the Use existing context-specific models option. Press OK. More on thresholding and gene mapping: ThresholdingGeneMappingOptions.docx.
2. Run test case file from the Scripts folder of IgemRNA tool: TestCase_calculateFluxShifts.m.
[1] Kristina Grausa, Ivars Mozga, Karlis Pleiko, Agris Pentjuss, Integrative Gene Expression and Metabolic Analysis tool IgemRNA, https://doi.org/10.1101/2021.08.02.454732