- Code was tested on Ubuntu 20.04.3 LTS and Windows 10
- Matlab (tested with 2020a/b)
- COBRA toolbox 3.0
- RAVEN toolbox 2.0
- Gurobi solver (tested with version 9.1.1)
- Set up the COBRA toolbox following the installation instructions
- Set up the RAVEN toolbox following the installation instructions
- Set up the Gurobi solver and connect it to the COBRA toolbox following the corresponding solver installation instructions
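
Once the three components are installed, a quick check like the following can confirm that they are visible from MATLAB. This is a minimal sketch, not part of the repository: it assumes that `initCobraToolbox` and `changeCobraSolver` (COBRA toolbox) and `checkInstallation` (RAVEN toolbox) are on the MATLAB path, and it only prints warnings when they are not.

```matlab
% Minimal environment check (sketch); it also runs when a toolbox is missing.
hasCobra = exist('initCobraToolbox', 'file') == 2;
hasRaven = exist('checkInstallation', 'file') == 2;

if hasCobra
    initCobraToolbox(false);                       % false: skip the toolbox update
    solverOK = changeCobraSolver('gurobi', 'LP');  % select Gurobi for LP problems
    if ~solverOK
        warning('Gurobi could not be set as LP solver; check the Gurobi MATLAB interface.');
    end
else
    warning('COBRA toolbox not found on the MATLAB path.');
end

if hasRaven
    checkInstallation;                             % RAVEN's own installation report
else
    warning('RAVEN toolbox not found on the MATLAB path.');
end
```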
- S. cerevisiae:
correct_kcats_yeast
- E. coli:
correct_kcats_ecoli
`get_rawecmod()` generates, among other outputs, raw enzyme-constrained models (no kcat adaptation, no manual modifications, no protein pool constraint) in the model folder of the respective GECKO path, e.g. `GECKO_S_cerevisiae/model/ecYeast/rawYeast.mat`
- call `get_GKOmod()` or `get_GKOmod_ecoli()` to generate all condition-specific adapted GECKO models, using the experimentally determined protein content, growth rate, and uptake rates where available, and save them to the model folder.
- create ecModel using the GECKO toolbox
- create input data files
protAbcFile
This file contains the measured protein abundances in mmol/gDW. Give the protein UniProt IDs as row labels in the first column and the measurements of all conditions in the subsequent columns with condition IDs as column headers in the first row.
growthDataFile
The first column of this file contains the IDs of exchange reactions, the second one contains the name or note associated with the ID, and the subsequent columns contain the measured growth rates [h-1] and exchange fluxes [mmol gDW-1 h-1] for the same conditions as in protAbcFile.
Example:
exchangeRxn | note | Condition_1 | Condition_2 | ... |
---|---|---|---|---|
growth | biomass | 0.1 | 0.15 | ... |
ex_glc | glucose | -10 | -12 | ... |
ex_CO2 | CO2 | 3 | 4 | ... |
... | ... | ... | ... | ... |
ptotFile
This file has two columns. The first one contains the condition IDs and the second one contains the total protein content in g gDW-1. The first row contains the column headers.
maxKcatFile
This file contains the reference kcat values and has five columns, which contain (1) EC number (2) substrate (3) lineage of the organism (4) kcat [s-1] (5) "*" (see also GECKO/databases/max_KCAT.txt).
mwFile (optional)
This file contains the molecular weights of all proteins in the model. If the file name is empty, the file does not exist, or the dimensions of the protMW model field and the enzymes field do not match, the file will be generated using the UniProt API.
modelFile
Give here the path to the ecModel generated using GECKO (without pool constraint).
batchModelFile
Path to the "batch model", which contains the protein pool constraint.
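
As an illustration of the layouts described above, the following sketch writes a growthDataFile and a ptotFile with `writetable`. The growth and exchange values are copied from the example table above; the file names, the tab delimiter, the column headers of the ptotFile, and the protein content values are assumptions made for this example and are not prescribed by the toolbox.

```matlab
outDir = tempdir;   % hypothetical output location

% growthDataFile: exchange reaction IDs, notes, then one column per condition
exchangeRxn = {'growth'; 'ex_glc'; 'ex_CO2'};
note        = {'biomass'; 'glucose'; 'CO2'};
Condition_1 = [0.1; -10; 3];     % growth rate [h-1] and fluxes [mmol gDW-1 h-1]
Condition_2 = [0.15; -12; 4];
growthTab   = table(exchangeRxn, note, Condition_1, Condition_2);
growthFile  = fullfile(outDir, 'growthData.txt');          % hypothetical name
writetable(growthTab, growthFile, 'Delimiter', '\t');

% ptotFile: condition IDs and total protein content [g gDW-1]
condition = {'Condition_1'; 'Condition_2'};
Ptot      = [0.4; 0.45];         % placeholder values, not measurements
ptotTab   = table(condition, Ptot);
ptotFile  = fullfile(outDir, 'ptot.txt');                  % hypothetical name
writetable(ptotTab, ptotFile, 'Delimiter', '\t');

% Read the growth data back to confirm the layout
check = readtable(growthFile, 'Delimiter', '\t');
disp(check);
```

Whatever tool produces the files, the condition IDs should match exactly between protAbcFile, growthDataFile, and ptotFile, since the columns are described as referring to the same conditions.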
- make adjustments to parameters and input file names in the configuration script
Parameter | Explanation |
---|---|
orgName | name of the organism |
orgBasename | model basename |
modelFile | file of the ecModel as Matlab workspace (.mat) |
cobraSolver | preferred solver for linear optimization problems (default: gurobi) |
runParallel | whether the correction should be run on multiple threads (default: true) |
ncpu | number of threads (default: 20) |
epsilon | upper limit for fold-change of kcat values |
lambda | weight for the minimization of absolute difference between measured and predicted growth rate(s) |
theta | upper limit for the difference between measured and predicted growth rate(s) |
GAM | value of growth-associated maintenance (put NaN if unknown, will be fitted using all provided conditions) |
f | mass fraction of all proteins accounted for by the model (see GECKO publication) |
f_n | mass fraction of unmeasured proteins in the ecModel (for inclusion of unmeasured proteins; put NaN if unknown, will be fitted for each condition separately) (see GECKO publication) |
sigma | average saturation of enzymes in the model (see GECKO publication; can be fitted using GECKO sigmaFitter) |
nIter | number of iterations for k-fold cross-validation |
geckoDir | path to the organism-specific GECKO directory |
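
The parameters in the table above could be set in a configuration script along these lines. This is a sketch only: the variable names follow the table, the defaults for `cobraSolver`, `runParallel`, and `ncpu` are the ones stated there, and every other value (organism names, paths, `epsilon`, `lambda`, `theta`, `f`, `sigma`, `nIter`) is a placeholder chosen for illustration, not a recommended setting.

```matlab
% Sketch of a configuration script; values marked "placeholder" are assumptions.
orgName     = 'Saccharomyces cerevisiae';   % placeholder
orgBasename = 'yeast';                      % placeholder
geckoDir    = 'GECKO_S_cerevisiae';         % path to the organism-specific GECKO directory
modelFile   = fullfile(geckoDir, 'model', 'ecYeast', 'rawYeast.mat');  % placeholder path

cobraSolver = 'gurobi';   % default stated in the table
runParallel = true;       % default stated in the table
ncpu        = 20;         % default stated in the table

epsilon = 1e6;            % placeholder: upper limit for fold change of kcat values
lambda  = 1e-7;           % placeholder: weight of the growth-rate error term
theta   = 0.5;            % placeholder: upper limit for the growth-rate error

GAM   = NaN;              % NaN: fitted using all provided conditions
f     = 0.5;              % placeholder: mass fraction of proteins in the model
f_n   = NaN;              % NaN: fitted for each condition separately
sigma = 0.5;              % placeholder: average enzyme saturation
nIter = 10;               % placeholder: iterations for k-fold cross-validation
```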
- run updated configuration file (step 3)
- run `cvLambdaFitting` to estimate the optimal weighting parameter $\lambda$
[relErr,errVar,sumsDelta,objVal,avJD,corrKcatProts] = cvLambdaFitting(...
model,... % GECKO ecModel
expGrowth,... % experimental growth rates for all conditions
PTot,... % total protein contents for all conditions
E,... % enzyme abundance matrix (#model proteins x #conditions)
lambdaParams,... % array of lambda parameters to be explored
nutrExch,... % nutrient exchange rates
'kfold', kfold,... % (optional) number of folds for cross-validation
'nIter', nIter,... % (optional) number of iterations for k-fold cross-validation
'epsilon', epsilon,... % (optional) maximum allowed fold change of kcat values
'theta', theta,... % (optional) maximum allowed relative error
'runParallel', runParallel,... % (optional) whether to run the cross-validation on multiple workers
'enzMetPfx', enzMetPfx,... % (optional) prefix for protein metabolites (by default prot_ as added by GECKO)
'enzRxnPfx', enzRxnPfx,... % (optional) prefix for protein draw reactions (by default prot_ as added by GECKO)
'enzBlackList', enzBlackList,... % (optional) list of protein IDs that should be excluded from the correction
'K', K,... % (optional) maximum allowed kcat value after correction
'negCorrFlag', negCorrFlag,... % (optional) if true, a second step is added, which attempts to find negative corrections for kcat values
'GAM', GAM,... % (optional) growth associated maintenance
'f', f,... % (optional) f factor for protein pool (see GECKO paper or description above)
'sigma', sigma... % (optional) sigma factor for protein pool (see GECKO paper or description above)
);
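
To make the roles of `lambdaParams`, `kfold`, and `nIter` more concrete, the sketch below builds a grid of candidate $\lambda$ values and shows how conditions can be split into folds for repeated k-fold cross-validation. It is a generic illustration under stated assumptions, not the code used inside `cvLambdaFitting`: the grid bounds, the number of conditions, and the fold assignment scheme are all hypothetical.

```matlab
% Generic illustration of repeated k-fold splitting over conditions.
lambdaParams = 10.^(-5:-1);   % hypothetical grid of lambda values to explore
nCond = 6;                    % hypothetical number of experimental conditions
kfold = 3;                    % number of folds
nIter = 2;                    % number of repetitions with new random splits

rng(1);                       % fixed seed so the splits are reproducible
for it = 1:nIter
    % Randomly assign each condition to one of kfold equally sized folds
    perm = randperm(nCond);
    foldID = zeros(1, nCond);
    foldID(perm) = mod(0:nCond-1, kfold) + 1;
    for k = 1:kfold
        testIdx  = find(foldID == k);    % conditions held out in this fold
        trainIdx = find(foldID ~= k);    % conditions used for fitting
        fprintf('iteration %d, fold %d: train [%s], test [%s]\n', ...
            it, k, num2str(trainIdx), num2str(testIdx));
    end
end
```

In such a scheme, each candidate $\lambda$ would be evaluated on the held-out conditions of every fold, and the value with the best held-out performance would be passed on to the correction step.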
- adjust ecModel to experimental conditions using `adjBaseModel`
adj_models = adjBaseModel(...
model,... % GECKO ecModel
P,... % total protein contents for all conditions
nutrExch,... % nutrient exchange rates
GAM... % growth associated maintenance
);
- run `PRESTO` to obtain kcat corrections
[solution,corr_models,relError,changeTab,LP] = PRESTO(...
adj_models,... % enzyme-constraint metabolic model(s)
expGrowth,... % experimental growth rates for all conditions
E,... % enzyme abundance matrix (#model proteins x #conditions)
'lambda', lambda,... % (optional) weighting parameter lambda
'epsilon', epsilon,... % (optional) maximum allowed fold change of kcat values
'theta', theta,... % (optional) maximum allowed relative error
'enzBlackList', enzBlackList,... % (optional) list of protein IDs that should be excluded from the correction
'enzMetPfx', enzMetPfx,... % (optional) prefix for protein metabolites (by default prot_ as added by GECKO)
'enzRxnPfx', enzRxnPfx,... % (optional) prefix for protein draw reactions (by default prot_ as added by GECKO)
'negCorrFlag', negCorrFlag... % (optional) if true, a second step is added, which attempts to find negative corrections for kcat values
);
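
Both `cvLambdaFitting` and `PRESTO` take an enzyme abundance matrix `E` with one row per model protein and one column per condition. The sketch below shows one way to align measured abundances with the protein order of the `enzymes` model field. The protein IDs and abundance values are made-up placeholders, and marking unmeasured proteins with NaN is an assumption of this example; the encoding the functions actually expect is not stated above and should be checked in their help text.

```matlab
% Toy model with three proteins (hypothetical IDs)
model.enzymes = {'P00001'; 'P00002'; 'P00003'};

% Toy measurements as they might be read from protAbcFile (placeholder values)
protIDs   = {'P00003'; 'P00001'; 'P99999'};   % row labels (UniProt IDs)
abundance = [1e-6 2e-6; ...                   % mmol/gDW, columns = conditions
             3e-6 4e-6; ...
             5e-6 6e-6];

% Map each model protein to its row in the measurement table
[isMeasured, rowIdx] = ismember(model.enzymes, protIDs);

% Rows follow model.enzymes; unmeasured proteins stay NaN (assumption)
E = nan(numel(model.enzymes), size(abundance, 2));
E(isMeasured, :) = abundance(rowIdx(isMeasured), :);

disp(E);
```

Measured proteins that do not occur in the model (here `P99999`) are simply dropped by this mapping.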
Wendering, P., Arend, M., Razaghi-Moghadam, Z. et al. Data integration across conditions improves turnover number estimates and metabolic predictions. Nat Commun 14, 1485 (2023). https://doi.org/10.1038/s41467-023-37151-2