Simulating a quantified phosphoproteome for software benchmarking and algorithm development #8

vtsiamis88 opened this issue Sep 20, 2019 · 0 comments

Abstract

Signal transduction relies on a tightly time-controlled combination of phosphorylation/dephosphorylation events that are difficult to capture and integrate. Their large-scale characterization using bottom-up mass spectrometry requires phospho-peptide enrichment prior to analysis and presents specific analytical challenges, such as an increased search space, the need for confident localization of modifications, and the extrapolation of proteoform quantitative behavior from a single peptide. Most of these studies achieve low peptide coverage per protein and thus require statistically sound methods to estimate quantitative changes at the proteoform level. This consists of translating quantitative changes of (phosphorylated) peptides into changes of both the protein and its phosphorylated isoforms, and calculating their relative stoichiometry when modified and unmodified versions of the same peptide are available. To our knowledge, there are no suitable data sets that simulate phospho-regulation at the proteoform level, which prevents benchmarking of available computational methods against a real ground truth. In this project, we will build an artificial quantitative phosphoproteomics data set that simulates the influence of digestion, sample enrichment, spectra quality, wrong identifications and localizations, as well as technical and biological variance, and that can be used for benchmarking phosphoproteomics (and other PTMomics) data analysis algorithms.
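
As a minimal illustration of the stoichiometry calculation mentioned above, the sketch below estimates site occupancy as the fraction of the combined signal carried by the phosphorylated form. It assumes that the modified and unmodified peptides ionize with equal efficiency, which is a simplification; the function name `occupancy` is hypothetical and not part of any existing tool.

```python
def occupancy(intensity_phospho: float, intensity_unmod: float) -> float:
    """Estimate site occupancy (stoichiometry) as I_p / (I_p + I_u).

    Assumes equal ionization efficiency of the phosphorylated and the
    unmodified peptide, which rarely holds exactly in real data.
    """
    if intensity_phospho < 0 or intensity_unmod < 0:
        raise ValueError("intensities must be non-negative")
    total = intensity_phospho + intensity_unmod
    if total == 0:
        raise ValueError("at least one intensity must be positive")
    return intensity_phospho / total


# Phosphopeptide measured at 3e6, its unmodified counterpart at 1e6
print(occupancy(3e6, 1e6))  # 0.75
```

In a simulation, the same relation can be run in reverse: choosing a target occupancy per site fixes how the total signal of a peptide is split between its modified and unmodified forms.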

Work plan

Main tasks

  • Implementation: Develop a computational tool for generating a “perfect” in silico data set from a FASTA file to a peptide-spectrum match (PSM) table with simulated MS intensities.

  • Parametrization: Determine the different parameters that will be implemented in the tool: list of “regulated” sites, peptidases, digestion efficiency, enrichment efficiency, technical/biological variance, detection threshold, … This includes defining their range and their error.

  • Community engagement: Develop a web interface that provides a simulated phosphoproteomics data set with parameters defined by the user.

  • Assessment: Collect several data sets representative of common PTMomics experiments, to be used as references for defining the range of input parameters and testing the quality of the simulated data.
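
The parameters listed under Parametrization could be collected in a single validated object that every simulation module reads from. The sketch below is one possible layout, assuming Python 3.9 or later; the class name, field names, and default values are hypothetical placeholders, not agreed-upon settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationParameters:
    """Hypothetical container for the simulation parameters.

    All default values are placeholders for illustration only.
    """
    regulated_sites: tuple[str, ...] = ()  # identifiers of "regulated" sites
    peptidase: str = "trypsin"
    missed_cleavages: int = 2
    digestion_efficiency: float = 0.9      # fraction of cleavage sites cut
    enrichment_efficiency: float = 0.8     # fraction of phosphopeptides retained
    technical_sd: float = 0.2              # SD of technical noise on log2 scale
    biological_sd: float = 0.5             # SD of biological noise on log2 scale
    detection_threshold: float = 1e4       # lower intensities become missing

    def __post_init__(self) -> None:
        # Efficiencies are fractions and must lie in [0, 1]
        for name in ("digestion_efficiency", "enrichment_efficiency"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {value}")
        # Spreads, thresholds and cleavage counts cannot be negative
        for name in ("technical_sd", "biological_sd",
                     "detection_threshold", "missed_cleavages"):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} must be non-negative")


params = SimulationParameters(digestion_efficiency=0.95)
print(params.peptidase, params.digestion_efficiency)
```

Making the object frozen means a parameter set cannot change halfway through a simulation run, which keeps a generated data set traceable to the exact settings that produced it.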

These tasks will be discussed on the first day prior to their implementation. Depending on the skills and interest of the participants, we may define working groups for addressing them in the following days.

Preliminary time plan

Tuesday afternoon
Presentation of the problem: short presentation of the project.
Implementation scheme: create a modular mock-up of the processes that will be used to create the simulated data.
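
One way to keep the mock-up modular is to give every module the same interface: a function that takes a list of records (for example, peptides or PSMs) and returns a new list. The sketch below assumes that design and Python 3.9 or later; `run_pipeline`, `drop_short`, and `add_intensity` are hypothetical names, and the two toy modules only stand in for real detectability and quantification steps.

```python
from typing import Callable, Iterable

# Each record is a plain dict (e.g. one peptide or one PSM);
# each module maps a list of records to a new list of records.
Record = dict
Module = Callable[[list[Record]], list[Record]]


def run_pipeline(records: list[Record], modules: Iterable[Module]) -> list[Record]:
    """Apply the modules in order, feeding each output into the next module."""
    for module in modules:
        records = module(records)
    return records


def drop_short(records: list[Record]) -> list[Record]:
    """Hypothetical detectability filter: discard peptides shorter than 7 residues."""
    return [r for r in records if len(r["sequence"]) >= 7]


def add_intensity(records: list[Record]) -> list[Record]:
    """Hypothetical quantification stub: attach a constant intensity."""
    return [{**r, "intensity": 1e6} for r in records]


result = run_pipeline([{"sequence": "PEPTIDEK"}, {"sequence": "AK"}],
                      [drop_short, add_intensity])
print(result)  # [{'sequence': 'PEPTIDEK', 'intensity': 1000000.0}]
```

Because modules share only this interface, subgroups could develop and test them separately and swap implementations without touching the rest of the pipeline.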

Wednesday
Implementation of different modules: Depending on the number of participants, we will form subgroups that will work on implementing modules that simulate:

  • Digestion: digest the protein sequences of a FASTA file into a representation of (modified) peptides.

  • Identification: identification scores and false positives.

  • Quantification: measured peptide intensities.
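
For the digestion module, a minimal sketch could parse the FASTA file, cleave each sequence with the common trypsin rule (after K or R, but not before P), allow a number of missed cleavages, and list the S/T/Y positions that could carry a phosphate. The length limits, the `re.split`-based cleavage rule, the toy sequence, and all function names are assumptions for illustration (Python 3.9 or later); digestion efficiency and other proteases are not modeled here.

```python
import re


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {identifier: sequence}, using the first header word."""
    proteins: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            proteins[current] = ""
        elif current is not None:
            proteins[current] += line
    return proteins


def trypsin_digest(sequence: str, missed_cleavages: int = 1,
                   min_length: int = 6, max_length: int = 30) -> list[str]:
    """Cleave after K or R unless the next residue is P, allowing missed cleavages."""
    # Fully cleaved fragments; splitting on a zero-width match needs Python 3.7+
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = []
    for start in range(len(fragments)):
        for n_missed in range(missed_cleavages + 1):
            end = start + n_missed + 1
            if end > len(fragments):
                break
            peptide = "".join(fragments[start:end])
            if min_length <= len(peptide) <= max_length:
                peptides.append(peptide)
    return peptides


def phosphosites(peptide: str) -> list[int]:
    """0-based positions of S, T and Y residues that could carry a phosphate."""
    return [i for i, residue in enumerate(peptide) if residue in "STY"]


fasta = ">TOY1 hypothetical toy protein\nAGSTLKPEPTIDERSYTVMAKGGR\n"
for accession, sequence in read_fasta(fasta).items():
    for peptide in trypsin_digest(sequence):
        print(accession, peptide, phosphosites(peptide))
```

For the toy sequence, `trypsin_digest` returns `["AGSTLKPEPTIDER", "AGSTLKPEPTIDERSYTVMAK", "SYTVMAK", "SYTVMAKGGR"]`: there is no cleavage at the KP bond, and the fully cleaved `GGR` is dropped by the minimum length.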

Thursday

  • Integration of the different modules into a single software tool.

  • Testing and comparison of the main features of the simulated data to experimental data.

  • If time permits, create a prototype web service that generates simulated data from user-defined parameters.
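
For the comparison with experimental data, one option is to compare the distributions of a feature, such as log2 intensities or peptide lengths, with the two-sample Kolmogorov-Smirnov statistic. Choosing this statistic is an assumption, not part of the plan. The pure-Python sketch below computes only D, whereas `scipy.stats.ks_2samp` returns the same statistic together with a p-value. The input numbers are made-up toy values, not real data.

```python
from bisect import bisect_right


def ks_statistic(sample_a: list[float], sample_b: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|.

    F_a and F_b are the empirical cumulative distribution functions; the
    largest gap between these step functions occurs at an observed value.
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        f_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(f_a - f_b))
    return d


# Toy log2 intensities standing in for a simulated and an experimental run
simulated = [20.1, 21.5, 22.0, 23.4]
experimental = [21.0, 22.2, 23.9, 24.5]
print(ks_statistic(simulated, experimental))  # 0.5
```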

Expected results

At the end of the developer’s meeting, we expect to have a tool for generating a simulated PSM table with quantitative MS data containing modified and non-modified peptides corresponding to artificially regulated phospho-proteins. Depending on the number of participants and our progress, we can also expect to have a basic web interface, and to integrate simple parameters such as which protease(s) to use, digestion efficiency, …
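
A minimal sketch of the quantitative part of such a table might assign each peptide a baseline log2 abundance and shift the artificially regulated peptides by a fixed log2 fold change in the last condition. It would then add normally distributed noise on the log2 scale and report values below a detection limit as missing. The noise model, the hard detection cutoff, and all names and defaults are hypothetical simplifications (Python 3.9 or later). Real data show intensity-dependent missingness that this sketch does not capture.

```python
import random


def simulate_intensities(peptides: dict[str, float], regulated: dict[str, float],
                         conditions: tuple[str, ...] = ("control", "treated"),
                         replicates: int = 3, biological_sd: float = 0.5,
                         technical_sd: float = 0.2, detection_log2: float = 20.0,
                         seed: int = 1) -> list[dict]:
    """Return one row per peptide, condition and replicate with a log2 intensity.

    peptides:  {sequence: baseline log2 abundance}
    regulated: {sequence: log2 fold change applied in the last condition}
    Values below `detection_log2` are reported as None (missing).
    """
    rng = random.Random(seed)  # seeded so a simulated data set is reproducible
    rows = []
    for seq, baseline in peptides.items():
        for cond_idx, condition in enumerate(conditions):
            is_last = cond_idx == len(conditions) - 1
            shift = regulated.get(seq, 0.0) if is_last else 0.0
            for rep in range(1, replicates + 1):
                # With one measurement per sample, biological and technical
                # noise simply add up on the log2 scale
                noise = rng.gauss(0.0, biological_sd) + rng.gauss(0.0, technical_sd)
                value = baseline + shift + noise
                rows.append({
                    "peptide": seq,
                    "condition": condition,
                    "replicate": rep,
                    "log2_intensity": value if value >= detection_log2 else None,
                })
    return rows


rows = simulate_intensities({"SYTVMAK": 24.0, "AGSTLKPEPTIDER": 19.5},
                            regulated={"SYTVMAK": 2.0})
print(len(rows))  # 12
```

Setting both standard deviations to 0 gives a noise-free ground truth, which is convenient for checking that downstream tools recover the simulated fold changes.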

Follow up

After the developer’s meeting, we expect to use the simulated data in ongoing and future projects and hope that they will also be used for benchmarking by bioinformaticians working with PTMomics data.

Technical details

  • The programming language(s) that will be used: Not all the tasks of this project involve programming. For the ones that do, we recommend R or Python, as the quantitative data handled in this project are small enough for these languages.

  • Existing software that will be featured: None

  • (Public) datasets that will be used and their availability
    Here, we provide an example of publications and associated data that will be used for the project:

    Phosphoproteomics data set with label-free quantification:
    Sharma, K., D’Souza, R. C. J., Tyanova, S., Schaab, C., Wiśniewski, J. R., Cox, J., & Mann, M. (2014). Ultradeep Human Phosphoproteome Reveals a Distinct Regulatory Nature of Tyr and Ser/Thr-Based Signaling. Cell Reports, 8(5), 1583–1594. https://doi.org/10.1016/J.CELREP.2014.07.036 (PRIDE: PXD000612)

    Phosphoproteomics data set with TMT quantification:
    Brubaker, D. K., Paulo, J. A., Sheth, S., Poulin, E. J., Popow, O., Joughin, B. A., … Haigis, K. M. (2019). Proteogenomic Network Analysis of Context-Specific KRAS Signaling in Mouse-to-Human Cross-Species Translation. Cell Systems. https://doi.org/10.1016/J.CELS.2019.07.006 (PRIDE: PXD013922)

    Example of AP-MS experiments with phospho- and non-phosphorylated peptides from co-immunoprecipitated proteins:
    Reginald, K., Chaoui, K., Roncagalli, R., Beau, M., Goncalves Menoita, M., Monsarrat, B., … Malissen, B. (2015). Revisiting the Timing of Action of the PAG Adaptor Using Quantitative Proteomics Analysis of Primary T Cells. Journal of Immunology (Baltimore, Md. : 1950), 195(11), 5472–5481. https://doi.org/10.4049/jimmunol.1501300

Contact information

Marie Locard-Paulet
Novo Nordisk Foundation Center for Protein Research
Blegdamsvej 3
2200 København N / Denmark
marie.locard-paulet@cpr.ku.dk

Veit Schwämmle
Protein Research Group
Department for Biochemistry and Molecular Biology
University of Southern Denmark
Campusvej 55
5230 Odense M / Denmark
veits@bmb.sdu.dk

Vasileios Tsiamis
Protein Research Group
Department for Biochemistry and Molecular Biology
University of Southern Denmark
Campusvej 55
5230 Odense M / Denmark
vasileios@bmb.sdu.dk
