Modelling the relationship between missingness and intensity in label-free shotgun proteomics data

Mengbo-Li/protDP

Missing values are informative in label-free shotgun proteomics data

This package evaluates the relationship between missingness and intensity in label-free shotgun proteomics data. Using a series of publicly available datasets, we first investigate the missingness-intensity relationship empirically from the observed data. We then propose a detection probability model, which describes the probability that an observation is detected given its underlying intensity.

Reference

Li, M. and Smyth, G.K., 2023. Neither random nor censored: estimating intensity-dependent probabilities for missing values in label-free proteomics. Bioinformatics, 39(5), btad200.

Installation

To install the package, use the following script in R:

# install.packages("devtools")
devtools::install_github("Mengbo-Li/protDP")

Quick start

To investigate the relationship between intensity and missing values in your own proteomics dataset, try the following as a quick start:

library(protDP)
dpcfit <- dpc(dat)  # fit the detection probability curve
plotDPC(dpcfit)

Here, dat is a matrix of log2-intensities, with precursors or proteins in rows and samples in columns, and with missing values coded as NA.
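If you do not have a dataset at hand, the expected input layout can be sketched with simulated data. The example below uses base R only. The dimensions, the row and column names, and the logit-linear detection rule with its coefficients are all illustrative assumptions, not values taken from the package or the paper.

```r
set.seed(1)
n_proteins <- 500
n_samples <- 6

# Simulated "true" log2-intensities: one baseline per protein plus sample noise.
# The mean vector is recycled down the columns, so row r uses baseline[r].
baseline <- rnorm(n_proteins, mean = 20, sd = 2)
y <- matrix(
  rnorm(n_proteins * n_samples, mean = baseline, sd = 0.5),
  nrow = n_proteins, ncol = n_samples,
  dimnames = list(paste0("protein", seq_len(n_proteins)),
                  paste0("sample", seq_len(n_samples)))
)

# Hypothetical detection rule: higher intensities are more likely to be detected.
# The coefficients -18 and 0.95 are arbitrary and chosen only for illustration.
p_detect <- plogis(-18 + 0.95 * y)
detected <- rbinom(length(y), size = 1, prob = p_detect) == 1

# Undetected values become NA, giving the layout described above.
dat <- y
dat[!detected] <- NA

dim(dat)            # proteins in rows, samples in columns
mean(is.na(dat))    # overall proportion of missing values
```

A matrix built this way has the same shape as the dat object in the quick start, so it can be used to try out the workflow before moving to real data.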

The plotDPC() function then visualises the detection probability curve (DPC), from which you can inspect whether missingness in your dataset depends on the underlying intensity values. If it does, the values are missing not at random (MNAR).
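As a rough cross-check on the same question, you can compare each protein's average observed intensity with the number of samples in which it was detected. The sketch below uses base R only and is not the estimator implemented in protDP. The helper empirical_detection() is a hypothetical name, the data are simulated under an assumed detection rule, and the observed row mean is only a proxy for the underlying intensity.

```r
# Hypothetical helper: per-row mean observed intensity and detection counts.
empirical_detection <- function(dat) {
  n_detected <- rowSums(!is.na(dat))
  keep <- n_detected > 0  # rows with no observed values have no mean intensity
  data.frame(
    mean_intensity = rowMeans(dat[keep, , drop = FALSE], na.rm = TRUE),
    n_detected = n_detected[keep],
    n_missing = ncol(dat) - n_detected[keep]
  )
}

# Simulated example data (assumed detection rule, arbitrary coefficients).
set.seed(2)
y <- matrix(rnorm(400 * 5, mean = rnorm(400, mean = 20, sd = 2), sd = 0.5),
            nrow = 400)
dat <- y
dat[runif(length(y)) > plogis(-18 + 0.95 * y)] <- NA

emp <- empirical_detection(dat)

# Binomial regression of detection counts on mean observed intensity.
# A clearly positive slope suggests that detection depends on intensity.
fit <- glm(cbind(n_detected, n_missing) ~ mean_intensity,
           family = binomial, data = emp)
coef(fit)
```

Because rows with no detections are dropped and the observed means are themselves affected by missingness, this check is biased and should be read only as a qualitative indication that missingness depends on intensity.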

More examples

See data examples at https://mengbo-li.github.io/protDP/articles/protDP.html.

Contact

If you have any questions, please open an issue.
