04 Statistical Inference
[Add Cover Figure (landscape format)]
Once the data has been processed, statistically significant differences between conditions can be determined. The summary visualizations that follow help you gain deeper insight into the set(s) of proteins, termini, or PTM sites within and across experimental conditions.
SQuAPP employs the linear model from the `limma` package [1] for statistical testing, one of the most flexible and popular methods in proteomics. (More methods are under development to give users additional options.) As of version 0.28, the reported p-values are adjusted with the Benjamini-Hochberg (FDR) method by default. Other multiple-testing correction methods are available through the `p.adjust()` function from the `stats` package and can be selected from the dropdown menu.
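To make the default adjustment concrete, here is a minimal Python sketch of the Benjamini-Hochberg procedure, which is what `p.adjust(method = "BH")` computes in R. It is only an illustration of the arithmetic, not SQuAPP's own code; the function name `bh_adjust` and the example p-values are made up for this sketch.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Walk through the p-values from largest to smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank of this p-value in ascending order
        # Scale by m / rank, then enforce monotonicity with a running minimum.
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four features.
raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, bh_adjust(raw)):
    print(f"raw p = {p:.3f}  adjusted p = {q:.3f}")
```

Each adjusted value is the raw p-value scaled by the number of tests divided by its rank, capped so that adjusted values never decrease as raw p-values increase.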
SQuAPP implements the weighting option originally introduced in `limma` [1] to control array quality weights in microarray data analysis [2]. When the weighting option is selected for differential abundance testing, SQuAPP can place different weights on original and imputed values. This is particularly relevant when a PTM or terminus is genuinely absent in one condition but not consistently detected in the other. By giving a low weight to imputed values, the measured values become the determining factor when calculating the mean and variance in conditions where many but not all samples have a measurement. At the same time, in conditions where a feature has not been measured at all, imputed values dominate regardless of their low weight.
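The effect described above can be seen with a plain weighted mean. The sketch below is a simplified illustration, assuming measured values get weight 1 and imputed values a lower weight (0.1 here, chosen arbitrarily). The helper names and intensity values are hypothetical, and a full `limma` fit uses weighted least squares rather than this standalone calculation.

```python
def weighted_mean(values, weights):
    """Weighted mean: each value contributes in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def imputation_weights(is_imputed, imputed_weight):
    """Weight 1.0 for measured values, `imputed_weight` for imputed ones."""
    return [imputed_weight if flag else 1.0 for flag in is_imputed]


IMPUTED_WEIGHT = 0.1  # assumed value for illustration only

# Condition where three of four samples were measured (last value imputed).
partly_measured = [20.0, 20.4, 19.8, 12.0]
partly_flags = [False, False, False, True]

# Condition where the feature was never measured (all values imputed).
never_measured = [12.1, 11.9, 12.0, 12.0]
never_flags = [True, True, True, True]

mean_partly = weighted_mean(
    partly_measured, imputation_weights(partly_flags, IMPUTED_WEIGHT)
)
mean_never = weighted_mean(
    never_measured, imputation_weights(never_flags, IMPUTED_WEIGHT)
)
print("partly measured:", round(mean_partly, 2))
print("never measured:", round(mean_never, 2))
```

In the partly measured condition the weighted mean stays close to the three measured values (roughly 19.8, versus about 18.1 unweighted), while in the fully imputed condition the equal weights cancel out and the mean is simply that of the imputed values.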
After selecting the data level for statistical testing, select a grouping factor; the linear model is fitted using its unique groups. SQuAPP provides sliders to set the adjusted p-value and fold-change thresholds. Enable the weighting or blocking option to access the preferences for that option. If the weighting option is activated, SQuAPP asks you to select a weight, and the contribution of the imputed values is multiplied by that weight.
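As a rough illustration of how two such thresholds are typically combined, the following Python sketch labels a feature from its fold-change and adjusted p-value. The function name, the labels, the default cutoffs, the use of log2 fold-change, and the inclusive comparisons are all assumptions for this example; SQuAPP's actual rules may differ.

```python
def classify(log2_fc, adj_p, p_cutoff=0.05, fc_cutoff=1.0):
    """Label a feature using an adjusted p-value and a fold-change cutoff.

    Both criteria must hold for a feature to count as significant.
    """
    if adj_p <= p_cutoff and abs(log2_fc) >= fc_cutoff:
        return "up" if log2_fc > 0 else "down"
    return "not significant"


# Hypothetical results: (feature, log2 fold-change, adjusted p-value).
results = [
    ("feature_A", 1.5, 0.01),
    ("feature_B", -2.0, 0.001),
    ("feature_C", 0.5, 0.001),  # significant p-value, but small change
    ("feature_D", 3.0, 0.2),    # large change, but not significant
]
for name, fc, p in results:
    print(name, classify(fc, p))
```

Tightening either threshold can only shrink the set of features labeled "up" or "down".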
Once the test is configured, click the “Run Statistical Analysis” button to start the analysis and produce the plots and result datasets. The visualizations appear in the top box and the data tables in the bottom box. All plots and data tables can be downloaded individually for reference.
SQuAPP automatically saves the statistical result table and attaches it to the data level used for the analysis. If you need to rerun the test with a different configuration, whether because of changes in this section or in previous sections, clicking the button again updates the statistical result data.
Another essential step in a data analysis workflow is to create a list of interesting biological matches from the proteomic data. For this, SQuAPP uses the `gprofiler2` package [3], an R interface to the web-based g:Profiler tool [4].
(Additional methods using other enrichment analysis software are under development to provide users with more alternatives.)
SQuAPP requires that statistical testing has already been run on the selected data level before enrichment analysis can be run. You can configure the following when running the analysis:
- p-value threshold selection
- multiple testing correction method selection
- enrichment sources selection
- whether significance groups are run separately or together
- use of the identified proteins as a custom background
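To show why the choice of background matters, here is a small Python sketch of a one-sided hypergeometric over-representation test, the kind of test commonly used by enrichment tools. It is a simplified stand-in that does not call g:Profiler and applies no multiple-testing correction. The function name and the protein identifiers are invented for this example.

```python
from math import comb


def enrichment_pvalue(query, term_members, background):
    """P(overlap >= observed) under the hypergeometric distribution.

    Query and term are first restricted to the background set.
    """
    bg = set(background)
    q = set(query) & bg
    t = set(term_members) & bg
    N, K, n = len(bg), len(t), len(q)
    k = len(q & t)
    # Sum the probabilities of drawing k or more term members in n draws.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Hypothetical identifiers: a term with five members, a query of four proteins.
term = [f"P{i}" for i in range(1, 6)]
query = ["P1", "P2", "P3", "P10"]

custom_bg = [f"P{i}" for i in range(1, 21)]   # e.g. only identified proteins
larger_bg = [f"P{i}" for i in range(1, 101)]  # e.g. a much wider annotation set

print("custom background:", enrichment_pvalue(query, term, custom_bg))
print("larger background:", enrichment_pvalue(query, term, larger_bg))
```

The same overlap looks far less surprising against the smaller custom background, which is why restricting the background to the proteins that could actually be detected generally gives more conservative results.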
This section is still under development
1. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43:e47.
2. Ritchie ME, Diyagama D, Neilson J, van Laar R, Dobrovic A, Holloway A, et al. Empirical array quality weights in the analysis of microarray data. BMC Bioinformatics. 2006;7:261.
3. Kolberg L, Raudvere U, Kuzmin I, Vilo J, Peterson H. gprofiler2 -- an R package for gene list functional enrichment analysis and namespace conversion toolset g:Profiler. F1000Res. 2020;9.
4. Raudvere U, Kolberg L, Kuzmin I, Arak T, Adler P, Peterson H, et al. g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Res. 2019;47:W191–8.