GEO analysis with Shiny

Garrett Dancik edited this page Mar 22, 2015 · 9 revisions

Summary: Developing a web tool using Shiny to analyze gene expression data from the Gene Expression Omnibus (GEO).

Description: The Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/) is a public repository of gene expression data. Although GEO has its own tool, called GEO2R, for simple data analysis, statistical and bioinformatics expertise is required for more comprehensive analyses. For example, it is not straightforward to determine whether a single gene is differentially expressed across two groups. This project will involve the development of a web tool, using the R web framework Shiny, to provide users with an interface for analyzing GEO datasets.

Related work: From within R, users can download GEO datasets and extract relevant information using the 'GEOquery' package (http://bioconductor.org/packages/release/bioc/html/GEOquery.html). Outside of R, additional tools are available. In particular, Oncomine (https://www.oncomine.org/) contains many gene expression datasets and options for gene expression analysis. However, it is limited to cancer datasets and to analysis of standard groups (e.g., tumor vs. normal), it requires a paid subscription for more complex analyses (such as survival analyses), and it does not allow filtering of samples. Galaxy (http://galaxyproject.org/) is a web-based platform for biomedical research, but it does not contain a pipeline to GEO, so analyzing GEO datasets with it would not be straightforward.

Question from Toby Dylan Hocking, 10 March 2015: Interesting project idea, but isn't there much more related work? What other web applications are available? Can you discuss why Galaxy in particular isn't good enough?

Reply from Garrett Dancik, 21 March 2015: Toby, thank you for your question. I have added additional Related Work in response.

Potential tasks (to be implemented within a web interface using Shiny):

  • Download a GEO series selected by the user, extract gene expression and phenotypic information
  • Pull out the gene expression values for a desired gene
  • Assign individuals to two (or more) groups to determine whether the gene is differentially expressed
  • Visualize expression using boxplots
  • Implement additional statistical analyses
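The analysis steps in this list (pull out one gene, assign samples to groups, test for differential expression, draw a boxplot) could be sketched in base R roughly as follows. This is only an illustration, not the project's implementation: `compare_gene` is a hypothetical helper, the Welch t-test is one assumed choice that covers the two-group case only, and the small matrix contains made-up toy values rather than real GEO data.

```r
# Hypothetical helper: test whether one gene differs between two sample groups.
# `expr` is a genes x samples numeric matrix with gene (or probe) IDs as row names;
# `groups` holds one group label per sample (column) and must have two levels.
compare_gene <- function(expr, gene, groups) {
  stopifnot(gene %in% rownames(expr), length(groups) == ncol(expr))
  groups <- droplevels(factor(groups))
  stopifnot(nlevels(groups) == 2)
  values <- as.numeric(expr[gene, ])
  # Welch two-sample t-test (R's default for t.test); an assumed choice of test.
  list(values = values, groups = groups, test = t.test(values ~ groups))
}

# Toy values made up purely for illustration; NOT real GEO data.
toy_expr <- matrix(
  c(5.1, 5.3, 4.9, 7.8, 8.1, 8.0,
    6.0, 6.2, 5.9, 6.1, 6.0, 6.3),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), paste0("sample", 1:6))
)
toy_groups <- factor(rep(c("control", "treated"), each = 3))

res <- compare_gene(toy_expr, "geneA", toy_groups)
print(res$test$p.value)

# Boxplot of expression by group, as in the visualization task.
boxplot(res$values ~ res$groups,
        xlab = "Group", ylab = "Expression", main = "geneA")
```

In a Shiny app, the gene name and the group assignment would presumably come from user inputs, with the boxplot drawn inside a render function. More than two groups would need a different test (for example, a one-way ANOVA).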

Skills required: Literate programming experience is required, so decent R experience is needed. Knowledge of Shiny and/or gene expression data is not required, but is recommended.

Test: Using the GEOquery library and the getGEO function, download the series 'GSE13' from within R and extract the expression data and the phenotypic data using the appropriate GEOquery functions.
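One possible shape for this test is sketched below. It assumes that GEOquery is installed from Bioconductor, that a network connection is available, and that the series can be retrieved as a series matrix, so that `getGEO` returns a list of ExpressionSet objects. The `exprs` and `pData` accessors come from Biobase, which GEOquery depends on. The variable names are illustrative.

```r
library(GEOquery)  # Bioconductor package for downloading GEO records
library(Biobase)   # provides the exprs() and pData() accessors

# Download the series. With GSEMatrix = TRUE, getGEO() returns a list of
# ExpressionSet objects, one per platform in the series.
gse <- getGEO("GSE13", GSEMatrix = TRUE)
eset <- gse[[1]]  # assumption: work with the first (or only) platform

# Expression data: a matrix with one row per feature and one column per sample.
expr <- exprs(eset)

# Phenotypic data: a data frame with one row per sample.
pheno <- pData(eset)

# Quick look at what was retrieved.
print(dim(expr))
print(colnames(pheno))
```

The columns of `expr` and the rows of `pheno` refer to the same samples, which is what allows samples to be assigned to groups from the phenotypic data.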

Mentors: Garrett Dancik (dancikg {at} easternct {dot} edu) and Yuanbin Ru (ruyuanbin {at} gmail {dot} com)

Test Solution: A test solution is provided by Jasmine Dumas at https://github.com/jasdumas/GEO-AWS_Test