Cristina Yenyxe Gonzalez Garcia edited this page Mar 31, 2014 · 28 revisions

Welcome to the OpenCGA wiki!

OpenCGA is an open-source project that aims to provide a Big Data storage engine and analysis framework for genomic-scale data analysis of hundreds of terabytes or even petabytes. For users, its main features will include uploading and downloading files to and from a repository, storing their information in a generic way (independent of the original file format) and retrieving this information efficiently. For developers, it will be a platform that supports the most widely used bioinformatics file formats and accelerates the development of visualization and analysis applications.

Some platforms with the same objectives as OpenCGA already exist. However, they focus strictly on representing the information from the files they store. OpenCGA, on the other hand, not only includes this information, but its generic databases also include extra fields of interest and allow data from different studies to be combined seamlessly.

Direct access to the files stored in the system is not fast enough to provide a real-time, interactive user experience. For this reason, we are exploring and using modern technologies from several fields:

  • Different NoSQL databases for storage. Users can choose the database that best fits their current infrastructure and data size
  • Apache Hadoop for big data processing and storage
  • High-performance Computing (HPC) for computation-intensive analysis
  • HTML5 and RESTful web services for information retrieval and data visualization

If you want to see some projects where OpenCGA has been successfully used, please visit the Use Cases section.

Platform Overview

Technical Documentation Overview

In this section you can find useful links and information for researchers and software developers who are planning to deploy OpenCGA and/or integrate its services with their software applications and tools. These are working documents:

  • Data models: For a description of the data models for representing variant and alignment data visit...
  • Architecture: For a description of the technologies and architecture of OpenCGA, along with some other implementation details, visit the [architecture](https://github.com/opencb/opencga/wiki/Architecture) section.
  • Storage implementation:
  • Download and install
  • Releases and Roadmap: Do you want to know what's coming next? Please visit our

Getting Involved

Examples: http://www.chromium.org/getting-involved

http://tomcat.apache.org/getinvolved.html
