The Innovation Center for Biomedical Informatics (ICBI) at Georgetown University was launched in 2012 as an academic hub for innovative research in data science and biomedical informatics, with the goal of enabling individualized approaches to healthcare. ICBI's mission is to enhance clinical and translational research at Georgetown University Medical Center and its partner institutions and, in turn, to attract and educate the next generation of scientists and physicians, for whom biomedical informatics and health data science will be an integral part of both biomedical research and clinical practice.
The research information technology group at ICBI develops innovative scientific software to enable translational research. Our projects include multi-omics data analysis, vaccine safety research, clinical data analysis, high-definition data visualization, natural language processing, and mobile application development.
As of 2013, we have a total of 78 GitHub repositories (private and public).
- G-DOC: Our flagship precision medicine platform, which enables integrative analysis of multiple data types to understand disease mechanisms.
  - Access the platform: https://gdoc.georgetown.edu/gdoc
  - Publications: Bhuvaneshwar et al. (2016), Madhavan et al. (2011)
  - GitHub: https://github.com/ICBI/gdoc
  - Tutorials and webinar recordings: https://gdoc.georgetown.edu/tutorials
  - Team: @subhamadhavan et al.
- CINdex: A Bioconductor package for analysis of chromosome instability in DNA copy number data.
  - Package: http://bioconductor.org/packages/CINdex/
  - Publication: Song et al. (2017)
  - Team: @leisong483, @KrithikaB472, @subhamadhavan, @yugusev
- viGEN: An open-source pipeline for the detection and quantification of viral RNA in human tumors.
  - Code on GitHub: https://github.com/ICBI/viGEN
  - Publication: Bhuvaneshwar et al. (2018)
  - R package: in preparation
  - Team: @KrithikaB472, @leisong483, @subhamadhavan, @yugusev
- Multi-Med: A Bioconductor package for testing multiple biological mediators simultaneously.
  - Package: http://bioconductor.org/packages/MultiMed/
  - Team: @SiminaB et al.
- Fdr-regression: A GitHub repository containing code for "A direct approach to estimating false discovery rates conditional on covariates".
  - Code on GitHub: https://github.com/SiminaB/Fdr-regression
  - Paper on bioRxiv: https://doi.org/10.1101/035675
  - Team: @SiminaB and collaborator @JTleek
- Mentoring: A GitHub repository offering tips and tools for students.
  - By @SiminaB
- DMD-metabolomics: A GitHub repository containing code for the analysis of metabolomics data from the DMD natural history study.
  - Code on GitHub: https://github.com/SiminaB/DMD-metabolomics
  - Publication: Boca et al. (2016)
  - Team: @SiminaB et al.
- MVMA: A GitHub repository containing code, figures, and tables for the paper "Multivariate meta-analysis with an increasing number of parameters".
  - Publication: Boca et al. (2017)
  - Team: @SiminaB et al.
- MACE2K: Molecular And Clinical Extraction, a natural language processing tool for personalized medicine. As part of the NIH BD2K ("Big Data to Knowledge") program, we received a U01 grant to develop MACE2K (Molecular and Clinical Extraction to Knowledge for Precision Medicine). MACE2K is a software tool that automatically extracts information and visualizes it in a value-added manner to help clinicians and clinical researchers assess the overall evidence for biomarkers that predict response to cancer therapies.
  - Website: http://mace2k.org/
  - Publication: in preparation
  - Team: @pmcgarvey, @shrutir, @subhamadhavan
- The REMBRANDT dataset: The REMBRANDT study is a large collection of genomic data from brain cancer patients. It is accessible for clinical translational research through the open-access Georgetown Database of Cancer (G-DOC) platform. In addition, the raw and processed genomics and transcriptomics data are available in the public NCBI GEO repository as SuperSeries GSE108476. Together, these datasets give researchers a unique opportunity to conduct integrative analysis of gene expression and copy number changes alongside clinical outcomes (overall survival) in this large brain cancer study.
  - Access the dataset in G-DOC: https://gdoc.georgetown.edu/gdoc
  - Raw data: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE108476
  - MRI images: The Cancer Imaging Archive (TCIA)
  - Publication: Gusev et al. (2018)
  - Team: @yugusev, @KrithikaB472, @subhamadhavan
- snp2sim: A GitHub repository containing a workflow for molecular simulation of somatic variation.
  - GitHub: https://github.com/mccoymd/snp2sim
  - Team: @mccoymd et al.
  - Publication: in preparation
- CPTAC Data Portal: A centralized repository for the public dissemination of proteomic sequence datasets collected by the Clinical Proteomic Tumor Analysis Consortium (CPTAC), along with the corresponding genomic sequence datasets.
  - Portal: https://proteomics.cancer.gov/data-portal
  - Publication: Edwards et al. (2015)
  - Team: @pmcgarvey et al.
- UniProt: A freely accessible database of protein sequence and functional information, many of whose entries are derived from genome sequencing projects. It contains a large amount of information about the biological function of proteins, derived from the research literature.
  - Portal: https://www.uniprot.org
  - Viral reference proteomes
  - UniRef database: https://www.uniprot.org/uniref/
  - ID mapping service: https://www.uniprot.org/uploadlists/
  - Team: @pmcgarvey et al.
  - Publications: "UniProt: the universal protein knowledgebase" (2017), Suzek et al. (2015)
- CDGnet: A tool for prioritizing targeted therapies based on an individual's tumor profile. It incorporates information from biological networks relevant to the cancer type and to the specific alterations, FDA-approved targeted cancer therapies and their indications, additional gene-drug information, and data on whether given genes are oncogenes.
- POPSTR: Inference of admixed population structure based on single nucleotide polymorphisms and copy number variations.
  - Software: https://sites.google.com/a/georgetown.edu/jaeil/popstr
  - Publication: Ahn et al. (2018)
  - Team: Jaeil Ahn, Brian Conkright, @SiminaB, @subhamadhavan
ICBI is an academic hub for data science whose primary mission is investigator-initiated research, education, and training in data science and informatics. We use a variety of clinical and research data types, methods, tools, and resources, and we work with multiple principal investigators (PIs). Some of our collaborators are highlighted below:
- The Griffith Lab developed CIViC for crowdsourced curation of gene and variant evidence and is working with us on the ClinGen project on somatic mutations.
- Héctor Corrada Bravo is an Associate Professor at the Center for Bioinformatics and Computational Biology at the University of Maryland. Our team, led by @SiminaB, collaborates with his lab on the CDGnet project.
- Cathy Wu is a Professor and the Director of the Center for Bioinformatics & Computational Biology (CBCB) at the University of Delaware. Our team, led by @pmcgarvey, collaborates with her team on the MACE2K project, including the eGARD natural language processing (NLP) tool.
Visit our website: https://icbi.georgetown.edu
Our GitHub page: https://github.com/ICBI
Connect with us on Twitter: @ICBI_Georgetown