Skip to content

Latest commit

 

History

History
52 lines (37 loc) · 4.29 KB

File metadata and controls

52 lines (37 loc) · 4.29 KB

Project 18: Expanding FAIR database integration through elucidation and transformation of underlying graph schemas.

Table of Contents

Abstract

The integration of life science data from different biomedical resources has been a major challenge attributed to fragmented data sources, the use of multiple data formats, and the existence of multiple ontologies for a single context among others. To address this problem, we launched the BioDataFuse (BDF) project, which employs a modular framework for integrating data from different sources into context-specific knowledge graphs. Through this project, we have currently been able to integrate and harmonise data from ten databases. However, the integration of such resources requires a detailed understanding of underlying graph schemas.

In this biohackathon, we would like to streamline the data integration process such that any FAIR-compliant biological database can be easily converted to a graph. This robust process would involve two steps: first, understanding of the underlying graph schemas of data resources using the RDF-config (https://github.com/dbcls/rdf-config/) and VoID generator (https://github.com/JervenBolleman/void-generator) and second, the conversion of graph data into multiple compatible formats for improving accessibility and usability using G2G Mapper (https://g2gml.readthedocs.io/), LinkML (https://linkml.io/) and BDF (https://github.com/BioDataFuse/pyBiodatafuse). Moreover, we would test the resilience of the process by demonstrating the ease-of-integration of multiple data sources within the RDF Portal (https://rdfportal.org) and beyond. Through this test, we would essentially attract database owners to include additional biomedical data sources in BDF, thus expanding the applicability of their resource beyond the “yet-another-resource” paradigm.

Project flash presentation

overview slide

Resources

  1. Project GitHub Repo.
  2. BioDataFuse Web Interface.
  3. BioDataFuse Python package.
  4. BioDataFuse Web Interface codes.
  5. Biohackarvix.
  6. Slack - This will be the main source of communication between in-person and virtual participants throughout the hackathon.

Working ethics

  • ⚖️ The use of GitHub issues and pull requests will be done to ensure the efficient working of multiple people on the GitHub repository.
  • 🚫 No commits to be made directly to the main branch of the GitHub repository.
  • ⚙️ Adding new Python functions should inherently involve writing subsequent unit test functions and documentation for the same.
  • 🤝 The main aim of the hackathon is collaboration, so please feel free to ask questions or provide feedback whenever in doubt. We believe that there are no dumb questions that exist.
  • 📆 To ensure good communication among the team members, we would have two daily stand-ups (pre and post-hacking) allowing all participants to provide a less than 1-minute update on work done and work in the pipeline.

Leads

Name Affiliation GitHub LinkedIn
Tooba Abbassi-Daloii Maastricht University, NL @tabbassidaloii Link
Yojana Gadiya Fraunhofer ITMP ScreeningPort, DE @YojanaGadiya Link

Members