Integration, cataloguing and management of biobanking and clinical data using FAIR Genomes Metadata Schema.

Welcome to the repository that presents our work at the Bank of Biological Material of Masaryk Memorial Cancer Institute. Our team has developed an advanced, semi-automated data pipeline designed to streamline and integrate data from various sources associated with the biobank and upload metadata do the data.bbmri.cz catalogue.

Project Description

The goal of this project is to create a robust data pipeline capable of handling multiple data inputs from a range of sources, including:

Hospital Information System Exports: structured clinical and personal patient data extracted directly from hospital systems.
Biobank Data: information on biological samples stored in the biobank.
Associated Data Types: includes sequencing data, radiological images, and histopathological reports.

Key Features of the Pipeline

Data Integration: Combines diverse data types into a unified framework.
Data Cleaning and Structuring: Ensures data is thoroughly cleaned and organized into a sustainable format that facilitates analysis and research.
Secure Storage: Stores processed data securely, complying with relevant data protection standards using SensitiveCloud service provided by CERIT-SC.
Metadata Extraction: Extracts key metadata during processing and populate FAIR Genomes Metadata schema.
Metadata Publication: Publishes the extracted metadata to data.bbmri.cz, ensuring accessibility and transparency.

This pipeline enhances data handling efficiency within the biobank environment and supports further research by providing well-structured, reliable data sets.

Code - FAIRification pipeline

Code of this project is separeted into three main parts (repositories):

Pseudonymiser

This repository takes care of pseudonymisation and collecting clinical information from the BBM export.

Organiser

This repository cleans up the genomic data and organises them for later use. Also it collects structured genomic data and metadata.

Uploader

This repository uploads collected metadata to the metadata catalogue data.bbmri.cz.

Data Retrieval Application

The additional (forth) Data Retrieval repository provides tools for retrieving data from the catalog or secure storage, enabling the re-identification of original identifiers using pseudonyms and facilitating the retrieval of pseudonymized data in de-pseudonymised form from protected storage.

This repository includes an application developed for the retrieval of data, available (requiring MMCI VPN connection) at sequencing.int.mou.cz. The application offers two main functionalities:

Pathology Data Retrieval
Accessible at sequencing.int.mou.cz/pathology-data-retrieval, this tool is designed for the pathology department. Pathologists can input a predictive identifier that the application automatically converts into a pseudonym and vice versa. This feature enables the department to seamlessly retrieve pseudonymized data in its original, non-pseudonymized form.
Sample Verification for Biobank Staff
The second feature available at sequencing.int.mou.cz/bbm-sequencing-upload supports biobank staff by allowing them to upload an Excel export containing sample information. The application checks each sample and appends information directly into the Excel file, indicating whether sequencing data is available for each sample.

Links for Further Documentation

Internal Documentation

https://gitlab.int.mou.cz/bbmri/sequencingdata - requiring MMCI VPN connection - contains documentation for servers BBMRI-ANON1 and BBMRI-KATALOG and its workflow
https://gitlab.ics.muni.cz/groups/bbmri.cz/-/wikis/MMCI/Anon-server - requiring autentication and granted access - contains documentation for BBMRI-ANON1 and SensitiveCloud connection

Documents

This repository stores supplementary files that are intended to be published together with the paper "Integration, Cataloguing and Management of Biobanking and Clinical Data Using FAIR Genomes Metadata Schema". Those files are located in "documents" folder.

Supplementary File 1 & 2 (folder mindmaps_of_sequencer_outputs): contains schemas created when mapping files inside sequencer's output to distinguish important files from those which can be removed.
Supplementary File 3, 4, 5 (folder metadata_catalogue_SW_review): contains files completed during the process of reviewing multiple cataloguing softwares when deciding which software to use in the project.
Supplementary File 6 (supplementary_file_6_schema_mapping.xlsx): Excel file containing one sheet per each module within FAIR Genomes Metadata schema with all parameters listed in this module. Every single parameter is then mapped to MMCI values with the exact location od this value within MMCI data sources.
Supplementary File 7 (supplementary_file_7_NGS_pipeline_flowchart.png): working version of flow chart scatched before the actual implementation of designed sequencing pipeline at MMCI.
Supplementary File 8 (supplementary_file_8_FAIR_evaluation.xlsx): Excel file containing the results of final evaluation of reached FAIRness using the FAIR Data Maturity Model.

Name		Name	Last commit message	Last commit date
Latest commit History 155 Commits
documents		documents
.gitignore		.gitignore
LICENSE		LICENSE
NGS-files-data-cleaning.md		NGS-files-data-cleaning.md
NGS-files-storage-structure.md		NGS-files-storage-structure.md
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Integration, cataloguing and management of biobanking and clinical data using FAIR Genomes Metadata Schema.

Project Description

Key Features of the Pipeline

Code - FAIRification pipeline

Data Retrieval Application

Links for Further Documentation

Documents

About

Releases 1

Packages

License

BBMRI-cz/NGS-data-FAIRification

Folders and files

Latest commit

History

Repository files navigation

Integration, cataloguing and management of biobanking and clinical data using FAIR Genomes Metadata Schema.

Project Description

Key Features of the Pipeline

Code - FAIRification pipeline

Data Retrieval Application

Links for Further Documentation

Documents

About

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Packages