TransFAIR

A ready-to-use tool (turnkey solution) for data integration at medical institutions. Instead of building your own ETL processes by hand, you can use TransFAIR to handle common data integration tasks such as:

Extraction from source systems
Transformation into target schemata
Loading into target systems
Linkage of IDs / Pseudonymization
Filtering of datasets

TransFAIR allows low-effort, fully automatic data transfer among the software systems and data structures used in networked medical research in Germany, in particular:

Tumor Documentation Systems based on the ADT/GEKID dataset, as found in German Comprehensive Cancer Centers and connected via the German Cancer Consortium (DKTK), e.g. CREDOS, GTDS, Onkostar
The CentraXX biobanking solution often found in German biobanks, which are networked under the umbrella of the German Biobank Node
Data Integration Centers, as established by the Medical Informatics Initiative / Netzwerk Universitätsmedizin, based on the MII Core Dataset in FHIR
Bridgeheads as used in the above networks as well as the European Biobanking and BioMolecular Research Infrastructure (BBMRI-ERIC)

TransFAIR is designed to

minimize effort for personnel at the sites (since they no longer have to do the data integration themselves)
continuously update itself with new dataset/mapping definitions
thus accelerate and facilitate rollout of new features and dataset extensions
provide more consistent data quality (as long as the source data is correct, errors in TransFAIR's mappings can be fixed centrally)

Quickstart (for Bridgehead sites)

If you are part of a German University Hospital with a Bridgehead (e.g. via BBMRI-ERIC, GBN, DKTK, CCP/C4 or nNGM), you already have TransFAIR as part of your Bridgehead, usually preconfigured with sane default values and mappings by the respective network. The most straightforward way to use it is to just activate it.

To do so, specify the required configuration (see Configuration) in a new environment file (e.g. `my.transfair`). Then, execute `bridgehead transfair mytransfair` and observe the output on the screen.

Configuration

TransFAIR is configured using environment variables:

Variable	Description	Default
`TF_FHIR_SERVER_SOURCE_ADDRESS`	HTTP Address of the `SOURCE` datastore	(required)
`TF_FHIR_SERVER_TARGET_ADDRESS`	HTTP Address of the `TARGET` datastore	(required)
`TF_FHIR_SERVER_(SOURCE/TARGET)_USERNAME`	Basic Auth User
`TF_FHIR_SERVER_(SOURCE/TARGET)_PASSWORD`	Basic Auth Password
`TF_PROFILE`	Identifier of the TransFAIR profile to execute (see Profiles)	(required)
`TF_RESOURCES_START`	(`Patient`/`Specimen`) Starts collecting resources at the specified level.	`Patient`
`TF_RESOURCES_FILTER`	Set to export only the specified resources.	none, will export all resources
`TF_RESOURCES_WHITELIST`	Transfers only the resources matched by the filter (see Filters).
`TF_RESOURCES_BLACKLIST`	Ignores the resources matched by the filter (see Filters).
`TF_PSEUDONYMIZATION_ADDR`	HTTP Address pointing to a service to map `SOURCE` IDs to `TARGET` IDs (see Pseudonymization)	none, IDs will be unchanged
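
The required variables and defaults from the table above can be checked before a run. The following is a minimal sketch, not part of TransFAIR: the function `load_config` and the example URLs are hypothetical, and only the variable names, the `Patient`/`Specimen` values and the `Patient` default are taken from the table.

```python
import os

# Variables marked "(required)" in the table above.
REQUIRED = (
    "TF_FHIR_SERVER_SOURCE_ADDRESS",
    "TF_FHIR_SERVER_TARGET_ADDRESS",
    "TF_PROFILE",
)

# Defaults stated in the table above.
DEFAULTS = {"TF_RESOURCES_START": "Patient"}


def load_config(env=None):
    """Hypothetical helper: collect the settings and fail early if one is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    config = {name: env[name] for name in REQUIRED}
    for name, default in DEFAULTS.items():
        config[name] = env.get(name, default)
    if config["TF_RESOURCES_START"] not in ("Patient", "Specimen"):
        raise ValueError("TF_RESOURCES_START must be Patient or Specimen")
    return config


if __name__ == "__main__":
    # Example values only; the addresses are placeholders.
    example = {
        "TF_FHIR_SERVER_SOURCE_ADDRESS": "http://source.example/fhir",
        "TF_FHIR_SERVER_TARGET_ADDRESS": "http://target.example/fhir",
        "TF_PROFILE": "FHIR2FHIR",
    }
    print(load_config(example))
```

Passing a dictionary instead of reading `os.environ` directly keeps the check easy to try out without touching the real environment.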

Profiles

As of now, TransFAIR supports the following transformation profiles:

FHIR2FHIR will transfer all resources from SOURCE to TARGET unchanged. This can be used to perform filtering and/or pseudonymization across FHIR servers.
MII2BBMRI will read the MII Core Dataset from SOURCE (usually a FHIR server/facade providing the MII Core Dataset) and transfer all data required by BBMRI-ERIC into TARGET (= BBMRI-ERIC Bridgehead).
BBMRI2MII will load biosample information from SOURCE (BBMRI-ERIC Bridgehead), transform it into the MII Core Dataset and write it to TARGET (e.g. a FHIR store with the MII Core Dataset).

Filters

TransFAIR supports many filters to customize the ETL process. Filters are written in JSON. The following example filter selects the patient with ID 1. Used as a blacklist, it excludes that patient from the transfer; used as a whitelist, it transfers only that patient.

{"patient": {
  "ids": ["1"]
  }
}
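
To make the whitelist and blacklist behaviour concrete, here is a minimal sketch of how such a filter could be applied to a list of patient IDs. It illustrates the idea only and is not TransFAIR's implementation: the function `select_patients` and its `mode` parameter are hypothetical, and only the JSON layout is taken from the example above.

```python
import json


def select_patients(patient_ids, filter_json, mode):
    """Hypothetical helper: keep or drop the IDs listed under patient.ids."""
    listed = set(json.loads(filter_json).get("patient", {}).get("ids", []))
    if mode == "whitelist":
        # Transfer only the listed IDs.
        return [pid for pid in patient_ids if pid in listed]
    if mode == "blacklist":
        # Transfer everything except the listed IDs.
        return [pid for pid in patient_ids if pid not in listed]
    raise ValueError("mode must be 'whitelist' or 'blacklist'")


if __name__ == "__main__":
    example_filter = '{"patient": {"ids": ["1"]}}'
    print(select_patients(["1", "2", "3"], example_filter, "whitelist"))  # ['1']
    print(select_patients(["1", "2", "3"], example_filter, "blacklist"))  # ['2', '3']
```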

Pseudonymization

TransFAIR supports various ways to map patient/sample IDs between source and target stores, e.g. pseudonymization solutions (Mainzelliste, GPAS) or a plain mapping file in CSV format. Mapping works as follows:

Whenever TransFAIR encounters an ID from the SOURCE system, it asks the service defined in `TF_PSEUDONYMIZATION_ADDR` for the corresponding ID in the TARGET system (or vice versa). We are currently defining a simple, implementation-independent API format in cooperation with pilot biobanks and will update this section once finished.
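
Since the service API is not yet defined, the sketch below only illustrates the lookup idea using the plain CSV mapping file mentioned above. Everything specific in it is an assumption: the column names `source_id` and `target_id`, the helper names, and the choice to raise an error for unmapped IDs rather than pass them through.

```python
import csv
import io


def load_mapping(csv_text):
    """Hypothetical helper: read a two-column CSV into a SOURCE -> TARGET dict.

    Assumes a header row with the (invented) columns source_id and target_id.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["source_id"]: row["target_id"] for row in reader}


def invert(mapping):
    """Build the TARGET -> SOURCE direction from the same mapping."""
    return {target: source for source, target in mapping.items()}


def map_id(mapping, original_id):
    """Return the mapped ID; raise KeyError instead of leaking an unmapped ID."""
    return mapping[original_id]


if __name__ == "__main__":
    # Made-up example IDs.
    mapping = load_mapping("source_id,target_id\nP-001,X9F2\nP-002,K7Q4\n")
    print(map_id(mapping, "P-001"))          # X9F2
    print(map_id(invert(mapping), "K7Q4"))   # P-002
```

Raising on an unknown ID is a deliberately conservative choice for this sketch, so that a gap in the mapping cannot silently write a SOURCE ID into the TARGET store.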

Outlook

We created TransFAIR for the specific use case of bringing German biobanks and data integration centers closer together. In the long term, we intend TransFAIR to become a toolbox of easily reusable components for use with HL7 FHIR, OMOP and other well-known SQL, CSV and XML schemata.

License

Copyright 2021 - 2022 The Samply Community

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
