A ready-to-use tool (turnkey solution) for data integration at medical institutions. Instead of building your own ETL processes by hand, you can use this tool for data integration tasks such as:
- Extraction from source systems
- Transformation into target schemata
- Loading into target systems
- Linkage of IDs / Pseudonymization
- Filtering of datasets
TransFAIR allows low-effort, fully automatic data transfer among software systems and data structures used in networked medical research in Germany, in particular:
- Tumor Documentation Systems based on the ADT/GEKID dataset, as found in German Comprehensive Cancer Centers and connected via the German Cancer Consortium (DKTK), e.g. CREDOS, GTDS, Onkostar
- The CentraXX biobanking solution often found in German biobanks, which are networked under the umbrella of the German Biobank Node
- Data Integration Centers, as established by the Medical Informatics Initiative / Netzwerk Universitätsmedizin, based on the MII Core Dataset in FHIR
- Bridgeheads as used in the above networks as well as the European Biobanking and BioMolecular Research Infrastructure (BBMRI-ERIC)
TransFAIR is designed to
- minimize effort for personnel at the sites (since they no longer have to do the data integration themselves)
- continuously update itself with new dataset/mapping definitions
- thus accelerate and facilitate rollout of new features and dataset extensions
- provide more consistent data quality (because as long as the source data is okay, errors within TransFAIR's mappings can be fixed centrally)
If you are part of a German University Hospital with a Bridgehead (e.g. via BBMRI-ERIC, GBN, DKTK, CCP/C4 or nNGM), you already have TransFAIR as part of your Bridgehead, usually preconfigured with sane default values and mappings by the respective network. The most straightforward way to use it is to just activate it.
To do so, specify the required configuration (see Configuration) in a new environment file (e.g. `my.transfair`). Then, execute `bridgehead transfair mytransfair` and observe the output on the screen.
TransFAIR is configured using environment variables:
Variable | Description | Default |
---|---|---|
`TF_FHIR_SERVER_SOURCE_ADDRESS` | HTTP address of the SOURCE datastore | (required) |
`TF_FHIR_SERVER_TARGET_ADDRESS` | HTTP address of the TARGET datastore | (required) |
`TF_FHIR_SERVER_(SOURCE/TARGET)_USERNAME` | Basic Auth user | |
`TF_FHIR_SERVER_(SOURCE/TARGET)_PASSWORD` | Basic Auth password | |
`TF_PROFILE` | Identifier of the TransFAIR profile to execute (see Profiles) | (required) |
`TF_RESOURCES_START` | Starts collecting resources at the specified level (`Patient` or `Specimen`). | `Patient` |
`TF_RESOURCES_FILTER` | Set to export only the specified resources. | none, will export all resources |
`TF_RESOURCES_WHITELIST` | Transfers only the resources that match the given filters. | |
`TF_RESOURCES_BLACKLIST` | Ignores the resources that match the given filters. | |
`TF_PSEUDONYMIZATION_ADDR` | HTTP address pointing to a service to map SOURCE IDs to TARGET IDs (see Pseudonymization) | none, IDs will be unchanged |
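To make the required/optional split in the table concrete, here is a minimal Python sketch of how a wrapper script might validate these variables before starting a transfer. It is not TransFAIR's own code: the `TransfairConfig` class and the `load_config` function are hypothetical names, and rejecting any `TF_RESOURCES_START` value other than `Patient` or `Specimen` is an assumption based on the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TransfairConfig:
    """Hypothetical container for a subset of the TF_* variables."""
    source_address: str
    target_address: str
    profile: str
    resources_start: str = "Patient"
    pseudonymization_addr: Optional[str] = None


def load_config(env: Mapping[str, str] = os.environ) -> TransfairConfig:
    """Read TF_* variables, failing early if a required one is missing."""
    required = (
        "TF_FHIR_SERVER_SOURCE_ADDRESS",
        "TF_FHIR_SERVER_TARGET_ADDRESS",
        "TF_PROFILE",
    )
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))

    # Assumption: only the two levels named in the table are valid.
    start = env.get("TF_RESOURCES_START", "Patient")
    if start not in ("Patient", "Specimen"):
        raise ValueError(f"TF_RESOURCES_START must be Patient or Specimen, got {start!r}")

    return TransfairConfig(
        source_address=env["TF_FHIR_SERVER_SOURCE_ADDRESS"],
        target_address=env["TF_FHIR_SERVER_TARGET_ADDRESS"],
        profile=env["TF_PROFILE"],
        resources_start=start,
        # An unset or empty address means IDs are left unchanged.
        pseudonymization_addr=env.get("TF_PSEUDONYMIZATION_ADDR") or None,
    )
```

Passing a plain dictionary instead of `os.environ` makes it easy to check a configuration without touching the real environment.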
As of now, TransFAIR supports the following transformation profiles:
- `FHIR2FHIR` will transfer all resources from `SOURCE` to `TARGET` unchanged. This can be used to perform filtering and/or pseudonymization across FHIR servers.
- `MII2BBMRI` will read the MII Core Dataset from `SOURCE` (usually a FHIR server/facade providing the MII Core Dataset) and transfer all data required by BBMRI-ERIC into `TARGET` (= BBMRI-ERIC Bridgehead).
- `BBMRI2MII` will load biosample information from `SOURCE` (BBMRI-ERIC Bridgehead), transform it into the MII Core Dataset, and write it to `TARGET` (e.g. FHIR Store with MII Core Dataset).
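TransFAIR supports many filters to customize the ETL process. Filters are written in JSON. For example, the following filter selects patients by ID; depending on whether it is applied as a whitelist or as a blacklist, TransFAIR either transfers only these IDs or excludes them.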
{
  "patient": {
    "ids": ["1"]
  }
}
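The effect of such a filter can be sketched in a few lines of Python. This is an illustration under assumptions, not TransFAIR's implementation: the helper names `patient_ids` and `apply_filters` are hypothetical, the whitelist and blacklist are assumed to share the JSON shape shown above, and the filter IDs are assumed to be compared against the `id` element of each FHIR Patient resource.

```python
import json
from typing import Any, Dict, List, Optional, Set

Filter = Dict[str, Dict[str, List[str]]]
Resource = Dict[str, Any]


def patient_ids(filter_doc: Optional[Filter]) -> Optional[Set[str]]:
    """Return the patient IDs named in a filter, or None if no filter is set."""
    if filter_doc is None:
        return None
    return set(filter_doc.get("patient", {}).get("ids", []))


def apply_filters(
    patients: List[Resource],
    whitelist: Optional[Filter] = None,
    blacklist: Optional[Filter] = None,
) -> List[Resource]:
    """Keep whitelisted patients (if a whitelist is set) and drop blacklisted ones."""
    allowed = patient_ids(whitelist)
    banned = patient_ids(blacklist) or set()
    return [
        p for p in patients
        if (allowed is None or p["id"] in allowed) and p["id"] not in banned
    ]


example_filter = json.loads('{"patient": {"ids": ["1"]}}')
patients = [
    {"resourceType": "Patient", "id": "1"},
    {"resourceType": "Patient", "id": "2"},
]

# As a whitelist, only patient "1" is kept; as a blacklist, only patient "2".
kept_by_whitelist = apply_filters(patients, whitelist=example_filter)
kept_by_blacklist = apply_filters(patients, blacklist=example_filter)
```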
TransFAIR supports various ways to map patient/sample IDs between source and target stores, e.g. pseudonymization solutions (Mainzelliste, GPAS) or a plain mapping file in CSV format. Mapping works as follows:
Whenever TransFAIR encounters an ID from the `SOURCE` system, it will ask the service defined in `TF_PSEUDONYMIZATION_ADDR` for the corresponding ID in the `TARGET` system (or vice versa). We are currently defining a simple, implementation-independent API format in cooperation with pilot biobanks and will update this section once finished.
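Since the service API is not yet defined, the following Python sketch only illustrates the plain CSV mapping variant mentioned above. Everything about the file layout is an assumption made for illustration: the column names `source_id` and `target_id`, the helper names `load_mapping` and `map_id`, and the choice to raise an error for IDs missing from the mapping.

```python
import csv
import io
from typing import Dict, Optional, TextIO


def load_mapping(handle: TextIO) -> Dict[str, str]:
    """Read a CSV with (assumed) columns source_id,target_id into a dict."""
    reader = csv.DictReader(handle)
    return {row["source_id"]: row["target_id"] for row in reader}


def map_id(source_id: str, mapping: Optional[Dict[str, str]]) -> str:
    """Translate a SOURCE ID into its TARGET ID.

    With no mapping configured, the ID is passed through unchanged.
    Assumption: an ID absent from a configured mapping is an error (KeyError),
    so that unmapped identifiers never reach the target by accident.
    """
    if mapping is None:
        return source_id
    return mapping[source_id]


# Hypothetical mapping file content, held in memory for the example.
csv_text = "source_id,target_id\nP-001,a9f3\nP-002,c71b\n"
mapping = load_mapping(io.StringIO(csv_text))
target_id = map_id("P-001", mapping)
```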
We have created TransFAIR with the specific use case of bringing German biobanks and data integration centers closer together. In the long term, we intend TransFAIR to become a toolbox with easily reusable components for use with HL7 FHIR, OMOP and other well-known SQL, CSV and XML schemata.
Copyright 2021 - 2022 The Samply Community

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.