Electronic Health Record (EHR) data must be tested for data quality before being shared for research. Data quality is typically measured in three categories: Conformance, Completeness, and Plausibility (Kahn et al., 2016 eGEMS). Many CTSA institutions have harmonized their EHR data to the Observational Medical Outcomes Partnership (OMOP) data model, yet no publicly available tool with a standard operating procedure (SOP) exists to easily run data quality tests and visualize the results, particularly across institutions. This project will launch a publicly available data quality testing tool and SOP, configurable to any database environment for N OMOP datasets.
EHR data must be tested for data quality before being shared for research. Data quality is typically measured in three categories: Conformance, Completeness, and Plausibility (Kahn et al., 2016 eGEMS). Harmonized datasets need to conform to an established standard format and vocabulary before any analysis can be done. They need to meet a minimum threshold of completeness (i.e., what percentage of values are null or empty). They also need to demonstrate a certain level of plausibility (i.e., whether the data make sense given what is expected and are believable and credible). To date, most data sharing networks have developed internal protocols and tools to manage data harmonization, but no publicly available tool with a standard operating procedure exists to easily run data quality tests and visualize the results across institutions. As a result, data quality is addressed inconsistently, and only by high-level analytic teams where those are available.
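To make the completeness and plausibility categories concrete, here is a minimal Python sketch. It is not how DQe-c implements its tests. The sample rows, the column names (loosely modeled on an OMOP person table), the helper names `percent_missing` and `implausible_birth_years`, and the 1900 lower bound are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical rows loosely shaped like an OMOP person table.
# All values are made up for illustration.
PERSON_ROWS = [
    {"person_id": 1, "gender_concept_id": 8507, "year_of_birth": 1980},
    {"person_id": 2, "gender_concept_id": None, "year_of_birth": 1975},
    {"person_id": 3, "gender_concept_id": 8532, "year_of_birth": 2150},
    {"person_id": 4, "gender_concept_id": 8532, "year_of_birth": ""},
]


def is_missing(value):
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def percent_missing(rows, column):
    """Completeness: percentage of rows whose value in `column` is null or empty."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if is_missing(row.get(column)))
    return 100.0 * missing / len(rows)


def implausible_birth_years(rows, earliest=1900, latest=None):
    """Plausibility: person_ids whose birth year falls outside [earliest, latest].

    The 1900 lower bound is an assumed threshold, not a recommended one.
    Missing or non-integer years are left to the completeness check.
    """
    latest = date.today().year if latest is None else latest
    return [
        row["person_id"]
        for row in rows
        if isinstance(row.get("year_of_birth"), int)
        and not (earliest <= row["year_of_birth"] <= latest)
    ]


if __name__ == "__main__":
    for column in ("gender_concept_id", "year_of_birth"):
        print(f"{column}: {percent_missing(PERSON_ROWS, column):.1f}% missing")
    print("implausible birth years:", implausible_birth_years(PERSON_ROWS))
```

A site could compare each percentage against a locally agreed threshold. What that threshold should be is a policy decision that this sketch does not address.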
Point person (github handle) | Site | Program Director |
---|---|---|
Kari Stephens (@kstephen0909) | UW | Sean Mooney (@sdmooney) |
Lead(s) (github handle) | Site |
---|---|
Kari Stephens (@kstephen0909) | UW |
Adam Wilcox (@abwilcox) | UW |
Team members can be found here
Originally Developed DQe-c Tool
https://github.com/data2health/DQe-c
Ongoing Re-Engineering of DQe-c Tool
https://github.com/data2health/DQe-c-v2
- Data quality testing tool (DQe-c) available to CTSA hubs and affiliates
- Data quality testing tool standard operating procedures and documentation supporting local configuration
- List of recommended minimum level data quality tests to help with data sharing assurance
View the project milestones here
View the evaluation component here
View the education component here
View the engagement component here
Team collaborative working folder can be found here
#data-quality is accessible to participants who have been onboarded