
Terminology Generation

alexmbennett2 edited this page May 4, 2022 · 5 revisions


FHIR has strict rules about the format of terminology but is flexible about the content of a terminology system. FHIR has two main terminology resources: CodeSystem and ValueSet. A CodeSystem holds the source codes from a single system, while a ValueSet compiles codes from one or more CodeSystems.

Converting MIMIC terminology into the FHIR format takes three steps:

  1. Generate terminology tables from the MIMIC source tables
  2. Generate CodeSystems and ValueSets from the terminology tables and write them out as JSON
  3. Post the CodeSystems and ValueSets to the FHIR server

1. Generate terminology tables

By default, the terminology tables are generated when you run the initial create_fhir_jsons.sql script that builds all mimic-fhir tables. If you need to regenerate only the terminology tables, run create_fhir_terminology.sql.

The create_fhir_terminology.sql script calls roughly 50 SQL scripts that build the terminology tables from the MIMIC source. When the MIMIC version changes, rerun it to capture the latest MIMIC terminology in FHIR.
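A minimal sketch of rerunning the terminology build from Python, assuming the MIMIC database is reachable with the standard psql client. The database name and working directory are placeholders; the working directory only matters if the script pulls in its sub-scripts with relative \i paths, which is an assumption here.

```python
import subprocess


def build_psql_command(dbname: str, script: str) -> list[str]:
    """Build a psql call that runs one SQL file and stops on the first error."""
    return [
        "psql",
        "-d", dbname,
        # ON_ERROR_STOP makes psql exit non-zero as soon as a statement fails,
        # instead of silently continuing through the remaining sub-scripts
        "-v", "ON_ERROR_STOP=1",
        "-f", script,
    ]


def rerun_terminology_tables(dbname: str, script: str, workdir: str = ".") -> None:
    # workdir is a placeholder: use whatever directory the script's include paths expect
    # check=True raises CalledProcessError if psql reports a failure
    subprocess.run(build_psql_command(dbname, script), check=True, cwd=workdir)
```

For example, rerun_terminology_tables("mimic", "create_fhir_terminology.sql") would rebuild the fhir_trm tables, with "mimic" standing in for your actual database name.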

The terminology tables are all stored in the schema fhir_trm. Each table has some combination of three columns:

  • code - code that identifies the concept
  • display - text to display to user
  • system - the source system for codes
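To illustrate how the code and display columns map onto FHIR, here is a minimal sketch that turns (code, display) rows into a CodeSystem dictionary. It is not necessarily how py_mimic_fhir builds its resources; the function name, id and URL are hypothetical.

```python
from typing import Optional


def build_code_system(
    cs_id: str,
    url: str,
    version: str,
    status: str,
    rows: list[tuple[str, Optional[str]]],
) -> dict:
    """Turn (code, display) rows from a fhir_trm table into a FHIR CodeSystem dict."""
    concepts = []
    for code, display in rows:
        concept = {"code": code}
        if display:  # many tables carry only a code, so display is optional
            concept["display"] = display
        concepts.append(concept)
    return {
        "resourceType": "CodeSystem",
        "id": cs_id,
        "url": url,
        "version": version,
        "status": status,
        # "complete" declares that every code of the system is listed here
        "content": "complete",
        "concept": concepts,
    }
```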

Most terminology tables have only a code column or a code/display pair. In a few cases the ValueSet is derived from multiple CodeSystems, so the system column is needed (e.g., ValueSet-procedure-icd is derived from CodeSystem-procedure-icd9 and CodeSystem-procedure-icd10).
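The system column is what lets each code be attributed to its source CodeSystem. One way to use it, sketched below under the assumption that rows arrive as (system, code) pairs, is to emit one compose.include entry per system. The function name and system URLs are hypothetical, and mimic-fhir may instead include the whole CodeSystems.

```python
from collections import defaultdict


def build_multi_system_value_set(vs_id: str, url: str, rows: list[tuple[str, str]]) -> dict:
    """Group (system, code) rows into one compose.include entry per source CodeSystem."""
    codes_by_system: dict[str, list[str]] = defaultdict(list)
    for system, code in rows:
        codes_by_system[system].append(code)  # insertion order of systems is preserved
    includes = [
        {"system": system, "concept": [{"code": c} for c in codes]}
        for system, codes in codes_by_system.items()
    ]
    return {
        "resourceType": "ValueSet",
        "id": vs_id,
        "url": url,
        "status": "draft",
        "compose": {"include": includes},
    }
```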

2. Generate CodeSystems and ValueSets

First, set up the py_mimic_fhir package by following the instructions in step 4 of the Quickstart.

py_mimic_fhir has a terminology mode. The arguments are:

  • --version specifies the version of the terminology. This should match the MIMIC-IV version
  • --status specifies the status, either draft or complete. Defaults to complete

An example call is: py_mimic_fhir terminology --version 0.4 --status complete.
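If you script the terminology build, a small wrapper can assemble the documented call and reject status values other than the two listed above. This is a hypothetical helper around the CLI, not part of py_mimic_fhir; it uses only the --version and --status flags described on this page.

```python
import subprocess

VALID_STATUSES = {"draft", "complete"}  # the two values documented on this page


def terminology_command(version: str, status: str = "complete") -> list[str]:
    """Build the documented `py_mimic_fhir terminology` call as an argument list."""
    if status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {sorted(VALID_STATUSES)}, got {status!r}")
    return ["py_mimic_fhir", "terminology", "--version", version, "--status", status]


def generate_terminology(version: str, status: str = "complete") -> None:
    # Requires py_mimic_fhir to be installed and on PATH (Quickstart step 4)
    subprocess.run(terminology_command(version, status), check=True)
```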

The CodeSystems and ValueSets are written out as JSON to the MIMIC_TERMINOLOGY_PATH specified in your .env.
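A minimal sketch of writing one generated resource to that directory, assuming MIMIC_TERMINOLOGY_PATH has been exported into the environment (os.environ does not read .env files on its own). The file naming scheme is hypothetical and may differ from what py_mimic_fhir produces.

```python
import json
import os
from pathlib import Path
from typing import Optional


def write_terminology_json(resource: dict, out_dir: Optional[str] = None) -> Path:
    """Write one CodeSystem or ValueSet dict to <out_dir>/<resourceType>-<id>.json."""
    # Fall back to MIMIC_TERMINOLOGY_PATH when no directory is given (assumed exported)
    directory = Path(out_dir or os.environ["MIMIC_TERMINOLOGY_PATH"])
    directory.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme, e.g. CodeSystem-procedure-icd9.json
    path = directory / f"{resource['resourceType']}-{resource['id']}.json"
    path.write_text(json.dumps(resource, indent=2), encoding="utf-8")
    return path
```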

3. Post CodeSystems and ValueSets to server

With HAPI-FHIR, the terminology must be posted to the server before all the ValueSets can be expanded properly. To do so, use py_mimic_fhir:

  • py_mimic_fhir terminology --post
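The --post mode handles this for you. To show what posting a single resource involves at the HTTP level, here is a sketch using only the standard library. The base URL is an assumed local HAPI-FHIR address, and this is not necessarily how py_mimic_fhir does it. It uses PUT to ResourceType/id (FHIR update-as-create) so that rerunning it overwrites resources instead of creating duplicates; whether the server accepts client-assigned ids depends on its configuration. Post CodeSystems before the ValueSets that include them.

```python
import json
import urllib.request

FHIR_BASE = "http://localhost:8080/fhir"  # assumed local HAPI-FHIR base URL


def build_put_request(resource: dict, base: str = FHIR_BASE) -> urllib.request.Request:
    """Prepare a PUT to <base>/<resourceType>/<id> so the resource keeps its own id."""
    url = f"{base}/{resource['resourceType']}/{resource['id']}"
    return urllib.request.Request(
        url,
        data=json.dumps(resource).encode("utf-8"),
        headers={"Content-Type": "application/fhir+json"},
        method="PUT",
    )


def put_resource(resource: dict, base: str = FHIR_BASE) -> int:
    # urlopen raises HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(build_put_request(resource, base)) as resp:
        return resp.status  # typically 200 (updated) or 201 (created)
```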

To generate and post the terminology in a single step, use:

  • py_mimic_fhir terminology --generate_and_post
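Because the point of posting is to let HAPI-FHIR expand the ValueSets, a quick check afterwards is the standard FHIR ValueSet/$expand operation. The sketch below assumes the same local base URL as before; the ValueSet canonical URL you pass in is whatever url your generated ValueSet declares.

```python
import json
import urllib.parse
import urllib.request

FHIR_BASE = "http://localhost:8080/fhir"  # assumed local HAPI-FHIR base URL


def expand_url(value_set_url: str, base: str = FHIR_BASE) -> str:
    """Build a ValueSet/$expand URL for a ValueSet identified by its canonical url."""
    query = urllib.parse.urlencode({"url": value_set_url})
    return f"{base}/ValueSet/$expand?{query}"


def count_expanded_codes(value_set_url: str, base: str = FHIR_BASE) -> int:
    req = urllib.request.Request(
        expand_url(value_set_url, base), headers={"Accept": "application/fhir+json"}
    )
    with urllib.request.urlopen(req) as resp:
        expansion = json.load(resp)["expansion"]
    # Large expansions may be paged; "total" is the full count when the server reports it
    return expansion.get("total", len(expansion.get("contains", [])))
```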

Terminology extra notes

The majority of mimic-fhir's ValueSets are just references to the whole source CodeSystem. There are 2 scenarios where this is different:

  1. ValueSet is subset of CodeSystem
    • The ICU d_items CodeSystem holds the codes for all ICU events, including chartevents, datetimeevents, outputevents, and procedureevents
    • Each ICU event table needs to reference these items, so ValueSets are created that point to the base d_items CodeSystem but take only a subset of its codes
  2. ValueSet with double CodeSystem
    • Procedure-icd pulls codes from both procedure-icd9 and procedure-icd10 CodeSystems
    • Same goes for diagnosis-icd
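The difference between a whole-CodeSystem ValueSet and a subset ValueSet shows up in compose.include: without a concept list the include takes every code, with one it takes only those codes. The sketch below assumes the subset is chosen with the linksto column of MIMIC-IV's d_items table, which names the event table each item belongs to; that selection rule, the function names, and the URLs are hypothetical rather than mimic-fhir's confirmed method.

```python
def _value_set(vs_id: str, url: str, includes: list[dict]) -> dict:
    return {
        "resourceType": "ValueSet",
        "id": vs_id,
        "url": url,
        "status": "draft",
        "compose": {"include": includes},
    }


def whole_code_system_value_set(vs_id: str, url: str, system: str) -> dict:
    """Most mimic-fhir ValueSets: no concept list, so every code in the system is included."""
    return _value_set(vs_id, url, [{"system": system}])


def icu_subset_value_set(
    vs_id: str, url: str, system: str, items: list[tuple[str, str]], linksto: str
) -> dict:
    """Include only the d_items codes whose linksto value matches one ICU event table."""
    codes = [itemid for itemid, table in items if table == linksto]
    return _value_set(vs_id, url, [{"system": system, "concept": [{"code": c} for c in codes]}])
```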