# GCP Validation
Exploring the GCP healthcare API using the MIMIC-IV on FHIR dataset.
Sources used:
- Create a dataset: https://cloud.google.com/healthcare-api/docs/how-tos/datasets
- Create a FHIR store in a dataset: https://cloud.google.com/healthcare-api/docs/how-tos/fhir
- Import and export FHIR resources: https://cloud.google.com/healthcare-api/docs/how-tos/fhir-import-export
- Add a FHIR implementation guide: https://cloud.google.com/healthcare-api/docs/how-tos/fhir-profiles
- Allow Healthcare API access to other GCP services: https://cloud.google.com/healthcare-api/docs/how-tos/permissions-healthcare-api-gcp-products
Enable the Healthcare API: https://console.cloud.google.com/healthcare/

This creates a service account for the Healthcare API named `service-PROJECT_NUMBER@gcp-sa-healthcare.iam.gserviceaccount.com`. The project number can be found at: https://console.cloud.google.com/iam-admin/settings?project=PROJECT_NAME_HERE
We'll use a few variables throughout:

```shell
export GCP_PROJECT_NUMBER=<project number>
export GCP_PROJECT_ID=<project id>
export GOOGLE_LOCATION=<project location>
export GOOGLE_BILLING_ACCOUNT=<billing account info>
export GOOGLE_DATASET=mimic-iv-fhir-dataset
export GOOGLE_DATASTORE=mimic-iv-fhir-v2-demo
export GOOGLE_TOPIC=mimic-fhir-bundles
export GOOGLE_IG_FOLDER='gs://mimic-fhir/implementation-guides/mimic-iv-on-fhir-ig/'
```
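Since several later commands silently misbehave when one of these variables is unset, it can help to check them up front. A minimal sketch (the helper and variable list below just mirror the exports above; this is not part of the repo):

```python
import os

# Variables the walkthrough below relies on (mirrors the exports above)
REQUIRED_VARS = [
    "GCP_PROJECT_NUMBER",
    "GCP_PROJECT_ID",
    "GOOGLE_LOCATION",
    "GOOGLE_BILLING_ACCOUNT",
    "GOOGLE_DATASET",
    "GOOGLE_DATASTORE",
    "GOOGLE_TOPIC",
    "GOOGLE_IG_FOLDER",
]

def missing_vars(env=os.environ):
    """Return the names of any required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required variables are set.")
```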
Grant the Healthcare API service account read access to Cloud Storage:

```shell
# if the service account does not exist yet, it can be created with:
# gcloud beta services identity create --service=healthcare.googleapis.com --project=$GCP_PROJECT_ID
gcloud projects add-iam-policy-binding ${GCP_PROJECT_ID} \
    --member=serviceAccount:service-${GCP_PROJECT_NUMBER}@gcp-sa-healthcare.iam.gserviceaccount.com \
    --role=roles/storage.objectViewer
```
```shell
# pick the GCP project
gcloud init

# create a healthcare dataset to host the project
gcloud healthcare datasets create $GOOGLE_DATASET

# create a FHIR store
gcloud healthcare fhir-stores create $GOOGLE_DATASTORE \
    --dataset=$GOOGLE_DATASET \
    --version=R4 \
    --enable-update-create
```
- Note: when a data store is created through the UI there is an "Allow update create" setting which is off by default; the `--enable-update-create` flag above turns it on.
With the FHIR store created, we'll need to import an IG so we have the custom profiles to validate against. Loading the IG takes a few steps:
- Add the `global` var to the implementation guide JSON file, as described in the fhir-profiles guide linked above. Effectively you are adding a list of all the profiles with their FHIR resource types:
```json
"global": [
  {
    "type": "Patient",
    "profile": "http://mimic.mit.edu/fhir/mimic/StructureDefinition/mimic-patient"
  },
  ...
]
```
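If the IG has many profiles, the `global` list can be generated from the StructureDefinition files instead of written by hand. A minimal sketch, assuming the IG publisher's output layout (`StructureDefinition-*.json` files containing `type` and `url` fields); the helper name is hypothetical, not part of the official tooling:

```python
import json
from pathlib import Path

def build_global_list(ig_output_dir):
    """Build the "global" array from StructureDefinition-*.json files in an IG output folder."""
    entries = []
    for path in sorted(Path(ig_output_dir).glob("StructureDefinition-*.json")):
        sd = json.loads(path.read_text())
        # each entry pairs the FHIR resource type with the profile's canonical URL
        entries.append({"type": sd["type"], "profile": sd["url"]})
    return entries
```

The result can then be assigned into the implementation guide JSON, e.g. `ig["global"] = build_global_list("mimic-profiles/output")`, before re-serializing the file.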
- Upload the IG to the Google bucket. Only upload the StructureDefinition, ValueSet, CodeSystem, and ImplementationGuide resources. Go to the folder with the IG (i.e. mimic-profiles/output after generating the IG with the publisher) and run:

```shell
gsutil -m cp -r ImplementationGuide-kindlab.fhir.mimic.json "StructureDefinition*.json" "ValueSet*.json" "CodeSystem*.json" $GOOGLE_IG_FOLDER
```
- Import the IG into the FHIR store:

```shell
gcloud healthcare fhir-stores import gcs $GOOGLE_DATASTORE \
    --dataset=$GOOGLE_DATASET \
    --gcs-uri=${GOOGLE_IG_FOLDER}* \
    --content-structure=resource-pretty
```
- Enable the implementation guide:

```shell
curl -X PATCH \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    -H "Content-Type: application/fhir+json; charset=utf-8" \
    --data '{"validationConfig": {"enabledImplementationGuides": ["http://mimic.mit.edu/fhir/mimic/ImplementationGuide/kindlab.fhir.mimic"], "disableProfileValidation": false}}' \
    "https://healthcare.googleapis.com/v1beta1/projects/$GCP_PROJECT_ID/locations/$GOOGLE_LOCATION/datasets/$GOOGLE_DATASET/fhirStores/$GOOGLE_DATASTORE?updateMask=validationConfig"
```
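The same PATCH can be assembled from Python if you'd rather not shell out to curl. A sketch that only builds the request object (sending it requires a valid access token; the helper name and placeholder arguments are assumptions for illustration):

```python
import json
import urllib.request

def build_enable_ig_request(project, location, dataset, fhir_store, token):
    """Build (but do not send) the PATCH request that enables IG validation on a FHIR store."""
    url = (
        "https://healthcare.googleapis.com/v1beta1/"
        f"projects/{project}/locations/{location}/"
        f"datasets/{dataset}/fhirStores/{fhir_store}"
        "?updateMask=validationConfig"
    )
    body = {
        "validationConfig": {
            "enabledImplementationGuides": [
                "http://mimic.mit.edu/fhir/mimic/ImplementationGuide/kindlab.fhir.mimic"
            ],
            "disableProfileValidation": False,
        }
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/fhir+json; charset=utf-8",
        },
        method="PATCH",
    )

# to actually send it: urllib.request.urlopen(build_enable_ig_request(...))
```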
The default import function for the FHIR store does not perform validation, so we'll set up a Pub/Sub service to act as a queue for all the FHIR bundles being inserted into the FHIR store.
- Create a topic for your Pub/Sub:

```shell
gcloud pubsub topics create $GOOGLE_TOPIC
```
Create a Cloud Function that responds to the topic and inserts bundles into the Healthcare FHIR store. Whenever a bundle is posted to the topic it will be processed by the Cloud Function, which accomplishes the following:
- Posts bundles for validation to the Google Healthcare FHIR store (cloud function script to post bundles)
- Logs validation errors into a BigQuery table (create table script)
- Saves failed bundles into Cloud Storage for debugging/reprocessing (describe where to point this, based on the Cloud Function script)
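When validation fails, the Healthcare API responds with a FHIR OperationOutcome resource. The error fields logged to BigQuery can be pulled out of it roughly like this (a sketch: the exact response shape the Cloud Function sees may differ, and the helper name is hypothetical; the output keys mirror the bundle_error table described below):

```python
def extract_errors(operation_outcome):
    """Pull error details out of a FHIR OperationOutcome resource.

    Returns one dict per error-level issue, with keys matching the
    bundle_error table columns (error_text, error_diagnostics,
    error_expression).
    """
    rows = []
    for issue in operation_outcome.get("issue", []):
        # only log actual failures, not informational issues
        if issue.get("severity") not in ("error", "fatal"):
            continue
        rows.append({
            "error_text": issue.get("details", {}).get("text", ""),
            "error_diagnostics": issue.get("diagnostics", ""),
            "error_expression": ", ".join(issue.get("expression", [])),
        })
    return rows
```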
Create the Cloud Function:
- From the main directory of the mimic-fhir repo run the following command to create the Cloud Function:

```shell
gcloud functions deploy bundle_processor \
    --runtime=python39 \
    --region=$GOOGLE_LOCATION \
    --source=gcp/functions/bundle_processor/. \
    --entry-point=bundler \
    --trigger-topic=$GOOGLE_TOPIC \
    --timeout=300
```

- This follows the documentation for Python Pub/Sub cloud functions
- From the main directory of the mimic-fhir repo run the commands to set up the schema and tables:

```shell
bq mk mimic_fhir_log
bq query --use_legacy_sql=false < gcp/bigquery/bundle_pass.sql
bq query --use_legacy_sql=false < gcp/bigquery/bundle_error.sql
```
- The bundle_error table will include:

| column | description |
|---|---|
| logtime | Time the error was logged |
| bundle_group | Bundle group name (lab, medication, etc.) |
| bundle_id | A unique id that is a combination of the bundle_group and a random UUID4 |
| bundle_dir | The location in Cloud Storage where the errors are written |
| error_text | The main error message from the Healthcare API |
| error_diagnostics | A longer explanation of the error |
| error_expression | The element location in the resource where the error occurred |
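The bundle_id described above is the bundle group name joined with a random UUID4. A one-line sketch of how such an id could be generated (hypothetical helper, not the repo's actual code):

```python
import uuid

def make_bundle_id(bundle_group):
    """Combine the bundle group name with a random UUID4, e.g. 'lab-9f1c...'."""
    return f"{bundle_group}-{uuid.uuid4()}"
```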
- The bundle_pass table will include:

| column | description |
|---|---|
| logtime | Time the bundle validation results were logged |
| patient_id | Patient identifier from mimic-fhir |
| bundle_group | Bundle group name (lab, medication, etc.) |
| bundle_id | A unique id that is a combination of the bundle_group and a random UUID4 |
| bundle_dir | The location in Cloud Storage where the errors are written |
| starttime | Time the bundle validation request was sent |
| endtime | Time the Healthcare API responded with a successful validation |
- Update your .env file to specify "GCP" as the VALIDATOR environment variable
- Run `source .env` to update your environment variables
- Run py_mimic_fhir validation!

```shell
py_mimic_fhir validate --num_patients=10 --num_cores=7
```
- num_patients: how many patients you want to validate
- num_cores: at most one less than your machine's core count (otherwise everything on your computer will slow to a crawl)
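A quick way to pick num_cores following that rule (one less than the machine's core count, floored at 1; the helper name is just for illustration):

```python
import multiprocessing

def safe_core_count():
    """One less than the machine's CPU count, but never below 1."""
    return max(1, multiprocessing.cpu_count() - 1)
```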
- Check the BigQuery tables. Some useful queries below:
  - Bundle errors:

    ```sql
    SELECT * FROM `kind-lab.mimic_fhir_log.bundle_error`;
    ```

  - Bundles that passed validation:

    ```sql
    SELECT * FROM `kind-lab.mimic_fhir_log.bundle_pass`;
    ```

  - Summary of a validation run:

    ```sql
    SELECT
      bundle_dir,
      MAX(endtime) - MIN(starttime) AS deltaT,
      COUNT(DISTINCT patient_id) AS pat_count,
      COUNT(DISTINCT CONCAT(patient_id, '-', bundle_group)) AS bundle_count,
      COUNT(bundle_id) AS total_bundles_posted
    FROM `kind-lab.mimic_fhir_log.bundle_pass`
    GROUP BY bundle_dir
    ```
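The same summary can also be computed locally from exported bundle_pass rows. A sketch assuming rows as dicts keyed by the table's column names, with datetime objects for starttime/endtime (the helper is hypothetical):

```python
from collections import defaultdict

def summarize_run(rows):
    """Replicate the summary query in Python: group bundle_pass rows by bundle_dir."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["bundle_dir"]].append(row)
    summary = {}
    for bundle_dir, items in groups.items():
        summary[bundle_dir] = {
            # wall-clock span from first request to last successful response
            "deltaT": max(r["endtime"] for r in items) - min(r["starttime"] for r in items),
            "pat_count": len({r["patient_id"] for r in items}),
            # distinct (patient, bundle_group) pairs, like CONCAT in the query
            "bundle_count": len({(r["patient_id"], r["bundle_group"]) for r in items}),
            "total_bundles_posted": len(items),
        }
    return summary
```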
- Check Cloud Storage to dive deeper into any errors
  - From BigQuery you can take the directory name in bundle_dir to find the spot in Cloud Storage
  - Open the specific bundle_id file and investigate the issue further