Analytics on a single machine using Docker

The repository includes a "Single Machine" Docker Compose configuration that brings up the FHIR Pipelines Controller plus a Spark Thrift server on a single machine, making it easier to run Spark SQL queries on the Parquet files the Pipelines Controller outputs. Before using this single machine configuration, see Try out the FHIR Pipelines Controller to learn how the Pipelines Controller works on its own.

This guide assumes you already have a HAPI FHIR server configured to use Postgres as its database. Alternatively, you can try it out with a local test server following the instructions for a HAPI source server with Postgres. You also need Docker Compose installed on the host machine. All file paths are relative to the root of the FHIR Data Pipes repository cloned on the host machine.

Configure the FHIR Pipelines Controller

  1. Open docker/config/application.yaml and edit the value of fhirServerUrl to match the FHIR server you are connecting to.
  2. Open docker/config/hapi-postgres-config_local.json and edit the values to match that FHIR server's Postgres database (see the sketch after this list).
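
For reference, the two edits might look something like the following. This is only a sketch: the hostnames, credentials, and database name are placeholders, and the JSON keys shown are assumptions based on the sample file, so keep whichever keys your checkout of hapi-postgres-config_local.json already contains and change only their values.

In docker/config/application.yaml:

fhirServerUrl: "http://your-hapi-server:8080/fhir"

In docker/config/hapi-postgres-config_local.json:

{
  "databaseHostName": "your-postgres-host",
  "databasePort": "5432",
  "databaseUser": "admin",
  "databasePassword": "admin",
  "databaseName": "hapi"
}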

If you are trying the Single Machine configuration using the provided local test servers, things should work with the default values. Alternatively, if the FHIR server or its database runs directly on the host machine, use the IP address of the Docker default bridge network so the containers can reach it. To find it, run the following command and use the "Gateway" value:

docker network inspect bridge --format='{{json .IPAM.Config}}'
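
The output looks something like this (the exact subnet depends on your Docker installation); the Gateway address, e.g. 172.17.0.1, is the value you would use as the database host in hapi-postgres-config_local.json:

[{"Subnet":"172.17.0.0/16","Gateway":"172.17.0.1"}]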

Run the Single Machine configuration

To bring up the configuration, run:

docker-compose -f docker/compose-controller-spark-sql.yaml up --force-recreate

If you have run this configuration in the past and want to include new changes pulled into the repo, add the --build flag to rebuild the binaries.
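
For example:

docker-compose -f docker/compose-controller-spark-sql.yaml up --force-recreate --build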

Once fully up, the Pipelines Controller is available at http://localhost:8090 and the Spark Thrift server is at http://localhost:10001.
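
To confirm the controller's web UI is reachable before opening a browser, you can probe it from the host; expecting an HTTP 200 here is an assumption about the default landing page:

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8090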

The first time you run the Pipelines Controller, you must manually start a Full Pipeline run. In a browser go to http://localhost:8090 and click the Run Full button.

View and analyze the data using Spark Thrift server

Connect to the Spark Thrift server using a client that supports Apache Hive. For example, if using the JDBC driver, the URL should be jdbc:hive2://localhost:10001.
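
For a quick check from the command line, the beeline client that ships with Apache Hive and Spark can also connect to the same URL:

beeline -u jdbc:hive2://localhost:10001

At the beeline prompt, list the available tables and try a query. The table names depend on what your pipeline run produced; patient below is only an illustration:

SHOW TABLES;
SELECT COUNT(*) FROM patient;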