Merge pull request #162 from awell-health/tp-987-add-documentation-on…
…-big-query-in-developer-hub

TP-987: Document the Big Query dataset in the developer hub
ebomcke-awell authored Jan 29, 2024
2 parents be6e9a5 + 433a8ca commit 5ebaa06
Showing 18 changed files with 796 additions and 353 deletions.
212 changes: 90 additions & 122 deletions content/awell-orchestration/docs/data/data-repository.mdx
Expand Up @@ -3,148 +3,116 @@ title: Data repository
description: Understand what is available in the data repository
---

We use Elastic as our data repository to store, index, and search pathway data sets. You can either connect your own BI tool to our data repository in Elastic or use our analytics & dashboarding tool: Kibana.
## Understanding the data schema

The data repository contains two different types of data: collected data and generated data.
To use the data from the Awell platform, it's essential to understand some Awell terminology first.

**Collected data** refers to any atomic piece of data collected from a user or system in care flows (e.g. form responses, calculation results, etc.). These are the datapoints that you can find in the [data catalog](https://help.awellhealth.com/en/articles/4791428-data-catalog) in Awell Studio and that you can use to build care flows.
Below, we explain the terminology used in the data schema. If you have further questions, please contact your Awell Customer Success Manager.

**Generated data** refers to data generated by the orchestration of the care flow itself, i.e. data about which actions were orchestrated when in care flows.
### Care flow definition vs Care flow

## Collected data
In the Awell domain, a care flow definition is a care flow template designed in Awell Studio. It represents the general structure and components of a care plan and is not tied to any specific patient.

All the collected data is stored in a single index: `orchestration-datapoint`.
A care flow, on the other hand, is a patient-specific instance derived from a care flow definition; it tracks and manages an individual patient's care journey. For example, if you created the care flow definition "post-operative follow-up" in Awell Studio, every patient enrolled in that care flow definition gets an individual care flow.

This index uses a normalized structure to handle the fact that the collected data can have different value types (string, boolean, numeric, date etc), and can come from many different sources (forms, calculations, Extensions, etc.).
### Data point definition vs Data point

<Alert type="info">
<p class="mb-1">
When building a care flow in Awell Studio you have the ability to configure human readable identifiers for all data points as well as the source behind these data points. Our system uses that information to generate a `qualified_key` for each data point which combines the source key and the data point key. Let's imagine that you are building a patient form to collect the patient's weight and height, with the intent of calculating a BMI score. You set the form key to <em>bmi</em>, and set the question keys to <em>height</em> and <em>weight</em> respectively.
</p>
<p>
This will result in data points with the following qualified keys:
<ul>
<li><em>bmi.height</em> contains the answer to the <em>height</em> question in the <em>bmi</em> form</li>
<li><em>bmi.weight</em> contains the answer to the <em>weight</em> question in the <em>bmi</em> form</li>
</ul>
</p>
<p>
Use the <a href="https://help.awellhealth.com/en/articles/4791428-data-catalog">
Data Catalog
</a> to get an overview of all the collected data points, including the qualified keys and where in your care flow they are collected.
</p>
</Alert>
Similar to care flow definitions and care flows, a data point definition is a care flow component as designed in Awell Studio, while a data point is the patient-specific instance of that data point definition. For example, if you collect the weight of a patient in a care flow, the data point for a specific patient could be 80 kg (or 176 lbs).

<DataPointIndexSpecs />

<br />


### Sample queries

Given that the data point index is normalised, you can use any combination of the following properties to narrow down the query results:
- Use `data_set_id` or `pathway.id` to narrow down results to the data collected in a specific pathway
- Use `patient.id` to narrow down results to data collected for any pathway started for a specific patient
- Use `data_point_definition.category` to narrow down results to data collected from a specific source category
- Use `data_point_definition.qualified_key` to narrow down results to a specific data point
- Use `data_point_definition.qualified_key` with wildcards to narrow down results to data points that logically belong together

<figure className="w-full flex flex-col justify-center text-center relative">
<div className="dark:bg-[#F0F6FF] p-4 w-3/4 mx-auto rounded-md">
<a
href="https://res.cloudinary.com/da7x4rzl4/image/upload/v1680707183/Developer%20portal/Data_repository_-_Query_data_points_by_pathway_wln7wj.png"
target="_blank"
title="Querying data point by pathway"
>
<img
src="https://res.cloudinary.com/da7x4rzl4/image/upload/v1680707183/Developer%20portal/Data_repository_-_Query_data_points_by_pathway_wln7wj.png"
alt="Querying data point by pathway"
/>
</a>
</div>
<figcaption className="dark:text-slate-400 pt-2">
Querying data point by pathway
</figcaption>
</figure>
### Data point keys

<br />

<figure className="w-full flex flex-col justify-center text-center relative">
<div className="dark:bg-[#F0F6FF] p-4 w-3/4 mx-auto rounded-md">
<a
href="https://res.cloudinary.com/da7x4rzl4/image/upload/v1680707183/Developer%20portal/Data_repository_-_Query_data_points_by_category_esmmhg.png"
target="_blank"
title="Querying data point by source category"
>
<img
src="https://res.cloudinary.com/da7x4rzl4/image/upload/v1680707183/Developer%20portal/Data_repository_-_Query_data_points_by_category_esmmhg.png"
alt="Querying data point by source category"
/>
</a>
</div>
<figcaption className="dark:text-slate-400 pt-2">
Querying data point by source category
</figcaption>
</figure>
To avoid having to work with randomly generated IDs for data point definitions, we allow users to define a human-readable identifier in Awell Studio for all data point definitions. In the data repository, we combine this human-readable identifier with the source to form a 'key'. Let's imagine that you are building a patient form to collect the patient's weight and height, with the intent of calculating a BMI score. You set the form key to <em>bmi</em>, and set the question keys to <em>height</em> and <em>weight</em> respectively. This will result in data point definitions with the following keys:
- <em>bmi.height</em> contains the answer to the <em>height</em> question in the <em>bmi</em> form
- <em>bmi.weight</em> contains the answer to the <em>weight</em> question in the <em>bmi</em> form
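
As a rough illustration of the naming rule described above, the sketch below joins a source key and a data point key with a dot. The function name `build_key` is hypothetical and not part of any Awell API; it only mirrors how the example keys are composed.

```python
def build_key(source_key: str, data_point_key: str) -> str:
    """Combine a source key (e.g. a form key) with a data point key."""
    return f"{source_key}.{data_point_key}"


# Keys for the two questions of the BMI form example
bmi_keys = [build_key("bmi", question) for question in ("height", "weight")]
print(bmi_keys)
```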

<br />

<figure className="w-full flex flex-col justify-center text-center relative">
<div className="dark:bg-[#F0F6FF] p-4 w-3/4 mx-auto rounded-md">
<a
href="https://res.cloudinary.com/da7x4rzl4/image/upload/v1680707183/Developer%20portal/Data_repository_-_Query_data_points_by_key_bugi7y.png"
target="_blank"
title="Querying data point by key"
>
<img
src="https://res.cloudinary.com/da7x4rzl4/image/upload/v1680707183/Developer%20portal/Data_repository_-_Query_data_points_by_key_bugi7y.png"
alt="Querying data point by key"
/>
</a>
</div>
<figcaption className="dark:text-slate-400 pt-2">
Querying data point by key
</figcaption>
</figure>
### Release vs. Version

In Awell Studio, you can see the list of published versions for a given care flow definition, each with an auto-incremented version number. This version number is only used for display purposes. Behind the scenes, we assign a unique release identifier to each published version, which you can retrieve through the [Get published pathway definitions query](/awell-orchestration/api-reference/queries/get-published-pathways).

The release identifier is guaranteed to be globally unique, so it can safely be used as input when building analytics queries on the dataset.
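
The small sketch below shows why grouping by release identifier is safer than grouping by version number. It assumes that version numbers restart for each care flow definition, so the same number can appear under several definitions. The rows, field order, and release identifiers (`rel-a`, `rel-b`, `rel-c`) are made up for illustration and do not reflect the actual table columns.

```python
from collections import Counter

# Hypothetical rows: (care flow definition, display version, release id)
care_flows = [
    ("post-operative follow-up", 1, "rel-a"),
    ("post-operative follow-up", 2, "rel-b"),
    ("bmi screening", 1, "rel-c"),
    ("bmi screening", 1, "rel-c"),
]

# Grouping by version number mixes care flows from unrelated definitions.
by_version = Counter(version for _, version, _ in care_flows)

# Grouping by release identifier keeps each published version separate.
by_release = Counter(release_id for _, _, release_id in care_flows)

print(by_version)
print(by_release)
```

In this made-up data, version `1` lumps together care flows from two different definitions, while each release identifier maps to exactly one published version.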

## Schema

The data repository contains three different types of data: data points, orchestration data and patient data.

**Data points** are the atomic pieces of data collected from a user or system in care flows (e.g. form responses, calculation results, etc.). These are the data points that you can find in the [data catalog](https://help.awellhealth.com/en/articles/4791428-data-catalog) in Awell Studio and that you can use to build care flows.

**Orchestration data** refers to data generated by the orchestration of the care flow itself, i.e. data about which actions were orchestrated in care flows, and when.

**Patient data** refers to the data you explicitly provide about the patients you enroll in your care flows. When using anonymous patients, this only contains identifiers.

<br />

<figure className="w-full flex flex-col justify-center text-center relative">
<div className="dark:bg-[#F0F6FF] p-4 w-3/4 mx-auto rounded-md">
<a
href="https://res.cloudinary.com/da7x4rzl4/image/upload/v1680707183/Developer%20portal/Data_repository_-_Query_data_points_by_form_key_rgdxqu.png"
target="_blank"
title="Querying data point by form key"
>
<img
src="https://res.cloudinary.com/da7x4rzl4/image/upload/v1680707183/Developer%20portal/Data_repository_-_Query_data_points_by_form_key_rgdxqu.png"
alt="Querying data point by form key"
/>
</a>
</div>
<figcaption className="dark:text-slate-400 pt-2">
Querying data point by form key
<figure className="w-3/4 flex flex-col justify-center text-center relative m-auto">
<a
href="https://res.cloudinary.com/da7x4rzl4/image/upload/v1706515210/Developer%20portal/big-query-customer-dataset.png"
target="_blank"
className="custom-link"
>
<img
src="https://res.cloudinary.com/da7x4rzl4/image/upload/v1706515210/Developer%20portal/big-query-customer-dataset.png"
alt="Big Query - Schema diagram"
className="w-full sm:w-4/6 mx-auto rounded-lg"
/>
</a>
<figcaption className="pt-2 dark:text-slate-400">
Big Query - Schema diagram
</figcaption>
</figure>

## Generated data
### Data points

The data points are stored in two tables: `data_points` and `data_point_definitions`. The `data_points` table uses a normalized structure to handle the fact that atomic data can have different value types (string, boolean, numeric, date, etc.) and can come from many different sources (forms, calculations, Extensions, etc.).
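
To give a feel for what a normalized layout means in practice, here is a minimal sketch that keeps one row per collected value, with the raw value stored as text next to a type tag, and converts each value back to a native type. The column names (`key`, `value_type`, `value`) and the sample rows are placeholders chosen for illustration; refer to the table specs for the actual columns.

```python
from datetime import datetime

# Hypothetical rows in a normalized layout: one row per collected value,
# with the raw value kept as text next to a type tag.
rows = [
    {"key": "bmi.height", "value_type": "numeric", "value": "180"},
    {"key": "bmi.weight", "value_type": "numeric", "value": "80"},
    {"key": "intake.smoker", "value_type": "boolean", "value": "false"},
    {"key": "intake.surgery_date", "value_type": "date", "value": "2024-01-29"},
]

# One parser per value type; the type names are assumptions as well.
PARSERS = {
    "string": str,
    "numeric": float,
    "boolean": lambda raw: raw.lower() == "true",
    "date": lambda raw: datetime.strptime(raw, "%Y-%m-%d").date(),
}


def parse_value(row: dict):
    """Convert the text value of a row to the type named by its type tag."""
    return PARSERS[row["value_type"]](row["value"])


typed = {row["key"]: parse_value(row) for row in rows}
print(typed)
```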

#### Data Points

Table name: `data_points`.

<DataPointTableSpecs />

#### Data Point Definitions

Table name: `data_point_definitions`.

<DataPointDefinitionTableSpecs />

### Orchestration data

#### Care flows

Table name: `care_flows`.

<CareFlowTableSpecs />

#### Activities

Table name: `activities`.

<Alert type="info">
<p className="mb-1">
Activities use a generic structure to be able to describe any action that needs to be performed by a human or system.
</p>
<p>
Go to the <a href="/awell-orchestration/api-reference/overview/domain-model#activities">
Domain Model reference
</a> to find more information about how activities are structured, including examples.
</p>
</Alert>

<ActivityTableSpecs />

The generated data is stored in three indices:
### Patient data

1. `orchestration-pathway`
2. `orchestration-step`
3. `orchestration-activity`
#### Patients

The pathway and step indices provide a "snapshot" of the status of all pathways and steps that have ever been created in your Awell tenant (e.g. which are still active, which were completed, etc.), while the activity index tells you which system activities have occurred to orchestrate your care flows. You can find more information about "activities" in the Awell platform [here](https://developers.awellhealth.com/awell-orchestration/docs/getting-started/domain-model#activities).
Table name: `patients`

### Pathway (orchestration-pathway)
<PatientTableSpecs />

<PathwayIndexSpecs />
#### Patient profiles

### Step (orchestration-step)
Table name: `patient_profiles`

<StepIndexSpecs />
<PatientProfileTableSpecs />

### Activity (orchestration-activity)
## Getting access to your dataset

<ActivityIndexSpecs />
You need a service account to access your Big Query dataset. Please contact your Awell Customer Success Manager to receive your account details.
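
Once you have your account details, a first connection could look like the sketch below. It assumes the service account is delivered as a JSON key file and that the `google-cloud-bigquery` Python client is installed; the helper names `table_ref` and `count_rows`, as well as the project, dataset, and file names in the usage note, are placeholders rather than values provided by Awell.

```python
def table_ref(project: str, dataset: str, table: str) -> str:
    """Return a backtick-quoted, fully qualified BigQuery table name."""
    return f"`{project}.{dataset}.{table}`"


def count_rows(key_path: str, project: str, dataset: str, table: str) -> int:
    """Count the rows in one table using service account credentials."""
    # Imported lazily so table_ref works without the client library installed.
    from google.cloud import bigquery

    client = bigquery.Client.from_service_account_json(key_path)
    sql = f"SELECT COUNT(*) AS n FROM {table_ref(project, dataset, table)}"
    rows = client.query(sql).result()
    return next(iter(rows)).n
```

For example, `count_rows("service-account.json", "my-project", "my_dataset", "data_points")` would return the number of rows in the `data_points` table, provided the placeholder names are replaced with the ones from your account details.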
96 changes: 0 additions & 96 deletions content/awell-orchestration/docs/data/kibana/create-dashboards.mdx

This file was deleted.
