Vision Zero Database (VZD)

The Vision Zero Database (VZD) is a Postgresql database that serves as the central repository of Austin's traffic crash data. The database is fronted with a GraphQL API, powered by Hasura, which is also used to manage schema migrations.

The design supports an editing environment which enables Vision Zero program staff to edit and enrich crash data, while also allowing record updates to flow into the database from upstream sources, such as the TxDOT Crash Records Information System (CRIS).

Data sources
Common maintenance tasks
Backups
Hasura
Development and deployment
- Generating migrations and metadata changes
- Merging an approved feature branch

Data sources

TxDOT Crash Records Information System (CRIS)

The TxDOT Crash Record Information System (CRIS) is a statewide, automated database for traffic crashes reports.

CRIS data accounts for the vast majority of records in the database: the Vision Zero Editor (VZE) is designed primarily as a tool for editing and enriching data received from CRIS, and the Vision Zero Viewer (VZV) is powered entirely by enriched CRIS data.

The CRIS data in our database consists of four record types:

crashes - each row is single crash event, with attributes such as the crash timestamp, crash location, and manner of collision
units - each row describes a unit, or vehicle, involved in a crash. Each unit relates to one crash record.
people - each row describes a person involved in a crash. Each person relates to one unit.
charges - each row describes a legal charge filed by the responding law enforcement agency. Each charge relates to one person. Charges have special handling and only exist in the charges_cris, as described in detail below.

Design

The core challenge that the database schema solves is to preserve the integrity of staff members' edits while simultaneously allowing crash record updates to flow into the database from CRIS. The editing environment is achieved by managing two copies of each of the crashes, units, and people tables, so that CRIS edits can have their own table and thus be isolated from Vision Zero staff edits.

For example, the crashes records are managed in two tables:

crashes_cris: records that are created and updated by TxDOT CRIS through the CRIS import ETL
crashes: a unified version of each record which combines the values in crashes_cris plus any edits created by Vision zero staff through the Vision Zero Editor web app. This table functions as the final source of truth, where user edits take precedence over CRIS values.

As pictured in the diagram below, the typical data flow for a crash record is as follows:

A new record is inserted into the crashes_cris table through the CRIS import ETL.
On insert into crashes_cris, a complete copy of the record is inserted into the crashes table.
A Vision Zero Editor user may update crash records by updating rows in the crashes table.
When an existing crashes_cris record is updated through the CRIS import ETL, a trigger function compares the old value in the crashes_cris table to the value in the unified crashes table and if they are the same, meaning there have been no user edits, the crashes value is also updated with the new value. If there has been a user edit, the updates to the crashes_cris table do not propagate to the crashes table.
Once a record is updated in the crashes table, additional trigger functions apply various business rules and enrich the row with spatial attributes based on its location. These trigger functions are reserved for values that require heavy computation—additional business rules can be applied through table views.

The "layered" editing environment of the Vision Zero Database

The process for updating units and people behaves in the same manner as crashes. To ensure proper data flow and trigger behavior, records should never be directly inserted into the unified tables.

CRIS Extract configuration and accounts

The team maintains multiple CRIS login accounts to manage the delivery of CRIS data to our AWS S3 ingest bucket. See the CRIS import ETL documentation for more information on how CRIS data is delivered to S3 and imported into our database.

The credentials for our CRIS logins are in the password store, including a note in the title indicating each login's purpose:

Production extract account: this is the account which is configured for daily delivery of extracts to S3 production.
Dev/testing extract account: this account should be used for requesting ad-hoc CRIS extracts for delivery via S3 or manual download.
Query & analyze account: this account can be used for the CRIS query interface, that enables querying and access to individual crash records.

Additional information about CRIS access can be found on the TxDOT website.

Follow these steps to configure a new extract delivery:

Login to CRIS using the appropriate account (see above): https://cris.dot.state.tx.us/
From the My Extract Requests page, click Add, and follow the extract request wizard

Extract Type: Standard
Extract Format: CSV
Include CR-3 Crash Report files in Extract: Yes (checked)

Include Crash Reports From: Specific Counties: Hays, Travis, and Williamson
Include Crash Reports From: Process Date range
- If you are backfilling, include a day before your target day as a buffer.
- If you request a process date of today, the extract will not deliver until the next day.
- To set up a recurring request, add a range of dates that ends in the future.

Any part of the range that falls in the past will be delivered in single zip that is separate from the zips that will deliver in the future. The includes - all records with process dates available including today.

Any part of the range that is in the future will create daily zips that include each day available going forward. For example, on 4/19/2024, you make a request for Process Begin Date = 01/01/2024 and Process End Date = 12/31/2024 The would receive two zips: One containing all records with process date from 01/01/2024 to 04/18/2024, and one containing all records with process date from 04/19/2024 to 04/19/2024. Going forward, you will receive one zip per day for each process date that passes

Extract password: the password called EXTRACT_PASSWORD from Vision Zero CRIS Import 1Password item
Delivery: How you want to receive it. Typically you would use the pre-configured AWS option, specifiyng the dev, staging, or prod inbox subdirectory. See the CRIS import ETL readme for more details.

CRIS data processing

We receive CRIS data from TxDOT on a nightly basis through the CRIS "automated interface", which delivers an encrypted .zip file to an S3 bucket on our AWS premise. The .zip file contains all crash records processed in the last 24 hours, and includes both CSV files and crash report PDFs (aka CR3s).

At the time of writing, this guide provides an overview of how CRIS data delivery is configured.

For more details on how we ingest CRIS data into our database, see the CRIS import ETL documentation.

Lookup tables

Lookup tables for crashes, units, and people tables are housed in the lookups schema in the database. Here's what you need to know about them:

The majority of our lookup tables are defined by CRIS and exactly match the CRIS extract schema
Some of our lookup tables contain custom lookup values, and we have a mechanism for managing custom values alongside CRIS-provided values
Some of our lookup tables are completely custom and do not exist in the CRIS extract
Because we enforce foreign key constraints against all lookup table references, the CRIS import ETL will break if our lookup tables are not periodically refreshed to ensure they match the latest CRIS schema. We have a helper script to assist with that task.

See the Common maintenance tasks section for specific details about creating and updating lookup tables.

Lookup table structure and custom lookup table values

All lookup tables follow the same table structure, with the three columns:

id - an integer primary key value that is not auto-incrementing. These ID values are defined either by CRIS or by our team member who implements a custom lookup value.
label - the text label descriptor of the lookup value
source - the entity who is the source of the lookup table value definition. should be either cris or, for custom values, vz.

In addition to the source column, constraint checks must be added to tables which use custom values, to ensure that CRIS-provided id values do not collide with custom vz-sourced values.

For example, consider the lookups.injry_sev table, which includes a custom value, KILLED (NON-ATD), which is used by staff to override the CRIS-defined injury severity of a person record:

| id  | label                    | source |
| --- | ------------------------ | ------ |
| 0   | UNKNOWN                  | cris   |
| 1   | SUSPECTED SERIOUS INJURY | cris   |
| 2   | SUSPECTED MINOR INJURY   | cris   |
| 3   | POSSIBLE INJURY          | cris   |
| 4   | FATAL INJURY             | cris   |
| 5   | NOT INJURED              | cris   |
| 95  | AUTONOMOUS               | cris   |
| 99  | KILLED (NON-ATD)         | vz     |

The original migration for this table is here.

Because the table has a custom value, it is configured with a check constraint (PostgreSQL docs) to ensure that future updates to this lookup table do not result in an ID collision:

"injry_sev_owner_check" CHECK (id < 99 AND source = 'cris' OR id >= 99 AND source = 'vz')

Any row inserted into this table must use the source vz if the id value is greater than or equal to 99. This ensures that our lookup table helper script (todo: link) will not override our custom lookup values.

Additionally, the people_cris table, which references this lookup, is configured with a check constraint that prevents CRIS from using our custom value:

 "people_cris_prsn_injry_sev_id_check" CHECK (prsn_injry_sev_id < 99)

See the Common maintenance tasks section for more details about creating and updating lookup tables.

Charges records

Charges records are provided by CRIS and describe a legal charge filed by the responding law enforcement agency. These records require special handling in our database because CRIS does not provide a unique primary key column for charges. Here's what you should know about these records:

Charge records only exist in the charges_cris table and do not have a corresponding charges table.
During the CRIS import, all charge records are deleted from the database for any crash ID present in the CRIS extract. After deletion, the CRIS import ETL inserts all charges records present in the extract.
The CRIS import ETL filters out charge records where the charge value is NO CHARGE—this reduces the number of charge records in the database by many thousands.
Because charges are subject to deletion, they are not editable through the VZE/graphql API and should be considered read-only.

Database IDs, CRIS record IDs, and primary keys

Each of the crashes, units, cris, and charges tables uses an auto-incrementing integer column called id as its primary key. CRIS provides a separate set of columns which can be used to uniquely identify records, and these columns are used to match record updates provided by CRIS to their corresponding record in the database.

For clarity, the column name crash_pk is used on tables which reference the crash id column, and the column name cris_crash_id is used to reference the CRIS-provided ID column, crash_id. Prior to Vision Zero v2.0, the name crash_id was used universally in reference to the CRIS crash ID column.

This table outlines the primary key columns in the database and how they relate to CRIS-provided identifiers.

Record type	Primary key column	CRIS row identifier	Parent record type	Parent foreign key column name	Note
crashes	`id`	`cris_crash_id` (aka `crash_id` in the CRIS extract)
units	`id`	(`unit_nbr`, `cris_crash_id`)	crashes	`crash_pk`	`crash_pk` set via `units_cris_set_unit_id` [sic] trigger function
people	`id`	(`prsn_nbr`, `unit_nbr`, `cris_crash_id`)	units	`unit_id`	`unit_id` is set via `people_cris_set_unit_id` trigger function
charges	`id`	(`prsn_nbr`, `unit_nbr`, `cris_crash_id`)	crashes, people	`crash_pk`, `person_id`	`crash_pk` and `person_id` set via `charges_cris_set_person_id_and_crash_pk` trigger function

User-created crash records, aka "temporary" records

Because there can be a lag time of weeks, even months, before law enforcement investigators submit their crash report to TxDOT, the Vision Zero team needs the ability to manually create "temporary" records so that reporting metrics are more timely/accurate.

The VZE makes this possible by allowing users to insert crash, unit, and people records directly into the database. User-created records must be inserted into the _cris tables to ensure proper data flow into the unified tables, as the database expects that every single record that exists in the unified tables exists in the _cris tables and vice versa.

User-created records do not have a cris_crash_id column. Because cris_crash_id is central to the VZE for searching and navigating to crash pages, we use a generated column, crashes.record_locator, as a pseudo-crash ID. The record_locator column is generated as either T<crashes.id> (for temp records) or <cris_crash_id> (for CRIS records), and is rendered throughout the VZE.

Audit fields

Audit fields are used through the CRIS record tables and are managed via trigger. Any new tables add to the database should follow the same convention:

created_at: the creation timestamp of the record. Default now().
updated_at: the timestamp of the last record update. Default now(), set via trigger on row update.
created_by: the email address of the user who created the record. default system.
updated_by: the email address of the user who updated the record. default system.

Change logs

The database includes an extensive change logging system that captures all edits to any of the nine tables that comprise the crash, unit, and people tables. Change log entries are created via triggers that fire after records are modified, and includes a copy of both the old and new version of each record as a JSON blob.

Each change log table follows the same structure:

column_name	data_type	description
`id`	`integer`	Auto-incrementing primary key
`record_id`	`integer`	Foreign key referencing the record's `id` column
`operation_type`	`text`	The event that triggered the change: `UPDATE` or `INSERT`
`record_json`	`jsonb`	A JSON blob of the record which contains the `old` and `new` version of the record
`created_at`	`timestamp with time zone`	The timestamp this row was created - default `now()`
`created_by`	`text`	The user who triggered this change - default `system`

The view crashes_change_log_view provides a unioned view of the unified table change logs—this view powers the change log UI in the VZE.

Austin Fire Department (AFD) and Travis County Emergency Medical Services (EMS) (todo)

Geospatial layers

We have a number of tables which function as geospatial layers which are referenced by crashes and various other records. At the Vision Zero team's request, our team is actively working to expand the number of layers available in the database as well as add new attribute columns to crash records which will be populated based on their intersection with these layers.

See also the guidance for creating a new geospatial layer in the common maintance tasks section, below.

Table	Geometry type	description	owner/source
`council_districts`	`MultiPolygon`	City of Austin council districts	ArcGIS Online authoritative layer owned by CTM GIS
`atd_jurisdictions`	`MultiPolygon`	City of Austin jurisdictions	ArcGIS Online authoritative layer owned by CTM GIS
`engineering_areas`	`MultiPolygon`	TPW traffic engineering areas	ArcGIS Online authoritative layer owned by DTS GIS
`non_coa_roadways`	`MultiPolygon`	Polygon layer covering roadways which are not maintained by the City of Austin	ArcGIS Online authoritative layer maintained by Vision Zero GIS team
`atd_txdot_locations`	`MultiPolygon`	Aka, "location polygons", these shapes are used to group crashes based on an intersection or road segment	ArcGIS Online authoritative layer maintained by Vision Zero GIS team
`signal_engineer_areas`	`MultiPolygon`	Polygon zones assigned to traffic signal engineers	ArcGIS Online authoritative layer owned by DTS GIS
`zip_codes`	`MultiPolygon`	Polygons which represent the Zone Improvement Plan (ZIP) postal code areas in the Austin metro area	ArcGIS Online authoritative layer owned by DTS GIS
`apd_sectors`	`MultiPolygon`	Polygons which represent Austin Police Department (APD) sectors and districts used for dispatching and reporting	ArcGIS Online authoritative layer owned by APD

Common maintenance tasks

Add a new CRIS-managed column to `crashes`, `units`, or `people`

Follow these steps to add a new column to the database that will be sourced from CRIS. See PR #1546 as an example.

Remember that all database operations should be deployed through migrations. See the development and deployment docs.
Add the new column to both tables of the given record type. For example, if this is a crash-level column, add the column to the crashes_cris and crashes tables.
Modify the trigger function that inserts new rows into the unified table that corresponds to the record type you are modifying: either the crashes_cris_insert_rows(), units_cris_insert_rows(), or the people_cris_insert_rows() function. Locate the part of the function that inserts into the unified table and add your column name to end of it, then locate the part that selects all values from the new _cris record and do the same. Make sure that the order of the columns in the insert and select parts of the function match up
Next, you will need to add your new column to the _column_metadata table, so that the CRIS import ETL is aware that this column should be included in imports. For example:

insert into _column_metadata (column_name, record_type, is_imported_from_cris)
values ('drvr_lic_type_id', 'people', true);

When you're ready to test the trigger behavior, you can enable debug messaging for this trigger by executing the command set client_min_messages to debug;. This will cause the trigger debug messages to log to your SQL client.
Re-apply Hasura metadata to ensure that your new column is known to the graphql API. You do not need to modify the metadata unless you want to add select, insert, and/or update permissions to this column for non-admin users.

# ./database
hasura metadata apply

You are now ready to test your new column using the CRIS import ETL. If you need to backfill this new column for old records, you will need to manually request the necessary CRIS extract zip files so that they can be processed by an ad-hoc run of the CRIS import ETL.

Add a custom column to `crashes`, `units`, or `people`

Follow these steps to add a custom, non-CRIS column to crashes, units, or people.

Remember that all database operations should be deployed through migrations. See the development and deployment docs.
Add your new column to the unified table for the given record type. For example, if this is a crash-level column, add the column to the crashes table.
Unlike adding a CRIS-managed column, you do not need to modify any trigger functions.
Next, add your new column to the _column_metadata table and indicate that it should not be imported by CRIS.

insert into _column_metadata (column_name, record_type, is_imported_from_cris)
values ('vz_custom_column', 'crashes', false);

When you're ready to test the trigger behavior, you can enable debug messaging for this trigger by executing the command set client_min_messages to debug;. This will cause the trigger debug messages to log to your SQL client.

Adding a computed or generated field to `crashes`, `units`, or `people`

Follow these steps to add a computed or generated field to crashes, units, or people.

Remember that all database operations should be deployed through migrations. See the development and deployment docs.
Computed or generated fields should be added to the unified record table only. For example, if this is a crash-level column, add the column to the crashes tables.
If your column will be modified by a trigger function or generated column definition, be sure to inspect the table's existing triggers and generated fields and understand their execution order.
Lastly, add your new column to the _column_metadata table and indicate that it should not be imported by CRIS.

insert into _column_metadata (column_name, record_type, is_imported_from_cris)
values ('my_generated_column', 'crashes', false);

When you're ready to test the trigger behavior, you can enable debug messaging for this trigger by executing the command set client_min_messages to debug;. This will cause the trigger debug messages to log to your SQL client.

Refreshing lookup tables with the latest CRIS values

Todo: see the helper script readme.

Add a custom lookup value to a CRIS-managed lookup table (todo)

Debugging record triggers

The various record insert and update trigger functions which manage the _cris to unified table data flows have debugging statements embedded. Debug messaging can be enabled on a per-client-session basis by executing the command set client_min_messages to debug; in your SQL client. Your SQL client will now log debug messages when you use it to make record inserts and updates.

Parsing change log data

This query will help you parse the change log JSON blobs to inspect what changes have occurred. Each row returned is a single column that was updated in a change event. You can replace crashes_cris with your table of interest, and you can modify the where condition to further filter on a specific record set.

You should always use a limit to avoid long-running queries against the change logs.

WITH diffs AS (
    SELECT
        id AS change_id,
        record_id,
        created_at,
        created_by,
        column_name,
        record_json -> 'old' -> column_name AS old_value,
        record_json -> 'new' -> column_name AS new_value
    FROM
        change_log_crashes_cris,
        LATERAL jsonb_object_keys(record_json -> 'new') AS column_name
    WHERE (record_json -> 'new' -> column_name) IS DISTINCT FROM (record_json -> 'old' -> column_name)
    AND operation_type = 'UPDATE'
ORDER BY
    id DESC
)
SELECT
    *
FROM
    diffs
WHERE
    column_name NOT IN('updated_at', 'created_at', 'cr3_stored_fl', 'cr3_processed_at')
LIMIT 10000;

Creating a new geospatial layer

When creating a new database table to hold feature data, polygon geometries should always be stored in a Multipolygon column type to avoid future issues. Any columns that will be populated with AGOL feature attribute data should exactly match the column names used in AGOL, except they should be lowercase.

As a best practice, tables should always be configured with created_at of type timestamptz with default now(). Layers should make use of the ArcGIS Online Layer Helper tool so that they can be easily refreshed.

Typically, any foreign key constraint that references the layer should use the ON UPDATE SET NULL directive to ensure that the rows in the layer table can be deleted and re-inserted without being blocked by foreign references.

Backups

Daily database backups are managed via AWS RDS. Backups are retained for 14 Days.

Hasura

We deployed a standard Hasura container to work as an API between Postgres and the VZE. For more information on how it works, please refer to their website and documentation.

Development and deployment

Changes to the schema and database are handled by CI (GitHub Action workflow) that applies migrations and metadata using the Hasura CLI.

Generating migrations and metadata changes

Merge the latest code from the main branch into your feature branch to make sure you have the latest migrations that are applied to the staging database
Start up the local database and Hasura engine and replicate using the latest production data dump
To make sure your local database is up to date with the current changes in master branch, run:

hasura migrate apply --envfile .env.local
hasura metadata apply --envfile .env.local

Start the local Hasura console and make any changes needed which will then reflect in your project folder

Merging an approved feature branch

We need to check the order of migrations against those in the main branch before merging a feature branch so that we can make updates to the migration version order if needed. The version refers to the timestamp in the migration folder name.

To check migrations for any conflicts with the latest migrations in the main branch:

To check migrations for any conflicts with the latest migrations in the master branch:

Make sure that your branch is up to date with master
Check to make sure no one else is actively merging their work and coordinate if needed
Update the migration versions in your branch so they are the newest migrations if needed
Start up the local database and Hasura engine and replicate using the latest production data dump
Then, run:

hasura migrate apply --envfile .env.local
hasura metadata apply --envfile .env.local

Once we see that no errors occur when applying the sequence of migrations locally, we can merge and the CI will apply the new migrations and metadata to the staging database.

Database Schema Documentation

CI exists to automatically generate a database schema documentation file. This occurs when a PR is created and subsequently when a commit is pushed onto a PR'd branch. The CI is performed by a GitHub action which does the following:

Install the Hasura CLI for graphql-engine.
Spin up a postgres database which is initially empty.
Use the hasura CLI to deploy migrations, which build up the VZ DB
Install and use the dbdocs npm tool to generate a DBML file for the DB
Use the dbdocs tool again to upload the documentation to https://dbdocs.io.

The documentation can be found at:

Staging: https://dbdocs.io/transportation.data/Vision-Zero-Staging
Production: https://dbdocs.io/transportation.data/Vision-Zero-Production

The integration with https://dbdocs.io requires a token be generated after logging into the service locally with the dbdocs CLI tool. The token is stored in 1Password under the entry named 'DB Docs (dbdocs.io).'

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Vision Zero Database (VZD)

Data sources

TxDOT Crash Records Information System (CRIS)

Design

CRIS Extract configuration and accounts

CRIS data processing

Lookup tables

Lookup table structure and custom lookup table values

Charges records

Database IDs, CRIS record IDs, and primary keys

User-created crash records, aka "temporary" records

Audit fields

Change logs

Austin Fire Department (AFD) and Travis County Emergency Medical Services (EMS) (todo)

Geospatial layers

Common maintenance tasks

Add a new CRIS-managed column to `crashes`, `units`, or `people`

Add a custom column to `crashes`, `units`, or `people`

Adding a computed or generated field to `crashes`, `units`, or `people`

Refreshing lookup tables with the latest CRIS values

Add a custom lookup value to a CRIS-managed lookup table (todo)

Debugging record triggers

Parsing change log data

Creating a new geospatial layer

Backups

Hasura

Development and deployment

Generating migrations and metadata changes

Merging an approved feature branch

Database Schema Documentation

Files

README.md

Latest commit

History

README.md

File metadata and controls

Vision Zero Database (VZD)

Data sources

TxDOT Crash Records Information System (CRIS)

Design

CRIS Extract configuration and accounts

CRIS data processing

Lookup tables

Lookup table structure and custom lookup table values

Charges records

Database IDs, CRIS record IDs, and primary keys

User-created crash records, aka "temporary" records

Audit fields

Change logs

Austin Fire Department (AFD) and Travis County Emergency Medical Services (EMS) (todo)

Geospatial layers

Common maintenance tasks

Add a new CRIS-managed column to crashes, units, or people

Add a custom column to crashes, units, or people

Adding a computed or generated field to crashes, units, or people

Refreshing lookup tables with the latest CRIS values

Add a custom lookup value to a CRIS-managed lookup table (todo)

Debugging record triggers

Parsing change log data

Creating a new geospatial layer

Backups

Hasura

Development and deployment

Generating migrations and metadata changes

Merging an approved feature branch

Database Schema Documentation

Add a new CRIS-managed column to `crashes`, `units`, or `people`

Add a custom column to `crashes`, `units`, or `people`

Adding a computed or generated field to `crashes`, `units`, or `people`