hafas-gtfs-rt-feed

Generate a GTFS Realtime (GTFS-RT) feed by polling a HAFAS endpoint and matching the data against a GTFS Static/Schedule dataset.


Architecture

hafas-gtfs-rt-feed consists of 3 components, connected to each other via NATS Streaming channels:

  1. monitor-hafas: Given a hafas-client instance, it uses hafas-monitor-trips to poll live data about all vehicles in the configured geographic area.
  2. match-with-gtfs: Uses match-gtfs-rt-to-gtfs to match this data against static GTFS data imported into a database.
  3. serve-as-gtfs-rt: Uses gtfs-rt-differential-to-full-dataset to aggregate the matched data into a single GTFS-RT feed, and serves the feed via HTTP.

monitor-hafas sends data to match-with-gtfs via two NATS Streaming channels trips & movements; match-with-gtfs sends data to serve-as-gtfs-rt via two channels matched-trips & matched-movements.

flowchart TB
  subgraph external[ ]
    hafas(HAFAS API):::external
    db(GTFS Static/Schedule in PostgreSQL):::external
    consumers(consumers):::external
    classDef external fill:#ffd9c2,stroke:#ff8e62
  end
  style external fill:none,stroke:none
  subgraph hafas-gtfs-rt-feed
      monitor-hafas(monitor-hafas)
      match-with-gtfs(match-with-gtfs)
      serve-as-gtfs-rt(serve-as-gtfs-rt)
  end
  style hafas-gtfs-rt-feed fill:none,stroke:#9370db
    subgraph nats[NATS Streaming]
        trips[trips channel]:::channel
        movements[movements channel]:::channel
        matched-trips[matched-trips channel]:::channel
        matched-movements[matched-movements channel]:::channel
    classDef channel fill:#ffffde,stroke:#aaaa33
    end
  style nats fill:none

    hafas-- realtime data -->monitor-hafas
    db-- static data -->match-with-gtfs
    serve-as-gtfs-rt-- GTFS-RT -->consumers

    monitor-hafas .-> trips .-> match-with-gtfs
    monitor-hafas .-> movements .-> match-with-gtfs
    match-with-gtfs .-> matched-trips .-> serve-as-gtfs-rt
    match-with-gtfs .-> matched-movements .-> serve-as-gtfs-rt

Getting Started

Some preparations are necessary for hafas-gtfs-rt-feed to work. Let's get started!

Run npm init inside a new directory to initialize an empty npm-based project.

mkdir my-hafas-based-gtfs-rt-feed
cd my-hafas-based-gtfs-rt-feed
npm init

set up NATS Streaming

Install and run the NATS Streaming Server as documented.

Note: If you run NATS Streaming on a different host or port (e.g. via Docker Compose), pass a custom NATS_STREAMING_URL environment variable into all hafas-gtfs-rt-feed components.

set up PostgreSQL

Make sure you have PostgreSQL >=14 installed and running (match-gtfs-rt-to-gtfs, a dependency of this project, needs it). There are guides for many operating systems and environments available on the internet.

Note: If you run PostgreSQL on a different host or port, export the appropriate PG* environment variables. The commands mentioned below will use them.

install hafas-gtfs-rt-feed

Use the npm CLI:

npm install hafas-gtfs-rt-feed
# added 153 packages in 12s

configure a hafas-client instance

hafas-gtfs-rt-feed is agnostic to the HAFAS API it pulls data from: To fetch data, monitor-hafas just uses the hafas-client you instantiate in a file, which queries one out of many available HAFAS API endpoints.

Set up hafas-client as documented. A very basic example using the Deutsche Bahn (DB) endpoint looks as follows:

// db-hafas-client.js
const createHafasClient = require('hafas-client')
const dbProfile = require('hafas-client/p/db')

// please pick something meaningful, e.g. the URL of your GitHub repo
const userAgent = 'my-awesome-program'

// create hafas-client configured to use Deutsche Bahn's HAFAS API
const hafasClient = createHafasClient(dbProfile, userAgent)

module.exports = hafasClient

build the GTFS matching database

match-with-gtfs, hafas-gtfs-rt-feed's 2nd processing step, needs a pre-populated matching database in order to match data fetched from HAFAS against the GTFS Static/Schedule data; it uses gtfs-via-postgres and match-gtfs-rt-to-gtfs underneath to do this matching.

First, we're going to use gtfs-via-postgres's gtfs-to-sql command-line tool to import our GTFS data into PostgreSQL.

Note: Make sure you have an up-to-date static GTFS dataset, unzipped into individual .txt files.

Tip: The sponge command is from the moreutils package.

# create a PostgreSQL database `gtfs`
psql -c 'create database gtfs'
# configure all subsequent commands to use it
export PGDATABASE=gtfs
# import all .txt files
node_modules/.bin/gtfs-to-sql -d -u path/to/gtfs/files/*.txt \
    | sponge | psql -b -v 'ON_ERROR_STOP=1'

Your database gtfs should now contain the static GTFS data in a basic form.


match-gtfs-rt-to-gtfs works by matching HAFAS stops & lines against GTFS stops & lines, using their IDs and their names. Usually, HAFAS & GTFS stop/line names don't have the same format (e.g. Berlin Hbf & S+U Berlin Hauptbahnhof), so they need to be normalized.

You'll have to implement this normalization logic. A simplified (but very naive) normalization logic would look like this:

// hafas-config.js
module.exports = {
	endpointName: 'some-hafas-api',
	normalizeStopName: name => name.toLowerCase().replace(/\s+/g, ' ').trim(),
	normalizeLineName: name => name.toLowerCase().replace(/\s+/g, ' ').trim(),
}
// gtfs-config.js
module.exports = {
	endpointName: 'some-gtfs-feed',
	normalizeStopName: name => name.toLowerCase().replace(/\s+St\.$/, ''),
	normalizeLineName: name => name.toLowerCase(),
}
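As a slightly more thorough sketch of such normalization, the following plain functions handle the Berlin Hbf vs. S+U Berlin Hauptbahnhof case mentioned above. The abbreviation table and the stripped prefixes are assumptions made for illustration, not rules taken from any real HAFAS endpoint or GTFS feed, so adapt them to your data.

```javascript
// normalize-names.js
// Hypothetical helpers, not part of match-gtfs-rt-to-gtfs.

// assumed abbreviations; extend this list for your region
const ABBREVIATIONS = [
	[/\bhbf\b/g, 'hauptbahnhof'],
	[/\bbhf\b/g, 'bahnhof'],
]

const normalizeStopName = (name) => {
	// lower-case & collapse whitespace first, so the rules below see a clean string
	let n = name.toLowerCase().replace(/\s+/g, ' ').trim()
	// strip an assumed mode prefix such as "S+U ", "S " or "U "
	n = n.replace(/^(s\+u|s|u) /, '')
	for (const [pattern, replacement] of ABBREVIATIONS) {
		n = n.replace(pattern, replacement)
	}
	return n
}

// "S 7" and "S7" should end up identical
const normalizeLineName = (name) => name.toLowerCase().replace(/\s+/g, '')
```

Both functions could then be used as normalizeStopName and normalizeLineName in hafas-config.js as well as gtfs-config.js; using the same rules on both sides makes it more likely that the normalized names are equal.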

match-gtfs-rt-to-gtfs needs some special matching indices in the database to work. Now that we have implemented the names normalization logic, we're going to pass it to match-gtfs-rt-to-gtfs's build-gtfs-match-index command-line tool:

# add matching indices to the `gtfs` database
node_modules/.bin/build-gtfs-match-index path/to/hafas-config.js path/to/gtfs-config.js \
    | sponge | psql -b -v 'ON_ERROR_STOP=1'

Note: hafas-gtfs-rt-feed is data- & region-agnostic, so it depends on your HAFAS-endpoint-specific name normalization logic to match as many HAFAS trips/vehicles as possible against the GTFS data. Ideally, the stop/line names are normalized so well that HAFAS data can always be matched to the (static) GTFS data. This is how GTFS-RT feeds are intended to be consumed: alongside a (static) GTFS dataset with 100% matching IDs. If the name normalization logic doesn't handle all cases, the GTFS-RT feed will contain TripUpdates & VehiclePositions whose route_id or trip_id doesn't occur in the GTFS dataset.

run it

Now that we've set everything up, let's run all hafas-gtfs-rt-feed components to check if they are working!

All three components need to be run in parallel, so just open three terminals to run them. Remember to set the NATS_STREAMING_URL & PG* environment variables (see above) in all three of them, if necessary.

They log pino-formatted log messages to stdout, so for local development, we use pino-pretty to make them more readable.

# specify the bounding box to be monitored (required)
export BBOX='{"north": 3.3, "west": 22.2, "south": 1.1, "east": 33.3}'
# start monitor-hafas
node_modules/.bin/monitor-hafas db-hafas-client.js | npx pino-pretty
# todo: sample logs
node_modules/.bin/match-with-gtfs | npx pino-pretty
# todo: sample logs
node_modules/.bin/serve-as-gtfs-rt | npx pino-pretty

inspect the feed

Your GTFS-RT feed should now be served at http://localhost:3000/, and within a few moments, it should contain data! 👏

You can verify this using one of the many available GTFS-RT tools; here are two of them to quickly inspect the feed:

  • print-gtfs-rt-cli is a command-line tool, use it with curl: curl 'http://localhost:3000/' -sf | print-gtfs-rt.
  • gtfs-rt-inspector is a web app that can inspect any CORS-enabled GTFS-RT feed; Paste http://localhost:3000/ into the url field to inspect yours.

After monitor-hafas has fetched some data from HAFAS, and after match-with-gtfs has matched it against the GTFS data (or failed or timed out doing so), you should see TripUpdates & VehiclePositions.

Usage

metrics

All three components (monitor-hafas, match-with-gtfs, serve-as-gtfs-rt) expose Prometheus-compatible metrics via HTTP. You can fetch and process them using e.g. Prometheus, VictoriaMetrics or the Grafana Agent.

As an example, we're going to inspect monitor-hafas's metrics. Enable them by running it with a METRICS_SERVER_PORT=9323 environment variable and query its metrics via HTTP:

curl 'http://localhost:9323/metrics'
# HELP nats_streaming_sent_total nr. of messages published to NATS streaming
# TYPE nats_streaming_sent_total counter
nats_streaming_sent_total{channel="movements"} 1673
nats_streaming_sent_total{channel="trips"} 1162

# HELP hafas_reqs_total nr. of HAFAS requests
# TYPE hafas_reqs_total counter
hafas_reqs_total{call="radar"} 12
hafas_reqs_total{call="trip"} 1165

# HELP hafas_response_time_seconds HAFAS response time
# TYPE hafas_response_time_seconds summary
hafas_response_time_seconds{quantile="0.05",call="radar"} 1.0396666666666665
hafas_response_time_seconds{quantile="0.5",call="radar"} 3.8535000000000004
hafas_response_time_seconds{quantile="0.95",call="radar"} 6.833
hafas_response_time_seconds_sum{call="radar"} 338.22600000000006
hafas_response_time_seconds_count{call="radar"} 90
hafas_response_time_seconds{quantile="0.05",call="trip"} 2.4385
# …

# HELP tiles_fetched_total nr. of tiles fetched from HAFAS
# TYPE tiles_fetched_total counter
tiles_fetched_total 2

# HELP movements_fetched_total nr. of movements fetched from HAFAS
# TYPE movements_fetched_total counter
movements_fetched_total 362

# HELP fetch_all_movements_total how often all movements have been fetched
# TYPE fetch_all_movements_total counter
fetch_all_movements_total 1

# HELP fetch_all_movements_duration_seconds time that fetching all movements currently takes
# TYPE fetch_all_movements_duration_seconds gauge
fetch_all_movements_duration_seconds 2.4
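If you want to process these metrics in a script rather than with Prometheus, the text format is simple enough to parse by hand. The following is a minimal sketch of a hypothetical parseMetrics helper; it only handles the plain `name{labels} value` lines shown above, not escaped label values or timestamps.

```javascript
// parse-metrics.js
// Hypothetical helper: parses simple Prometheus text-format samples.
const parseMetrics = (text) => {
	const samples = []
	for (const rawLine of text.split('\n')) {
		const line = rawLine.trim()
		// skip blank lines and `# HELP` / `# TYPE` comments
		if (line === '' || line.startsWith('#')) continue
		const m = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$/.exec(line)
		if (!m) continue
		const labels = {}
		for (const pair of (m[2] || '').matchAll(/([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"/g)) {
			labels[pair[1]] = pair[2]
		}
		samples.push({name: m[1], labels, value: Number(m[3])})
	}
	return samples
}

// sum up all samples of a metric, regardless of their labels
const sumOf = (samples, name) => samples
	.filter(s => s.name === name)
	.reduce((sum, s) => sum + s.value, 0)
```

With the output above, `sumOf(parseMetrics(text), 'hafas_reqs_total')` would add up the radar and trip request counters.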

health check

serve-as-gtfs-rt exposes a health check that checks if there are any recent entities in the feed.

# healthy
curl 'http://localhost:3000/health' -I
# HTTP/1.1 200 OK
#

# not healthy
curl 'http://localhost:3000/health' -I
# HTTP/1.1 503 Service Unavailable
#
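For monitoring scripts, the same check can be done from Node.js. This is a minimal sketch, assuming Node.js >= 18 (for the global fetch) and the default port used above; the function names are hypothetical and only the two status codes shown above are interpreted.

```javascript
// check-health.js
// Hypothetical helpers for polling the health check of serve-as-gtfs-rt.
const interpretHealthStatus = (status) => {
	if (status === 200) return 'healthy'
	if (status === 503) return 'unhealthy'
	return 'unknown' // any other status code is not documented above
}

const checkHealth = async (url = 'http://localhost:3000/health') => {
	try {
		// HEAD request, equivalent to `curl -I`
		const res = await fetch(url, {method: 'HEAD'})
		return interpretHealthStatus(res.status)
	} catch (err) {
		// e.g. connection refused: the server could not be reached at all
		return 'unreachable'
	}
}
```

Calling `checkHealth().then(console.log)` would print one of the four states.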

on-demand mode

Optionally, you can run your GTFS-RT feed in a demand-responsive mode, where it will only fetch data from HAFAS as long as someone requests the GTFS-RT feed, which effectively reduces the long-term nr. of requests to HAFAS.

To understand how this works, remember that

  • movements fetched from HAFAS are formatted as GTFS-RT VehiclePositions.
  • trips fetched from HAFAS are formatted as GTFS-RT TripUpdates.
  • the whole monitor-hafas, match-with-gtfs & serve-as-gtfs-rt setup works like a streaming pipeline.

The on-demand mode works like this:

  • monitor-hafas is either just fetching movements (if you configured it to fetch only trips on demand) or completely idle (if you configured it to fetch both movements & trips on demand) by default.
  • monitor-hafas also subscribes to a demand NATS Streaming channel, which serves as a communication channel for serve-as-gtfs-rt to signal demand.
  • When the GTFS-RT feed is requested via HTTP,
    1. serve-as-gtfs-rt serves the current feed (which contains either VehiclePositions only, or no entities whatsoever, depending on the on-demand configuration).
    2. serve-as-gtfs-rt signals demand via the demand channel.
    3. Upon receiving a demand signal, monitor-hafas will start fetching trips – or both movements & trips, depending on the on-demand configuration.

This means that, after the first request(s) for the GTFS-RT feed have signalled demand, it will take a bit of time until all data is served with subsequent GTFS-RT feed requests; as long as there is constant demand for the feed, the on-demand mode will behave as if it isn't turned on.

Tell serve-as-gtfs-rt to signal demand via the --signal-demand option. You can then configure monitor-hafas's exact behaviour using the following options:

--movements-fetch-mode <mode>
    Control when movements are fetched from HAFAS.
    "on-demand":
        Only fetch movements from HAFAS when the `serve-as-gtfs-rt` component
        has signalled demand. Trips won't be fetched continuously anymore.
    "continuously" (default):
        Always fetch movements.
--movements-demand-duration <milliseconds>
    With `--movements-fetch-mode "on-demand"`, when the `serve-as-gtfs-rt` component
    has signalled demand, for how long shall movements be fetched?
    Default: movements fetching interval (60s by default) * 5
--trips-fetch-mode <mode>
    Control when trips are fetched from HAFAS.
    "never":
        Never fetch a movement's respective trip.
    "on-demand":
        Only fetch movements' respective trips from HAFAS when the `serve-as-gtfs-rt`
        component has signalled demand.
    "continuously" (default):
        Always fetch each movement's respective trip.
--trips-demand-duration <milliseconds>
    With `--trips-fetch-mode "on-demand"`, when the `serve-as-gtfs-rt` component
    has signalled demand, for how long shall trips be fetched?
    Default: movements fetching interval (60s by default) * 2
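To make the demand durations more tangible, the following sketch models a "demand window": every demand signal keeps fetching active for the configured duration. This is an illustrative model only, not monitor-hafas's actual implementation, and createDemandWindow is a hypothetical name.

```javascript
// demand-window.js
// Illustrative model of the on-demand behaviour, not the real implementation.
const createDemandWindow = (durationMs) => {
	let demandUntil = -Infinity // no demand signalled yet
	return {
		// call this whenever a demand signal arrives
		signal: (now = Date.now()) => {
			demandUntil = now + durationMs
		},
		// should data currently be fetched?
		isActive: (now = Date.now()) => now < demandUntil,
	}
}

// defaults as documented above, based on the 60s fetching interval
const fetchIntervalMs = 60 * 1000
const movementsDemand = createDemandWindow(fetchIntervalMs * 5)
const tripsDemand = createDemandWindow(fetchIntervalMs * 2)
```

In this model, repeated signals simply push the end of the window further into the future, which is why a constantly requested feed behaves as if fetching were continuous.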

controlling the number of requests to HAFAS

Currently, there is no mechanism to influence the total rate of requests to HAFAS directly, no prioritisation between the "find trips in a bounding box" (hafas-client's radar()) and "refresh a trip" (hafas-client's trip()) requests, and no logic to efficiently use requests up to a certain configured limit.

However, there are some dials to influence the number of requests of both types:

  • By defining a smaller or larger bounding box via the BBOX environment variable, you can control the total number of monitored trips, and thus the rate of requests.
  • By setting FETCH_TILES_INTERVAL, you can choose how often the bounding box (or the vehicles within, rather) shall be refreshed, and subsequently how often each trip will be fetched if you have configured that. Note that if a refresh takes longer than the configured interval, another refresh will follow right after, but the total rate of radar() requests to HAFAS will be lower.
  • You can throttle the total number of requests to HAFAS by throttling hafas-client, but depending on the rate you configure, this might cause the refresh of all monitored trips (as well as finding new trips to monitor) to take longer than configured using FETCH_TRIPS_INTERVAL, so consider it as a secondary tool.
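To get a feeling for how these dials interact, a back-of-the-envelope estimate can help. The sketch below rests on a simplifying assumption that is not taken from the implementation: each refresh cycle issues one radar() request per tile and, if trips are fetched, one trip() request per monitored trip. It also assumes that every cycle finishes within the configured interval; otherwise the real rate will be lower.

```javascript
// estimate-requests.js
// Rough upper-bound estimate under the assumptions stated above.
const estimateRequestsPerMinute = ({tiles, trips, intervalMs, fetchTrips = true}) => {
	// assumed nr. of requests per refresh cycle
	const perCycle = tiles + (fetchTrips ? trips : 0)
	// nr. of refresh cycles per minute
	const cyclesPerMinute = (60 * 1000) / intervalMs
	return perCycle * cyclesPerMinute
}

console.log(estimateRequestsPerMinute({tiles: 2, trips: 360, intervalMs: 60 * 1000}))
```

Doubling the interval halves the estimate, and a smaller bounding box lowers both the number of tiles and the number of monitored trips.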

exposing feed metadata

If you pass metadata about the GTFS-Static feed used, serve-as-gtfs-rt will expose it via HTTP:

serve-as-gtfs-rt \
	--feed-info path/to/gtfs/files/feed_info.txt \
	--feed-url https://data.ndovloket.nl/flixbus/flixbus-eu.zip

curl 'http://localhost:3000/feed_info.csv'
# feed_publisher_name,feed_publisher_url,feed_lang,feed_start_date,feed_end_date,feed_version
# openOV,http://openov.nl,en,20210108,20210221,20210108

curl 'http://localhost:3000/feed_info.csv' -I
# HTTP/1.1 302 Found
# location: https://data.ndovloket.nl/flixbus/flixbus-eu.zip

Related projects

There are several projects making use of hafas-gtfs-rt-feed.

License

This project is dual-licensed: My contributions are licensed under the Prosperity Public License, contributions of other people are licensed as Apache 2.0.

This license allows you to use and share this software for noncommercial purposes for free and to try this software for commercial purposes for thirty days.

Personal use for research, experiment, and testing for the benefit of public knowledge, personal study, private entertainment, hobby projects, amateur pursuits, or religious observance, without any anticipated commercial application, doesn’t count as use for a commercial purpose.

Get in touch with me to buy a commercial license or read more about why I sell private licenses for my projects.

Contributing

By contributing, you agree to release your modifications under the Apache 2.0 license.