
The Queue Infrastructure comprises a series of message queues with an external HTTP endpoint for receiving messages from buses. The message queues include:

  • A realtime queue that forwards the real-time messages to the Inference Engines
  • An inference queue that forwards inference messages to consumers such as the Transit Data Service
  • A time prediction queue that receives messages from an external source and forwards messages to the Transit Data Service.

Why a Queue?

Enterprise Queues:

  • scale effortlessly
  • are a high-performance alternative to a database, especially for real-time data
  • provide a convenient model to decouple layers and components

At least three places in the architecture can benefit from the decoupling that a message-based pattern provides:

  1. Communication from the buses to the Inference Engine (Realtime Queue)
  2. Communication from the Inference Engine to the Transit Data Service (Inference and TDS Queues)
  3. Communication from the buses and Inference Engine to the Reporting/Archiving repository (Realtime and TDS Queues)

Diagrams

Data Flow

Queue Data Flow

The above diagram shows the flow of queue data through the system. The Inference Output Queue and TDS Input Queue are two physical ports representing one logical queue. The Inference Engines publish to the address of the queue broker, and the TDS instances subscribe to the queue broker, decoupling the Inference Engines from the TDS.
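
As a rough illustration of the publishing side of this pattern, the sketch below shows a ZeroMQ publisher connecting to the broker and sending a two-part message (topic, then JSON payload). The hostname, port, topic name, and payload are placeholders, not the production values; the real components obtain the broker endpoint from the configuration service.

```java
import org.zeromq.ZMQ;

public class PublisherSketch {
    public static void main(String[] args) {
        ZMQ.Context context = ZMQ.context(1);
        ZMQ.Socket publisher = context.socket(ZMQ.PUB);

        // Placeholder broker address; production values come from the configuration service.
        publisher.connect("tcp://queue-broker.example.org:5566");

        // Each message is a two-part frame: a topic, then the JSON payload.
        // A real publisher stays connected and sends continuously.
        publisher.sendMore("inference");
        publisher.send("{\"vehicleId\": 123}".getBytes(), 0);

        publisher.close();
        context.term();
    }
}
```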

Hamburger

Hamburger

This diagram is a convenient way to visualize how the publish-subscribe model works at each component. A proposal to make the Inference Engine stateless and load-balanced is included in this PowerPoint (PDF) document.

Networking Diagram

Queue VPC

This diagram shows one possible implementation of the queue infrastructure using Amazon VPC. It provides for secure communication between the cellular provider and the HTTP Queue Proxy component. This configuration has been partially tested but is not currently used in production.

Design

Four modules make up the core queue infrastructure for the Bus CIS Server, each providing a utility layer on top of ZeroMQ's Java bindings, available on github.com:

  • onebusaway-nyc-queue-subscriber - classes that aid in publishing and subscribing to the queue
  • onebusaway-nyc-queue-realtime - model classes shared by other modules
  • onebusaway-nyc-queue-broker - simple queue broker
  • onebusaway-nyc-queue-http-proxy - converts HTTP POSTs to queue messages

Each module is described briefly below.

onebusaway-nyc-queue-subscriber

This module provides classes with convenience functions that abstract the communication details of the publish-subscribe queue. They are core classes for the http-proxy and for other components that publish to or subscribe from the queue, such as the vehicle-tracking-webapp.
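
A minimal sketch of the subscribing side that these classes abstract, assuming a hypothetical broker endpoint and topic; the real classes add connection management and hand received messages off to listener callbacks.

```java
import org.zeromq.ZMQ;

public class SubscriberSketch {
    public static void main(String[] args) {
        ZMQ.Context context = ZMQ.context(1);
        ZMQ.Socket subscriber = context.socket(ZMQ.SUB);

        // Hypothetical broker endpoint and topic; production values come from
        // the configuration service.
        subscriber.connect("tcp://queue-broker.example.org:5567");
        subscriber.subscribe("inference".getBytes());

        while (!Thread.currentThread().isInterrupted()) {
            String topic = new String(subscriber.recv(0));
            String payload = new String(subscriber.recv(0)); // JSON envelope
            // A real subscriber would deserialize the payload and pass it to a listener.
            System.out.println(topic + ": " + payload);
        }

        subscriber.close();
        context.term();
    }
}
```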

onebusaway-nyc-queue-realtime

This module exists to resolve a circular dependency between the queue-http-proxy and the queue-subscriber. Its only class is RealtimeEnvelope, the JSON envelope in which the real-time TCIP-JSON bus data is wrapped.

  • RealtimeEnvelope - A wrapper around the CcLocationReport TCIP-JSON real-time message. This wrapper provides storage for (1) a synthetic id (UUID) that allows the message to be tracked throughout the system, (2) a timestamp (timeReceived) recording when the real-time message was received and hence when the wrapper was created, and (3) the real-time message itself (ccLocationReport).
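
An illustrative sketch of the envelope's shape is shown below; field and accessor names follow the description above but are assumptions, and the real class wraps the generated CcLocationReport type rather than a raw JSON string.

```java
import java.io.Serializable;

public class RealtimeEnvelopeSketch implements Serializable {
    private String uuid;             // (1) synthetic id used to trace the message through the system
    private long timeReceived;       // (2) epoch milliseconds when the real-time message was received
    private String ccLocationReport; // (3) the TCIP-JSON message (a typed CcLocationReport in the real class)

    public String getUuid() { return uuid; }
    public void setUuid(String uuid) { this.uuid = uuid; }

    public long getTimeReceived() { return timeReceived; }
    public void setTimeReceived(long timeReceived) { this.timeReceived = timeReceived; }

    public String getCcLocationReport() { return ccLocationReport; }
    public void setCcLocationReport(String ccLocationReport) { this.ccLocationReport = ccLocationReport; }
}
```
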
onebusaway-nyc-queue-broker

This module provides an enterprise queue broker that acts as a data sink (subscriber) for multiple publishers and a data pump (publisher) for multiple subscribers. It solves the problem of having N publishers know about M subscribers.

  • SimpleBroker - A thin wrapper around ZeroMQ's Forwarder (Broker) appliance. The wrapper exposes parameters for per-environment configuration, with sane defaults.
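
The following is a minimal forwarder sketch in the spirit of SimpleBroker, with illustrative bind ports rather than the configured ones; it uses ZMQ.proxy, which in recent ZeroMQ Java bindings plays the role of the older Forwarder device.

```java
import org.zeromq.ZMQ;

public class BrokerSketch {
    public static void main(String[] args) {
        ZMQ.Context context = ZMQ.context(1);

        // Frontend: publishers (e.g. the HTTP proxy or inference engines) connect here.
        ZMQ.Socket frontend = context.socket(ZMQ.SUB);
        frontend.bind("tcp://*:5566");     // port is illustrative
        frontend.subscribe("".getBytes()); // forward every topic

        // Backend: subscribers (e.g. the TDS instances) connect here.
        ZMQ.Socket backend = context.socket(ZMQ.PUB);
        backend.bind("tcp://*:5567");      // port is illustrative

        // Shovel messages from frontend to backend until the process is stopped.
        ZMQ.proxy(frontend, backend, null);

        frontend.close();
        backend.close();
        context.term();
    }
}
```
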
onebusaway-nyc-queue-http-proxy

This module provides a proxy that takes the body of an HTTP POST and converts it to a queue message. It assumes the message payload is JSON, which it wraps with a synthetic id and a timestamp.

  • BHSListenerServlet - the proxy itself, implemented as a Java Servlet
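
A rough sketch of what such a servlet does follows, assuming a publish helper backed by the queue-subscriber classes is available; the class name, envelope field names, and helper method here are illustrative, not the actual BHSListenerServlet implementation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.UUID;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ListenerServletSketch extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        // Read the raw TCIP-JSON body of the POST.
        StringBuilder body = new StringBuilder();
        BufferedReader reader = request.getReader();
        String line;
        while ((line = reader.readLine()) != null) {
            body.append(line);
        }

        // Wrap the payload with the synthetic id and receive timestamp described above.
        // Field names are illustrative of the envelope, not the exact wire format.
        String envelope = "{\"RealtimeEnvelope\":{"
                + "\"UUID\":\"" + UUID.randomUUID() + "\","
                + "\"timeReceived\":" + System.currentTimeMillis() + ","
                + "\"CcLocationReport\":" + body + "}}";

        publish(envelope);
        response.setStatus(HttpServletResponse.SC_OK);
    }

    // Stand-in for the real publishing path, which uses the
    // onebusaway-nyc-queue-subscriber classes to send the message to the broker.
    private void publish(String message) {
        // e.g. publisher.sendMore(topic); publisher.send(message.getBytes(), 0);
    }
}
```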

High Availability

Having a single broker at the center of the queue infrastructure creates a single point of failure. This is mitigated by running primary and secondary queue servers. Both servers are up and ready to transmit; the only distinction is that the primary is actively handling queue messages because it is the endpoint of the queue server DNS name specified by the configuration service. The secondary server is a hot spare available for immediate failover.
