Public Temporal Streaming Data Service Framework
River View is a Public Temporal Streaming Data Service Framework (yes, that's a mouthful!). It provides a pluggable interface for users to expose temporal data streams in a time-boxed format that is easily query-able. It was built to provide a longer-lasting historical window for public data sources that provide only real-time data snapshots, especially for sensor data from public government services like weather, traffic, and geological data.
River View fetches data from user-defined Rivers at regular intervals, populating a local Redis database. This data is provided in a windowed format, so data older than a configured age is dropped, but the window should be large enough to supply sufficient history to potentially train machine intelligence models on the data patterns within it.
Watch this short video for a quick introduction to River View.
See online documentation at http://nupic-community.github.io/river-view/.
You must have a Redis instance available. The URL to the instance should be set in an environment variable called `REDIS_URL`, something like:

```
export REDIS_URL=redis://127.0.0.1:6379
```
You may use authentication in the Redis URL string:

```
export REDIS_URL=redis://username:password@hostname:port
```
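As an optional sanity check before starting the service, you can confirm the instance is reachable (this assumes `redis-cli` is installed and new enough to support the `-u` URI option; it is not part of River View itself):

```
redis-cli -u "$REDIS_URL" ping   # should print PONG
```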
A River is a pluggable collection of public data Streams gathered from one or more origins and collected in a query-able temporary temporal pool. Rivers are declared within the `rivers` directory, and consist of:

- a namespace, which is assumed based upon the directory name of the data source within the `rivers` directory
- a YAML configuration file (sketched after this list), containing:
  - one or more external URLs where the data is collected, which are public and accessible without authentication
  - the interval at which the data source will be queried
  - when the data should expire
- a JavaScript parser module that is passed the body of an HTTP call to the aforementioned URL(s), which is expected to parse it and return a temporal object representation of the data (also sketched after this list).
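For illustration, here is a minimal sketch of what a River's `config.yml` might look like. The key names and value formats below are assumptions made for this example, not River View's actual configuration schema; see the Creating a River wiki page for the real conventions.

```yaml
# Illustrative sketch only -- key names and formats are assumptions, not the real schema.
name: example-water-levels            # human-readable River name
description: Water level sensors for an example region
sources:
  - http://example.com/api/water-levels.json   # public URL, no authentication required
interval: 10 minutes                  # how often the source URL is fetched
expires: 6 months                     # how long data stays in the temporal window
type: scalar                          # primary data type (see the types described below)
fields:
  - level                             # data fields shared by every stream in this River
```

And a sketch of the companion parser module. The exported function's name, signature, and return shape are likewise assumptions for the sake of the example, not River View's actual parser interface:

```js
// Illustrative parser sketch -- the signature and return shape are assumptions,
// not River View's actual parser interface.
// Receives the body of an HTTP response from a configured source URL and
// returns rows grouped by stream ID, each row carrying a timestamp.
module.exports = function parse(body) {
    var readings = JSON.parse(body);
    var streams = {};
    readings.forEach(function (reading) {
        var streamId = reading.sensorId;       // unique ID for each sensor/stream
        streams[streamId] = streams[streamId] || [];
        streams[streamId].push({
            timestamp: reading.time,           // every row must have a timestamp
            level: reading.level               // fields are defined at the River level
        });
    });
    return streams;
};
```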
Each River may produce one or many Streams of data, each collecting like data items over time. Each stream must have a unique ID, but all streams must use the same data schema (fields and meta data are defined at the River level).
For example, a city traffic data source may produce data streams for many traffic paths within the city, each identified with a unique stream ID. A US state water level data source might have unique sources for each water level sensor in the state, each with a unique stream ID.
All river streams must have a timestamp for each row of data. Other than that, they might have different primary types of data, as described below:
- scalar: integer or float values
- geospatial: latitude / longitude (floats)
- categorical: string values
The data streams will be presented differently, both in JSON and HTML, depending on the type specified in the `config.yml` file.
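To make the three types concrete, here is a purely illustrative sketch of a single data row for each type; the field names are invented for this example and are not prescribed by River View:

```js
// Invented field names; only the timestamp requirement and the value kinds
// (numeric, lat/lon floats, strings) reflect the descriptions above.
var scalarRow      = { timestamp: 1438716000, level: 4.2 };                           // integer or float values
var geospatialRow  = { timestamp: 1438716000, latitude: 45.52, longitude: -122.68 };  // latitude / longitude
var categoricalRow = { timestamp: 1438716000, status: 'CONGESTED' };                  // string values

console.log(scalarRow, geospatialRow, categoricalRow);
```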
Please see Creating a River in our wiki.
In addition to collecting and storing data from Rivers, River View runs a simple HTTP API for reading the data, active on startup. It returns HTML, JSON, and (in some cases) CSV data for each configured River.
URL | Description |
---|---|
`/index.[html \| json]` | List of all configured Rivers |
`/<river name>/props.[html \| json]` | Properties of a River |
`/<river name>/keys.[html \| json]` | Stream keys within a River |
`/<river name>/<stream id>/data.[html \| json \| csv]` | Data for one stream |
`/<river name>/<stream id>/meta.[html \| json]` | Metadata for one stream |
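As a usage sketch with the service running locally, the endpoints can be queried with curl. The host, port, River name, and stream ID below are all made-up placeholders, not real endpoints:

```
# Placeholders throughout -- substitute your own host, port, River, and stream.
curl http://localhost:8080/index.json                        # list configured Rivers
curl http://localhost:8080/some-river/keys.json              # list stream keys in one River
curl http://localhost:8080/some-river/some-stream/data.csv   # fetch one stream's data as CSV
```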
OS X has some weird built-in behaviors regarding the maximum number of open file descriptors. River View needs the system to handle around 1024 open descriptors to actually start up, so if you run into any sort of file-can't-be-opened errors, check that you have an appropriate maximum by running `ulimit -n`. If this number is less than 1024, you'll need to update it.
```
sudo launchctl limit maxfiles 1024 unlimited
```
This updates the maximum number of open file descriptors your Mac will allow. This number is not persistent across reboots. To make it persistent, add `limit maxfiles 1024 unlimited` to `/etc/launchd.conf`.
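For example, you could append the line like this (or simply edit the file with sudo in any editor):

```
echo "limit maxfiles 1024 unlimited" | sudo tee -a /etc/launchd.conf
```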
```
ulimit -n 1024
```
This updates the current shell you're in to be able to make use of all those file descriptors.