Stage 2b: Output Data Format

Dan Bramich edited this page Jan 9, 2022 · 13 revisions

The data on the loop detector locations, and the corresponding (link) roads where they are located, are stored in FITS binary table files with names of the form:

<config.output_dir>/s2b.Loop.Detector.Locations/detectors.<DATA_SOURCE>.<COUNTRY>.<CITY>.fits

where DATA_SOURCE is one of 'LD.Flow.LD.Occupancy' or 'LD.Flow.LD.Occupancy.LD.Speed'. There are 25 files, with a total of 10,150 rows, each corresponding to a unique loop detector. In each file, the rows are sorted by LATITUDE. Loop detectors were rejected at stage 2b for the following reasons:

(i) The flow-occupancy empirical fundamental diagram exhibits two or more clearly defined and separate branches with different gradients in the free-flow regime (426 loop detectors rejected)

(ii) The flow-occupancy empirical fundamental diagram exhibits a horizontal branch of points (i.e. flow independent of occupancy) at very low (or zero) flow, separate from the free-flow branch, that covers a significant fraction of the occupancy range (e.g. more than a quarter; 267 loop detectors rejected)

(iii) The flow-occupancy empirical fundamental diagram shows a near-uniform random distribution of points with at most some vague structure discernible to the eye (57 loop detectors rejected)

(iv) The flow-occupancy empirical fundamental diagram shows clear straight-line features due to interpolation of bad data over too many time intervals (164 loop detectors rejected; this reason alone accounts for the rejection of all loop detectors for strasbourg)

(v) The flow-occupancy empirical fundamental diagram exhibits one or more repeated versions of the main relationship that have been shifted to the right along the occupancy axis (293 loop detectors rejected)

(vi) The flow-occupancy empirical fundamental diagram exhibits one or more repeated versions of the main relationship that have been shifted up the flow axis (18 loop detectors rejected)

(vii) The data points in the flow-occupancy empirical fundamental diagram appear to trace out loops, indicating very strong temporal correlations that are caused by interpolation over large data gaps (202 loop detectors rejected; this reason alone accounts for the rejection of all loop detectors for losangeles)

The columns in each file are as follows:

DETECTOR_ID:

- STRING

- Loop detector ID

- Made up exclusively from characters in the set: {'_', '-', '+', '0', ..., '9', 'a', ..., 'z', 'A', ..., 'Z'}

- All values are unique
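The two constraints on DETECTOR_ID (restricted character set, uniqueness) can be checked with a short sketch like the one below. The function names are hypothetical, and the requirement that an ID be non-empty is an assumption that the column description does not state.

```python
import re

# Character set quoted in the column description:
# '_', '-', '+', the digits 0-9, and the ASCII letters a-z and A-Z.
DETECTOR_ID_PATTERN = re.compile(r"[A-Za-z0-9_+\-]+")


def is_valid_detector_id(detector_id: str) -> bool:
    """Return True if the ID uses only the documented characters.

    Assumption: an empty string is treated as invalid.
    """
    return DETECTOR_ID_PATTERN.fullmatch(detector_id) is not None


def find_duplicate_ids(detector_ids):
    """Return the sorted list of IDs that occur more than once."""
    seen, duplicates = set(), set()
    for detector_id in detector_ids:
        if detector_id in seen:
            duplicates.add(detector_id)
        seen.add(detector_id)
    return sorted(duplicates)
```

For a file that satisfies the description, `find_duplicate_ids` should return an empty list when applied to its DETECTOR_ID column.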

LONGITUDE:

- FLOAT64

- Longitude (deg; WGS84) of the loop detector

- All values are in the range -180.0 to 180.0 inclusive with no bad values

LATITUDE:

- FLOAT64

- Latitude (deg; WGS84) of the loop detector

- All values are in the range -90.0 to 90.0 inclusive with no bad values

- The values are sorted into ascending order

LENGTH:

- FLOAT64

- Length (km) of the (link) road on which the loop detector is located

- All values are positive with no bad values

POSITION:

- FLOAT64

- Loop detector location as a ratio of the distance from the downstream intersection to the length of the (link) road on which the loop detector is located

- All values are in the range 0.0 to 1.0 inclusive with no bad values
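Since POSITION is a ratio relative to the downstream intersection, multiplying it by LENGTH gives the distance of the detector from that intersection in km. The following is a minimal sketch of that arithmetic; the function names are hypothetical.

```python
def distance_from_downstream_km(length_km: float, position: float) -> float:
    """Distance (km) from the downstream intersection to the detector.

    length_km is the LENGTH column and position is the POSITION column.
    """
    if length_km <= 0.0:
        raise ValueError("LENGTH must be positive")
    if not 0.0 <= position <= 1.0:
        raise ValueError("POSITION must lie in [0.0, 1.0]")
    return position * length_km


def distance_from_upstream_km(length_km: float, position: float) -> float:
    """Distance (km) from the upstream end of the link to the detector."""
    return length_km - distance_from_downstream_km(length_km, position)
```

For example, a detector with LENGTH = 2.0 and POSITION = 0.25 sits 0.5 km from the downstream intersection and 1.5 km from the upstream end of the link.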

ROAD_CLASS:

- STRING

- Classification of the (link) road on which the loop detector is located (from OpenStreetMap)

- All values are from the set {'living_street', 'motorway', 'motorway_link', 'primary', 'primary_link', 'residential', 'secondary', 'secondary_link', 'service', 'tertiary', 'tertiary_link', 'trunk', 'trunk_link', 'unclassified'} with no bad values

SPEED_LIMIT:

- FLOAT64

- Speed limit (km/h) of the (link) road on which the loop detector is located (from OpenStreetMap)

- Good values are positive, while bad values are -1.0

NLANES:

- INT32

- Number of lanes covered by the loop detector

- All values are positive with no bad values

LINK_ID:

- INT32

- An ID number for the corresponding link road

- Good values are non-negative, while bad values are -1

LINK_PTS_LONGITUDE:

- FLOAT64 100-element vector

- Longitudes (deg; WGS84) of the points mapping out the corresponding link road

- All values are in the range -180.0 to 180.0 inclusive with no bad values

LINK_PTS_LATITUDE:

- FLOAT64 100-element vector

- Latitudes (deg; WGS84) of the points mapping out the corresponding link road

- All values are in the range -90.0 to 90.0 inclusive with no bad values

LINK_PTS_FLAG:

- INT32 100-element vector

- Flags indicating which points map out the corresponding link road

- All values are 0 (ignore) or 1 (point on the link road)
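The three LINK_PTS_* columns are fixed-length 100-element vectors, so the usable points of a link road are those whose LINK_PTS_FLAG entry is 1. The sketch below extracts them from plain Python sequences. The function name is hypothetical, and it assumes (this is not stated above) that the points appear in order along the road.

```python
def link_polyline(longitudes, latitudes, flags):
    """Return the (longitude, latitude) pairs whose flag is 1, in stored order.

    The arguments are one row's LINK_PTS_LONGITUDE, LINK_PTS_LATITUDE and
    LINK_PTS_FLAG vectors. Entries with flag 0 are ignored.
    """
    if not (len(longitudes) == len(latitudes) == len(flags)):
        raise ValueError("the three LINK_PTS_* vectors must have equal length")
    return [
        (lon, lat)
        for lon, lat, flag in zip(longitudes, latitudes, flags)
        if flag == 1
    ]


# Illustrative values only: three real points followed by 97 ignored entries.
example_lons = [8.50, 8.51, 8.52] + [0.0] * 97
example_lats = [47.30, 47.31, 47.32] + [0.0] * 97
example_flags = [1, 1, 1] + [0] * 97
example_points = link_polyline(example_lons, example_lats, example_flags)
```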

The data on the loop detector measurements (raw) are stored in FITS binary table files with names of the form:

<config.output_dir>/s2b.Loop.Detector.Measurements.Raw/<DATA_SOURCE>/<COUNTRY>/<CITY>/<DETECTOR_ID>/measurements.raw.<DATA_SOURCE>.<COUNTRY>.<CITY>.<DETECTOR_ID>.fits

where DATA_SOURCE is one of 'LD.Flow.LD.Occupancy' or 'LD.Flow.LD.Occupancy.LD.Speed'. There are 10,150 files, with a total of 168,183,675 rows (all unique), of which 147,034,646 rows correspond to good measurements. For a specific "CITY + DETECTOR_ID" (i.e. for a specific file), any rows with duplicated "DATE + INTERVAL_START" entries are flagged with "ERROR_FLAG = 1". In each file, the rows are sorted by DATE and then INTERVAL_START. The columns in each file are as follows:

DATE:

- STRING

- Date (YYYY-MM-DD) on which the measurement was taken (local time)

- All values are valid dates with no bad values

- The values are sorted into ascending order

INTERVAL_START:

- INT32

- Time at the start of the aggregation time-interval (seconds after midnight local time)

- All values are in the range 0 to 86400 inclusive with no bad values

- For any specific DATE value, the INTERVAL_START values are sorted into ascending order
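DATE and INTERVAL_START together identify the start of an aggregation interval in local time. A minimal sketch of combining them into a single timestamp follows. The function name is hypothetical, and the result is a naive `datetime` with no time zone attached, so daylight-saving transitions are not handled.

```python
from datetime import datetime, timedelta


def interval_start_datetime(date: str, interval_start: int) -> datetime:
    """Combine DATE (YYYY-MM-DD) with INTERVAL_START (seconds after midnight).

    Returns a naive local-time datetime. An INTERVAL_START of 86400, which the
    documented range allows, maps to midnight at the start of the next day.
    """
    if not 0 <= interval_start <= 86400:
        raise ValueError("INTERVAL_START must lie in [0, 86400]")
    midnight = datetime.strptime(date, "%Y-%m-%d")
    return midnight + timedelta(seconds=interval_start)
```

Because DATE uses the YYYY-MM-DD form, sorting rows by the tuple (DATE, INTERVAL_START) gives the same order as sorting by the combined timestamp.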

FLOW:

- FLOAT64

- Vehicle count in the aggregation time-interval scaled to 1 hour (veh/hour; flow)

- Good values are non-negative, while bad values are -1.0 (flagged with ERROR_FLAG = 1)
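Because FLOW is a count scaled to one hour, the underlying vehicle count can be recovered by scaling back to the aggregation interval. The sketch below shows that arithmetic. The interval length is not given in the file description above, so `interval_seconds` is an assumed input that the caller must supply, and the function name is hypothetical.

```python
def vehicle_count(flow_veh_per_hour: float, interval_seconds: float):
    """Convert FLOW (veh/hour) back to a vehicle count for one interval.

    Returns None for the documented bad-value sentinel of -1.0.
    interval_seconds is an assumption supplied by the caller.
    """
    if flow_veh_per_hour == -1.0:
        return None
    if flow_veh_per_hour < 0.0:
        raise ValueError("good FLOW values are non-negative")
    if interval_seconds <= 0.0:
        raise ValueError("interval_seconds must be positive")
    return flow_veh_per_hour * interval_seconds / 3600.0
```

For example, with a hypothetical 300-second interval, a FLOW of 720.0 veh/hour corresponds to 60 vehicles counted in that interval.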

OCCUPANCY:

- FLOAT64

- Fraction of time in the aggregation time-interval that the loop detector is occupied (occupancy)

- Good values are in the range 0.0 to 1.0 inclusive, while bad values are -1.0 (flagged with ERROR_FLAG = 1)

- N.B. Positive flow or speed measurements at zero occupancy are not flagged as errors

ERROR_FLAG:

- INT32

- Flag indicating an error

- All values are 0 (no error) or 1 (error)

SPEED:

- FLOAT64

- Mean vehicle speed (km/h) in the aggregation time-interval

- Good values are non-negative, while bad values are -1.0 (flagged with ERROR_FLAG = 1)

- This column is only present if DATA_SOURCE = 'LD.Flow.LD.Occupancy.LD.Speed'
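A common first step with the raw measurements is to keep only the rows with ERROR_FLAG = 0, while allowing for the SPEED column being absent. The sketch below works on rows represented as plain dictionaries keyed by column name, which is an illustrative choice and not the on-disk format. The function names and sample values are hypothetical.

```python
def good_rows(rows):
    """Keep only the rows that are not flagged as errors."""
    return [row for row in rows if row["ERROR_FLAG"] == 0]


def mean_speed(rows):
    """Mean SPEED (km/h) over good rows.

    Returns None if the SPEED column is absent or there are no good rows.
    """
    speeds = [row["SPEED"] for row in good_rows(rows) if "SPEED" in row]
    return sum(speeds) / len(speeds) if speeds else None


# Illustrative rows only; the middle row carries the -1.0 bad-value sentinels.
sample_rows = [
    {"DATE": "2017-05-03", "INTERVAL_START": 0, "FLOW": 120.0,
     "OCCUPANCY": 0.02, "ERROR_FLAG": 0, "SPEED": 48.0},
    {"DATE": "2017-05-03", "INTERVAL_START": 300, "FLOW": -1.0,
     "OCCUPANCY": -1.0, "ERROR_FLAG": 1, "SPEED": -1.0},
    {"DATE": "2017-05-03", "INTERVAL_START": 600, "FLOW": 240.0,
     "OCCUPANCY": 0.05, "ERROR_FLAG": 0, "SPEED": 52.0},
]
```

Filtering on ERROR_FLAG rather than on the -1.0 sentinels also drops rows flagged for duplicated "DATE + INTERVAL_START" entries, whose values may otherwise look valid.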

The data on the loop detector measurements (ARIMA) are stored in FITS binary table files with names of the form:

<config.output_dir>/s2b.Loop.Detector.Measurements.ARIMA/<DATA_SOURCE>/<COUNTRY>/<CITY>/<DETECTOR_ID>/measurements.ARIMA.<DATA_SOURCE>.<COUNTRY>.<CITY>.<DETECTOR_ID>.fits

where DATA_SOURCE is 'LD.Flow.LD.Occupancy'. There are 9,328 files, with a total of 73,993,449 rows (all unique), of which 71,384,773 rows correspond to good measurements. For a specific "CITY + DETECTOR_ID" (i.e. for a specific file), any rows with duplicated "DATE + INTERVAL_START" entries are flagged with "ERROR_FLAG = 1". In each file, the rows are sorted by DATE and then INTERVAL_START. The columns in each file are as follows:

DATE:

- STRING

- Date (YYYY-MM-DD) on which the measurement was taken (local time)

- All values are valid dates with no bad values

- The values are sorted into ascending order

INTERVAL_START:

- INT32

- Time at the start of the aggregation time-interval (seconds after midnight local time)

- All values are in the range 0 to 86400 inclusive with no bad values

- For any specific DATE value, the INTERVAL_START values are sorted into ascending order

FLOW:

- FLOAT64

- Vehicle count in the aggregation time-interval scaled to 1 hour (veh/hour; flow)

- Good values are non-negative, while bad values are -1.0 (flagged with ERROR_FLAG = 1)

OCCUPANCY:

- FLOAT64

- Fraction of time in the aggregation time-interval that the loop detector is occupied (occupancy)

- Good values are in the range 0.0 to 1.0 inclusive, while bad values are -1.0 (flagged with ERROR_FLAG = 1)

- N.B. Positive flow measurements at zero occupancy are not flagged as errors

ERROR_FLAG:

- INT32

- Flag indicating an error

- All values are 0 (no error) or 1 (error)

ARIMA_FLOW:

- FLOAT64

- ARIMA-smoothed flow (veh/hour)

- Good values are non-negative, while bad values are -1.0 (flagged with ERROR_FLAG = 1)

ARIMA_OCCUPANCY:

- FLOAT64

- ARIMA-smoothed occupancy

- Good values are in the range 0.0 to 1.0 inclusive, while bad values are -1.0 (flagged with ERROR_FLAG = 1)
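Since each ARIMA file carries both the original and the ARIMA-smoothed values, the two can be compared row by row. The sketch below computes the mean absolute difference between FLOW and ARIMA_FLOW over rows with ERROR_FLAG = 0. Rows are again represented as plain dictionaries for illustration, and the function name and sample values are hypothetical.

```python
def mean_abs_flow_difference(rows):
    """Mean of |ARIMA_FLOW - FLOW| (veh/hour) over rows with ERROR_FLAG == 0.

    Returns None if there are no such rows.
    """
    diffs = [
        abs(row["ARIMA_FLOW"] - row["FLOW"])
        for row in rows
        if row["ERROR_FLAG"] == 0
    ]
    return sum(diffs) / len(diffs) if diffs else None


# Illustrative rows only; the last row is flagged and therefore skipped.
arima_rows = [
    {"FLOW": 100.0, "ARIMA_FLOW": 110.0, "ERROR_FLAG": 0},
    {"FLOW": 200.0, "ARIMA_FLOW": 190.0, "ERROR_FLAG": 0},
    {"FLOW": -1.0, "ARIMA_FLOW": -1.0, "ERROR_FLAG": 1},
]
```

The same pattern applies to OCCUPANCY and ARIMA_OCCUPANCY.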

Stage 2b creates two summary statistics files:

<config.output_dir>/s2b.Loop.Detector.Locations/summary.statistics.of.detectors.per.city.txt

<config.output_dir>/s2b.Loop.Detector.Locations/summary.statistics.of.measurements.per.city.detector.txt
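The file name patterns given above for the locations and measurements files can be assembled programmatically. The sketch below builds the paths with `pathlib`; the function names are hypothetical, and any concrete country, city, and detector values passed in are placeholders. The optional `read_table` helper assumes the third-party astropy package is installed and uses its `Table.read` function to load a FITS binary table.

```python
from pathlib import Path


def locations_path(output_dir, data_source, country, city) -> Path:
    """Path of the detector locations file for one city."""
    filename = f"detectors.{data_source}.{country}.{city}.fits"
    return Path(output_dir) / "s2b.Loop.Detector.Locations" / filename


def measurements_path(output_dir, kind, data_source, country, city,
                      detector_id) -> Path:
    """Path of a measurements file; kind is 'Raw' or 'ARIMA'."""
    if kind not in ("Raw", "ARIMA"):
        raise ValueError("kind must be 'Raw' or 'ARIMA'")
    # The directory uses 'Raw' but the file name uses lowercase 'raw'.
    tag = "raw" if kind == "Raw" else "ARIMA"
    filename = (
        f"measurements.{tag}.{data_source}.{country}.{city}.{detector_id}.fits"
    )
    return (
        Path(output_dir)
        / f"s2b.Loop.Detector.Measurements.{kind}"
        / data_source / country / city / detector_id / filename
    )


def read_table(path):
    """Read a FITS binary table (assumes astropy is installed)."""
    # Imported lazily so the path helpers work without astropy.
    from astropy.table import Table
    return Table.read(str(path), format="fits")
```

According to the description above, ARIMA files exist only for DATA_SOURCE = 'LD.Flow.LD.Occupancy', so a path built with `kind="ARIMA"` and the other data source is not expected to exist.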