DOC: NAV-170 - Review extended RAPTOR
munterfi committed Sep 21, 2024
1 parent c88b24a commit 9c31f06
Showing 2 changed files with 63 additions and 62 deletions.
123 changes: 62 additions & 61 deletions Writerside/topics/implementation/extended-raptor.md
@@ -1,6 +1,6 @@
# Extended RAPTOR

The theoretical RAPTOR algorithm proposed by Delling et al. [REF], while innovative, lacks several practical features
The theoretical RAPTOR algorithm proposed by Delling et al. [4], while innovative, lacks several practical features
necessary for real-world applications. These missing elements include reverse time routing (i.e., routing backwards from
the arrival time and stop), key query configurations like setting a maximum walking distance between stops, defining
minimum transfer times, and limiting the number of allowable transfers. Additionally, the original implementation does
@@ -15,10 +15,10 @@ enhancing the algorithm’s usability for real-world transit planning.
Implementing latest departure routing, which calculates the latest possible departure from a departure stop given a
specified latest arrival time, did not introduce any new algorithmic concepts. However, achieving this in a way that
minimizes code duplication while maintaining readability presented significant challenges. The primary difficulty lay in
adapting the logic for reverse-time routing. In contrast to earliest arrival routing, where the source stop is always
the departure stop and the target stop is the arrival stop, these roles are reversed in latest departure routing. The
source stop becomes the arrival stop, and the target stop is now the departure stop, as routing progresses backward in
time.
adapting the logic for reverse-time routing. In contrast to earliest arrival routing, where the source stop is
always the departure stop and the target stop is the arrival stop, these roles are reversed in latest departure routing.
The source stop becomes the arrival stop, and the target stop is now the departure stop, as routing progresses backward
in time.

This shift required changes to the variable naming conventions in the code. To avoid confusion, we adopted a generalized
terminology where the "source stop" refers to the stop from which the scanning process begins (regardless of the
@@ -36,17 +36,17 @@ the current routing direction (i.e., earliest arrival vs. latest departure) with
complexity. This approach was chosen over duplicating the entire algorithm with minor changes for each routing type, as
the latter would have made long-term maintenance significantly more difficult.
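
One way to picture the direction handling described above is a small direction type that owns every comparison that differs between the two routing modes. This is only a sketch under assumptions: the names `TimeType`, `worstLabel`, `isBetter`, and `DirectionDemo` are hypothetical and are not taken from the actual code base.

```java
// Hypothetical sketch: all names here are illustrative assumptions.
enum TimeType {
    DEPARTURE, // earliest arrival routing: scanning moves forward in time
    ARRIVAL;   // latest departure routing: scanning moves backward in time

    /** Label assigned to stops that have not been reached yet. */
    int worstLabel() {
        return this == DEPARTURE ? Integer.MAX_VALUE : Integer.MIN_VALUE;
    }

    /** True if candidate improves on current for this routing direction. */
    boolean isBetter(int candidate, int current) {
        return this == DEPARTURE ? candidate < current : candidate > current;
    }
}

final class DirectionDemo {
    /** Picks the best label for the target stop from a set of candidate times. */
    static int bestLabel(TimeType type, int[] candidateTimes) {
        int best = type.worstLabel();
        for (int time : candidateTimes) {
            if (type.isBetter(time, best)) {
                best = time;
            }
        }
        return best;
    }
}
```

With the comparison isolated like this, the scanning loop itself can stay identical for both directions; only the direction value passed in changes.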

## Multiday
## Multi-day

The standard RAPTOR algorithm does not account for service days in a schedule, as it is primarily designed to scan
routes based solely on departure times in ascending order. Given that GTFS (General Transit Feed Specification)
schedules typically span a full year, directly chaining all departures in the stop times array would have resulted in an
excessively large array, severely impacting performance.
routes based solely on departure times in ascending order. Given that GTFS schedules typically span a full year,
directly chaining all departures in the stop times array would have resulted in an excessively large array, severely
impacting performance.

In the early iterations of our RAPTOR implementation, we opted to build RAPTOR data structures for a single service day
to address this issue. However, this approach introduced several limitations. For instance, a typical service day begins
at 5 AM and extends into the early hours of the next calendar day (around 1 AM or 5 AM, depending on the availability of
night services). As a result, routing requests for a local trip with a departure time of 12:00 AM might ideally be
night services). As a result, routing requests for a local trip with a departure time at 12:00 AM might ideally be
served by trips from the previous service day, but the algorithm would only display departures starting from 5 AM
onward. Similarly, long-distance trips departing later in the afternoon would not be fully accommodated within the same
service day, requiring the journey to extend into the next service day. This setup created gaps in service availability
@@ -57,9 +57,9 @@ schedule. This separation needed to be preserved in our design. To achieve this,
interface that could be injected into the RAPTOR implementation via dependency injection. This allowed the RAPTOR
algorithm to process schedule data without directly interacting with the GTFS implementation.

### TripMaskProvider
### Trip Mask Provider

In our solution, the RAPTOR algorithm queries the TripMaskProvider for trip masks corresponding to different service
In our solution, the RAPTOR algorithm queries the `TripMaskProvider` for trip masks corresponding to different service
days when handling a routing request. Typically, the algorithm requests trip masks for three days: the previous day, the
current day, and the next day. Once the trip masks are retrieved, the routing process can begin, ensuring RAPTOR
operates independently of how the schedule data is provided.
@@ -74,84 +74,81 @@ package GTFS {
package RAPTOR {
class RaptorRouter {
- tripMaskProvider : TripMaskProvider
+ routeEarliestArrival(...): List<Connection>
}
interface TripMaskProvider {
+ getTripMasksForDay(): DayTripMask
}
class DayTripMask {
-tripMask: Map<String, boolean[]>
- tripMask: Map<String, boolean[]>
}
RaptorRouter -- TripMaskProvider
TripMaskProvider --> DayTripMask : "provides"
RaptorRouter --> DayTripMask : "uses"
RaptorRouter o-- TripMaskProvider: has
TripMaskProvider --> DayTripMask: "provides"
RaptorRouter --> DayTripMask: "accesses"
}
package Service {
class GtfsTripMaskProvider
GtfsTripMaskProvider --> TripMaskProvider : "implements"
GtfsTripMaskProvider ..|> TripMaskProvider: "<<implements>>"
}
Service -- GTFS : "integrates with"
Service o-- GTFS: has
@enduml
```

The DayTripMasks provided by the TripMaskProvider consisted of a map, where the keys represented route IDs, and the
The `DayTripMask`s provided by the `TripMaskProvider` consisted of a map, where the keys represented route IDs, and the
values were boolean arrays that acted as masks. These arrays indicated which trips on a route were active on a given
day. During route scanning, the process would typically begin by scanning the DayTripMask for the previous day, then
day. During route scanning, the process would typically begin by scanning the day trip mask for the previous day, then
move to the current day's mask, and, if necessary, to the following day's mask if no suitable departure was found
earlier. In the route scanner code, an additional check was implemented to ensure that the trip at the specified offset
was active by verifying it against the DayTripMask.
was active by verifying it against the day trip mask.
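
The mask lookup and the previous/current/next day scan order described above could look roughly like the following. Only `TripMaskProvider`, `DayTripMask`, `tripMask`, and `getTripMasksForDay` appear in the diagram; the `LocalDate` parameter, `isTripActive`, `MultiDayScan`, and the seconds-based time representation are assumptions made for this sketch.

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical sketch; signatures beyond the diagram are assumptions.
final class DayTripMask {
    private final Map<String, boolean[]> tripMask; // route ID -> one flag per trip

    DayTripMask(Map<String, boolean[]> tripMask) {
        this.tripMask = tripMask;
    }

    boolean isTripActive(String routeId, int tripOffset) {
        boolean[] mask = tripMask.get(routeId);
        return mask != null && tripOffset >= 0 && tripOffset < mask.length && mask[tripOffset];
    }
}

interface TripMaskProvider {
    // The date parameter is assumed; the diagram shows the method without arguments.
    DayTripMask getTripMasksForDay(LocalDate day);
}

final class MultiDayScan {
    static final int SECONDS_PER_DAY = 86_400;

    /**
     * Scans the previous, current, and next service day in that order and returns
     * the first active departure at or after earliestTime. Departures are seconds
     * since the start of their service day, sorted ascending; the result is
     * expressed relative to the query date.
     */
    static OptionalInt earliestDeparture(TripMaskProvider provider, LocalDate date,
                                         String routeId, int[] departures, int earliestTime) {
        for (int dayOffset = -1; dayOffset <= 1; dayOffset++) {
            DayTripMask mask = provider.getTripMasksForDay(date.plusDays(dayOffset));
            int shift = dayOffset * SECONDS_PER_DAY;
            for (int trip = 0; trip < departures.length; trip++) {
                int departure = departures[trip] + shift;
                if (departure >= earliestTime && mask.isTripActive(routeId, trip)) {
                    return OptionalInt.of(departure);
                }
            }
        }
        return OptionalInt.empty();
    }
}
```

For a query at 00:30 on a route whose previous service day has an active trip at 25:30, the scan returns that trip as 01:30 on the query date; if the mask disables it, the scan falls through to the current day.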

However, these modifications had a notable impact on performance. Routing requests, which previously met the target, now
took around 250 milliseconds, falling short of the requirement for a response time of under 200 milliseconds. This
highlighted the need for further optimizations to meet performance expectations.

### New StopTimes Array layout and StopTimesProvider
### New Stop Times Array Layout and Stop Times Provider

To address the performance issues, several improvements were identified and implemented. These included optimizing how
departure times were accessed and restructuring the internal data to improve memory usage and scanning efficiency.

* **Easy Lookup Variables:** To reduce the number of stop time lookups, new variables were added to quickly determine if
a route had any departures after the first available departure time.
* **Memory Optimization:** The StopTimes array was changed from an array of StopTime[] objects to an array of int[] to
take advantage of memory locality. Since both arrival and departure times were integer values, this conversion allowed
for
more efficient memory access.
* **Pre-building Stop Times Array:** The stop times array was now pre-built before routing began, using the DayTripMask
to mask invalid stop times by setting them to Integer.MIN_VALUE. This eliminated the need for multiple lookups during
route scanning.

To implement these changes, an additional class called StopTimeProvider was introduced. It took the TripMaskProvider as
an injected dependency and was responsible for creating int[] stop time arrays for a given service day. These arrays
contained all trips for each route, sorted by departure time, with additional improvements:

* **Service Day-level Information:** At indices 0 and 1 of the stop times array, data was included about the earliest
departure and latest arrival for the entire service day. This allowed the algorithm to skip scanning the previous day
if, for instance, the routing request was for 8 AM and the previous day’s service ended at 5 AM.
* **Route-level Information:** Each route's partition in the array included two additional values at the start,
indicating the earliest departure and latest arrival for that route. This further optimized the process by allowing
the algorithm to avoid scanning trips on a route if it was inactive at the requested time.
* **Simplified Stop Times:** Each stop time now consisted of two integer values (arrival and departure times), replacing
the previous StopTime object, leading to faster data access.
* **Memory Optimization:** The stop times array was changed from an array of `StopTime[]` objects to an array of `int[]`
to take advantage of cache locality. Since both arrival and departure times were integer values, this conversion
allowed for more efficient memory access.
* **Pre-building Stop Times Array:** The stop times array is now pre-built before routing begins, using
the `DayTripMask` to mask invalid stop times by setting them to `Integer.MIN_VALUE`. This eliminates the need for
multiple lookups during route scanning.

To implement these changes, an additional class called `StopTimeProvider` was introduced. It takes
the `TripMaskProvider` as an injected dependency and is responsible for creating `int[]` stop time arrays for a given
service day. These arrays contain all trips for each route, sorted by departure time, with additional improvements:

* **Service Day-level Information:** At indices 0 and 1 of the stop times array, data is included about the earliest
departure and latest arrival for the entire service day. This allows the algorithm to skip scanning the previous day
if, for instance, the routing request is for 8 AM and the previous day’s service ends at 5 AM.
* **Route-level Information:** Each route's partition in the array includes two additional values at the start,
indicating the earliest departure and latest arrival for that route. This further optimizes the process by allowing
the algorithm to avoid scanning trips on a route if it is inactive at the requested time.
* **Simplified Stop Times:** Each stop time now consists of two integer values (arrival and departure times), replacing
the previous `StopTime` object, leading to faster data access.
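
The layout described in the list above can be sketched as a flat `int[]` with a two-value day header, a two-value header per route, and (arrival, departure) pairs per stop. The exact ordering inside a route partition is an assumption here, as are the names `StopTimesLayout`, `build`, `canSkipDay`, and `NO_TRIP`; the actual arrangement is the one shown in the project's figure.

```java
// Hypothetical sketch of the flat layout; the real index arithmetic may differ.
final class StopTimesLayout {
    static final int NO_TRIP = Integer.MIN_VALUE; // marker for masked stop times
    static final int DAY_HEADER = 2;   // [0] earliest departure, [1] latest arrival of the day
    static final int ROUTE_HEADER = 2; // the same two values, per route

    /**
     * times[route][trip] holds (arrival, departure) pairs per stop and
     * active[route][trip] is the trip mask. routeOffsets is an output
     * parameter that receives the start index of each route partition.
     */
    static int[] build(int[][][] times, boolean[][] active, int[] routeOffsets) {
        int size = DAY_HEADER;
        for (int[][] route : times) {
            size += ROUTE_HEADER;
            for (int[] trip : route) {
                size += trip.length;
            }
        }
        int[] out = new int[size];
        int dayEarliest = Integer.MAX_VALUE;
        int dayLatest = Integer.MIN_VALUE;
        int pos = DAY_HEADER;
        for (int r = 0; r < times.length; r++) {
            routeOffsets[r] = pos;
            int earliest = Integer.MAX_VALUE;
            int latest = Integer.MIN_VALUE;
            int write = pos + ROUTE_HEADER;
            for (int t = 0; t < times[r].length; t++) {
                int[] trip = times[r][t];
                for (int value : trip) {
                    out[write++] = active[r][t] ? value : NO_TRIP; // mask inactive trips
                }
                if (active[r][t] && trip.length >= 2) {
                    earliest = Math.min(earliest, trip[1]);             // departure at first stop
                    latest = Math.max(latest, trip[trip.length - 2]);   // arrival at last stop
                }
            }
            out[pos] = earliest;
            out[pos + 1] = latest;
            dayEarliest = Math.min(dayEarliest, earliest);
            dayLatest = Math.max(dayLatest, latest);
            pos = write;
        }
        out[0] = dayEarliest;
        out[1] = dayLatest;
        return out;
    }

    /** True if no arrival of this day can be relevant for the given departure time. */
    static boolean canSkipDay(int[] stopTimes, int departureTime) {
        return stopTimes[1] < departureTime; // both values must share one time reference
    }
}
```

Because the headers sit next to the data they summarize, a single array read decides whether a whole day or route needs scanning at all.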

This new structure enabled the stop times array to serve multiple purposes while significantly improving routing
performance - from 250 ms down to approximately 90 ms. However, accessing the correct information in the array became more
complex, as shown in the figure below.
performance, from 250 ms down to approximately 90 ms. However, accessing the correct information in the array became
more complex, as shown in the figure below.

![stop-times-array.png](stop-times-array.png){ width="750" }

## Accessibility and Bike Information

After implementing the multi-day logic for routing and adding functionality to the RAPTOR module to allow external
masking of trips using schedule information, it became straightforward to extend the QueryConfig values to accommodate
additional routing request parameters. These new parameters included options such as preferred travel modes (e.g., bus,
train, ship) and specific requirements like ensuring that all trips were wheelchair accessible or allowed bikes.
masking of trips using schedule information, it became straightforward to extend the `QueryConfig` values to accommodate
additional routing request parameters. These new parameters include options such as preferred travel modes (e.g., bus,
train, ship) and specific requirements like ensuring that all trips are wheelchair accessible or allow bikes.

Technically, nothing significant changed in the core logic of the RAPTOR algorithm. However, the stop time arrays now
became specific not only to the service day but also to the query configuration, meaning each set of preferences (e.g.,
@@ -160,18 +157,22 @@ stop time arrays that needed to be cached, but it allowed for greater flexibilit

## Caching

As expected, computing stop time arrays from schedule information using the StopTimeProvider and TripMaskProvider
introduces a significant overhead. Typically, this computation takes around 400–500 milliseconds per service day and
As expected, computing stop time arrays from schedule information using the `StopTimeProvider` and `TripMaskProvider`
introduces significant overhead. Typically, this computation takes around 400–500 milliseconds per service day and
query configuration, resulting in approximately 1200–1500 milliseconds for a multi-day routing request. To avoid
recalculating these arrays for each new request, caching was identified as an effective solution.

To address this, a Least Recently Used (LRU) eviction cache was implemented to store the computed stop time arrays. This
allowed frequently accessed stop time arrays to remain in memory, while less used ones were evicted to free up space.
To address this, a Least Recently Used (LRU) eviction cache was implemented to store the computed stop time arrays.
This allows frequently accessed stop time arrays to remain in memory, while less-used ones are evicted to free up space.
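
A minimal LRU eviction cache can be built on `java.util.LinkedHashMap` in access order, as sketched below. This is not necessarily how the project implements its cache; the class name `LruCache` and the idea of keying entries by service day plus query configuration are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not thread-safe, so concurrent callers would need
// external synchronization (for example Collections.synchronizedMap).
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        // accessOrder = true keeps entries ordered from least to most recently used
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; evicts the least recently used entry when full
        return size() > capacity;
    }
}
```

The capacity would be chosen from the memory available per cached array, since every entry holds one full stop times array for a service day and query configuration.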

However, it's important to note that, for the Swiss GTFS schedule, the stop time arrays could be quite large - around 16
However, it's important to note that for the Swiss GTFS schedule, the stop time arrays can be quite large, around 16
million integer values, corresponding to a memory footprint of 64 MB per service day and query configuration
combination. This substantial memory requirement meant that the application would need to run on machines with
significant memory resources or, ideally, in a distributed system for a productive environment. In such a system, a
front-end layer would distribute routing requests cross a cluster of machines based on the available cached stop time
arrays, ensuring that the package could run efficiently in a production environment without excessive recalculations or
memory strain.
combination. This substantial memory requirement means that the application needs to run on machines with
significant memory resources or, ideally, in a distributed system for production. In such a system, a load balancer
would distribute routing requests from the front-end layer across replicas of the router based on the available cached
stop time arrays, ensuring that the application can run efficiently without excessive recalculations or memory strain.
Unfortunately, a setup like this sacrifices the stateless nature of the application. To account for this, the stop time
arrays and the GTFS schedule could be stored in a database, encapsulating the state within the database rather than the
application, allowing for stateless load balancing.

TODO: Maybe move to results and discussion?
2 changes: 1 addition & 1 deletion Writerside/topics/results-and-discussion/benchmarking.md
@@ -2,7 +2,7 @@

## Extended RAPTOR for Production

According to **NF-RO-M2** the benchmarking of the extended RAPTOR algorithm (supporting multiday connections, querying
According to **NF-RO-M2** the benchmarking of the extended RAPTOR algorithm (supporting multi-day connections, querying
by departure or arrival times, and allowing for custom query criteria such as transport modes, number of transfers,
maximum walking distance, minimum transfer time, accessibility, or the possibility of carrying a bicycle) was conducted
using the current GTFS data for the whole of Switzerland and the results were continuously written to versioned files, which
