Port Covid Recovery Dash generation code to ingestor #112
base: main
Conversation
```diff
@@ -99,6 +109,7 @@ def ingest_feed_to_dynamo(
     "lineId": total.line_id,
     "count": total.count,
     "serviceMinutes": total.service_minutes,
+    "hasServiceExceptions": total.has_service_exceptions,
```
The "recovery dash" code needs to know, in general, whether the service levels on a given day include any service exceptions (usually additions/removals for holidays). It was easiest just to add this as a column to the `ScheduledServiceDaily` table — the migration has already run.
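For context, a flag like this can be derived from the GTFS `calendar_dates.txt` exceptions for a given service date. The sketch below is a hypothetical helper (not the PR's actual code), assuming exceptions are rows with a `date` and an `exception_type` (1 = service added, 2 = service removed) per the GTFS spec:

```python
from datetime import date

def has_service_exceptions(calendar_dates: list[dict], service_date: date) -> bool:
    """True if any GTFS calendar_dates exception (type 1 = added,
    type 2 = removed) applies on the given service date."""
    target = service_date.strftime("%Y%m%d")
    return any(
        row["date"] == target and row["exception_type"] in ("1", "2")
        for row in calendar_dates
    )

# A holiday row removing regular weekday service on 2020-12-25:
rows = [{"service_id": "weekday", "date": "20201225", "exception_type": "2"}]
print(has_service_exceptions(rows, date(2020, 12, 25)))  # True
print(has_service_exceptions(rows, date(2020, 12, 26)))  # False
```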
Looks good to me, just a few questions
```diff
@@ -112,6 +123,7 @@ def ingest_feeds(
     force_rebuild_feeds: bool = False,
 ):
     for feed in feeds:
+        feed.use_compact_only()
```
Is this okay for all the S3 uploads? This is what `mbta-performance` will use every half hour.
```diff
@@ -148,3 +149,9 @@ def store_landing_data(event):
     ridership_data = landing.get_ridership_data()
     landing.upload_to_s3(json.dumps(trip_metrics_data), json.dumps(ridership_data))
     landing.clear_cache()
+
+
+# 9:00 UTC -> 4:30/5:30am ET every day (after GTFS and ridership have bene ingested)
```
Suggested change (typo: "bene" → "been"):

```diff
-# 9:00 UTC -> 4:30/5:30am ET every day (after GTFS and ridership have bene ingested)
+# 9:00 UTC -> 4:30/5:30am ET every day (after GTFS and ridership have been ingested)
```
```python
res = {}
if isinstance(key_getter, str):
    key_getter_as_str = key_getter
    key_getter = lambda dict: dict[key_getter_as_str]  # noqa: E731
```
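The snippet under review is the common "index by key" normalization: `key_getter` may be a dict key name or a callable, and the string case is rewritten into a lambda. A self-contained sketch of the surrounding pattern (the `index_by` name and the example data are hypothetical, not the PR's actual code):

```python
def index_by(items, key_getter):
    """Index an iterable of dicts into a dict keyed by key_getter, which may be
    a dict key name (str) or a callable returning the key for each item."""
    res = {}
    if isinstance(key_getter, str):
        key_getter_as_str = key_getter
        key_getter = lambda d: d[key_getter_as_str]  # noqa: E731
    for item in items:
        res[key_getter(item)] = item
    return res

routes = [{"id": "Red", "mode": "subway"}, {"id": "1", "mode": "bus"}]
print(index_by(routes, "id")["1"]["mode"])  # bus
print(index_by(routes, lambda r: r["mode"])["subway"]["id"])  # Red
```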
Instead of disabling each line, we could disable this rule repo-wide if it's not helpful.
This PR finally ports the Covid Recovery Dash generation code to the data ingestor so we can run it daily and feed the resulting JSON into a CRD-like page on the Data Dashboard. The new code is very similar to what the CRD has, with a few major changes:

- It reads from the `Ridership` and `ScheduledServiceDaily` dynamo tables instead of reading from raw CSV files 🎉 Luckily most of that logic has long since been ported from CRD -> ingestor.
- It produces values in `dict[str, float]` format instead of daily values in `list[float]` form — this tends to be easier to work with on the Data Dashboard side.

Going forward, the formats will continue to diverge, to remove covid-specific benchmarks and add by-mode totals, etc. But I've plugged this data into my local Covid Recovery Dash and confirmed that it's basically working.
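To illustrate the format change described above (the dates and values here are made up, and the exact keys in the real output may differ):

```python
# Old CRD-style shape: daily values as a positional list (index = day offset)
old_format = [1234.0, 1250.5, 980.0]

# New shape: dict[str, float] keyed by ISO date string
new_format = {
    "2024-01-01": 1234.0,
    "2024-01-02": 1250.5,
    "2024-01-03": 980.0,
}

# Keyed lookups avoid fragile index arithmetic on the dashboard side
print(new_format["2024-01-02"])  # 1250.5
```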
The resulting JSON files are stored in the new `tm-service-ridership-dash` bucket (though surely a better name exists), indexed by date. It also always writes to `latest.json` for easy retrieval.
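Writing both a dated key and `latest.json` could look something like the sketch below. This is a hypothetical helper, not the PR's code: the key layout is assumed, and the client is injected (anything exposing boto3's `put_object(Bucket=, Key=, Body=, ContentType=)`) so it can be exercised without AWS credentials:

```python
import json
from datetime import date

def upload_dash_json(s3_client, payload: dict, bucket: str = "tm-service-ridership-dash"):
    """Write the payload under a dated key, then overwrite latest.json.

    s3_client: any object with a boto3-style put_object method.
    Returns the list of keys written, in order.
    """
    body = json.dumps(payload).encode("utf-8")
    keys = [f"{date.today().isoformat()}.json", "latest.json"]
    for key in keys:
        s3_client.put_object(Bucket=bucket, Key=key, Body=body, ContentType="application/json")
    return keys
```

With a real client this would be called as `upload_dash_json(boto3.client("s3"), data)`; overwriting a fixed `latest.json` key keeps the dashboard's fetch URL stable while the dated keys preserve history.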