Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More Instrumentation to Trip Segmentation and Dwell Segmentation #1007

Closed
wants to merge 1 commit into from

Conversation

TeachMeTW
Copy link
Contributor

@TeachMeTW TeachMeTW commented Dec 30, 2024

Overview

This PR introduces enhanced instrumentation to key components of the trip segmentation process, enabling more granular performance analysis. Specifically, the following changes were made:

  • Instrumented create_places_and_trips and handle_trip_ended functions.
  • Collected and analyzed updated performance metrics for both top-performing (Top 20%) and lesser-performing (Bottom 80%) operations.

Previous Data (based on ccebikes prod)

Top 20%

data.name,average_reading
TRIP_SEGMENTATION/create_places_and_trips,6.11492777755484
TRIP_SEGMENTATION/segment_into_trips_time/has_trip_ended,5.203754482825234
TRIP_SEGMENTATION/segment_into_trips_dist/has_trip_ended,0.33176563871859305
TRIP_SEGMENTATION/get_data_df,0.13479221123112303

Bottom 80%

data.name,average_reading
TRIP_SEGMENTATION/segment_into_trips_dist/get_filtered_points_df,0.1277177627025931
TRIP_SEGMENTATION/segment_into_trips_dist/get_transition_df,0.12171991904162699
TRIP_SEGMENTATION/segment_into_trips_time/get_filtered_points_pre_ts_diff_df,0.06102312830432008
TRIP_SEGMENTATION/segment_into_trips_time/get_transition_df,0.04896719856575752
TRIP_SEGMENTATION/segment_into_trips_dist/post_loop,0.009637999072394989
TRIP_SEGMENTATION/segment_into_trips_time/calculations_per_iteration,0.009030038257105083
TRIP_SEGMENTATION/get_time_range_for_segmentation,0.008812385345143932
TRIP_SEGMENTATION/segment_into_trips_dist/check_transitions_post_loop,0.0054819285690497896
TRIP_SEGMENTATION/handle_out_of_order_points,0.004389608453097595
TRIP_SEGMENTATION/segment_into_trips_time/post_loop,0.001586464757565409
TRIP_SEGMENTATION/segment_into_trips_time/filter_bogus_points,0.000983266447049876
TRIP_SEGMENTATION/segment_into_trips_dist/handle_trip_end,0.000659627289595929
TRIP_SEGMENTATION/segment_into_trips_dist/mark_valid,0.0003925704017833427
TRIP_SEGMENTATION/segment_into_trips_dist/get_last_trip_end_point,0.0003441714427687905
TRIP_SEGMENTATION/segment_into_trips_dist/continue_just_ended,0.00030632289442217953
TRIP_SEGMENTATION/get_filters_in_df,0.00028325290716931125
TRIP_SEGMENTATION/get_time_series,1.9164622159294e-05
TRIP_SEGMENTATION/create_dist_filter,4.974840094848555e-06
TRIP_SEGMENTATION/create_time_filter,4.684489195038672e-06
TRIP_SEGMENTATION/segment_into_trips_dist/set_new_trip_start_point,1.5871754537026088e-06

New Instrumented (Based on local open_access)

Top 20%

data.name,average_reading
TRIP_SEGMENTATION/get_combined_segmentation_points,289.41001524999956
TRIP_SEGMENTATION/segment_into_trips_dist/get_filtered_points_df,49.97330445059812
TRIP_SEGMENTATION/create_places_and_trips/process_segments,49.43672608266319
TRIP_SEGMENTATION/get_data_df,9.957696244051705

Bottom 80%

data.name,average_reading
TRIP_SEGMENTATION/segment_into_trips_time/has_trip_ended,0.008631514248452451
TRIP_SEGMENTATION/segment_into_trips_time/has_trip_ended/motion_and_gap_checks,0.0018074113038215288
TRIP_SEGMENTATION/create_places_and_trips/initialization,0.0017730383579070694
TRIP_SEGMENTATION/create_places_and_trips/insert_last_place,0.0007796396894767679
TRIP_SEGMENTATION/segment_into_trips_time/has_trip_ended/final_distance_checks,0.00021890854319204695
TRIP_SEGMENTATION/create_places_and_trips/log_segmentation_points,8.484674730508489e-05
TRIP_SEGMENTATION/segment_into_trips_time/has_trip_ended/initial_checks,2.015703723259139e-05

@TeachMeTW
Copy link
Contributor Author

Local Staging

Top 20%

data.name,average_reading
TRIP_SEGMENTATION/get_combined_segmentation_points,244.59239484766667
TRIP_SEGMENTATION/segment_into_trips_dist/get_filtered_points_df,1.3070380361337925
TRIP_SEGMENTATION/create_places_and_trips/process_segments,0.956691232625004
TRIP_SEGMENTATION/segment_into_trips_dist/get_transition_df,0.6485740562857245
TRIP_SEGMENTATION/get_data_df,0.5813163733474915

Bottom 80%

data.name,average_reading
TRIP_SEGMENTATION/segment_into_trips_time/has_trip_ended,0.04396951262024141
TRIP_SEGMENTATION/segment_into_trips_time/has_trip_ended/motion_and_gap_checks,0.029915469854730245
TRIP_SEGMENTATION/handle_out_of_order_points,0.02109187234964338
TRIP_SEGMENTATION/segment_into_trips_dist/has_trip_ended,0.01407521516264799
TRIP_SEGMENTATION/segment_into_trips_time/has_trip_ended/tracking_restart_check,0.010045099816818958
TRIP_SEGMENTATION/segment_into_trips_dist/post_loop,0.008481668595777592
TRIP_SEGMENTATION/get_time_range_for_segmentation,0.007696437147387769
TRIP_SEGMENTATION/segment_into_trips_dist/check_transitions_post_loop,0.00346295049530454
TRIP_SEGMENTATION/create_places_and_trips/process_segments/handle_untracked,0.0031489393157869906
TRIP_SEGMENTATION/get_filters_in_df,0.0019108789019810501
TRIP_SEGMENTATION/create_places_and_trips/initialization,0.0011224947499962241
TRIP_SEGMENTATION/segment_into_trips_dist/handle_trip_end,0.00076395126471166
TRIP_SEGMENTATION/create_places_and_trips/insert_last_place,0.0006669287500017754
TRIP_SEGMENTATION/segment_into_trips_dist/continue_just_ended,0.0005316707959574592
TRIP_SEGMENTATION/segment_into_trips_dist/mark_valid,0.00048122658699867316
TRIP_SEGMENTATION/segment_into_trips_dist/get_last_trip_end_point,0.00035555957177610593
TRIP_SEGMENTATION/segment_into_trips_time/has_trip_ended/final_distance_checks,0.00025557633905966227
TRIP_SEGMENTATION/create_places_and_trips/log_segmentation_points,6.484904166289181e-05
TRIP_SEGMENTATION/segment_into_trips_time/has_trip_ended/initial_checks,2.116130941705208e-05
TRIP_SEGMENTATION/get_time_series,1.9063383660977706e-05
TRIP_SEGMENTATION/create_time_filter,6.11470386502333e-06
TRIP_SEGMENTATION/create_dist_filter,3.921490133507177e-06
TRIP_SEGMENTATION/segment_into_trips_dist/set_new_trip_start_point,1.380765752401203e-06

It seems that get_combined_segmentation_points is a big pain point -- however, to note this wasn't present on production runs possibly because of this block:

        # Common case - let's make it easy
        with ect.Timer() as t_segment_trips:
            segmentation_points = filter_methods[filters_in_df[0]].segment_into_trips(ts, time_query)
        esds.store_pipeline_time(user_id, ecwp.PipelineStages.TRIP_SEGMENTATION.name + "/segment_into_trips", time.time(), t_segment_trips.elapsed)
    else:
        with ect.Timer() as t_get_combined_segmentation:
            segmentation_points = get_combined_segmentation_points(ts, loc_df, time_query,
                                                                   filters_in_df,
                                                                   filter_methods)
        esds.store_pipeline_time(user_id, ecwp.PipelineStages.TRIP_SEGMENTATION.name + "/get_combined_segmentation_points", time.time(), t_get_combined_segmentation.elapsed)

@shankari
Copy link
Contributor

@TeachMeTW @JGreenlee as you may recall, the plan now is to actually make the improvements instead of pushing finer-grained metrics to production. We can keep tweaking the instrumentation until the cows come home, but the goal of this project is not additional instrumentation - it is performance improvement.

Highlighting the steps from our last chat:

  • once we have identified the areas, we can load biggish datasets (the biggest that Robin can load locally) reset the pipeline for everybody and re-run
  • and verify that we are seeing the same areas as bottlenecks
  • these areas will likely be much faster locally but proportionally, they should be the slowest
  • and then you can use the logs to get finer grained instead of adding a bunch of blocks again
  • you can then make the changes locally, and use the logs to verify that the bottlenecks have been significantly improved
  • and then when they are deployed and merged, we should be able to see the changes on production
  • so I don't know if we need to add a lot of additional blocks

@TeachMeTW
Copy link
Contributor Author

@shankari I understand the goal is to focus on performance improvements rather than extending the instrumentation further. I shared the detailed analysis here to highlight the areas identified as bottlenecks and provide visibility into the instrumentation's findings. However, I agree that this might not be the best place for such detailed analysis.

I’ll move this analysis to the emission docs for better documentation and traceability -- and to not give the impression we are adding more instrumentation

@TeachMeTW
Copy link
Contributor Author

@TeachMeTW TeachMeTW closed this Dec 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants