Analysis of Emission Pipeline Top 20% and Bottom 80% #1098
Comments
For both the mean and the sum, the same functions show up in the top 20% and bottom 80%, albeit in a different order. I will go ahead and prune them from the pipeline.
@TeachMeTW please put the results in graphs or inline tables so that we don't have to keep downloading files to see the results. Where are the scripts that you used to generate this split? Please link to them here.
I also don't see any updates in here after including the time filter. The top 20% contains only the dist filter results.
December Update
Previous Data (based on ccebikes prod)
Top 20%
Bottom 80%
New Instrumented (based on local open_access)
Top 20%
Bottom 80%
Local Staging
Top 20%
Bottom 80%
It seems that
Addressing Performance Bottlenecks and Instrumentation Insights
1. Loading Big Datasets and Rerunning the Pipeline
I have identified
2. Verifying the Same Areas as Bottlenecks
The bottlenecks observed locally are not the same as those identified in the production and staging environments. This discrepancy raises questions about whether the current instrumentation fully reflects server-side performance issues.
3. Relative Performance of Bottlenecks Locally
While the areas under scrutiny should theoretically remain proportional, the performance discrepancies suggest that local instrumentation may not provide an accurate proportional representation of bottlenecks seen in production or on the server.
4. Using Logs for Granular Refinement
The new instrumentation does not seem to reflect server-side statistics accurately, leaving some uncertainty about next steps.
Thoughts on this matter would be appreciated @shankari @JGreenlee
For example, in the local runs, there is this new stat:
Otherwise, the other areas of interest are things like
Interesting findings. I took some time to digest this and I think we can conclude that right now, you should attempt to optimize these 2 blocks which were in the top 20% both locally and on prod:
As for
Nonetheless, I think the first thing to do is optimize
Introduces two distinct performance analysis methods to evaluate function-level metrics within our emission dataset. The objective is to identify which functions significantly impact performance and which do not, enabling targeted optimizations and improvements.
Analysis Types
1. Individual Entry Categorization
Categorizes each data.reading entry into Top 20% or Bottom 80% based on the 80th percentile within each data.name group.
Features
TRIP_SEGMENTATION/segment_into_trips
TRIP_SEGMENTATION/segment_into_trips_dist/loop
data.reading for easy identification of high-impact entries.
2. Aggregated Entry Categorization
Aggregates data.reading metrics (both sum and mean) for each data.name and categorizes the aggregated values into Top 20% and Bottom 80% based on their respective 80th percentiles.
Features
data.reading per function.
data.reading per function.
data.reading.
bottom80_function_level_individual_sorted.csv
bottom80_function_level_mean_sorted.csv
bottom80_function_level_sum_sorted.csv
top20_function_level_individual_sorted.csv
top20_function_level_mean_sorted.csv
top20_function_level_sum_sorted.csv
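The scripts that produced these files are not linked in this thread, so the following is only a minimal sketch of the two splits described above, not the actual implementation. It assumes each entry is a `(data.name, data.reading)` pair, that the 80th percentile uses linear interpolation, and that an entry counts as Top 20% only when it is strictly above the cutoff. The helper names `percentile`, `split_individual` and `split_aggregated` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty sequence")
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def group_by_name(entries):
    """Collect readings per function name: {name: [reading, ...]}."""
    groups = defaultdict(list)
    for name, reading in entries:
        groups[name].append(reading)
    return groups


def split_individual(entries, q=80):
    """Analysis 1: compare each entry with the cutoff of its own name group."""
    cutoffs = {n: percentile(v, q) for n, v in group_by_name(entries).items()}
    # Assumption: ties with the cutoff fall into the bottom bucket.
    top = [e for e in entries if e[1] > cutoffs[e[0]]]
    bottom = [e for e in entries if e[1] <= cutoffs[e[0]]]

    def by_reading(entry):
        return entry[1]

    # Largest readings first, so high-impact entries appear at the top.
    return (sorted(top, key=by_reading, reverse=True),
            sorted(bottom, key=by_reading, reverse=True))


def split_aggregated(entries, agg=sum, q=80):
    """Analysis 2: aggregate per name (sum or mean), then split on one cutoff."""
    totals = {n: agg(v) for n, v in group_by_name(entries).items()}
    cutoff = percentile(list(totals.values()), q)
    top = {n: v for n, v in totals.items() if v > cutoff}
    bottom = {n: v for n, v in totals.items() if v <= cutoff}
    return top, bottom


if __name__ == "__main__":
    # Made-up readings for illustration only; these are not measured values.
    sample = [("f1", 1.0), ("f2", 2.0), ("f3", 3.0),
              ("f4", 4.0), ("f5", 4.0), ("f5", 6.0)]
    print(split_individual(sample))
    print(split_aggregated(sample, agg=sum))
    print(split_aggregated(sample, agg=mean))
```

Note that under these assumptions a function with a single reading always lands in the Bottom 80% of the individual split, because its only value equals its own cutoff. Whether the real scripts behave that way depends on their tie rule.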