create_id_breakdown is a bottleneck and needs rewriting #77

Open
ross-spencer opened this issue Jan 25, 2022 · 1 comment

ross-spencer commented Jan 25, 2022

Given an 8-million-line SF YAML (a 631,286-row database), create_id_breakdown is taking too long. It is largely unoptimized and not brilliantly written, so I believe any rewrite should bring pretty decent efficiency gains. Let's have a look at what we can do.

Edit: For reference, with this function alone removed, the script is quicker by over an hour and completes in 77 seconds. There may be other bottlenecks along the way, as much relies on the output here, but one step at a time.

NB. A rewrite could focus on making better use of sqlite queries, which do not seem to be a bottleneck at all. Or it could focus on improving the data structures we're using.
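To make the first option concrete, here is a minimal sketch of moving a per-row count out of Python and into a single sqlite query. The table name `IDRESULTS`, its columns, and both function names are hypothetical and only for illustration; they are not taken from the real schema or from create_id_breakdown itself.

```python
import sqlite3
from collections import Counter

# Hypothetical schema: one row per file with the method that identified it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IDRESULTS (FILE_ID INTEGER, METHOD TEXT)")
conn.executemany(
    "INSERT INTO IDRESULTS VALUES (?, ?)",
    [
        (1, "Signature"),
        (2, "Signature"),
        (3, "Extension"),
        (4, "None"),
        (5, "Container"),
    ],
)


def breakdown_in_python(conn):
    """Fetch every row and count in Python (one loop iteration per row)."""
    counts = Counter()
    for (method,) in conn.execute("SELECT METHOD FROM IDRESULTS"):
        counts[method] += 1
    return dict(counts)


def breakdown_in_sql(conn):
    """Let sqlite do the counting and return only one row per method."""
    cursor = conn.execute(
        "SELECT METHOD, COUNT(*) FROM IDRESULTS GROUP BY METHOD"
    )
    return dict(cursor.fetchall())
```

Both functions return the same mapping of method to count; the second only transfers one row per distinct method back to Python, which may matter on a database of several hundred thousand rows. Whether this is where the time actually goes would still need profiling.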

@ross-spencer

There's a spare index described here: exponential-decay/sqlitefid#9 that might be worth looking into for performance.
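As a general illustration of checking whether an index is picked up, and not a description of the index proposed in the linked issue, the sketch below compares sqlite's query plan before and after creating one. The table `IDRESULTS`, the index name, and the query are all hypothetical.

```python
import sqlite3

# Hypothetical table, populated with enough rows to make a lookup meaningful.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IDRESULTS (FILE_ID INTEGER, METHOD TEXT)")
conn.executemany(
    "INSERT INTO IDRESULTS VALUES (?, ?)",
    [(i, "Signature") for i in range(1000)],
)

QUERY = "SELECT * FROM IDRESULTS WHERE FILE_ID = ?"


def query_plan(conn):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (42,)).fetchall()
    # The detail text is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn)  # expected to describe a full table scan
conn.execute("CREATE INDEX IDX_IDRESULTS_FILE_ID ON IDRESULTS (FILE_ID)")
plan_after = query_plan(conn)  # expected to mention the new index

print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between sqlite versions, so it is best read by eye rather than parsed.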
