sssom-py is much too slow #202
Comments
Here are my observations:
As noted,
command for documentation purposes:
Thank you @hrshdhgd, this is a great analysis! You can put it on the back burner for now and we will get back to it later!
Agree it is too slow, and also a good analysis!
Calling `write_owl(msdf1, f)` on a large msdf takes a huge amount of time (a 100 MB TSV takes over 2 hours).
This should be basically instant but takes 30 minutes. Maybe bypass linkml for certain operations?
Addresses #202

- [x] Ran `poetry update`
- [x] Call `_get_sssom_schema_object()` once in `get_dict_from_mapping()` rather than repeatedly inside a for loop, which is inefficient.
- [x] Use `pandas.apply()` instead of `pandas.iterrows()` in `_get_mapping_set_from_df()`
- [x] Use dict/list comprehensions instead of for loops
- [x] Use sets instead of lists where lookups are done and the order of elements doesn't matter.
- [x] Improve `SchemaView` object instantiation and persistence
- [x] Use `@cached_property` (thank you @cthoyt)

---------

Co-authored-by: Charles Tapley Hoyt <cthoyt@gmail.com>
Co-authored-by: Nico Matentzoglu <nicolas.matentzoglu@gmail.com>
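For readers unfamiliar with the techniques in that checklist, here is a minimal sketch of three of them: caching an expensive object with `@cached_property`, hoisting a lookup out of a loop, and using a set plus comprehensions for filtering. This is not sssom-py code. `SchemaHolder`, `rows_to_dicts`, and the slot names are hypothetical stand-ins chosen only for illustration.

```python
from functools import cached_property


class SchemaHolder:
    """Hypothetical stand-in for an object that is expensive to build."""

    load_count = 0  # counts how often the expensive load actually runs

    @cached_property
    def slots(self) -> frozenset:
        # Pretend this parses a large schema; it runs once per instance.
        SchemaHolder.load_count += 1
        return frozenset({"subject_id", "predicate_id", "object_id"})


def rows_to_dicts(rows, holder):
    # Fetch the schema once, outside the loop, instead of once per row.
    slots = holder.slots
    # Set membership is O(1) on average; comprehensions replace the
    # explicit for loops with repeated appends.
    return [{k: v for k, v in row.items() if k in slots} for row in rows]


holder = SchemaHolder()
rows = [
    {"subject_id": "A:1", "predicate_id": "skos:exactMatch", "object_id": "B:1", "extra": 1},
    {"subject_id": "A:2", "predicate_id": "skos:exactMatch", "object_id": "B:2", "extra": 2},
]
result = rows_to_dicts(rows, holder)
_ = holder.slots  # second access is served from the cache
```

The point of the sketch is that the expensive step runs once no matter how many rows are processed, which is the same idea as calling the schema getter once per function rather than once per loop iteration.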
Closing this for now in favor of #462. Feel free to open a new issue with the exact location where a latency improvement is needed.
First of all, we need to figure out why that is, i.e. which functions are so inefficient, and then work on improving efficiency. First goal:
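One generic way to find out which functions dominate the runtime is the standard-library profiler. The sketch below is an assumption about how one might approach it, not the analysis actually done in this thread. `slow_step` and `pipeline` are hypothetical placeholders for whatever parsing or writing call is being investigated.

```python
import cProfile
import io
import pstats


def slow_step(n):
    # Hypothetical hot spot standing in for an inefficient function.
    return sum(i * i for i in range(n))


def pipeline():
    # Hypothetical workload standing in for parsing and writing a mapping set.
    return [slow_step(20_000) for _ in range(20)]


profiler = cProfile.Profile()
profiler.enable()
pipeline()
profiler.disable()

# Collect the report in a string, sorted by cumulative time, top 10 entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time puts the functions that account for the most wall-clock time, including their callees, at the top, which is usually the quickest way to decide where to optimize first.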