Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimization of some functions #462

Merged
merged 40 commits into from
Nov 20, 2023
Merged

Optimization of some functions #462

merged 40 commits into from
Nov 20, 2023

Conversation

hrshdhgd
Copy link
Contributor

@hrshdhgd hrshdhgd commented Nov 13, 2023

Addresses #202

  • Ran poetry update
  • Call _get_sssom_schema_object() once in the function get_dict_from_mapping() rather than multiple times in a for loop that is inefficient.
  • Instead of pandas.iterrows() use pandas.apply() in _get_mapping_set_from_df()
  • Use dict/list comprehensions instead of for loops
  • Use sets instead of lists where lookups are done and sequence of elements don't matter.
  • Improve SchemaView object instantiation and persistence
    • Use @cached_property thank you @cthoyt

@hrshdhgd hrshdhgd marked this pull request as draft November 13, 2023 20:50
src/sssom/util.py Outdated Show resolved Hide resolved
src/sssom/constants.py Outdated Show resolved Hide resolved
@hrshdhgd hrshdhgd marked this pull request as ready for review November 14, 2023 17:12
@hrshdhgd hrshdhgd requested a review from matentzn November 14, 2023 17:12
@hrshdhgd hrshdhgd changed the title Optimization for some functions Optimization of some functions Nov 14, 2023
@hrshdhgd hrshdhgd requested a review from cthoyt November 14, 2023 17:16
Copy link
Collaborator

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a hugely important PR, thanks for that @hrshdhgd.

I would feel better if @cthoyt would also take a look. I cant mentally reason over most of the code changes as my python is a bit rusty. But it looks great to me!

src/sssom/constants.py Show resolved Hide resolved
src/sssom/parsers.py Show resolved Hide resolved
src/sssom/parsers.py Outdated Show resolved Hide resolved
tests/test_utils.py Outdated Show resolved Hide resolved
@@ -345,3 +357,76 @@ def test_msdf_from_mappings(self):
self.assertEqual(1, len(new_msdf.df.index))
self.assertEqual(rows[0], tuple(msdf.df.iloc[0]))
self.assertEqual(new_msdf.metadata, msdf.metadata)

def test_get_dict_from_mapping(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this tests many many different things - it's not helpful if we ever need to debug or understand what happened. Instead, it's preferable to make many small, minimal tests for each part of the function that's being tested

tests/test_utils.py Outdated Show resolved Hide resolved
tests/test_parsers.py Outdated Show resolved Hide resolved
# join all the values into a string separated by '|'
map_dict[property] = "|".join(enum_value for enum_value in mapping_property)
elif is_enum:
# FIXME needs test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs test

if isinstance(mapping_property, list):
# If the property is an enumeration and it allows multiple values
if is_enum and slot_of_interest["multivalued"]:
# FIXME needs test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs test

src/sssom/util.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, and the performance gains are phenomenal! Thank you @hrshdhgd!

@hrshdhgd hrshdhgd merged commit 20a00be into master Nov 20, 2023
6 checks passed
@hrshdhgd hrshdhgd deleted the optimize branch November 20, 2023 15:38
@hrshdhgd hrshdhgd mentioned this pull request Dec 15, 2023
1 task
@joeflack4
Copy link
Collaborator

I glanced through quickly and couldn't figure out how things are running faster, but glad to hear it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants