Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression in v0.10.0 #52

Closed
davidmezzetti opened this issue Nov 15, 2024 · 11 comments
Closed

Performance regression in v0.10.0 #52

davidmezzetti opened this issue Nov 15, 2024 · 11 comments
Labels
bug Something isn't working performance

Comments

@davidmezzetti
Copy link
Contributor

Hello - Hope all is well.

I was building a new graph using the latest version the library (v0.10.0) and noticed a performance regression. It appears to be traced to this change: #46 (review).

With large graphs (100K+ nodes 1M+ edges) having to copy the graph takes quiet a while. Additionally, we would have to store a duplicate copy of the graph if we wanted to keep a persistent copy of the GrandCypher instance, which doubles the memory requirements.

Do we perhaps want to require a MultiDiGraph as input? Or only do the conversion if it's not already a MultiDiGraph?

@j6k4m8 @jackboyla

@jackboyla
Copy link
Contributor

jackboyla commented Nov 15, 2024

Hey @davidmezzetti , thanks for pointing this out. I think a loud warning would suffice here, before converting to MultiDiGraph (if it’s not already a MultiDiGraph).

We might also benefit from adding performance tests to the library so we know when things like this happen again. What do you think @j6k4m8 ?

@davidmezzetti
Copy link
Contributor Author

I agree that a warning could help.

In my case, queries went from 0.3s to 30-40s. I had thought it had to do with the graph being large but it was eventually traced to that line.

self._target_graph = target_graph
if not isinstance(self._target_graph, nx.MultiDiGraph):
  self._target_graph = nx.MultiDiGraph(target_graph)
  logger.warning("Converting graph to MultiDiGraph")

@davidmezzetti
Copy link
Contributor Author

Looks great! Thank you!

I now notice in the sample code I put out there that the logging message probably should be before the conversion, otherwise one has to wait to find out why it took so long.

@j6k4m8
Copy link
Member

j6k4m8 commented Nov 20, 2024

Ah that's a good point @davidmezzetti — want to make that change or want me to?

@davidmezzetti
Copy link
Contributor Author

Sorry, I lost track of this. I'll see if I can get around to making that change.

@j6k4m8
Copy link
Member

j6k4m8 commented Dec 4, 2024

I'm wondering if the solution to this problem is to just go in and add an abstraction layer on calls to multidigraph attributes so that we can run safely on non-multigraphs... Definitely bad to require an up-conversion that costs so much time...

@j6k4m8 j6k4m8 added bug Something isn't working performance labels Dec 4, 2024
@davidmezzetti
Copy link
Contributor Author

An abstraction layer seems like a good idea. That graph conversion isn't much faster for in-memory graphs.

@j6k4m8
Copy link
Member

j6k4m8 commented Dec 5, 2024

closed in #55!! :D

@j6k4m8 j6k4m8 closed this as completed Dec 5, 2024
@davidmezzetti
Copy link
Contributor Author

Thanks. I tested this on a large graph I have and it's much closer to the performance in v0.9. There is still a slight performance difference but it's much less noticeable.

@jackboyla
Copy link
Contributor

Glad to hear it’s faster, but I’m curious what exactly is making it slower than 0.9.0.
It may be the logic that filters for multiedges.

@davidmezzetti
Copy link
Contributor Author

If I can pinpoint anything concrete, I'll be sure to share. Right now it was just anecdotal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working performance
Projects
None yet
Development

No branches or pull requests

3 participants