Improve performance #32
Comments
+1 to this issue. The dump-and-restore process can be really slow with rump, especially for a large Redis instance. I was thinking about which parts can run concurrently and which parts should run together:
Maybe we could have:
Edit: Actually, looking at the code, read and write are each called within a goroutine, and they signal each other via the message bus. So maybe just calling DUMP and TTL within a goroutine that blocks before sending content to the message bus would help.
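To make the idea concrete, here is a minimal sketch, not rump's actual code, of a reader that pipelines DUMP and PTTL for a batch of keys and only blocks on the bus once all replies are in. The redigo client, the message struct, and the bus channel are assumptions made for the example.

```go
package main

import (
	"github.com/gomodule/redigo/redis"
)

// message is a hypothetical payload for the bus channel; rump's real
// message type may look different.
type message struct {
	Key   string
	Value string // serialized payload from DUMP
	TTL   int64  // remaining TTL in milliseconds, 0 for "no expiry"
}

// dumpKeys pipelines DUMP and PTTL for every key in the batch, reads all
// the replies back, and only then blocks on sending to the bus. The round
// trips are paid once per batch instead of once per key.
func dumpKeys(conn redis.Conn, keys []string, bus chan<- message) error {
	for _, k := range keys {
		if err := conn.Send("DUMP", k); err != nil {
			return err
		}
		if err := conn.Send("PTTL", k); err != nil {
			return err
		}
	}
	if err := conn.Flush(); err != nil {
		return err
	}
	for _, k := range keys {
		value, err := redis.String(conn.Receive()) // DUMP reply
		if err != nil {
			return err
		}
		ttl, err := redis.Int64(conn.Receive()) // PTTL reply
		if err != nil {
			return err
		}
		if ttl < 0 {
			ttl = 0 // PTTL returns -1 for keys without an expiry; RESTORE expects 0
		}
		bus <- message{Key: k, Value: value, TTL: ttl}
	}
	return nil
}
```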
Thanks for the reports. We're looking into a performance regression introduced with the latest version, related to pipelining. We'll update the issue as soon as we have more details.
@vinicyusmacedo What I meant was scanning the next bit of work while you are still processing items from the previous scans, effectively interleaving the scans with the dumps. This became an apparent bottleneck when we were extracting keys by pattern, where scans would often return no results (or very few), many times in a row. @nixtrace I don't know about the regression, since we only started using rump in October, but we've definitely seen an improvement with some of our changes (especially with our use case of filtering by pattern), i.e.:
We have also introduced a QoL change that allows passing a PATTERN to the scans, per our own requirements. I tried to keep these features separate for easy integration (and actually opened this issue with the intent of donating code). Here you can preview the above-mentioned changes, respectively: concurrency, adjusting the COUNT parameter, and adjusting the scan pattern.
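For reference, restricting the scans to a pattern boils down to adding MATCH (and optionally COUNT) arguments to the SCAN call. A minimal sketch, assuming the redigo client; the linked changes may implement this differently.

```go
package main

import (
	"github.com/gomodule/redigo/redis"
)

// scanMatch runs one SCAN iteration restricted to keys matching pattern.
// It returns the next cursor (0 when the iteration is complete) and the
// keys found in this chunk.
func scanMatch(conn redis.Conn, cursor int64, pattern string, count int) (int64, []string, error) {
	values, err := redis.Values(conn.Do("SCAN", cursor, "MATCH", pattern, "COUNT", count))
	if err != nil {
		return 0, nil, err
	}
	next, err := redis.Int64(values[0], nil)
	if err != nil {
		return 0, nil, err
	}
	keys, err := redis.Strings(values[1], nil)
	if err != nil {
		return 0, nil, err
	}
	return next, keys, nil
}
```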
Report
First of all, thanks for a nice and robust tool!
I thought it could use some performance improvements. Currently, scanning and dumping bigger data sets takes quite a bit of time, and there are a couple of reasons for that:
When observing the reads and writes, they happen almost serially. Making them concurrent would certainly improve the situation. By making the reads concurrent, we should also be able to scan ahead (currently, while scanning, we idle while fetching the next chunk of data).
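A minimal sketch of the scan-ahead idea, not rump's actual implementation: one goroutine keeps scanning and fills a buffered channel, while several workers pull keys off it and issue DUMPs concurrently, each on its own connection. The redigo pool, buffer size, and worker count are assumptions made for the example.

```go
package main

import (
	"log"
	"sync"

	"github.com/gomodule/redigo/redis"
)

// run decouples scanning from dumping: a producer goroutine scans ahead
// and fills a buffered channel, while `workers` goroutines read keys off
// the channel and DUMP them concurrently.
func run(pool *redis.Pool, workers int) {
	keys := make(chan string, 1000) // buffer lets SCAN run ahead of the dumps
	var wg sync.WaitGroup

	// Producer: keep scanning while the workers are busy dumping.
	go func() {
		defer close(keys)
		conn := pool.Get()
		defer conn.Close()

		cursor := int64(0)
		for {
			values, err := redis.Values(conn.Do("SCAN", cursor))
			if err != nil {
				log.Fatal(err)
			}
			if cursor, err = redis.Int64(values[0], nil); err != nil {
				log.Fatal(err)
			}
			chunk, err := redis.Strings(values[1], nil)
			if err != nil {
				log.Fatal(err)
			}
			for _, k := range chunk {
				keys <- k
			}
			if cursor == 0 {
				return
			}
		}
	}()

	// Consumers: dump keys concurrently instead of serially after each scan.
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			conn := pool.Get()
			defer conn.Close()
			for k := range keys {
				if _, err := conn.Do("DUMP", k); err != nil {
					log.Printf("dump %s: %v", k, err)
				}
			}
		}()
	}
	wg.Wait()
}
```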
Another thing is that the default COUNT hint for SCAN is 10. Increasing it is definitely low-hanging fruit, but for obvious reasons it cannot be set too high, and since it wouldn't make sense to hard-code an arbitrary value, I feel that giving the user an option to set the COUNT manually is the right thing to do.