
clear_over(priority) and clear_under(priority) #35

Open
simbleau opened this issue Nov 24, 2021 · 3 comments

@simbleau

Implement two methods, clear_under() and clear_over(), that accept a priority as an argument.

This will essentially truncate the size of the data structure, marking all items over/under this priority as trash/nonexistent.

This would be an O(log(n)) operation, as it would take at most log(n) time to find the upper and lower bound of this priority in the structure. It would then be O(1) to mark the rest of the dataset as trash (e.g. an iter would return a slice from the min element up to this new priority).

This would be a great convenience, as it would let this data structure greatly speed up pruning operations. Currently pruning is O(n) at best, as you can only remove one element at a time. (It is possible to prune the entire set in O(1) using clear, but that is beside the point.)
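For concreteness, here is a minimal sketch of the proposed semantics written against the crate's current public API (the helper name clear_over_by_rebuild is hypothetical, not part of the crate): it emulates clear_over by filtering and rebuilding the queue, which is exactly the O(n) cost the proposal aims to avoid.

```rust
use priority_queue::PriorityQueue;
use std::hash::Hash;

// Illustration only: the observable effect of the proposed `clear_over`,
// emulated by rebuilding the queue. Keeps every item whose priority is not
// over `cutoff`; everything above it is dropped. This external version is
// O(n) in the number of stored items.
fn clear_over_by_rebuild<I, P>(pq: PriorityQueue<I, P>, cutoff: &P) -> PriorityQueue<I, P>
where
    I: Hash + Eq,
    P: Ord,
{
    pq.into_iter()
        .filter(|(_, priority)| priority <= cutoff)
        .collect()
}

fn main() {
    let mut pq = PriorityQueue::new();
    pq.push("low", 1);
    pq.push("mid", 5);
    pq.push("high", 9);
    let pq = clear_over_by_rebuild(pq, &5);
    assert_eq!(pq.len(), 2); // "high" (priority 9) was cleared
}
```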

@garro95
Owner

garro95 commented Nov 25, 2021

It is an interesting proposal.

Even though I am not sure the complexity you found is correct, I agree that the operation would probably be faster if done internally.

I am not sure about the O(log(N)) complexity: even if finding the cutting elements is O(log(N)), the removed elements must also be deleted from the data structure's internal map and correctly dropped, which is O(N) in the worst case. In the case of clear_over(), there is then an additional operation to restore the heap property, which is at least O(log(N)), but I am not sure yet.
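As a point of comparison (an illustration, not from the crate): the standard library's BinaryHeap has the same constraint; its retain method (available on recent stable Rust) visits every element and then restores the heap property, so pruning above a cutoff is linear there as well.

```rust
use std::collections::BinaryHeap;

// Pruning a std BinaryHeap above a cutoff: `retain` visits every element and
// then restores the heap property, so the cost is linear in the heap size
// regardless of how many elements are actually removed.
fn main() {
    let mut heap: BinaryHeap<i32> = [5, 1, 9, 3, 7].into_iter().collect();
    heap.retain(|&x| x <= 5); // keep only priorities not over the cutoff
    assert_eq!(heap.into_sorted_vec(), vec![1, 3, 5]);
}
```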

@mlynch

mlynch commented Jun 6, 2022

This would be very helpful for my use case. I'm looking to keep a list of the top N files on disk by file size, but I want to keep memory consumption reasonable so I only want to keep N items in the pq at any one time.

@garro95
Owner

garro95 commented Jun 6, 2022

I did not implement this API because I was not able to find an efficient way to prune the heap. If you know a good algorithm to do that, I am open to suggestions.
For your purpose though, you could use the DoublePriorityQueue, check its len() after each insertion, and if it is greater than N, call pop_min().
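A minimal sketch of that top-N pattern (the file names and sizes are made up for illustration):

```rust
use priority_queue::DoublePriorityQueue;

// Keep only the N largest files seen so far: push each entry, and whenever the
// queue grows past N, evict the current minimum so memory stays bounded.
fn main() {
    const N: usize = 3;
    let files = [
        ("a.log", 120u64),
        ("b.bin", 4096),
        ("c.txt", 16),
        ("d.iso", 700_000),
        ("e.mp4", 250_000),
    ];

    let mut top = DoublePriorityQueue::new();
    for (path, size) in files {
        top.push(path, size);
        if top.len() > N {
            top.pop_min(); // drop the smallest of the N + 1 entries
        }
    }

    // The queue now holds the 3 largest files.
    assert_eq!(top.len(), N);
    assert_eq!(top.pop_max(), Some(("d.iso", 700_000)));
}
```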
