Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow implementation of sieve #9

Open
EMBradley opened this issue Jul 1, 2024 · 0 comments · May be fixed by #11
Open

Slow implementation of sieve #9

EMBradley opened this issue Jul 1, 2024 · 0 comments · May be fixed by #11

Comments

@EMBradley
Copy link

EMBradley commented Jul 1, 2024

The current implementation of Sieve doesn’t use the Wheel as much as it could, resulting in a lot of unnecessary heap rearrangements. These rearrangements are so expensive that the Sieve ends up being over 2x slower than TrialDivision. With the current approach of storing (composite, prime factor) pairs in the BinaryHeap, and incrementing the composite by 2 times the corresponding factor whenever we update the heap, we miss out on one of the best optimizations from The Genuine Sieve of Eratosthenes.

For example, if we seed the wheel with 2, 3, and 5, then we know that the Sieve will never consider any multiples of those primes. Once the Sieve has established that 7 is prime, our heap will contain the pair (49, 7), meaning that the next composite multiple of 7 we expect to see (that hasn’t already been filtered out) is 49. After checking 49 and seeing that it’s not prime (because it is already in the heap), the current implementation changes that entry in the heap to (63, 7) (since 63 = 49 + 2 * 7). The problem with this is that 63 is a multiple of 3, so it will never show up in the wheel seeded with 2, 3, and 5. As a result, when the Sieve looks at 67 (the first number greater than 63 that is not filtered out by the wheel), it will have to pop the pair (63, 7) from the heap and then insert (77, 7). This is bad for two reasons.

First, every change to a binary heap requires that it be rearranged to ensure that it still satisfies the heap property. Popping the top element requires one rearrangement, and inserting a new element requires another rearrangement. The Genuine Sieve of Eratosthenes acknowledges this, saying on page 7,

We provide deleteMinAndInsert because a heap-based implementation can support this operation with considerably less rearrangement than a deleteMin followed by an insert.

Rust’s BinaryHeap type supports modifying the top element without popping it first, via the peek_mut() method. Using peek_mut() to modify the top element has the same effect as popping from the heap and then inserting a new element, but peek_mut() only rearranges the heap once. Simply using peek_mut() instead of pop() then insert() cuts the total amount of heap rearrangements in half. In my testing, just swapping pop()-insert() out for peek_mut() more than doubled the throughput of the find method for Sieve.

Second, we don’t need to have the pair (63, 7) in the heap in the first place. The wheel already skips it because it’s a multiple of 3. By storing a modified version of the wheel type in the heap instead of (prime, composite) pairs, we can avoid ever looking at 63. This optimization is seen in The Genuine Sieve of Eratosthenes, page 7, with the line

insertprime p xs table = PQ.insert (p*p) (map (* p) xs) table

The xs here already excludes all multiples of the primes used to create the initial wheel, so that map (* p) xs gives the list of all the multiples of p that are not multiples of the primes used to create the wheel. Going back to the example of 7, with a wheel that skips multiples of 2, 3, and 5, this means that when the sieve sees that 7 is prime, it makes note of the fact that 7 * 7 = 49 is the next multiple of 7 we need to look out for, same as the current Rust implementation here. But the difference is that once we get past 49 and update the entry for 7 in the heap, the next multiple of 7 we’re looking for is 77, not 63, because we know that the wheel already filtered out 63. Modifying Sieve to implement this optimization results in another more-than-doubling of throughput over the version where only the peek_mut() optimization is implemented. The two optimizations combined give a 4-5x improvement over the current implementation.

These changes also make it more beneficial to use a larger wheel. Starting with the current implementation and changing it only to use a wheel that filters out multiples of 2, 3, 5, and 7, we get a mixed throughput improvement of 4-25%. If we implement the changes described above and then use the larger wheel, the throughput improvement is a larger and more consistent 30-45%.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant