You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The current implementation of Sieve doesn’t use the Wheel as much as it could, resulting in a lot of unnecessary heap rearrangements. These rearrangements are so expensive that the Sieve ends up being over 2x slower than TrialDivision. With the current approach of storing (composite, prime factor) pairs in the BinaryHeap, and incrementing the composite by 2 times the corresponding factor whenever we update the heap, we miss out on one of the best optimizations from The Genuine Sieve of Eratosthenes.
For example, if we seed the wheel with 2, 3, and 5, then we know that the Sieve will never consider any multiples of those primes. Once the Sieve has established that 7 is prime, our heap will contain the pair (49, 7), meaning that the next composite multiple of 7 we expect to see (that hasn’t already been filtered out) is 49. After checking 49 and seeing that it’s not prime (because it is already in the heap), the current implementation changes that entry in the heap to (63, 7) (since 63 = 49 + 2 * 7). The problem with this is that 63 is a multiple of 3, so it will never show up in the wheel seeded with 2, 3, and 5. As a result, when the Sieve looks at 67 (the first number greater than 63 that is not filtered out by the wheel), it will have to pop the pair (63, 7) from the heap and then insert (77, 7). This is bad for two reasons.
First, every change to a binary heap requires that it be rearranged to ensure that it still satisfies the heap property. Popping the top element requires one rearrangement, and inserting a new element requires another rearrangement. The Genuine Sieve of Eratosthenes acknowledges this, saying on page 7,
We provide deleteMinAndInsert because a heap-based implementation can support this operation with considerably less rearrangement than a deleteMin followed by an insert.
Rust’s BinaryHeap type supports modifying the top element without popping it first, via the peek_mut() method. Using peek_mut() to modify the top element has the same effect as popping from the heap and then inserting a new element, but peek_mut() only rearranges the heap once. Simply using peek_mut() instead of pop() then insert() cuts the total amount of heap rearrangements in half. In my testing, just swapping pop()-insert() out for peek_mut() more than doubled the throughput of the find method for Sieve.
Second, we don’t need to have the pair (63, 7) in the heap in the first place. The wheel already skips it because it’s a multiple of 3. By storing a modified version of the wheel type in the heap instead of (prime, composite) pairs, we can avoid ever looking at 63. This optimization is seen in The Genuine Sieve of Eratosthenes, page 7, with the line
The xs here already excludes all multiples of the primes used to create the initial wheel, so that map (* p) xs gives the list of all the multiples of p that are not multiples of the primes used to create the wheel. Going back to the example of 7, with a wheel that skips multiples of 2, 3, and 5, this means that when the sieve sees that 7 is prime, it makes note of the fact that 7 * 7 = 49 is the next multiple of 7 we need to look out for, same as the current Rust implementation here. But the difference is that once we get past 49 and update the entry for 7 in the heap, the next multiple of 7 we’re looking for is 77, not 63, because we know that the wheel already filtered out 63. Modifying Sieve to implement this optimization results in another more-than-doubling of throughput over the version where only the peek_mut() optimization is implemented. The two optimizations combined give a 4-5x improvement over the current implementation.
These changes also make it more beneficial to use a larger wheel. Starting with the current implementation and changing it only to use a wheel that filters out multiples of 2, 3, 5, and 7, we get a mixed throughput improvement of 4-25%. If we implement the changes described above and then use the larger wheel, the throughput improvement is a larger and more consistent 30-45%.
The text was updated successfully, but these errors were encountered:
The current implementation of
Sieve
doesn’t use theWheel
as much as it could, resulting in a lot of unnecessary heap rearrangements. These rearrangements are so expensive that theSieve
ends up being over 2x slower thanTrialDivision
. With the current approach of storing(composite, prime factor)
pairs in theBinaryHeap
, and incrementing the composite by 2 times the corresponding factor whenever we update the heap, we miss out on one of the best optimizations from The Genuine Sieve of Eratosthenes.For example, if we seed the wheel with 2, 3, and 5, then we know that the
Sieve
will never consider any multiples of those primes. Once theSieve
has established that 7 is prime, our heap will contain the pair(49, 7)
, meaning that the next composite multiple of 7 we expect to see (that hasn’t already been filtered out) is 49. After checking 49 and seeing that it’s not prime (because it is already in the heap), the current implementation changes that entry in the heap to(63, 7)
(since 63 = 49 + 2 * 7). The problem with this is that 63 is a multiple of 3, so it will never show up in the wheel seeded with 2, 3, and 5. As a result, when theSieve
looks at 67 (the first number greater than 63 that is not filtered out by the wheel), it will have to pop the pair(63, 7)
from the heap and then insert(77, 7)
. This is bad for two reasons.First, every change to a binary heap requires that it be rearranged to ensure that it still satisfies the heap property. Popping the top element requires one rearrangement, and inserting a new element requires another rearrangement. The Genuine Sieve of Eratosthenes acknowledges this, saying on page 7,
Rust’s
BinaryHeap
type supports modifying the top element without popping it first, via thepeek_mut()
method. Usingpeek_mut()
to modify the top element has the same effect as popping from the heap and then inserting a new element, butpeek_mut()
only rearranges the heap once. Simply usingpeek_mut()
instead ofpop()
theninsert()
cuts the total amount of heap rearrangements in half. In my testing, just swappingpop()
-insert()
out forpeek_mut()
more than doubled the throughput of thefind
method forSieve
.Second, we don’t need to have the pair
(63, 7)
in the heap in the first place. The wheel already skips it because it’s a multiple of 3. By storing a modified version of the wheel type in the heap instead of (prime, composite) pairs, we can avoid ever looking at 63. This optimization is seen in The Genuine Sieve of Eratosthenes, page 7, with the lineThe
xs
here already excludes all multiples of the primes used to create the initial wheel, so thatmap (* p) xs
gives the list of all the multiples ofp
that are not multiples of the primes used to create the wheel. Going back to the example of 7, with a wheel that skips multiples of 2, 3, and 5, this means that when the sieve sees that 7 is prime, it makes note of the fact that 7 * 7 = 49 is the next multiple of 7 we need to look out for, same as the current Rust implementation here. But the difference is that once we get past 49 and update the entry for 7 in the heap, the next multiple of 7 we’re looking for is 77, not 63, because we know that the wheel already filtered out 63. ModifyingSieve
to implement this optimization results in another more-than-doubling of throughput over the version where only thepeek_mut()
optimization is implemented. The two optimizations combined give a 4-5x improvement over the current implementation.These changes also make it more beneficial to use a larger wheel. Starting with the current implementation and changing it only to use a wheel that filters out multiples of 2, 3, 5, and 7, we get a mixed throughput improvement of 4-25%. If we implement the changes described above and then use the larger wheel, the throughput improvement is a larger and more consistent 30-45%.
The text was updated successfully, but these errors were encountered: