Switch BACK to swap_remove
It doesn't duplicate elements; it re-orders them.
urschrei committed May 23, 2024
1 parent 024ed37 commit bfcd526
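The commit message's claim can be checked against the standard library. The sketch below uses two hypothetical helpers (`remove_ordered` and `remove_unordered`, not part of cvmcount) to compare `Vec::remove` with `Vec::swap_remove`. Both take out exactly one element and shrink the vector by one; `swap_remove` fills the gap with the last element instead of shifting everything after it, so nothing is duplicated, only re-ordered.

```rust
/// Hypothetical helper: removes the first occurrence of `target`,
/// shifting later elements left so order is preserved (O(n) shift).
fn remove_ordered<T: PartialEq>(buf: &mut Vec<T>, target: &T) -> Option<T> {
    let pos = buf.iter().position(|x| x == target)?;
    Some(buf.remove(pos))
}

/// Hypothetical helper: removes the first occurrence of `target` by moving
/// the last element into its slot (O(1) after the search, order not kept).
fn remove_unordered<T: PartialEq>(buf: &mut Vec<T>, target: &T) -> Option<T> {
    let pos = buf.iter().position(|x| x == target)?;
    Some(buf.swap_remove(pos))
}
```

Because the buffer only answers "is this word present?", element order carries no meaning, so `swap_remove` avoids the shift at no cost. The `position` search that precedes it is still linear either way.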
Showing 3 changed files with 3 additions and 3 deletions.
Cargo.toml: 2 changes (1 addition, 1 deletion)
```diff
@@ -5,7 +5,7 @@ readme = "README.md"
 license = "MIT OR Apache-2.0"
 repository = "https://github.com/urschrei/cvmcount"
 
-version = "0.1.2"
+version = "0.1.3"
 edition = "2021"
 
 [dependencies]
```
README.md: 2 changes (1 addition, 1 deletion)
```diff
@@ -30,7 +30,7 @@ The `--help` option is available.
 If you're thinking about using this library, you presumably know that it only provides an estimate (within the specified bounds), similar to something like HyperLogLog. You are trading accuracy for speed!
 
 ## Perf
-Calculating the unique tokens in a [418K UTF-8 text file](https://www.gutenberg.org/ebooks/8492) takes 19.2 ms ± 0.3 ms on an M2 Pro
+Calculating the unique tokens in a [418K UTF-8 text file](https://www.gutenberg.org/ebooks/8492) takes 18.6 ms ± 0.3 ms on an M2 Pro
 
 ## Implementation Details
 This library strips punctuation from input tokens using a regex. I assume there is a small performance penalty, but it seems like a small price to pay for increased practicality.
```
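The README notes that tokens are cleaned with a regex before counting. As a dependency-free illustration of what such a step might look like, the sketch below swaps in `str::trim_matches` with a `char` predicate instead of a regex. It assumes that only leading and trailing non-alphanumeric characters should be stripped, which may not match the crate's actual pattern; `clean_token` and `clean_tokens` are hypothetical names.

```rust
/// Hypothetical stand-in for regex-based cleaning: trims any
/// non-alphanumeric characters (ASCII or Unicode) from both ends.
fn clean_token(token: &str) -> &str {
    token.trim_matches(|c: char| !c.is_alphanumeric())
}

/// Splits text on whitespace, cleans each token, and drops tokens that
/// end up empty (for example a bare "--").
fn clean_tokens(text: &str) -> Vec<&str> {
    text.split_whitespace()
        .map(clean_token)
        .filter(|t| !t.is_empty())
        .collect()
}
```

Trimming only the edges keeps internal apostrophes, so "don't" stays one token; a regex that removes every punctuation character would instead yield "dont".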
src/lib.rs: 2 changes (1 addition, 1 deletion)
```diff
@@ -46,7 +46,7 @@ impl CVM {
         // I think this will be faster than a hashset for practical sizes
         // but I need some empirical data for this
         if let Some(pos) = self.buf.iter().position(|x| *x == clean_word) {
-            self.buf.remove(pos);
+            self.buf.swap_remove(pos);
         }
         if self.rng.gen_bool(self.probability) {
             self.buf.push(clean_word);
```
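For context, the changed lines sit inside the CVM distinct-elements update step. Below is a minimal, self-contained sketch of that step as the CVM algorithm is usually described, not the crate's actual code: `Cvm`, `process` and `estimate` are hypothetical names, and a tiny xorshift generator stands in for the `rand` crate's `Rng::gen_bool` so the block has no dependencies.

```rust
/// Minimal xorshift64 PRNG; a stand-in for `rand`, not suitable for crypto.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Returns true with probability `p`, using 53 random bits as a uniform f64 in [0, 1).
    fn gen_bool(&mut self, p: f64) -> bool {
        ((self.next_u64() >> 11) as f64 / (1u64 << 53) as f64) < p
    }
}

/// Hypothetical CVM estimator over string tokens.
struct Cvm {
    buf: Vec<String>,
    buf_size: usize,
    probability: f64,
    rng: XorShift64,
}

impl Cvm {
    fn new(buf_size: usize, seed: u64) -> Self {
        Cvm {
            buf: Vec::with_capacity(buf_size),
            buf_size,
            probability: 1.0,
            rng: XorShift64(seed | 1), // xorshift must not start at zero
        }
    }

    fn process(&mut self, word: &str) {
        // Drop any existing copy; order is irrelevant, so O(1) swap_remove suffices.
        if let Some(pos) = self.buf.iter().position(|x| x.as_str() == word) {
            self.buf.swap_remove(pos);
        }
        // Re-admit the word with the current sampling probability.
        if self.rng.gen_bool(self.probability) {
            self.buf.push(word.to_string());
        }
        // Buffer full: keep each element with probability 1/2 and halve p.
        // (The paper's failure case, a buffer still full after thinning, is not handled.)
        if self.buf.len() >= self.buf_size {
            let rng = &mut self.rng;
            self.buf.retain(|_| rng.gen_bool(0.5));
            self.probability /= 2.0;
        }
    }

    /// Scales the sample size back up to an estimate of the distinct count.
    fn estimate(&self) -> f64 {
        self.buf.len() as f64 / self.probability
    }
}
```

While the buffer never fills, `probability` stays at 1.0 and the estimate is exact; once it fills, each thinning round halves both the buffer (in expectation) and the probability, which keeps `buf.len() / probability` on the scale of the true count.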