Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a runtime-creation-sized Bloom Filter #5

Merged
merged 9 commits into from
Jul 31, 2023

Conversation

matheus23
Copy link
Member

@matheus23 matheus23 commented Jul 24, 2023

Description

  • Adds a bloom filter implementation runtime_sized::BloomFilter
  • Splits out bloom filter implementations across modules
  • Implements rejection sampling in HashIndexIterator (has no effect if the size is a power of two)

Link to issue

Fixes #4

Type of change

  • New feature (non-breaking change that adds functionality)
  • Refactor (non-breaking change that updates existing functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • Comments have been added/updated

Test plan (required)

  • Needs more tests. Consider this somewhat WIP.

@matheus23 matheus23 self-assigned this Jul 24, 2023
@codecov
Copy link

codecov bot commented Jul 24, 2023

Codecov Report

Merging #5 (9f19523) into main (ae5bda4) will increase coverage by 2.07%.
The diff coverage is 73.45%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main       #5      +/-   ##
==========================================
+ Coverage   71.42%   73.50%   +2.07%     
==========================================
  Files           2        5       +3     
  Lines          35      117      +82     
  Branches        6       15       +9     
==========================================
+ Hits           25       86      +61     
- Misses          7       21      +14     
- Partials        3       10       +7     
Files Changed Coverage Δ
deterministic-bloom/src/lib.rs 100.00% <ø> (+28.12%) ⬆️
deterministic-bloom/src/utils.rs 25.00% <0.00%> (-41.67%) ⬇️
deterministic-bloom/src/const_size.rs 65.21% <65.21%> (ø)
deterministic-bloom/src/runtime_size.rs 77.77% <77.77%> (ø)
deterministic-bloom/src/common.rs 82.50% <82.50%> (ø)

@matheus23
Copy link
Member Author

Argh, why does the benchmark bot have to ping Brooke? 😅

@matheus23
Copy link
Member Author

matheus23 commented Jul 25, 2023

I looked a bit into what's happening with performance, and honestly I just think it's some kind of optimization not kicking in in one case vs. another.

Here's a call graph cpu cycle graph from the state in this PR:
image

vs. the state before the PR:
image

As you can see it's basically doing the same thing except in one case, to set bits in the bitvec it uses deref_mut whereas in the other case it's using set.
For some reason the version with set is much faster than the version using deref_mut. Keep in mind in both cases this is using exactly the same code to set bits (the BitSlice::set method).

The thing that I mainly wanted to make sure was not the culprit are the changes in the HashIndexIterator::next function, but as you can see apparently that's using less time in the new version version before, so 🤷‍♂️

Copy link
Member

@appcypher appcypher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally looks good to me.
Not a major issue but would Box<[u8]> better be represented as Bytes here?

@matheus23
Copy link
Member Author

@appcypher Yeah, valid question. Generally I didn't need any functionality from Bytes (e.g. don't need the sharing/refcounting), so I went with the most "barebones" way that works.
I actually kinda wanted to make BloomFilter { k_hashes: u32, bytes: [u8] } work, but it's difficult working with unsized types (e.g. missing a Clone implementation).

@matheus23 matheus23 merged commit fe077c7 into main Jul 31, 2023
10 of 11 checks passed
@matheus23 matheus23 deleted the matheus23/dynamic-bloom branch July 31, 2023 12:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support runtime / dynamic bloom sizing
2 participants