
genjidb vs sqlite? #116

breezewish opened this issue Feb 28, 2022 · 2 comments

@breezewish (Member)
The storage engine evolved as follows:

Use a KV that supports ZSTD to achieve max compression → Use genjidb for easier access over that KV engine

However, since we are no longer using ZSTD for block-level compression, but instead compressing at the per-profile level, genjidb + badger is no longer the only choice. For example, SQLite, the most widely deployed database engine, may be a better fit.
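Per-profile compression makes the engine choice flexible because the payload is already compressed before it reaches storage. A minimal sketch of the idea, using Python's stdlib zlib as a stand-in for ZSTD and a made-up one-table schema:

```python
import sqlite3
import zlib

# Each profile is compressed individually before storage, so the storage
# engine itself no longer needs block-level compression.
# NOTE: zlib stands in for ZSTD here; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (ts INTEGER PRIMARY KEY, data BLOB)")

def write_profile(ts: int, raw: bytes) -> None:
    # Compress at the per-profile level, then store the blob as-is.
    conn.execute("INSERT INTO profiles VALUES (?, ?)", (ts, zlib.compress(raw)))
    conn.commit()

def read_profile(ts: int) -> bytes:
    row = conn.execute("SELECT data FROM profiles WHERE ts = ?", (ts,)).fetchone()
    return zlib.decompress(row[0])

write_profile(1, b"profile bytes" * 100)
assert read_profile(1) == b"profile bytes" * 100
```

With this layout, the underlying engine (badger, SQLite, or anything else) only ever sees opaque compressed blobs.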

@crazycs520 (Collaborator)

crazycs520 commented Feb 28, 2022

I ran a simple test comparing genjidb and SQLite; here are the results:

write & read performance

genjidb total_write_size: 1301.977 MB, cost: 2.286s
genjidb total_read_size: 1301.977 MB, cost: 474.471211ms

sqlite total_write_size: 1301.977 MB, cost: 7.697s
sqlite total_read_size: 1301.977 MB, cost: 955.562619ms

# sqlite with batch commit when write data
sqlite total_write_size: 1301.977 MB, cost: 3.824s
sqlite total_read_size: 1301.977 MB, cost: 1.115740125s

genjidb is a little bit faster than sqlite.
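The "sqlite with batch commit" variant above presumably wraps many inserts in a single transaction instead of committing each one. A minimal sketch of the difference, assuming a made-up key/value-style table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v BLOB)")

rows = [(i, b"x" * 64) for i in range(1000)]

# Naive: one implicit transaction (and, on a real file, one fsync) per row.
for k, v in rows[:500]:
    conn.execute("INSERT INTO kv VALUES (?, ?)", (k, v))
    conn.commit()

# Batched: many inserts, a single commit -- far fewer syncs on disk.
with conn:  # the connection context manager commits once on exit
    conn.executemany("INSERT INTO kv VALUES (?, ?)", rows[500:])

assert conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0] == 1000
```

On an in-memory database the two paths perform similarly; on a file-backed database the per-row commits dominate, which would explain the 7.7 s vs. 3.8 s gap above.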

data compression

▶ du -h profile-data-file   --original profile data directory
1.3G    profile-data-file

▶ du -h /tmp/badger  -- genjidb (badger) data directory
1.3G    /tmp/badger

▶ du -h foo.db  --sqlite.
1.3G    foo.db

@breezewish (Member, Author)

@crazycs520 Running a benchmark is a good attempt; however, it does not answer the questions I raised:

  1. The workload you tested is not a real-world workload. Conprof never writes bulk profiling data continuously. For example, there is no practical difference between a write taking 2 s and one taking 20 s, since Conprof only writes once per minute.

  2. Even with a real-world workload, the duration metric alone is meaningless when other important aspects are ignored. For example, I could build a trivial db that beats both genjidb and sqlite with a 0.0001 s write latency simply by writing to an in-memory buffer. At minimum, the following questions should be checked:

  • What crash guarantees does genjidb provide? What happens when a write completes and then power is lost? Will there be data loss or corruption?
  • What is the memory consumption?
  • How do they perform when the process runs for a long time?
  • How stable are they?
  3. There are also other aspects that need to be evaluated when comparing different solutions. To name a few:
  • Code quality
  • Feature sets
  • The behavior for our future possible workloads
  • etc.
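On the crash-assurance point: for SQLite specifically, durability is governed by the journal_mode and synchronous pragmas, so any fair comparison should pin these settings explicitly rather than rely on defaults. A minimal Python sketch of probing them (the file path is illustrative):

```python
import os
import sqlite3
import tempfile

# Crash safety in SQLite depends on journal_mode and synchronous;
# defaults trade some durability for speed, so pin them explicitly
# before comparing against genjidb/badger.
path = os.path.join(tempfile.mkdtemp(), "probe.db")
conn = sqlite3.connect(path)

# Write-ahead logging plus synchronous=FULL: a committed transaction
# survives power loss, at the cost of extra fsyncs per commit.
mode = conn.execute("PRAGMA journal_mode=wal").fetchone()[0]
conn.execute("PRAGMA synchronous=FULL")

conn.execute("CREATE TABLE t (x)")
with conn:
    conn.execute("INSERT INTO t VALUES (1)")

assert mode == "wal"
assert conn.execute("SELECT x FROM t").fetchone()[0] == 1
```

A comparable question would then need to be answered for badger (e.g. whether writes are synced to its value log before a write call returns).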
