
genjidb vs sqlite? #116

breezewish opened this issue Feb 28, 2022 · 2 comments

@breezewish (Member)
The storage engine evolved as follows:

Use a KV that supports ZSTD to achieve max compression → Use genjidb for easier access over that KV engine

However, since we are no longer using ZSTD for block-level compression, but instead compressing at the per-profile level, genjidb + badger is no longer the only choice. For example, SQLite, the most widely deployed database engine, may be a better fit.
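Per-profile compression makes the engine choice flexible because the payload is already compressed before it reaches storage. A minimal sketch of the idea, using Python's stdlib zlib as a stand-in for ZSTD and a made-up one-table schema:

```python
import sqlite3
import zlib

# Each profile is compressed individually before storage, so the storage
# engine itself no longer needs block-level compression.
# NOTE: zlib stands in for ZSTD here; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (ts INTEGER PRIMARY KEY, data BLOB)")

def write_profile(ts: int, raw: bytes) -> None:
    # Compress at the per-profile level, then store the blob as-is.
    conn.execute("INSERT INTO profiles VALUES (?, ?)", (ts, zlib.compress(raw)))
    conn.commit()

def read_profile(ts: int) -> bytes:
    row = conn.execute("SELECT data FROM profiles WHERE ts = ?", (ts,)).fetchone()
    return zlib.decompress(row[0])

write_profile(1, b"profile bytes" * 100)
assert read_profile(1) == b"profile bytes" * 100
```

With this layout, the underlying engine (badger, SQLite, or anything else) only ever sees opaque compressed blobs.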

@crazycs520 (Collaborator)

crazycs520 commented Feb 28, 2022

I ran a simple test comparing genjidb and SQLite; here are the results:

write & read performance

genjidb total_write_size: 1301.977 MB, cost: 2.286s
genjidb total_read_size: 1301.977 MB, cost: 474.471211ms

sqlite total_write_size: 1301.977 MB, cost: 7.697s
sqlite total_read_size: 1301.977 MB, cost: 955.562619ms

# sqlite with batch commit when write data
sqlite total_write_size: 1301.977 MB, cost: 3.824s
sqlite total_read_size: 1301.977 MB, cost: 1.115740125s

genjidb is a little bit faster than sqlite.
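The "sqlite with batch commit" variant above presumably wraps many inserts in a single transaction instead of committing each one. A minimal sketch of the difference, assuming a made-up key/value-style table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v BLOB)")

rows = [(i, b"x" * 64) for i in range(1000)]

# Naive: one implicit transaction (and, on a real file, one fsync) per row.
for k, v in rows[:500]:
    conn.execute("INSERT INTO kv VALUES (?, ?)", (k, v))
    conn.commit()

# Batched: many inserts, a single commit -- far fewer syncs on disk.
with conn:  # the connection context manager commits once on exit
    conn.executemany("INSERT INTO kv VALUES (?, ?)", rows[500:])

assert conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0] == 1000
```

On an in-memory database the two paths perform similarly; on a file-backed database the per-row commits dominate, which would explain the 7.7 s vs. 3.8 s gap above.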

data compression

▶ du -h profile-data-file   --original profile data directory
1.3G    profile-data-file

▶ du -h /tmp/badger  -- genjidb (badger) data directory
1.3G    /tmp/badger

▶ du -h foo.db  --sqlite.
1.3G    foo.db

@breezewish (Member, Author)

@crazycs520 Running a benchmark is a good attempt; however, it does not answer the questions I raised:

  1. The workload you tested is not a real-world workload. Conprof never writes bulk profiling data continuously. For example, there is no practical difference between a write taking 2 s and one taking 20 s, since Conprof only writes once per minute.

  2. Even with a real-world workload, the duration metric alone is meaningless when other important aspects are ignored. For example, I could build a trivial db that beats both genjidb and sqlite with a 0.0001 s write latency simply by writing to an in-memory buffer. At minimum, the following questions should be checked:

  • What crash guarantees does genjidb provide? What happens when a write completes and then power is lost? Will there be data loss or corruption?
  • What is the memory consumption?
  • How do they perform when the process runs for a long time?
  • How stable are they?
  3. There are also other aspects that need to be evaluated when comparing different solutions. To name a few:
  • Code quality
  • Feature sets
  • The behavior for our future possible workloads
  • etc.
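On the crash-assurance point: for SQLite specifically, durability is governed by the journal_mode and synchronous pragmas, so any fair comparison should pin these settings explicitly rather than rely on defaults. A minimal Python sketch of probing them (the file path is illustrative):

```python
import os
import sqlite3
import tempfile

# Crash safety in SQLite depends on journal_mode and synchronous;
# defaults trade some durability for speed, so pin them explicitly
# before comparing against genjidb/badger.
path = os.path.join(tempfile.mkdtemp(), "probe.db")
conn = sqlite3.connect(path)

# Write-ahead logging plus synchronous=FULL: a committed transaction
# survives power loss, at the cost of extra fsyncs per commit.
mode = conn.execute("PRAGMA journal_mode=wal").fetchone()[0]
conn.execute("PRAGMA synchronous=FULL")

conn.execute("CREATE TABLE t (x)")
with conn:
    conn.execute("INSERT INTO t VALUES (1)")

assert mode == "wal"
assert conn.execute("SELECT x FROM t").fetchone()[0] == 1
```

A comparable question would then need to be answered for badger (e.g. whether writes are synced to its value log before a write call returns).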
