Database growth rate is too high #1790
Comments
There is a lot of work in progress on this. The major component responsible for the growth is the number of ATXes.
The current migration from 1.6 to 1.7 is a bit special because it had to migrate the whole DB. We don't expect to need a similar migration any time soon.
If I recall correctly, there was a DB migration not long ago, in 1.5.
Yes, the DB in 1.7 is bigger than in 1.6 by about 10- GiB
It's currently stored in the most optimized format possible. We could add compression with something like zstd, but that would raise CPU requirements and would likely introduce latency. The major problem, as I said, is the number of ATXes, which in turn drives the number of proposals. ATX merge will result in fewer proposals (likely by 1/3), and POST merge will reduce the number of ATXes.
If it's zstd (level 3 to 5), the CPU won't even notice: disk I/O will take much more time than zstd, even on NVMe. However, compression only makes sense if the data is big enough. A quick select showed that the size is quite small: zstd -3 blob_dump.bin That's not enough to justify it. If the data is >50 kB and compresses to 50% or less on average, then it might be worth it. Still not good enough. One problem with storing it all in the database is that an index costs as much space as the indexed column plus the row id (8 bytes), which comes to 40 bytes per row. Epochs 1 to 27 have ~55M rows, which is ~3.7 GB. It's not much, but the row id is there whether or not you use it, so you might as well take advantage of it. Well, good luck! I hope you find a solution soon, because this will get out of hand.
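The rule of thumb in the comment above (compress only if blobs are larger than about 50 kB and shrink to 50% or less on average) can be sketched as a quick check. This is a minimal illustration, not a measurement of the real database: it uses Python's standard-library `zlib` as a stand-in for zstd, and the helper names and sample blobs are hypothetical.

```python
import os
import zlib


def compression_ratio(blob: bytes, level: int = 6) -> float:
    """Return compressed size / original size (lower is better)."""
    if not blob:
        return 1.0
    return len(zlib.compress(blob, level)) / len(blob)


def worth_compressing(blobs, min_size=50_000, max_ratio=0.5) -> bool:
    """Apply the thresholds from the comment: large blobs AND average ratio <= 50%."""
    blobs = list(blobs)
    if not blobs:
        return False
    avg_size = sum(len(b) for b in blobs) / len(blobs)
    avg_ratio = sum(compression_ratio(b) for b in blobs) / len(blobs)
    return avg_size > min_size and avg_ratio <= max_ratio


# Hypothetical samples: small, hash-like random blobs versus large repetitive ones.
small_random = [os.urandom(200) for _ in range(100)]
large_repetitive = [b"spacemesh" * 10_000 for _ in range(3)]

print(worth_compressing(small_random))      # small, incompressible blobs
print(worth_compressing(large_repetitive))  # large, highly repetitive blobs
```

Small blobs made mostly of hashes and signatures behave like the random sample here: there is little redundancy for any general-purpose compressor to exploit, which matches the conclusion in the comment.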
Yes, moving hashes to ids is planned in go-spacemesh. It will make the DB a bit smaller, but it will then grow again with each new epoch. We have also checked the zstd module for SQLite, but sadly it does not help much (https://phiresky.github.io/blog/2022/sqlite-zstd/). The ultimate solution is POST merge, which is what we are aiming for. Optimizations like the ones you mentioned sadly do not help in the long run, as you noticed. POST merge should help tremendously with the ATX count and therefore have a bigger effect than any of the micro-optimizations, especially over time.
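As a rough illustration of what "moving hashes to ids" can look like in SQLite, the sketch below stores each 32-byte hash once in a lookup table and lets other tables reference it by integer id. The table and function names are hypothetical and are not taken from go-spacemesh; the actual schema planned there may differ.

```python
import hashlib
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical hash-to-id schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE identities (
            id   INTEGER PRIMARY KEY,   -- alias for SQLite's built-in rowid
            hash BLOB NOT NULL UNIQUE   -- the 32-byte hash, stored once
        );
        CREATE TABLE records (
            epoch       INTEGER NOT NULL,
            identity_id INTEGER NOT NULL REFERENCES identities(id)
        );
        """
    )
    return conn


def intern_hash(conn: sqlite3.Connection, h: bytes) -> int:
    """Return the integer id for a hash, inserting it on first sight."""
    conn.execute("INSERT OR IGNORE INTO identities (hash) VALUES (?)", (h,))
    row = conn.execute("SELECT id FROM identities WHERE hash = ?", (h,)).fetchone()
    return row[0]


def add_record(conn: sqlite3.Connection, epoch: int, h: bytes) -> None:
    """Store a per-epoch row that references the hash by its small integer id."""
    conn.execute(
        "INSERT INTO records (epoch, identity_id) VALUES (?, ?)",
        (epoch, intern_hash(conn, h)),
    )


conn = open_db()
node = hashlib.sha256(b"example-node").digest()  # hypothetical 32-byte identifier
for epoch in (1, 2, 3):
    add_record(conn, epoch, node)
```

The lookup table still pays for the hash and its unique index once, but every per-epoch row carries only an integer, which is why the saving is real yet bounded: new epochs keep adding rows.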
The database size grows too fast. At this rate, the database won't fit on a 1 TB SSD after 5 years. I have seen a few optimizations in recent updates, but they reduce the size only a little, and within 1-2 epochs the growth catches up. Updates that require a DB migration add one more problem: if the database grows to 1 TB, you'll need 2 TB to migrate whenever the format changes.
Optimizing the storage format is only a temporary solution. The problem is the growth rate. Because the CPU limits how big a plot can be, you must have multiple plots, and each one will need a 2 TB SSD for its database.
Is there any plan for how to fix this in the future?
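The migration headroom concern in this issue can be made concrete with a little arithmetic. The sketch below assumes linear growth and a migration that writes a full new copy next to the old database; the function names and the sample figures (100 GB today, 200 GB per year) are illustrative assumptions, not measured values.

```python
def migration_space_needed(db_size_gb: float) -> float:
    """Peak disk usage if a migration keeps the old DB while writing a full new copy."""
    return 2 * db_size_gb


def years_until_migration_fails(
    current_gb: float, growth_gb_per_year: float, disk_gb: float
) -> float:
    """Years until twice the DB size no longer fits on the disk (linear growth assumed)."""
    if growth_gb_per_year <= 0:
        raise ValueError("growth_gb_per_year must be positive")
    limit_gb = disk_gb / 2  # largest DB that can still be migrated in place
    return max(0.0, (limit_gb - current_gb) / growth_gb_per_year)


print(migration_space_needed(1000))                  # 1 TB DB needs 2 TB during migration
print(years_until_migration_fails(100, 200, 1000))   # hypothetical inputs on a 1 TB disk
```

Under these assumptions the disk becomes too small for a migration well before the database itself fills it, which is the point the issue raises.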