SQLite WAL File for Events Growing Until Lotus Restart #12089
Comments
I spent some time tinkering with this because I've been running the db since nv22 and my WAL is now over 200G, which is definitely not normal. Unfortunately I couldn't find a straightforward solution. But I do think this is urgent and we should spend some time figuring it out, because it likely impacts performance, aside from just being a pain in the backside for node operators. My understanding and some of the things I've tried:
Possibly some weird interaction with one of our other PRAGMAs in there. Debugging this is obviously quite difficult: trial and error and time, unless someone with more insight sees something obvious there. Will assign this to myself since I have a good working setup and have started looking into this. But @snissn, if you have a brilliant idea then I'm all ears!
* fix unclosed multi-row query
* tune options to limit wal growth

Ref: #12089
Tinkering here: #12090
* fix: events: sqlite db improvements
  * fix unclosed multi-row query
  * tune options to limit wal growth

  Ref: #12089
* fix: events: use correct context for CollectEvents transaction
* fix: events: close prepared read statement
* fix: events: close initial query; handle lint failures
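The "unclosed multi-row query" item matters because, in WAL mode, a read statement that is never finished keeps its snapshot open, and a checkpoint cannot reset the log while that snapshot is held. The following is a minimal Python sketch of that SQLite behavior, not the Lotus code (Lotus is written in Go); the `event` table, the temporary `events.db` path and the function names are hypothetical and chosen only for illustration.

```python
import os
import sqlite3
import tempfile


def checkpoint_truncate(conn):
    # SQLite answers with one row: (busy, wal_frames, checkpointed_frames).
    return conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchall()[0]


def unclosed_reader_demo():
    db_path = os.path.join(tempfile.mkdtemp(), "events.db")
    wal_path = db_path + "-wal"

    # timeout=0: a blocked checkpoint reports "busy" right away instead of waiting.
    writer = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    writer.execute("PRAGMA journal_mode=WAL").fetchall()
    writer.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, data BLOB)")
    for _ in range(10):
        writer.execute("INSERT INTO event (data) VALUES (?)", (b"x" * 1024,))

    # A second connection starts a multi-row query and never finishes it.
    reader = sqlite3.connect(db_path, timeout=0)
    cur = reader.execute("SELECT id FROM event ORDER BY id")
    cur.fetchone()  # more rows remain, so the read snapshot stays open

    # Writes made after the snapshot was taken pile up in the WAL.
    for _ in range(100):
        writer.execute("INSERT INTO event (data) VALUES (?)", (b"x" * 1024,))

    busy_while_open = checkpoint_truncate(writer)[0]
    wal_while_open = os.path.getsize(wal_path)

    cur.close()  # finishing the query releases the snapshot
    busy_after_close = checkpoint_truncate(writer)[0]
    wal_after_close = os.path.getsize(wal_path)

    reader.close()
    writer.close()
    return busy_while_open, wal_while_open, busy_after_close, wal_after_close


if __name__ == "__main__":
    print(unclosed_reader_demo())
```

While the cursor is open, the checkpoint reports busy and the WAL file keeps its size; once the cursor is closed, the same checkpoint completes and truncates the file. A long-running process that leaks such a statement would see the WAL grow until restart, which matches the symptom reported in this issue.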
Description:
The SQLite write-ahead log (WAL) file used for storing events in the Lotus client grows without bound until the client is restarted. This can lead to excessive disk usage and potential performance degradation. The file in question is not the state file but a write-ahead log that should be periodically reset. Restarting Lotus clears the file.
`events.db-wal` was 14 GB after running for less than 24 hours, and was deleted and reset after restarting Lotus. After the restart it starts growing again from 0 bytes.
Steps to Reproduce:
Expected Behavior:
The SQLite WAL file should be periodically checkpointed and truncated to prevent uncontrolled growth.
Environment:
Additional Information:
Implementing a mechanism to manually trigger checkpoints, or reviewing the configuration settings related to SQLite checkpointing, may help mitigate this issue. The following config parameter should make SQLite checkpoint the WAL automatically: https://sqlite.org/pragma.html#pragma_wal_autocheckpoint. For reference, a checkpoint can also be triggered by sending a query directly to SQLite:
PRAGMA wal_checkpoint(FULL);
Possible Solutions:
PRAGMA wal_checkpoint(FULL);
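As a rough illustration of the two knobs mentioned above, here is a small Python sketch that sets `wal_autocheckpoint` and then triggers a manual checkpoint. It uses `TRUNCATE` mode rather than `FULL`: per the SQLite documentation, `TRUNCATE` is the mode that also shrinks the WAL file to zero bytes on success, whereas `FULL` is not described as reducing the file size. The table name, file path, row count and the autocheckpoint value of 100 pages are arbitrary assumptions for the example, not values taken from Lotus.

```python
import os
import sqlite3
import tempfile


def pragma(conn, statement):
    # Run a PRAGMA and fetch every row so the statement is fully finished.
    return conn.execute("PRAGMA " + statement).fetchall()


def checkpoint_demo(rows=200):
    db_path = os.path.join(tempfile.mkdtemp(), "events.db")
    wal_path = db_path + "-wal"

    conn = sqlite3.connect(db_path, isolation_level=None)  # autocommit
    pragma(conn, "journal_mode=WAL")
    # Ask SQLite to attempt a checkpoint whenever the WAL reaches 100 pages.
    pragma(conn, "wal_autocheckpoint=100")

    conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, data BLOB)")
    for _ in range(rows):
        conn.execute("INSERT INTO event (data) VALUES (?)", (b"x" * 4096,))

    size_before = os.path.getsize(wal_path)
    # Manual checkpoint; the first column is 0 when it was not blocked.
    busy = pragma(conn, "wal_checkpoint(TRUNCATE)")[0][0]
    size_after = os.path.getsize(wal_path)
    autocheckpoint_pages = pragma(conn, "wal_autocheckpoint")[0][0]

    conn.close()
    return size_before, busy, size_after, autocheckpoint_pages


if __name__ == "__main__":
    print(checkpoint_demo())
```

Automatic checkpoints copy WAL content back into the database but do not by themselves shrink the file on disk, so a periodic `TRUNCATE` checkpoint (or a size limit on the journal) is one way an operator-facing fix could keep `events.db-wal` small. Either approach only works if no connection is holding a read snapshot open.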