
Consensus-Full-Nodes - resources usage issue #2935

Open
tty47 opened this issue Dec 14, 2023 · 8 comments
Labels: bug · priority:high · WS: Maintenance 🔧

Comments

@tty47 (Contributor) commented Dec 14, 2023

Summary of Bug

Hello team! 👋

I want to report an issue we are facing with the consensus-full-nodes: we cannot run them with less than 20GB of RAM. When the nodes have to sync the chain (for example on mocha), they fail to do so with less memory than that. Even if we cap them at 20GB, they keep consuming all the resources of the server until they hit OOM and crash.
This happens whether the nodes have to sync from scratch or only from a few days back.

In my view, the nodes should work with the resources they are given, even if that means syncing takes longer; crashing instead looks like a bug.

[Screenshot: resources-usage]

I assume that the nodes should work even with 8GB, as we have here
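For context, a minimal sketch of the kind of memory request/limit described above, assuming the nodes run as Kubernetes pods (the pod-style names suggest this); the image tag, pod name, and values are illustrative placeholders, not our actual chart values:

```yaml
# Hypothetical pod spec fragment; names and values are illustrative only.
apiVersion: v1
kind: Pod
metadata:
  name: consensus-full-0
spec:
  containers:
    - name: consensus
      image: ghcr.io/celestiaorg/celestia-app:v1.3.0  # placeholder tag matching the reported version
      resources:
        requests:
          memory: "8Gi"    # what the nodes are expected to work with
        limits:
          memory: "20Gi"   # the cap mentioned above; the container is OOM-killed when it exceeds this
```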

cc: @celestiaorg/devops

Version

v1.3.0

Steps to Reproduce

Start a consensus-full-node and connect it to an existing chain (mocha, for example); it will then have to sync to catch up. A minimal sketch of this setup is shown below.
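A minimal repro sketch, assuming a standard celestia-app installation; the moniker is a placeholder, and the mocha-4 genesis and peers/seeds still need to be configured before starting:

```sh
# Hypothetical repro sketch; "my-consensus-node" is a placeholder moniker.
celestia-appd init my-consensus-node --chain-id mocha-4
# Fetch the mocha-4 genesis and set seeds/persistent_peers in
# ~/.celestia-app/config/config.toml before starting.
celestia-appd start
```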

For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
tty47 added the bug label on Dec 14, 2023
@rootulp (Collaborator) commented Dec 14, 2023

Thanks for the issue! A few questions:

  1. Based on the screenshot: do you have any idea why this is happening for consensus-full-snapshot-0 and not for consensus-full-2 or consensus-full-3? How does consensus-full-snapshot-0 differ from the other consensus nodes? Perhaps a config change?
  2. Does this repro on other chains besides mocha-4?

@evan-forbes (Member)

I think we ran into this before, and because of all of the IBC memo spam on mocha, the tx-index will eat gobs of memory. Was the tx-index configured to be on, @jrmanes?

@tty47 (Contributor, Author) commented Jan 5, 2024

hey guys!
sorry for the delay in my response, I just read your messages

  1. Based on the screenshot: do you have any idea why this is happening for consensus-full-snapshot-0 and not for consensus-full-2 or consensus-full-3? How does consensus-full-snapshot-0 differ from the other consensus nodes? Perhaps a config change?

This happens for every node. The image shows consensus-full-snapshot-0 because it was the one we had to restart at that specific moment, but it doesn't matter which node; there is nothing particular about this one.
We have seen it for the others as well.

  2. Does this repro on other chains besides mocha-4?

We could also see it in Arabica. The problem with reproducing it on other chains is that it mostly happens when the chain holds a lot of data, so we cannot easily reproduce it in robusta, for example.

I think we ran into this before, and because of all of the IBC memo spam on mocha, the tx-index will eat gobs of memory. Was the tx-index configured to be on, @jrmanes?

I would say yes, we have it defined here; please let us know if there is something we can tweak.

The main problem I see is that this kind of issue is hard to detect in Robusta: we would need a chain that already holds a lot of data and then connect a node to sync it before this scenario shows up.
Let us know if you see something we can do from our side to help.

@rootulp (Collaborator) commented Jan 5, 2024

Based on the lines you linked, it looks like the tx indexer is enabled and set to kv, which uses goleveldb. If the consensus nodes don't need to index transactions, we can remove those lines, because the default is null, which disables tx indexing. If we do need tx indexing, we can explore alternative db_backends and/or the psql option.
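For reference, a sketch of the relevant `[tx_index]` section of the node's `config.toml` (standard Tendermint/CometBFT options; exact keys and defaults may vary slightly by release, and the connection string is a placeholder):

```toml
[tx_index]
# "kv"   — index transactions in the node's local key-value store (goleveldb by default);
#          this appears to be what is enabled here.
# "null" — disable transaction indexing entirely.
# "psql" — offload indexing to an external PostgreSQL database.
indexer = "null"

# Only used when indexer = "psql"; the connection string below is a placeholder.
# psql-conn = "postgresql://user:password@host:5432/db?sslmode=disable"
```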

evan-forbes added the WS: Maintenance 🔧 and needs:triage labels on May 17, 2024
ninabarbakadze added the needs:grooming label on May 28, 2024
evan-forbes added the priority:high label and removed the needs:grooming label on Jul 8, 2024
@evan-forbes (Member)

If this is purely related to the KV indexer, can we perhaps close this issue and open a new one to improve the KV indexer?

@evan-forbes (Member)

We should be able to close this issue once we are able to run v2 in production, as that includes a new version of the KV indexer that should remedy the massive amount of memory used by the existing KV store.

@rootulp (Collaborator) commented Jul 24, 2024

Nina backported celestiaorg/celestia-core#1405 to celestia-core v1.38.0-tm-v0.34.29, which was released in celestia-app v1.13.0. Rachid bumped celestia-node to that release in this PR.

TLDR: we don't need to wait until celestia-app v2 is running in production. As soon as celestia-node cuts a release from main (likely v0.15.0), we can use the lightweight tx status work.

@tty47 (Contributor, Author) commented Jul 26, 2024

hello!
should I then close this issue?
