Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use only one rpc client at time #612

Merged
merged 31 commits into from
Jan 11, 2024
Merged

Use only one rpc client at time #612

merged 31 commits into from
Jan 11, 2024

Conversation

AndreiEres
Copy link
Collaborator

@AndreiEres AndreiEres commented Dec 11, 2023

Changes have been made to improve the stability of the subxt client (including the light client mode). It's done in two specific parts.

  • Updated subxt client implementation
  • Improved rpc node requests

Updated subxt client implementation

We used to spawn a client for a number of watched parachains or even more. Despite the memory and CPU usage it was a working approach until we decided to switch to a light client instead of a full RPC node. Spawning a light client for each parachain leads to overuse of resources and as a result - crashes the entire application.

What changed:

  • Subxt wrapper replaced by frontend-backend realization of executor.
  • Backend containing subxt client is initialized at startup.
  • Frontend can be cloned between services without duplicating the subxt client.
  • As a result, we have only one subxt client for the entire application.

Improved rpc node requests

Implementing the new subxt client revealed a hidden problem with the number of requests. One client was unable to process all requests at the same speed as many clients and became a bottleneck. After investigation it was found that the most common request is to check the HRMP channels of a given parachain (about 80%).

We used to fetch the ingress and egress channel index of each parachain and then check each channel. It could be optimized because most of the attempts were empty, i.e. not every open channel was used to send a message in a given block.

What changed:

  • To retreive the inbound channels, we moved from checking the inbound channel index to the HRMP channel digest. It shows which paras sent a message at the given block number for a given recipient. So we exclude empty attempts when fetching inbound channels.
  • We joined requests whenever possible instead of awaiting them in a loop.
  • In the meantime, we found a bug with incorrect channel id for outgoing messages: current para_id was set as receiver, but should be sender.

New implementation of HRMP channel requesting works well with a full RPC node, but light client can't hold that load. Since we currently only display HRMP metrics in CLI mode and don't send them to Prometheus, we decided to make them optional for a while. More work on improving HRMP fetching on the light client is coming.

Results

Processing of HRMP channels for one block, while watching all parachains in the Polkadot network.

Requests Time
Before 361 60.606s
After 180 0.518s

Memory usage, 5 minutes while watching all parachains in the Polkadot network.

Memory, MB Graph
Before 80.86 old
After, channels enabled 27.92 new
After, channels disabled 26.45 new_without_channels

@AndreiEres AndreiEres changed the title [WIP] Use only one rpc client at time Use only one rpc client at time Dec 12, 2023
Copy link
Collaborator

@sandreim sandreim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Did you measure startup time and resource usage vs before this change ?

}

impl RequestExecutorError {
pub fn should_repeat(&self) -> bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub fn should_repeat(&self) -> bool {
pub fn should_retry(&self) -> bool {


loop {
tokio::select! {
message = from_frontend.recv() => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't really need a select! here, you can just let message = from_frontend.recv().await

}
}

pub trait RequestExecutorNodes {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit slow to understand what does the String represent?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it the rpc node url ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, It's a bit odd way to allow pass different types of node urls as an argument.

) {
let client = match build_client(&url, api_client_mode, &self.retry).await {
Some(v) => v,
None => return,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we log an error here ?

@AndreiEres AndreiEres merged commit 102fcad into master Jan 11, 2024
6 checks passed
@AndreiEres AndreiEres deleted the AndreiEres/single-client branch January 11, 2024 12:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants