Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sequencer): rewrite check_tx to be more efficient and fix regression #1515

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Lilyjjo
Copy link
Contributor

@Lilyjjo Lilyjjo commented Sep 18, 2024

Summary

Previously handle_check_tx()'s processing of CheckTx::Recheck would always run all stateless checks and attempt insertion. This PR modified the function to only run the checks on transactions that have not been removed or are not contained inside of the app side mempool.

This PR additionally adds a HashSet of transaction hashes that are contained in the app's Mempool to the state of the app Mempool. This is used in the scenario where the sequencer restarts-- CometBFT persists transactions but our app mempool does not. The cache is used to allow transactions to re-enter the mempool on restarts.

This PR also restores the original behavior of CheckTX to not kick out transaction needlessly on rechecks, this regression was added when upgrading the mempool in #1323.

Background

A lot of the functionality from the original handle_check_tx() has moved into the app side mempool itself, including the balance and nonce checks. All of the remaining checks will not change between rechecks and only need to be ran when considering a transaction for insertion into the appside mempool.

Testing

Ran locally and added unit tests to ensure future nonces can be added and rechecks do not remove transactions needlessly.

Metrics

  • Deleted check_tx_removed_stale_nonce
  • Added check_tx_duration_seconds_check_tracked, mempool_tx_logic_error
  • Changed check_tx_duration_seconds_check_nonce to check_tx_duration_seconds_fetch_nonce

Related Issues

closes #1481

@github-actions github-actions bot added the sequencer pertaining to the astria-sequencer crate label Sep 18, 2024
@Lilyjjo Lilyjjo changed the title separate out checks sequencer, mempool: rewrite check_tx to be more efficient Sep 18, 2024
@Lilyjjo Lilyjjo changed the title sequencer, mempool: rewrite check_tx to be more efficient chore: sequencer, mempool: rewrite check_tx to be more efficient Sep 18, 2024
@Lilyjjo Lilyjjo changed the title chore: sequencer, mempool: rewrite check_tx to be more efficient chore: sequencer: rewrite check_tx to be more efficient Sep 18, 2024
@Lilyjjo Lilyjjo changed the title chore: sequencer: rewrite check_tx to be more efficient fix(sequencer): rewrite check_tx to be more efficient and fix regression Sep 18, 2024
@Lilyjjo Lilyjjo force-pushed the lilyjjo/check-tx-rewrite branch 3 times, most recently from 577d0c4 to 0de80ec Compare September 24, 2024 18:19
@Lilyjjo Lilyjjo marked this pull request as ready for review September 24, 2024 19:17
@Lilyjjo Lilyjjo requested a review from a team as a code owner September 24, 2024 19:17
parked.add(
timemarked_tx,
match parked.add(
timemarked_tx.clone(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think you can do let id = timemarked_tx.id() before this, then use that later on in the function and remove the clone()

Comment on lines 153 to 155
if let Some(rsp) = insert_into_mempool(mempool, &state, signed_tx, metrics).await {
return rsp;
}
Copy link
Collaborator

@noot noot Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function returning an Option that's only set if there's an error response is somewhat confusing, would prefer for this function to return a Result<(), response::CheckTx> so the error variant is the error response

same for check_removed_comet_bft

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it, feels much better that way

Comment on lines 259 to 238
/// Returns a [`response::CheckTx`] if the transaction fails any of the checks.
/// Otherwise, it returns the [`SignedTransaction`] to be inserted into the mempool.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use a Result<SignedTransaction, response::CheckTx> as this function's return

@noot
Copy link
Collaborator

noot commented Sep 26, 2024

I think this PR looks good, just a few smaller comments!

This PR additionally adds a HashSet of transaction hashes that are contained in the app's Mempool to the state of the app Mempool. This is used in the scenario where the sequencer restarts-- CometBFT persists transactions but our app mempool does not. The cache is used to allow transactions to re-enter the mempool on restarts.

how does this work? with cometbft recheck?

@Lilyjjo
Copy link
Contributor Author

Lilyjjo commented Sep 26, 2024

This PR additionally adds a HashSet of transaction hashes that are contained in the app's Mempool to the state of the app Mempool. This is used in the scenario where the sequencer restarts-- CometBFT persists transactions but our app mempool does not. The cache is used to allow transactions to re-enter the mempool on restarts.

how does this work? with cometbft recheck?

Hmmm. I think I misheard why we wanted to allow rechecks to add to the appside mempool. I checked out the CometBFT code and didn't see it persisting the transactions anywhere that was accessible, (I wonder if I missed it). In the scenario where CometBFT doesn't persist, then the allowing recheck would only be useful if CometBFT doesn't crash if the sequencer app crashses. I'll do some more research

crates/astria-sequencer/src/mempool/mod.rs Outdated Show resolved Hide resolved
crates/astria-sequencer/src/mempool/mod.rs Outdated Show resolved Hide resolved
@@ -220,6 +223,10 @@ impl Mempool {
);
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just above here and in run_maintenance, we do parked.find_promotables which potentially removes multiple txs from parked. We then try to do pending.add for each of these, but if that add call returns an error, we should remove that tx from self.contained_txs since it's lost from the mempool I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch. added similar logic to the error cases in maintenance too. i'll add a metric for these synchronization errors and ask Ido to setup an alert for it

crates/astria-sequencer/src/mempool/mod.rs Show resolved Hide resolved
crates/astria-sequencer/src/service/mempool/mod.rs Outdated Show resolved Hide resolved
crates/astria-sequencer/src/service/mempool/mod.rs Outdated Show resolved Hide resolved
crates/astria-sequencer/src/service/mempool/mod.rs Outdated Show resolved Hide resolved
crates/astria-sequencer/src/service/mempool/mod.rs Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sequencer pertaining to the astria-sequencer crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sequencer/mempool: CheckTX fails all rechecks
3 participants