Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(sequencer-relayer)!: minimize resubmissions to Celestia #1234

Merged
merged 5 commits into from
Jul 25, 2024

Conversation

Fraser999
Copy link
Contributor

@Fraser999 Fraser999 commented Jul 2, 2024

Summary

The relayer has been updated to avoid resubmitting the same data to Celestia on restart or on a timeout of the BroadcastTx gRPC.

Background

Currently the relayer will likely resubmit the same data when it restarts, since the majority of the time spent in the submit loop is waiting for confirmation of the BlobTx having been stored. While in this state, the on-disk files aren't updated to indicate that we're waiting for confirmation. Hence if we restart, the new session begins by submitting all data from the last-confirmed point, i.e. the same data which was already likely successfully submitted in the final attempt of the previous session.

A similar situation happens if we timeout while waiting for the BroadcastTx response - the retry loop will attempt to resubmit the same data without first checking if the previous attempt succeeded.

Changes

  • Replaced the two state files (presubmit and postsubmit) with a single one (submission-state) and similarly for their respective env vars. See the updated spec for further details.
  • The submission module was heavily changed: there are now two primary state structs StartedSubmission and PreparedSubmission, between which the blob submitter toggles during normal operation. There is also FreshSubmission which is only relevant at startup, and an enum covering these three (SubmissionStateAtStartup) which is also only used during startup. Finally, the State enum is an ephemeral object only used to read/write the relevant state from/to disk.
  • BlobSubmitter::run was modified to try to confirm the last submission attempt from the previous session if the state file indicated the relayer exited while in prepared state. If that submission is confirmed (the most likely outcome), then the sequencer blocks in that submission are simply skipped in the write loop. (We could try to avoid even fetching these sequencer blocks, but that would be a significantly more pervasive change, and is probably not worth the complexity).
  • The relayer's celestia client was changed to split try_submit into try_prepare and try_submit so that the hash of the prepared BlobTx can be returned from try_prepare to be recorded in the state file before the transaction is broadcast to the Celestia app.
  • submit_with_retry was updated to check for a broadcast timeout error in the previous attempt, and in that case, attempts to just confirm that submission rather than automatically resubmitting the same data.

Testing

  • Unit tests for the new submission types.
  • Black box test later_height_in_state_leads_to_expected_relay was updated.
  • Manually observed expected behaviour against a locally-running sequencer and Celestia app.

Breaking Changelist

  • Removed ASTRIA_SEQUENCER_RELAYER_PRE_SUBMIT_PATH and ASTRIA_SEQUENCER_RELAYER_POST_SUBMIT_PATH config env vars.
  • Added ASTRIA_SEQUENCER_RELAYER_SUBMISSION_STATE_PATH config env var.

Related Issues

Closes #1200.

@Fraser999 Fraser999 requested review from a team as code owners July 2, 2024 22:22
@Fraser999 Fraser999 requested review from noot and joroshiba July 2, 2024 22:22
@github-actions github-actions bot added documentation Improvements or additions to documentation sequencer-relayer pertaining to the astria-sequencer-relayer crate cd labels Jul 2, 2024
Copy link
Member

@SuperFluffy SuperFluffy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great and I don't see any outright blockers, just a bunch of requests for stylistic changes and tightening naming (to make the use of types and states clearer on a first read).

A big request I have is to not omit the use of context in error handling. While an author might have reason to avoid the context (for example, because ample information is provided in the called function), it's not at all clear for a reviewer. :)

Copy link
Contributor

@quasystaty1 quasystaty1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, tested locally

@Fraser999 Fraser999 added this pull request to the merge queue Jul 25, 2024
Merged via the queue into main with commit 961294c Jul 25, 2024
42 checks passed
@Fraser999 Fraser999 deleted the fraser/minimize-celestia-resubmissions branch July 25, 2024 09:30
bharath-123 pushed a commit that referenced this pull request Jul 25, 2024
## Summary
The relayer has been updated to avoid resubmitting the same data to
Celestia on restart or on a timeout of the `BroadcastTx` gRPC.

## Background
Currently the relayer will likely resubmit the same data when it
restarts, since the majority of the time spent in the submit loop is
waiting for confirmation of the `BlobTx` having been stored. While in
this state, the on-disk files aren't updated to indicate that we're
waiting for confirmation. Hence if we restart, the new session begins by
submitting all data from the last-confirmed point, i.e. the same data
which was already likely successfully submitted in the final attempt of
the previous session.

A similar situation happens if we timeout while waiting for the
`BroadcastTx` response - the retry loop will attempt to resubmit the
same data without first checking if the previous attempt succeeded.

## Changes
* Replaced the two state files (presubmit and postsubmit) with a single
one (submission-state) and similarly for their respective env vars. See
[the updated
spec](https://github.com/astriaorg/astria/blob/c110bdbf7d0fd05fba04775d07d046c37bbd7372/specs/sequencer-relayer.md#submission-state-file)
for further details.
* The `submission` module was heavily changed: there are now two primary
state structs `StartedSubmission` and `PreparedSubmission`, between
which the blob submitter toggles during normal operation. There is also
`FreshSubmission` which is only relevant at startup, and an enum
covering these three (`SubmissionStateAtStartup`) which is also only
used during startup. Finally, the `State` enum is an ephemeral object
only used to read/write the relevant state from/to disk.
* `BlobSubmitter::run` was modified to try to confirm the last
submission attempt from the previous session if the state file indicated
the relayer exited while in `prepared` state. If that submission is
confirmed (the most likely outcome), then the sequencer blocks in that
submission are simply skipped in the write loop. (We could try to avoid
even fetching these sequencer blocks, but that would be a significantly
more pervasive change, and is probably not worth the complexity).
* The relayer's celestia client was changed to split `try_submit` into
`try_prepare` and `try_submit` so that the hash of the prepared `BlobTx`
can be returned from `try_prepare` to be recorded in the state file
before the transaction is broadcast to the Celestia app.
* `submit_with_retry` was updated to check for a broadcast timeout error
in the previous attempt, and in that case, attempts to just confirm that
submission rather than automatically resubmitting the same data.

## Testing
* Unit tests for the new `submission` types.
* Black box test `later_height_in_state_leads_to_expected_relay` was
updated.
* Manually observed expected behaviour against a locally-running
sequencer and Celestia app.

## Breaking Changelist
* Removed `ASTRIA_SEQUENCER_RELAYER_PRE_SUBMIT_PATH` and
`ASTRIA_SEQUENCER_RELAYER_POST_SUBMIT_PATH` config env vars.
* Added `ASTRIA_SEQUENCER_RELAYER_SUBMISSION_STATE_PATH` config env var.

## Related Issues
Closes #1200.
steezeburger added a commit that referenced this pull request Jul 29, 2024
* main:
  release: cut bridge withdrawer release (#1303)
  release: version cuts for dusk-9 (#1299)
  chore(core): remove ed25519_consensus from public API (#1277)
  chore: remove spurious entry in gitignore (#1276)
  chore(chart): Update EVM-Rollup Geth devTag (#1300)
  fix(proto)!: Change execution API to use primitive RollupId (#1291)
  refactor(core, proto)!: define bridge memos in proto (#1285)
  chore(sequencer-relayer)!: minimize resubmissions to Celestia (#1234)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cd documentation Improvements or additions to documentation sequencer-relayer pertaining to the astria-sequencer-relayer crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Minimize resubmission of same data by sequencer-relayer to Celestia
4 participants