Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(relayer): prevent tx1 resubmission #28

Merged
merged 26 commits into from
Sep 2, 2024
Merged

Conversation

Lazar955
Copy link
Member

@Lazar955 Lazar955 commented Aug 28, 2024

If tx2 fails and we are retrying, we shouldn't resend tx1 again.

Referencing issue

}
tx1 := rl.lastSubmittedCheckpoint.Tx1
// prevent resending tx1 if it was successful
if rl.lastSubmittedCheckpoint.Tx1 == nil {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relevant changes here, cache the TX1 if successful

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any chance for rl.lastSubmittedCheckpoint.Tx1 be loaded with the tx1 sent from the previous epoch?

rl.logger.Infof("Submitting a raw checkpoint for epoch %v for the first time", ckptEpoch)

if rl.lastSubmittedCheckpoint == nil {
rl.lastSubmittedCheckpoint = &types.CheckpointInfo{}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Init the lastSubmittedCheckpoint

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add this to the default structure at New(

wallet btcclient.BTCWallet,

and then simplify the check by removing rl.lastSubmittedCheckpoint == nil

@Lazar955 Lazar955 marked this pull request as ready for review August 28, 2024 14:29
@@ -35,7 +35,7 @@ func NewWallet(cfg *config.BTCConfig, parentLogger *zap.Logger) (*Client, error)
HTTPPostMode: true,
User: cfg.Username,
Pass: cfg.Password,
DisableTLS: cfg.DisableClientTLS,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove from config?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait this was confusing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why this appeared to me as code change at first

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was in previous PR, basically we are not using it in bitcoind.

Copy link
Member

@gitferry gitferry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking an alternative solution.
if we fail tx2 and retry the submission of both tx1 and tx2, tx1 would actually fail because the tx1 is already in mempool. In our impl, we would return error in this case so that tx2 will not be sent. So, I was thinking maybe we can achieve the goal by identifying the already in mempool error of tx1 and ignore it. Wdyt? Probably we need e2e to test this case

Copy link
Member

@gitferry gitferry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code-wise lgtm! My major comment is that we kinda mixed the logic of handling the two cases. It would be good to separate the two cases.

Also, better to have both unit and e2e tests for this but can be done in a separate pr

Comment on lines 83 to 88
if rl.lastSubmittedCheckpoint == nil ||
rl.lastSubmittedCheckpoint.Tx1 == nil ||
rl.lastSubmittedCheckpoint.Epoch < ckptEpoch {
rl.logger.Infof("Submitting a raw checkpoint for epoch %v for the first time", ckptEpoch)

if rl.lastSubmittedCheckpoint == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we separate the two cases? if shouldSendCompleCheckpoint() --> convertCkptToTwoTxAndSubmit, else if shouldSendTx2() --> retrySendTx2. I think this would be logically cleaner and some low-level funtions can be reused.

Comment on lines 84 to 85
sendCompleteCkpt := rl.lastSubmittedCheckpoint.Tx1 == nil ||
rl.lastSubmittedCheckpoint.Epoch < ckptEpoch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. we should check whether rl.lastSubmittedCheckpoint is nil first. Otherwise, this might
  2. it would be good to have method for this like shouldSendCompleteCkpt() to hide details

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should check whether rl.lastSubmittedCheckpoint is nil first. Otherwise, this might

-> lastSubmittedCheckpoint is now initialized in the constructor so, it shouldn't panic

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. All good then

Comment on lines 88 to 89
shouldSendTx2 := (rl.lastSubmittedCheckpoint.Tx1 != nil || rl.lastSubmittedCheckpoint.Epoch < ckptEpoch) &&
rl.lastSubmittedCheckpoint.Tx2 == nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to have a method for this to hide details

shouldSendTx2 := (rl.lastSubmittedCheckpoint.Tx1 != nil || rl.lastSubmittedCheckpoint.Epoch < ckptEpoch) &&
rl.lastSubmittedCheckpoint.Tx2 == nil

if sendCompleteCkpt {
rl.logger.Infof("Submitting a raw checkpoint for epoch %v for the first time", ckptEpoch)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
rl.logger.Infof("Submitting a raw checkpoint for epoch %v for the first time", ckptEpoch)
rl.logger.Infof("Submitting a raw checkpoint for epoch %v", ckptEpoch)

maybe it's not the first time due to failure

// this is to wait for btcwallet to update utxo database so that
// the tx that tx1 consumes will not appear in the next unspent txs lit
// todo(Lazar): is the arbitrary timeout here necessary?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not after switching to using FundRawTransaction. Let's check it in e2e

}

tx1 := rl.lastSubmittedCheckpoint.Tx1
if tx1 == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to have a doc string for this function saying that the tx1 should not be nil

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this function uses tx1 implicitly through state, I'll add the doc string above, but I would live the check in

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep yep, keeping the check is good here

Copy link
Member

@gitferry gitferry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

Copy link
Collaborator

@KonradStaniec KonradStaniec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good ! (though lets wait @gitferry approval)

One question I have, do you guys think we should have some persistence/check on BTC chain whether checkpoint was not submitted already ?

Two separate cases I have in mind:

  • we send checkpoint to the mempool but we crashed. Now after restart we will send both transactions once again which will lead to money loss. (as will use different inputs for those transactions, those will be different transactions with the same data)
  • if there are multiple vigilante reporters in the network, the checkpoint may be already BTC chain but not yet submitted to Babylon. In this case we will also probably lose money.

@gitferry
Copy link
Member

gitferry commented Sep 2, 2024

we send checkpoint to the mempool but we crashed. Now after restart we will send both transactions once again which will lead to money loss. (as will use different inputs for those transactions, those will be different transactions with the same data)

I think this can be solved by storing the last submitted epoch in db. We need it for quick bootstrapping anyway

if there are multiple vigilante reporters in the network, the checkpoint may be already BTC chain but not yet submitted to Babylon. In this case we will also probably lose money.

This is a long-term thinking. Basically the submitter can have a separate goroutine looking for submitted checkpoints from new BTC blocks and decide whether the submit it

So maybe we can start by adding a db because for phase-2 we are probably still the only org running vigilantes?

@Lazar955 Lazar955 merged commit 489293d into dev Sep 2, 2024
8 checks passed
@Lazar955 Lazar955 deleted the lazar/handle_tx2_fail branch September 2, 2024 13:29
@KonradStaniec
Copy link
Collaborator

KonradStaniec commented Sep 2, 2024

This is a long-term thinking. Basically the submitter can have a separate goroutine looking for submitted checkpoints from new BTC blocks and decide whether the submit it

So maybe we can start by adding a db because for phase-2 we are probably still the only org running vigilantes?

Hmm agreed that solving the case 1 is more important.

Even if we do not solve the case 2 before mainnet launch, if we solve case 1 our vigiliante loses are constrained by 1 submission per epoch. This is not ideal but imo we can leave with it.

So this is will be extension of this task if I am correct: babylonchain/vigilante#156. We need persistent storage to:

  • save bootstrapping time after recovery
  • do not waste fees if we already submitted transactions for certain epochs

cc: @Lazar955

@gitferry
Copy link
Member

gitferry commented Sep 2, 2024

Yep, babylonchain/vigilante#156 was raised mostly for the monitor bootstrapping but I guess the submitter also needs it

Lazar955 added a commit that referenced this pull request Sep 16, 2024
In case of a restart, we want to avoid wasting fees if we have already
submitted transactions for a certain epoch.
Basic idea:
A crash occurs after sending both checkpoint transactions for epoch `n`
to the BTC and recording them in the database. Upon restarting, we find
that epoch `n` is still marked as sealed.

Before resending the transactions we first check our database and
confirm that the transactions for this epoch have already been sent.
Since the transactions were previously sent, the next step is to verify
their status on our Bitcoin node.

If the Bitcoin node is aware of the transactions and they are present in
the mempool, no further action is needed.
However, if the node does not recognize the transactions, this indicates
they were lost, and we must resend them to the network.



[References](#28 (comment))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants