Resuming ghostferry-copydb after interrupt fails due to missed TableMapEvent #156

kolbitsch-lastline opened this issue Mar 6, 2020 · 3 comments

The interrupt-resume feature as described in

https://shopify.github.io/ghostferry/master/copydbinterruptresume.html

works well, as long as the interrupt does not happen while a batch of RowsEvent events is being processed.

A MySQL data change is always replicated by first sending a TableMapEvent (describing the table to be changed), followed by one or more RowsEvent events (containing the data to be changed). If multiple consecutive RowsEvent events are sent for the same table, the TableMapEvent is typically skipped after being sent once.
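
For illustration, the following is a minimal sketch (not ghostferry code) that streams binlog events with the go-mysql replication library and prints the sequence described above. The import path, connection settings, and starting position are placeholder assumptions.

package main

import (
	"context"
	"fmt"

	"github.com/siddontang/go-mysql/mysql"
	"github.com/siddontang/go-mysql/replication"
)

func main() {
	// Placeholder connection settings; adjust for a real server.
	syncer := replication.NewBinlogSyncer(replication.BinlogSyncerConfig{
		ServerID: 100,
		Flavor:   "mysql",
		Host:     "127.0.0.1",
		Port:     3306,
		User:     "root",
		Password: "",
	})

	// Placeholder starting position.
	streamer, err := syncer.StartSync(mysql.Position{Name: "mysql-bin.000001", Pos: 4})
	if err != nil {
		panic(err)
	}

	for {
		ev, err := streamer.GetEvent(context.Background())
		if err != nil {
			panic(err)
		}
		switch e := ev.Event.(type) {
		case *replication.TableMapEvent:
			// Describes the table that subsequent RowsEvents with the same
			// table id refer to.
			fmt.Printf("TableMapEvent table id %d (%s.%s), ends at pos %d\n",
				e.TableID, e.Schema, e.Table, ev.Header.LogPos)
		case *replication.RowsEvent:
			// May repeat for the same table id without another TableMapEvent
			// in between; resuming between two of these loses the mapping.
			fmt.Printf("RowsEvent     table id %d, %d rows, ends at pos %d\n",
				e.TableID, len(e.Rows), ev.Header.LogPos)
		}
	}
}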

Thus, if the interrupt happens after receiving the TableMapEvent but before receiving the last RowsEvent, ghostferry will try to resume from the last processed RowsEvent, causing the replication library's BinlogSyncer to crash with an "invalid table id <table-ID>, no corresponding table map event" error.

This is due to the following code in the replication library's RowsEvent decoding:

var ok bool
e.Table, ok = e.tables[e.TableID]
if !ok {
	if len(e.tables) > 0 {
		return errors.Errorf("invalid table id %d, no corresponding table map event", e.TableID)
	} else {
		return errMissingTableMapEvent(errors.Errorf("invalid table id %d, no corresponding table map event", e.TableID))
	}
}

Note that this is not a bug in the replication module: ghostferry points the syncer at a resume position after the TableMapEvent, and the library cannot satisfy that request without ignoring the fact that it does not know the table ID.

Changing the replication library to ignore unknown table IDs is tempting but very tricky: we could cache the table IDs previously seen in ghostferry and "fill the gap" this way. Unfortunately, parsing the RowsEvent data relies on the table schema, and conceptually it also seems quite unclean.

Thus, IMO the correct approach is to make sure the syncer is not pointed at a location from which it cannot resume.

Ghostferry today only stores the resume position via its lastStreamedBinlogPosition property. In my variant of ghostferry (see #26), I have changed the BinlogStreamer class to keep track of an additional position: the lastResumeBinlogPosition. The idea is to inspect each event received from the syncer and check whether it is a RowsEvent. If it is not, the event's position is considered safe to resume from and the property is updated; otherwise it is kept at its current value.
The resume position is thus a tuple of "last received/processed event" and "last position we can resume from".

Then, when resuming, we do not resume from lastStreamedBinlogPosition but from lastResumeBinlogPosition. Any event received in BinlogStreamer.Run is first checked: if its position is before lastStreamedBinlogPosition, it has already been processed and is skipped; otherwise it is queued to the eventListeners.
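
To make the idea concrete, here is a minimal sketch of that bookkeeping, assuming go-mysql's mysql.Position type (and its Compare helper) and replication event types. The type name binlogPositionTracker and the method observe are hypothetical names for illustration, not ghostferry's actual BinlogStreamer code, and the start-of-event arithmetic ignores binlog file rotation.

package sketch

import (
	"github.com/siddontang/go-mysql/mysql"
	"github.com/siddontang/go-mysql/replication"
)

// binlogPositionTracker is an illustrative stand-in for the bookkeeping
// described above; the names are made up.
type binlogPositionTracker struct {
	lastStreamedBinlogPosition mysql.Position // last event handed to the eventListeners
	lastResumeBinlogPosition   mysql.Position // last position the syncer can safely restart from
}

// observe inspects one event coming out of the syncer and reports whether it
// should still be forwarded to the eventListeners.
func (t *binlogPositionTracker) observe(ev *replication.BinlogEvent, binlogFile string) bool {
	endPos := mysql.Position{Name: binlogFile, Pos: ev.Header.LogPos}

	// After an interrupt the syncer restarts at lastResumeBinlogPosition, so
	// already-processed events arrive again. The replication parser still sees
	// them (and re-learns the table map from the TableMapEvent), but they must
	// not be re-emitted to the listeners.
	if endPos.Compare(t.lastStreamedBinlogPosition) <= 0 {
		return false
	}

	if _, isRowsEvent := ev.Event.(*replication.RowsEvent); !isRowsEvent {
		// Positions outside a TableMapEvent/RowsEvent group are safe resume
		// points. Record the start of this event so that, e.g., a TableMapEvent
		// is replayed in full on resume (ignoring binlog rotation for brevity).
		t.lastResumeBinlogPosition = mysql.Position{
			Name: binlogFile,
			Pos:  ev.Header.LogPos - ev.Header.EventSize,
		}
	}

	t.lastStreamedBinlogPosition = endPos
	return true
}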

Happy to share my changes as a pull request. Unfortunately it is currently somewhat entangled with my changes for ghostferry-replicatedb (see the other ticket), so before I untangle these two unrelated changes I wanted to hear your opinions on the above, and whether I am overlooking a simpler way of doing this.

kolbitsch-lastline added a commit to kolbitsch-lastline/ghostferry that referenced this issue Mar 12, 2020
A mysql replication event for changing data is always started by
sending a TableMapEvent (describing the table to be changed),
followed by one or more RowsEvent (containing the data to be
changed). If multiple consecutive RowsEvent events are sent for the
same table, the TableMapEvent is typically skipped (after sending it
once), meaning that resuming from such a binlog is not possible.

This commit tracks the position of the most recently processed
TableMapEvent and uses this for resuming (skipping any actual events
between the last write and resume position that were already
processed).

Change-Id: I2bef401ba4f1c0d5f50f177f48c2e866bb411f5d
kolbitsch-lastline added a commit to kolbitsch-lastline/ghostferry that referenced this issue Mar 22, 2020
A mysql replication event for changing data is always started by
sending a TableMapEvent (describing the table to be changed),
followed by one or more RowsEvent (containing the data to be
changed). If multiple consecutive RowsEvent events are sent for the
same table, the TableMapEvent is typically skipped (after sending it
once), meaning that resuming from such a binlog is not possible.

Resuming at a binlog position between the TableMapEvent and a
RowsEvent is not possible, as the binlog streamer would miss the
table definition for following DML statements.

This commit tracks the position of the most recently processed
TableMapEvent and uses this for resuming (skipping any actual events
between the last write and resume position that were already
processed).

Change-Id: I2bef401ba4f1c0d5f50f177f48c2e866bb411f5d
@kolbitsch-lastline

I've cleaned up the PR a lot to make it more easily digestible. I'm still working on an integration test that reproduces the problem reliably in master (and passes with the fix in my branch, obviously).

kolbitsch-lastline added a commit to kolbitsch-lastline/ghostferry that referenced this issue Mar 22, 2020
@kolbitsch-lastline

OK, I've managed to add an integration test that reliably triggers the problem in master and passes with the fix in place.

I'm not particularly proud of this test code, to be quite frank, as it relies on winning a race condition, but it seems to be stable enough with the chosen input values (I've extracted the important parameters into dedicated variables to make them more obvious and added a fair amount of documentation).

Hopefully this demonstrates the problem a bit better. The latest PR is good from my side; looking forward to hearing feedback.

kolbitsch-lastline added a commit to Lastline-Inc/ghostferry that referenced this issue Mar 23, 2020
kolbitsch-lastline added a commit to Lastline-Inc/ghostferry that referenced this issue Mar 23, 2020
kolbitsch-lastline added a commit to Lastline-Inc/ghostferry that referenced this issue Mar 23, 2020
kolbitsch-lastline added a commit to Lastline-Inc/ghostferry that referenced this issue Mar 30, 2020
@shuhaowu

This is fixed by #168?
