Support for changing schemas #26
There are two parts to this:
Thanks @pushrax!
Why is that important? Any changes to a table or column that gets created during a run will be visible to the binlog streamer. So, if we can pause all binlog event processing while we handle a DDL event, we'd automatically copy any activity on new columns/tables.
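A minimal sketch of what pausing on DDL could look like in a streamer's event loop, assuming a hypothetical simplified event type; this is not Ghostferry's actual BinlogStreamer API:

```go
package main

import "fmt"

// binlogEvent is a hypothetical, simplified stand-in for a parsed binlog event.
type binlogEvent struct {
	isDDL bool   // true for query events such as ALTER TABLE
	query string // DDL statement, if isDDL
	table string // affected table, for row events
}

// streamEvents dispatches row events to handleRow, but blocks all further
// dispatching while a DDL event is being handled, so that row events for
// newly created tables/columns are only processed once the schema cache
// has been refreshed.
func streamEvents(events <-chan binlogEvent, handleRow func(binlogEvent), handleDDL func(binlogEvent)) {
	for ev := range events {
		if ev.isDDL {
			// Processing stops here until the schema change has been applied
			// to the target and the local schema cache has been reloaded.
			handleDDL(ev)
			continue
		}
		handleRow(ev)
	}
}

func main() {
	events := make(chan binlogEvent, 3)
	events <- binlogEvent{table: "users"}
	events <- binlogEvent{isDDL: true, query: "ALTER TABLE users ADD COLUMN nickname VARCHAR(255)"}
	events <- binlogEvent{table: "users"}
	close(events)

	streamEvents(events,
		func(ev binlogEvent) { fmt.Println("row event on", ev.table) },
		func(ev binlogEvent) { fmt.Println("pausing to handle DDL:", ev.query) },
	)
}
```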
@hkdsun and I came up with some ideas for this last week, and met with @shuhaowu yesterday to discuss them. We got stuck because we had no good way to synchronously detect schema changes in the target database. @hkdsun and I by chance had lunch with @insom today, who gave us an insightful hint to leverage transactional behaviour. Here's the idea:
Let's go over some examples to see if this might be correct. In the figure, time progresses rightward, S is the source, T is the target, the vertical arrows represent the moment a schema change is applied, and the triangle represents the moment ghostferry's binlog streamer sees the table map event due to the source schema change.
Column creation
Column removal
Table creation
Table removal
Non-exhaustive list of remaining things to think about:
So after some discussions with many people and some work with TLA+, I believe the problem is twofold:
The first problem still needs a solution as it's unclear how we can do this easily. The second problem, however, has a very simple solution. This is the interrupt + reconcile + cleanup/recopy method:
Further analysis of this has been done in prose in issue #17 and in TLA+ at #43, and is shown to preserve the safety properties of Ghostferry. Some alternatives to step 4 have also been documented in #17 that could be more time efficient. What remains is that we need to detect when schema changes occur, and this could be non-trivial. There are also alternatives where we don't interrupt Ghostferry. With these ideas we could possibly complete Ghostferry runs faster, although I believe they would be complex to implement in the current code base.
A few questions off the top of my head:
Why do we have to stop ghostferry rather than just making it sleep and wait for the schemas to equalize? Doing that would save us the complexity of dumping and restoring state. It would also allow us to keep making progress on tables whose schema didn't change and only sleep in the goroutine of the one that did. Wouldn't that be more efficient (and it also sounds easier to implement)?
How do we know that they have equalized if ghostferry isn't running anymore? Who/what checks whether or not they are equalized yet?
What's the definition of a "good position"? I assume that's the last (or any) position we know of that didn't have the schema change yet?
This only applies to tables that had their schema changed, right? Not ALL tables? If ALL, why?
This was suggested as an alternative solution internally as well. Yes, it would be more time efficient to pause the copy on a particular table and delete the applicable records while copying other tables whose schemas are not changing. However, I think it will be much harder to implement than dumping and restoring, since most of the code for dump/restore is already in the Ghostferry codebase, just not hooked up. There is also a fair amount of complexity surrounding exactly how to pause a particular table, since we essentially have a pre-forked model in terms of how we parallelize per table. Additionally, having a dump/restore would be beneficial in other circumstances as well, such as losing connectivity to the database, whereas the code implementing the sleep/wait-for-equalize would only have a single use: to handle schema changes.
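For illustration only, a rough sketch of what pausing a single table's copy goroutine could look like in a pre-forked, one-goroutine-per-table model; the types and names are hypothetical, not Ghostferry's actual DataIterator:

```go
package main

import (
	"fmt"
	"sync"
)

// tableCopier is a hypothetical per-table copy worker. Each table gets its own
// goroutine (the "pre-forked" model), and a paused flag guarded by a condition
// variable lets a supervisor suspend just that table while its schema changes.
type tableCopier struct {
	table  string
	mu     sync.Mutex
	cond   *sync.Cond
	paused bool
}

func newTableCopier(table string) *tableCopier {
	c := &tableCopier{table: table}
	c.cond = sync.NewCond(&c.mu)
	return c
}

func (c *tableCopier) Pause()  { c.mu.Lock(); c.paused = true; c.mu.Unlock() }
func (c *tableCopier) Resume() { c.mu.Lock(); c.paused = false; c.mu.Unlock(); c.cond.Broadcast() }

// copyBatch blocks while the copier is paused, then "copies" one batch.
func (c *tableCopier) copyBatch(n int) {
	c.mu.Lock()
	for c.paused {
		c.cond.Wait()
	}
	c.mu.Unlock()
	fmt.Printf("copied batch %d of %s\n", n, c.table)
}

func main() {
	users := newTableCopier("users")
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 1; i <= 3; i++ {
			users.copyBatch(i)
		}
	}()

	users.Pause()  // e.g. a schema change on `users` was detected
	users.Resume() // schemas have equalized again
	wg.Wait()
}
```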
Every time we start Ghostferry in reconciliation mode, it should check the schemas to see if they're identical. If not, Ghostferry should quit with some appropriate exit code. An external supervisor service could simply launch Ghostferry in a loop. Alternatively, the supervisor could check the schemas itself and launch Ghostferry when appropriate.
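As a sketch of the second option, a supervisor could poll information_schema on both databases and only (re)launch Ghostferry once the schemas match. The connection strings, database name, and the ghostferry-copydb invocation below are placeholders, not a committed interface:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"os/exec"
	"time"

	_ "github.com/go-sql-driver/mysql"
)

// schemaFingerprint builds a comparable snapshot of a database's schema from
// information_schema: every table, column, and column type, in order.
func schemaFingerprint(db *sql.DB, schema string) (string, error) {
	rows, err := db.Query(
		`SELECT table_name, column_name, column_type
		   FROM information_schema.columns
		  WHERE table_schema = ?
		  ORDER BY table_name, ordinal_position`, schema)
	if err != nil {
		return "", err
	}
	defer rows.Close()

	var fp string
	for rows.Next() {
		var table, column, colType string
		if err := rows.Scan(&table, &column, &colType); err != nil {
			return "", err
		}
		fp += table + "." + column + ":" + colType + "\n"
	}
	return fp, rows.Err()
}

func main() {
	source, err := sql.Open("mysql", "ghostferry@tcp(source:3306)/")
	if err != nil {
		log.Fatal(err)
	}
	target, err := sql.Open("mysql", "ghostferry@tcp(target:3306)/")
	if err != nil {
		log.Fatal(err)
	}

	for {
		srcFP, err1 := schemaFingerprint(source, "mydb")
		tgtFP, err2 := schemaFingerprint(target, "mydb")
		if err1 == nil && err2 == nil && srcFP == tgtFP {
			break // schemas have equalized; safe to (re)start Ghostferry
		}
		fmt.Println("schemas differ (or are unreachable), waiting...")
		time.Sleep(10 * time.Second)
	}

	// Placeholder invocation: the flags/config needed to resume in a
	// reconciliation mode are hypothetical and omitted here.
	if err := exec.Command("ghostferry-copydb", "config.json").Run(); err != nil {
		log.Fatal(err)
	}
}
```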
This is documented somewhat in #17. It also depends on how we want to perform schema change detection. Theoretically, this would be the lastStreamedBinlogPosition and lastSuccessfulPrimaryKeys in the BinlogStreamer and DataIterator respectively.
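To make the "good position" idea concrete, the dumped state could be as small as those two pieces of information. A hypothetical serialization (not the actual Ghostferry state dump format) might look like this:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ResumeState is a hypothetical minimal state dump: the last binlog position
// the BinlogStreamer fully processed, and the last primary key each table's
// DataIterator successfully copied. Field names here are illustrative only.
type ResumeState struct {
	LastStreamedBinlogFile     string            `json:"last_streamed_binlog_file"`
	LastStreamedBinlogPosition uint32            `json:"last_streamed_binlog_position"`
	LastSuccessfulPrimaryKeys  map[string]uint64 `json:"last_successful_primary_keys"`
}

func main() {
	state := ResumeState{
		LastStreamedBinlogFile:     "mysql-bin.000042",
		LastStreamedBinlogPosition: 1337,
		LastSuccessfulPrimaryKeys: map[string]uint64{
			"mydb.users":  500000,
			"mydb.orders": 120000,
		},
	}

	out, _ := json.MarshalIndent(state, "", "  ")
	fmt.Println(string(out)) // written to disk on interrupt, read back on resume
}
```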
ALL tables. During the downtime, any record on any table can change. We must update the target, or otherwise the target data would get stuck with the version of data from when we interrupted, as future binlog events will have no effect. This is documented in great detail in #17, under the section
Detecting schema changes is something we have already talked a lot about when trying to figure out a truly general solution.
I'm new to ghostferry but have spent quite some time working on this problem. I think there is a subset of this problem that might be somewhat easier to solve (and that I'm particularly interested in): we are trying to use ghostferry not to move data, but instead to continuously replicate data from a source into a target. This can be useful, for example, when the target DB
In our scenario, we can assume that we restore the target DB from an earlier backup of the source. Thus, the batch-copy operation of ghostferry has completed already or never took place. At this point, we only want to stream binlog events, but we also need to follow schema changes to be able to apply all row-events. The nice thing about our scenario - and that's the subset that we could be solving somewhat easily - is that the target DB has the final state of all tables we care about before any alteration happens. As a result, we can use the target DB before/after altering (or adding) tables to re-extract the schema. There is, however, a problem with this: the race condition between the binlog-writer applying a schema change and the binlog-streamer receiving row-events and checking that the schema (the number of columns) we have in the table-schema-cache matches the number of columns in the incoming rows-event.
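To illustrate that race, here is a small sketch of the kind of column-count check involved, assuming a simplified in-memory cache keyed by table name (not the actual table-schema-cache type):

```go
package main

import (
	"errors"
	"fmt"
)

// schemaCache is a simplified stand-in for a table-schema cache: it remembers
// how many columns each table had when the schema was last (re)loaded.
type schemaCache map[string]int

var errColumnCountMismatch = errors.New("row event column count does not match cached schema")

// checkRowEvent is the check that races with DDL: if an ALTER TABLE adds a
// column on the source before the cache is refreshed, incoming row events
// carry more values than the cache expects and the check fails.
func checkRowEvent(cache schemaCache, table string, rowValues []interface{}) error {
	expected, ok := cache[table]
	if !ok {
		return fmt.Errorf("unknown table %q", table)
	}
	if len(rowValues) != expected {
		return errColumnCountMismatch
	}
	return nil
}

func main() {
	cache := schemaCache{"users": 3}

	// Before the ALTER TABLE: 3 columns, the check passes.
	fmt.Println(checkRowEvent(cache, "users", []interface{}{1, "a", "b"}))

	// After ALTER TABLE users ADD COLUMN ... on the source, but before the
	// cache is reloaded: 4 columns arrive and the check fails.
	fmt.Println(checkRowEvent(cache, "users", []interface{}{1, "a", "b", "c"}))
}
```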
I have worked around this problem by "dumbing down" the binlog-streamer, making it a "pure streamer" - that is, one that does not care (or know) about the schemas of tables. IMHO, making the streamer not know (or care) about schemas is a slightly cleaner design - but if there was a good reason you guys chose to do the opposite, I'm happy to be proven wrong :-) . Of course there is still a race between the binlog-writer and a batch-writer, as the batch-writer may also need to know the schema. However, I think that trying to apply schema changes before all batches have been applied is a losing battle. The thoughts you pointed out above are clever, but it's going to be incredibly difficult to get this right. Two more comments on the above:
So, long preamble to this: I have a semi-working version of GFR that allows continuously replicating data and schema changes. Most of the changes could be used in ghostferry-copydb, but I understand that copydb tries to work under different scenarios than what I describe above.
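To sketch the "pure streamer" idea described above - a streamer that forwards raw row events without knowing anything about table schemas, leaving schema resolution to the writer - something along these lines could work; the types are hypothetical, not the actual BinlogStreamer interface:

```go
package main

import "fmt"

// rawRowEvent is a hypothetical schema-agnostic event: just the table name
// and the raw column values, with no notion of column names or types.
type rawRowEvent struct {
	Table  string
	Values []interface{}
}

// pureStream forwards every event to the writer untouched. The streamer never
// consults a schema cache, so it cannot race with DDL; only the writer needs
// to resolve the schema, and it can do so against the (already altered) target.
func pureStream(events <-chan rawRowEvent, write func(rawRowEvent) error) error {
	for ev := range events {
		if err := write(ev); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	events := make(chan rawRowEvent, 2)
	events <- rawRowEvent{Table: "users", Values: []interface{}{1, "alice"}}
	events <- rawRowEvent{Table: "users", Values: []interface{}{2, "bob", "new-column-value"}}
	close(events)

	// The writer, not the streamer, maps positional values to target columns.
	err := pureStream(events, func(ev rawRowEvent) error {
		fmt.Printf("write %d values to %s\n", len(ev.Values), ev.Table)
		return nil
	})
	fmt.Println("stream finished, err =", err)
}
```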
Hi! Thanks for the detailed analysis. Internally we've actually come up with a different way of handling schema changes using interrupt/resume, as opposed to some of the points outlined above. I can post a more detailed version of that at a later time if you'd like; however, it feels like the solution takes a direction that doesn't quite match what you're doing. In the meantime, you said there's a design flaw in the resume logic. Can you elaborate on that? We can do this fix quickly if you can give a good description of it.
This depends on how much of your modifications are to the BinlogStreamer/Writer and how that may interact with the rest of the codebase. It is theoretically possible to write a ghostferry application (ghostferry-replicatedb) that bypasses the
yep, just wrote up #156. We can continue this part of the conversation there.
all my changes are meant to be kept generic. I wouldn't be surprised if my refactorings negatively impacted the batch/iterative copy code, but nothing has conceptually changed - I wrote all code in a way that it would also benefit copydb. I think the changes I made should be split up into multiple sub-pull-requests
One step at a time :-)
Is it possible for you to post your WIP code somewhere? We can take a look and give you feedback on whether we can work with it for upstreaming.
The most common type of failures in Shopify's internal usage of Ghostferry is errors caused by changing schemas. These can manifest themselves in many different ways, including the DataIterator/BinlogWriter inserting data into deleted columns, tables disappearing on the target database, and other incompatibilities between the source and target databases. This is especially problematic for very time-consuming ferry runs, as the longer a run takes the more likely this class of failures becomes.
We need to think about how this can be remedied. Starting with a limited scope seems like a good idea; for example, start by supporting addition/removal of columns or tables.
cc @Shopify/pods