Managing Identity Columns as Primary Keys in DMS. Feasible ? #590
Comments
To be honest, I consider using sequential numeric values as primary keys in sync scenarios a fundamental design flaw.
Speaking as one who is staying out of this discussion: @gentledepp, how are sequential guids generated on the client? I'm not sure sequential numeric values are necessarily a design flaw in a distributed system. Like most things, it depends. Most systems, including most of the systems discussed here, do not generate more than 10 million rows per second. For those, DateTimeOffset.UtcTicks may be a viable primary key. Or that's what I would have said if I weren't staying out of it.
May I ask what the current status of this is? I am using a different sync framework at the moment, but I am thinking about switching. In our database we use integer-based keys and foreign keys. And before everybody says we should use GUIDs: that is not possible for us. We need to communicate with other platforms/applications that don't allow GUIDs as id/foreign keys. I could add GUIDs as primary keys, but I would still need an extra unique integer-based id field. Our current framework works with id ranges per client, which are updated on each sync. That setup was also mentioned above, but I was wondering whether it is enabled in the current release?
Hello @percramer. Today, using identity columns as primary keys works fine in DMS. Since we do not yet have a proper solution for handling identity columns on the server side, it will not be implemented. Just out of curiosity, which framework are you using today, and does it manage identity columns correctly, or do you have to make trade-offs?
Hi @Mimetis, I am using Sqlite-sync. They made the express version open source, which only works with SQLite on both the server and client side. I am using SQL Server on the server side, but it all works the same way. I need to be able to do inserts from the client side, since the application is also used in locations where there is no network connection. What they do is reserve id ranges per client (and per table, of course) in a table, and those are synced to the client as well. So each client has a unique integer range which is updated (if needed) during the sync, and triggers on the client SQLite tables ensure that the used values come from that range. The server doesn't have to deal with that anymore, because the data from the clients already has unique ids. So that is partially what you proposed as well, but you wanted to keep that logic server-side? Wouldn't the setup of my current framework be a workable solution for this too? It works without any problems at the moment. Sadly, sqlite-sync has some other design flaws that prevent usage with larger amounts of data / changes.
A possible solution could be to add a GUID column while keeping the identity column as a non-primary-key indexed column. You can still index the identity column for performance, but the GUID column would be used to confirm that rows are actually referring to the same data row. This could work in conjunction with cascading updates in SQL Server*** to change the identity of a potential collision before syncing. I see this as "doable", but certainly not elegant or easy to implement. It would be very heavy on the user to make sure all the cascading updates and new GUID columns are set up properly, and I doubt it could be made into a general solution usable by average users. In summary: feasibility of providing a possible solution pathway to people interested and experienced enough to implement it for their own organization: quite feasible. Feasibility of creating an elegant, easy-to-implement general solution that could handle all of the possible individual requirements of all systems: not feasible. *** I'm expert-level in MS SQL Server, but I won't begin to claim I know what fancy features the other supported DBs do or don't support.
I've been using Microsoft Sync Framework since 2011, serving more than 6 billion transaction records on one Azure Hyperscale database synced with 6000+ clients. Sadly, when the DB was designed 12 years ago, we used integer (not even bigint) for the identity (auto increment) column. The way we handle conflicts follows Microsoft's best practice: configure the sync group so that when a conflict happens, either the client data or the server data wins as the final copy. We have set it so that when a conflict happens, the sync agent uses the client side's data to override the server side's data. The key solution is to use composite columns. E.g. a retail group has 200+ shops, each with an (int) column for OrderId; so, in order to prevent conflicts, we used a composite primary key including the auto increment column. We even introduced the terminal ID (as one shop may have more than one terminal) in the composite key. In summary: I think DMS does not need to handle the "creation" conflicts of identity columns as primary keys; it is a design flaw of the SQL DB schema. The current conflict resolution (either client overrides server, or server overrides client) is already sufficient to handle this situation.
Although that is a nice solution to the issue (composite keys), it can't always be applied. We do have situations like yours, but for a few systems we don't have control over the db schema (since they belong to external software applications which have been bought). And those applications sadly only work with a single-column integer key.
Do you have control over the local database? If yes, I am thinking you can create a new column (e.g. ShopId) for those local tables, and set up a stored procedure or a row trigger to automatically fill in the ShopId: 1 for shop A, 2 for shop B, etc. Then, in DMS, set a filter to bi-directionally sync only the rows whose ShopId column equals 1 to shop A, 2 to shop B, etc.
No, that is the main issue: we can't change the schema at all. Our SQL user doesn't have those rights. We were using Sqlite-sync in the past, but for various reasons I am looking for something different. SQLite-sync uses an identity range per client which is actively updated when the user syncs, and then local triggers ensure the id is within the user's range. I have been thinking about implementing that here as well, but time...
For the approach of allocating an identity range per client, the sync server needs to estimate the growth rate of each client, how frequently the client will connect to the server to update the "used range" and take the next batch of the available range, and, in case the identity range uses up all its ids, whether it can overflow (the overflow part could potentially conflict with another client's assigned range). Another approach is to build a middle DB (either local or in the cloud), use DMS to perform a 1:1 sync of the SQLite data to the middle DB, then fill in the shop ID in the middle DB, and perform another data sync from the middle DB to the centralized DB (merged with the other shops' data).
I have been doing some testing with negative primary keys in my client app. So the auto increment step is -1 instead of 1. Then, during the sync, I can easily check whether keys are negative and replace them with a newly generated value. I just need to sync that back to the client then.
Introduction
This issue is an explanation about "why" auto increment primary keys are not handled by DMS, yet.
Some solutions exist to ensure a correct sync between several systems using identity primary keys for some tables.
Theoretically, solutions exist, but in practice (you know.. real life) some really complex challenges can emerge from these solutions.
This post is NOT a solution, but more an open discussion on "how it could be done, if possible".
The challenge of auto increment keys is basically: how do we handle a conflict when 2 records from 2 systems use the same (auto incremented) primary key for a new record?
Basically, deletes and updates do not cause any problem here. The main source of concern comes from insertions.
Is it an update conflict?
Is it two insertions? If so, should we change the client primary key to 43? (And what will happen on the next insertion?)
Solutions
Basically, we have two solutions to avoid insertion conflicts with auto increment primary keys:
1) Assigning ranges
The idea is to work with identity columns as primary keys, plus a range of ids per scope.
For instance, you can define something like this:

Server:   1 – 99 999 999
Client A: 100 000 000 – 100 999 999
Client B: 101 000 000 – 101 999 999
Client C: 102 000 000 – 102 999 999
You can store these values in any configuration table.
When you reach the limit for one client (or server), just change the range for a new one.
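As a sketch, switching a client to a new range is essentially a reseed. (The `Product` table and the new range start value are assumptions for the example, not part of DMS.)

```sql
-- Client A has been granted the range 100 000 000 – 100 999 999.
-- If the last generated value reaches the end of the range,
-- reseed the identity counter at the start of a newly allocated range
-- (103 000 000 here, a value assumed for the example).
DECLARE @last_used BIGINT = IDENT_CURRENT('Product');

IF @last_used >= 100999999
BEGIN
    DBCC CHECKIDENT ('Product', RESEED, 103000000);
END
```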
2) Assigning Seed / Step
In the same concept, another technique is to use the IDENTITY (m, n) formula, where m is the seed and n is the step (Server = 1, first client = 2, second client = 3, etc.).
For example, say we have 4 nodes (n = 4) to sync (1 server + 3 clients): the seed m will be 1 for the Server, and 2, 3, 4 for Client A, Client B and Client C respectively.
According to the formula, the data will be populated as follows:
IDENTITY (1, 4): the column data sequence will be 1, 5, 9, 13 …
IDENTITY (2, 4): the column data sequence will be 2, 6, 10, 14 …
IDENTITY (3, 4): the column data sequence will be 3, 7, 11, 15 …
IDENTITY (4, 4): the column data sequence will be 4, 8, 12, 16 …
You can store the formula values / increments in any configuration table.
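For instance, the Client B table from the example above could be declared like this (the `Product` table itself is illustrative):

```sql
-- Client B: seed = 3, step = 4, so generated ids are 3, 7, 11, 15, ...
CREATE TABLE Product (
    ProductId INT IDENTITY(3, 4) PRIMARY KEY,
    Name NVARCHAR(100) NOT NULL
);

INSERT INTO Product (Name) VALUES (N'Bike');   -- gets ProductId = 3
INSERT INTO Product (Name) VALUES (N'Helmet'); -- gets ProductId = 7
```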
Solution we will discuss here
Since the major client database used with DMS is SQLite, and since SQL Server merge replication also uses this technique, we will discuss the implementation of Solution 1: Assigning Ranges.
DMS possible Implementation
I've already worked on the primary keys auto increment implementation in DMS, even if it hasn't been published so far.
I'm going to explain what I've done and which problems I have not resolved yet.
Playing with Identity Columns On SQL Server
Firstly, let's see what we have, in terms of queries / language / hints, on SQL Server, when we are working with Identity columns:
(Obviously, we have many other SQL statements for IDENTITY columns that we don't need here.)
SET IDENTITY_INSERT: allows the insertion of a specific value into an identity column.
DBCC CHECKIDENT: manages the seed and step of an identity column for a specific table.
IDENT_CURRENT: returns the last identity value generated for a specific table.
Macro implementation in DMS
For the implementation, we will stick to SQL Server only, for now.
scopes_identity table
Firstly, we need to save and handle the ranges for each client and the server.
Obviously, we can use a simple table (like scope_info) to store these values.
Nothing really complicated here.
Here is a draft of the scope_info_identity table, stored and managed from the server side.
Using this kind of table, we can imagine a sync system where each client asks the server for a dedicated range and reseeds its local identity columns accordingly.
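A possible shape for such a table could be the following sketch (the column names are assumptions, not the actual DMS schema):

```sql
-- One row per (client scope, table): the identity range granted to that scope.
CREATE TABLE scope_info_identity (
    sync_scope_id UNIQUEIDENTIFIER NOT NULL, -- client (or server) scope id
    table_name    NVARCHAR(128)    NOT NULL, -- table owning the identity column
    range_start   BIGINT           NOT NULL, -- first id the scope may use
    range_end     BIGINT           NOT NULL, -- last id the scope may use
    PRIMARY KEY (sync_scope_id, table_name)
);
```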
Merging rows
Here come the problems.
Here is a macro version of the actual Insert/Update stored procedure (on SQL Server, we are using a MERGE statement) used when inserting new rows on any client or server, when dealing with identity primary keys.
Something important to notice about SET IDENTITY_INSERT: it allows us to specify a value for our identity column, and if this value is higher than any stored value, the internal identity counter is incremented as well.
A small sample:
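The following sketch shows the counter being pushed forward by an explicit insert (the `Product` table is illustrative):

```sql
CREATE TABLE Product (
    ProductId INT IDENTITY(1, 1) PRIMARY KEY,
    Name NVARCHAR(100) NOT NULL
);

INSERT INTO Product (Name) VALUES (N'Bike');  -- gets ProductId = 1

-- Explicitly insert a value far above the current counter.
SET IDENTITY_INSERT Product ON;
INSERT INTO Product (ProductId, Name) VALUES (1000, N'Helmet');
SET IDENTITY_INSERT Product OFF;

-- The internal counter has jumped past the explicit value:
INSERT INTO Product (Name) VALUES (N'Glove'); -- gets ProductId = 1001
```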
Since we need to be sure each client stays within its range, we need to handle this behavior.
Client 1: Range 1 000 000 to 1 999 999
Let's see what happens for Client 1, with range 1 000 000 to 1 999 999.
We need to ensure that all values inserted from Client 1 are always in that range.
But what happens if we have a sync on Client 1 with records coming from Client 3, whose range is 3 000 000 to 3 999 999?
Well, for sure, the internal identity value for that column will be reseeded to something between 3 000 000 and 3 999 999.
And we need to avoid that.
Here is a macro version to be sure we can avoid this situation:
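A sketch of that macro could look like this (the `Product` table, the `@changes` source of incoming rows, and the variable names are illustrative):

```sql
-- @changes: table-valued parameter holding the incoming rows (assumed).

-- 1) Save the current internal identity value before merging.
DECLARE @identity_current BIGINT;
SELECT @identity_current = IDENT_CURRENT('Product');

-- 2) Merge incoming rows, which may carry identity values
--    from other clients' ranges.
SET IDENTITY_INSERT Product ON;
MERGE Product AS base
USING @changes AS changes
    ON base.ProductId = changes.ProductId
WHEN MATCHED THEN
    UPDATE SET base.Name = changes.Name
WHEN NOT MATCHED THEN
    INSERT (ProductId, Name) VALUES (changes.ProductId, changes.Name);
SET IDENTITY_INSERT Product OFF;

-- 3) Put the internal counter back where it was, so local inserts
--    stay inside this client's own range.
DBCC CHECKIDENT ('Product', RESEED, @identity_current);
```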
Using this technique, we RESEED the internal increment value back to its previous value, before any merge (with potentially higher values for the identity column).
This solution leads to several problems...
Multi SQL Sessions insertions
What happens if we have another session between the moment we save the identity value (SELECT @identity_current = IDENT_CURRENT('Product');) and the RESEED?
That other session will insert a new record, and our saved value will no longer be correct.
The RESEED will work, but will be wrong as well.
Since the reseeding will not be correct, we will get an error raised on the next INSERT in the Client 1 database.
On the other hand, what happens if we try to insert something, in another session, just after the MERGE?
Here, we may end up with a new record in the Client 1 database that is outside the authorized range of values...
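The two race windows described above can be summarized as a timeline (in comments; the statements mirror the reseed macro sketched earlier):

```sql
-- Session 1 (sync)                          Session 2 (application)
-- ---------------------------------------   --------------------------------
-- SELECT @identity_current =
--        IDENT_CURRENT('Product');
--                                           INSERT ...  -- counter moves past
--                                                       -- the saved value
-- SET IDENTITY_INSERT Product ON;
-- MERGE ...;
-- SET IDENTITY_INSERT Product OFF;
-- DBCC CHECKIDENT ('Product', RESEED,
--                  @identity_current);      -- counter now too low: the next
--                                           -- local INSERT hits a duplicate key
--                                           INSERT ...  -- or here: picks up an
--                                                       -- id from Client 3's range
```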
Using a Serializable transaction.
Serializable is the most protective transaction isolation level. Can we use it here?
A serializable transaction won't save us, since it does not lock the entire table and will still allow another session to get a new identity value and insert a new record into the table, exactly as in our last 2 samples...
TABLOCK hint
The TABLOCK hint is more restrictive and will lock the entire table during our MERGE statement.
Still, it will not save us, since we can insert before or after the MERGE statement...
Solutions on both sides
So far, we have seen that the problem comes from multiple SQL sessions hitting the table that is being updated during a sync MERGE statement...
Freeze the UI and prevent multiple SQL sessions
Can we say something like:
"Hey, when you are doing a sync on this table, DO NOT TRY TO MAKE ANY INSERTS IN IT"
That could potentially work for client databases: you can easily freeze the UI and prevent this situation from happening.
But what happens on the server side?
The server side needs to handle multiple syncs at the same time, and we can't prevent two syncs from happening simultaneously...
Disabling the internal counter increment
An ideal solution would be to disable the internal increment of the current identity value during the MERGE statement.
I guess this is what happens when using the NOT FOR REPLICATION option with SQL Server merge replication.
Source: https://www.mssqltips.com/sqlservertip/1274/change-not-for-replication-value-for-sql-server-identity-columns/
Conclusion
I'm not sure I've put here all the research and tests I've made so far on the identity columns sync problem, but at least this thread can be a good starting point to discuss it.
If you have any ideas, do not hesitate to share your thoughts here.