INF: Managed Range Identities in Merge Replication (322910)

MORE INFORMATION

Identity Columns in Merge Replication

The goal in replicating identity values is to use an incrementing technique so that inserts performed at different servers will not use the same incremental values. Because merge replication must handle servers that are disconnected and that cannot communicate with each other, it cannot coordinate each insert to use the next available ID value across both copies of the table.

Ranged Identities

The solution merge replication uses is "ranged identities", in which each published article is assigned a range of identity values. The identity values in each table start at the minimum value of the range, and increment to the maximum value of the range. For example, range assignments might be defined as batches of 100:

                   Range      Range
Server             Minimum    Maximum
------------------ ---------- ---------- 
Publisher                 1      100
Subscriber1             101      200
Subscriber2             201      300

To implement these ranges, the seed property for the identity column in each table is set to the minimum value of the range, which "seeds" the incremental values to start at the low end of the range. To control the high end of the range, a CHECK constraint is used to prevent the identity column values from falling outside the range. For example, a Publisher table might be defined as:

CREATE TABLE id_table
(
     ID int identity (1,1) not for replication primary key,
     CONSTRAINT id_range_check CHECK (ID BETWEEN 1 AND 100)
)

The Subscriber1 table would be:

CREATE TABLE id_table
(
     ID int identity (101,1) not for replication primary key,
     CONSTRAINT id_range_check CHECK (ID BETWEEN 101 AND 200)
)

The Subscriber2 table would be:

CREATE TABLE id_table
(
     ID int identity (201,1) not for replication primary key,
     CONSTRAINT id_range_check CHECK (ID BETWEEN 201 AND 300)
)

Allocating New Ranges

Using the table definitions that are earlier in this article, each Subscriber can successfully insert 100 rows, and that newly-inserted data can then be safely replicated to the other Subscriber (or Publisher) databases without error or without unique index violations. However, after the one-hundredth row, inserts start to fail with messages that indicate the Subscriber-side identity range has filled up:

Server: Msg 548, Level 16, State 2, Line 1

The identity range managed by replication is full and must be updated by a replication agent. The INSERT conflict occurred in database 'dbGeneralMergeSub', table 't1', column 'c1'. Sp_adjustpublisheridentityrange can be called to get a new identity range.

The statement has been terminated.

When the original identity range fills up, the replication agent must create a new range before allowing any more rows to be inserted into the table. Using the earlier example, the next available range is 301-400, and the following Transact-SQL code is used to implement the new range:

ALTER TABLE id_table DROP CONSTRAINT id_range_check

DBCC CHECKIDENT('id_table','reseed',301)

ALTER TABLE WITH NOCHECK id_table ADD CONSTRAINT id_range_check
CHECK (ID BETWEEN 301 and 400)

After this point, inserts are permitted in the table and they start to increment in the new range. Note that this code is for explanatory purposes only. Merge replication manages this process for you, and you cannot use this technique if the table is published for merge replication.

How to Implement Managed Range Identities with Merge Replication

While it is possible to implement ranged identities independent of merge replication, it can be very difficult to manage the range assignments and reallocate new ranges, especially if the servers do not have a reliable network link.

Identity Columns and NOT FOR REPLICATION

There is a special consideration to keep in mind when replicating tables with identity columns. By default, SQL Server does not know who is performing the insert. It might be a user process that is performing an insert, or it might be the merge replication process replicating the user insert. By default, in both cases, SQL Server tries to increment the identity column when it loads the row. However, you typically want SQL Server to increment the value only when the user performs the insert, but you do not want the value incremented when the replication process inserts the new row into another server in the replication topology. When the row is inserted into another server you typically want it to retain the value from the original insert.

For this reason, if you are going to publish a table that contains a column with the IDENTITY property, you must use the NOT FOR REPLICATION (NFR) property for the IDENTITY column. When this option is set, SQL Server recognizes merge replication processes when they perform the insert, and the original identity column value is retained.

If the NFR property is not used, you may receive several different types of errors during the merge process. The errors range from unique index errors to more subtle errors such as foreign key violations and trigger errors.

Add the NOT FOR REPLICATION Option to Existing Tables

If you are publishing an existing table with an IDENTITY column, you must first add the NOT FOR REPLICATION property to the IDENTITY column. This is relatively easy if the table is empty. You just drop the table and re-create it with the NOT FOR REPLICATION option. However, because the Transact-SQL ALTER TABLE statement does not allow you to change IDENTITY column properties, this can become more difficult if, for example, there is data that must be retained, or if the IDENTITY column is also the PRIMARY KEY for the table and other tables have FOREIGN KEY constraints that reference the PRIMARY KEY identity column.

Following is the basic process for adding the NOT FOR REPLICATION property to an existing table:

Back up the database.
Copy the table data to another table or to a file.
Save the SQL script for the foreign key constraints that reference the identity table, and then drop the foreign keys. You need these scripts to re-create the foreign keys later.
Drop the identity table, and then re-create it with the NOT FOR REPLICATION option.
Set the IDENTITY_INSERT option to ON for the table.
Copy the data from the other table or the file back to the new table.
Set the IDENTITY_INSERT option to OFF for the table.
Re-create the foreign keys by using the script you saved earlier.

How to Configure Merge Replication Articles

When you are creating an article for publication that is based on a table with identity columns, the sp_addmergearticle stored procedure provides these options:

@auto_identity_range
The expected values are TRUE and FALSE. The default is FALSE. If set to TRUE, merge replication manages the identity ranges that are allocated to each Subscriber for the table.

@pub_identity_range
This value determines the size of the identity range that is defined for the table in the published database. If the published table already has data, the new range will start from the maximum existing value in the column.

@identity_range
This value is the size of the range that is defined for the table in each Subscriber database.

@threshold
This value ranges from 1 to 100 and defines the percentage of the range that causes the merge process to allocate a new range. For example, if the range size is 100, from 1 to 100 and the @threshold is 80, the Merge Agent will allocate a new range after the value 80 is inserted.

NOTE: The same stored procedure parameters and identity-management functionality are available for transactional replication articles, in the sp_addarticle stored procedure.

Special Considerations for Reallocating Ranges

It is important to realize that identity ranges are managed at the publication and the distribution databases, and that they are only re-allocated during merge replication processing. Managed ranges are controlled through a CHECK constraint on the tables. So, if the users perform enough inserts to fill up the range for their table, all later inserts fail with error 548 (mentioned in the "Ranged Identities" section of this article). The only way to obtain a new range for a subscriber table is to run the Merge Agent. If a publisher's range is full, you can reallocate a new range without running the Merge Agent by calling the sp_adjustpublisheridentityrange stored procedure.

Planning the Identity Management Options

Before you implement ranged identity management with merge replication, you must first consider how many inserts will be performed by users and how frequently they will be merging their changes. The primary goal when defining the identity options for an article is to make the ranges large enough that the Subscriber will not run out of values before the next merge.

If the ranges are too small, users cannot insert new rows after the range fills up. However, if the range sizes are too large and each Subscriber only uses a small portion of their range, there will be large gaps in the incremental values. This can be a perception problem for users who are accustomed to strictly incremental values. It may be possible for the allocated ranges to become very large, which makes the values have so many digits that they are difficult for users who must manually enter ID values in Web pages or other user interfaces. In extreme cases, it is possible for the allocated ranges to be greater than the limitations for the data type of the identity column.

If gaps are acceptable, and you have a small number of Subscribers, you can make the range large enough so that Subscribers will probably never need a new range. Alternatively you can set it so that they only need a new range every day, week, or month, depending on the needs of the application.

You can also use the @threshold value to define how aggressively the Merge Agent allocates a new range before the old range fills up. Setting @threshold too low will result in the ranges being reallocated long before they fill up, which also causes larger gaps to appear in the incremental values. Setting it too high defeats the purpose of the setting. The Merge Agent will not allocate a new range even though the old range is almost full, making it more likely to fill up and cause errors before the next merge.

Special Considerations for Republishing

With merge replication you can replicate data among a hierarchy of Publishers and Subscribers. For example, you might replicate data from a central corporate server to regional servers, and from the regional servers to local city Subscribers. First you publish the data from the central server and set up subscriptions from the regional servers to the central Publisher. Then you "re-publish" the regional Subscriber servers, and set up subscriptions from the leaf-node city Subscribers to the regional Publishers.

The previous examples presented have discussed the management of the identity ranges in the context of a single Publisher and one or more Subscribers. When multiple levels of Publishers and Subscribers are defined in a topology, you must keep in mind that the Subscribers' ranges are allocated from their Publisher's range. When the Publisher's range is full, that Publisher must request a new range from their Publisher, and so on, up to the top node. For this reason, you must make the ranges fairly large for the top level publications, and increasingly smaller for the lower levels of the hierarchy.

Identity Ranges and Multiple Publications

With merge replication you can publish a single table in more than one merge replication publication, and the Publisher table may be a Subscriber to a transactional replication publication. Because the articles may be defined with different ranged identity settings in the different publications, you must be careful to use the same settings for @auto_identity_range, @pub_identity_range, @identity_range, and @threshold across all of the publications.

Alternatives to Managed Ranged Identities

Custom Built Identity Range Management

You can implement identity ranges yourself, independent of replication. The basic requirements are:

Use the NOT FOR REPLICATION option on the identity columns.
Define the seed value for each database so that each Subscriber starts incrementing at a different value.
Define a CHECK constraint to prevent the values from overflowing the range for the database.

The table examples in the "Ranged Identities" section of this article show how to implement identity ranges. However, as stated in that section, the difficulty comes in allocating new ranges when the current range fills up. Especially if the disconnected databases cannot connect to a single controlling server that manages the ranges.

This option is most useful in cases when there are relatively few inserts in the database, and you do not want to deal with the administrative issues of using the features of merge replication.

Compound Primary Keys

You may be able to publish tables with identity columns and not use managed ranges. However, to prevent unique key violations, you must define a compound key constraint or index based on the identity plus some other value that will not be duplicated across the databases.

A good option is to include a computed column in the table that uses a function such as @@SERVERNAME or SUSER_SNAME(). Your table may already have a column used for another purpose that fits this need. You can then define the primary key or unique constraint based on both the identity column and the other column.

The first problem is that the combination of values in the compound key must be unique across the replication topology. If, for example, users can connect to different servers, these values may not be unique. Here is an example:

A user, named "mergeuser", can connect to two different servers and insert a row. At both servers, the column that evaluates the function SUSER_SNAME now contains the value "mergeuser." If the value for the identity column at each server is the same (such as 101), the compound key is not unique across both servers and causes a duplicate key violation on a later merge.

The second problem with this technique is that the ID values are frequently used for needs such as customer and invoice IDs where including the server name or user name in the value may not be appropriate.

Finally, there are generic issues with having compound keys. If the table is referenced by a foreign key constraint in another table, that table must also have both columns. This makes Transact-SQL coding more difficult, and using multiple columns in compound indexes may be less optimal than single column unique indexes.

Random Values

You can also use the Transact-SQL RAND function to generate ID values. However, these values are not always acceptable for users who expect incremental values and other types of consistency in the values. While the RAND function does guarantee randomness, it does not guarantee uniqueness. Depending on how the RAND function is used, there is a statistical probability that a random value will be duplicated at some point in the future.

GUID (UNIQUEIDENTIFIER) Columns

The final alternative to using incremental values is to use "guid" columns, or columns that are defined with the UNIQUEIDENTIFIER data type and that are populated with values generated by the NEWID function. These values are always guaranteed to be unique, so they may be a good candidate. These values are in this format:

9AAA35AE-D5F0-4C24-BF92-7EF20740C995

Because of their length and format, they are not very friendly for users who must work with the values manually. If the tables are accessed strictly through code and not used by users, this is actually a very good option. If you are publishing the table with merge replication, the table must have a UNIQUEIDENTIFIER column (together with the ROWGUIDCOL property and a unique index), so you do not have to make any special considerations for identity values.

REFERENCES

SQL Server 2000 Books Online; topics: "Managing Identity Values"; "Using NOT FOR REPLICATION"; "sp_adjustpublisheridentityrange"; "Planning for Application Development"