A Scrum Of One: Database Design

Wednesday, March 9, 2011

Enter the ‘Ordered’ Guid

So in the previous post, I talked about the desire for something with the uniqueness of Guids with the ordered-ness of integers.

Microsoft already have something like this in the database from SQL Server 2005 onwards, but in my mind, this takes manifests the worst aspects of both integer and Guid primary keys.

For starters, NEWSEQUENTIALID() is a database function, which implies that the database generates the id for you – which makes it functionally equivalent to an auto-incremented integer, with the added cost of the Guid’s size.

The true strength of the random Guid is that the id is predictably unique before the insert operation – so that sets of related rows can be inserted directly.

So taking a leaf out of that book, I decided to write a predictably unique Guid, which is also ordered, and reasonably quick to create.

General Principle

The following constructor for Guid seems like a good staring point:

public Guid(int a, short b, short c, byte[] d)

We could somehow configure

a to be a (random number) to represent a realm or domain
b to be a (random number) to represent a server in that domain
c to be a (random number) to represent an application on that server

and then provide a monotonically increasing 64-bit value for d, and we would be in business.

The Guid could be self-constructed external to the database, be predictably unique, and be ordered as well.

Setting up the Context Arguments

I created a new Class Library for this OrderedGuid class.

I used the Settings tab to create three parameters along with some default values. These values will persist after any changes made by making changes to the Class Library’s app.config file.

In the static constructor of the OrderedGuid class, set up the parameters with random values.

public static class OrderedGuid
{
    private static int _a;
    private static short _b;
    private static short _c;

    private static readonly object LockObject = new object();
    
    static OrderedGuid()
    {
        lock(LockObject)
        {
            var randomNumberGenerator = new Random();
    
            _a = (int) Settings.Default.Realm_UniqueID;
            while (_a == 0)
            {
                _a = randomNumberGenerator.Next();
            }
    
            _b = (short) Settings.Default.Server_UniqueID;
            while (_b == 0)
            {
                _b = (short) randomNumberGenerator.Next(short.MaxValue);
            }
    
            _c = (short) Settings.Default.Application_UniqueID;
            while (_c == 0)
            {
                _c = (short) randomNumberGenerator.Next(short.MaxValue);
            }
        }
    }

    public static Guid NewSequentialGuid()
    {
        throw new NotImplementedException();
    }
}

We’ll discuss and refine the NewSequentialGuid() implementation shortly. Stay tuned…

A familiar debate…

Those of us who write database-based applications are familiar with several choices that have to be made when designing the database. One of these is the selection of the primary key type.

Several schools of thought exist on the matter, and the discussions generally descend into the realms of religious dogma. There is no single hard-and-fast correct answer.

I am steering clear of several more fundamental debates in this post – specially the discussion of natural vs. surrogate primary keys. We’ll assume that we’re using surrogate primary keys in keeping with the general nature of the post.

From what I’ve learned over the years, the important characteristics of a primary key are as follows:

1) It must to be unique across the table. A typical approach would be to generate a sequence (or let the database automatically create one) and obtain a unique integer value as the primary key when a new row is inserted in a table.

This approach has several issues – first among which is that the primary key for a row isn’t known until after the row is inserted, which makes inserting a set of related rows somewhat involved.

Furthermore, in real life, databases are backed up and restored, and potentially copies of the databases need to be synchronized. So it would be nice if a primary key were globally unique, and not auto-generated by the database. This can quite easily be achieved by making a Guid the primary key type.

2) It must be easy to index (sort) by, since primary keys of one table can be foreign keys in others, and filtering by foreign keys is a very common operation.

The (potentially automatically generated) integer primary key is perfect suited to fill this need.

A Guid primary key is atrociously bad for this operation, since the globally-unique nature of a Guid is obtained by making it somewhat random. Guids do not sort easily, and in code, they natively do not even support ordering – only equality-comparison.

If only there were a way to get the globally unique nature of a Guid with the ordered nature of integers…