Wednesday, March 9, 2011

A familiar debate…


Those of us who write database-based applications are familiar with several choices that have to be made when designing the database. One of these is the selection of the primary key type.

Several schools of thought exist on the matter, and the discussions generally descend into the realms of religious dogma. There is no single hard-and-fast correct answer.

I am steering clear of several more fundamental debates in this post – specially the discussion of natural vs. surrogate primary keys. We’ll assume that we’re using surrogate primary keys in keeping with the general nature of the post.

From what I’ve learned over the years, the important characteristics of a primary key are as follows:

1) It must to be unique across the table. A typical approach would be to generate a sequence (or let the database automatically create one) and obtain a unique integer value as the primary key when a new row is inserted in a table.

This approach has several issues – first among which is that the primary key for a row isn’t known until after the row is inserted, which makes inserting a set of related rows somewhat involved.

Furthermore, in real life, databases are backed up and restored, and potentially copies of the databases need to be synchronized. So it would be nice if a primary key were globally unique, and not auto-generated by the database. This can quite easily be achieved by making a Guid the primary key type.

2) It must be easy to index (sort) by, since primary keys of one table can be foreign keys in others, and filtering by foreign keys is a very common operation.

The (potentially automatically generated) integer primary key is perfect suited to fill this need.

A Guid primary key is atrociously bad for this operation, since the globally-unique nature of a Guid is obtained by making it somewhat random. Guids do not sort easily, and in code, they natively do not even support ordering – only equality-comparison.

If only there were a way to get the globally unique nature of a Guid with the ordered nature of integers…