Friday, August 28, 2015

Database concurrency - simplified

Just the other day I was asked why we did not use NHibernate since using an ORM is the only way to ensure concurrent writes to the database. Well, after a while I understood that we were not actually talking about concurrency per se, but rather how to make sure that we don’t update the database using stale data.
In other words making sure that we don’t overwrite somebody else’s data.
Since I strongly disapprove of the very concept of ORM’s, I thought this would be a perfect example to explain how simple this really is.

Basically we have three ways to approach this problem.

  • No concurrency check - Last write wins which might be fine for quite a number of cases.
  • Using a version/timestamp column - Always include a “version” column where the “new” values must match the value in the database.
  • Using the old values from the modified columns.
CREATE TABLE Persons
(
    PersonId int,
    FirstName varchar(50),
    LastName varchar(50),
    Version int
)

Update without concurrency check

UPDATE Persons
SET
    FirstName = "NewName"
WHERE 
    PersonId = 1      

Without the concurrency check it will always be “last write wins”

Update using a version column

UPDATE Persons
SET
    FirstName = "NewName"
WHERE 
    PersonId = 1 AND
    Version = 100     

The update will only succeed if the version number matches the given number.

It should be fairly easy to ensure that the Version column is always presents in the update statement using an update trigger on the table. Besides validating the presence of the version column in a trigger ensures that this will work regardless of the tool used to access the database.

Update using modified columns

UPDATE Persons
SET 
    FirstName = "NewName"
WHERE 
    PersonId = 1 AND
    FirstName = "OldName"   

This approach allows concurrent updates to the same row as long as the updates does not update the same columns. The disadvantage of this model is that it requires the client to keep track of the old column values and this tends to add complexity to the data access layer. There is also an indexing problem as we have no guarantee that we have an index that covers the columns used in the where clause of the update statement.

Conclusion

In terms of simplicity and something that is easy to understand and implement, I recommend going with a version column. This does not have to be an integer as shown here, but can also be a datetime/timestamp or the like. And believe me, you don't need NHibernate or any other crap ORM tool to implement concurrent writes to you database. Keep things simple and you will live happily ever after.

No comments: