SQL Server Read Store Performance when using Guids for Aggregate ID values

362 views
Skip to first unread message

@nbarnwell

unread,
Jun 25, 2012, 12:48:21 PM6/25/12
to ddd...@googlegroups.com
I've just submitted the first patch for a greenfield DDD/CQRS/ES app to our DBA team and they've come back with (amongst other things) one very specific point.

I've chosen in my Application Services to generate the AggregateID values for newly-created aggregates as Guids (I'm using .NET so quite literally Guid.NewGuid()).

As per the typical design, the Aggregates generate Events, which are shipped asynchronously to some apps responsible for updating various read stores. One is Neo4j, one is SQL Server. For the SQL Server read store, I have a table similar to this:

create table Thing (
Id uniqueidentifier not null,
[Version] int not null,
Name varchar(255) not null,
constraint PK_Trip primary key (Id)
)
GO

The usual issues with Guids is being suggested (performance, clustered index fragmentation etc). Given however that this isn't an OLTP db (my event store is RavenDB using my Regalo CQRS library), do I have options here? I'm thinking things like having a different clustering key value (that's incremental for clustering purposes though I actually ignore it) to the PK (my application-generated aggregate ID).

Many thanks in advance.

Neil.

João Bragança

unread,
Jun 25, 2012, 1:16:00 PM6/25/12
to ddd...@googlegroups.com
Since you are already using, Raven, why not use it for the read model
too? What you could do is let Raven generate its own id using hi lo.
Use the index replication bundle to get it into sql server.

Ephraim Mower

unread,
Jun 25, 2012, 1:19:37 PM6/25/12
to ddd...@googlegroups.com
I've done something similar to this to solve this issue in the past:


Some more information on dealing with concerns such as byte ordering:


-Cork

@yreynhout

unread,
Jun 25, 2012, 5:06:46 PM6/25/12
to ddd...@googlegroups.com
Have they measured? 16byte indexes are bigger than 4byte indexes, but sequential guids make up for a lot of the "problems". They only start appearing when you're above 1 million rows (and again, seq guid make for steady insert speed and do not cause said fragmentation). I've built several stores with hundreds of millions of rows in one table full of guids, indexed on several columns, not had many problems with that.

OTOH natural partitioning is your friend, find it. This, to me, is the first thing you should be looking at. It maybe date ranges, it maybe value ranges ...

@nbarnwell

unread,
Jun 26, 2012, 7:40:36 AM6/26/12
to ddd...@googlegroups.com
No we've not measured yet because it's so new. I don't want to fall foul of premature optimisation etc, but it would seem unwise to not listen to their concerns and see if the "read store" nature of this db means there are some tricks I can use.

Could you expand a little on the "natural partitioning" aspect? One idea we've had is to have the PK be a identity int column that the app essentially ignores, but that allows the "real" PK (the Guid aggregate ID) to be in a *non-clustered* index. Are you suggesting a similar idea, but for something like a "created" date or other value?

I found a resource on generating sequential guids in C# on codeproject, is generating them by hand what you were suggesting? http://www.codeproject.com/Articles/388157/GUIDs-as-fast-primary-keys-under-multiple-database

Dan Normington

unread,
Jun 26, 2012, 9:40:22 AM6/26/12
to ddd...@googlegroups.com
Neil
 What you're explaining in your last post is similar to what we did. We created an "identity" column called "EventId" and assigned it as the primary clustered index. Then, we created a nonclustered unique index consisting of the AggregateId and the Version. Below is the table structure I used for a presentation I gave. Again, it was just a presentation so I'm sure it's not perfect but you get the idea.

CREATE TABLE [dbo].[Event](
[EventId] [bigint] IDENTITY(1,1) NOT NULL,
[AggregateId] [uniqueidentifier] NOT NULL,
[Version] [int] NOT NULL,
[EventDate] [datetime] NOT NULL,
[EventType] [varchar](1000) NOT NULL,
[EventData] [varchar](max) NOT NULL,
 CONSTRAINT [PK_Event] PRIMARY KEY CLUSTERED 
(
[EventId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY],
 CONSTRAINT [NK_Event] UNIQUE NONCLUSTERED 
(
[AggregateId] ASC,
[Version] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

@nbarnwell

unread,
Jun 26, 2012, 10:31:48 AM6/26/12
to ddd...@googlegroups.com
Great. The only thing to note is that I'm talking about a read store table as opposed to an event store table, but it sounds like the technique is sound from a SQL Server performance point of view.

Thanks.

Dan Normington

unread,
Jun 26, 2012, 10:45:49 AM6/26/12
to ddd...@googlegroups.com
We did apply the same "pattern" for our read model tables as well. The primary reason was due to fragmentation.

Jeffrey T. Fritz

unread,
Jun 26, 2012, 11:26:56 AM6/26/12
to ddd...@googlegroups.com
My team has taken the notion of sequential Guids and turned it on its head with an idea from Rinat in this list:

Construct the sequential Guid by hand with the natural partition data built in. A Guid is 16 bytes:

4 bytes for the number of seconds between date created and application project start date. 
2 bytes for tenantId of the entity
2 bytes for the originating server that created the data point
4 bytes for the userid who created the entity
Fill the rest with zeroes

With this strategy we have a sortable Guid, and we can identify parts of the key that make sense. It's easy to look at a primary key in an orders table and see which tenant it belongs to, because we know which bytes in the Guid to look at. 

Thoughts?


Jeff

Werner Clausen

unread,
Jun 27, 2012, 2:46:27 AM6/27/12
to ddd...@googlegroups.com
 
That might not be the best option for SQL server. As I understand it, the sequence has to originate from a specific algorithm in order to qualify as a sequence that Sql server indexing is happy with. At least if they are stored as uniqueIdentifiers. That said, I haven't been able to figure out how Sql server actually indexes uniqueIdentifiers and ultimately what makes up a good sequence. All I have are those common known examples (NHibernate) on how to create a good (not defined further) Sql server sequential guid. I'd love to hear that my assumption is wrong though - using Guids that are readable like this seems very nice.

Szymon Pobiega

unread,
Jun 27, 2012, 2:15:28 PM6/27/12
to ddd...@googlegroups.com
What I find most difficult with RDBMS-based event stores is dispatching events to multiple consumers. The mechanism has to support adding consumers after the system is operational (and this means dispatching all events since the beginning of time) and has to have relatively small delay when processing in real time manner. 

Szymon

2012/6/27 Werner Clausen <item...@hotmail.com>

Peter Ritchie

unread,
Jun 27, 2012, 3:42:26 PM6/27/12
to ddd...@googlegroups.com
In addition to what Werner mentioned, you should get sequence and origin "out of the box" with GUIDs:  http://blogs.msdn.com/b/ericlippert/archive/2012/04/30/guid-guide-part-two.aspx 

Changing the semantics of a GUID through manual creation breaks these contracts (i.e. your GUID would break an sequence/location comparison with other, natural, GUIDs).  I'm also not a big fan of creating a single composite value like this when individual values or columns would suffice.  It's essentially a value with multiple responsibilities, e.g. something to solve independent, potentially conflicting, problems and thus coupling them.

Cheers -- Peter

Nicolás Erlijman

unread,
Sep 4, 2015, 11:32:07 PM9/4/15
to DDD/CQRS
@Dan,

EventId is used in the application layer or you are only using AggregateId?
I mean, EventId won't be used to perform any db join? 

Ben Kloosterman

unread,
Sep 5, 2015, 12:37:17 AM9/5/15
to ddd...@googlegroups.com

worrying about index size is premature optimization. If a large database is important create the data and test. What is an issue as @reynhout alluded to is Guid primary keys ,, by default SQL makes it the clustered key  which is guaranteed to be bad moving chunks around on every insert unless the data is very static  so use createddate for the clustered index - the clustered index does not need to be the key,.

Nicolás Erlijman

unread,
Sep 7, 2015, 9:38:12 PM9/7/15
to DDD/CQRS
@Dan,

What is the column used for the Joins? AggregateId, right ? (as is the one created from the app layer)
I use CreatedDate for the clustered one, with GetDate(), i think is the same.


On Tuesday, June 26, 2012 at 10:40:22 AM UTC-3, Dan Normington wrote:
Reply all
Reply to author
Forward
0 new messages