Re: [axonframework] DomainEventEntry insert performance with a MySQL backed JPAEventStore


bui...@gmail.com

unread,
Mar 8, 2013, 3:35:33 PM
to axonfr...@googlegroups.com
Hey Sebastian,

the main reason for using client-generated IDs is that a client can send multiple commands for the same aggregate without having to wait for the return value of the first command. I don't think MySQL's key generator will have any trouble keeping up.
The disruptor command bus solves another problem. I have seen cases where applications had so many concurrent (and active) connections that there was high contention. In such a case, using fewer connections in a smart way can help. The disruptor command bus does exactly that.

The link you sent is interesting, to say the least. I don't think the UUID itself has anything to do with it; it is probably purely the use of a unique index. It seems that MySQL has difficulty maintaining large unique indexes. With an auto-increment key, MySQL knows it doesn't need to check anything: there can't be any existing value equal to the generated one.

In the JPA Event Store, the unique key is the combination of the aggregate identifier and the event's sequence number. Using an auto-increment key in MySQL is not really going to be a solution. But it may be interesting to find out whether there is a way to remove the need for a unique index...
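To make the role of that unique key concrete, here is a minimal plain-Java sketch of the rule the event store's unique index enforces (the class and method names are hypothetical; the real store enforces this with a database index, not an in-memory set):

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the event store's unique index on
// (aggregateIdentifier, sequenceNumber): appending two events with the
// same pair means two commands modified the same aggregate concurrently,
// and the second append must be rejected.
final class UniqueIndexSketch {
    private final Set<String> index = new HashSet<>();

    // Returns true if the append is accepted, false on a duplicate
    // (i.e. another command already appended this sequence number).
    boolean append(String aggregateId, long sequenceNumber) {
        return index.add(aggregateId + "#" + sequenceNumber);
    }
}
```

An auto-increment surrogate key can never play this role, because it says nothing about whether a given aggregate already has an event at a given sequence number.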

To be continued...

Allard

On 8 mrt. 2013, at 08:28, Sebastian Ganslandt <seba...@ganslandt.nu> wrote:

Hey!

Allard, when we had a chat a couple of weeks ago, you sounded pretty confident that UUID keys on MySQL tables would make for a smaller performance hit than an auto-increment integer key. Your reasoning, as I recall it, was that you would very soon get congestion around the key generator (and the double round trips to the DB for getting the PK back). However, in the sentence after that you suggested we try out the disruptor so that only a few threads do the actual inserts, which would mitigate the congestion problem, no? The auto-increment key would of course only be used locally in that specific table, and it would also prove very useful for doing incremental maintenance jobs on the table.

The reason I'm bringing this up again is that we keep stumbling on reports like http://kccoder.com/mysql/uuid-vs-int-insert-performance/, and I'm curious whether you, or anyone else, have done any similar tests disproving this result?

Cheers
Sebastian

--
You received this message because you are subscribed to the Google Groups "Axon Framework Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to axonframewor...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Sebastian Ganslandt

unread,
Mar 8, 2013, 4:32:22 PM
to axonfr...@googlegroups.com
From what I've understood, it has to do with how MySQL lays indexes out on pages. A unique index is fine as long as inserts that arrive together end up on roughly the same pages. If they do, those pages will be in the InnoDB buffer pool and uniqueness can be checked without going to disk. This is the case if you are, for example, doing inserts with an auto-increment key: the page where a possible duplicate would live is the same page we just inserted into, and it will definitely be in memory.

With the composite PK of DomainEventEntry, which is pretty random because of its aggregate identifier (and it just struck me, this only applies if you are creating many new short-lived aggregates), the page containing possible duplicates can be located anywhere, and with only 10% of the table in memory there's a 9/10 chance that you will have to read that page from disk to find out.
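The page-locality effect described above can be sketched with a toy buffer-pool model (all numbers and names are invented; real InnoDB behavior is far more involved): an LRU pool holds a fraction of the index pages, a uniqueness check touches the page where a duplicate would live, sequential keys keep hitting the same hot page, while random keys mostly fault to "disk".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of a buffer pool: an LRU set of resident page numbers.
// Each insert must read the page where a duplicate key would live;
// if the page is not resident, we count a "disk read" (miss).
final class BufferPoolSketch {
    static final int KEYS_PER_PAGE = 100;
    private final int capacity;            // pages the pool can hold
    private final Map<Long, Boolean> pool; // access-ordered = LRU
    int misses = 0;

    BufferPoolSketch(int cap) {
        this.capacity = cap;
        this.pool = new LinkedHashMap<Long, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, Boolean> e) {
                return size() > capacity; // evict least recently used page
            }
        };
    }

    // Simulates the uniqueness check performed for an insert of 'key'.
    void insert(long key) {
        long page = key / KEYS_PER_PAGE;
        if (!pool.containsKey(page)) {
            misses++; // page not resident: a disk read is needed
        }
        pool.put(page, Boolean.TRUE);
    }
}
```

With 100,000 keys over 1,000 pages and a pool of 100 pages (10% in memory), sequential keys miss only once per page (~1% of inserts), while uniformly random keys miss roughly 90% of the time, matching the 9/10 estimate above.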

It would be very nice not to have to count on the specific event persistence mechanism to check for concurrent modifications. A single thread executing domain logic would do the trick :)

And isn't there a locking repository in Axon? Using that, you should be able to skip the unique part?

bui...@gmail.com

unread,
Mar 9, 2013, 4:41:49 AM
to axonfr...@googlegroups.com
Hi Sebastian,

something that might help here is using time-based UUIDs. They are more sequential in nature (duh). They use the machine's MAC address, plus a sequence number when multiple IDs are generated in the same (milli)second. The same machine will generate sequential identifiers, making the chance that a page is already in memory a lot larger.
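Plain java.util.UUID can't generate time-based (version 1) UUIDs out of the box, but the idea can be sketched in a few lines: put a timestamp in the most significant bits so identifiers generated on one machine sort near each other. This is an illustration only, not Axon's actual generator, and unlike a real version 1 UUID it does not encode the MAC address:

```java
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a time-ordered UUID: the most significant bits carry a
// millisecond timestamp plus a 16-bit counter for IDs generated within
// the same millisecond, so successive identifiers from one machine are
// (nearly) sequential and land on neighboring index pages.
final class TimeOrderedUuid {
    private static final AtomicLong COUNTER = new AtomicLong();

    static UUID next() {
        long millis = System.currentTimeMillis();
        long seq = COUNTER.getAndIncrement() & 0xFFFF; // 16-bit sequence
        long msb = (millis << 16) | seq;               // time-ordered high bits
        long lsb = ThreadLocalRandom.current().nextLong(); // uniqueness filler
        return new UUID(msb, lsb);
    }
}
```

A sequence of these IDs is strictly increasing in its high bits (until the counter wraps), which is exactly the insert-order locality that keeps the relevant index pages in the buffer pool.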

If you have a single-machine setup, you don't need the event store to check for uniqueness. By default, the repository uses pessimistic locking. The disruptor command bus uses a non-locking algorithm that has the same effect. 

The way databases work with keys isn't really optimized for event stores. An event store doesn't really need to maintain all the key values ever used; only the most recent ones per aggregate are likely to ever be contended. But you can't tell a database to only store the latest 100 keys per aggregate, unfortunately. These things are just shouting for a custom event store implementation...
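The "only keep the latest keys per aggregate" idea could look something like this sketch (purely hypothetical; no database offers this, which is the point being made):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Sketch of a store that remembers only the N most recent sequence
// numbers per aggregate for its uniqueness check, instead of indexing
// every key ever written. Old keys are forgotten: they are assumed to
// never be contended again.
final class RecentKeysSketch {
    private final int keep;
    private final Map<String, Deque<Long>> recent = new HashMap<>();

    RecentKeysSketch(int keep) {
        this.keep = keep;
    }

    boolean append(String aggregateId, long sequenceNumber) {
        Deque<Long> seen =
                recent.computeIfAbsent(aggregateId, id -> new ArrayDeque<>());
        if (seen.contains(sequenceNumber)) {
            return false; // duplicate among the recent keys
        }
        seen.addLast(sequenceNumber);
        if (seen.size() > keep) {
            seen.removeFirst(); // drop the oldest remembered key
        }
        return true;
    }
}
```

The trade-off is visible in the sketch: a duplicate older than the window would slip through, which is acceptable for an event store where conflicts only ever happen near the head of the stream.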

Cheers,

Allard

Sebastian Ganslandt

unread,
Mar 9, 2013, 6:14:59 AM
to axonfr...@googlegroups.com
Interesting thinking. We are actually already using time-based UUIDs, which might mean the problem is not as big as it could have been. It would be interesting to do some tests, though.

We are on a single-machine setup, and it will probably suffice for our needs, except for a possible hot spare. So that's great to hear.

And yes, using a relational database for an event store is somewhat crazy, apart from the fact that MySQL has been pretty battle-proven when it comes to not screwing up the data you put in it. With the recent discussions about whether MongoDB can achieve that, you get a little reserved about trying out new and fancy storage engines like Greg Young's Event Store, for example.

Sebastian Ganslandt

unread,
May 14, 2013, 4:29:20 PM
to axonfr...@googlegroups.com
Let me share my results from doing the actual tests.

I created 3 tables with a subset of 10M events from our production event store, each with a different configuration, and tested insert and replay speed. Inserts were performed sequentially with a SQL procedure, and the data was just md5(rand()).
 - With the original primary key (aggId, seqNr, type) and replay index (timeStamp, seqNr), we got 1000 inserts done in 10 seconds.
 - With an auto_increment primary key and a non-unique aggregate-lookup index (aggId, seqNr, type), insert speed was around 50% worse. Replay speed was marginally faster.
 - With a MyISAM table instead of InnoDB (created by accident from a CREATE TABLE ... AS SELECT), we got 10000 inserts/sec and replays that were 3-4x faster.

So, if we skip transactions and all the other safety measures of InnoDB, we get amazing performance. From just replacing the primary key with an auto-increment column and dropping the unique index on (aggId, seqNr, type): nothing.

Cheers   
Allard Buijze

unread,
May 15, 2013, 2:37:18 AM
to Axon Framework Users
Hi Sebastian,

that looks pretty good. The auto_increment primary key results amaze me; I expected it to be faster (as long as the primary key is not mapped as such in the JPA classes). Do you still have the indexes? Some have suggested that an auto-increment primary key makes for smaller indexes (each secondary key contains the primary key it refers to, and a BigInteger is much smaller than two strings and a BigInteger).

MyISAM could be relatively safe to use when storing an entire commit per entry instead of a single event. It's an idea I've been playing with and have implemented in MongoDB. It might be worth investigating for the JPA Event Store as well.

Thanks for sharing the results!
Cheers,

Allard



Sebastian Ganslandt

unread,
May 15, 2013, 3:19:56 AM
to axonfr...@googlegroups.com
The results amaze me as well. Completely contrary to what the blog post reported.

I'm not sure if I made it clear, but the auto_increment primary key was added as an additional column on top of the natural key spanning the UUID aggId, seqNr and type. Using a BigInteger (auto_increment or not) instead of a UUID for aggId should definitely make the keys, indexes and the whole table smaller, but I can't see that it would have any dramatic effect on performance.

About the index sizes in my test: the summed index size for the original table layout was 956MB, and for the table with a bigint primary key plus the aggLookup and replay indexes on top of that, 1272MB, i.e. about 33% larger.

Sebastian
 