[akka-persistence] throughput and benchmarks


Carsten Saathoff

Mar 15, 2014, 1:17:05 PM
to akka...@googlegroups.com
Hi,

I am currently evaluating akka-persistence, specifically with respect to its throughput. I want to use it in a system consisting of approximately 1,000,000 actors, each representing an aggregate root, distributed over a sharded cluster of actor systems. Each of these actors should become a processor (or probably an eventsourced processor), so I am interested in how many messages a single actor system can persist per second. In a typical scenario each actor will receive a single message, but I want the time the system takes to persist all messages to be as short as possible.

I wrote a simple test case without any sharding, but having a similar internal structure: https://github.com/kodemaniak/akka-persistence-throughput-test

A sender sends messages to each ID in the system. All messages pass through a region actor, which routes each message according to the ID it contains and creates a child actor per ID on demand. I am able to persist 2500-3000 msgs per second on my MacBook Pro (Mid 2010) with an SSD once the actors have been recovered. During recovery it is around 1000 msgs/second.

When I replace the region actor with a single receiver that receives and persists all messages, the throughput increases by an order of magnitude, i.e., >20k msgs/s once the actor is initialized.

My assumption would have been that throughput is independent of the number of actors persisting messages. In any case, I would not have expected an order-of-magnitude difference. Additionally, both numbers seem to be lower than what I've read before about the performance (50k msgs/s IIRC, though that's obviously hardware dependent), although the numbers with a single processor are very close.

Am I doing anything wrong or are the numbers as expected? Is it a bad idea to have many processors in a system? Are there any official benchmarks available, maybe with code?

Thanks and best regards

Carsten

Daniel Wang

Mar 15, 2014, 10:50:49 PM
to akka...@googlegroups.com
I'm also very interested in your benchmark result. Carsten, please share more data when they are ready.
Have you tried a custom serializer for your messages? I wonder if the default Java serializer is too slow.

Carsten Saathoff

Mar 16, 2014, 4:43:36 AM
to akka...@googlegroups.com
On Sunday, March 16, 2014 at 03:50:49 UTC+1, Daniel Wang wrote:
I'm also very interested in your benchmark result. Carsten, please share more data when they are ready.
Have you tried a custom serializer for your messages? I wonder if the default Java serializer is too slow.

Hi,

Nope, I have not tested different serializers yet. It's on my todo list, but first I want to understand the behaviour described above.

best

Carsten

√iktor Ҡlang

Mar 16, 2014, 8:29:28 AM
to Akka User List

In retrospect, having JavaSerializer was probably a mistake.

--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+...@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.

Patrik Nordwall

Mar 16, 2014, 3:47:24 PM
to akka...@googlegroups.com
Hi Carsten,


On Sat, Mar 15, 2014 at 6:17 PM, Carsten Saathoff <car...@kreuzverweis.com> wrote:
A sender sends messages to each ID in the system. All messages pass through a region actor, which routes each message according to the ID it contains and creates a child actor per ID on demand. I am able to persist 2500-3000 msgs per second on my MacBook Pro (Mid 2010) with an SSD once the actors have been recovered. During recovery it is around 1000 msgs/second.

What journal are you using? The bottleneck will be in the IO to the data store. LevelDB is not really an option in a clustered system.
 

When I replace the region actor with a single receiver that receives and persists all messages, the throughput increases by an order of magnitude, i.e., >20k msgs/s once the actor is initialized.

My assumption would have been that throughput is independent of the number of actors persisting messages. In any case, I would not have expected an order-of-magnitude difference. Additionally, both numbers seem to be lower than what I've read before about the performance (50k msgs/s IIRC, though that's obviously hardware dependent), although the numbers with a single processor are very close.

When using a single processor there is a huge difference between a command sourced Processor and an EventsourcedProcessor. The reason is that the command sourced Processor can take advantage of a dynamic batching optimization which will reduce the number of roundtrips and fsyncs for LevelDB.
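
A rough way to see why batching matters: if every journal write ends in an fsync, the number of fsyncs drops by the batch size. The following toy model is only a sketch of that effect. The function names and both cost figures are made-up placeholders for illustration, not measurements from LevelDB or Akka.

```python
import math


def fsyncs_needed(messages: int, batch_size: int) -> int:
    """Number of fsync calls if messages are written in batches of batch_size."""
    return math.ceil(messages / batch_size)


def estimated_throughput(messages: int, batch_size: int,
                         fsync_seconds: float, per_message_seconds: float) -> float:
    """Messages per second under a simple cost model:
    one fsync per batch plus a fixed cost per message."""
    total_seconds = (fsyncs_needed(messages, batch_size) * fsync_seconds
                     + messages * per_message_seconds)
    return messages / total_seconds


# Hypothetical costs, chosen only for illustration.
FSYNC_SECONDS = 0.001          # assumed cost of one fsync
PER_MESSAGE_SECONDS = 0.00001  # assumed serialization/write cost per message

unbatched = estimated_throughput(10_000, 1, FSYNC_SECONDS, PER_MESSAGE_SECONDS)
batched = estimated_throughput(10_000, 100, FSYNC_SECONDS, PER_MESSAGE_SECONDS)
print(f"batch size 1:   {unbatched:.0f} msgs/s")
print(f"batch size 100: {batched:.0f} msgs/s")
```

With these assumed costs the fsync term dominates the unbatched case, so larger batches raise the estimate until the per-message cost takes over.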

On my MacBook Pro (2.3 GHz Intel Core i7, SSD):
110201.58 persistent commands per second
10204.87 persistent events per second

When I fire up 100 EventsourcedProcessors, I see around 117 events per second in each.
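
For comparison with the single-processor figure, the per-processor rate can be turned into an aggregate. This is only arithmetic on the numbers quoted above, and it assumes that all 100 processors sustain roughly the same rate.

```python
# Figures quoted above.
single_processor_events_per_s = 10204.87  # one EventsourcedProcessor
processors = 100
events_per_s_each = 117                   # approximate rate seen in each processor

# Assumption: every processor sustains roughly the same rate.
aggregate_events_per_s = processors * events_per_s_each
ratio = aggregate_events_per_s / single_processor_events_per_s

print(f"aggregate: {aggregate_events_per_s} events/s")
print(f"ratio to single processor: {ratio:.2f}")
```

Under that assumption the aggregate is about 11,700 events per second, in the same range as the single EventsourcedProcessor figure.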
 

Am I doing anything wrong or are the numbers as expected? Is it a bad idea to have many processors in a system?

I don't think it's a bad idea.
 
Are there any official benchmarks available, maybe with code?

No, we have not benchmarked much. There is only akka.persistence.PerformanceSpec.

I think the journal implementation, the data store used, and serialization will be the biggest factors.

Looking forward to seeing your results.

/Patrik
 



--

Patrik Nordwall
Typesafe Reactive apps on the JVM
Twitter: @patriknw

Carsten Saathoff

Mar 17, 2014, 3:38:15 AM
to akka...@googlegroups.com
Hi Patrik,


On Sunday, March 16, 2014 at 20:47:24 UTC+1, Patrik Nordwall wrote:

What journal are you using? The bottleneck will be in the IO to the data store. LevelDB is not really an option in a clustered system.

In the small test above I am using LevelDB. In the real system we are going to use an HBase-backed journal (most likely, at least).

The main reason I ended up with this setup is that I wanted to measure the impact on performance of making certain actors persistent. However, after obtaining the first numbers, I wasn't sure how to explain them, which is why I am asking here ;)
 
When using a single processor there is a huge difference between a command sourced Processor and an EventsourcedProcessor. The reason is that the command sourced Processor can take advantage of a dynamic batching optimization which will reduce the number of roundtrips and fsyncs for LevelDB.

In my tests I am using a command sourced processor. I had already read that event sourcing reduces performance further.
 

On my MacBook Pro (2.3 GHz Intel Core i7, SSD):
110201.58 persistent commands per second
10204.87 persistent events per second

How were these numbers obtained? If I can achieve those numbers with a lot of Processors in the system, I am more than happy ;)
 
Are there any official benchmarks available, maybe with code?

No, we have not benchmarked much. There is only akka.persistence.PerformanceSpec.

Will have a look at it, thanks!
 

I think the journal implementation, the data store used, and serialization will be the biggest factors.

Yeah, I thought so. However, as I wrote above, my main concern right now is the large difference in numbers between a single Processor and many. 

Thanks again

Carsten 

Patrik Nordwall

Mar 17, 2014, 4:06:28 AM
to akka...@googlegroups.com
On Mon, Mar 17, 2014 at 8:38 AM, Carsten Saathoff <car...@kreuzverweis.com> wrote:
How were these numbers obtained? If I can achieve those numbers with a lot of Processors in the system, I am more than happy ;)

with the PerformanceSpec
Run with:
sbt -Dakka.persistence.performance.cycles.load=200000 -Dakka.persistence.performance.cycles.warmup=10000 "project akka-persistence-experimental" "test-only akka.persistence.PerformanceSpec"

 
 
I think the journal implementation, the data store used, and serialization will be the biggest factors.

Yeah, I thought so. However, as I wrote above, my main concern right now is the large difference in numbers between a single Processor and many. 

I think that is because of the batching that can be utilised fully by a single command sourced Processor. Using one Processor holding data for thousands of entities will introduce other scalability problems compared to many small entities, that can more easily be sharded, passivated, and so on.

I see no reason why a journal implementation could not batch operations, in which case it should not matter whether you use many or few processors. Note this line in the HBase journal doc: "Even though in the code it looks like it issues one Put at a time, this is not the case, as writes are buffered and then batch written thanks to AsyncBase."
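
To illustrate the idea of batching on the journal side, here is a minimal sketch of a buffer that collects writes from any number of processors and hands them to the store in batches. It is not taken from the HBase journal or from Akka; `BatchingJournal`, `write_batch` and `max_batch` are hypothetical names, and the store is replaced by a list that records batch sizes.

```python
from typing import Callable, List, Tuple

Entry = Tuple[str, bytes]  # (processor id, serialized message)


class BatchingJournal:
    """Toy journal that buffers writes from many processors
    and passes them to the store in batches."""

    def __init__(self, write_batch: Callable[[List[Entry]], None], max_batch: int) -> None:
        self._write_batch = write_batch  # hypothetical store callback
        self._max_batch = max_batch
        self._buffer: List[Entry] = []

    def append(self, processor_id: str, payload: bytes) -> None:
        """Buffer one write and flush once the batch is full."""
        self._buffer.append((processor_id, payload))
        if len(self._buffer) >= self._max_batch:
            self.flush()

    def flush(self) -> None:
        """Hand the buffered entries to the store as a single batch."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._write_batch(batch)


store_calls: List[int] = []  # size of each batch handed to the fake store
journal = BatchingJournal(lambda batch: store_calls.append(len(batch)), max_batch=50)

# 1000 different processors, one message each.
for i in range(1000):
    journal.append(f"processor-{i}", b"event")
journal.flush()  # write out anything left in the buffer

print(f"{sum(store_calls)} messages written in {len(store_calls)} store calls")
```

In this sketch the number of store calls depends only on the total message count and the batch size, not on how many processors produced the messages. A real journal would presumably also flush on a timer so that a partially filled batch is not held back indefinitely.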

/Patrik
 


Carsten Saathoff

Mar 17, 2014, 8:45:39 AM
to akka...@googlegroups.com
Hi,


On Monday, March 17, 2014 at 09:06:28 UTC+1, Patrik Nordwall wrote:

with the PerformanceSpec
Run with:
sbt -Dakka.persistence.performance.cycles.load=200000 -Dakka.persistence.performance.cycles.warmup=10000 "project akka-persistence-experimental" "test-only akka.persistence.PerformanceSpec"

I ran those tests as well.
Output is in the following gist:


The eventsourcing and persistent channel tests don't even terminate successfully. Only the command sourcing test works partially, but the numbers are far worse than yours. I ran those tests on a desktop PC with 16 GB RAM (sbt gets 4 GB), SATA hard disks (RAID 1) and a Core i5 processor.

I am doing a lot of data importing with HBase on this machine, and my feeling is that, given the hardware, a higher throughput should be possible. So I am really starting to wonder whether something else is wrong on my machine.

Do you have any comparative numbers for systems without an SSD?

best

Carsten

Patrik Nordwall

Mar 17, 2014, 9:42:39 AM
to akka...@googlegroups.com
The only thing I have is from our build servers.
48 core AMD Opteron 6172 2.1 GHz, spinning disk
-Dakka.persistence.performance.cycles.load=200000 -Dakka.persistence.performance.cycles.warmup=10000
37709.13 persistent commands per second
2209.10 persistent events per second

Note that sbt forks the test and the heap is therefore default size (256MB in your case). Not sure if that matters.

/Patrik
 


Carsten Saathoff

Mar 17, 2014, 9:55:48 AM
to akka...@googlegroups.com
Using a (custom) HBase journal, I am able to persist 10k commands per second, both using a single processor (slightly above 10k) and using multiple processors (slightly below 10k). In that respect, HBase seems to scale better. The journal is not optimized, so probably there is potential for more.

Carsten Saathoff

Mar 17, 2014, 9:59:45 AM
to akka...@googlegroups.com
Hi,


On Monday, March 17, 2014 at 14:42:39 UTC+1, Patrik Nordwall wrote:



The only thing I have is from our build servers.
48 core AMD Opteron 6172 2.1 GHz, spinning disk
-Dakka.persistence.performance.cycles.load=200000 -Dakka.persistence.performance.cycles.warmup=10000
37709.13 persistent commands per second
2209.10 persistent events per second

Note that sbt forks the test and the heap is therefore default size (256MB in your case). Not sure if that matters.

OK, so that's also slower than your earlier benchmarks. I don't think the heap is an issue here. All tests I performed locally on my machine give consistent results, in that LevelDB stays at around 3k commands/second. I am a bit surprised, but maybe that's just the way it is. At least for now I have no idea what could be wrong.

Thanks so far

Carsten

Patrik Nordwall

Mar 17, 2014, 10:05:37 AM
to akka...@googlegroups.com
On Mon, Mar 17, 2014 at 2:55 PM, Carsten Saathoff <car...@kreuzverweis.com> wrote:
Using a (custom) HBase journal, I am able to persist 10k commands per second, both using a single processor (slightly above 10k) and using multiple processors (slightly below 10k). In that respect, HBase seems to scale better. The journal is not optimized, so probably there is potential for more.

Thanks for the update.
/Patrik
 


Daniel Wang

Mar 17, 2014, 11:53:02 AM
to akka...@googlegroups.com, akka...@googlegroups.com
Thanks for sharing. 

Pavel Zalunin

Mar 17, 2014, 7:36:17 PM
to akka...@googlegroups.com
Hi,

We are going to use akka-persistence for our application, and we run on EC2. I tried running PerformanceSpec on an m1.medium instance; it gives ~10k persistent commands per second (although the test actually fails).

I also tried running this spec on a basic DigitalOcean instance (512 MB RAM, SSD); it gives ~20k persistent commands per second, so it looks like disk IO is the bottleneck here.

https://gist.github.com/whiter4bbit/9610488 - output for ec2 m1.medium instance.

Pavel.



