Re: [akka-user] Actor throttling strategies


Ryan LeCompte

Feb 11, 2013, 12:43:55 AM
to akka...@googlegroups.com
Instead of the producer actively pushing work to the consumers, why not have the consumers ask the producer for work? Check out this article from Derek Wyatt for inspiration:


Ryan




On Sun, Feb 10, 2013 at 8:40 PM, Michael Pollmeier <michael....@googlemail.com> wrote:
We just ran into a situation that feels like one many projects will have, so I wanted to discuss some options to solve it.
We've got an Actor that produces work very quickly (i.e. it just pulls records from the database) and sends that work as individual messages to some (routed) Actors that do some long-running processing on them.
Our problem is that the producer is creating work much faster than all the worker actors are able to work through it. Those messages will eventually eat up our heap space, so I guess we need to throttle the producer.

These are some solutions we thought about; however, all of them seem to have caveats. I'd like to know if there's a better one out there, and which one you would choose:
1) have thousands of worker actors. That doesn't work for us because they depend on a database, which is our actual bottleneck.
2) use a bounded mailbox size for the worker actors. That would then block the producing Actor when it sends even more messages to the workers. Sounds like what we need, however this doesn't work with remote Actors: it doesn't block in that case but sends the message to the Deadletter Queue.
3) use Derek Wyatt's PressureQueue (https://github.com/derekwyatt/PressureQueue-Concept/). It's a custom mailbox for the worker actors that delays the submission of new messages based on the mailbox size. Sounds like a really nice solution, I'm just a bit unsure whether that's `the Akka way`, also because I don't see anyone actually using it and there have been no commits for 10 months.
4) the producer could pull only the IDs from the database and we hope that those fit into memory - i.e. we'd have millions of IDs as messages floating around. The workers then fetch the complete record later on and slowly get the job done. This obviously only works if all the IDs fit into memory.
5) use the TimerBasedThrottler (http://letitcrash.com/post/28901663062/throttling-messages-in-akka-2), which makes our producer only create X amount of work per time unit. The problem here is how do I determine X and the time unit? It's only ever going to be a rough guess, so I'm either missing out on performance (if my workers could go faster) or potentially running out of memory (if my workers can't catch up, e.g. because of other load on the system).
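
To make the blocking behaviour described in option 2 concrete, here is a minimal sketch that uses Python's standard `queue.Queue` as a stand-in for a bounded mailbox. It is not Akka code and says nothing about the remote case; `BOUND`, `TOTAL`, `produce` and `consume` are names invented for this illustration.

```python
import queue
import threading
import time

BOUND = 5    # hypothetical mailbox capacity
TOTAL = 50   # number of work items pushed through

mailbox = queue.Queue(maxsize=BOUND)
max_seen = 0     # largest backlog the consumer ever observed
processed = []

def produce():
    for i in range(TOTAL):
        mailbox.put(i)    # blocks while the queue is full
    mailbox.put(None)     # sentinel: no more work

def consume():
    global max_seen
    while True:
        max_seen = max(max_seen, mailbox.qsize())
        item = mailbox.get()
        if item is None:
            break
        time.sleep(0.001)  # simulate slow processing
        processed.append(item)

producer = threading.Thread(target=produce)
consumer = threading.Thread(target=consume)
producer.start()
consumer.start()
producer.join()
consumer.join()

print(len(processed), "items processed, backlog never above", BOUND)
```

The backlog is capped by the queue itself, so memory use no longer depends on how fast the producer is. The cost is that the producer thread is stuck inside `put` while the queue is full.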

Any thoughts?
Michael

--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://akka.io/faq/
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+...@googlegroups.com.
To post to this group, send email to akka...@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Evan chan

Feb 11, 2013, 12:50:29 AM
to akka...@googlegroups.com, akka...@googlegroups.com
Another way to go is to have an acknowledgement system. When each batch of data makes it through to the end consumer, the end consumer sends a message back to the producer, acknowledging that the batch is complete. This is really easy with futures. The producer keeps track of the number of outstanding unacknowledged batches, and stops producing if that number is too high. We are using this pattern.
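
A minimal sketch of that bookkeeping, written as a single-threaded Python simulation rather than with Akka futures. The `Producer` class, the `send` callback and the limit of 3 are assumptions made for illustration, not the code used in production.

```python
from collections import deque

_DONE = object()  # sentinel marking an exhausted source

class Producer:
    """Emits batches only while few enough are unacknowledged."""

    def __init__(self, source, max_outstanding):
        self.source = iter(source)
        self.max_outstanding = max_outstanding
        self.outstanding = 0

    def produce(self, send):
        # Keep sending until the limit is hit or the source runs dry.
        while self.outstanding < self.max_outstanding:
            batch = next(self.source, _DONE)
            if batch is _DONE:
                return
            self.outstanding += 1
            send(batch)

    def ack(self, send):
        # Called when the end consumer reports a finished batch.
        self.outstanding -= 1
        self.produce(send)

in_flight = deque()  # batches sent but not yet completed
peak = 0             # largest number of batches ever in flight
done = []

def send(batch):
    global peak
    in_flight.append(batch)
    peak = max(peak, len(in_flight))

producer = Producer(range(20), max_outstanding=3)
producer.produce(send)
while in_flight:
    done.append(in_flight.popleft())  # consumer finishes one batch
    producer.ack(send)                # the ack flows back to the producer

print(len(done), "batches done, peak in flight:", peak)
```

The producer never blocks here: it simply stops pulling from the source and resumes when an ack arrives, so it stays free to handle other messages in between.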

-Evan
Carry your candle, run to the darkness
Seek out the helpless, deceived and poor
Hold out your candle for all to see it
Take your candle, and go light your world
 

Jeremy Pierre

Feb 11, 2013, 1:35:31 AM
to akka...@googlegroups.com
How are you planning on parallelizing this (presumably you are on some level)?  What datastore are you writing to?  Clearly reading is not your bottleneck so I assume you've got some aggressive caching and your data is fairly denormalized.  Is this the case for your writes as well?  What are the consistency constraints for writes, meaning can you use something more appropriate for a lot of write throughput?

For the producer, does your datastore for reads provide a cursor?  If so, self-throttling as Evan mentioned would be very easy.  What's the likelihood that the data will change while running if you go with your solution #4?

Can you parallelize this over multiple nodes?  If so, you could sort of mix number 4 and Evan's suggestion by pre-segmenting the list of input records, particularly if you know the range of your primary keys (this is the way I'd un-elegantly brute-force it to just get it done).  Even a naive segmentation algorithm with inputs coordinated by some central kick-off actor would work.

Jeremy

Patrik Nordwall

Feb 17, 2013, 3:33:46 PM
to akka...@googlegroups.com
Thanks for sharing Michael.

You might want to watch the registered workers and remove them from the master when they terminate.
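
As a rough sketch of that bookkeeping (plain Python, not Akka): the master keeps a set of registered workers and drops one as soon as it is told the worker is gone. In Akka this would presumably be wired up with `context.watch(worker)` and by handling the resulting `Terminated` message; the `Master` class and its method names below are hypothetical.

```python
class Master:
    def __init__(self):
        self.workers = set()

    def register(self, worker):
        # A worker announces itself; the master starts tracking it.
        self.workers.add(worker)

    def terminated(self, worker):
        # Stand-in for a death-watch notification.
        # discard() is a no-op if the worker was never registered.
        self.workers.discard(worker)

    def notification_targets(self):
        # Only live workers should be told that work is available.
        return sorted(self.workers)

master = Master()
master.register("worker-1")
master.register("worker-2")
master.terminated("worker-1")
master.terminated("worker-9")  # unknown worker, silently ignored

print(master.notification_targets())
```

Without the removal step the master would keep notifying dead workers, and the set of registered workers would only ever grow.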

/Patrik


On Sat, Feb 16, 2013 at 10:16 AM, Michael Pollmeier <mic...@michaelpollmeier.com> wrote:
And here's how we ended up implementing it, including a test:
http://www.michaelpollmeier.com/akka-work-pulling-pattern-to-throttle-work/


On Tuesday, 12 February 2013 16:59:10 UTC+13, Michael Pollmeier wrote:
Thanks for your thoughts, guys!
Another problem I found with the PressureQueue concept is that it blocks the producing actor while queuing the message, which means it can't react to other messages.

We've come up with a simple acknowledgement idea inspired by Derek's article and Evan's hint:
- workers register themselves at the master
- master sends all workers a WorkAvailable message if it has some work
- workers then request work from master which responds with a piece of work
- when the worker is finished with that piece of work they request more work from the master
- in case there's no more work available the master simply doesn't respond
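
The steps above can be sketched as a small single-threaded simulation. This is plain Python with hand-rolled inboxes, not the Akka implementation from the blog post; the class names, the message shapes (`"WorkAvailable"`, `("Work", item)`) and the round-robin driver loop are assumptions made for illustration.

```python
from collections import deque

class Master:
    def __init__(self):
        self.work = deque()
        self.workers = []

    def register(self, worker):
        self.workers.append(worker)

    def add_work(self, items):
        self.work.extend(items)
        if self.work:
            for worker in self.workers:
                worker.deliver("WorkAvailable")

    def request_work(self, worker):
        # If nothing is left, the master simply does not respond.
        if self.work:
            worker.deliver(("Work", self.work.popleft()))

class Worker:
    def __init__(self, master):
        self.master = master
        self.inbox = deque()
        self.max_inbox = 0  # largest inbox size ever observed
        self.done = []
        master.register(self)

    def deliver(self, msg):
        self.inbox.append(msg)
        self.max_inbox = max(self.max_inbox, len(self.inbox))

    def step(self):
        """Handle one message; return False if the inbox was empty."""
        if not self.inbox:
            return False
        msg = self.inbox.popleft()
        if isinstance(msg, tuple):      # ("Work", item)
            self.done.append(msg[1])    # "process" the item
        # After WorkAvailable or a finished item, ask for more work.
        self.master.request_work(self)
        return True

master = Master()
workers = [Worker(master), Worker(master)]
master.add_work(range(6))

# Round-robin until no worker has anything left to handle.
# The list (not a generator) makes every worker step on each pass.
while any([w.step() for w in workers]):
    pass

print([w.done for w in workers])
```

Each worker holds at most one message at a time, so the backlog stays in the master (or, better, in the database) instead of piling up in worker mailboxes.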

Best regards
Michael



--

Patrik Nordwall
Typesafe - The software stack for applications that scale
Twitter: @patriknw

Michael Pollmeier

Feb 19, 2013, 1:43:16 AM
to akka...@googlegroups.com
Yep, good point Patrik. Have updated our code and will update the blog
entry / github shortly.

Cheers
Michael