Dependent Processing with Millions of Records without Backlogging?


kraythe

May 31, 2016, 12:31:29 PM
to Akka User List
I am seeking advice on some ideas for dealing with dependent processing. The use case is simple, though I have to be a bit coy because of NDAs. I have some ideas myself, but I was hoping some of you with more experience than I have could throw out some suggestions.

I have an entity with millions of instances; call it entity A (the actual type is irrelevant). When an A is updated, entity B (the sorting entity) must also be updated, because B's order depends on the sort fields of all the As it contains. To keep everything in sync and not depend on outside forces, we update all the As in a batch every minute from external systems. So each minute anywhere from 1 to 10 million As could be updated, and we never know where in that range it will fall. Currently we are not using an actor system. We have an endpoint that hits a Play server, which fetches all the data to update the As; we wait for all the As to be updated and then feed the updated values into the Bs to sort and categorize them. We can't split up the sort because the order depends on every other A in the B entity. We can't do away with the sort either.

I want to move this to an actor system. What I can't do is re-sort the list every time I get a single UpdatedA message. If I did, the B would be perpetually sorting and falling behind. I also need to keep all the As in sync in B, so I can't update them and wait to sort them on a one-minute interval. I have to collect a batch of A updates and do the sort all at once. I also can't do the entire process of updating the As from the external events in the B actor, because then the load wouldn't be distributed, which would be prohibitive. Sorting a 10-million-entry list takes about 4 seconds; sorting and updating all 10 million would take over a minute, which is not feasible.

I had thought of possibly using distributed pub-sub and having the system collect A update messages and then periodically send itself a message to update the As in the B and sort them. Another idea (though I have no clue how to do this) would be to collect A updates and then, if we stop getting them for a few seconds, do an update and sort at that time.
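
The two batching ideas above (flush on a periodic self-message, or flush once updates stop arriving for a few seconds) could be sketched roughly as follows. This is plain Java with no Akka dependency, and it is only an illustration: the class name QuietPeriodBatcher, its parameters, and the use of a numeric sort key per A are assumptions, not anything from the original system. Time is passed in explicitly so the logic is easy to test.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not an Akka API: decides when buffered A updates
// should be handed to B for a single update-and-sort pass.
final class QuietPeriodBatcher {
    private final long quietMillis;   // flush once no update has arrived for this long
    private final long maxWaitMillis; // or once the oldest buffered update is this old
    private final Map<Long, Double> pending = new HashMap<>(); // A id -> latest sort key (assumed shape)
    private long firstPendingAt;
    private long lastUpdateAt;

    QuietPeriodBatcher(long quietMillis, long maxWaitMillis) {
        this.quietMillis = quietMillis;
        this.maxWaitMillis = maxWaitMillis;
    }

    // Buffer an update; a newer update for the same A replaces the older one.
    void onUpdate(long aId, double sortKey, long nowMillis) {
        if (pending.isEmpty()) {
            firstPendingAt = nowMillis;
        }
        pending.put(aId, sortKey);
        lastUpdateAt = nowMillis;
    }

    // Called on a periodic tick. Returns the batch to apply and sort,
    // or an empty map if it is not yet time to flush.
    Map<Long, Double> onTick(long nowMillis) {
        if (pending.isEmpty()) {
            return Collections.emptyMap();
        }
        boolean quiet = nowMillis - lastUpdateAt >= quietMillis;
        boolean overdue = nowMillis - firstPendingAt >= maxWaitMillis;
        if (!quiet && !overdue) {
            return Collections.emptyMap();
        }
        Map<Long, Double> batch = new HashMap<>(pending);
        pending.clear();
        return batch;
    }
}
```

In an actor, onUpdate would be called from the handler for the A-updated message and onTick from a tick message that the actor schedules to itself. The maxWaitMillis bound is an added safeguard so that a continuous stream of updates cannot postpone the sort indefinitely.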

Can anyone think of any other way they might suggest architecting a solution to this problem with actors? 

Thanks in advance.

Viktor Klang

May 31, 2016, 1:11:50 PM
to Akka User List

Hi,

What alternate solutions do you have? (not worrying about implementation for now)

--
Cheers,


kraythe

May 31, 2016, 2:52:31 PM
to Akka User List
Sorry, you mean other ideas?

I had considered doing the update of the As inside the Bs, since the events that update the As are far fewer than the As themselves. The problem is that this would make the update time for B so long that it would back up the processing of the Bs. I also debated making some kind of trigger that would fetch the current state of the As at a particular moment in time, but that would require returning millions of objects across the entire cluster, which seems excessive and not scalable. Distributed pub-sub seems to be the right solution. One other idea is to have a dedicated batching actor that collects messages and sends them all to another topic in batches. That way only these batching actors would listen to the "A updated" topic, and they would publish to a different topic for B. Other than that, I don't have other ideas at the moment.

In fact, one question is how I can send the same message to a bunch of actors across the cluster and get the responses back in bulk instead of as 10 million individual messages.

-- Robert

Viktor Klang

Jun 1, 2016, 4:18:45 AM
to Akka User List
The question is why you need to update them all the time. Who's observing their state?
--
Cheers,

kraythe

Jun 1, 2016, 10:44:09 AM
to Akka User List
Users are observing their state via web pages. And no, it's not viable for the business to remove this ability.

Viktor Klang

Jun 1, 2016, 11:31:16 AM
to Akka User List

So if no users are looking, why does it need to be already updated?

--
Cheers,


kraythe

Jun 1, 2016, 2:04:34 PM
to Akka User List
Pardon? I just said the users are observing the state and observing it change in real time.

Roland Kuhn

Jun 1, 2016, 2:56:01 PM
to akka-user
It seems that your initial description is too vague for direct help—that sorting aspect is not really clear, at least to me.

kraythe

Jun 1, 2016, 4:43:37 PM
to Akka User List
Perhaps, though honestly the entire problem scope is in the post, just generically expressed. :) 

I can't be more specific because of NDAs, though. Basically, B's job is to maintain the sort order of a specific collection of As and to re-sort them when As mutate. It also manages other metadata associated with the collection of As. No, you can't avoid the sort; it's part of the core business feature.

At any rate, I am going to go with an internal batching strategy. I can't think of anything else.

-- Robert

Roland Kuhn

Jun 1, 2016, 5:11:08 PM
to akka...@googlegroups.com
Yes, that sounds reasonable. Perhaps a kind of natural batching that accumulates changes as long as the previous sort is being executed?
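
One way to read "natural batching" is as a small state machine: while a sort is in flight, incoming changes are only buffered, and as soon as the sort finishes the buffered changes become the next batch. The plain-Java sketch below illustrates that reading. The class name NaturalBatcher, the method names, and the numeric sort key are all hypothetical, and the code assumes that something else runs the sort and reports back when it is done.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not an Akka API: at most one sort is in flight, and
// changes that arrive while it runs accumulate into the next batch.
final class NaturalBatcher {
    private final Map<Long, Double> pending = new HashMap<>(); // A id -> latest sort key (assumed shape)
    private boolean sorting = false;

    // Buffer an update. Returns a batch to sort now if no sort is running,
    // otherwise returns empty and the update waits for the next round.
    Optional<Map<Long, Double>> onUpdate(long aId, double sortKey) {
        pending.put(aId, sortKey);
        return sorting ? Optional.empty() : Optional.of(startSort());
    }

    // Called when the in-flight sort completes. Returns the next batch if
    // anything accumulated in the meantime, otherwise goes idle.
    Optional<Map<Long, Double>> onSortFinished() {
        sorting = false;
        return pending.isEmpty() ? Optional.empty() : Optional.of(startSort());
    }

    private Map<Long, Double> startSort() {
        Map<Long, Double> batch = new HashMap<>(pending);
        pending.clear();
        sorting = true;
        return batch;
    }
}
```

The batch size then adapts to load on its own: under light traffic each update is sorted almost immediately, and under heavy traffic each sort absorbs everything that arrived during the previous one, so the sorter never falls further behind than one sort duration.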


kraythe

Jun 2, 2016, 1:40:40 PM
to Akka User List
I am collecting the batch updates in a map, replacing old updates if new ones arrive, and then clearing the batch when I process the update-and-sort message. I had debated forking off the sort into a CompletableFuture, but that could lead to nasty complexity. I think I will try this approach. The only thing is that the ranking object gets HAMMERED by users, so I want one per node; I have decided on a replicated actor strategy. There is a risk that one of them could be slightly out of sync if it misses an update message, but that risk is preferable to copying the object or sending all traffic to one node.
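
For readers following along, the map-based batch described above might look roughly like this in plain Java (no Akka dependency). The class name RankingB, the method names, the numeric score per A, and the descending order with ties broken by id are all assumptions made for illustration; the real sort fields are not given in the thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the B side: coalesce A updates in a map, then apply
// them and sort once when the update-and-sort message arrives.
final class RankingB {
    private final Map<Long, Double> scores = new HashMap<>(); // current score per A
    private final Map<Long, Double> batch = new HashMap<>();  // updates since the last sort
    private List<Long> ranked = new ArrayList<>();            // A ids in ranked order

    // Handler for an "A updated" message: a newer update replaces the older one.
    void onUpdatedA(long aId, double score) {
        batch.put(aId, score);
    }

    // Handler for the update-and-sort message: apply the whole batch, clear it,
    // and sort a single time regardless of how many updates were collected.
    void onSortTick() {
        if (batch.isEmpty()) {
            return; // nothing changed, keep the previous ranking
        }
        scores.putAll(batch);
        batch.clear();
        List<Long> next = new ArrayList<>(scores.keySet());
        next.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a)); // assumed: highest score first
            return byScore != 0 ? byScore : Long.compare(a, b);         // assumed: ties broken by id
        });
        ranked = next;
    }

    // Read-only view served to users between sorts.
    List<Long> ranked() {
        return Collections.unmodifiableList(ranked);
    }
}
```

Because readers only ever see the last completed ranking, every A in a batch becomes visible at the same moment, which matches the requirement that the As stay in sync within B.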

Incidentally, along these lines, I need to send the updates to all the As, and I am using distributed pub-sub. I was wondering: if I sent an update via an ActorSelection with a wildcard, would it be copied only once per remote node? I wouldn't want it copied to each destination individually, but rather sent to each remote node once and then delivered to the inboxes of the target actors.