parallel processing question

111 views
Skip to first unread message

Russ P.

unread,
Sep 15, 2016, 8:07:44 PM9/15/16
to scala-user
Suppose I have an list of candidates for solving a problem, where the list is ordered according to some desirability metric. I want to go down the list, check each candidate, and take the first one that succeeds in solving the problem. The list could be fairly long (over 100), and each check for success could take several seconds. The entire process is repeated many times, and the entire running time could be hours. Also, it will eventually have to run in real time, so performance is important.

If the first successful candidate is early in the list, then I am better off just going sequentially through the list and not parallelizing. However, if the first successful candidate is late in the list, I am better off parallelizing. I am wondering if it is possible to get the best of both worlds. Is it possible to run it in parallel but have the whole process stop as soon as the best solution is found? So let's say that the check for candidate number 5 completes and is successful. If candidates 1 through 4 are on record as having failed, then candidate 5 should be selected and all other checks stopped.

This must be a common problem. Is there a standard way of dealing with it? Thanks.

Vlad Patryshev

unread,
Sep 15, 2016, 8:52:55 PM9/15/16
to Russ P., scala-user
This sounds to me pretty similar to the problem of finding top N.
Why not launch the first one that is acceptable, then the second, etc; then, once you get an N+1-th that is better than all the ones before, minus the time(s) spent, you can decide whether to kick out the 'thread' with the first one and launch the new one? 

If I understood you correctly.

Why 'thread'? Because, well, using bare threads in business logic is an abomination (I think). You can have responsive actors instead.

Thanks,
-Vlad

--
You received this message because you are subscribed to the Google Groups "scala-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-user+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Russ P.

unread,
Sep 15, 2016, 9:33:21 PM9/15/16
to scala-user, russ.p...@gmail.com
I don't think you understood the problem. The problem *is* to find the "first one that is acceptable."

Perhaps a bit more context will help here. I'm trying to find an aircraft flight trajectory that is free of conflict with all other (previously assigned) trajectories. I generate a list of candidates by varying a set of parameters, then I order the list by flight time to the end point. The problem is to find the first trajectory in the list that is free of conflict with all other trajectories. It's a lot of computation, and I need to do it efficiently.

Thanks,
-Vlad

To unsubscribe from this group and stop receiving emails from it, send an email to scala-user+...@googlegroups.com.

Matthew Pocock

unread,
Sep 16, 2016, 10:12:48 AM9/16/16
to Russ P., scala-user
Hi Russ,

Depending on how much one flight trajectory shares with another, it may be possible to speed this up by testing the decomposed flight trajectory components for clashes. That way, if a component of the trajectory is shared in a lot of solutions, you get a big speed-up. But assuming this isn't the case, then:

Choose some number N for your parallelism.
Take the top N solutions. Process them in parallel.
As each solution check terminates, see if it is clash free. If so, let the current block of N complete and take the lowest numbered winner.
If the checked solution has a clash, use the same thread/actor/resource to process the next unprocessed solution.
If you run out of solutions to check, stop.

Matthew

To unsubscribe from this group and stop receiving emails from it, send an email to scala-user+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
Dr Matthew Pocock
Turing ate my hamster LTD

Integrative Bioinformatics Group, School of Computing Science, Newcastle University

skype: matthew.pocock
tel: (0191) 2566550

Oliver Ruebenacker

unread,
Sep 16, 2016, 10:43:16 AM9/16/16
to Russ P., scala-user

     Hello,

  A typical approach is to have N dedicated threads. Each thread picks the next available from the list, checks it, and if it is not a solution, picks the next. If  solution is found, it is reported and then, all threads working on less desirable candidates stop immediately, and threads working on more desirable candidates finish their candidate and then stop.

  It is not hard to do this using Threads directly. Make sure access to the list is synchronized. There are also many libraries you can use, for example Akka, and some streaming libraries.

  The size of N depends on your resources and costs. If you have a number of dedicated processor cores, you can minimize time by choosing as many threads as cores. However, if the number of threads approaches the number of candidates, it becomes wasteful. Also, if one candidate takes a much longer than others, and there are too many threads, it can happen that all other candidates have finished and we are still waiting for the long one. Experiment with different N, plot the time it takes against N and think how much the gain in time is worth.

     Best, Oliver


--
You received this message because you are subscribed to the Google Groups "scala-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-user+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Oliver Ruebenacker
Senior Software Engineer, Diabetes Portal, Broad Institute

Russ P.

unread,
Sep 16, 2016, 3:11:06 PM9/16/16
to scala-user
Thanks to everyone for the helpful suggestions. I will try them later today when I get a chance.

Russ P.

unread,
Sep 18, 2016, 5:45:33 PM9/18/16
to scala-user, russ.p...@gmail.com
I took Matthew's suggestion of doing N at a time in parallel. It seems to work well as far as I can tell. As for selecting N, I figure I should use the number of cores or slightly less (since some cores could be busy with web browsing and such). My laptop has 8 cores, and my desktop machine has 32. So I googled and found the following method:

  val ncores = Runtime.getRuntime().availableProcessors()

I then subtract 2 and use that number for the parallelism factor. The "grouped" method of Vector made it straightforward.

Thanks again for the suggestions.

PUB Razvan Cojocaru

unread,
Sep 26, 2016, 4:49:38 AM9/26/16
to Russ P., scala-user
If you leave it up to the default parallel collection library, I think it should do all that for you, including N selection:

Array(1,2,3).par.find(BigCalculation(_))
Reply all
Reply to author
Forward
0 new messages