basic parallel processing question

62 views

Russ Paielli

Sep 30, 2015, 6:11:28 PM
to scala-user
I finally decided to try my hand at some basic parallel processing. However, I am a bit confused.

Suppose I have something like this:

var candidates = List[Maneuver]() // empty list

for (par1 <- parameters1) {
  ...
  for (par2 <- parameters2) {
    ...
    val candidate = bigCalc(par1, par2, ...)

    if (candidate.successful) candidates :+= candidate
  }
}

candidates.sortBy(_.cost)

As you can see, I am building a list of candidate solutions for a problem (air traffic conflict resolution). The calculation of each candidate requires significant computation. I would like to parallelize this by simply adding ".par" to parameters1 and parameters2. However, I don't think that would be wise, given that I am appending to a local variable to store the list of candidates. What is the simplest way to do this right? Will a for/yield work better? Thanks.
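One side-effect-free shape for the loop above (with `Maneuver` and `bigCalc` replaced by stand-ins so the sketch runs; the real ones come from the original problem) would be:

```scala
// Stand-ins for the original Maneuver and bigCalc, just so the sketch runs.
case class Maneuver(cost: Double, successful: Boolean)
def bigCalc(p1: Int, p2: Int): Maneuver =
  Maneuver(cost = p1 * 10 + p2, successful = (p1 + p2) % 2 == 0)

val parameters1 = 1 to 3
val parameters2 = 1 to 3

// Each candidate is produced by yield instead of appended to a shared var,
// so there is no mutable state for parallel tasks to race on.
val candidates = (for {
  p1 <- parameters1
  p2 <- parameters2
  candidate = bigCalc(p1, p2)
  if candidate.successful
} yield candidate).sortBy(_.cost)
```

Because nothing outside the comprehension is mutated, this version is a candidate for parallelizing with `.par` on the parameter collections.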

--

Kevin Wright

Sep 30, 2015, 6:26:18 PM
to Russ Paielli, scala-user
Assuming the processing of each parameter is isolated, I use this pattern a lot:

def slowAsyncFunc(param: String): Future[Int] = ...
val params: Seq[String] = ...
val results: Future[Seq[Int]] = Future.traverse(params)(slowAsyncFunc)
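A self-contained version of that sketch (the slow function here is a stand-in that just measures string length):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Stand-in for the slow computation (assumption for illustration).
def slowAsyncFunc(param: String): Future[Int] = Future(param.length)

val params: Seq[String] = Seq("one", "two", "three")

// Future.traverse starts slowAsyncFunc for every element and gathers the
// results into one Future, preserving the order of params.
val results: Future[Seq[Int]] = Future.traverse(params)(slowAsyncFunc)

val lengths = Await.result(results, 5.seconds)
```

Awaiting at the end is only for demonstration; in real code you would normally keep composing on the `Future`.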







--
Kevin Wright

"My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger" ~ Dijkstra

Oliver Ruebenacker

Sep 30, 2015, 6:34:00 PM
to Russ P., scala-user


     Hello,

  I think you want to use fold methods.

     Best, Oliver
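Reading that suggestion concretely (a sketch, not from the original post): a fold with an associative operation and a neutral element has no shared accumulator variable, which is what makes it parallel-friendly:

```scala
val costs = (1 to 100).toList

// fold with an associative operation and a neutral element is safe to
// parallelize: chunks can be folded independently and then combined.
val total = costs.fold(0)(_ + _)

// The same call works on a parallel collection (built into Scala 2.12,
// or via the scala-parallel-collections module in 2.13+):
// val totalPar = costs.par.fold(0)(_ + _)
```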

--

Rex Kerr

Sep 30, 2015, 6:46:39 PM
to Russ Paielli, scala-user
I'd probably use Futures instead.  But you could

(for (p1 <- pars1) ... yield (p1, p2, p3, ..., pN)).par.flatMap{ case (p1,...,pN) =>
  val candidate = bigCalc(p1, ..., pN)
  if (candidate.successful) candidate :: Nil
  else Nil
}

which would do approximately the same thing.  That is, given that all the work is in the big calculation, you should just generate the parameter set serially in advance, and then parallelize the computation at the end.
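Filled in with stand-in types so it runs (the names `Maneuver` and `bigCalc` are assumptions matching the original question, not Rex's code):

```scala
// Stand-in types so the sketch runs.
case class Maneuver(cost: Double, successful: Boolean)
def bigCalc(p1: Int, p2: Int): Maneuver =
  Maneuver(p1 * 10 + p2, (p1 + p2) % 2 == 0)

val pars1 = 1 to 3
val pars2 = 1 to 3

// Generate the parameter set serially, up front...
val paramPairs = for (p1 <- pars1; p2 <- pars2) yield (p1, p2)

// ...then run the expensive part over it. Inserting .par before flatMap
// (paramPairs.par.flatMap { ... }) parallelizes exactly this step.
val candidates = paramPairs.flatMap { case (p1, p2) =>
  val candidate = bigCalc(p1, p2)
  if (candidate.successful) candidate :: Nil else Nil
}
```

`.par` is left in a comment here because it is built into Scala 2.12 but needs the scala-parallel-collections module in 2.13+.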

  --Rex


--

Kevin Wright

Sep 30, 2015, 6:48:29 PM
to Russ Paielli, scala-user
Expanding on that a bit to match your example, using Future.sequence this time:


val params1: Seq[String] = ...
val params2: Seq[String] = ...

def findCandidate(p1: String, p2: String): Future[Maneuver] = ...

val candidates: Future[Seq[Maneuver]] = Future.sequence{
  for {
    p1 <- params1
    p2 <- params2
  } yield findCandidate(p1,p2)
}

val successfulCandidates: Future[Seq[Maneuver]] = candidates map { _.filter(_.successful) }
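The same shape, filled in end to end (all names are stand-ins; awaiting is only for demonstration):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Stand-ins for illustration; the real types come from the original problem.
case class Maneuver(cost: Double, successful: Boolean)
def findCandidate(p1: Int, p2: Int): Future[Maneuver] =
  Future(Maneuver(p1 + p2, (p1 + p2) % 2 == 0))

val params1 = Seq(1, 2)
val params2 = Seq(1, 2)

// One Future per parameter pair; Future.sequence flips
// Seq[Future[Maneuver]] into Future[Seq[Maneuver]].
val candidates: Future[Seq[Maneuver]] = Future.sequence {
  for { p1 <- params1; p2 <- params2 } yield findCandidate(p1, p2)
}

val successful = Await.result(candidates.map(_.filter(_.successful)), 5.seconds)
```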

Russ Paielli

Sep 30, 2015, 7:23:05 PM
to scala-user
Let me ask a simpler question. Is a for/yield comprehension safe for parallelization? Suppose I have something like this:

for {
  parameter <- parameters.par
  candidate = bigCalc(parameter, ...)
  if candidate.successful
} yield candidate

Note the ".par" (for parallelization) on parameters. Is that guaranteed to be safe here, or could I still have a race condition? Thanks.



Jed Wesley-Smith

Sep 30, 2015, 7:49:56 PM
to Russ Paielli, scala-user
That's a slightly complicated question. In general, using a for comprehension across multiple sub-expressions explicitly chains those expressions/operations, as the results of one are available to the next one:

  for {
    a <- aa
    b <- bb(a)
    c <- cc(b)
  } yield c

This will happen even if the results are not used in subsequent expressions, e.g.:

  for {
    a <- aa
    b <- bb
    c <- cc
  } yield (a, b, c)

The reason is that if we desugar the first example, we get:

  aa.flatMap { a => bb(a).flatMap { b => cc(b).map { c => c } } }

we are explicitly threading the result of the previous computation into the next one via the A => M[B] function passed to flatMap. This is indeed the point, and the power, of monads: to chain computations together!

In the second example above however, we are not necessarily passing anything through, we are merely combining the results once they are done, and this can in theory be done in parallel. In order to do this we need something less specific than the Monad pattern, we need something known as Applicative.

Unfortunately, Scala's for comprehensions don't support applicative composition, only monadic, so if you want to do that then you need to use the unsugared versions.
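One unsugared applicative-style combination for independent Futures, for example, can be written with `zip` (a sketch; note that Futures created before a for comprehension would run independently anyway, since they start as soon as they are created):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Three independent computations; each Future starts when it is created.
val aa = Future(1)
val bb = Future(2)
val cc = Future(3)

// Applicative-style combination without for/yield: zip just waits for all
// three and pairs up the results, with no value threaded between them.
val combined: Future[(Int, Int, Int)] =
  aa.zip(bb).zip(cc).map { case ((a, b), c) => (a, b, c) }

val result = Await.result(combined, 5.seconds)
```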

Despite all this, your question was slightly misleading: you asked whether for/yield is safe for parallelism in a very specific example where you are simply flatMapping over a parallel collection, which contains the parallelism internally. In that case it is absolutely safe and works fine, with the usual caveat that your expressions should be pure and contain no side effects.

cheers,
jed.

--

Russ P.

Oct 1, 2015, 12:11:13 AM
to scala-user, russ.p...@gmail.com, j...@wesleysmith.io
Thanks for the explanation, but I'm not sure I understand what you are saying. Let's take your second example:

  for {
    a <- aa
    b <- bb
    c <- cc
  } yield (a, b, c)

Are you saying that I can't parallelize this as follows:

  for {
    a <- aa.par
    b <- bb.par
    c <- cc.par
  } yield (a, b, c)

If I can't do that, can I do something like this:

  val tups = for {
    a <- aa
    b <- bb
    c <- cc
  } yield (a, b, c)

for ((a, b, c) <- tups.par) ...

That would achieve the same effect, wouldn't it? Or did I miss something?

Jed Wesley-Smith

Oct 1, 2015, 3:05:27 AM
to Russ P., scala-user
I'm afraid I'm a bit confused as to what you want to achieve.

Russ Paielli

Oct 1, 2015, 3:14:21 AM
to scala-user
I want to achieve a computational speedup with parallel processing.

Naftoli Gugenheim

Oct 8, 2015, 12:59:45 AM
to Russ P., scala-user, j...@wesleysmith.io


On Thu, Oct 1, 2015, 12:11 AM Russ P. <russ.p...@gmail.com> wrote:

Thanks for the explanation, but I'm not sure I understand what you are saying. Let's take your second example:

  for {
    a <- aa
    b <- bb
    c <- cc
  } yield (a, b, c)

Are you saying that I can't parallelize this as follows:

  for {
    a <- aa.par
    b <- bb.par
    c <- cc.par
  } yield (a, b, c)

If I can't do that, can I do something like this:

  val tups = for {
    a <- aa
    b <- bb
    c <- cc
  } yield (a, b, c)

I think you can do (aa, bb, cc).zipped
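For reference, a version-neutral sketch of what positional zipping gives (`.zipped` in Scala 2.12, `lazyZip` in 2.13; written with plain `zip` here). Note that it pairs elements by index, which differs from the cartesian product built by the nested for comprehension:

```scala
val aa = Seq(1, 2, 3)
val bb = Seq(10, 20, 30)
val cc = Seq(100, 200, 300)

// zip combines elements positionally: one triple per index.
// This is NOT the cartesian product that the nested for produces.
val tups = aa.zip(bb).zip(cc).map { case ((a, b), c) => (a, b, c) }
```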


Oliver Ruebenacker

Oct 8, 2015, 7:05:45 AM
to Jed Wesley-Smith, Russ Paielli, scala-user

     Hello,

  Sorry, I don't understand why you couldn't do for-comps for parallel collections. Supposedly, parallel collections can do map and flatMap in parallel, that's the whole point, isn't it?

     Best, Oliver
--
Oliver Ruebenacker
Senior Software Engineer, Diabetes Portal, Broad Institute
