Parallel Collections fail

46 views
Skip to first unread message

Johannes Rudolph

unread,
Jul 4, 2011, 7:18:28 AM7/4/11
to scala-l...@googlegroups.com
Hi all,

this weekend, I was stumped that abstracting over parallel and
sequential collections with the generic GenSeq etc types is completely
broken. See this example:

val items = (0 to 10000)
val parItems: collection.GenSeq[Int] = if (true) items.par else items.seq
parItems.map {
_ => Thread.currentThread.getName
}

ParVector(Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Threa...

In summary, an operation is never executed in parallel if the static
type of the collection is not a parallel one which is defeating the
purpose of the generic collections construction completely. This is,
because dynamic dispatch - as I would have expected it - is
circumvented by deciding if something should be run in parallel by the
instance of the CanBuildFrom which is chosen statically instead of at
runtime.

I find this a bit upsetting since we had quite a discussion months ago
about the design of the parallel collections and it seems no one
(including me) did actually test that the resulting implementation did
what was promised.

Another problem here is that you can't write

val parItems = if (true) items.par else items.seq

without the type annotation, because type inference fails badly here
(though that seems to be fixed in trunk already).

--
Johannes

-----------------------------------------------
Johannes Rudolph
http://virtual-void.net

Aleksandar Prokopec

unread,
Jul 4, 2011, 11:25:06 AM7/4/11
to scala-language
I believe this is due to the fact that the `map` in `ParIterableLike`
checks whether the builder factory is parallel, instead of
instantiating the builder and checking if the builder is parallel.

I'll fix this.

Cheers,
Alex

On Jul 4, 1:18 pm, Johannes Rudolph <johannes.rudo...@googlemail.com>
wrote:

Aleksandar Prokopec

unread,
Jul 5, 2011, 6:56:26 AM7/5/11
to scala-language
One question is: How would you write a unit test for this?
One possibility I'm thinking of is to compare times with a version
known to be sequential. But since partest runs tests in parallel
already, such a test would fail, since there would not be enough
processors available to speed up execution.

Cheers,
Alex

√iktor Ҡlang

unread,
Jul 5, 2011, 7:04:43 AM7/5/11
to scala-l...@googlegroups.com
On Tue, Jul 5, 2011 at 12:56 PM, Aleksandar Prokopec <aleksanda...@gmail.com> wrote:
One question is: How would you write a unit test for this?
One possibility I'm thinking of is to compare times with a version
known to be sequential. But since partest runs tests in parallel
already, such a test would fail, since there would not be enough
processors available to speed up execution.

Assert on the behavior of the builder you said was wrong?
 

Cheers,
Alex


>
> I find this a bit upsetting since we had quite a discussion months ago
> about the design of the parallel collections and it seems no one
> (including me) did actually test that the resulting implementation did
> what was promised.
>



--
Viktor Klang

Akka Tech Lead
Typesafe - Enterprise-Grade Scala from the Experts

Twitter: @viktorklang

Tony Morris

unread,
Jul 5, 2011, 7:08:26 AM7/5/11
to scala-l...@googlegroups.com

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 05/07/11 20:56, Aleksandar Prokopec wrote:
> One question is: How would you write a unit test for this?

forall s, t in ExecutionStrategy. s(f) == t(f)

- --
Tony Morris
http://tmorris.net/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk4S8KoACgkQmnpgrYe6r629YwCfeeDBjt20cyuY5u6qaqSgHdBL
bCsAnjn6Y/0q3gHgw66wh/Uiac4UPAdV
=Dtk7
-----END PGP SIGNATURE-----

Chris Marshall

unread,
Jul 5, 2011, 7:17:58 AM7/5/11
to scala-l...@googlegroups.com
If the underlying pool were pluggable, you could insert a "sequential" pool in the tests which tracks whether it was used or not:

def beginTest = { pool = new TestPool; Parallel.pool = pool }

def runTest {
  ...
  assert(pool.wasUsed)
}

def endTest = { pool.shutdown() }

Johannes Rudolph

unread,
Jul 5, 2011, 7:24:25 AM7/5/11
to scala-l...@googlegroups.com
Hi Aleksandar,

On Mon, Jul 4, 2011 at 5:25 PM, Aleksandar Prokopec
<aleksanda...@gmail.com> wrote:
> I believe this is due to the fact that the `map` in `ParIterableLike`
> checks whether the builder factory is parallel, instead of
> instantiating the builder and checking if the builder is parallel.
>
> I'll fix this.

Thanks for taking care of this, should I create a ticket anyways?

Aleksandar Prokopec

unread,
Jul 5, 2011, 9:42:25 AM7/5/11
to scala-language
Hi!
Sure, you can do that, just in case I've missed something when I was
making changes.

As for the omitting the type annotation, I've just observed in the
REPL that a structural type is inferred, not GenSeq[Int].

The builder that gets resolved is only visible inside the collection
method.
Plugging a custom pool may be the way to go for tests.

Cheers,
Alex


On Jul 5, 1:24 pm, Johannes Rudolph <johannes.rudo...@googlemail.com>
wrote:
> Hi Aleksandar,
>
> On Mon, Jul 4, 2011 at 5:25 PM, Aleksandar Prokopec
>
Reply all
Reply to author
Forward
0 new messages