Help with parallelism?


Adam DeConinck

Feb 25, 2012, 11:53:59 AM
to juli...@googlegroups.com

Hey all,

I know the dev list isn't necessarily the best place for "I think I'm
doing this wrong" messages, but I don't know if there's a better forum
yet. Feel free to point me there if so. :-)

To try out Julia's parallelism, I wrote a basic parallel
word-count program. I have two working functions: "wordcount(text)",
which produces a HashTable of word counts from a string "text" (map),
and "wcreduce(wcs...)", which produces aggregate counts when passed one
or more of these HashTables (reduce).

The problem arises when I try to run this on multiple cores.
Ultra-simple case: split the text into two chunks, @spawn each to a
worker, fetch them and reduce.

$ julia -p 2

julia> @everywhere load("wordcount.j")
...
julia> ref1 = @spawn wordcount(chunk1)
RemoteRef(1,1,3)

julia> ref2 = @spawn wordcount(chunk2)
RemoteRef(2,1,4)

julia> wc1 = fetch(ref1)
<shows successful wordcount HashTable here>

julia> wc2 = fetch(ref2)

...And then it just hangs. This happens whenever I try to @spawn and
then fetch two parallel processes. I've tried a few silly things
(like doing julia -p 3 and only using 2 cores), but the same thing
always happens. When I finally Ctrl-C out of this, I see the
following error:

julia> wc2 = fetch(ref2)
^Cdeserialization error:

But nothing else.

I realize there's probably a more elegant way to do the
parallelization than this, but the case of doing two @spawns and then
coming back later to fetch() them should work. Am I missing something
obviously wrong in what I'm doing?

(If relevant, my wordcount and wcreduce are in
https://github.com/ajdecon/julia-wordcount.)

Thanks!
Adam
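
For readers following along on a current Julia release, the spawn-then-fetch pattern described above can be sketched roughly as follows. This is not the code from the thread: load, HashTable and the bare @spawn belong to the 2012 prototype, so the sketch assumes the Distributed standard library, @spawnat :any, Dict and mergewith (Julia 1.5 or later) instead. The wordcount body, count_in_parallel and the sample strings are made up for illustration.

```julia
using Distributed

# Assumption: two local worker processes, roughly what `julia -p 2` provides.
nprocs() == 1 && addprocs(2)

# Define the map step on every process so that the workers can run it.
@everywhere function wordcount(text)
    counts = Dict{String,Int}()
    for w in split(lowercase(text))
        counts[w] = get(counts, w, 0) + 1
    end
    return counts
end

# Spawn both chunks first, then fetch both results and combine them.
function count_in_parallel(chunk1, chunk2)
    ref1 = @spawnat :any wordcount(chunk1)
    ref2 = @spawnat :any wordcount(chunk2)
    wc1 = fetch(ref1)
    wc2 = fetch(ref2)
    return mergewith(+, wc1, wc2)   # reduce step: add counts key by key
end

total = count_in_parallel("the quick brown fox", "the lazy dog follows the fox")
```

Both tasks are spawned before either fetch, so the two chunks can be counted concurrently; each fetch blocks only until its own result arrives.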


Viral Shah

Feb 25, 2012, 2:41:26 PM
to juli...@googlegroups.com
Can you file this as an issue so that we don't lose this?

I cloned your repo and tried it out, but I don't even get as far as you.


julia> @everywhere load("wordcount.j")

julia> ref1 = @spawn wordcount(chunk1)
RemoteRef(1,1,3)

julia> exception on 1: in anonymous: chunk1 not defined

-viral

Adam DeConinck

Feb 25, 2012, 2:58:25 PM
to juli...@googlegroups.com


On 02/25/2012 01:41 PM, Viral Shah wrote:
> Can you file this as an issue so that we don't lose this?
>
> I cloned your repo and tried it out, but I don't even get as far as you.
>
>
> julia> @everywhere load("wordcount.j")
>
> julia> ref1 = @spawn wordcount(chunk1)
> RemoteRef(1,1,3)
>
> julia> exception on 1: in anonymous: chunk1 not defined
>

Oops! I skipped a step in my previous email. chunk1 and chunk2 are just
two blocks of text and aren't defined in wordcount.j. Sorry for the
confusion.

To duplicate my issue, try running the testrun.j in my repo: "julia -p 2
testrun.j".

I'll file an issue.

Thanks,
Adam



Viral Shah

Feb 25, 2012, 3:11:10 PM
to juli...@googlegroups.com
Yes, now I get the same behaviour. Just tried running on one processor, and I get this:

julia> load("testrun.j")
Spawning process 1...
Spawning process 2...
Fetching process 1...
Fetching process 2...
Done fetching.
key not found: follow
in ref at /Users/viral/julia/j/table.j:19
in wcreduce at wordcount.j:23
in include at src/boot.j:192
in load at /Users/viral/julia/j/util.j:174
in load at /Users/viral/julia/j/util.j:186
in wcreduce at wordcount.j:28
in include at src/boot.j:192
in load at /Users/viral/julia/j/util.j:174
in load at /Users/viral/julia/j/util.j:186
at testrun.j:24
in include at src/boot.j:192
in load at /Users/viral/julia/j/util.j:174
in load at /Users/viral/julia/j/util.j:186
in load at /Users/viral/julia/j/util.j:197

-viral

Jeff Bezanson

Feb 25, 2012, 3:47:25 PM
to juli...@googlegroups.com
HashTables need custom serialize/deserialize methods written because
of the undefined references in their arrays. Pretty sure this is the
problem. I haven't gotten around to it yet.
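
To illustrate the idea of custom serialize/deserialize for a table whose slot arrays contain undefined references, here is a minimal sketch in current Julia. It is not Julia's actual HashTable or Dict implementation, and it is not the fix that went into the repository; ToyTable, store!, livepairs and roundtrip are hypothetical names. It assumes a toy open-addressing table and sends only the populated key/value pairs, rebuilding the table on the receiving side.

```julia
using Serialization

# Toy table (hypothetical): parallel slot arrays whose unused slots stay undefined.
struct ToyTable
    keys::Vector{Any}
    vals::Vector{Any}
end

ToyTable(n::Integer) = ToyTable(Vector{Any}(undef, n), Vector{Any}(undef, n))

# Insert with linear probing. Assumes the table never fills up completely.
function store!(t::ToyTable, k, v)
    n = length(t.keys)
    i = Int(hash(k) % UInt(n)) + 1
    while isassigned(t.keys, i) && !isequal(t.keys[i], k)
        i = i % n + 1
    end
    t.keys[i] = k
    t.vals[i] = v
    return t
end

# Collect only the slots that actually hold a key, skipping undefined ones.
livepairs(t::ToyTable) =
    [t.keys[i] => t.vals[i] for i in eachindex(t.keys) if isassigned(t.keys, i)]

# Write the slot count plus the live pairs, then rebuild a table from them.
function roundtrip(t::ToyTable)
    io = IOBuffer()
    serialize(io, (length(t.keys), livepairs(t)))
    seekstart(io)
    n, kvs = deserialize(io)
    out = ToyTable(n)
    for (k, v) in kvs
        store!(out, k, v)
    end
    return out
end

t = ToyTable(8)
store!(t, "follow", 2)
store!(t, "the", 5)
t2 = roundtrip(t)
```

The receiving side never sees an undefined slot, because only pairs that exist are written to the stream.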

Stefan Karpinski

Feb 25, 2012, 4:01:58 PM
to juli...@googlegroups.com
Open an issue?

Adam DeConinck

Feb 25, 2012, 4:18:03 PM
to juli...@googlegroups.com

Hi all,

On 02/25/2012 02:47 PM, Jeff Bezanson wrote:
> HashTables need custom serialize/deserialize methods written
> because of the undefined references in their arrays. Pretty sure
> this is the problem. I haven't gotten around to it yet.
>

That makes sense, I think. I'm noticing that when I do
single-processor runs (like Viral) the calls do succeed... Are calls
to the "local" processor succeeding because @spawn (and associated
methods) are smart enough not to bother with serialization? Or is
there something more complex going on?

I've opened an issue for this problem, referencing this thread:
https://github.com/JuliaLang/julia/issues/463

> On Sat, Feb 25, 2012 at 3:11 PM, Viral Shah <vi...@mayin.org>
> wrote:
>> Yes, now I get the same behaviour. Just tried running on one
>> processor, and I get this:
>>
>> julia> load("testrun.j")
>> Spawning process 1...
>> Spawning process 2...
>> Fetching process 1...
>> Fetching process 2...
>> Done fetching.
>> key not found: follow
>> in ref at /Users/viral/julia/j/table.j:19
>> in wcreduce at wordcount.j:23

Odd... I explicitly catch KeyErrors in wcreduce at that line, as
that's how I check whether a new key needs to be added or I can
accumulate the word count.

    try
        counts[k]=counts[k]+v # Line 23
    catch ex
        if typeof(ex)==KeyError
            counts[k]=v
        else
            throw(ex)
        end
    end


I wonder why the error made it out of the function rather than being
handled here... For what it's worth, when I run this myself with one
processor, the function succeeds. I'll try duplicating this; what
commit are you running on?

Thanks for all the help tracking this down, guys!

Cheers,
Adam


Viral Shah

Feb 25, 2012, 11:45:10 PM
to juli...@googlegroups.com
Now the second call does succeed with 2 processors after Jeff's fixes, but the error still remains. I am now on commit 10aabddc3834223568a87721149d05765e7e9997.

~/julia/julia -p 2 testrun.j

Spawning process 1...
Spawning process 2...
Fetching process 1...
Fetching process 2...
Done fetching.
key not found: follow
in ref at /Users/viral/julia/j/table.j:19
in wcreduce at string:23
in include at src/boot.j:192
in process_options at /Users/viral/julia/j/client.j:163
in _start at /Users/viral/julia/j/client.j:201
in wcreduce at string:28
in include at src/boot.j:192
in process_options at /Users/viral/julia/j/client.j:163
in _start at /Users/viral/julia/j/client.j:201
at testrun.j:24
in include at src/boot.j:192
in process_options at /Users/viral/julia/j/client.j:163
in _start at /Users/viral/julia/j/client.j:201

-viral

Jeff Bezanson

Feb 26, 2012, 12:04:56 AM
to juli...@googlegroups.com
Sorry, I screwed something up. Fixed.

Incidentally, much better than the try/catch would be

counts[w] = get(counts, w, 0)+1
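
The same get-with-default idiom carries over to the reduce step, where the try/catch around line 23 of wordcount.j could be replaced. The following is a minimal sketch in current Julia, assuming Dict in place of the 2012 HashTable; this wcreduce is a guess at the shape of the function, not the code in the julia-wordcount repository, and the sample tables are made up.

```julia
# Reduce step: fold any number of word-count tables into one.
function wcreduce(wcs...)
    counts = Dict{String,Int}()
    for wc in wcs, (k, v) in wc
        # get returns 0 for a word not seen yet, so no KeyError can occur.
        counts[k] = get(counts, k, 0) + v
    end
    return counts
end

merged = wcreduce(Dict("follow" => 1, "the" => 2), Dict("the" => 3))
```

Because a missing key is handled by the default value, no exception is raised on the normal path and nothing needs to be caught.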

Adam DeConinck

Feb 26, 2012, 12:29:36 AM
to juli...@googlegroups.com

On 02/25/2012 11:04 PM, Jeff Bezanson wrote:
> Sorry, I screwed something up. Fixed.

I've confirmed that the fix works for me.

>
> Incidentally, much better than the try/catch would be
>
> counts[w] = get(counts, w, 0)+1
>

This works much better as well! I was not aware of the default value
for this function. The try/catch did feel a little hacky to me, but I
was distracted by the fetch() issue. :-)

Thanks!
Adam



Viral Shah

Feb 26, 2012, 12:29:52 AM
to juli...@googlegroups.com
Sweet, works now! This is a nice example also. We should develop it to its fullest extent (more processors, adding/removing processors on the fly, maybe even some crawling when httpclient is ready, indexing the dataset, etc.) and include it in the examples.

-viral
