Simple parallel for loop example


Lars Ruthotto

Nov 6, 2013, 11:08:38 PM
to julia...@googlegroups.com
I am relatively new to Julia and have been doing some simple experiments. So far, I am very impressed by its nice, intuitive syntax and its performance. Good job!

However, I have a simple question about parallel for loops that the manual could not answer for me. Say I am interested in parallelizing this code:

a = zeros(100000)
for i=1:100000
  a[i] = i
end

The manual says (and I verified) that

a = zeros(100000)
@parallel for i=1:100000
  a[i] = i
end

does not give the correct result. Unfortunately, it does not say (or I couldn't find) how this can be done in Julia. Does anyone have an idea?

Thanks!
Lars

Stefan Karpinski

Nov 7, 2013, 5:20:47 PM
to Julia Users
Julia's parallelism is distributed, so you are trying to write to unshared memory from multiple processes, which won't be very effective. You could make `a` into a distributed array so that each process writes its own part of `a`, but I'm not sure that would actually give you any kind of speedup. There is currently no user-level interface to shared-memory parallelism.
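
One pattern that does work without shared writes is the reduction form of `@parallel`, which folds each worker's partial results together with a given operator. A minimal sketch, using the 0.2-era syntax:

```julia
# Reduction form: each worker computes a partial sum over its chunk of the
# range, and the partial results are combined with (+) on the caller.
s = @parallel (+) for i = 1:100000
    i          # the value of each iteration is folded into the sum
end
# s == sum(1:100000) == 5000050000
```

This avoids the problem above because no process ever writes into another's memory; only the per-worker partial results travel between processes.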

Tim Holy

Nov 7, 2013, 6:25:50 PM
to julia...@googlegroups.com
On Thursday, November 07, 2013 02:20:47 PM Stefan Karpinski wrote:
> There is no user-level interface to shared-memory parallelism
> currently.

True, but there's the PTools package:
https://github.com/amitmurthy/PTools.jl
and you're welcome to grab the code here:
https://github.com/JuliaLang/julia/pull/4580

--Tim

Billou Bielour

Nov 8, 2013, 4:19:58 AM
to julia...@googlegroups.com
I've been using the pmap example from the documentation:

function pmap(f, lst)
    np = nprocs()  # determine the number of processes available
    n = length(lst)
    results = cell(n)
    i = 1
    # function to produce the next work item from the queue.
    # in this case it's just an index.
    nextidx() = (idx=i; i+=1; idx)
    @sync begin
        for p=1:np
            if p != myid() || np == 1
                @async begin
                    while true
                        idx = nextidx()
                        if idx > n
                            break
                        end
                        results[idx] = remotecall_fetch(p, f, lst[idx])
                    end
                end
            end
        end
    end
    results
end

I have a quite expansive function f and I just want to run it on each processor on my local machine. It works well with 3 process, but any additional ones crash without explicit error message (just "connection lost"). I'm not sure why... one reason may be that the worker run out of memory, as my function f depends on a parameter vector and some large data matrices. Any idea ?
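
One thing that can help with memory pressure is loading the large matrices once on each worker with `@everywhere`, instead of capturing them in a closure that `pmap` ships to a worker with every call. A sketch, where `load_data`, `fit`, and `params` are hypothetical stand-ins for your actual code:

```julia
addprocs(3)
# Hypothetical names: load_data() and fit() stand in for your own code.
@everywhere data = load_data()           # each worker builds its copy once
@everywhere f(theta) = fit(theta, data)  # f sees only worker-local data
results = pmap(f, params)                # only the small theta values travel
```

This keeps the per-call messages small; whether it fixes the crashes depends on whether the closure really was the source of the memory blowup.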

Alan Edelman

Nov 8, 2013, 8:52:56 AM
to julia...@googlegroups.com
It would be good if the documentation made clear what is happening. Here's a good college try at such an understanding.

What you see is that every process has its own local `a`, and in some round-robin fashion each local `a[i]` gets its value (which is presumably wiped out at the end of the call).

Warning: printing from other processes is still flaky. If you run this several times you will get varied output, but the run I copied is the clearest:


In [1]:
addprocs(4)
Out[1]:
4-element Array{Any,1}:
 2
 3
 4
 5
In [24]:
 
@everywhere a = [1 2 3 4]
@parallel  for i=1:4
 a[i]=i*1000
 print(a)
end
 
 
	From worker 3:	1000	2	3	4
	From worker 4:	1	2000	3	4
	From worker 5:	1	2	3000	4
	From worker 2:	1	2	3	4000



Lars Ruthotto

Nov 8, 2013, 2:21:57 PM
to julia...@googlegroups.com
Thank you for your answers. I now understand better what actually happens in the example. As I am targeting a shared-memory machine, I will have a closer look at PTools later and wait for shared-memory support in future versions.

After watching this nice tutorial, I found a simple way to parallelize the above expression using distributed arrays:

a = fetch( @parallel [i for i=1:100000] )

Of course there is no big performance gain for this simple example. However, when computing some scalar functions it is already useful on my machine (MacBook Pro, nprocs() = 4):

----- parallel -------
tic(); a = fetch( @parallel [sin(i)+cos(i) for i=1:100000] ); toc();
elapsed time: 0.010863766 seconds

----- serial -------
tic(); a =  [sin(i)+cos(i) for i=1:100000] ; toc();
elapsed time: 0.024743432 seconds

Jiahao Chen

Nov 12, 2013, 2:47:47 PM
to julia...@googlegroups.com
> tic(); a = fetch( @parallel [sin(i)+cos(i) for i=1:100000] ); toc();

In this example, it looks like the fetch() doesn't do anything. @parallel [...] creates a DArray, and fetch(DArray) returns the same DArray.
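
If the goal is a plain local Array, the conversion can be made explicit instead. A sketch, assuming the 0.2-era DArray API, where `convert` gathers the distributed pieces back to the calling process:

```julia
d = @parallel [sin(i) + cos(i) for i = 1:100000]  # builds a DArray
a = convert(Array{Float64,1}, d)                   # gather into a local Array
```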

Joachim Dahl

Aug 18, 2014, 6:36:57 AM
to julia...@googlegroups.com
I came across this post wondering about the same thing. After reading the current documentation, it is not clear to me whether parallelizing such a loop using shared memory is easily achieved in Julia 0.3, or if the same difficulty remains.
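
Julia 0.3 does add `SharedArray`, which gives the processes on one machine a common backing store, so the loop from the top of the thread can be written against it. A minimal sketch (note that `@parallel` returns immediately, so `@sync` is needed before reading the result):

```julia
addprocs(3)
a = SharedArray(Float64, 100000)   # one block of memory, visible to all local workers
@sync @parallel for i = 1:100000
    a[i] = i                       # each worker fills its own chunk of the range
end
```

This only works for workers on the same machine; remote workers still need distributed arrays or message passing.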

Bradley Setzler

Aug 18, 2014, 10:32:17 AM
to julia...@googlegroups.com
I found that the easiest way was to use two files: one file contains the function to be run in parallel, and the other uses require() to load the function on all workers and pmap to call it.
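
A minimal sketch of that two-file layout (file and function names are hypothetical), assuming the old-style `require`, which loads a file on every process:

```julia
# myfunc.jl -- the function to be run in parallel
myfunc(x) = x^2

# driver.jl -- run this file as the entry point
addprocs(3)
require("myfunc.jl")          # loads myfunc on the master and all workers
results = pmap(myfunc, 1:10)  # dispatches the calls across the workers
```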


Best,
Bradley

Alex

Sep 9, 2014, 3:42:02 PM
to julia...@googlegroups.com
Bradley, 

That's an awesome tutorial. Thanks for putting that together. 

Lars Ruthotto

Sep 10, 2014, 9:21:54 AM
to julia...@googlegroups.com
Thanks, Bradley. I really like your example, and in fact I have played with pmap already. I think it is a great tool for getting into distributed computing, since (as far as I know) pmap sends the different input values to different workers and communicates back the results.

In some cases shared-memory access might be more feasible (such as in the example I posted above). Does anybody know how to do that in parallel?