Thank you for your answers. I now understand better what actually happens in the example. As I am targeting a shared memory machine, I will take a close look at Ptools later and wait for shared memory support in future versions.
After watching this nice tutorial, I found a simple way to parallelize the above expression using distributed arrays:
a = fetch( @parallel [i for i=1:100000] )
Of course, there is no big performance gain for this simple example. However, when evaluating a scalar function for each element, it already pays off on my machine (MacBook Pro, nprocs()=4):
----- parallel -------
tic(); a = fetch( @parallel [sin(i)+cos(i) for i=1:100000] ); toc();
elapsed time: 0.010863766 seconds
----- serial -------
tic(); a = [sin(i)+cos(i) for i=1:100000] ; toc();
elapsed time: 0.024743432 seconds
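
For anyone repeating this comparison on a recent Julia release, here is a rough sketch under a few assumptions that are not part of the session above: Julia 1.x with the Distributed standard library, three local workers added with addprocs (to match nprocs()=4), and pmap swapped in for the `@parallel` comprehension. The names f, serial and parallel are made up for illustration. Each function is called once before it is timed, so compilation does not end up in the measurement.

```julia
using Distributed

# Assumption: 3 local workers, so that nprocs() == 4 as on the machine above.
nprocs() == 1 && addprocs(3)

# The function must be defined on every process before workers can call it.
@everywhere f(i) = sin(i) + cos(i)

# Serial version: a plain comprehension, as in the original timing.
serial(n) = [f(i) for i in 1:n]

# Parallel version: pmap hands the range to the workers in batches.
# One batch per worker keeps the communication overhead small.
parallel(n) = pmap(f, 1:n; batch_size = cld(n, nworkers()))

n = 100_000

# First calls compute the results and also trigger compilation.
a_serial = serial(n)
a_parallel = parallel(n)

# Second calls are the ones being timed.
t_serial = @elapsed serial(n)
t_parallel = @elapsed parallel(n)

println("serial:   ", t_serial, " seconds")
println("parallel: ", t_parallel, " seconds")
```

Whether the parallel version comes out ahead depends on the machine and on how expensive f is; for a function this cheap, the cost of shipping data between processes can easily outweigh the gain.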