pmap not respecting srand set seed settings

Steve Kay

Feb 26, 2015, 10:37:19 AM2/26/15
to julia...@googlegroups.com
There is a really nice example of using pmap for parallel bootstrapping purposes on http://juliaeconomics.com/2014/06/18/parallel-processing-in-julia-bootstrapping-the-mle/ .
If you rerun the code, however, it's clear that pmap does not respect the seed set with the srand command. I've tried various small changes but nothing is working (although a lot of the nuances of parallel computing are a bit above my level). Is it possible to get pmap to respect the seed setting? I hope so, as pmap is superb otherwise.

Any help much appreciated.

Best,

Steve

Ivar Nesje

Feb 26, 2015, 10:58:05 AM2/26/15
to julia...@googlegroups.com
No, that is not possible with pmap.

Ivar Nesje

Feb 26, 2015, 11:04:30 AM2/26/15
to julia...@googlegroups.com
I think something like @everywhere srand(seed) would partially work, but you'd still suffer from non-determinism in the scheduler, which might run different portions of the array on different processes depending on the current load on your computer.

Andreas Noack

Feb 26, 2015, 11:16:25 AM2/26/15
to julia...@googlegroups.com
@everywhere srand(seed) would give reproducibility, but it would probably not be a good idea, since the exact same random variates would be generated on each process. Maybe something like

for p in workers()
    @spawnat p srand(seed + p)
end

However, our RNG gives no guarantees about the independence of these streams, but for this bootstrap example I would be surprised if the generated variates weren't good enough.
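For concreteness, Andreas's loop can be sketched end-to-end like this (a sketch, not the blog's actual code, assuming the 0.3/0.4-era API used in this thread, where srand seeds the global RNG on each process; on current Julia the equivalents live in the Random and Distributed standard libraries):

```julia
# Sketch: per-worker seeding before a pmap call (Julia 0.3/0.4-era API;
# on modern Julia, srand is Random.seed! and pmap lives in Distributed).
addprocs(4)

seed = 1234
for p in workers()
    @spawnat p srand(seed + p)      # distinct, deterministic seed per worker
end

# A toy bootstrap replicate: the mean of 100 standard-normal draws.
@everywhere bootstrapRep(i) = mean(randn(100))

results = pmap(bootstrapRep, 1:40)
```

Note that Ivar's caveat still applies: each worker's stream is now reproducible, but which replicate lands on which worker depends on the scheduler, so the value computed for replicate i can still vary between runs.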

Steve Kay

Feb 26, 2015, 3:04:45 PM2/26/15
to julia...@googlegroups.com
Thanks for the comments - nice to know it's not my usual programming inadequacies. I like the

for p in workers()
    @spawnat p srand(seed + p)
end

idea. It would be even better if, instead of resetting the seed, it did an (imaginary) @spawnat p jumpahead(seed, p*X), where X was larger than the number of bootstrap reps for each worker. Here jumpahead(seed, b) would take the state of the random number generator at seed seed and move it on b steps. Very easy for me to come up with imaginary commands - way past my ability to actually program them!
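For what it's worth, the imaginary jumpahead can be crudely emulated by seeding a fresh generator and discarding draws. This walks the stream one draw at a time (O(b) rather than a true O(log b) jump-ahead), so it is only a stand-in; the function name is illustrative, not a real Julia API:

```julia
# Hypothetical stand-in for the imaginary jumpahead(seed, b): seed a fresh
# MersenneTwister and burn b draws to advance its state. Only practical
# for modest b, since it advances the stream one draw at a time.
function jumpahead_emulated(seed, b)
    rng = MersenneTwister(seed)
    for i in 1:b
        rand(rng)                    # discard one draw
    end
    return rng
end

# One generator per worker, each offset by X draws, where X exceeds the
# number of draws any single worker will consume.
X = 100_000
rngs = [jumpahead_emulated(2015, (p - 1) * X) for p in 1:4]
```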

Andreas Noack

Feb 26, 2015, 4:58:40 PM2/26/15
to julia...@googlegroups.com
Would that be to get the exact same variates as the serial execution would create?

Greg Plowman

Feb 27, 2015, 5:40:50 AM2/27/15
to julia...@googlegroups.com
How interesting. This was the exact same example I discovered to help develop my first parallel simulation.
The key take away for me was splitting into two files:
Tip: For parallel processing in Julia, separate your code into two files, one file containing the functions and parameters that need to be run in parallel, and the other file managing the processing and collecting results. Use the require command in the managing file to import the functions and parameters to all processors.
Originally the "required" file (that runs on all processors) seeded the RNG (`srand(2)`), which I thought was odd. As Andreas pointed out, this just means you'll run the same thing n times on n workers.
So I changed it to srand(seed + myid()), which is pretty much equivalent to Andreas's:

for p in workers()
    @spawnat p srand(seed + p)
end

> Would that be to get the exact same variates as the serial execution would create?

Well, for me, I want "random" seeding but with reproducibility.
Mostly I want "randomly"-seeded random sims, rather than predetermined ones (which don't seem very random), but I want the ability to rerun with the same seed if required.
For serial execution I could achieve this by seeding with, say, the current time, and outputting the seed with the sim results.
For parallel, I could do similar, say <current time> + <worker id>.
But there is the question of RNG stream independence (apparently more so when the seeds are very similar).
So jump-ahead would be ideal in my case, effectively allowing independent, parallel random seeding, but with reproducibility.
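That scheme might be sketched as follows (same 0.3/0.4-era assumptions as the rest of the thread; the variable names are illustrative):

```julia
# "Random" seeding with reproducibility: derive the base seed from the
# clock, log it next to the results, and offset it by worker id.
addprocs(4)

baseSeed = round(Int, time())        # random-ish seed from the current time
println("base seed: ", baseSeed)     # record it so the run can be repeated

for p in workers()
    @spawnat p srand(baseSeed + p)   # <current time> + <worker id>
end
```

Rerunning with a logged baseSeed reproduces each worker's stream, subject to the scheduling caveat Ivar raised and the stream-independence question above.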

Steve Kay

Feb 28, 2015, 9:11:17 AM2/28/15
to julia...@googlegroups.com
Thanks for your response, Andreas. Yes, I was thinking of being able to reproduce a serial execution, although that obviously isn't anywhere near as important as being able to reproduce a piece of work.

I've just spent three hours writing a reply to you saying your suggestion doesn't work - only now realised I had my code set up wrong and had put srand=N instead of srand(N). I hate programming! I just create a user type which has a srand field and pass pmap a vector (one for each worker) of such a type - works a treat. Thanks for your help.

To get around criticism of the independence of the streams I could just "burn" (as in MCMC runs) a number of draws in each stream (the criticism seems to be regarding the first N draws). Does anyone have any idea what size this burn N should be (I can't find any papers on it) given the particular MT generator Julia uses?
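That workaround might look something like this (a sketch with illustrative names, using the 0.3/0.4-era `type` keyword and srand; `struct` and Random.seed! on modern Julia):

```julia
# Wrap the seed in a type, give pmap one instance per job, and seed inside
# the mapped function, so the result of each job no longer depends on
# which worker the scheduler hands it to.
@everywhere type SeededJob           # `struct` on modern Julia
    seed::Int
end

@everywhere function runJob(job::SeededJob)
    srand(job.seed)                  # Random.seed! on modern Julia
    mean(randn(100))                 # a toy bootstrap replicate
end

jobs = [SeededJob(1000 + i) for i in 1:40]
results = pmap(runJob, jobs)         # same jobs, same results, any scheduling
```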

Steve

Ivar Nesje

Feb 28, 2015, 5:35:32 PM2/28/15
to julia...@googlegroups.com
If you find evidence that there is anything wrong with the first N random numbers in the standard Random Number Generator (RNG) in Julia, we would consider that a bug, fix it, and release a new point release faster than a Lenovo CTO is able to arrange a meeting with someone who understands the https protocol.

You should still be aware that we optimize for speed and statistical quality, so you should not rely on the built-in RNG for cryptography or other security-related purposes.

Andreas Noack

Mar 2, 2015, 2:30:44 PM3/2/15
to julia...@googlegroups.com
Steve, I don't think that method works. The mapping between the argument to srand and the internal state of the MT is quite complicated. We call a seeding function in the library we use that maps an integer to a state vector, so srand(1) and srand(2) end up as two quite different streams. I think this is deliberate, to avoid many of the same streams being repeated over and over. Also, I think the seeding function tries to select a state vector that doesn't require you to burn numbers.
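This is easy to see in the REPL: consecutive integer seeds produce immediately divergent streams (srand here, per the era of this thread; Random.seed! on modern Julia):

```julia
# Nearby integer seeds are expanded into very different MT state vectors,
# so the resulting streams bear no obvious relationship to one another.
srand(1); a = rand(10)
srand(2); b = rand(10)
println(a == b)                      # false: the streams diverge from the first draw
```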