Setting random seed

mban...@googlemail.com

Apr 23, 2016, 8:56:26 AM
to hddm-users
Dear HDDM community,

What is a good way to set the random seed?

I've been running a few chains in parallel, but when I inspected them to check for convergence, I noticed that they were all the same. I would like to seed the random number generator independently on each node.

I have tried calling this before sampling from the HDDM object:
pymc.numpy.random.seed(1)
np.random.seed(1)
or
random.seed(1)

None of these works. I thought this would be straightforward to implement, but it doesn't look that easy to me.

How would you do it?

Best,
Michael

Samuel Mathias

Apr 23, 2016, 9:21:39 AM
to hddm-users
I don't understand. You've set the seed to the same value each time, and you get the same sample values? This is to be expected. Or are you using a different seed value for each independent chain? The latter should produce different chains.

--
Samuel R. Mathias, Ph.D.
Associate Research Scientist (ARS)
Neurocognition, Neurocomputation and Neurogenetics (n3) Division
Yale University School of Medicine
40 Temple Street, Room 694
New Haven CT 06510
http://www.srmathias.com

mban...@googlemail.com

Apr 23, 2016, 9:43:00 AM
to hddm-users
Hi Samuel,

Sorry, I was a bit unclear here.

I just learned how to run my Python code on the computer cluster. I split the sampling procedure into 11 chains. When I looked at the results, all chains had produced identical traces. I therefore wanted to use a different seed for each chain, for instance the chain's numeric ID, so that the traces differ between chains but remain reproducible.

I tested locally whether I could set the random seed. I set the seed once ("pymc.numpy.random.seed(1)", "np.random.seed(1)", or "random.seed(1)"), ran a model, set the seed again, and ran a new model, but obtained different results. (I think this caused your confusion: originally, my problem was that the chains were all identical. Now I wanted the models to yield identical results, to check that I understood how to set the seed.)
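For reference, the re-seeding check described above can be tried on numpy alone, without HDDM or PyMC in the loop. This is only a minimal sketch: the helper name draws_after_seeding is made up for illustration, and it assumes the sampler draws from numpy's global generator, which may not cover every source of randomness in a real HDDM run.

```python
import numpy as np


def draws_after_seeding(seed, n_draws=5):
    """Seed numpy's global generator, then draw n_draws standard normals.

    Hypothetical helper for illustration; not part of HDDM or PyMC.
    """
    np.random.seed(seed)
    return np.random.normal(size=n_draws)


first = draws_after_seeding(1)
second = draws_after_seeding(1)  # re-seeded with the same value
other = draws_after_seeding(2)   # a different seed, e.g. another chain ID

# Same seed: the two runs match. Different seed: they do not.
print(np.array_equal(first, second))
print(np.array_equal(first, other))
```

If this behaves as expected but a full model does not, the difference probably comes from somewhere other than numpy's global state.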

That the results were different is unexpected because Thomas actually says ( https://groups.google.com/forum/#!topic/pymc/av1DDMfUolU ) that it would be possible to do this with numpy.random.seed. I also found this example here, which works fine for me ( http://nbviewer.jupyter.org/gist/aflaxman/e0d61076595251e26199 ).

I was thinking that my command was not showing any effect in the environment where the HDDM sampling actually happens. So I played around a bit with the "sample" method in hierarchical.py and added something like this to line 650:

# Seed the random number generator if the caller passed a seed
randseed = kwargs.pop('randseed', None)
if randseed is not None:
    # pm.numpy.random.seed(randseed)
    np.random.seed(randseed)

(I'm still quite new to Python.) But I was hoping that this would be the environment where my command should have an effect.
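The same idea can be sketched without editing hierarchical.py, by wrapping the sampling call instead. Everything here is an assumption for illustration: sample_with_seed and toy_sampler are made-up names, and toy_sampler merely stands in for a real sampling method that draws from numpy's global generator.

```python
import numpy as np


def sample_with_seed(sample_fn, *args, randseed=None, **kwargs):
    """Optionally seed numpy's global generator, then call sample_fn.

    Hypothetical wrapper; sample_fn stands in for a model's sampling method.
    """
    if randseed is not None:
        np.random.seed(randseed)
    return sample_fn(*args, **kwargs)


def toy_sampler(n_draws):
    """Stand-in for a sampler that uses numpy's global generator."""
    return np.random.normal(size=n_draws)


run_a = sample_with_seed(toy_sampler, 5, randseed=3)
run_b = sample_with_seed(toy_sampler, 5, randseed=3)
run_c = sample_with_seed(toy_sampler, 5, randseed=4)

print(np.array_equal(run_a, run_b))  # same seed, same draws
print(np.array_equal(run_a, run_c))  # different seed, different draws
```

A wrapper like this keeps the installed library untouched, so the seeding survives upgrades.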

Thanks & best,
Michael

mban...@googlemail.com

Apr 29, 2016, 9:09:33 AM
to hddm-users
Hi,

I solved the problem now and the chains are not identical anymore. The problem was unrelated to the random seed. Still it would be nice to know how to set the seed for future use.

Thanks,
Michael

Samuel Mathias

Apr 29, 2016, 9:13:31 AM
to hddm-...@googlegroups.com
Sounds to me like seeding the random number generator was working exactly as expected ...

mban...@googlemail.com

Apr 29, 2016, 10:45:15 AM
to hddm-users
I didn't define a random seed manually. It all happened internally in the depths of hddm, probably pymc. It remains a mystery to me.

Yes, it is nice that the chains produce different samples now. But it would also be nice to be able to reproduce exactly the same samples next time. I never got that to work.

Sam Mathias

Apr 29, 2016, 10:53:43 AM
to hddm-users
But you said in your original email that you did:

"When I looked at the results, all chains had produced identical traces."

I'm not sure why you would want to produce exactly the same MCMC results twice, but it certainly sounded like you managed that previously. Since random-number generation in PyMC2, and therefore HDDM, is all done through numpy, spending some time reading through the numpy documentation is likely to tell you why it worked on some occasions and not others.

mban...@googlemail.com

May 2, 2016, 3:04:59 AM
to hddm-users
Hi Samuel,

The reason the chains looked the same was not related to the random seed. The chains, which I was running in parallel, appeared to produce identical samples because they (accidentally) wrote to the same database (db) file. So the chain that finished last overwrote the previous results.
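One way to avoid this kind of clash is to derive the database file name from the chain ID, so that no two chains share a file. The snippet below is only a sketch: chain_settings, the "traces" directory, and the file naming scheme are all made up for illustration, and it assumes the sampling call lets you choose the database name.

```python
import os


def chain_settings(chain_id, out_dir="traces"):
    """Return a per-chain seed and database path.

    Hypothetical helper; the directory and naming scheme are assumptions.
    """
    return {
        "seed": chain_id,
        "dbname": os.path.join(out_dir, "chain_%02d.db" % chain_id),
    }


# One entry per chain, e.g. for the 11 chains mentioned earlier in the thread.
settings = [chain_settings(chain_id) for chain_id in range(11)]
dbnames = [s["dbname"] for s in settings]

# Every chain must get its own file, otherwise the last one to finish wins.
print(len(set(dbnames)) == len(dbnames))
```

Each cluster job would then pick up its own entry and pass the database name to the sampling call, and could use the matching seed if seeding turns out to work.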

I erroneously assumed that it had something to do with the random number generator using the same seed on each of my nodes. This is why I started looking into the topic of setting the random seed.

So, compared to when I started this thread, I'm mostly happy now because I can run multiple chains in parallel. But I'm still somewhat puzzled about how one would generally tell HDDM/PyMC which random seed to use. I couldn't get that to work. When working with non-deterministic algorithms, I often find it comforting to know that there is an option.

Best,
Michael

Sam Mathias

May 2, 2016, 9:38:01 AM
to hddm-users
I see, sorry for the confusion. If it is still bothering you, I suggest coding up a minimal example of a PyMC script (not HDDM) that highlights the issue and posting it as an "issue" on the PyMC GitHub page. The primary PyMC developer, Chris Fonnesbeck, is very helpful and responds to most problems in a day or two.
