Hi Joe,
I'm having a look at the issue of random number control in liam2 [I am still using 0.8.2], and what we would need to do to bring the kind of control we have implemented in Genesis, to liam2. Howard already mentioned this in a separate thread.
I see the Mersenne twister is being called from numpy using numpy.random. If things have moved on since 0.8.2, please accept my apologies for not keeping up! However I am assuming the random numbers are done the same way in the current version [I quickly looked at the version 0.9 documentation but did not see a reference to random number control].
There were no change in that regard in 0.9. Only more random distributions were added.
The main question for us is I think how we can control the random number seed. But I am also not clear whether the M-T simply requires a seed, as with the linear congruential approach we use in Genesis, or whether the sequence from the M-T is dictated in a much more complex fashion [I have seen mention of 600+ internal states]. The worst case scenario for us would be replacing the RNG as well as controlling the seed; the simplest would be controlling the seed if that's sufficient with the M-T algorithm.
It depends what you mean by control. Setting a seed in M-T entirely defines the numbers you get subsequently. The only thing is that once you start using the random "sequence", you don't know what the "new seed" is.
You could try setting the seed for each draw, but I fear that would be awfully slow and possibly severely limit the random nature of the numbers it generates if you make the slightest mistake. Using one RandomState() object per random "user" (random expression) and simply keep them in memory seem like a better alternative. This is what I would do anyway. It would waste a bit of memory (a bit less than 3Kb by random "user") but that seems reasonable. Note that due to the way LIAM2 handles conditionals, a difference in state of individuals has no influence on the number of draws. With the scheme I envisioned, (more) births would not influence draws of other individuals either since the newborns draws are at the end of the vector/sequence. However, I just realized a change in mortality between two models (assuming one remove() dead people) would shift the whole sequence unless special care is taken to avoid this…
Should we decide to use a different random number generator [RNG] such as a linear congruential algorithm, the process looks like it would be comparatively straightforward:
You might want to look at the PCG family of random generators (http://www.pcg-random.org) as a possibly better alternative to LCG. It is very new and not battle-tested nor peer-reviewed yet but looks promising for this kind of usage.
- We would need Python code for the RNG itself - eg create our own .py module if absolutely necessary [the linear congruential generator just seems to use arithmetic function and the modulus operator]
- We would need to replace calls to numpy.random at various points within exprmisc.py, regressions.py, alignment.py, align_link.py
That is certainly possible but I would use numpy (or even numexpr), not "pure" Python to compute those, or you will probably run into performance problems.
However to use our own code for generating the seed based on some case and runtime factors, I am not sure what would be involved. The algorithm we currently use for this in Genesis is quite complex and depends on the object identifier [which person etc], a variable identifier, time period and run number. These would need to be passed to the RNG when it gets called, but I do not understand how to refer to them. Do you have any guidance on this, or would I need to be more specific about what we would be trying to do?
You would need to get most of that information from the context. The time period and object identifier are already available there. You could add the run number in there too (wherever it comes from initially – a command-line argument?) somewhere in simulation.py. The "variable identifier" should be added in Process.run (in process.py) and possibly ProcessGroup.run_guarded if you want to use the position within a procedure as I intended.
Hope it helps,
Gaëtan
PS: Please keep us updated on your experiments, whether the results are good or bad.