Random numbers and the Poisson distribution

833 views
Skip to first unread message

spaceLem

unread,
Sep 15, 2014, 10:05:03 AM9/15/14
to julia...@googlegroups.com


Hi all,

I work in disease modelling, using use a mix of C++ and Octave. I'm fairly new to Julia, although I've managed to get a basic model up and running, and at 1/5.5 times the speed of C++ I'm pretty impressed (although hoping to close in on C++). I'd love to be able to work in one language all the time, and I'm feeling that I'm not far off.

I have two questions regarding random numbers and the Poisson distribution. One algorithm I use has a number of possible events, each with an associated event rate. From these events, you choose a time step dt, then the number of times each event happens is Poisson distributed with lambda = rate of the event * dt. In Octave I could write code along these lines (simplified to get the gist of things):

rates =  50*rand(1,6);
rates
(3) = 0;
dt
= 0.1;
K
= poissrnd(rates*dt); % = [1 6 0 4 3 4]

where K is an array giving the number of times each event occurs. In Julia, I would write

using Distributions
rates
= 50rand(6)
rates
[3] = 0
dt
= 0.1
K
= zeros(Float64,6)
for i = 1:6; K[i] = rand(Poisson(rates[i] * dt)); end

This gives: ERROR: lambda must be positive
 in Poisson at /Users/spacelem/.julia/Distributions/src/univariate/poisson.jl:4
 in anonymous at no file:1

Which brings me to my first question: how best to handle when the events have zero rates (which is not uncommon, and needs to be handled)? The correct behaviour is that an event with zero rate occurs zero times.

I found that editing poisson.jl (mentioned in the error), and changing line 4 from if l > 0 to if l >= 0, then the error goes away and when events have zero rates, they correctly occur zero times. I know the Poisson distribution is technically only defined for lambda > 0, but it really does make sense to handle the case lambda = 0 as returning 0. Somehow I feel that editing that file was probably not the correct thing to do (although it was great that I was able to), but I'd like to follow good practice, and I'm going to run into problems if I ever need to share my code.

And my second question: in Octave I can specify an array of lambdas and get back an array of random numbers distributed according to those event rates. Is it possible to do the same in Julia? (I can write rand(6) and get a vector of uniformly distributed random numbers, but I need the for loop to do the same for other distributions). In Octave you can pretty much write something like X += dX * poissrnd(rates * dt); all in one line (where X is a vector of populations, and dX is the event rate / state change matrix), and it would be nice to be as elegant as that in Julia.

Thank you very much in advance.

Regards,
Jamie

Andreas Noack

unread,
Sep 15, 2014, 10:33:25 AM9/15/14
to julia...@googlegroups.com
I can see that R accepts zero as rate parameter so maybe we should do the same. Could you open a pull request to Distributions.jl with that change?

Regarding the vectorized version, the answer is that you can do almost what you want with a comprehension, i.e. something like X += dX*[rand(Poisson(r*dt)) for r in rates].

Med venlig hilsen

Andreas Noack

John Myles White

unread,
Sep 15, 2014, 11:03:32 AM9/15/14
to julia...@googlegroups.com
I’m not so sure that we should follow the lead of Octave and R here. Neither of those languages reify distributions as types, so changes to their RNG’s don’t affect other operations on those same distributions.

In contrast, the proposed change here would break a lot of other code in Distributions that assumes that every Poisson object defines a non-delta distribution. At a minimum, all of the following functions would need to have branches inserted into them to work around the lambda = 0 case:

* entropy
* kurtosis
* skewness

In addition, every person who ever wrote code in the future that worked with Poisson objects would need to know that our definition of the Poisson distribution contradicted the definition found in textbooks. This would affect people writing code to estimate KL divergences, for example.

 — John

spaceLem

unread,
Sep 15, 2014, 11:07:04 AM9/15/14
to julia...@googlegroups.com
Thanks for your response! I can do that (once I figure out how).

Regards,
Jamie

spaceLem

unread,
Sep 15, 2014, 11:25:19 AM9/15/14
to julia...@googlegroups.com
Hi John, thanks for your response.

If you don't think silently accepting lambda = 0 is good practice, how best should I approach this? The only other thing I can think of is writing my own function wrapper specifically to catch the case where lambda = 0.

I don't know how other people work with random numbers, but my suspicions are that being able to get a random number when lambda = 0 might be more common than needing skew and kurtosis for that case (although I do accept these are all problematic cases, ballooning to infinity). The Poisson distribution is very important to modellers, and zero rates do need to be handled somehow (currently R, Octave, and the GSL in C++ all accept lambda = 0, so the textbook definition isn't necessarily the popular one! On the other hand, Scipy returns an error).

Regards,
Jamie

Andreas Noack

unread,
Sep 15, 2014, 11:43:00 AM9/15/14
to julia...@googlegroups.com
This is not a big deal to me, but what exactly would break by allowing λ=0? Returning inf for skewness and kurtosis and an error for entropy doesn't seem problematic to me. I don't do estimation of KL divergences, but it appears a bit strange to that extending the parameter space with a point could make a difference as the degenerate case can be approximated arbitrarily well with the non degenerate distribution (except for floating point rounding).

Med venlig hilsen

Andreas Noack

John Myles White

unread,
Sep 15, 2014, 11:43:13 AM9/15/14
to julia...@googlegroups.com
Why not just write out an explicit check in your code for the special case? Here's an example of how one might do that:

n_samples = 10
n_lambdas = 3
lambdas = [0.0, 0.1, 0.2]

samples = Array(Int, n_samples, n_lambdas)

for (i, lambda) in enumerate(lambdas)
for sample in 1:n_samples
if rate == 0.0
samples[sample, i] = 0
else
samples[sample, i] = rand(Poisson(lambda))
end
end
end

In my mind, this is the best possible approach to this kind of problem: you want slighly unusual behavior, so you should make the place where your goals are slightly unusual as explicit as possible so that anyone who reads your code will understand exactly what you're doing and exactly where you're breaking from textbook definitions.

As for precedent, I believe the list of systems you've provided breaks down as follows:

RNG's take in numbers, not objects:

* R
* Octave
* GSL

RNG's take in objects, not numbers:

* SciPy
* Julia

So it seems like the split in behavior is perfectly explicable in terms of the types of arguments you provide to RNG's. This seems like a very good argument for not changing the behavior of Distributions.

 -- John

John Myles White

unread,
Sep 15, 2014, 11:51:39 AM9/15/14
to julia...@googlegroups.com
The KL divergence (and other comparisons) between this edge case and any standard instance of the Poisson distribution are problematic because the inclusion of this point in parameter space changes the support of the distribution from the entire set of non-negative integers to the set containing only zero.

 -- John

David Gonzales

unread,
Sep 15, 2014, 1:12:33 PM9/15/14
to julia...@googlegroups.com
or with comprehension and unicode-sugar:

using Distributions
Λ = 50rand(6) ; Λ[3] = 0.0 ;
Δt = 0.1
> 0.0 ? rand(Poisson(λ*Δt)) : 0 for λ in Λ]

@assert Julia > max(Matlab,Octave,c++)

nbbb

unread,
Sep 16, 2014, 5:52:30 AM9/16/14
to julia...@googlegroups.com

In addition, every person who ever wrote code in the future that worked with Poisson objects would need to know that our definition of the Poisson distribution contradicted the definition found in textbooks.

but does it really? the pmf should be

P(n) = lambda^n/n! exp[-lambda] .

plugging in lambda = 0 gives P(n) = KroneckerDelta(n,0) since 0^0=1 . everything is fine.  i agree that the support is different but that seems to be a numerical problem when calculating KL divergences, not a mathematical one.


spaceLem

unread,
Sep 16, 2014, 7:30:20 AM9/16/14
to julia...@googlegroups.com
I suppose it comes from one's perspective -- from a modeller's point of view, a zero rate is certainly not a special case! The error was surprising, which made me baulk at handling it.

From a purity argument, I agree with you -- Poisson(0) is not defined, however from a usefulness argument, simply returning 0 is what people (especially those coming from most other languages) would expect (after all, the limit of P(0) as lambda -> 0 is 0), and returning infinity for skewness etc. rather than an error also makes a geometric sense.

However, I'll probably handle it as David Gonzales' nice one liner for the moment.

Thanks for your advice!

Regards,
Jamie

Simon Byrne

unread,
Sep 16, 2014, 8:14:58 AM9/16/14
to julia...@googlegroups.com
As this situation arises in other cases, I've opened an issue here:

The skewness shouldn't be Inf as it arises as 0/0. We could return a NaN though.

Alan Edelman

unread,
Sep 16, 2014, 9:21:01 AM9/16/14
to julia...@googlegroups.com
Would be good to sort out the implications, but I'm hoping that atomic measures
are not as problematic as JMW fears.  
Reply all
Reply to author
Forward
0 new messages