logarithm of (normal cdf)

Andrew M. Ross

unread,

Apr 14, 2003, 1:42:16 PM4/14/03

to

Hi Everyone,

I'm looking for an algorithm that will compute the logarithm
of the usual Gaussian CDF function. Note that I'm not interested
in the "lognormal" distribution, but in CDF values from the classic
Normal distribution that are close to 1.0, so that I need the
extra precision from a logarithmic representation (also, I'm going
to multiply a bunch of them together, so logs are the natural way).

I know that the GAUSS software has a function called "lncdf()" that
will do this. Has anybody seen an equivalent for Matlab, or at least
something in C or Fortran that I can translate?

Thanks,
Andrew Ross
Industrial and Systems Engineering
Lehigh Univ.
Bethlehem, Pennsylvania, USA
amr5...@lehigh.edu
--
remove all digits <=4 from my e-mail address to reply

Axel Vogt

unread,

Apr 14, 2003, 3:57:57 PM4/14/03

to

"Andrew M. Ross" wrote:
>
> Hi Everyone,
>
> I'm looking for an algorithm that will compute the logarithm
> of the usual Gaussian CDF function. Note that I'm not interested
> in the "lognormal" distribution, but in CDF values from the classic
> Normal distribution that are close to 1.0, so that I need the
> extra precision from a logarithmic representation (also, I'm going
> to multiply a bunch of them together, so logs are the natural way).
>
> I know that the GAUSS software has a function called "lncdf()" that
> will do this. Has anybody seen an equivalent for Matlab, or at least
> something in C or Fortran that I can translate?

There is a classical asymptotic for cdfN(x): 1-exp(-x^2/2)/(x*sqrt(2*Pi))
For x < 7.8 the error should be less than 1E-16 (but is increasing for
smaller x, for x=5 it is .12647141E-10). Does that help you?

Peter J. Acklam

unread,

Apr 14, 2003, 4:27:13 PM4/14/03

to

"Andrew M. Ross" <amr5...@lehigh.edu> wrote:

> I'm looking for an algorithm that will compute the logarithm of
> the usual Gaussian CDF function. Note that I'm not interested
> in the "lognormal" distribution, but in CDF values from the
> classic Normal distribution that are close to 1.0, so that I
> need the extra precision from a logarithmic representation
> (also, I'm going to multiply a bunch of them together, so logs
> are the natural way).

MATLAB is not as well suited for this as other numerical
software. It doesn't have the logarithm of the distribution
functions nor does it have the "log1p" function which is very
convenient for these kind of problems.

Here is one way to solve problems like this. Assume x, y, z,
... are all close to one.

x * y * z * ...

= exp( log( x * y * z * ... ) )

= exp( log(x) + log(y) + log(z) + ... )

= exp( log(1 - x') + log(1 - y') + log(1 - z') + ... )

where x' = 1 - x, y' = 1 - y, etc.

= exp( log1p(-x') + log1p(y') + log1p(z') + ... )

where log1p(t) = log(1+t).

Peter

--
I wish dialog boxes had a butten saying "Whatever". I hate being
forced to answer "Yes" or "No" to a question I have no opinion on
whatsoever. There ought to be a button matching my indifference.

Alan Miller

unread,

Apr 14, 2003, 6:38:53 PM4/14/03

to

"Andrew M. Ross" <amr5...@lehigh.edu> wrote in message
news:3E9AF2F8...@lehigh.edu...

Have you looked at Applied Statistics algorithm AS66 ?
It was published in the Royal Statistical Society's journal, Applied
Statistics, vol.22, pages 424-427, 1973.
The AS algorithms, in Fortran, can be downloaded from statlib. Some of
them, including this one, can be downloaded from my web site.

Cheers

--
Alan Miller
http://users.bigpond.net.au/amiller
Retired Statistician

Peter Perkins

unread,

Apr 15, 2003, 10:06:18 AM4/15/03

to

> I'm looking for an algorithm that will compute the logarithm
> of the usual Gaussian CDF function. Note that I'm not interested
> in the "lognormal" distribution, but in CDF values from the classic
> Normal distribution that are close to 1.0, so that I need the
> extra precision from a logarithmic representation

You might try the following:

F(x) = 1 - (1-F(x)) = 1 - S(x) (S(x) is the survival function)
log(1-y) =(approx) -y - (y.^2)./2 (for y << 1)

so log(F(x)) for large x can be computed as

S = 0.5 * erfc(x ./ sqrt(2))
logOneMinusS = -S.*(1 + S./2)

If x is really far out in the tail, you can get away with log(1-y) = -y. Hope
this helps.

Is it not possible just to work in the opposite tail of the distribution and
avoid all this?

- Peter Perkins
The MathWorks, Inc

Ken Smith

unread,

Apr 15, 2003, 10:35:20 AM4/15/03

to

"Andrew M. Ross" <amr5...@lehigh.edu> wrote in message
news:3E9AF2F8...@lehigh.edu...

> Hi Everyone,
>
> I'm looking for an algorithm that will compute the logarithm
> of the usual Gaussian CDF function. Note that I'm not interested
> in the "lognormal" distribution, but in CDF values from the classic
> Normal distribution that are close to 1.0, so that I need the
> extra precision from a logarithmic representation (also, I'm going
> to multiply a bunch of them together, so logs are the natural way).
>
> I know that the GAUSS software has a function called "lncdf()" that
> will do this. Has anybody seen an equivalent for Matlab, or at least
> something in C or Fortran that I can translate?

You can use the error function:

function lp=lncdf(z)
%LNCDF Log of Gaussian CDF function
%
% LP = LNCDF(Z) returns
% log[ int_{-inf}^{Z} exp(-0.5*z^2)/sqrt(2*pi) dz ]

lp = -log(2)-0.5*z.^2+log(erfcx(-z/sqrt(2)));

Peter J. Acklam

unread,

Apr 15, 2003, 12:29:12 PM4/15/03

to

"Ken Smith" <ksm...@asiDASHspace.com> wrote:

> You can use the error function:
>
> function lp=lncdf(z)
> %LNCDF Log of Gaussian CDF function
> %
> % LP = LNCDF(Z) returns
> % log[ int_{-inf}^{Z} exp(-0.5*z^2)/sqrt(2*pi) dz ]
>
> lp = -log(2)-0.5*z.^2+log(erfcx(-z/sqrt(2)));

No, no!

First of all you must do the maths correctly. If we let Phi(x) be
the standard normal cumulative distribution function, we get

Phi(x) = 0.5 * erfc(-x / sqrt(2))

and

erfcx(x) = exp(x^2) * erfc(x)

so

erfc(x) = exp(-x^2) * erfcx(x)

and

Phi(x) = 0.5 * exp(-x.^2 / 2) .* erfcx(-x / sqrt(2))

Then

log(Phi(x)) = log(0.5) - x.^2 / 2 + log(erfcx(-x / sqrt(2))) (*)

Secondly, this doesn't help us much because, when x is large and
positive, the two terms

log(0.5) - x.^2 / 2 and log(erfcx(-x / sqrt(2)))

have the same sign, but different magnitude, which leads to a
major loss of precision. Even for moderately large values of x,
the expression (*) is numerically very poor.

Ken Smith

unread,

Apr 15, 2003, 2:05:37 PM4/15/03

to

"Peter J. Acklam" <pjac...@online.no> wrote in message
news:3ckjzp...@online.no...

> "Ken Smith" <ksm...@asiDASHspace.com> wrote:
>
> > You can use the error function:
> >
> > function lp=lncdf(z)
> > %LNCDF Log of Gaussian CDF function
> > %
> > % LP = LNCDF(Z) returns
> > % log[ int_{-inf}^{Z} exp(-0.5*z^2)/sqrt(2*pi) dz ]
> >
> > lp = -log(2)-0.5*z.^2+log(erfcx(-z/sqrt(2)));
>
> No, no!
>
> First of all you must do the maths correctly. If we let Phi(x) be
> the standard normal cumulative distribution function, we get
>
> Phi(x) = 0.5 * erfc(-x / sqrt(2))
>
> and
>
> erfcx(x) = exp(x^2) * erfc(x)
>
> so
>
> erfc(x) = exp(-x^2) * erfcx(x)
>
> and
>
> Phi(x) = 0.5 * exp(-x.^2 / 2) .* erfcx(-x / sqrt(2))
>
> Then
>
> log(Phi(x)) = log(0.5) - x.^2 / 2 + log(erfcx(-x / sqrt(2))) (*)

This looks identical to my expression.

> Secondly, this doesn't help us much because, when x is large and
> positive, the two terms
>
> log(0.5) - x.^2 / 2 and log(erfcx(-x / sqrt(2)))
>
> have the same sign, but different magnitude, which leads to a
> major loss of precision. Even for moderately large values of x,
> the expression (*) is numerically very poor.

You meant they have the same magnitude but different sign. And you're
right, this was a bad suggestion.

-Ken

Peter J. Acklam

unread,

Apr 16, 2003, 2:59:25 AM4/16/03

to

"Ken Smith" <ksm...@asiDASHspace.com> wrote:

>
> "Peter J. Acklam" <pjac...@online.no> wrote:
>
> > Then
> >
> > log(Phi(x)) = log(0.5) - x.^2 / 2 + log(erfcx(-x / sqrt(2))) (*)
>
> This looks identical to my expression.

You're right. I misread your posting. Sorry about that.

> > Secondly, this doesn't help us much because, when x is large and
> > positive, the two terms
> >
> > log(0.5) - x.^2 / 2 and log(erfcx(-x / sqrt(2)))
> >
> > have the same sign, but different magnitude, which leads to a
> > major loss of precision. Even for moderately large values of x,
> > the expression (*) is numerically very poor.
>
> You meant they have the same magnitude but different sign.

You're right again.

> And you're right, this was a bad suggestion.

At least I got my point right. :-)

Peter

Ian Smith

unread,

Apr 16, 2003, 3:56:16 AM4/16/03

to

"Andrew M. Ross" <amr5...@lehigh.edu> wrote in message news:<3E9AF2F8...@lehigh.edu>...

I don't think you really need logs.

Suppose we have an array of values v and we want the product of the
cumulative Normal density function of these values. Then it might be
coded as follows

pprod = 1
for(i=0;i<array_size;i++)pprod=pprod*cdfnormal(v[i]);

but we can just as easily calculate it with

qprod = 0
for(i=0;i<array_size;i++)qprod=qprod+cdfnormal(-v[i])*(1-qprod);
pprod=1-qprod;

Ian Smith

Peter J. Acklam

unread,

Apr 16, 2003, 4:11:44 PM4/16/03

to

iandj...@aol.com (Ian Smith) wrote:

> I don't think you really need logs.
>
> Suppose we have an array of values v and we want the product of the
> cumulative Normal density function of these values. Then it might be
> coded as follows
>
> pprod = 1
> for(i=0;i<array_size;i++)pprod=pprod*cdfnormal(v[i]);
>
> but we can just as easily calculate it with
>
> qprod = 0
> for(i=0;i<array_size;i++)qprod=qprod+cdfnormal(-v[i])*(1-qprod);
> pprod=1-qprod;

How is that going to solve the problem? "qprod" is close to zero,
which is fine, but "1-qprod" will be very close to 1, perhaps
numerically identical to 1, which is exactly what the poster is
trying to avoid.

Ian Smith

unread,

Apr 17, 2003, 4:42:39 AM4/17/03

to

pjac...@online.no (Peter J. Acklam) wrote in message news:<4r4yqk...@online.no>...

> iandj...@aol.com (Ian Smith) wrote:
>
> > I don't think you really need logs.
> >
> > Suppose we have an array of values v and we want the product of the
> > cumulative Normal density function of these values. Then it might be
> > coded as follows
> >
> > pprod = 1
> > for(i=0;i<array_size;i++)pprod=pprod*cdfnormal(v[i]);
> >
> > but we can just as easily calculate it with
> >
> > qprod = 0
> > for(i=0;i<array_size;i++)qprod=qprod+cdfnormal(-v[i])*(1-qprod);
> > pprod=1-qprod;
>
> How is that going to solve the problem? "qprod" is close to zero,
> which is fine, but "1-qprod" will be very close to 1, perhaps
> numerically identical to 1, which is exactly what the poster is
> trying to avoid.
>
> Peter

You cannot avoid the answer being close to 1. All you can do is
produce an accurate calculation as to how close to 1 the answer is.

Producing a logcdfnormal is easy. You need a function logm1(x) to
calculate log(1-x) accurately for small x and then logcdfnormal(value)
= logm1(cdfnormal(-value)). pprob is then exp(sum of logcdfnormals)
which not surprisingly can still be close to 1.

qprob is then -expm1(sum of logcdfnormals) where expm1(x) is used to
calculate exp(x)-1 accurately for small x.

Equivalently if we calculate qprob as before, then the sum of
logcdfnormals can be calculated as logm1(qprob)

My powers of explanation are not very good but hopefully you can see
that nothing extra has been achieved by using logs.

I should add that my "algorithm" was not intended to be used. In
practice if array_size is large then simple summations of the
logcdfnormals will not be accurate - but that's a different problem.

Ian Smith

Isaac Asher

unread,

Jan 13, 2017, 2:12:11 AM1/13/17

to

"Andrew M. Ross" <amr5...@lehigh.edu> wrote in message <3E9AF2F8...@lehigh.edu>...

Ok so this is an old thread but I had the same problem and I think I found a good answer. I posted it over on stackexchange here:

http://stats.stackexchange.com/a/256009/145180

I'm using python so it is easy for me...

For Matlab, it looks like you need to call their mex'd code to calculate w(z); shouldn't be too hard though.

Zhongfang He

unread,

Mar 17, 2017, 11:40:08 AM3/17/17

to

I recently had a similar problem when trying to compute the log of Gaussian CDFs for estimating an ordered probit model. The ad hoc approximation I use to circumvent the numerical issue is as follows: First specify a threshold p which is small, say 1e-100, but log(p) is readily computable in Matlab. If the Gaussian CDF \Phi(x) >= p, then just use the log function. Otherwise, use the Taylor expansion at p. I found that using the Taylor approximation up to the square or the cubic terms suffice for my job.