[erlang-questions] Gaussian Distribution

48 views
Skip to first unread message

Frank Recker

unread,
Oct 28, 2012, 11:58:09 AM10/28/12
to erlang-q...@erlang.org, haskel...@haskell.org
Hi,

at work, I often need the values the cumulative distribution function of
the Gaussian distribution. The code for this function in haskell, erlang
and perl and the corresponding mathematical paper can be found at
git://github.com/frecker/gaussian-distribution.git .

Regards,
Frank






_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions

Siraaj Khandkar

unread,
Oct 28, 2012, 12:19:38 PM10/28/12
to Frank Recker, erlang-q...@erlang.org, haskel...@haskell.org
On Oct 28, 2012, at 11:58 AM, Frank Recker wrote:

> Hi,
>
> at work, I often need the values the cumulative distribution function of
> the Gaussian distribution. The code for this function in haskell, erlang
> and perl and the corresponding mathematical paper can be found at
> git://github.com/frecker/gaussian-distribution.git .

Thank you!


--
Siraaj Khandkar
.o.
..o
ooo

Richard O'Keefe

unread,
Oct 28, 2012, 9:52:28 PM10/28/12
to Erlang Users' List

On 29/10/2012, at 4:58 AM, Frank Recker wrote:

> Hi,
>
> at work, I often need the values the cumulative distribution function of
> the Gaussian distribution. The code for this function in haskell, erlang
> and perl and the corresponding mathematical paper can be found at
> git://github.com/frecker/gaussian-distribution.git .

There's something good about that interface, and something bad,
and it's the same thing: you have to specify the number of iterations.
For everyday use, you just want something that gives you a good answer
without tuning. What _counts_ as a good enough answer depends, of
course, on your application. I adapted John D. Cook's C++ code and
used R-compatible names. (What I _really_ wanted this for was
Smalltalk. The Erlang code is new.) Since Erlang is built on top of
C, and since C 99 compilers are required to provide erf(), it's
straightforward to calculate

Phi(x) = (1 + erf(x / sqrt(2))) / 2

Where John D. Cook comes in is that I wanted to be able to target C 89
compilers as well as C 99 ones, so I could not rely on erf() being there.
Experimentally, the absolute error of pnorm/1 is below 1.0e-7 over the
range -8 to +8.

-module(norm).
-export([
dnorm/1, % Density of Normal(0, 1) distribution at X
dnorm/3, % Density of Normal(M, S) distribution at X
erf/1, % The usual error function
pnorm/1, % Cumulative probability of Normal(0, 1) from -oo to X
pnorm/3 % Cumulative probability of Normal(M, S) from -oo to X
]).

dnorm(X) ->
0.39894228040143267794 * math:exp((X*X)/2.0).

dnorm(X, M, S) ->
dnorm((X-M)/S).

% Phi(x) = (1+erf(x/sqrt 2))/2.
% The absolute error is less than 1.0e-7.

pnorm(X) ->
(erf(X * 0.70710678118654752440) + 1.0) * 0.5.

pnorm(X, M, S) ->
pnorm((X-M)/S).

% The following code was written by John D. Cook.
% The original can be found at http://www.johndcook.com/cpp_erf.html
% It is based on formula 7.1.26 of Abramowitz & Stegun.
% The absolute error seems to be less than 1.4e-7;
% the relative error is good except near 0.

erf(X) ->
if X < 0 ->
S = -1.0, A = -X
; true ->
S = 1.0, A = X
end,
T = 1.0/(1.0 + 0.3275911*A),
Y = 1.0 - (((((1.061405429*T - 1.453152027)*T) + 1.421413741)*T -
0.284496736)*T + 0.254829592)*T*math:exp(-A*A),
S * Y.

Jachym Holecek

unread,
Oct 29, 2012, 5:31:33 AM10/29/12
to Frank Recker, erlang-q...@erlang.org
# Frank Recker 2012-10-28:
> at work, I often need the values the cumulative distribution function of
> the Gaussian distribution. The code for this function in haskell, erlang
> and perl and the corresponding mathematical paper can be found at
> git://github.com/frecker/gaussian-distribution.git .

One more to the mix -- inverse CDF for normal distribution. Only mildly
tested.

BR,
-- Jachym

%% Function normal_quant/1 is and Erlang version of:
%%
%% http://home.online.no/~pjacklam/notes/invnorm/impl/acklam/bc/ltqnorm.bc
%%
%% Lower tail quantile for standard normal distribution function.
%% This function returns an approximation of the inverse cumulative
%% standard normal distribution function. I.e., given P, it returns
%% an approximation to the X satisfying P = Pr{Z <= X} where Z is a
%% random variable from the standard normal distribution.
%%
%% The algorithm uses a minimax approximation by rational functions
%% and the result has a relative error whose absolute value is less
%% than 1.15e-9.
%%
%% The shell syntax is (or should be, anyway) POSIX bc.
%%
%% Author: Peter John Acklam
%% Time-stamp: 2005-03-10 14:13:52 +01:00
%% E-mail: pjac...@online.no
%% WWW URL: http://home.online.no/~pjacklam

%% Inverse CDF of standard normal distribution.
normal_quant(P) when is_float(P) ->
Q = P - 0.5,
A = quant_fabs(Q),
if A =< 0.47575 ->
%% Rational approximation for central region.
R = Q*Q,
((((( -39.69683028665376*R + 220.9460984245205)*R - 275.9285104469687)*R +
138.3577518672690)*R - 30.66479806614716)*R + 2.506628277459239)*Q /
((((( -54.47609879822406*R + 161.5858368580409)*R - 155.6989798598866)*R +
66.80131188771972)*R - 13.28068155288572)*R + 1);
A > 0.47575 ->
%% Rational approximation for tails.
if Q > 0 ->
-quant_tails(1 - P);
true ->
quant_tails(P)
end
end.

quant_tails(P) when is_float(P) ->
R = math:sqrt(-2 * math:log(P)),
(((((-0.007784894002430293*R - 0.3223964580411365)*R - 2.400758277161838)*R -
2.549732539343734)*R + 4.374664141464968) * R + 2.938163982698783) /
(((( 0.007784695709041462*R + 0.3224671290700398)*R + 2.445134137142996)*R +
3.754408661907416)*R + 1).

quant_fabs(X) when is_float(X), X < 0.0 ->
-X;
quant_fabs(X) when is_float(X) ->
X.

%% Inverse CDF of generic normal distribution.
normal_quant(P, Mean, Sigma) when is_float(Mean), is_float(Sigma) ->
Mean + math:sqrt(Sigma)*normal_quant(P).

Frank Recker

unread,
Oct 29, 2012, 6:59:58 AM10/29/12
to erlang-q...@erlang.org
Your points are correct, but I had reasons for my design choices.

- math:erf/1 was not available in my windows version of erlang

- under linux math:erf/1 delivers the value 1 for large x (x>7), which is
problematic. The exact value is always in the open interval (-1,1).

- I wanted to know the maximum error of the calculated approximation

and of course

- it was fun, to derive the formular, implement it and test the
convergency of the calculated values for different N.

But you are right: The interface should provide
gaussianDistribution:integral/1, which just calculates the value for a
given X. I put it on my todo-list ;-)

Frank

On Mon, October 29, 2012 02:52, Richard O'Keefe wrote:
> On 29/10/2012, at 4:58 AM, Frank Recker wrote:
>> Hi,
>> at work, I often need the values the cumulative distribution function
of
>> the Gaussian distribution. The code for this function in haskell,
erlang
>> and perl and the corresponding mathematical paper can be found at
git://github.com/frecker/gaussian-distribution.git .

Björn-Egil Dahlberg

unread,
Oct 29, 2012, 7:34:43 AM10/29/12
to erlang-q...@erlang.org
Do we lack a proper statistics library in erlang?

If so, should we add one to stdlib? or should it be a standalone app?

I realize that there are already a couple out there, for example
basho_stats but perhaps we need a unified one. One that is a lib and not
a process framework.

I began some thoughts for a statistics lib at
https://github.com/psyeugenic/matstat

I added what I usually need when doing measurements and looked at what
other languages/runtimes has implemented but back to the question. Do we
lack a library and should we organize us to begin constructing one together?

// Björn-Egil

Frank Recker

unread,
Oct 29, 2012, 8:31:55 AM10/29/12
to erlang-q...@erlang.org
I think the Gnu Scientific Library (http://www.gnu.org/software/gsl/) has
everything one ever needs. But if you cannot include it (as it is the case
for me) a nativ erlang library would be great.

Frank

Robert Virding

unread,
Oct 29, 2012, 9:10:59 AM10/29/12
to Frank Recker, erlang-q...@erlang.org
Wouldn't a better solution be to have a BEAM implementation of the error function implemented when it doesn't occur in the system math library so that the math module always provides this function?

Robert

Frank Recker

unread,
Oct 29, 2012, 11:52:55 AM10/29/12
to erlang-q...@erlang.org
Sorry, but I don't overlook the ramifications of this approach. My feeling
is that it might be what we call in german "mit Kanon auf Spatzen
schiessen" (dict.leo.org translates this to "to crack a nut with a
sledgehammer").

Frank

Robert Virding

unread,
Oct 31, 2012, 2:35:34 PM10/31/12
to Frank Recker, erlang-q...@erlang.org
Maybe if we were just looking at your case this would be overkill. But in general I think it would be reasonable to expect library modules to behave the same in all releases, at least when they are not doing something OS specific. Which is not being done here.

Evan Miller

unread,
Oct 31, 2012, 9:59:01 PM10/31/12
to Frank Recker, erlang-q...@erlang.org
On Mon, Oct 29, 2012 at 5:59 AM, Frank Recker <frank....@arcor.de> wrote:
> Your points are correct, but I had reasons for my design choices.
>
> - math:erf/1 was not available in my windows version of erlang
>
> - under linux math:erf/1 delivers the value 1 for large x (x>7), which is
> problematic. The exact value is always in the open interval (-1,1).
>

From the docs:

"erfc(X) returns 1.0 - erf(X), computed by methods that avoid
cancellation for large X."

You can calculate the CDF for large x with

Phi(x) = 1 - erfc(x / sqrt(2)) / 2
--
Evan Miller
http://www.evanmiller.org/

Evan Miller

unread,
Oct 31, 2012, 10:01:12 PM10/31/12
to Robert Virding, erlang-q...@erlang.org
On Mon, Oct 29, 2012 at 8:10 AM, Robert Virding
<robert....@erlang-solutions.com> wrote:
> Wouldn't a better solution be to have a BEAM implementation of the error function implemented when it doesn't occur in the system math library so that the math module always provides this function?

+1

>
> Robert

Richard O'Keefe

unread,
Nov 1, 2012, 1:52:09 AM11/1/12
to Evan Miller, erlang-q...@erlang.org
> On Mon, Oct 29, 2012 at 5:59 AM, Frank Recker <frank....@arcor.de> wrote:
>> Your points are correct, but I had reasons for my design choices.
>>
>> - math:erf/1 was not available in my windows version of erlang
>>
>> - under linux math:erf/1 delivers the value 1 for large x (x>7), which is
>> problematic. The exact value is always in the open interval (-1,1).

This is floating-point we are talking about.
For sufficiently large x, the closest floating point number WILL
be 1, like it or not. erf() is doing the best it possibly can.

Frank Recker

unread,
Nov 1, 2012, 1:54:02 PM11/1/12
to erlang-q...@erlang.org
I agree. And I frankly can hardly see a reason why math:erf/1 is not
available under windows.

Frank

On Wed, October 31, 2012 19:35, Robert Virding wrote:
> Maybe if we were just looking at your case this would be overkill. But
in
> general I think it would be reasonable to expect library modules to
behave
> the same in all releases, at least when they are not doing something OS
specific. Which is not being done here.
> Robert
> ----- Original Message -----
>> From: "Frank Recker" <frank....@arcor.de>
>> To: erlang-q...@erlang.org
>> Sent: Monday, 29 October, 2012 8:52:55 AM
>> Subject: Re: [erlang-questions] Gaussian Distribution
>> Sorry, but I don't overlook the ramifications of this approach. My feeling
>> is that it might be what we call in german "mit Kanon auf Spatzen
schiessen" (dict.leo.org translates this to "to crack a nut with a
sledgehammer").
>> Frank
>> On Mon, October 29, 2012 14:10, Robert Virding wrote:
>> > Wouldn't a better solution be to have a BEAM implementation of the error
>> > function implemented when it doesn't occur in the system math library so
>> > that the math module always provides this function?
>> > Robert
>> > ----- Original Message -----
>> >> From: "Frank Recker" <frank....@arcor.de>
>> >> To: erlang-q...@erlang.org
>> >> Sent: Monday, 29 October, 2012 3:59:58 AM
>> >> Subject: Re: [erlang-questions] Gaussian Distribution
>> >> Your points are correct, but I had reasons for my design choices. -
math:erf/1 was not available in my windows version of erlang - under
linux math:erf/1 delivers the value 1 for large x (x>7), which is
>> >> problematic. The exact value is always in the open interval
>> >> (-1,1).

Frank Recker

unread,
Nov 1, 2012, 4:33:51 PM11/1/12
to erlang-q...@erlang.org
On Thu, November 1, 2012 06:52, Richard O'Keefe wrote:
>> On Mon, Oct 29, 2012 at 5:59 AM, Frank Recker <frank....@arcor.de>
>> wrote:
>>> Your points are correct, but I had reasons for my design choices.
>>>
>>> - math:erf/1 was not available in my windows version of erlang
>>>
>>> - under linux math:erf/1 delivers the value 1 for large x (x>7), which
>>> is
>>> problematic. The exact value is always in the open interval (-1,1).
>
> This is floating-point we are talking about.
> For sufficiently large x, the closest floating point number WILL
> be 1, like it or not. erf() is doing the best it possibly can.

Right, but at you cannot blame the floating accuray of erlang alone. Here
is an example under Linux (where erf exists):

1> math:erf(6.0).
1.0
2> gaussianDistribution:integral(6.0,500).
0.9999999991312835

Frank
Reply all
Reply to author
Forward
0 new messages