Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

GEOMETRIC mean computation: TEST of proposed tecniques for

1 view
Skip to first unread message

pamela fluente

unread,
Jul 19, 2007, 3:25:21 AM7/19/07
to
Continuing from:
http://groups.google.it/group/sci.math/browse_thread/thread/c2307b8feb885a0d/8ec987275257c26f?lnk=raot&hl=it#8ec987275257c26f

I don't see my posts (?). Perhaps Google is holding them.
I will try starting a new thread with some tests on your algorithm.


Result (I have attached some fantasy label: please correct if
mistaken)
Second number is milliseconds
========================================================


1000000 Numbers Random (0,1)

Naive accumulation: 0, 21
Naive exponentiation: 0,368280459683138, 199
Raymond sum log: 0,368280459683132, 70
Raymond sum log blocks: 0,368280459683132, 92
Peter multiplication blocks: 0,368280459683136, 62
Peter log blocks: 0,368280459683132, 91
David bounding: 0,368280459683126, 81

Naive accumulation: 0, 19
Naive exponentiation: 0,367757141069834, 199
Raymond sum log: 0,36775714106984, 71
Raymond sum log blocks: 0,36775714106984, 92
Peter multiplication blocks: 0,367757141069837, 63
Peter log blocks: 0,36775714106984, 91
David bounding: 0,367757141069844, 81


1000000 Numbers Random (0,5000)

Naive accumulation: +Infinito, 341
Naive exponentiation: 1840,71557682207, 193
Raymond sum log: 1840,71557682178, 70
Raymond sum log blocks: 1840,71557682178, 94
Peter multiplication blocks: 1840,7155768222, 119
Peter log blocks: 1840,71557682178, 91
David bounding: 1840,71557682221, 297

Naive accumulation: +Infinito, 340
Naive exponentiation: 1840,19546318342, 191
Raymond sum log: 1840,19546318354, 71
Raymond sum log blocks: 1840,19546318354, 92
Peter multiplication blocks: 1840,19546318367, 119
Peter log blocks: 1840,19546318354, 91
David bounding: 1840,19546318349, 296


Quick Test Program (please correct if wrong). I
=========================================

'GEOMETRIC MEAN FOR A STREAM OF ARBITRARY LENGTH

Dim SomeThreshold As Double = 10000000
Dim TempStore As Double
Dim SomeEpsilon As Double = 0.0001

'Fill a list of number -------------------
Dim n As Long = 1000000
Dim Random As New Random
Dim LongListOfNumbers As New List(Of Double)
For i As Long = 0 To n - 1
LongListOfNumbers.Add(Random.NextDouble() * 5000)
Next i
Dim Watch As New Stopwatch
Dim Sb As New System.Text.StringBuilder
Watch.Start()

'TEST ===========================

Watch.Reset() : Watch.Start()
'METHOD1 ----------------------------------
'Naive way: not works, gives overflow

Dim GeometricMean1 As Double = 1
For Each Number As Double In LongListOfNumbers
GeometricMean1 *= Number
Next Number
GeometricMean1 ^= (1 / n)

Sb.Append(vbCrLf & "Naive accumulation: " & GeometricMean1 &
", " & Watch.ElapsedMilliseconds)


Watch.Reset() : Watch.Start()
'METHOD1 ----------------------------------
'Naive way: variant with immediate exponentiation

Dim GeometricMean1A As Double = 1
For Each Number As Double In LongListOfNumbers
GeometricMean1A *= Number ^ (1 / n)
Next Number

Sb.Append(vbCrLf & "Naive exponentiation: " & GeometricMean1A
& ", " & Watch.ElapsedMilliseconds)


Watch.Reset() : Watch.Start()
'METHOD2 ----------------------------------
'Raymond proposal

Dim GeometricMean2 As Double = 0
For Each Number As Double In LongListOfNumbers
GeometricMean2 += Math.Log(Number)
Next Number
GeometricMean2 = Math.Exp(GeometricMean2 / n)

Sb.Append(vbCrLf & "Raymond sum log: " & GeometricMean2 & ", "
& Watch.ElapsedMilliseconds)


Watch.Reset() : Watch.Start()
'METHOD2A ----------------------------------
'Raymond proposal in blocks

TempStore = 1
Dim GeometricMean2A As Double = 0
For Each Number As Double In LongListOfNumbers
TempStore *= Number
If Math.Abs(TempStore) > SomeThreshold Then
GeometricMean2A += Math.Log(TempStore)
TempStore = 1
ElseIf Math.Abs(TempStore) < SomeThreshold Then
'This necessary for numbers in (0,1)!
GeometricMean2A += Math.Log(TempStore)
TempStore = 1
End If
Next Number
If TempStore <> 1 Then GeometricMean2A += Math.Log(TempStore)
GeometricMean2A = Math.Exp(GeometricMean2A / n)

Sb.Append(vbCrLf & "Raymond sum log blocks: " &
GeometricMean2A & ", " & Watch.ElapsedMilliseconds)


Watch.Reset() : Watch.Start()
'METHOD3 ----------------------------------
'Peter proposal

TempStore = 1
Dim GeometricMean3 As Double = 1
For Each Number As Double In LongListOfNumbers
TempStore *= Number
If Math.Abs(TempStore) > SomeThreshold Then
GeometricMean3 = GeometricMean3 * TempStore ^ (1 / n)
TempStore = 1
ElseIf Math.Abs(TempStore) < SomeEpsilon Then
'This necessary for numbers in (0,1)!
GeometricMean3 = GeometricMean3 * TempStore ^ (1 / n)
TempStore = 1
End If
Next Number
If TempStore <> 1 Then GeometricMean3 = GeometricMean3 *
TempStore ^ (1 / n)

Sb.Append(vbCrLf & "Peter multiplication blocks: " &
GeometricMean3 & ", " & Watch.ElapsedMilliseconds)

Watch.Reset() : Watch.Start()
'METHOD4 ----------------------------------
'Peter proposal / Variant with sum and log

TempStore = 0
Dim GeometricMean4 As Double = 0
For Each Number As Double In LongListOfNumbers
TempStore += Math.Log(Number)
If Math.Abs(TempStore) > SomeThreshold Then
GeometricMean4 += TempStore
TempStore = 0
ElseIf Math.Abs(TempStore) < SomeEpsilon Then
'This necessary for numbers in (0,1) ??
GeometricMean4 += TempStore
TempStore = 0
End If
Next Number
If TempStore <> 0 Then GeometricMean4 += TempStore
GeometricMean4 = Math.Exp(GeometricMean4 / n)

Sb.Append(vbCrLf & "Peter log blocks: " & GeometricMean4 & ",
" & Watch.ElapsedMilliseconds)


Watch.Reset() : Watch.Start()
'METHOD5 ----------------------------------
'Beeworks proposal
'????????? recursion ? (stack concern?)


Watch.Reset() : Watch.Start()
'METHOD6 ----------------------------------
'David proposal

Dim GeometricMean6 As Double = 1
Dim BigExponent As Integer = 0
For Each Number As Double In LongListOfNumbers

GeometricMean6 *= Number

Do While Math.Abs(GeometricMean6) > SomeThreshold
GeometricMean6 /= 2
BigExponent += 1
Loop
Do While Math.Abs(GeometricMean6) < SomeEpsilon
GeometricMean6 *= 2
BigExponent -= 1
Loop

Next Number
GeometricMean6 = Math.Exp((Math.Log(GeometricMean6) +
BigExponent * Math.Log(2)) / n)

Sb.Append(vbCrLf & "David bounding: " & GeometricMean6 & ", "
& Watch.ElapsedMilliseconds)


Me.RichTextBox1.AppendText(vbCrLf & vbCrLf & Sb.ToString)

David Bernier

unread,
Jul 19, 2007, 5:59:16 AM7/19/07
to
pamela fluente wrote:
> Continuing from:
> http://groups.google.it/group/sci.math/browse_thread/thread/c2307b8feb885a0d/8ec987275257c26f?lnk=raot&hl=it#8ec987275257c26f
>
> I don't see my posts (?). Perhaps Google is holding them.
> I will try starting a new thread with some tests on your algorithm.

Google can be late in adding messages to Google Groups archives.
I post through a news-server in Canada, and it is independent
completely from Google Groups.

I recommend the product of 20 terms then log(.) of Raymond Manzoni.
It would be faster than my idea. Also, it would be quite helpful
for those who know about computing geometric means efficiently
if

----> you can give constants C>0 and D>0 such that
----> C < | a(n)* a(n+1)* .... a(n+19)| < D
----> *always* for your data...

In the spirit of Raymond Manzoni's idea, it is very important
that multiplying 20 consecutive terms does *not* cause
Overflow or Underflow....

Only you can tell people of sci.math the maximum D and the
minimum C. Also, do you use floating point numbers? If so,
are they 32-bit, 64-bit or something else?

You see, without C and D, we can't say if there is a
possibility of Overflow or Underflow when multiplying
20 terms (both Overflow and Underflow are bad).

Regards,

David Bernier

pamela fluente

unread,
Jul 19, 2007, 4:50:07 PM7/19/07
to
On 19 Lug, 11:59, David Bernier <david...@videotron.ca> wrote:

> Google can be late in adding messages to Google Groups archives.
> I post through a news-server in Canada, and it is independent
> completely from Google Groups.

Could you provide the URL? I used to like Google, but now it's
getting really slooow.
Perhaps too crowded, respect to their means.
Please, suggest some free and speedy place where read and post news.

>
> I recommend the product of 20 terms then log(.) of Raymond Manzoni.
> It would be faster than my idea. Also, it would be quite helpful
> for those who know about computing geometric means efficiently
> if


Actually I did not implement it in that way. I preferred to use a
bound.
Do not think that it changes about performance, but I regard
arbitrary choosing
a group size, and also just the product of a few terms could well
overflow.

I liked initially you idea, but when I went to implement it I
discovered the shortcomings:
Bigexponent blowing up and overflow reappering at the end with
2^Bigexponent .
Even correcting that all those division/multiplication by 2 are
often slowing down.
For numbers in (0,1) is not too bad, though.

So who is the winner so far ? Who knows!
Problem is that perhaps also the rounding error should be
considered.
For pure performances the winners seem:

It seems that for numbers out of (0,1) - Raymond sum log
70 Vs 120 approx
In (0,1) - Peter multiplication blocks
60 Vs 70 approx

Probably, considering all, the best would be the Raymond suggestion
done in blocks.
(In my implementation block size is variable and due to limiting
bounds.)


Here is a summary:

=============================
1000000 Numbers Random (0,1)

Naive accumulation: 0, 22
Naive exponentiation: 0,367823977303577, 199
Raymond sum log: 0,367823977303575, 70
Raymond sum log blocks: 0,367823977303575, 93
Peter multiplication blocks: 0,367823977303568, 54
Peter log blocks: 0,367823977303575, 91
David bounding: 0,367823977303565, 81


==================================
1000000 Numbers Random (0,5000)


Naive accumulation: +Infinito, 340
Naive exponentiation: 1838,58110783913, 191
Raymond sum log: 1838,58110783929, 70
Raymond sum log blocks: 1838,58110783929, 94
Peter multiplication blocks: 1838,58110783902, 120
Peter log blocks: 1838,58110783929, 92
David bounding: 1838,58110783901, 297


So what are your final remarks?

Thanks all for the many ideas contributed. You really
enlarged my views!

-P

David Bernier

unread,
Jul 19, 2007, 8:46:45 PM7/19/07
to
pamela fluente wrote:
> On 19 Lug, 11:59, David Bernier <david...@videotron.ca> wrote:
>
>> Google can be late in adding messages to Google Groups archives.
>> I post through a news-server in Canada, and it is independent
>> completely from Google Groups.
>
> Could you provide the URL? I used to like Google, but now it's
> getting really slooow.
> Perhaps too crowded, respect to their means.
> Please, suggest some free and speedy place where read and post news.

I can post using a Videotron news-server because, in the
house where I live, we subscribe to Videotron cable TV and
Internet service. Since you appear to live in Italy,
unfortunately Videotron won't give you a user-name for
e-mail or USENET access.

From the IP address in the headers section of your post,
it seems that you might be getting Internet service from
McLink, with homepage:
http://www.mclink.it/

Your ISP may have its own news-server from which you can read
sci.math and post to it.

If you just want to read news from newsgroups, you can
configure Thunderbird (Mozilla foundation) to access
"free" news-servers.

cf.:
http://freenews.maxbaud.net/

reader2.news.tin.it may be a good choice.

If your ISP has no news-servers, and you don't want to post
from Google, you can get a free account for sci.math posting
from here:
http://mathforum.org/kb/help.jspa

Other than that, some people chose to pay companies who
offer USENET post/read privileges and access.


>> I recommend the product of 20 terms then log(.) of Raymond Manzoni.
>> It would be faster than my idea. Also, it would be quite helpful
>> for those who know about computing geometric means efficiently
>> if
>
>
> Actually I did not implement it in that way. I preferred to use a
> bound.
> Do not think that it changes about performance, but I regard
> arbitrary choosing
> a group size, and also just the product of a few terms could well
> overflow.
>
> I liked initially you idea, but when I went to implement it I
> discovered the shortcomings:
> Bigexponent blowing up and overflow reappering at the end with
> 2^Bigexponent .

In my experiment computing 10^9 !, Bigexponent was stored
in two long integers: the millions part and the units part.

The right idea is to take logs at the end.
log(2^Bigexponent) = Bigexponent*log(2), so
I convert Bigexponent to a double precision floating point
number, and then compute
Bigexponent*log(2). Then add to that log(mantissa),
divide the result by the number of terms for
which you want the geometric mean, and finally
take exp(last_result), where last_result is your
latest result.

Instead of 2, one could divide or multiply
the float PartialProduct by 1000
when either 0.001>PartialProduct or 1000<PartialProduct.
When multiplying by a number over 10^7, we divide by
1000 twice, and so on.

> Even correcting that all those division/multiplication by 2 are
> often slowing down.
> For numbers in (0,1) is not too bad, though.

Dividing or multiplying by 1000 (or 1000000) should speed
things up, however the best method for you is perhaps
best found by experiments with Raymond's method and the
others proposed already.

> So who is the winner so far ? Who knows!
> Problem is that perhaps also the rounding error should be
> considered.

Yes. You could test the accuracy of different methods
by taking the geometric mean of the integers
from 1 to 10^8 or from 1 to 10^9. High accuracy
approximations similar to Stirling's formula exist
to compute log(10^8 !) or log(10^9 !).

> For pure performances the winners seem:

[...]

When you mentioned 1,000,000,000,000 numbers, I figured
that that would be 4000 gigabytes of data if each number
has 32 bits and 8000 gigabytes for 64-bit numbers.

If I may ask, where do the 10^12 numbers come from?

David Bernier

pamela fluente

unread,
Jul 20, 2007, 4:49:39 AM7/20/07
to
On 20 Lug, 02:46, David Bernier <david...@videotron.ca> wrote:


> In my experiment computing 10^9 !, Bigexponent was stored
> in two long integers: the millions part and the units part.
>
> The right idea is to take logs at the end.
> log(2^Bigexponent) = Bigexponent*log(2), so
> I convert Bigexponent to a double precision floating point
> number, and then compute
> Bigexponent*log(2). Then add to that log(mantissa),
> divide the result by the number of terms for
> which you want the geometric mean, and finally
> take exp(last_result), where last_result is your
> latest result.

For that I used:


GeometricMean6 = Math.Exp((Math.Log(GeometricMean6) +
BigExponent * Math.Log(2)) / n)

which seemed enough to avoid the overflow.

> Instead of 2, one could divide or multiply
> the float PartialProduct by 1000
> when either 0.001>PartialProduct or 1000<PartialProduct.
> When multiplying by a number over 10^7, we divide by
> 1000 twice, and so on.

What !?? With this change you become the winner !! :-)))
What's going on here ?

See the new result!! Unless this morning my CPU ha changed its mind,
something is going on here:


1000000 Random Numbers in (0,5000) you get 54 millisec

Naive accumulation: +Infinito, 340
Naive exponentiation: 1837,33300618127, 192
Raymond sum log: 1837,33300618142, 71
Raymond sum log blocks: 1837,33300618142, 92
Peter multiplication blocks: 1837,33300618139, 119
Peter log blocks: 1837,33300618142, 91
David bounding: 1837,33300618139, 54

1000000 Random Numbers in (0,1) you get 39 millisec

Naive accumulation: 0, 21
Naive exponentiation: 0,367500523209765, 202
Raymond sum log: 0,367500523209741, 70
Raymond sum log blocks: 0,367500523209741, 93
Peter multiplication blocks: 0,367500523209729, 54
Peter log blocks: 0,367500523209741, 92
David bounding: 0,367500523209733, 39


Here is the code:

'METHOD6 ----------------------------------
'David proposal

Dim GeometricMean6 As Double = 1
Dim BigExponent As Integer = 0

Const Factor As Double = 1000


For Each Number As Double In LongListOfNumbers

GeometricMean6 *= Number

Do While Math.Abs(GeometricMean6) > SomeThreshold
GeometricMean6 /= Factor


BigExponent += 1
Loop
Do While Math.Abs(GeometricMean6) < SomeEpsilon

GeometricMean6 *= Factor
BigExponent -= 1
Loop

Next Number
GeometricMean6 = Math.Exp((Math.Log(GeometricMean6) +

BigExponent * Math.Log(Factor)) / n)


Or the same in C#


//METHOD6 ----------------------------------
//David proposal

double GeometricMean6 = 1;
int BigExponent = 0;
const double Factor = 1000;
foreach (double Number in LongListOfNumbers) {

GeometricMean6 *= Number;

while (Math.Abs(GeometricMean6) > SomeThreshold) {
GeometricMean6 /= Factor;
BigExponent += 1;
}
while (Math.Abs(GeometricMean6) < SomeEpsilon) {
GeometricMean6 *= Factor;
BigExponent -= 1;
}

}
GeometricMean6 = Math.Exp((Math.Log(GeometricMean6) + BigExponent *

Math.Log(Factor)) / n);

dividing by 10.000 :

Naive accumulation: +Infinito, 339
Naive exponentiation: 1838,11155410447, 191
Raymond sum log: 1838,111554104, 70
Raymond sum log blocks: 1838,111554104, 94
Peter multiplication blocks: 1838,11155410435, 119
Peter log blocks: 1838,111554104, 91
David bounding: 1838,11155410441, 53


Naive accumulation: 0, 26
Naive exponentiation: 0,367685094740025, 199
Raymond sum log: 0,367685094740025, 70
Raymond sum log blocks: 0,367685094740025, 92
Peter multiplication blocks: 0,367685094740025, 54
Peter log blocks: 0,367685094740025, 91
David bounding: 0,367685094740026, 38

>
> > Even correcting that all those division/multiplication by 2 are
> > often slowing down.
> > For numbers in (0,1) is not too bad, though.
>

>


> > So who is the winner so far ? Who knows!
> > Problem is that perhaps also the rounding error should be
> > considered.
>
> Yes. You could test the accuracy of different methods
> by taking the geometric mean of the integers
> from 1 to 10^8 or from 1 to 10^9. High accuracy
> approximations similar to Stirling's formula exist
> to compute log(10^8 !) or log(10^9 !).
>

Apart comparison, which anyway would be done versus other computed
values,
what are the *theoretical tecniques* to compare these methods?

Is anybody able to provide a brief sketch of *approximation
analysis* done just
with pencil and paper ?

[And how is it called this kind of analysis. Are there poiinters for
it?]

>
> If I may ask, where do the 10^12 numbers come from?
>

I just mean a stream of data. Something that cannot be hold in
memory. (I learned
from programming groups that sometimes you need to brutally cut out
the possibility people tend to be problem-specific.
Programming is much about being general.


A curiosity. When I multiply many numbers greater than 1 I am going
to overflow,
and this is clear.

When I multiply many numbers in [0,1] I am going to ...
"underflow" ?! :-)) to 0 (zero)
How is this phenomenon called, is it also an "overflow" or has a
specific name ?

-P

David Bernier

unread,
Jul 20, 2007, 7:26:59 AM7/20/07
to
pamela fluente wrote:
> On 20 Lug, 02:46, David Bernier <david...@videotron.ca> wrote:
[...]

> A curiosity. When I multiply many numbers greater than 1 I am going
> to overflow,
> and this is clear.
>
> When I multiply many numbers in [0,1] I am going to ...
> "underflow" ?! :-)) to 0 (zero)
> How is this phenomenon called, is it also an "overflow" or has a
> specific name ?

In the IEEE 754 standard, there are so-called "denormalized numbers".
These numbers are so small (in absolute value) that the
mantissa part has fewer significant bits than for
"normalized numbers". If we multiply random non-zero
numbers in (0,1), we either go straight from
normalized to real underflow (practically 0 ?), or else
at some stage we pass through one or more "denormalized numbers"...

See e.g. the section _The Answer_ from:
< http://xona.com/2006/07/26.html > .

David Bernier

Raymond Manzoni

unread,
Jul 21, 2007, 5:33:59 AM7/21/07
to
David Bernier a écrit :

Hi David,

In fact if your language allows exception handling you may write code
without this knowledge (merely supposing that unapropriate data will be
uncommon!) :

- in a global way (in C++ for example) :

try
//fast complete (say) 32 terms code
except (...)
{
//slower and more cautious code
}

- in a local way (catching failure in one of the sum term only) :

#define MAX 31

while (s+MAX <= sMax)
{
try
r = log(s[0]*...*s[MAX]); //or log(fabs(...))
except (...)
{
//something more cautious log(0) not wished!!!
r= 0;
for (int i=0; i<=MAX; i++)
if (s[i] != 0.0)
r += log(fabs(s[i]));
}
sum += r;
s += MAX;
}

The exception catching method may be of use too for the other methods
proposed....

Best regards!
Raymond

Raymond Manzoni

unread,
Jul 21, 2007, 5:45:57 AM7/21/07
to
Raymond Manzoni a écrit :

> }
> sum += r;
> s += MAX;
Oops s += MAX+1;
> }

Ok these programmation points were a little OT let's go back to
mathematics! :-)

Raymond

pamela fluente

unread,
Jul 21, 2007, 5:48:00 AM7/21/07
to
On 21 Lug, 11:33, Raymond Manzoni <raym...@free.fr> wrote:
> David Bernier a écrit :
>
>
>
>
>
> > pamela fluente wrote:
> >> Continuing from:
> >>http://groups.google.it/group/sci.math/browse_thread/thread/c2307b8fe...
> Raymond- Nascondi testo tra virgolette -
>
> - Mostra testo tra virgolette -

Yes Raymond. I also thought about that.

Instead of checking that the mean is greater that an upport threshold
of below a lower one, using a try catch on overflow.

However, I think this would slow down terribly the method because then
you would need several
operation to reduce your mean within reasonable bounds, and this slows
down.
If you dont, you remain "close" to the error condition, you are going
to get lots of exceptions
and this is also quite slowing down.

So probably we need a careful choice of the 2 threshold and the
reduction factor (2, 1000, or whatever) .


-P


0 new messages