Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Efficient Histogram Algorithm?

77 views
Skip to first unread message

Julian Francis

unread,
Oct 11, 2010, 5:17:43 AM10/11/10
to
Dear all,

Does anyone know of a way to implement a fast histogram algorithm in
Mathematica?

My first attempt (In44) was based around a table and using the count
function. A bit of thought indicates that this might not be very
efficient as it would seem to involve examining the list 255 times,
when only once is necessary. The second attempt involved using the
built in library function BinCounts, but surprisingly this was only
about twice as fast as my initial naive version, which makes me wonder
if it is as efficient as it could be. I can assume for my purposes
that the list will always have integer values between 0 and 255, and
that there are 256 bins.

I wasn't able to think of a way to implement the histogram function in
a functional programming style that was also efficient, although it is
fairly easy to write in an imperative language. This may reflect the
vast majority of my programming experience has been in imperative
languages.

I am using an old version of Mathematica (v4.1) (but on a modern PC);
but if anyone has dramatically different times for the BinCounts
function, then maybe it is time for me to upgrade.

My motivation is a pattern recognition algorithm I am implementing,
but it requires a large number of histograms to be created, hence the
running time of an individual histogram is actually quite important.

Thanks for any help,
Julian.

In[44]:=
jhistogram[list_] := Table[Count[list, n], {n, 0, 255}]

In[51]:=
\!\(Timing[\(randomList = Table[Random[Integer, 255], {10\^6}];\)]\)

Out[51]=
{1.373 Second, Null}

In[54]:=
Timing[hist = jhistogram[randomList];]

Out[54]=
{2.34 Second, Null}

In[56]:=
<< Statistics`

In[57]:=
Timing[hist = BinCounts[randomList, {0, 255}];]

Out[57]=
{0.936 Second, Null}

Bob Hanlon

unread,
Oct 12, 2010, 4:23:40 AM10/12/10
to

Tally is much faster but was introduced in Version 6.

jhistogram[list_] :=
Table[Count[list, n], {n, 0, 255}]

jhistogram3[list_] :=
Sort[Tally[list]][[All, 2]]

randomList = RandomInteger[{0, 255}, 10^6];

Timing[hist1 =
jhistogram[randomList];][[1]]

0.360395

For BinCounts you need to use 256 as the upper bound

Timing[hist2 =
BinCounts[randomList, {0, 256}];][[1]]

0.278348

Timing[hist3 =
jhistogram3[randomList];][[1]]

0.003623

hist1 == hist2 == hist3

True

Total[hist1] == Total[hist2] ==
Total[hist3] == 10^6

True


Bob Hanlon

---- Julian Francis <julian.w...@gmail.com> wrote:

=============

Sjoerd C. de Vries

unread,
Oct 12, 2010, 4:26:05 AM10/12/10
to
My laptop is 5 years old but on V7 it takes BinCounts 0.468 s to
calculate this.

Cheers -- Sjoerd

On Oct 11, 11:17 am, Julian Francis <julian.w.fran...@gmail.com>
wrote:

Ray Koopman

unread,
Oct 12, 2010, 4:26:15 AM10/12/10
to

Here's a compiled routine that's twice as fast as BinCounts. Also,
note that you need to give BinCounts a non-intuitive set of bins.

In[1]:= << Statistics`

In[2]:= jhistogram[list_] := Table[Count[list,n],{n,0,255}]

In[3]:= freqs = Compile[{{data,_Integer,1}},
Module[{bins = Table[0,{256}]},
Scan[bins[[#]]++&,data+1]; bins]];

In[4]:= Timing@Length[randomList = Table[Random[Integer,255],{10^6}]]
Timing@Length[hist1 = jhistogram[randomList]]
Timing@Length[hist2 = BinCounts[randomList,{-1,255}]]
Timing@Length[hist3 = freqs[randomList]]
SameQ[hist1,hist2,hist3]

Out[4]= {1.26 Second,1000000}
Out[5]= {2.5 Second,256}
Out[6]= {0.9 Second,256}
Out[7]= {0.45 Second,256}
Out[8]= True

Julian Francis

unread,
Oct 12, 2010, 1:50:55 PM10/12/10
to
Dear all,

Thanks very much for all your replies. It looks like I can get a
performance increase using Ray's compiled function. There seems to be
an enormous speed up using the Tally function, but for me, that means
shelling out for a new Mathematica license. The new home pricing
structure makes that a bit more bearable. Maybe it is time!

Thanks all.

Regards,
Julian.

Darren Glosemeyer

unread,
Oct 12, 2010, 1:51:17 PM10/12/10
to

BinCounts is really a function for continuous values. Given that we know
the values of interest are integers from 0 to 255, this could be made
even faster by using Tally (added in version 6).

In[1]:= jhistogram[list_] := Table[Count[list, n], {n, 0, 255}]

In[2]:= freqs = Compile[{{data, _Integer, 1}},


Module[{bins = Table[0, {256}]}, Scan[bins[[#]]++ &, data + 1];
bins]];

In[3]:= Timing@Length[randomList = Table[Random[Integer, 255], {10^6}]]

Out[3]= {0.391, 1000000}

In[4]:= Timing@Length[hist1 = jhistogram[randomList]]

Out[4]= {0.531, 256}

In[5]:= tallybased = With[{tallies = Tally[randomList]},
Split[
Sort[Join[tallies,
Transpose[{Range[0, 255],
ConstantArray[0,
256]}]]], #[[1]] == #2[[1]] &][[All, -1, -1]]]; //
Timing

-17
Out[5]= {1.40946 10 , Null}

In[6]:= SameQ[hist1, tallybased]

Out[6]= True


Tally gives a list of {value, count} pairs for the elements in the list.
The code above works by getting the tallies for the values in list,
adding elements of the form {i, 0} (this is done so we can make sure a 0
count is present if the value is not in the list) for i from 0 to 255,
sorting so like i values are next to each other and the 0 count will
come first, splitting based on the i value, and picking out the count
from the last {i,count} pair in the group for each i so we only get the
actual counts for the list and not any additional padding 0s.

Darren Glosemeyer
Wolfram Research

Julian Francis

unread,
Oct 15, 2010, 2:02:45 PM10/15/10
to

Have gone with jhistogram3:

jhistogram3[list_] :=
Sort[Tally[Join[Range[0, 255], list]]][[All, 2]] - 1

In[96]:= Timing[hist3 = jhistogram3[randomList];][[1]]


Out[96]= 0.015

In[98]:= tallybased =


With[{tallies = Tally[randomList]},
Split[Sort[
Join[tallies,
Transpose[{Range[0, 255],
ConstantArray[0,
256]}]]], #[[1]] == #2[[1]] &][[All, -1, -1]]]; // Timing

Out[98]= {0.016, Null}

It is based on Bob's solution, but incorporates (insight from Darren)
a fix where if there were missing values in the random array, it
wouldn't have returned a value (of 0) for that.
The Range ensures that all values in the histogram will return, then I
subtract one from each of the histogram buckets to compensate for the
fact that I put an extra one in in the
first place.

They perform pretty similarly, I just find the first one easier for me
to understand.

Thanks all.
Julian.

Julian Francis

unread,
Oct 16, 2010, 12:15:17 PM10/16/10
to

Have gone with jhistogram3:

jhistogram3[list_] :=
Sort[Tally[Join[Range[0, 255], list]]][[All, 2]] - 1

In[96]:= Timing[hist3 = jhistogram3[randomList];][[1]]


Out[96]= 0.015

In[98]:= tallybased =


With[{tallies = Tally[randomList]},
Split[Sort[
Join[tallies,
Transpose[{Range[0, 255],
ConstantArray[0,
256]}]]], #[[1]] == #2[[1]] &][[All, -1, -1]]]; // Timing

Out[98]= {0.016, Null}

0 new messages