Does anyone know of a way to implement a fast histogram algorithm in
Mathematica?
My first attempt (In44) was based around a table and using the count
function. A bit of thought indicates that this might not be very
efficient as it would seem to involve examining the list 255 times,
when only once is necessary. The second attempt involved using the
built in library function BinCounts, but surprisingly this was only
about twice as fast as my initial naive version, which makes me wonder
if it is as efficient as it could be. I can assume for my purposes
that the list will always have integer values between 0 and 255, and
that there are 256 bins.
I wasn't able to think of a way to implement the histogram function in
a functional programming style that was also efficient, although it is
fairly easy to write in an imperative language. This may reflect the
vast majority of my programming experience has been in imperative
languages.
I am using an old version of Mathematica (v4.1) (but on a modern PC);
but if anyone has dramatically different times for the BinCounts
function, then maybe it is time for me to upgrade.
My motivation is a pattern recognition algorithm I am implementing,
but it requires a large number of histograms to be created, hence the
running time of an individual histogram is actually quite important.
Thanks for any help,
Julian.
In[44]:=
jhistogram[list_] := Table[Count[list, n], {n, 0, 255}]
In[51]:=
\!\(Timing[\(randomList = Table[Random[Integer, 255], {10\^6}];\)]\)
Out[51]=
{1.373 Second, Null}
In[54]:=
Timing[hist = jhistogram[randomList];]
Out[54]=
{2.34 Second, Null}
In[56]:=
<< Statistics`
In[57]:=
Timing[hist = BinCounts[randomList, {0, 255}];]
Out[57]=
{0.936 Second, Null}
jhistogram[list_] :=
Table[Count[list, n], {n, 0, 255}]
jhistogram3[list_] :=
Sort[Tally[list]][[All, 2]]
randomList = RandomInteger[{0, 255}, 10^6];
Timing[hist1 =
jhistogram[randomList];][[1]]
0.360395
For BinCounts you need to use 256 as the upper bound
Timing[hist2 =
BinCounts[randomList, {0, 256}];][[1]]
0.278348
Timing[hist3 =
jhistogram3[randomList];][[1]]
0.003623
hist1 == hist2 == hist3
True
Total[hist1] == Total[hist2] ==
Total[hist3] == 10^6
True
Bob Hanlon
---- Julian Francis <julian.w...@gmail.com> wrote:
=============
Cheers -- Sjoerd
On Oct 11, 11:17 am, Julian Francis <julian.w.fran...@gmail.com>
wrote:
Here's a compiled routine that's twice as fast as BinCounts. Also,
note that you need to give BinCounts a non-intuitive set of bins.
In[1]:= << Statistics`
In[2]:= jhistogram[list_] := Table[Count[list,n],{n,0,255}]
In[3]:= freqs = Compile[{{data,_Integer,1}},
Module[{bins = Table[0,{256}]},
Scan[bins[[#]]++&,data+1]; bins]];
In[4]:= Timing@Length[randomList = Table[Random[Integer,255],{10^6}]]
Timing@Length[hist1 = jhistogram[randomList]]
Timing@Length[hist2 = BinCounts[randomList,{-1,255}]]
Timing@Length[hist3 = freqs[randomList]]
SameQ[hist1,hist2,hist3]
Out[4]= {1.26 Second,1000000}
Out[5]= {2.5 Second,256}
Out[6]= {0.9 Second,256}
Out[7]= {0.45 Second,256}
Out[8]= True
Thanks very much for all your replies. It looks like I can get a
performance increase using Ray's compiled function. There seems to be
an enormous speed up using the Tally function, but for me, that means
shelling out for a new Mathematica license. The new home pricing
structure makes that a bit more bearable. Maybe it is time!
Thanks all.
Regards,
Julian.
BinCounts is really a function for continuous values. Given that we know
the values of interest are integers from 0 to 255, this could be made
even faster by using Tally (added in version 6).
In[1]:= jhistogram[list_] := Table[Count[list, n], {n, 0, 255}]
In[2]:= freqs = Compile[{{data, _Integer, 1}},
Module[{bins = Table[0, {256}]}, Scan[bins[[#]]++ &, data + 1];
bins]];
In[3]:= Timing@Length[randomList = Table[Random[Integer, 255], {10^6}]]
Out[3]= {0.391, 1000000}
In[4]:= Timing@Length[hist1 = jhistogram[randomList]]
Out[4]= {0.531, 256}
In[5]:= tallybased = With[{tallies = Tally[randomList]},
Split[
Sort[Join[tallies,
Transpose[{Range[0, 255],
ConstantArray[0,
256]}]]], #[[1]] == #2[[1]] &][[All, -1, -1]]]; //
Timing
-17
Out[5]= {1.40946 10 , Null}
In[6]:= SameQ[hist1, tallybased]
Out[6]= True
Tally gives a list of {value, count} pairs for the elements in the list.
The code above works by getting the tallies for the values in list,
adding elements of the form {i, 0} (this is done so we can make sure a 0
count is present if the value is not in the list) for i from 0 to 255,
sorting so like i values are next to each other and the 0 count will
come first, splitting based on the i value, and picking out the count
from the last {i,count} pair in the group for each i so we only get the
actual counts for the list and not any additional padding 0s.
Darren Glosemeyer
Wolfram Research
jhistogram3[list_] :=
Sort[Tally[Join[Range[0, 255], list]]][[All, 2]] - 1
In[96]:= Timing[hist3 = jhistogram3[randomList];][[1]]
Out[96]= 0.015
In[98]:= tallybased =
With[{tallies = Tally[randomList]},
Split[Sort[
Join[tallies,
Transpose[{Range[0, 255],
ConstantArray[0,
256]}]]], #[[1]] == #2[[1]] &][[All, -1, -1]]]; // Timing
Out[98]= {0.016, Null}
It is based on Bob's solution, but incorporates (insight from Darren)
a fix where if there were missing values in the random array, it
wouldn't have returned a value (of 0) for that.
The Range ensures that all values in the histogram will return, then I
subtract one from each of the histogram buckets to compensate for the
fact that I put an extra one in in the
first place.
They perform pretty similarly, I just find the first one easier for me
to understand.
Thanks all.
Julian.
jhistogram3[list_] :=
Sort[Tally[Join[Range[0, 255], list]]][[All, 2]] - 1
In[96]:= Timing[hist3 = jhistogram3[randomList];][[1]]
Out[96]= 0.015
In[98]:= tallybased =
With[{tallies = Tally[randomList]},
Split[Sort[
Join[tallies,
Transpose[{Range[0, 255],
ConstantArray[0,
256]}]]], #[[1]] == #2[[1]] &][[All, -1, -1]]]; // Timing
Out[98]= {0.016, Null}