Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Faster/Slower Version of K. Hinsen's "histogram" [source]

2 views
Skip to first unread message

Jeff Templon

unread,
Jan 11, 1999, 3:00:00 AM1/11/99
to
Hi,

I am writing a document for our software crew about why it might
be good to use some Python glue in the Next Greatest Version.
In so doing, I converted one of my old fitting codes from
Fortran to Python. I needed to generate a spectrum to fit,
so I pull out RNG from the NumPy distribution and the "histogram"
function from Konrad's Statistics.py.

Holy maroley. My disk goes into hyperdrive, Python interpreter
RSS at 24 megabytes ...

The story: his routine creates a 2D array which has as dimensions
<number of histogram bins>, <number of data points> if I understand
the code correctly. In my case, this was something like 1000 x 9000!
I had two other calls to histograms with the same bin count,
and data sizes of 1600 and 450.

I wrote a different version which uses a sieve-like algorithm
to sort the data into a histogram. I am publishing it so that
others can improve it (please let me know, I am fairly new to
serious use of NumPy). Also hopefully it will be useful to
someone (Konrad, feel free to include it as "bighistogram"
or whatever in your package.)

It is slower and faster. It uses twice as much CPU time as
does Konrad's version, but only 7% of the wall clock time
on my 32 MB machine.

Below is included stat2.py which contains the new histogram
generator, as well as a test program gausgen3.py. Konrad,
my call to your histogram function looks funny, since I removed
the "transpose" call in the return statment (otherwise I'd
just have to transpose it back!)

Timings:
using Konrad's version, 2.95s user, 2.75s sys, 3:04 total
(took so long due to heavy paging)
using my new version, 10.59s user, 0.13s sys, 12.16s total
(more CPU but much less clock)

J "keeping swap partitions alive in nuclear physics" T

ps: Konrad, you use a variable named "range" in Statistics.histogram
which caused me a few moments grief, pondering the message
"call of non-function" when I started to modify your routine.
I changed the name to "hrange".


stat2.py
==================================================
import Numeric

# first several lines lifted verbatim from Konrad's code.
# note I changed "range" to "hrange" for obvious reasons.

def histogram(data, nbins, hrange = None):
data = Numeric.array(data, Numeric.Float)
if hrange is None:
min = Numeric.minimum.reduce(data)
max = Numeric.maximum.reduce(data)
else:
min, max = hrange
data = Numeric.repeat(data,
Numeric.logical_and(Numeric.less_equal(data, max),
Numeric.greater_equal(data,
min)))
bin_width = (max-min)/nbins
bins = min + bin_width*Numeric.arange(nbins)
loedge = bins
hiedge = bins + bin_width
counts = Numeric.zeros(nbins)
tdata = Numeric.array(data,copy=1)

for k in range(nbins) :
abovelo = Numeric.greater_equal(tdata, loedge[k])
belowhi = Numeric.less( tdata, hiedge[k])
counts[k] = Numeric.add.reduce(Numeric.logical_and(abovelo,belowhi))
tdata = Numeric.repeat(tdata, Numeric.logical_not(belowhi))

bins = bins + bin_width/2
return Numeric.array([bins, counts])

==================================================
gausgen3.py
==================================================
#-> gausgen3 -- generate a test histo for curfit program
#-> uses gauss generator from RNG (comes with Numeric)
#-> also uses slightly modified histogram routine from Statistics.py
#-> by Konrad Hinsen http://starship.skyport.net/crew/hinsen/scientific.htm
#-> JAT 01/10/99

from Numeric import *
import RNG
import Statistics
import stat2

#-> this section is sort of the "global data" for the program
#-> for a program of this size, it's not really needed

#-> upper and lower bounds, binsize for histo

lbincen = 0.0 # these are actually the first and last bin centers
ubincen = 10.0
binsize = 0.01 # size of each bin

nbins = (ubincen-lbincen)/binsize + 1
lolim = lbincen - binsize/2.0
hilim = ubincen + binsize/2.0

def main() :

dist1 = RNG.NormalDistribution( 4.42, 0.78 )
dist2 = RNG.NormalDistribution( 7.12, 2.43 )
dist3 = RNG.UniformDistribution(0.0, 10.0)

rng1 = RNG.CreateGenerator(0, dist1)
rng2 = RNG.CreateGenerator(0, dist2)
rng3 = RNG.CreateGenerator(0, dist3)

## st = Statistics # Konrad's histogram
st = stat2 # Jeff's histogram

his1 = st.histogram(rng1.sample(450), nbins, (lolim,hilim) )
his2 = st.histogram(rng2.sample(8965), nbins, (lolim,hilim) )
his3 = st.histogram(rng3.sample(1631), nbins, (lolim,hilim) )

histot = his1[1] + his2[1] + his3[1]
xax = his1[0]

print 'Writing histo to his.out ...'

of = open('his.out', 'w')

for i in range(len(histot)) :
of.write(repr(xax[i]) + ' ' + repr(histot[i]) + '\n')

of.close

print 'Done'

main()

0 new messages