Code speedup tips

Sean Richards

unread,

Mar 1, 2003, 6:36:44 PM3/1/03

to

Hi,

Just started looking at Python as a tool for investigating 2d Cellular
Automata. I am a complete novice in both disciplines so be gentle (I
don't pretend to be a skilled programmer either) ;) Found some code for
a 1d CA which I have crudely modified to work with 2d CA. I would
appreciate some advice on how to make this code more efficient, as it
already takes about 1 minute with a 200X200 array on a 1.2ghz processor.
I can see that the nested loops are the bottleneck but what better
alternatives does Python have for iterating over a 2d array. I have
only being playing with Python for a very short time and there is so
much out there that I am getting myself a bit lost in all the
information. Anyway here is the code with a *very* simple rule as an
example - read it and weep :)

----8<-----------------------------------------------------------------

# Simple cellular automata - code 1022
# Rule specifies that a cell should become black if any
# of its neighbours were black on previous step
# Original code by Frank Buss - http://www.frank-buss.de/automaton/rule30.html

# import the Numeric module - used for the arrays
from Numeric import *
# import the Tk module - used to display the CA
from Tkinter import *

def CA_Function(Length):
# Create the two lattices
current = zeros((Length,Length),Int)
next = zeros((Length,Length),Int)

# Set the start cell to black
current[len(current)/2,len(current)/2] = 1
next[len(current)/2,len(current)/2] = 1

# Apply the rule
for step in xrange(1,(Length/2)-1):
for i in xrange(1,len(current)-1):
for j in xrange(1,len(current)-1):
if current[i-1,j]== 1 or current[i,j-1]== 1 or \
current[i,j+1]== 1 or current[i+1,j]== 1:
next[i,j] = 1
# Swap the lattices at each step
(current,next) = (next,current)
# Draw the lattice
CA_Draw(current)

def CA_Draw(Lattice):
for x in range(1,len(Lattice)-1):
for y in range(1,len(Lattice)-1):
if Lattice[x,y]:
img.put("black",(x,y))
else:
img.put("white",(x,y))

# Initialise the Tk interface
root = Tk()
root.title('2d CA')

# Create the empty image
Length = 200
img=PhotoImage(width=Length,height=Length)

# Apply the function
CA_Function(Length)

# Display image
lbl=Label(root,image=img)
lbl.pack()
root.mainloop()

----8<-----------------------------------------------------------------

On the 200x200 array that gives me 40,000 elements, I go over the entire
array 100 times -> 4,000,000 iterations
Then to fill the image is another 200x200 operations -> 40,000

Any tips on how to make this more efficient would be greatly appreciated.

Sean

--
+---------------------------------------------------------------+
| All spelling errors are intentional and are there to show new |
| and improved ways of spelling old words. |
+---------------------------------------------------------------+

Dagur Páll Ammendrup

unread,

Mar 1, 2003, 6:47:53 PM3/1/03

to sean...@myrealbox.com

Hi

If you haven't already, I recommend reading this:

http://manatee.mojam.com/~skip/python/fastpython.html

-Dagur

Carl Banks

unread,

Mar 1, 2003, 9:02:23 PM3/1/03

to

Sean Richards wrote:
> Hi,

Hello.

[snip]

> I can see that the nested loops are the bottleneck but what better
> alternatives does Python have for iterating over a 2d array.

The nested loops are the bottleneck; however, Numeric does have
alternatives. In fact, Numeric was invented so that you don't have to
iterate over arrays (at the Python level, anyways). Iterating over a
Numeric array is almost always the wrong way to do it.

What you want to do is take advantage of Numeric's built-in iteration,
which is much faster. I'm sure you're aware that adding two arrays
adds all the elements in them. You can apply the automaton in the
same way.

Consider the simple automaton rule here. A cell's value in the
following iteration depends on the values of four pixels from the
current iteration: the cells immediately above, below, left of, and
right of the cell in question. A simple way to apply this rule is to
add the four pixel values, and the resulting pixel is 1 if the sum is
nonzero. We can add the pixel values

Now, you might be thinking, "But Numeric operations only work with
corresponding entries. I can add a[3,2] to b[3,2], but not a[3,2] to
a[3,4]. How can I do this quickly with Numeric?"

The answer is to use slicing. What you want to do is take various
slices from the lattice, each of them of the same size, but offset so
that when you add them, the proper cells line up.

To exemplify: suppose you have an array A of dimension 10x10. Let B
and C be the following slices of A:

B = A[:-2,:-2]
C = A[2:,2:]

Now, let D = B + C. Both B and C are 8x8 arrays, so this is possible.
Consider what the result D[3,3] will be. It would be B[3,3]+C[3,3],
of course. B[3,3] is A[3,3], while C[3,3] is A[5,5]. The end result
is that D[3,3] == A[3,3]+A[5,5]. Using this technique, the loops can
be gotten rid of.

Understand? It's kind of hard to put your head around, I imagine,
especially if you don't understand slicing.

Here is how I rewrote CA_Function. There are some details in it I've
left unexplained; feel free to ask for a clarification:

def CA_Function(Length,Steps):

# Create the two lattices
current = zeros((Length,Length),Int)

workspace = zeros((Length-2,Length-2),Int)

# Create slices of the current lattice
# We don't have to do this every step because slicing shares memory
left = current[0:-2,1:-1]
up = current[1:-1,0:-2]
right = current[2:,1:-1]
down = current[1:-1,2:]

# Set the start cell to black

current[Length/2,Length/2] = 1

# Apply the rule
for step in xrange(Steps):
# In the workspace, add the shifted lattices
# This uses a bunch of += because it's faster
workspace[:] = left
workspace[:] += up
workspace[:] += right
workspace[:] += down
# Set the current lattice
current[1:-1,1:-1] = where(workspace,1,0)

--
CARL BANKS

Andrew Henshaw

unread,

Mar 1, 2003, 10:48:35 PM3/1/03

to

Carl Banks wrote:

> Sean Richards wrote:
>> Hi,
>
...snip...

> # Apply the rule
> for step in xrange(Steps):
> # In the workspace, add the shifted lattices
> # This uses a bunch of += because it's faster
> workspace[:] = left
> workspace[:] += up
> workspace[:] += right
> workspace[:] += down
> # Set the current lattice
> current[1:-1,1:-1] = where(workspace,1,0)
>

You also might want to use bitwise or, so that you can elimnate the 'where'
step

# Apply the rule
for step in xrange(Steps):

workspace[:] = left
workspace[:] |= up
workspace[:] |= right
workspace[:] |= down

# Set the current lattice

current[1:-1,1:-1] = workspace

Sean Richards

unread,

Mar 1, 2003, 11:05:45 PM3/1/03

to

Great stuff, thankyou :)

Raymond Hettinger

unread,

Mar 2, 2003, 12:01:03 AM3/2/03

to

"Carl Banks"

Nice job.

Sean Richards

unread,

Mar 2, 2003, 2:47:23 AM3/2/03

to

On Sun, 02 Mar 2003 02:02:23 GMT, Carl Banks wrote:

[Excellent description of array slicing snipped]

Thanks Carl that was just what I needed. I had to draw the array slices
on a piece of graph paper to get a complete idea of what was happening
but I have now got my head around it. That is such a simple concept
(once you understand it) and so powerful. Just by altering the
combination of array slices I can create any neighbourhood I want. That
has made my day - thankyou :)

> Here is how I rewrote CA_Function. There are some details in it I've
> left unexplained; feel free to ask for a clarification:
>
>
> def CA_Function(Length,Steps):
> # Create the two lattices
> current = zeros((Length,Length),Int)
> workspace = zeros((Length-2,Length-2),Int)
>
> # Create slices of the current lattice
> # We don't have to do this every step because slicing shares memory
> left = current[0:-2,1:-1]
> up = current[1:-1,0:-2]
> right = current[2:,1:-1]
> down = current[1:-1,2:]
>
> # Set the start cell to black
> current[Length/2,Length/2] = 1
>
> # Apply the rule
> for step in xrange(Steps):
> # In the workspace, add the shifted lattices
> # This uses a bunch of += because it's faster
> workspace[:] = left
> workspace[:] += up
> workspace[:] += right
> workspace[:] += down
> # Set the current lattice
> current[1:-1,1:-1] = where(workspace,1,0)

OK this was a big improvement. Using the original nested loop version it
took ~12min for a 500x500 array, using array slices dropped the time to
~1min. That is a pretty significant reduction. One small thing I needed
to do to get the rule to work correctly was add the line ..
'workspace[:] += current[1:-1,1:-1]'
when you calculate the workspace. If we don't do this then on step 2 the
centre pixel becomes white instead of staying black. This effect
propagates until the end result is a diamond of alternating black and
white pixels instead of a solid black diamond.
That was a great help Carl, my total Python 'uptime' is only a couple of
hours but it looks like it is going to be an excellent tool.

Thanks, Sean

Miki Tebeka

unread,

Mar 2, 2003, 3:37:12 AM3/2/03

to

Hello Sean,

> Any tips on how to make this more efficient would be greatly appreciated.

1. Try using psyco (http://psyco.sourceforge.net/)
2. Try using python2.3 (still in alpha but stable and around 20% faster than 2.2)

HTH.
Miki

Fernando Perez

unread,

Mar 2, 2003, 3:52:37 AM3/2/03

to

Sean Richards wrote:

> OK this was a big improvement. Using the original nested loop version it
> took ~12min for a 500x500 array, using array slices dropped the time to
> ~1min. That is a pretty significant reduction. One small thing I needed
> to do to get the rule to work correctly was add the line ..
> 'workspace[:] += current[1:-1,1:-1]'
> when you calculate the workspace. If we don't do this then on step 2 the
> centre pixel becomes white instead of staying black. This effect
> propagates until the end result is a diamond of alternating black and
> white pixels instead of a solid black diamond.
> That was a great help Carl, my total Python 'uptime' is only a couple of
> hours but it looks like it is going to be an excellent tool.

You should take a look at scipy, in particular the weave tool:
http://www.scipy.org/site_content/weave

Here:
http://www.scipy.org/site_content/weave/python_performance.html
you'll find a good comparison of various approaches to problems similar to
what you have.

Numeric is always the right starting point, and Carl's solution was
excellent. But you may find situations where it's not enough. And the
nice thing about python is that there's still a lot of options.

Look also at f2py (http://cens.ioc.ee/projects/f2py2e) if Fortran rocks your
boat. As a tool, f2py is just amazing in how easy it makes it to wrap
existing fortran codes.

Finally, Pyrex and psyco are always options, but I'm not familiar yet with
them enough to say more about them.

Best,

f.

Justin Shaw

unread,

Mar 2, 2003, 9:07:18 AM3/2/03

to

> Just started looking at Python as a tool for investigating 2d Cellular
> Automata.

Sean,
What are you using to draw the results? I have coded up some of
Wolfram's stuff and had a bottle neck in the drawing of the pictures.

Justin

Sean Richards

unread,

Mar 4, 2003, 1:26:28 AM3/4/03

to

On Sun, 2 Mar 2003 09:07:18 -0500, Justin Shaw wrote:

> What are you using to draw the results? I have coded up some of
> Wolfram's stuff and had a bottle neck in the drawing of the pictures.

Hi,

I am using a Tkinter PhotoImage to draw the CA. I ran the python
profiler over the new improved code and it uses ~25% of the processing
time to draw the picture. So you are right there is a bottle neck in the
drawing of the picture. Did you come up with a quicker way of displaying
the array as an image?

Sean Richards

unread,

Mar 4, 2003, 1:30:42 AM3/4/03

to

On Sun, 02 Mar 2003 01:52:37 -0700, Fernando Perez wrote:

> You should take a look at scipy, in particular the weave tool:
> http://www.scipy.org/site_content/weave
>
> Here:
> http://www.scipy.org/site_content/weave/python_performance.html
> you'll find a good comparison of various approaches to problems similar to
> what you have.

Using inline C looks pretty interesting. Perhaps I will have another go
at getting scipy installed.

A. Lloyd Flanagan

unread,

Mar 4, 2003, 10:01:39 AM3/4/03

to

Sean Richards <som...@invalid.com> wrote in message news:<slrnb62hdc...@hugin.valhalla.net>...

>
> Any tips on how to make this more efficient would be greatly appreciated.
>
> Sean

You probably don't need it if you're using Numeric, but you might try
the array standard module. It avoids a lot of the overhead of lists.

Michael Hudson

unread,

Mar 4, 2003, 10:56:58 AM3/4/03

to

alloydf...@attbi.com (A. Lloyd Flanagan) writes:

> You probably don't need it if you're using Numeric, but you might try
> the array standard module. It avoids a lot of the overhead of lists.

Usually, arrays only save space, not time. I'm not sure which was the
issue here...

Cheers,
M.

--
If you give someone Fortran, he has Fortran.
If you give someone Lisp, he has any language he pleases.
-- Guy L. Steele Jr, quoted by David Rush in comp.lang.scheme.scsh

Carl Banks

unread,

Mar 4, 2003, 11:15:25 AM3/4/03

to

Sean Richards wrote:
> On Sun, 2 Mar 2003 09:07:18 -0500, Justin Shaw wrote:
>
>> What are you using to draw the results? I have coded up some of
>> Wolfram's stuff and had a bottle neck in the drawing of the pictures.
>
> Hi,
>
> I am using a Tkinter PhotoImage to draw the CA. I ran the python
> profiler over the new improved code and it uses ~25% of the processing
> time to draw the picture. So you are right there is a bottle neck in the
> drawing of the picture. Did you come up with a quicker way of displaying
> the array as an image?

Again, setting an image pixel-by-pixel is almost always the wrong way
to do it.

I don't know anything about this PhotoImage thing, but I warrant
there's a way to set every pixel at once, maybe using data passed in a
string. Find it, then use Numeric operations to get the data in the
required format, then use the Numeric tostring method to extract the
data into a string, and finally pass in the string.

To exemplify, here is how I would convert Numeric data into an image
using Python Imaging Library:

img = Image.fromstring('L',(width,height),
((1-lattice)*255).astype(Int8).tostring())

(One other thing: Python has added a buffer protocol, to be supported
by objects that store raw data. If Numeric and PhotoImage both
support it, you might not need the tostring call. In fact, the
example above didn't require it on my system.)

--
CARL BANKS

Fernando Perez

unread,

Mar 5, 2003, 5:57:43 PM3/5/03

to

Sean Richards wrote:

> Hi,
>
> I am using a Tkinter PhotoImage to draw the CA. I ran the python
> profiler over the new improved code and it uses ~25% of the processing
> time to draw the picture. So you are right there is a bottle neck in the
> drawing of the picture. Did you come up with a quicker way of displaying
> the array as an image?

The attached code below is an updated version of view.py from the NumTut
tutorial for Numeric. I just sent it to Paul Dubois last week to fix the
existing one, which is badly broken wrt threading behavior.

Since I don't know if this is publically available yet, feel free to use it
in the meantime. It uses .tostring() and goes through a temp file, so I
don't know if it will indeed be any faster than what you are doing already.

But you can tweak it for performance if you need, by trying to use an mmaped
file, for example. If you improve it significantly, don't hesitate to send
your changes to Paul.

You should also look at mayavi. Last week Prabhu added a tool for viewing
arrays as images, which is fantastic (imv.py). Not much faster than this
view.py, but far more flexible and powerful.

And mayavi will blow your socks off for scientific visualization.

Cheers,

f.

view.py

Sean Richards

unread,

Mar 7, 2003, 10:04:33 PM3/7/03

to

On Tue, 04 Mar 2003 16:15:25 GMT, Carl Banks wrote:

> To exemplify, here is how I would convert Numeric data into an image
> using Python Imaging Library:
>
> img = Image.fromstring('L',(width,height),
> ((1-lattice)*255).astype(Int8).tostring())

That was good. In the original implementation of the code I first looked
at the author used a PhotoImage so I just followed his example. Using
the original code a 1 dimensional CA of length 1000 takes 80.5 secs and
does 499581 function calls to complete 500 steps. I re-implemented this
using the array slicing technique you covered earlier, bitwise operators,
optimised the way the image was created and used the PIL example above
to display the final image. Now takes 0.290 secs and 570 function calls
to do 500 steps of the 1000 long 1 dimensional CA.
Would have taken me months to figure out all those cool tricks so thanks
Carl and everyone else who helped.

Sean :)

Frank Buss

unread,

Mar 9, 2003, 8:11:06 AM3/9/03

to

Thanks for all the tips. I've updated my page, using a class for hiding all
the slice handling:

http://www.frank-buss.de/automaton/rule30.html

But I think for the calculation of the sum of all 1's, it could be faster.
Currently it is implemented like this:

# print the number of '1's for all automaton steps
for step in range(length / 2):
print reduce(lambda x, y: x+y, rule30.current)
rule30.step()

Where 'current' is a Numeric's array. Any tips?

Another optimization idea: using only 1 bit for every cell, instead of a
full Int.

PS: for 2D cellular optimization see http://cell-auto.com/optimisation/

--
Frank Buß, f...@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de

Frank Buss

unread,

Mar 9, 2003, 9:41:30 AM3/9/03

to

Frank Buss <f...@frank-buss.de> wrote:

> But I think for the calculation of the sum of all 1's, it could be
> faster. Currently it is implemented like this:
>
> # print the number of '1's for all automaton steps
> for step in range(length / 2):
> print reduce(lambda x, y: x+y, rule30.current)
> rule30.step()
>
> Where 'current' is a Numeric's array. Any tips?

Changjune Kim sent me the idea, to use ".count(1)". This doesn't work with
the Numeric array (perhaps it should be added to the Numeric array-class),
but after ".tolist()" it is 5 times faster: 11 seconds for 5000 arrays of
size 10000 with tolist().count(1) and 52 seconds for my lambda construct.

for step in range(length / 2):

print(rule30.current.tolist().count(1))
rule30.step()

Alex Martelli

unread,

Mar 9, 2003, 12:48:28 PM3/9/03

to

Frank Buss wrote:

>> But I think for the calculation of the sum of all 1's, it could be
>> faster. Currently it is implemented like this:
>>
>> # print the number of '1's for all automaton steps
>> for step in range(length / 2):
>> print reduce(lambda x, y: x+y, rule30.current)
>> rule30.step()
>>
>> Where 'current' is a Numeric's array. Any tips?
>
> Changjune Kim sent me the idea, to use ".count(1)". This doesn't work with
> the Numeric array (perhaps it should be added to the Numeric array-class),
> but after ".tolist()" it is 5 times faster: 11 seconds for 5000 arrays of
> size 10000 with tolist().count(1) and 52 seconds for my lambda construct.
>
> for step in range(length / 2):
> print(rule30.current.tolist().count(1))
> rule30.step()

My machine and yours seem to have similar speed -- for 5000
arrays of 10000 ones each:

47.10 lambdareduce
30.85 addreduce
11.32 count1s
0.33 sum

the last line is simply calling Numeric.sum(rule30.current)
and is of course the speediest solution. addreduce uses
operator.add instead of your lambda, by the way.

Alex

Frank Buss

unread,

Mar 9, 2003, 1:21:24 PM3/9/03

to

Alex Martelli <al...@aleax.it> wrote:

> My machine and yours seem to have similar speed -- for 5000
> arrays of 10000 ones each:
>
> 47.10 lambdareduce
> 30.85 addreduce
> 11.32 count1s
> 0.33 sum

Thanks, sum is great! It took 3 seconds on my machine, but I stopped the
time of the whole program. So Python has a data thruput of over 16 MB per
second on my 1.5 GHz Pentium 4. Looks like it is good suited for my CAs
(http://www.frank-buss.de/automaton/). Perhaps I should do more with it, at
least for interactive experiments.

Currently most of it is implemented in Java, because it's easier for the
user to open it in their browsers, without the need to load and to install
a Python environment (and all additional required Python packages) first.

A. Lloyd Flanagan

unread,

Mar 10, 2003, 7:28:10 AM3/10/03

to

Frank Buss <f...@frank-buss.de> wrote in message news:<b4g0n4$l3b$1...@newsreader2.netcologne.de>...

> Currently most of it is implemented in Java, because it's easier for the
> user to open it in their browsers, without the need to load and to install
> a Python environment (and all additional required Python packages) first.

There's a version of python implemented in Java. Go to www.python.org
and look for 'Jython'. It might work better for you.