
squeeze out some performance


Robert Voigtländer

Dec 6, 2013, 3:47:54 AM
Hi,

I'm trying to squeeze some performance out of the code pasted at the link below.
http://pastebin.com/gMnqprST

The code will be used to continuously analyze sonar sensor data. I set it up to calculate all the coordinates in a sonar cone without heavy use of trigonometry, on the assumption that this is faster in the end.

I optimized as much as I could. Maybe one of you has another bright idea to squeeze out a bit more?

Thanks
Robert

Jeremy Sanders

Dec 6, 2013, 4:46:46 AM
to pytho...@python.org
This sort of code is probably hard to make faster in pure Python. You
could try profiling it to see where the hot spots are. Perhaps the choice of
arrays or sets has some speed impact.
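
A minimal sketch of what that first profiling pass could look like (cProfile
and pstats from the standard library; scan_cone here is a hypothetical
stand-in for your real entry point, not the pastebin code):

import cProfile
import pstats

# hypothetical stand-in for the real hot function, just so the
# profiling calls below have something to measure
def scan_cone(cx, cy, r):
    return [(x, y) for x in range(cx - r, cx + r + 1)
                   for y in range(cy - r, cy + r + 1)
                   if (x - cx) ** 2 + (y - cy) ** 2 <= r * r]

cProfile.run('scan_cone(50, 50, 40)', 'cone.prof')           # write stats
pstats.Stats('cone.prof').sort_stats('cumulative').print_stats(10)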

One idea would be to use something like Cython to compile your Python code
to an extension module, with some hints about the types of the various values.
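
A minimal sketch of the Cython version of that idea (a hypothetical typed
function, not the OP's pastebin code; it would live in a .pyx file and be
built with cythonize in the usual way):

# sonar.pyx -- hypothetical example, not the OP's actual code.
# The cdef declarations give Cython enough type information to turn
# the loop body into plain C arithmetic instead of Python object calls.
def count_in_radius(int cx, int cy, int r, points):
    cdef int n = 0
    cdef int x, y
    cdef int r2 = r * r
    for x, y in points:
        if (x - cx) * (x - cx) + (y - cy) * (y - cy) <= r2:
            n += 1
    return n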

I would go down the geometry route. If you can restate your problem in terms
of geometry, it might be possible to replace all that code with a few numpy
array operations.

e.g. for finding the pixels in a circle of radius 50 centred on (50, 50):

import numpy as np
# squared distance of every pixel in a 100x100 grid from the centre
radiussqd = np.fromfunction(lambda y, x: (y-50)**2 + (x-50)**2, (100, 100))
all_y, all_x = np.indices((100, 100))
# a boolean mask picks out the pixels strictly inside the circle
yvals = all_y[radiussqd < 50**2]
xvals = all_x[radiussqd < 50**2]
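
The same trick extends to the sonar cone itself. A sketch under assumed
inputs (apex cell, maximum range in cells, and start/end bearings in radians
that don't wrap across ±pi; none of this is the pastebin code):

import numpy as np

def cone_mask(shape, apex_y, apex_x, max_range, ang_lo, ang_hi):
    # squared range and bearing of every grid cell relative to the apex
    yy, xx = np.indices(shape)
    dy, dx = yy - apex_y, xx - apex_x
    dist2 = dy ** 2 + dx ** 2
    ang = np.arctan2(dy, dx)
    # keep the cells within range and between the two bearings
    return (dist2 <= max_range ** 2) & (ang >= ang_lo) & (ang <= ang_hi)

ys, xs = np.nonzero(cone_mask((100, 100), 50, 50, 40, 0.2, 1.0))

It does call arctan2, which the OP wanted to avoid, but vectorised over the
whole grid, so the trigonometry is paid once in C rather than per point in
Python.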

Jeremy


Chris Angelico

Dec 6, 2013, 7:13:34 AM
to pytho...@python.org
On Fri, Dec 6, 2013 at 8:46 PM, Jeremy Sanders <jer...@jeremysanders.net> wrote:
> This sort of code is probably hard to make faster in pure Python. You
> could try profiling it to see where the hot spots are. Perhaps the choice of
> arrays or sets has some speed impact.

I'd make this recommendation MUCH stronger.

Rule 1 of optimization: Don't.
Rule 2 (for experts only): Don't yet.

Once you find that your program actually is running too slowly, then
AND ONLY THEN do you start looking at tightening something up. You'll
be amazed how little you need to change; start with good clean
idiomatic code, and then if it takes too long, you tweak just a couple
of things and it's fast enough. And when you do come to the
tweaking...

Rule 3: Measure twice, cut once.
Rule 4: Actually, measure twenty times, cut once.

Profile your code to find out what's actually slow. This is very
important. Here's an example from a real application (not in Python,
it's in a semantically-similar language called Pike):

https://github.com/Rosuav/Gypsum/blob/d9907e1507c52189c83ae25f5d7be85235b616fa/window.pike

I noticed that I could saturate one CPU core by typing commands very
quickly. Okay. That gets us past the first two rules (it's a MUD
client, it should not be able to saturate one core of an i5). The code
looks roughly like this:

paint():
    for line in lines:
        if line_is_visible:
            paint_line(line)

paint_line():
    for piece_of_text in text:
        if highlighted: draw_highlighted()
        else: draw_not_highlighted()

My first guess was that the actual drawing was taking the time, since
that's a whole lot of GTK calls. But no; the actual problem was the
iteration across all lines and then finding out if they're visible or
not (possibly because it obliterates the CPU caches). Once the
scrollback got to a million lines or so, that was prohibitively
expensive. I didn't realize that until I actually profiled the code
and _measured_ where the time was being spent.

How fast does your code run? How fast do you need it to run? Lots of
optimization questions are answered by "Yaknow what, it don't even
matter", unless you're running in a tight loop, or on a
microcontroller, or something. Halving the time taken sounds great
until you see that it's currently taking 0.0001 seconds and happens in
response to user action.

ChrisA

Robert Voigtländer

Dec 6, 2013, 11:29:30 AM
Thanks for your replies.

I already did some basic profiling and optimized a lot. Especially with help of a goof python performance tips list I found.

I think I'll follow the cython path.
The geometry approach also sounds good. But it's way above my math/geometry knowledge.

Thanks for your input!

Mark Lawrence

Dec 6, 2013, 11:36:03 AM
to pytho...@python.org
On 06/12/2013 16:29, Robert Voigtländer wrote:
> Thanks for your replies.
>
> I already did some basic profiling and optimized a lot. Especially
> with help of a goof python performance tips list I found.

Wonderful typo -----^ :)

> I think I'll follow the cython path.
> The geometry approach also sounds good. But it's way above my math/geometry knowledge.
>
> Thanks for your input!
>


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

Robert Voigtländer

Dec 6, 2013, 11:44:38 AM
On Friday, December 6, 2013 at 5:36:03 PM UTC+1, Mark Lawrence wrote:

> > I already did some basic profiling and optimized a lot. Especially
> > with help of a goof python performance tips list I found.
>
> Wonderful typo -----^ :)
>

Oh well :-) ... it was a good one. Just had a quick look at Cython. Looks great. Thanks for the tip.

John Ladasky

Dec 6, 2013, 11:52:47 AM
On Friday, December 6, 2013 12:47:54 AM UTC-8, Robert Voigtländer wrote:

> I'm trying to squeeze some performance out of the code pasted at the link below.
> http://pastebin.com/gMnqprST

Several comments:

1) I find this program to be very difficult to read, largely because there's a whole LOT of duplicated code. Look at lines 53-80, and lines 108-287, and lines 294-311. It makes it harder to see what this algorithm actually does. Is there a way to refactor some of this code to use some shared function calls?

2) I looked up the "Bresenham algorithm", and found two references which may be relevant. The original algorithm was one which computed good raster approximations to straight lines. The second algorithm described may be more pertinent to you, because it draws arcs of circles.

http://en.wikipedia.org/wiki/Bresenham's_line_algorithm
http://en.wikipedia.org/wiki/Midpoint_circle_algorithm

Both of these algorithms are old, dating from the 1960s, and can be implemented using very simple CPU register operations and minimal memory. Both of the web pages I referenced have extensive example code and pseudocode, and discuss optimization. If you need speed, is this really a job for Python?
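
For flavour, here is a minimal Python transcription of the midpoint circle
algorithm from that second page (integer arithmetic only; duplicate points at
the octant boundaries are not filtered out, and this is not the OP's variant):

def circle_points(cx, cy, radius):
    # one octant computed with integer arithmetic, mirrored eight ways
    pts = []
    x, y = radius, 0
    decision = 1 - radius
    while y <= x:
        for px, py in ((x, y), (y, x), (-y, x), (-x, y),
                       (-x, -y), (-y, -x), (y, -x), (x, -y)):
            pts.append((cx + px, cy + py))
        y += 1
        if decision <= 0:
            decision += 2 * y + 1        # midpoint inside: keep x
        else:
            x -= 1
            decision += 2 * (y - x) + 1  # midpoint outside: step x inward
    return pts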

3) I THINK that I see some code -- those duplicated parts -- which might benefit from the use of multiprocessing (assuming that you have a multi-core CPU). But I would have to read more deeply to be sure. I need to understand the algorithm more completely, and exactly how you have modified it for your needs.
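
As a sketch of what such a split could look like, cutting the grid into row
bands and scanning each band in its own process (trace_rows is a hypothetical
stand-in, not code from the pastebin):

from multiprocessing import Pool

def trace_rows(args):
    # hypothetical worker: find the in-circle cells for one band of rows
    y0, y1, cx, cy, r = args
    return [(x, y) for y in range(y0, y1)
                   for x in range(cx - r, cx + r + 1)
                   if (x - cx) ** 2 + (y - cy) ** 2 <= r * r]

if __name__ == '__main__':
    pool = Pool()  # one worker process per core by default
    bands = [(y, y + 25, 50, 50, 40) for y in range(0, 100, 25)]
    points = sum(pool.map(trace_rows, bands), [])
    pool.close()
    pool.join()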

Joel Goldstick

Dec 6, 2013, 5:07:56 PM
to John Ladasky, pytho...@python.org
On Fri, Dec 6, 2013 at 11:52 AM, John Ladasky <john_l...@sbcglobal.net> wrote:
> On Friday, December 6, 2013 12:47:54 AM UTC-8, Robert Voigtländer wrote:
>
>> I'm trying to squeeze some performance out of the code pasted at the link below.
>> http://pastebin.com/gMnqprST

Not that this will speed up your code but you have this:

    if not clockwise:
        s = start
        start = end
        end = s

Python people would write:
    end, start = start, end
 

You have quite a few if statements that involve multiple comparisons of the same variable. Did you know you can do things like this in Python:

>>> x = 4
>>> 2 < x < 7
True
>>> x = 55
>>> 2 < x < 7
False
 

Mark Lawrence

Dec 6, 2013, 5:38:57 PM
to pytho...@python.org
On 06/12/2013 16:52, John Ladasky wrote:
> On Friday, December 6, 2013 12:47:54 AM UTC-8, Robert Voigtländer wrote:
>
>> I'm trying to squeeze some performance out of the code pasted at the link below.
>> http://pastebin.com/gMnqprST
>
> Several comments:
>
> 1) I find this program to be very difficult to read, largely because there's a whole LOT of duplicated code. Look at lines 53-80, and lines 108-287, and lines 294-311. It makes it harder to see what this algorithm actually does. Is there a way to refactor some of this code to use some shared function calls?
>

A handy tool for detecting duplicated code, for anyone who's interested:
http://clonedigger.sourceforge.net/

Dan Stromberg

Dec 6, 2013, 6:01:49 PM
to Mark Lawrence, Python List
On Fri, Dec 6, 2013 at 2:38 PM, Mark Lawrence <bream...@yahoo.co.uk> wrote:
> A handy tool for detecting duplicated code, for anyone who's interested:
> http://clonedigger.sourceforge.net/

Pylint does this too...
 

Robert Voigtländer

Dec 9, 2013, 9:19:21 AM
Thanks again. I'll try to compress the code and have a look at the "multiple comparisons" topic.

Robert

Mark Lawrence

Dec 9, 2013, 9:52:54 AM
to pytho...@python.org
Would you be kind enough to compress your messages while you're at it, by
reading and actioning https://wiki.python.org/moin/GoogleGroupsPython?
Thanks; a quick glance above will tell you why :)

Robin Becker

Dec 9, 2013, 10:54:36 AM
to pytho...@python.org
On 06/12/2013 22:07, Joel Goldstick wrote:
..........
> Not that this will speed up your code but you have this:
>
>     if not clockwise:
>         s = start
>         start = end
>         end = s
>
> Python people would write:
>     end, start = start, end

this works for some small number of variables, but on my machine with Python 2.7
I start losing with 4 variables, e.g.

> C:\code\optichrome\74663>python -mtimeit -s"a=1;b=2;c=3;d=4" "a,b,c,d=b,c,d,a"
> 1000000 loops, best of 3: 0.206 usec per loop
>
> C:\code\optichrome\74663>python -mtimeit -s"a=1;b=2;c=3;d=4" "t=a;a=b;b=c;c=d;d=t"
> 10000000 loops, best of 3: 0.118 usec per loop


It doesn't seem to make much difference that the variables are related, as I see
similar behaviour for simple assignments:

> C:\code\optichrome\74663>python -mtimeit -s"a=1;b=2;c=3;d=4;e=5;f=6;g=7;h=8" "a,b,c,d=e,f,g,h"
> 1000000 loops, best of 3: 0.204 usec per loop
>
> C:\code\optichrome\74663>python -mtimeit -s"a=1;b=2;c=3;d=4;e=5;f=6;g=7;h=8" "a=e;b=f;c=g;d=h"
> 10000000 loops, best of 3: 0.103 usec per loop

For fewer than 4 variables the tuple method is faster.
--
Robin Becker

Dave Angel

Dec 9, 2013, 3:46:16 PM
to pytho...@python.org
On Mon, 09 Dec 2013 15:54:36 +0000, Robin Becker <ro...@reportlab.com> wrote:
> On 06/12/2013 22:07, Joel Goldstick wrote:
> > end, start = start, end

> a similar behaviour for simple assignments

> for less than 4 variables the tuple method is faster.

What does speed have to do with it? When you want to swap two
variables, the tuple assignment reads better.

--
DaveA

Mark Lawrence

Dec 9, 2013, 6:57:57 PM
to pytho...@python.org
Actually for optimised code it looks very similar to some code posted
here
http://www.daniweb.com/software-development/python/threads/321181/python-bresenham-circle-arc-algorithm
over three years ago.

Robert Voigtländer

Dec 10, 2013, 7:14:39 AM
> Actually for optimised code it looks very similar to some code posted
> here
> http://www.daniweb.com/software-development/python/threads/321181/python-bresenham-circle-arc-algorithm
> over three years ago.

That is where it originated. I just extended it for my needs and now want to optimize it.
Replacing some of the for loops with list comprehensions brought another 25%, and made the code shorter.
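
(A generic illustration of that transformation, not the actual loop from the code:)

xs, limit = range(100), 2000

# before: building the list with an explicit loop and .append
points = []
for x in xs:
    if x * x <= limit:
        points.append((x, x * x))

# after: the same result as a list comprehension, which skips the
# repeated attribute lookup and bound-method call for .append
points = [(x, x * x) for x in xs if x * x <= limit]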

Robin Becker

Dec 10, 2013, 7:54:13 AM
to pytho...@python.org
Well, the OP is asking about performance, so I guess the look and feel might be
sacrificed for speed in some circumstances.

The tuple approach is more appealing when the lhs & rhs are connected, but it
seems that even for more general assignments the tuple assignment may be faster
for small numbers of variables. Looking at the output of dis for this case

d,e,f=c,b,a

it seems that we get code like this
LOAD_NAME   3 (c)
LOAD_NAME   2 (b)
LOAD_NAME   1 (a)
ROT_THREE
ROT_TWO
STORE_NAME  4 (d)
STORE_NAME  5 (e)
STORE_NAME  6 (f)



for

d = c
e = b
f = a

we get this
LOAD_NAME   3 (c)
STORE_NAME  4 (d)
LOAD_NAME   2 (b)
STORE_NAME  5 (e)
LOAD_NAME   1 (a)
STORE_NAME  6 (f)

which is not obviously slower, but I consistently find the former to be faster. I
suppose it's a cache issue or something. However, for this particular case, when
the variables are not connected, I find the tuple assignment less pythonic, but
perhaps faster (even though in this case no tuples are built).
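
For anyone wanting to reproduce the listings above, the stdlib dis module
prints them, e.g.

import dis
dis.dis(compile('d,e,f = c,b,a', '<tuple>', 'exec'))
dis.dis(compile('d = c; e = b; f = a', '<plain>', 'exec'))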
--
Robin Becker
