
relative speed of incrementation syntaxes (or "i=i+1" vs "i+=1")


Laurent

Aug 21, 2011, 12:52:23 PM
Hi Folks,

I was arguing with a guy who was sure that incrementing a variable i with "i += 1" is faster than "i = i + 1". I couldn't tell if he was right or wrong so I did a little benchmark with the very useful timeit module.
Here are the results on my little Linux Eeepc Netbook (using Python 3.2):


Computing, please wait...

Results for 1000000 times "i = i + 1":
0.37591004371643066
0.3827171325683594
0.37238597869873047
0.37305116653442383
0.3725881576538086
0.37294602394104004
0.3712761402130127
0.37357497215270996
0.371567964553833
0.37359118461608887
Total 3.7396 seconds.

Results for 1000000 times "i += 1":
0.3821070194244385
0.3802030086517334
0.3828878402709961
0.3823058605194092
0.3801591396331787
0.38340115547180176
0.3795340061187744
0.38153910636901855
0.3835160732269287
0.381864070892334
Total 3.8175 seconds.

==> "i = i + 1" is 2.08% faster than "i += 1".

I did many tests and "i = i + 1" always seems to be around 2% faster than "i += 1". This is no surprise, as the += notation seems to be a syntactic sugar layer that has to be converted to i = i + 1 anyway. Am I wrong in my interpretation?

Btw here's the trivial Python 3.2 script I made for this benchmark:


import timeit

r = 10
n = 1000000

s1 = "i = i + 1"
s2 = "i += 1"

t1 = timeit.Timer(stmt=s1, setup="i = 0")
t2 = timeit.Timer(stmt=s2, setup="i = 0")

print("Computing, please wait...")

results1 = t1.repeat(repeat=r, number=n)
results2 = t2.repeat(repeat=r, number=n)

print('\nResults for {} times "{}":'.format(n, s1))
sum1 = 0
for result in results1:
    print(result)
    sum1 += result
print("Total {:.5} seconds.".format(sum1))

print('\nResults for {} times "{}":'.format(n, s2))
sum2 = 0
for result in results2:
    print(result)
    sum2 += result
print("Total {:.5} seconds.".format(sum2))

print('\n==> "{}" is {:.3}% faster than "{}".'.format(s1, (sum2 / sum1) * 100 - 100, s2))
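
A side note on methodology: the timeit documentation suggests that the minimum of a set of repeats, not their sum, is the most informative number, since the higher values are typically caused by other processes rather than by Python itself. A small variant, reusing the names from the script above:

print('\nBest of {} runs: {:.5f} vs {:.5f} seconds.'.format(
    r, min(results1), min(results2)))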

Comments are welcome...

woooee

Aug 21, 2011, 12:59:11 PM
> as the += notation seems to be a syntactic sugar layer that has to be
> converted to i = i + 1 anyway.

That has always been my understanding. The faster way is to append to
a list, as concatenating usually requires the original string, an
intermediate block of memory, and the memory for the final string.
x_list.append(value)
to_string = "".join(x_list)
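
One way to check this (a sketch only; absolute numbers vary by machine and Python version, and CPython's string_concatenate optimization, visible in the ceval.c excerpts later in this thread, narrows the gap for plain concatenation):

import timeit

concat = timeit.timeit("s = ''\nfor c in chunks:\n    s = s + c",
                       setup="chunks = ['x'] * 1000", number=1000)
joined = timeit.timeit("s = ''.join(chunks)",
                       setup="chunks = ['x'] * 1000", number=1000)
print(concat, joined)  # join usually wins, by a margin that varies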

Laurent

Aug 21, 2011, 1:03:14 PM
Well, I agree with you about string concatenation, but here I'm talking about integer incrementation...

Irmen de Jong

Aug 21, 2011, 1:14:19 PM
On 21-08-11 19:03, Laurent wrote:
> Well, I agree with you about string concatenation, but here I'm talking about integer incrementation...

Seems the two forms are not 100% identical:

>>> import dis
>>> def f1(x):
... x=x+1
...
>>> def f2(x):
... x+=1
...
>>>
>>> dis.dis(f1)
2 0 LOAD_FAST 0 (x)
3 LOAD_CONST 1 (1)
6 BINARY_ADD
7 STORE_FAST 0 (x)
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
>>> dis.dis(f2)
2 0 LOAD_FAST 0 (x)
3 LOAD_CONST 1 (1)
6 INPLACE_ADD
7 STORE_FAST 0 (x)
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
>>>


What the precise difference (semantics and speed) is between the
BINARY_ADD and INPLACE_ADD opcodes, I dunno. Look in the Python source
code or maybe someone knows it from memory :-)

Irmen

Andreas Löscher

Aug 21, 2011, 1:27:38 PM
> What the precise difference (semantics and speed) is between the
> BINARY_ADD and INPLACE_ADD opcodes, I dunno. Look in the Python source
> code or maybe someone knows it from memory :-)
>
> Irmen
>
from Python/ceval.c:

1316            case BINARY_ADD:
1317                w = POP();
1318                v = TOP();
1319                if (PyInt_CheckExact(v) && PyInt_CheckExact(w)) {
1320                    /* INLINE: int + int */
1321                    register long a, b, i;
1322                    a = PyInt_AS_LONG(v);
1323                    b = PyInt_AS_LONG(w);
1324                    /* cast to avoid undefined behaviour
1325                       on overflow */
1326                    i = (long)((unsigned long)a + b);
1327                    if ((i^a) < 0 && (i^b) < 0)
1328                        goto slow_add;
1329                    x = PyInt_FromLong(i);
1330                }
1331                else if (PyString_CheckExact(v) &&
1332                         PyString_CheckExact(w)) {
1333                    x = string_concatenate(v, w, f, next_instr);
1334                    /* string_concatenate consumed the ref to v */
1335                    goto skip_decref_vx;
1336                }
1337                else {
1338                  slow_add:
1339                    x = PyNumber_Add(v, w);
1340                }
1341                Py_DECREF(v);
1342              skip_decref_vx:
1343                Py_DECREF(w);
1344                SET_TOP(x);
1345                if (x != NULL) continue;
1346                break;

1532            case INPLACE_ADD:
1533                w = POP();
1534                v = TOP();
1535                if (PyInt_CheckExact(v) && PyInt_CheckExact(w)) {
1536                    /* INLINE: int + int */
1537                    register long a, b, i;
1538                    a = PyInt_AS_LONG(v);
1539                    b = PyInt_AS_LONG(w);
1540                    i = a + b;
1541                    if ((i^a) < 0 && (i^b) < 0)
1542                        goto slow_iadd;
1543                    x = PyInt_FromLong(i);
1544                }
1545                else if (PyString_CheckExact(v) &&
1546                         PyString_CheckExact(w)) {
1547                    x = string_concatenate(v, w, f, next_instr);
1548                    /* string_concatenate consumed the ref to v */
1549                    goto skip_decref_v;
1550                }
1551                else {
1552                  slow_iadd:
1553                    x = PyNumber_InPlaceAdd(v, w);
1554                }
1555                Py_DECREF(v);
1556              skip_decref_v:
1557                Py_DECREF(w);
1558                SET_TOP(x);
1559                if (x != NULL) continue;
1560                break;

For integers, the first case (lines 1319 and 1535) is taken, and the
inline code is identical. However, Python uses a huge switch-case
construct to execute its opcodes, and INPLACE_ADD comes after
BINARY_ADD, hence the difference in speed.

To be clear, this is nothing you should consider when writing fast code.
Complexity-wise they are both the same.

Laurent

Aug 21, 2011, 1:48:48 PM
Thanks for these explanations. So 2% speed difference just between "B..." and "I..." entries in a huge alphabetically-ordered switch case? Wow. Maybe there is some material for speed enhancement here...

Hans Mulder

Aug 21, 2011, 1:57:50 PM
On 21/08/11 19:14:19, Irmen de Jong wrote:

> What the precise difference (semantics and speed) is between the
> BINARY_ADD and INPLACE_ADD opcodes, I dunno. Look in the Python source
> code or maybe someone knows it from memory :-)

There is a clear difference in semantics: BINARY_ADD always produces
a new object, INPLACE_ADD may modify its left-hand operand in situ
(if it's mutable).

Integers are immutable, so for integers the semantics are the same,
but for lists, for example, the two are different:

>>> x = [2, 3, 5, 7]
>>> y = [11, 13]
>>> x+y
[2, 3, 5, 7, 11, 13]
>>> x
[2, 3, 5, 7] # x still has its original value
>>> x += y
>>> x
[2, 3, 5, 7, 11, 13] # x is now modified
>>>
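
The same point can be made visible with id(): for an int, += rebinds the name to a new object, while a list is mutated in place. A quick sketch:

>>> x = 1000              # big enough to stay out of the small-int cache
>>> old = id(x)
>>> x += 1
>>> id(x) == old
False                      # a new int object: the name was rebound
>>> y = [1, 2]
>>> old = id(y)
>>> y += [3]
>>> id(y) == old
True                       # the same list object, mutated in place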

For integers, I would not expect a measurable difference in speed.

Hope this helps,

-- HansM

Laurent

Aug 21, 2011, 2:03:16 PM
Well, 2% more time after 1 million iterations, so you're right, I won't consider it.

Christian Heimes

Aug 21, 2011, 2:24:17 PM
to pytho...@python.org
On 21.08.2011 19:27, Andreas Löscher wrote:
> As for using Integers, the first case (line 1319 and 1535) are true and
> there is no difference in Code. However, Python uses a huge switch-case
> construct to execute it's opcodes and INPLACE_ADD cames after
> BINARY_ADD, hence the difference in speed.

I don't think that's the reason. Modern compilers turn a switch statement
into a jump or branch table rather than a linear search like chained
elif statements. Python 3.2 on Linux should also be compiled with
computed gotos.

Christian

Roy Smith

Aug 21, 2011, 2:52:50 PM
In article <mailman.282.13139510...@python.org>,
Christian Heimes <li...@cheimes.de> wrote:

> On 21.08.2011 19:27, Andreas Löscher wrote:
> > For integers, the first case (lines 1319 and 1535) is taken, and the
> > inline code is identical. However, Python uses a huge switch-case
> > construct to execute its opcodes, and INPLACE_ADD comes after
> > BINARY_ADD, hence the difference in speed.
>
> I don't think that's the reason. Modern compilers turn a switch statement
> into a jump or branch table rather than a linear search like chained
> elif statements.

This is true even for very small values of "modern". I remember the
Unix v6 C compiler (circa 1977) was able to do this.

Terry Reedy

Aug 21, 2011, 3:39:53 PM
to pytho...@python.org
On 8/21/2011 1:27 PM, Andreas Löscher wrote:
>> What the precise difference (semantics and speed) is between the
>> BINARY_ADD and INPLACE_ADD opcodes, I dunno. Look in the Python source
>> code or maybe someone knows it from memory :-)
>>
>> Irmen
>>
> from Python/ceval.c:
> [ceval.c BINARY_ADD / INPLACE_ADD cases snipped; quoted in full above]

With 64 bit 3.2.2 on my Win 7 Pentium, the difference was 4% and with
floats (0.0 and 1.0), 6%

--
Terry Jan Reedy


Laurent

Aug 21, 2011, 3:53:04 PM
to comp.lan...@googlegroups.com, pytho...@python.org

> With 64 bit 3.2.2 on my Win 7 Pentium, the difference was 4% and with
> floats (0.0 and 1.0), 6%

For floats it is understandable. But for integers, seriously, 4% is a lot. I would never have thought an interpreter would show a difference like this between two syntaxes for something as fundamental as adding 1.

Laurent

Aug 21, 2011, 3:55:00 PM
to comp.lan...@googlegroups.com, pytho...@python.org
Actually, 6% between floats themselves is just as hard to understand.

Laurent

Aug 21, 2011, 4:04:24 PM
I did the test several times with floats on my machine and the difference is not as big as for integers:

==> "i = i + 1.0" is 0.732% faster than "i += 1.0".

That seems normal: float addition is supposed to be slower than integer addition, so the syntactic difference is comparatively less important.

Nobody

Aug 21, 2011, 5:07:56 PM
On Sun, 21 Aug 2011 09:52:23 -0700, Laurent wrote:

> I did many tests and "i = i + 1" always seems to be around 2% faster
> than "i += 1". This is no surprise as the += notation seems to be a
> syntactic sugar layer that has to be converted to i = i + 1 anyway. Am I
> wrong in my interpretation?

It depends. If the value on the left has an __iadd__ method, that will be
called; the value will be updated in-place, so all references to that
object will be affected:

>>> import numpy as np
>>> a = np.zeros(3)
>>> b = a
>>> a
array([ 0., 0., 0.])
>>> b
array([ 0., 0., 0.])
>>> a += 1
>>> a
array([ 1., 1., 1.])
>>> b
array([ 1., 1., 1.])

If the value on the left doesn't have an __iadd__ method, then addition is
performed and the name is re-bound to the result:

>>> a = a + 1
>>> a
array([ 2., 2., 2.])
>>> b
array([ 1., 1., 1.])

If you're writing code which could reasonably be expected to work with
arbitrary "numeric" values, you should decide which to use according to
whether in-place modification is appropriate rather than trivial
performance differences. If a difference of a few percent is significant,
Python is probably the wrong language in the first place.
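
For instance, a hypothetical helper that accepts a start value it does not own should spell the update without +=, so a caller's mutable argument is never modified behind their back (an illustrative sketch, not code from this thread):

def running_total(values, start):
    total = start
    for v in values:
        total = total + v    # never mutates `start`, whatever its type
    return total

base = [0]
print(running_total([[1], [2]], base))   # [0, 1, 2]
print(base)                              # [0] -- the caller's list is untouched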

Andreas Löscher

Aug 21, 2011, 7:17:14 PM
On Sunday, 21.08.2011 at 14:52 -0400, Roy Smith wrote:
> In article <mailman.282.13139510...@python.org>,
> Christian Heimes <li...@cheimes.de> wrote:
>
> > On 21.08.2011 19:27, Andreas Löscher wrote:
> > > For integers, the first case (lines 1319 and 1535) is taken, and the
> > > inline code is identical. However, Python uses a huge switch-case
> > > construct to execute its opcodes, and INPLACE_ADD comes after
> > > BINARY_ADD, hence the difference in speed.
> >
> > I don't think that's the reason. Modern compilers turn a switch statement
> > into a jump or branch table rather than a linear search like chained
> > elif statements.
>
> This is true even for very small values of "modern". I remember the
> Unix v6 C compiler (circa 1977) was able to do this.

What is the difference in speed between a jump table that is searched
from top to bottom in comparison to an ordinary if-then-elif...? The
difference can only be in the search algorithm regarding the table.
Without optimization (linear search) both are the same. If the compiler
applies some magic the difference can be relevant (linear complexity for
if-then-elif... and O(1) if you would use a dictionary).

Since the executed code for integers is the same, there must be a slower
path to the code of BINARY_ADD than to INPLACE_ADD.

How would such a jump table behave the same as a
switch-case statement? Beware that things like

1801            case PRINT_NEWLINE_TO:
1802                w = stream = POP();
1803                /* fall through to PRINT_NEWLINE */
1804
1805            case PRINT_NEWLINE:
must be supported.


By the way, the first lines of ceval.c, since at least version 2.4:

1
2 /* Execute compiled code */
3
4 /* XXX TO DO:
5    XXX speed up searching for keywords by using a dictionary
6    XXX document it!
7 */

:-)

Andreas Löscher

Aug 21, 2011, 7:25:34 PM

It's not as bad as you think. The addition of two integers is a cheap
task (in terms of computation power). If you have two ways to do it,
every little thing (jumps in the code etc.) will have a larger impact
on the execution time than it would on an expensive operation.

But every improvement to your algorithm will easily result in a
significantly shorter execution time than replacing a+=1 with a=a+1
ever will. :-)
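
As a rough illustration (numbers are machine-dependent): replacing a hand-written accumulation loop with the built-in sum() buys far more than any choice between += and = + ever could:

import timeit

setup = "data = list(range(10000))"
loop = "total = 0\nfor x in data:\n    total = total + x"
print(timeit.timeit(loop, setup=setup, number=1000))         # explicit loop
print(timeit.timeit("sum(data)", setup=setup, number=1000))  # typically several times faster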

Chris Angelico

Aug 21, 2011, 7:37:45 PM
to pytho...@python.org
2011/8/22 Andreas Löscher <andreas....@s2005.tu-chemnitz.de>:

> How would such a jump table behave the same as a
> switch-case statement?

If your switch statement uses a simple integer enum with sequential
values, then it can be done quite easily. Take this as an example:

switch (argc)
{
    case 0: printf("No args at all, this is weird\n"); break;
    case 1: printf("No args\n"); break;
    case 2: printf("Default for second arg\n");
    case 3: printf("Two args\n"); break;
    default: printf("Too many args\n"); break;
}

I compiled this using Open Watcom C, looked at the disassembly, and
hereby translate it into pseudocode (I'll email/post the full 80x86
disassembly if you like):

1) Check if argc > 3 (unsigned comparison), if so jump to default case.
2) Left shift argc two places, add a constant offset, fetch a pointer
from there, and jump to it - that's the jump table. One JMP statement.
3) Code follows for each case.

Incidentally, the Open Watcom compiler actually turned several of the
cases into offset-load of the appropriate string pointer, and then a
jump to the single call to printf. The fall-through from 'case 2' to
'case 3' works fine, although it means that 'case 2' has to be
de-optimized from that one simplification.

This type of optimization works best when the case values are
sequential. (If I remove the 'case 0', the compiler decrements argc
and proceeds to continue as above.) Otherwise, the jump table has to
have a lot of copies of the "default" pointer.

Chris Angelico

Terry Reedy

Aug 21, 2011, 7:38:22 PM
to pytho...@python.org
On 8/21/2011 7:17 PM, Andreas Löscher wrote:
> On Sunday, 21.08.2011 at 14:52 -0400, Roy Smith wrote:
>> In article<mailman.282.13139510...@python.org>,
>> Christian Heimes<li...@cheimes.de> wrote:
>>
>>> On 21.08.2011 19:27, Andreas Löscher wrote:
>>>> For integers, the first case (lines 1319 and 1535) is taken, and the
>>>> inline code is identical. However, Python uses a huge switch-case
>>>> construct to execute its opcodes, and INPLACE_ADD comes after
>>>> BINARY_ADD, hence the difference in speed.
>>>
>>> I don't think that's the reason. Modern compilers turn a switch statement
>>> into a jump or branch table rather than a linear search like chained
>>> elif statements.
>>
>> This is true even for very small values of "modern". I remember the
>> Unix v6 C compiler (circa 1977) was able to do this.
>
> What is the difference in speed between a jump table that is searched
> from top to bottom in comparison to an ordinary if-then-elif...? The
> difference can only be in the search algorithm regarding the table.
> Without optimization (linear search) both are the same. If the compiler
> applies some magic the difference can be relevant (linear complexity for
> if-then-elif... and O(1) if you would use a dictionary).

A jump or branch table is applicable when the case values are all
small ints, like bytes or less. For C, the table is simply an array of
pointers (addresses; entries for unused byte codes would be void
pointers). Hence, O(1) access.
https://secure.wikimedia.org/wikipedia/en/wiki/Jump_table
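
A toy Python analogue of that array-of-pointers idea (the opcode numbers here are made up): the opcode's value indexes straight into a table of handlers, so dispatch costs one indexed load plus a call, no matter how many opcodes exist or where they sit in the table:

BINARY_ADD, INPLACE_ADD = 0, 1   # hypothetical opcode values

def do_binary_add(stack):
    w = stack.pop()
    v = stack.pop()
    stack.append(v + w)

def do_inplace_add(stack):
    w = stack.pop()
    v = stack.pop()
    v += w
    stack.append(v)

handlers = [do_binary_add, do_inplace_add]   # index == opcode value

stack = [2, 3]
handlers[BINARY_ADD](stack)   # O(1) dispatch: no linear search
print(stack)                  # [5]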

> Since the executed code for integers is the same, there must be a slower
> path to the code of BINARY_ADD than to INPLACE_ADD.
>
> How would such a jump table behave the same as a
> switch-case statement? Beware that things like
>
> case PRINT_NEWLINE_TO:
> 1802 w = stream = POP();
> 1803 /* fall through to PRINT_NEWLINE */

add jump to address of the code for PRINT_NEWLINE

> 1804
> 1805 case PRINT_NEWLINE:
>
> must be supported.

--
Terry Jan Reedy


Chris Angelico

Aug 21, 2011, 7:41:43 PM
to pytho...@python.org
2011/8/22 Andreas Löscher <andreas....@s2005.tu-chemnitz.de>:

> But every improvement to your algorithm will easily result in a
> significantly shorter execution time than replacing a+=1 with a=a+1
> ever will. :-)
>

Agreed. If Python needed a faster alternative to "a=a+1", then I would
recommend an "a.inc()" call or something; some way to avoid looking up
the value of 1. (An equivalent to C's ++a operation, if you like.) But
I think you'd be hard-pressed to find any situation where improving
the speed of incrementing will be significant, that wouldn't be better
served by algorithmic improvements (even just using "for a in
range()") or dropping to C.

ChrisA

Laurent Payot

Aug 21, 2011, 7:49:30 PM
I made Python my language of choice because of its readability and simplicity, not because of its speed. But it's always good to know what is fastest for the times when you don't want to write a module in C. So I was just wondering if there was a difference. There is, of a few percent. Anyway I will keep on using the 2% slower "i += 1" because for me it's less error-prone: you write the variable only once, and that's more important than speed.

Andreas Löscher

Aug 21, 2011, 8:00:38 PM

:-) too easy or too late
thanks

Terry Reedy

Aug 21, 2011, 8:35:32 PM
to pytho...@python.org
On 8/21/2011 5:07 PM, Nobody wrote:

> If the value on the left has an __iadd__ method, that will be called;

Correct

> the value will be updated in-place,

Not necessarily correct. The target is rebound to the return from the
__iadd__ method. Augmented *assignment* is always assignment. This trips
up people who try

t = (1, [])
t[1] += [1,2] # which *does* extend the list, but raises
TypeError: 'tuple' object does not support item assignment
# instead of
t[1].extend([1,2]) # which extends without raising an error


*IF* (and only if) the target object is mutable, then the __iadd__ may
optionally mutate the target object. But the rebinding takes place
nonetheless. Numbers, the subject of this thread, are not mutable and
are not 'updated in-place'.

class test:
    def __init__(self, n):
        self.n = n
    def __iadd__(self, other):
        return test(self.n + other.n)
    def __repr__(self):
        return repr(self.n)

r = test(1)
t = r
t += r
print(r, t)
# 1 2

That said, there is normally no reason to write an __ixxx__ method
unless instances are mutable and the operation can be and is done in
place therein. So the class above is for illustrative purposes only. A
saner example is the following, which treats test instances as mutable
number containers rather than as immutable surrogates.

class test:
    def __init__(self, n):
        self.n = n
    def __add__(self, other):
        return test(self.n + other.n)
    def __iadd__(self, other):
        self.n = self.n + other.n
        return self      # return the mutated instance itself
    def __repr__(self):
        return repr(self.n)

r = test(1)
t = r
t += r
print(r, t)
# 2 2

The interpreter cannot enforce that 'x += a' have the same effect as 'x
= x+a', but it would break normal expectations to make the two different.


> so all references to that object will be affected:

Only if the target object is mutable and is mutated by the optional
augmented assignment __ixxx__ methods.

> > import numpy as np
> > a = np.zeros(3)

Numpy arrays meet the qualification above.

> If the value on the left doesn't have an __iadd__ method, then addition is
> performed and the name is re-bound to the result:

As is also done with the results of __ixxx__ methods.

--
Terry Jan Reedy

Steven D'Aprano

Aug 21, 2011, 9:12:54 PM
Laurent wrote:

> But for integers, seriously, 4% is a lot. I would never have thought an
> interpreter would show a difference like this between two syntaxes for
> something as fundamental as adding 1.

Why? Python integers are rich objects, not native ints. Adding 1 is a
moderately heavyweight operation far more complicated than the same
operation in C.

n=n+1 and n+=1 call different methods and do different things; they are
*not* just different syntax. Your intuition about what should and shouldn't
take the same time should not be trusted.

But really, we're talking about tiny differences in speed. Such trivial
differences are at, or beyond, the limit of what can realistically be
measured on a noisy PC running multiple processes (pretty much all PCs
these days). Here are three runs of each on my computer:


[steve@sylar python]$ python2.5 -m timeit -s 'n=0' 'n = n+1'
1000000 loops, best of 3: 0.508 usec per loop
[steve@sylar python]$ python2.5 -m timeit -s 'n=0' 'n = n+1'
1000000 loops, best of 3: 0.587 usec per loop
[steve@sylar python]$ python2.5 -m timeit -s 'n=0' 'n = n+1'
1000000 loops, best of 3: 0.251 usec per loop


[steve@sylar python]$ python2.5 -m timeit -s 'n=0' 'n += 1'
1000000 loops, best of 3: 0.226 usec per loop
[steve@sylar python]$ python2.5 -m timeit -s 'n=0' 'n += 1'
1000000 loops, best of 3: 0.494 usec per loop
[steve@sylar python]$ python2.5 -m timeit -s 'n=0' 'n += 1'
1000000 loops, best of 3: 0.53 usec per loop

Look at the variation between runs! About 130% variation between the fastest
and slowest for each expression. And there's no reason to think that the
fastest results shown is as fast as it can get. The time is dominated by
noise, not the addition.


For what it's worth, if I try it with a more recent Python:

[steve@sylar python]$ python3.2 -m timeit -s 'n=0' 'n = n+1'
1000000 loops, best of 3: 0.221 usec per loop
[steve@sylar python]$ python3.2 -m timeit -s 'n=0' 'n = n+1'
1000000 loops, best of 3: 0.202 usec per loop
[steve@sylar python]$ python3.2 -m timeit -s 'n=0' 'n = n+1'
1000000 loops, best of 3: 0.244 usec per loop

[steve@sylar python]$ python3.2 -m timeit -s 'n=0' 'n += 1'
1000000 loops, best of 3: 0.49 usec per loop
[steve@sylar python]$ python3.2 -m timeit -s 'n=0' 'n += 1'
1000000 loops, best of 3: 0.176 usec per loop
[steve@sylar python]$ python3.2 -m timeit -s 'n=0' 'n += 1'
1000000 loops, best of 3: 0.49 usec per loop


I simply do not believe that we can justify making *any* claim about the
relative speeds of n=n+1 and n+=1 other than "they are about the same". Any
result you get, faster or slower, will depend more on chance than on any
real or significant difference in the code.


--
Steven

Steven D'Aprano

Aug 21, 2011, 9:16:58 PM
Chris Angelico wrote:

> 2011/8/22 Andreas Löscher <andreas....@s2005.tu-chemnitz.de>:
>> But every improvement to your algorithm will easily result in a
>> significantly shorter execution time than replacing a+=1 with a=a+1
>> ever will. :-)
>>
>
> Agreed. If Python needed a faster alternative to "a=a+1", then I would
> recommend an "a.inc()" call or something; some way to avoid looking up
> the value of 1.

Keep in mind that ints in Python are objects, not memory locations, and can
be shared. If you are hoping for an in-place increment, I invite you to
consider the following code and try to predict what it would do:

a = 42
b = a
b.inc()
print(a)

--
Steven

Christian Heimes

Aug 21, 2011, 10:04:08 PM
to pytho...@python.org
On 22.08.2011 01:25, Andreas Löscher wrote:
> It's not as bad as you think. The addition of two integers is a cheap
> task (in terms of computation power). If you have two ways to do it,
> every little thing (jumps in the code etc.) will have a larger impact
> on the execution time than it would on an expensive operation.
>
> But every improvement to your algorithm will easily result in a
> significantly shorter execution time than replacing a+=1 with a=a+1
> ever will. :-)

You can learn an important lesson here. Since Python is a high-level
language without a JIT (yet), integer operations are much slower than in
C or other low-level languages. In general it's not a problem for most
people. However, if your algorithm is bound by integer operations or has
a tight inner for loop, then you can gain a considerable amount of
speedup with C code. Reference counting and Python object creation need
several CPU cycles. Also, a good algorithm and a modern C compiler can
make use of SIMD instructions, too.

If you ever feel the need to implement something fast, e.g. an encryption
algorithm, then you're better off with a C implementation. Cython is my
favourite tool for the job.

Christian

Stephen Hansen

Aug 21, 2011, 10:08:22 PM
to pytho...@python.org

Laurent wrote:

> But for integers, seriously, 4% is a lot.

It's, seriously, not even kind of a lot at all. Percentages without
context are meaningless: it's 4% slower, sure -- but that is 4% of an
incredibly small, probably constant-time amount of time.

Picking "i += 1" over "i = i + 1" based on one being 4% slower is sorta
kinda crazy. The difference in speed is probably related to churn and
cache as much as anything else (it's not as consistent on my machine, for
example), or the ceval loop doing a few more case-tests between them as
others have mentioned. All in all, if 4% of a nanomicrofraction of a
chunk of time is that meaningful, you're probably best served not using
Python. :)

That said: my advice is always to avoid += like the plague. It is magic
and impossible to predict without intimate knowledge of exactly what's
on the left-side.

i += 1
n += x

Those two things look very similar, but they may do -completely-
different things depending on just what "n" is.

It may or may not do something that is like:

n = n + x

Or, it may do something that's more akin to

n.extend(x)
n = n

Those aren't even kind of equivalent actions. And things get more
complicated if 'n' is, say, n[0] (especially if something goes wrong
between the extend and the rebinding).

Python's usually all explicit and pretty well-defined in how its basic
syntax and behaviors operate, and you usually don't really have to know
details about how a data-type works to predict exactly what it's doing:
in fact, its often beneficial to not pay too much attention to such
details, and just assume the data type will work approximately as you'd
expect. That way people can slip something-something to you and wink and
say of /course/ it's a dict, darling. Try it, you'll like it, okay? This
sorta thing is encouraged, but it kinda depends on trusting objects to
behave a certain way and for things to be predictable in both how they
work and how they fail.

With "i = i + 1", I know that generally speaking, my "i" is being
assigned a new object and that's that, no matter what type "i" is.
(Okay: I do know that you could modify __add__ to do something
underhanded here, tweaking internal state and then returning self.
People going out of their way to behave unpredictably is not my
objection: supposedly easy and straight-forward normal Python-fu being
inherently unpredictable is).
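
That kind of underhanded __add__ is easy to demonstrate with a deliberately contrived class (purely illustrative, not something any sane library should ship):

class Sneaky:
    def __init__(self, n):
        self.n = n
    def __add__(self, other):
        self.n = self.n + other   # side effect: mutates self...
        return self               # ...and returns self anyway

a = Sneaky(1)
b = a
a = a + 1
print(b.n)   # 2 -- the alias changed, even though we used "a = a + 1"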

For example: I just /know/ that it doesn't matter who or what may have
their own binding to that object before I go and increment it, they
won't be affected and everything just will work fine. With augmented
assignment, I can't be sure of that. Now, while I admit, you generally
do have to keep track in your head of which of your data-types are
mutable vs immutable and take care with sharing mutables, the fact that
"n += x" is described and generally thought of as merely syntactical
sugar for:

n = n + x

... lets one easily think that this should be entirely safe, even with
mutable objects, because if += were merely syntactical sugar, it would
be. But it's not! Because += is wiggly. It can do more than one entirely
different kind of behavior.

Anyways. </rant> I've been kinda annoyed at augmented assignment for
years now :P

--

Stephen Hansen
... Also: Ixokai
... Mail: me+list/python (AT) ixokai (DOT) io
... Blog: http://meh.ixokai.io/


Terry Reedy

Aug 21, 2011, 10:11:46 PM
to pytho...@python.org
On 8/21/2011 7:41 PM, Chris Angelico wrote:

> Agreed. If Python needed a faster alternative to "a=a+1", then I would
> recommend an "a.inc()" call or something

But looking up the method name, creating a bound method wrapper, and
making the call would probably be slower than the syntax;-).
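
A rough way to see that overhead (numbers are illustrative only): even calling the existing __add__ slot by name is noticeably slower than the operator syntax, before any hypothetical inc() logic is added:

import timeit

print(timeit.timeit("n + 1", setup="n = 0"))
print(timeit.timeit("n.__add__(1)", setup="n = 0"))   # pays for the attribute lookup and call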

--
Terry Jan Reedy

Terry Reedy

Aug 21, 2011, 10:15:07 PM
to pytho...@python.org

Laurent Payot wrote:

> Anyway I will keep on using the 2% slower "i += 1" because for me it's
> less error-prone: you write the variable only once.

For longer variable names, it is also easier and faster to read once one
gets used to the idiom.

number_of_chars += 1 # versus
number_of_chars = number_of_chars + 1

Not repeating was a major reason for the addition.

--
Terry Jan Reedy

Steven D'Aprano

Aug 22, 2011, 12:14:32 AM
On Mon, 22 Aug 2011 12:08 pm Stephen Hansen wrote:

> Picking "i += 1" over "i = i + 1" based on one being 4% slower is sorta
> kinda crazy. The difference in speed is probably related to churn and
> cache as much as anything else (it's not as consistent on my machine, for
> example), or the ceval loop doing a few more case-tests between them as
> others have mentioned. All in all, if 4% of a nanomicrofraction of a
> chunk of time is that meaningful, you're probably best served not using
> Python. :)

Agreed.

But...

> That said: my advice is always to avoid += like the plague. It is magic
> and impossible to predict without intimate knowledge of exactly what's
> on the left-side.
>
> i += 1
> n += x
>
> Those two things look very similar, but they may do -completely-
> different things depending on just what "n" is.


Technically, the *exact same criticism* can be applied to:

n = n + x

since either n or x could override __add__ or __radd__ and do anything it
bloody well likes. Including in-place modification of *either* argument
(where possible).


[...]


> With "i = i + 1", I know that generally speaking, my "i" is being
> assigned a new object and that's that, no matter what type "i" is.
> (Okay: I do know that you could modify __add__ to do something
> underhanded here, tweaking internal state and then returning self.

What makes you think that's underhanded?

To my mind, __add__ modifying self as a side-effect is unusual, but apart
from breaking the general rule of thumb to avoid side-effects, not
particularly evil. (But I would accept this is a code smell.)


> People going out of their way to behave unpredictably is not my
> objection: supposedly easy and straight-forward normal Python-fu being
> inherently unpredictable is).
>
> For example: I just /know/ that it doesn't matter who or what may have
> their own binding to that object before I go and increment it, they
> won't be affected and everything just will work fine.

But you can't /know/ that at all, unless you know that the object
isn't "underhanded" (however you define that!). And given some arbitrary
object, how can you know that?

--
Steven

casevh

Aug 22, 2011, 12:14:37 AM
On Aug 21, 10:27 am, Andreas Löscher <andreas.loesc...@s2005.tu-chemnitz.de> wrote:
>
> from Python/ceval.c:
>
> [BINARY_ADD / INPLACE_ADD cases snipped; quoted in full earlier in the thread]
>
> For integers, the first case (lines 1319 and 1535) is taken, and the
> inline code is identical. However, Python uses a huge switch-case
> construct to execute its opcodes, and INPLACE_ADD comes after
> BINARY_ADD, hence the difference in speed.

That fragment of ceval.c is from a 2.x version. Python 2.x supports
both a PyInt and a PyLong type, and the ceval loop optimized the PyInt
case only. On my system, I could not measure a difference between
binary and inplace addition.

Python 3.x behaves differently:

TARGET(BINARY_ADD)
    w = POP();
    v = TOP();
    if (PyUnicode_CheckExact(v) &&
        PyUnicode_CheckExact(w)) {
        x = unicode_concatenate(v, w, f, next_instr);
        /* unicode_concatenate consumed the ref to v */
        goto skip_decref_vx;
    }
    else {
        x = PyNumber_Add(v, w);
    }
    Py_DECREF(v);
skip_decref_vx:
    Py_DECREF(w);
    SET_TOP(x);
    if (x != NULL) DISPATCH();
    break;

TARGET(INPLACE_ADD)
    w = POP();
    v = TOP();
    if (PyUnicode_CheckExact(v) &&
        PyUnicode_CheckExact(w)) {
        x = unicode_concatenate(v, w, f, next_instr);
        /* unicode_concatenate consumed the ref to v */
        goto skip_decref_v;
    }
    else {
        x = PyNumber_InPlaceAdd(v, w);
    }
    Py_DECREF(v);
skip_decref_v:
    Py_DECREF(w);
    SET_TOP(x);
    if (x != NULL) DISPATCH();
    break;

ceval just calls PyNumber_Add or PyNumber_InPlaceAdd. If you look at
the code for PyNumber_InPlaceAdd (in abstract.c), it calls an internal
function binary_iop1 with pointers to nb_inplace_add and nb_add.
binary_iop1 then checks if nb_inplace_add exists. The PyLong type does
not implement nb_inplace_add, so the check fails and binary_iop1 uses
nb_add.
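
In Python terms, the fallback casevh describes looks roughly like this (a simplified sketch: the real binary_iop1 also deals with NotImplemented results and reflected operands):

def inplace_add(v, w):
    iadd = getattr(type(v), "__iadd__", None)   # the nb_inplace_add slot
    if iadd is not None:
        return iadd(v, w)
    return v + w                                # fall back to nb_add

x = inplace_add([1], [2])   # lists define __iadd__: extends in place
print(x)                    # [1, 2]
print(inplace_add(5, 1))    # ints do not: plain addition, new object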

In recent versions of gmpy and gmpy2, I implemented the nb_inplace_add
function and performance (for the gmpy.mpz type) is much better for
the in-place addition.
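
A sketch of how one might verify that (assumes gmpy2 is installed; mpz is its arbitrary-precision integer type, and the absolute numbers will vary):

import timeit

setup = "from gmpy2 import mpz; n = mpz(10**50)"
print(timeit.timeit("n += 1", setup=setup))     # can use nb_inplace_add
print(timeit.timeit("n = n + 1", setup=setup))  # always builds a new mpz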

For the adventuresome, gmpy2 implements a mutable integer type called
xmpz. It isn't much faster until the values are so large that the
memory copy times become significant. (Some old gmpy documentation
implies that operations with mutable integers should be much faster.
With aggressive caching of deleted objects, the object creation
overhead is very low. So the big win for mutable integers is reduced
to avoiding memory copies.)

casevh

Stephen Hansen

Aug 22, 2011, 12:37:16 AM
to pytho...@python.org
On 8/21/11 9:14 PM, Steven D'Aprano wrote:
>> That said: my advice is always to avoid += like the plague. It is magic
>> and impossible to predict without intimate knowledge of exactly what's
>> on the left-side.
>>
>> i += 1
>> n += x
>>
>> Those two things look very similar, but they may do -completely-
>> different things depending on just what "n" is.
>
> Technically, the *exact same criticism* can be applied to:
>
> n = n + x
>
> since either n or x could override __add__ or __radd__ and do anything it
> bloody well likes. Including in-place modification of *either* argument
> (where possible).

I know: I addressed that. :P See, you know I addressed that, because:

> [...]
>> With "i = i + 1", I know that generally speaking, my "i" is being
>> assigned a new object and that's that, no matter what type "i" is.
>> (Okay: I do know that you could modify __add__ to do something
>> underhanded here, tweaking internal state and then returning self.
>
> What makes you think that's underhanded?

You quoted me addressing that very fact, and responded! :)

It's underhanded outside of narrow, well-defined situations such as ORM's
and the like where they're doing interesting and novel things with the
object model.

Underhanded doesn't mean there's no reason one should ever do it, but
it's very much an unusual thing, and objects that do things like that
should NOT freely interact with general Python objects: very surprising
behavior will result.

> To my mind, __add__ modifying self as a side-effect is unusual, but apart
> from breaking the general rule of thumb to avoid side-effects, not
> particularly evil. (But I would accept this is a code smell.)

I didn't really say it's evil, just that it's underhanded: that's like,
sketchy, not evil. :)

There's good reasons to do it on occasion, but if an object does
something like that then that is a very special kind of object and when
using it, you need to be very aware of its unusual semantics.

Having objects like that is fine.

However, for the /language/ to have unusual (and unpredictable, from
something of a distance at least) semantics rubs me very much the wrong
way.

>> People going out of their way to behave unpredictably is not my
>> objection: supposedly easy and straight-forward normal Python-fu being
>> inherently unpredictable is).
>>
>> For example: I just /know/ that it doesn't matter who or what may have
>> their own binding to that object before I go and increment it, they
>> won't be affected and everything just will work fine.
>
> But you can't /know/ that at all, unless you know that the object
> isn't "underhanded" (however you define that!). And given some arbitrary
> object, how can you know that?

I CAN know it with sufficient certainty that I can rely on it, and if I
get messed up on it -- it's never happened yet -- then I can fire someone
or get a new library, after staring at an odd traceback and scratching
my head for a minute.

Python has very soft rules, I know that. Objects /can/ do all kinds of
crazy things. I'm okay with that.

But I can rely on certain behaviors being consistent: the basic behavior
of the language is very straight-forward. It has hooks where you can do
lots of Special stuff, and if objects are Special, they're described as
Special.

ORM's have a lot of Special objects, for example. That's fine: when
messing with ORM objects, I know I have to take care with the operators
I use against them, because I know I'm not using a /usual/ Python
object. SQLAlchemy for example. This is not a criticism of SQLAlchemy:
having magic objects lets it construct SQL in ways that are vastly
easier than if it stuck to more regularly behaving objects.

But, += is Python itself adding an unpredictable behavior into the core
language, with its own base types behaving

The *exact same criticism* can NOT be applied to

n = n + x

Because my criticism isn't about one choosing to do crazy stuff with the
object model. I've never advocated Python be strict about rules.

But for Python, all by itself, with nothing but built-in and basic
types, to have a situation where:

a = a + b
a += b

... does two very distinctly different actions, even if in many or
even most circumstances the end-result is probably the same and probably
fine, is my criticism.


Stephen Hansen

Aug 22, 2011, 12:49:03 AM
to me+list...@ixokai.io, pytho...@python.org
On 8/21/11 9:37 PM, Stephen Hansen wrote:
> But, += is Python itself adding an unpredictable behavior into the core
> language, with its own base types behaving

... very differently to fundamental, basic appearing operators.

Editing fail on my part.

Similarly:

> But for Python, all by itself, with nothing but built-in and basic
> types, to have a situation where:
>
> a = a + b
> a += b

... would be clearer if the second example were "x += y".

> ... does two very distinctly different actions, even if in many or
> even most circumstances the end-result is probably the same and probably
> fine, is my criticism.
>

--


Seebs

Aug 22, 2011, 1:33:28 AM
On 2011-08-21, Andreas Löscher <andreas....@s2005.tu-chemnitz.de> wrote:
> On Sunday, 21.08.2011 at 14:52 -0400, Roy Smith wrote:
>> In article <mailman.282.13139510...@python.org>,
>> Christian Heimes <li...@cheimes.de> wrote:
>> > I don't think that's the reason. Modern compilers turn a switch statement
>> > into a jump or branch table rather than a linear search like chained
>> > elif statements.

>> This is true even for very small values of "modern". I remember the
>> Unix v6 C compiler (circa 1977) was able to do this.

> What is the difference in speed between a jump table that is searched
> from top to bottom

A jump table isn't searched, it's jumped into. You don't look through the
table for a matching element, you grab the Nth element of the table.

-s
--
Copyright 2011, all wrongs reversed. Peter Seebach / usenet...@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
I am not speaking for my employer, although they do rent some of my opinions.

Richard D. Moores

Aug 22, 2011, 5:55:11 AM
to Steven D'Aprano, pytho...@python.org

I couldn't resist giving it a try. Using Python 3.2.1 on a 64-bit
Windows 7 machine with a 2.60 gigahertz AMD Athlon II X4 620
processor, I did 18 tests, alternating between n=n+1 and n+=1 (so 9
each).

The fastest for n+=1 was
C:\Windows\System32> python -m timeit -r 3 -s "n=0" "n += 1"
10000000 loops, best of 3: 0.0879 usec per loop

The slowest for n+=1 was
C:\Windows\System32> python -m timeit -r 3 -s "n=0" "n += 1"
10000000 loops, best of 3: 0.0902 usec per loop

The fastest for n = n + 1 was
C:\Windows\System32> python -m timeit -r 3 -s "n=0" "n=n+1"
10000000 loops, best of 3: 0.0831 usec per loop

The slowest for n = n + 1 was
C:\Windows\System32> python -m timeit -r 3 -s "n=0" "n=n+1"
10000000 loops, best of 3: 0.0842 usec per loop

Coincidence?

All the data are pasted at http://pastebin.com/jArPSe56

Dick Moores

Emile van Sebille

Aug 22, 2011, 12:35:56 PM
to pytho...@python.org
On 8/22/2011 2:55 AM Richard D. Moores said...

> I couldn't resist giving it a try. Using Python 3.2.1 on a 64-bit
> Windows 7 machine with a 2.60 gigahertz AMD Athlon II X4 620
> processor, I did 18 tests, alternating between n=n+1 and n+=1 (so 9
> each).
>
> The fastest for n+=1 was
> C:\Windows\System32> python -m timeit -r 3 -s "n=0" "n += 1"
> 10000000 loops, best of 3: 0.0879 usec per loop
>
> The slowest for n+=1 was
> C:\Windows\System32> python -m timeit -r 3 -s "n=0" "n += 1"
> 10000000 loops, best of 3: 0.0902 usec per loop
>
> The fastest for n = n + 1 was
> C:\Windows\System32> python -m timeit -r 3 -s "n=0" "n=n+1"
> 10000000 loops, best of 3: 0.0831 usec per loop
>
> The slowest for n = n + 1 was
> C:\Windows\System32> python -m timeit -r 3 -s "n=0" "n=n+1"
> 10000000 loops, best of 3: 0.0842 usec per loop
>
> Coincidence?
>

Naaa.. I just ran it twice -- once per ... _this_ is coincidence. :)

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\Documents and Settings\Emile>python -m timeit -r 3 -s "n=0" "n=n+1"
10000000 loops, best of 3: 0.108 usec per loop

C:\Documents and Settings\Emile>python -m timeit -r 3 -s "n=0" "n += 1"
10000000 loops, best of 3: 0.108 usec per loop

C:\Documents and Settings\Emile>


Emile van Sebille

Aug 22, 2011, 1:22:25 PM
to pytho...@python.org
On 8/22/2011 9:35 AM Emile van Sebille said...

> On 8/22/2011 2:55 AM Richard D. Moores said...
>> Coincidence?
>>
>
> Naaa.. I just ran it twice -- once per ... _this_ is coincidence. :)
>
> Microsoft Windows XP [Version 5.1.2600]
> (C) Copyright 1985-2001 Microsoft Corp.
>
> C:\Documents and Settings\Emile>python -m timeit -r 3 -s "n=0" "n=n+1"
> 10000000 loops, best of 3: 0.108 usec per loop
>
> C:\Documents and Settings\Emile>python -m timeit -r 3 -s "n=0" "n += 1"

> 10000000 loops, best of 3: 0.108 usec per loop

Actually, it's more likely I hit a minimum resolution issue -- I ran it
twenty more times and never got a different result.

Emile

