[Python-Dev] Python 3.3 vs. Python 2.7 benchmark results (again, but this time more solid numbers)

Brett Cannon

Oct 26, 2012, 3:14:08 PM

to python-dev

I re-ran the unladen benchmarks on my work machine and w/o the -b option flipped on (i.e. more thorough benchmark numbers). I figured I would share them now instead of after my PyCon Argentina talk in case people decide to dig into the results now, find a pathological problem in CPython, and then fix it before I give my presentation (if you have trouble running a benchmark or it isn't available in the repo because it's one I hacked together, just ask and I can help you run the benchmark if you want to try to speed things up). I have colour-coded benchmarks based on whether it is faster or slower in Python 3.3 (sorry for those of you who hate HTML email).

But the tl;dr message is that Python 3.3 looks good compared to Python 2.7 (the median benchmark score is 5% slower).

Worst benchmark is nosite_startup, best is telco. The benchmarks people might want to analyze (i.e. more than 20% slower in Python 3.3) are mako_v2, threaded_count, normal_startup, iterative_count, pathlib, formatted_logging, and simple_logging.

###########################################

Report on Linux 3.2.5-gg987 #1 SMP Fri Sep 14 02:36:36 PDT 2012 x86_64 x86_64

Total CPU cores: 12

### 2to3 ###

9.320000 -> 8.980000: 1.04x faster

### call_method ###

Min: 0.417756 -> 0.355247: 1.18x faster

Avg: 0.419688 -> 0.356382: 1.18x faster

Significant (t=92.85)

Stddev: 0.00604 -> 0.00577: 1.0479x smaller

Timeline: b'http://tinyurl.com/8soruze'

### call_method_slots ###

Min: 0.417611 -> 0.358451: 1.17x faster

Avg: 0.420761 -> 0.359676: 1.17x faster

Significant (t=88.70)

Stddev: 0.00605 -> 0.00588: 1.0291x smaller

Timeline: b'http://tinyurl.com/8uu8234'

### call_method_unknown ###

Min: 0.459057 -> 0.359327: 1.28x faster

Avg: 0.462929 -> 0.360410: 1.28x faster

Significant (t=137.99)

Stddev: 0.00698 -> 0.00583: 1.1969x smaller

Timeline: b'http://tinyurl.com/9mo7h24'

### call_simple ###

Min: 0.341689 -> 0.265289: 1.29x faster

Avg: 0.343003 -> 0.266503: 1.29x faster

Significant (t=124.20)

Stddev: 0.00555 -> 0.00511: 1.0859x smaller

Timeline: b'http://tinyurl.com/9pnn7q4'

### chameleon ###

Min: 0.072232 -> 0.062713: 1.15x faster

Avg: 0.074588 -> 0.064261: 1.16x faster

Significant (t=33.74)

Stddev: 0.00284 -> 0.00245: 1.1599x smaller

Timeline: b'http://tinyurl.com/8my8afl'

### chaos ###

Min: 0.313727 -> 0.367015: 1.17x slower

Avg: 0.317568 -> 0.371473: 1.17x slower

Significant (t=-26.72)

Stddev: 0.00962 -> 0.01053: 1.0942x larger

Timeline: b'http://tinyurl.com/9y2u6kh'

### django ###

Min: 0.798331 -> 0.855461: 1.07x slower

Avg: 0.801109 -> 0.860996: 1.07x slower

Significant (t=-87.43)

Stddev: 0.00336 -> 0.00348: 1.0356x larger

Timeline: b'http://tinyurl.com/9sf95pq'

### fannkuch ###

Min: 1.364705 -> 1.327680: 1.03x faster

Avg: 1.380412 -> 1.337467: 1.03x faster

Significant (t=10.48)

Stddev: 0.02056 -> 0.02040: 1.0077x smaller

Timeline: b'http://tinyurl.com/9r2vq6g'

### fastpickle ###

Min: 0.763479 -> 0.805715: 1.06x slower

Avg: 0.770036 -> 0.810855: 1.05x slower

Significant (t=-12.73)

Stddev: 0.01618 -> 0.01589: 1.0180x smaller

Timeline: b'http://tinyurl.com/9rvqo4s'

### fastunpickle ###

Min: 0.588694 -> 0.663616: 1.13x slower

Avg: 0.596622 -> 0.672418: 1.13x slower

Significant (t=-23.22)

Stddev: 0.01503 -> 0.01752: 1.1656x larger

Timeline: b'http://tinyurl.com/9eggn34'

### float ###

Min: 0.363234 -> 0.344408: 1.05x faster

Avg: 0.376159 -> 0.354165: 1.06x faster

Significant (t=8.76)

Stddev: 0.01282 -> 0.01227: 1.0455x smaller

Timeline: b'http://tinyurl.com/8d6rcb8'

### formatted_logging ###

Min: 0.330988 -> 0.400309: 1.21x slower

Avg: 0.335522 -> 0.408920: 1.22x slower

Significant (t=-33.48)

Stddev: 0.00989 -> 0.01194: 1.2076x larger

Timeline: b'http://tinyurl.com/9ll7dqk'

### genshi ###

Min: 0.229140 -> 0.251766: 1.10x slower

Avg: 0.232124 -> 0.257252: 1.11x slower

Significant (t=-40.24)

Stddev: 0.00516 -> 0.00564: 1.0925x larger

Timeline: b'http://tinyurl.com/9dpuuaw'

### go ###

Min: 0.632778 -> 0.710382: 1.12x slower

Avg: 0.636143 -> 0.716748: 1.13x slower

Significant (t=-37.61)

Stddev: 0.00186 -> 0.01504: 8.0815x larger

Timeline: b'http://tinyurl.com/8s7vw74'

### hexiom2 ###

Min: 150.982155 -> 154.702444: 1.02x slower

Avg: 151.194622 -> 154.780953: 1.02x slower

Significant (t=-15.83)

Stddev: 0.30047 -> 0.11103: 2.7063x smaller

Timeline: b'http://tinyurl.com/8rkkduv'

### iterative_count ###

Min: 0.117036 -> 0.156752: 1.34x slower

Avg: 0.120802 -> 0.172218: 1.43x slower

Significant (t=-34.92)

Stddev: 0.00542 -> 0.00889: 1.6422x larger

Timeline: b'http://tinyurl.com/9x9rtnk'

### json_dump_v2 ###

Min: 3.449868 -> 3.522645: 1.02x slower

Avg: 3.467124 -> 3.541902: 1.02x slower

Significant (t=-13.20)

Stddev: 0.02701 -> 0.02960: 1.0959x larger

Timeline: b'http://tinyurl.com/8bsz64a'

### json_load ###

Min: 0.981740 -> 0.567611: 1.73x faster

Avg: 0.986729 -> 0.572975: 1.72x faster

Significant (t=128.95)

Stddev: 0.01796 -> 0.01386: 1.2955x smaller

Timeline: b'http://tinyurl.com/93txokx'

### mako_v2 ###

Min: 0.083660 -> 0.243323: 2.91x slower

Avg: 0.084634 -> 0.247875: 2.93x slower

Significant (t=-821.55)

Stddev: 0.00193 -> 0.00400: 2.0737x larger

Timeline: b'http://tinyurl.com/98n9fab'

### meteor_contest ###

Min: 0.257992 -> 0.232116: 1.11x faster

Avg: 0.262581 -> 0.236684: 1.11x faster

Significant (t=14.31)

Stddev: 0.00916 -> 0.00894: 1.0243x smaller

Timeline: b'http://tinyurl.com/8tpjt43'

### nbody ###

Min: 0.375414 -> 0.293685: 1.28x faster

Avg: 0.379489 -> 0.299794: 1.27x faster

Significant (t=42.71)

Stddev: 0.00997 -> 0.00864: 1.1537x smaller

Timeline: b'http://tinyurl.com/96aqtod'

### normal_startup ###

Min: 0.360002 -> 0.593214: 1.65x slower

Avg: 0.386755 -> 0.600625: 1.55x slower

Significant (t=-134.28)

Stddev: 0.01055 -> 0.00395: 2.6704x smaller

Timeline: b'http://tinyurl.com/9td8pna'

### nqueens ###

Min: 0.300390 -> 0.363904: 1.21x slower

Avg: 0.304282 -> 0.368003: 1.21x slower

Significant (t=-37.41)

Stddev: 0.00813 -> 0.00888: 1.0920x larger

Timeline: b'http://tinyurl.com/9zxyfcu'

### pathlib ###

Min: 0.106088 -> 0.138693: 1.31x slower

Avg: 0.107279 -> 0.139885: 1.30x slower

Significant (t=-133.12)

Stddev: 0.00256 -> 0.00290: 1.1324x larger

Timeline: b'http://tinyurl.com/9llvj6a'

### pidigits ###

Min: 0.351666 -> 0.341745: 1.03x faster

Avg: 0.354743 -> 0.344146: 1.03x faster

Significant (t=5.89)

Stddev: 0.00965 -> 0.00829: 1.1643x smaller

Timeline: b'http://tinyurl.com/8bkgrv4'

### raytrace ###

Min: 1.547054 -> 1.641147: 1.06x slower

Avg: 1.552614 -> 1.643716: 1.06x slower

Significant (t=-286.42)

Stddev: 0.00190 -> 0.00120: 1.5920x smaller

Timeline: b'http://tinyurl.com/9bmnbsd'

### regex_compile ###

Min: 0.494022 -> 0.537924: 1.09x slower

Avg: 0.497904 -> 0.541971: 1.09x slower

Significant (t=-18.23)

Stddev: 0.01177 -> 0.01239: 1.0523x larger

Timeline: b'http://tinyurl.com/cvdhrrm'

### regex_effbot ###

Min: 0.065431 -> 0.073393: 1.12x slower

Avg: 0.069753 -> 0.077338: 1.11x slower

Significant (t=-10.61)

Stddev: 0.00361 -> 0.00354: 1.0179x smaller

Timeline: b'http://tinyurl.com/cupb89m'

### regex_v8 ###

Min: 0.071053 -> 0.081441: 1.15x slower

Avg: 0.075075 -> 0.086167: 1.15x slower

Significant (t=-12.44)

Stddev: 0.00359 -> 0.00518: 1.4455x larger

Timeline: b'http://tinyurl.com/d9ly6x3'

### simple_logging ###

Min: 0.325386 -> 0.395093: 1.21x slower

Avg: 0.330235 -> 0.399825: 1.21x slower

Significant (t=-34.22)

Stddev: 0.00952 -> 0.01077: 1.1317x larger

Timeline: b'http://tinyurl.com/8sbqv85'

### startup_nosite ###

Min: 0.082137 -> 0.453112: 5.52x slower

Avg: 0.129994 -> 0.459361: 3.53x slower

Significant (t=-276.85)

Stddev: 0.01114 -> 0.00419: 2.6585x smaller

Timeline: b'http://tinyurl.com/932mzal'

### telco ###

Min: 0.810000 -> 0.010000: 81.00x faster

Avg: 0.823600 -> 0.015200: 54.18x faster

Significant (t=284.37)

Stddev: 0.01946 -> 0.00505: 3.8556x smaller

### threaded_count ###

Min: 0.140653 -> 0.173500: 1.23x slower

Avg: 0.152514 -> 0.270779: 1.78x slower

Significant (t=-49.87)

Stddev: 0.00605 -> 0.01564: 2.5837x larger

Timeline: b'http://tinyurl.com/9w4u7el'

### unpack_sequence ###

Min: 0.000077 -> 0.000067: 1.15x faster

Avg: 0.000081 -> 0.000069: 1.18x faster

Significant (t=1163.57)

Stddev: 0.00000 -> 0.00000: 1.7412x larger

Timeline: b'http://tinyurl.com/8qdcjcr'

The following not significant results are hidden, use -v to show them:

html5lib, richards, silent_logging, spectral_norm.
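Each entry above compares per-run timings from the two interpreters (Min, Avg, a significance test, Stddev). The sketch below shows one way such a summary could be derived from raw timing lists. It is not perf.py's code: the pooled two-sample Student's t statistic and the ratio wording are assumptions, `summarize` is a hypothetical helper, and it assumes both samples have non-zero spread.

```python
import math
import statistics


def _ratio(old, new, better, worse, digits=2):
    """Describe old -> new as e.g. '1.18x faster' (new smaller) or '1.17x slower'."""
    if new <= old:
        return f"{old / new:.{digits}f}x {better}"
    return f"{new / old:.{digits}f}x {worse}"


def summarize(base, changed):
    """Summarize two lists of per-run timings (seconds), base vs. changed.

    Assumption: significance is a pooled two-sample Student's t statistic,
    signed so that a positive t means `changed` is faster.
    """
    n1, n2 = len(base), len(changed)
    m1, m2 = statistics.mean(base), statistics.mean(changed)
    s1, s2 = statistics.stdev(base), statistics.stdev(changed)
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return {
        "min": _ratio(min(base), min(changed), "faster", "slower"),
        "avg": _ratio(m1, m2, "faster", "slower"),
        "stddev": _ratio(s1, s2, "smaller", "larger", digits=4),
        "t": t,
    }


if __name__ == "__main__":
    report = summarize([1.0, 1.1, 0.9], [0.5, 0.55, 0.45])
    print(f"Min: {report['min']}  Avg: {report['avg']}  t={report['t']:.2f}")
```

Reporting the minimum alongside the mean is a common choice for timing data, since noise from other processes only ever makes a run slower.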

Armin Rigo

Oct 27, 2012, 5:35:16 AM

to Brett Cannon, python-dev

Hi Brett,

On Fri, Oct 26, 2012 at 9:14 PM, Brett Cannon <br...@python.org> wrote:
> Worst benchmark is nosite_startup, best is telco.

May I express doubts about telco? :-) It looks like the Python 3
version is simply not running:

> ### telco ###
> Min: 0.810000 -> 0.010000: 81.00x faster
> Avg: 0.823600 -> 0.015200: 54.18x faster

A bientôt,

Armin.
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Antoine Pitrou

Oct 27, 2012, 5:53:52 AM

to pytho...@python.org

On Fri, 26 Oct 2012 15:14:08 -0400
Brett Cannon <br...@python.org> wrote:
>
> Worst benchmark is nosite_startup, best is telco. The benchmarks people
> might want to analyze (i.e. more than 20% slower in Python 3.3) are
> mako_v2, threaded_count, normal_startup, iterative_count, pathlib,
> formatted_logging, and simple_logging.

Well, did you check that mako_v2 wasn't subject to the Markupsafe
issue?

threaded_count and iterative_count are completely dumb.
Slower startup is due to the fact that Python 3 needs many more
modules to even start itself.

Regards

Antoine.
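To illustrate why these two count benchmarks are called synthetic, here is a rough approximation of what they exercise: a pure-Python countdown loop run sequentially, versus the same loops run in threads. This is a hypothetical reconstruction; the names `count_down`, `iterative_count` and `threaded_count` are illustrative and the real unladen-swallow benchmark code may differ. On a CPython build with the GIL, the threads cannot execute the loops in parallel, so the threaded variant mostly measures lock hand-off overhead.

```python
import threading
import time


def count_down(n):
    """Busy loop: decrement n to zero in pure Python."""
    while n > 0:
        n -= 1


def iterative_count(n, workers=2):
    """Run `workers` countdowns one after another; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(workers):
        count_down(n)
    return time.perf_counter() - start


def threaded_count(n, workers=2):
    """Run the same countdowns concurrently in threads; return elapsed seconds."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 1_000_000
    print(f"iterative: {iterative_count(n):.3f}s  threaded: {threaded_count(n):.3f}s")
```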

Maciej Fijalkowski

Oct 27, 2012, 6:12:28 AM

to Armin Rigo, python-dev

I think the original explanation was cDecimal vs decimal.

Stefan Krah

Oct 27, 2012, 8:33:41 AM

to pytho...@python.org

Maciej Fijalkowski <fij...@gmail.com> wrote:
> On Sat, Oct 27, 2012 at 11:35 AM, Armin Rigo <ar...@tunes.org> wrote:
> > May I express doubts about telco? :-) It looks like the Python 3
> > version is simply not running:
> >
> >> ### telco ###
> >> Min: 0.810000 -> 0.010000: 81.00x faster
> >> Avg: 0.823600 -> 0.015200: 54.18x faster
>

> I think the original explanation was cDecimal vs decimal.

Yes, the magnitude of the speedup looks correct. In an isolated benchmark
with the large input file [1] I'm getting 30x speedup for telco.

Stefan Krah

[1] http://www.bytereef.org/mpdecimal/quickstart.html#telco-benchmark - expon180-1e6b.zip
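The cdecimal explanation can be checked locally: since Python 3.3, `decimal` is normally backed by the C accelerator `_decimal` (libmpdec), whereas Python 2.7's `decimal` is pure Python. The sketch below checks which implementation is active and times a small telco-like loop against the pure-Python fallback. The workload is a hypothetical stand-in, not the real telco benchmark, and it assumes the fallback is importable as the private module `_pydecimal`, which is present in recent CPython releases.

```python
import decimal
import time


def uses_c_decimal():
    """Return True if `decimal` is backed by the C accelerator module `_decimal`."""
    try:
        import _decimal
    except ImportError:
        return False
    return decimal.Decimal is _decimal.Decimal


def telco_like(mod, n=20000):
    """Hypothetical telco-style workload: rate * duration, rounded to cents."""
    D = mod.Decimal
    rate, cents, total = D("0.0013"), D("0.01"), D(0)
    for seconds in range(1, n + 1):
        total += (rate * seconds).quantize(cents, rounding=mod.ROUND_HALF_EVEN)
    return total


def timed(mod, n=20000):
    """Return (elapsed seconds, result) for telco_like(mod, n)."""
    start = time.perf_counter()
    result = telco_like(mod, n)
    return time.perf_counter() - start, result


if __name__ == "__main__":
    print("C accelerator active:", uses_c_decimal())
    t_default, r_default = timed(decimal)
    try:
        import _pydecimal  # private pure-Python fallback (assumed present)
    except ImportError:
        print(f"decimal: {t_default:.3f}s")
    else:
        t_py, r_py = timed(_pydecimal)
        # Compare as strings: C and Python Decimal objects are distinct types.
        assert str(r_default) == str(r_py)
        print(f"decimal: {t_default:.3f}s  _pydecimal: {t_py:.3f}s  ratio: {t_py / t_default:.1f}x")
```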

Brett Cannon

Oct 27, 2012, 9:20:36 AM

to Antoine Pitrou, pytho...@python.org

I did check that MarkupSafe was not installed. It might just be mako doing something silly.

The threads tests are very synthetic.

And yes, there are more modules at startup. When was the last time we looked at them to make sure we weren't doing needless imports?

Nick Coghlan

Oct 27, 2012, 11:22:20 AM

to Brett Cannon, Antoine Pitrou, pytho...@python.org

On Sat, Oct 27, 2012 at 11:20 PM, Brett Cannon <bca...@gmail.com> wrote:
> I did check that MarkupSafe was not installed. It might just be mako doing
> something silly.
>
> The threads tests are very synthetic.
>
> And yes, there are more modules at startup. When was the last time we looked
> at them to make sure we weren't doing needless imports?

It's been quite a while.

>>> py3k - py27
set(['reprlib', 'heapq', '_collections', 'functools', '_bisect',
'copyreg', 'io', 'operator', '_heapq', '_io', '_thread',
'encodings.latin_1', 'collections', '_frozen_importlib',
'collections.abc', 'builtins', '_sysconfigdata', '_functools',
'keyword', '_imp', 'bisect', 'weakref', 'itertools', 'marshal'])

>>> py27 - py3k
set(['exceptions', 'copy_reg', 'warnings', 'UserDict', 'traceback',
'encodings.codecs', '__builtin__', 'linecache', '_abcoll',
'encodings.__builtin__', 'encodings.encodings', 'types'])

To check how many of those dependencies stemmed from collections, I
checked against the 2.7 version:

>>> py3k - py27_with_collections
set(['_functools', 'reprlib', '_thread', '_io', '_imp',
'_frozen_importlib', 'functools', 'weakref', 'collections.abc',
'encodings.latin_1', 'io', 'copyreg', 'builtins', 'marshal',
'_sysconfigdata'])

>>> py27_with_collections - py3k
set(['exceptions', 'copy_reg', 'thread', 'warnings', 'UserDict',
'traceback', 'encodings.codecs', '__builtin__', 'linecache',
'_abcoll', 'encodings.__builtin__', 'encodings.encodings', 'types'])

Implicitly bringing in _thread is a bit of a worry. Apparently 3.2 had
the same problem, though:

>>> py3k - py32
{'_imp', '_frozen_importlib', '_warnings', 'collections.abc',
'marshal', '_sysconfigdata'}

>>> py32 - py3k
{'_locale', 'locale', 'traceback', 'linecache', 'token', '_abcoll', 'tokenize'}

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia
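Set differences like the ones above can be reproduced by asking each interpreter for its `sys.modules` keys right after startup. A minimal sketch, assuming Python 3.7+ for the driver script; the interpreter names in the commented-out line (`python3.3`, `python2.7`) are placeholders for whatever is installed. The child command only touches `sys`, which is already loaded, so it adds no modules of its own.

```python
import subprocess
import sys

# Runs inside the child interpreter; `sys` is already loaded, so nothing extra is imported.
CHILD_CODE = "import sys; print('\\n'.join(sorted(sys.modules)))"


def startup_modules(python, *flags):
    """Return the set of module names loaded when `python [flags] -c CHILD_CODE` starts."""
    out = subprocess.run(
        [python, *flags, "-c", CHILD_CODE],
        check=True, capture_output=True, text=True,
    ).stdout
    return set(out.split())


if __name__ == "__main__":
    default = startup_modules(sys.executable)
    no_site = startup_modules(sys.executable, "-S")
    print("loaded only because of site.py:", sorted(default - no_site))
    # Hypothetical cross-version comparison, if both interpreters are installed:
    # print(sorted(startup_modules("python3.3") - startup_modules("python2.7")))
```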

Antoine Pitrou

Oct 27, 2012, 3:21:39 PM

to pytho...@python.org

On Sat, 27 Oct 2012 09:20:36 -0400
Brett Cannon <bca...@gmail.com> wrote:
> I did check that MarkupSafe was not installed. It might just be mako doing
> something silly.
>
> The threads tests are very synthetic.
>
> And yes, there are more modules at startup. When was the last time we
> looked at them to make sure we weren't doing needless imports?

The last time was between 3.2 and 3.3. It will be hard to lower the
number of imported modules, given the current semantics (io, importlib,
unicode, site.py, sysconfig...). Python 2's view of the world was much
simpler (naïve?) in comparison.

It would be interesting to know *where* the module import time gets
spent, on a lower level. My gut feeling is that execution of Python
module code is the main contributor.

Regards

Antoine.
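One way to see *where* import time goes, at least per module, is CPython's `-X importtime` option. Note that it only arrived in Python 3.7, well after this thread, so it was not available for 3.3. The sketch runs a fresh interpreter with it and parses the report written to stderr, whose lines have the form `import time: self [us] | cumulative | imported package`; the parser assumes that layout and skips anything else, including the header row.

```python
import subprocess
import sys


def import_times(python=sys.executable, code="pass"):
    """Return (self_us, cumulative_us, module) tuples from `python -X importtime -c code`."""
    proc = subprocess.run(
        [python, "-X", "importtime", "-c", code],
        check=True, capture_output=True, text=True,
    )
    rows = []
    for line in proc.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        parts = line[len("import time:"):].split("|")
        if len(parts) != 3:
            continue
        try:
            self_us, cumulative_us = int(parts[0]), int(parts[1])
        except ValueError:  # the header row has text instead of numbers
            continue
        rows.append((self_us, cumulative_us, parts[2].strip()))
    return rows


if __name__ == "__main__":
    for self_us, cum_us, name in sorted(import_times(), reverse=True)[:10]:
        print(f"{self_us:>8} us self {cum_us:>8} us cumulative  {name}")
```

Sorting by self time highlights modules whose own top-level code is expensive, which bears on the question of whether executing module code dominates; the cumulative column also includes nested imports.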

Mark Shannon

Oct 27, 2012, 4:40:26 PM

to pytho...@python.org

On 27/10/12 20:21, Antoine Pitrou wrote:
> On Sat, 27 Oct 2012 09:20:36 -0400
> Brett Cannon <bca...@gmail.com> wrote:
>> I did check that MarkupSafe was not installed. It might just be mako doing
>> something silly.
>>
>> The threads tests are very synthetic.
>>
>> And yes, there are more modules at startup. When was the last time we
>> looked at them to make sure we weren't doing needless imports?
>
> The last time was between 3.2 and 3.3. It will be hard to lower the
> number of imported modules, given the current semantics (io, importlib,
> unicode, site.py, sysconfig...). Python 2's view of the world was much
> simpler (naïve?) in comparison.
>
> It would be interesting to know *where* the module import time gets
> spent, on a lower level. My gut feeling is that execution of Python
> module code is the main contributor.

I suspect that stating and loading the .pyc files is responsible for
most of the overhead.
PyRun starts up quite a lot faster thanks to embedding all the modules
in the executable: http://www.egenix.com/products/python/PyRun/

Freezing all the core modules into the executable should reduce start up
time.

Cheers,
Mark

Tim Delaney

Oct 27, 2012, 4:53:42 PM

to pytho...@python.org

On 28 October 2012 07:40, Mark Shannon <ma...@hotpy.org> wrote:

> I suspect that stating and loading the .pyc files is responsible for most of the overhead.
> PyRun starts up quite a lot faster thanks to embedding all the modules in the executable: http://www.egenix.com/products/python/PyRun/
>
> Freezing all the core modules into the executable should reduce start up time.

That suggests a test to me that the Cython guys might be interested in (or may well have performed in the past). How much of the stdlib could be compiled with Cython and used during the startup process? How much of an effect would it have on startup times and these benchmarks if Cython-compiled extensions were used?

I'm thinking here of elimination of .pyc interpretation and execution (stat calls would be similar, probably slightly higher).

To be clear - I'm *not* suggesting Cython become part of the required build toolchain. But *if* the Cython-compiled extensions prove to be significantly faster I'm thinking maybe it could become a semi-supported option (e.g. a HOWTO with the caveat "it worked on this particular system").

Tim Delaney

mar...@v.loewis.de

Oct 27, 2012, 4:58:26 PM

to pytho...@python.org

Quoting Tim Delaney <timothy....@gmail.com>:

> To be clear - I'm *not* suggesting Cython become part of the required build
> toolchain. But *if* the Cython-compiled extensions prove to be
> significantly faster I'm thinking maybe it could become a semi-supported
> option (e.g. a HOWTO with the caveat "it worked on this particular system").

This should compare to zipping the standard library, which has been a supported
configuration for a long time, and also avoids many stat calls.

Regards,
Martin
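For reference, the zipped-stdlib configuration relies on `zipimport`: a `.zip` file placed on `sys.path` is searched like a directory, with its member list read from the archive's central directory and cached, rather than stat-ing individual files. A minimal sketch of the mechanism using a throwaway archive; the module name `zipped_demo` and the archive name `lib.zip` are hypothetical.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def import_from_zip(tmpdir, name="zipped_demo"):
    """Write a one-module archive, put it on sys.path, and import the module from it."""
    archive = os.path.join(tmpdir, "lib.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr(f"{name}.py", "ANSWER = 42\n")
    sys.path.insert(0, archive)
    try:
        module = importlib.import_module(name)
    finally:
        sys.path.remove(archive)
    return module


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        mod = import_from_zip(tmp)
        print(mod.ANSWER, mod.__file__)
```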

Antoine Pitrou

Oct 27, 2012, 4:59:32 PM

to pytho...@python.org

On Sat, 27 Oct 2012 21:40:26 +0100
Mark Shannon <ma...@hotpy.org> wrote:
> On 27/10/12 20:21, Antoine Pitrou wrote:
> >
> > It would be interesting to know *where* the module import time gets
> > spent, on a lower level. My gut feeling is that execution of Python
> > module code is the main contributor.
>
> I suspect that stating and loading the .pyc files is responsible for
> most of the overhead.
> PyRun starts up quite a lot faster thanks to embedding all the modules
> in the executable: http://www.egenix.com/products/python/PyRun/

Any numbers?

Regards

Antoine.

Paul Moore

Oct 27, 2012, 5:07:11 PM

to mar...@v.loewis.de, pytho...@python.org

On 27 October 2012 21:58, <mar...@v.loewis.de> wrote:
>
> Quoting Tim Delaney <timothy....@gmail.com>:
>
>> To be clear - I'm *not* suggesting Cython become part of the required build
>> toolchain. But *if* the Cython-compiled extensions prove to be
>> significantly faster I'm thinking maybe it could become a semi-supported
>> option (e.g. a HOWTO with the caveat "it worked on this particular system").
>
> This should compare to zipping the standard library, which has been a supported
> configuration for a long time, and also avoids many stat calls.

Interestingly, I just did a quick test of this: This is on my Windows
7 PC, running under Powershell. D:\Apps\Python33 is a standard
installation, whereas D:\Dev\P33 has a zipped stdlib:

PS 22:02 D:\Data
>foreach ($i in 1..10) { measure-command { D:\Apps\Python33\python.exe -c "raise SystemExit" } | % { $_.TotalSeconds } }
0.0737877
0.1014695
0.0950326
0.0910734
0.0689548
0.084994
0.0772204
0.0958197
0.0696385
0.0806066
PS 22:03 D:\Data
>foreach ($i in 1..10) { measure-command { D:\Dev\P33\python.exe -c "raise SystemExit" } | % { $_.TotalSeconds } }
0.1922151
0.1879894
0.2455766
0.2842425
0.1937161
0.2168928
0.2441508
0.1860206
0.1866409
0.1897004

Looks like the normal configuration is over twice as fast as the zipped one...

Paul.
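The PowerShell loop above can be written portably in Python. A sketch that launches a fresh interpreter several times with `-c "raise SystemExit"` and records the wall-clock time of each run; the Windows paths in the final comment are the ones from the measurement above and stand in for whatever installations you want to compare.

```python
import statistics
import subprocess
import sys
import time


def startup_times(python=sys.executable, runs=10, flags=()):
    """Wall-clock seconds for `python [flags] -c "raise SystemExit"`, one entry per run."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([python, *flags, "-c", "raise SystemExit"], check=True)
        times.append(time.perf_counter() - start)
    return times


if __name__ == "__main__":
    samples = startup_times()
    print(f"min {min(samples):.4f}s  median {statistics.median(samples):.4f}s")
    # e.g. standard install vs. zipped stdlib (placeholder paths):
    # startup_times(r"D:\Apps\Python33\python.exe"), startup_times(r"D:\Dev\P33\python.exe")
```

Comparing minimums or medians rather than single runs makes the result less sensitive to the run-to-run scatter visible in the numbers above.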

Mark Shannon

Oct 27, 2012, 5:11:01 PM

to pytho...@python.org

On 27/10/12 21:59, Antoine Pitrou wrote:
> On Sat, 27 Oct 2012 21:40:26 +0100
> Mark Shannon <ma...@hotpy.org> wrote:
>> On 27/10/12 20:21, Antoine Pitrou wrote:
>>>
>>> It would be interesting to know *where* the module import time gets
>>> spent, on a lower level. My gut feeling is that execution of Python
>>> module code is the main contributor.
>>
>> I suspect that stating and loading the .pyc files is responsible for
>> most of the overhead.
>> PyRun starts up quite a lot faster thanks to embedding all the modules
>> in the executable: http://www.egenix.com/products/python/PyRun/
>
> Any numbers?

No numbers, but I did see this talk:
http://2012.pyconuk.net/Talks/PyRun
The abstract claims that PyRun "has a greatly improved startup time
compared to regular Python"

Cheers,
Mark

Antoine Pitrou

Oct 27, 2012, 5:25:34 PM

to pytho...@python.org

On Sat, 27 Oct 2012 22:11:01 +0100

Mark Shannon <ma...@hotpy.org> wrote:
> On 27/10/12 21:59, Antoine Pitrou wrote:
> > On Sat, 27 Oct 2012 21:40:26 +0100
> > Mark Shannon <ma...@hotpy.org> wrote:
> >> On 27/10/12 20:21, Antoine Pitrou wrote:
> >>>
> >>> It would be interesting to know *where* the module import time gets
> >>> spent, on a lower level. My gut feeling is that execution of Python
> >>> module code is the main contributor.
> >>
> >> I suspect that stating and loading the .pyc files is responsible for
> >> most of the overhead.
> >> PyRun starts up quite a lot faster thanks to embedding all the modules
> >> in the executable: http://www.egenix.com/products/python/PyRun/
> >
> > Any numbers?
>
> No numbers, but I did see this talk:
> http://2012.pyconuk.net/Talks/PyRun
> The abstract claims that PyRun "has a greatly improved startup time
> compared to regular Python"

Sounds great ;-)

cheers

Antoine.

Brett Cannon

Oct 27, 2012, 6:06:05 PM

to Mark Shannon, pytho...@python.org

On Sat, Oct 27, 2012 at 4:40 PM, Mark Shannon <ma...@hotpy.org> wrote:

> On 27/10/12 20:21, Antoine Pitrou wrote:
>> On Sat, 27 Oct 2012 09:20:36 -0400
>> Brett Cannon <bca...@gmail.com> wrote:
>>> I did check that MarkupSafe was not installed. It might just be mako doing
>>> something silly.
>>>
>>> The threads tests are very synthetic.
>>>
>>> And yes, there are more modules at startup. When was the last time we
>>> looked at them to make sure we weren't doing needless imports?
>>
>> The last time was between 3.2 and 3.3. It will be hard to lower the
>> number of imported modules, given the current semantics (io, importlib,
>> unicode, site.py, sysconfig...). Python 2's view of the world was much
>> simpler (naïve?) in comparison.
>>
>> It would be interesting to know *where* the module import time gets
>> spent, on a lower level. My gut feeling is that execution of Python
>> module code is the main contributor.
>
> I suspect that stating and loading the .pyc files is responsible for most of the overhead.

I really doubt that, as the number of stat calls is significantly reduced in Python 3.3 compared to Python 3.2 (startup benchmarks show Python 3.3 is roughly 1.66x faster than 3.2 thanks to caching filenames in a directory). More modules means more work (e.g. I/O, executing the module, etc.).

The only way to lower stat call overhead is to simply not check whether a directory's contents changed during startup, by assuming Python itself will not write any new module files. Without benchmarking I don't know if it would make that much of a difference, though.

> PyRun starts up quite a lot faster thanks to embedding all the modules in the executable: http://www.egenix.com/products/python/PyRun/
>
> Freezing all the core modules into the executable should reduce start up time.

Sure, but working with a frozen module is a pain so it is not something to take lightly.
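The directory caching mentioned above is visible from Python: since 3.3 the path-based finder caches each directory's listing and re-reads it only when the directory's modification time changes. Because that check can miss a file created within the same timestamp granularity, code that writes a module at runtime is expected to call `importlib.invalidate_caches()` before importing it. A minimal sketch; the module name `late_module` is hypothetical.

```python
import importlib
import os
import sys
import tempfile


def import_new_module(directory, name="late_module"):
    """Create a module file after `directory` was already scanned, then import it."""
    sys.path.insert(0, directory)
    try:
        # A failed lookup first, so the finder for `directory` caches its listing.
        try:
            importlib.import_module(name)
        except ImportError:
            pass
        with open(os.path.join(directory, name + ".py"), "w") as f:
            f.write("VALUE = 'fresh'\n")
        importlib.invalidate_caches()  # drop cached directory listings
        return importlib.import_module(name)
    finally:
        sys.path.remove(directory)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(import_new_module(tmp).VALUE)
```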

Brett Cannon

Oct 27, 2012, 6:07:48 PM

to Paul Moore, mar...@v.loewis.de, pytho...@python.org

Are both debug builds (asking because of the path names)? CPython is now significantly slower in a debug build thanks to the overhead it adds to any Python code executing, which means importlib runs much slower.
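Whether an interpreter is a debug build can be checked from inside it. A best-effort sketch relying on two conventions assumed to hold for CPython: `sys.gettotalrefcount` is only compiled into debug (`Py_DEBUG`) builds, and `sysconfig.get_config_var("Py_DEBUG")` reports the setting where build configuration is recorded (it may return `None` on some platforms, e.g. Windows).

```python
import sys
import sysconfig


def is_debug_build():
    """Best-effort check for a CPython debug (Py_DEBUG) build."""
    if hasattr(sys, "gettotalrefcount"):  # only present in debug builds
        return True
    return bool(sysconfig.get_config_var("Py_DEBUG"))


if __name__ == "__main__":
    print(sys.executable, "debug build:", is_debug_build())
```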

Serhiy Storchaka

Oct 27, 2012, 6:16:20 PM

to pytho...@python.org

On 28.10.12 00:07, Paul Moore wrote:
> Looks like the normal configuration is over twice as fast as the zipped one...

The normal configuration does 269 stats, but the zipped one does 12636
seeks.

Serhiy Storchaka

Oct 27, 2012, 6:39:42 PM

to pytho...@python.org

On 28.10.12 01:06, Brett Cannon wrote:
> I really doubt that as the amount of stat calls is significantly reduced
> in Python 3.3 compared to Python 3.2 (startup benchmarks show Python 3.3
> is roughly 1.66x faster than 3.2 thanks to caching filenames in a
> directory).

$ strace ./python -c '' 2>&1 | grep -c stat

Python 2.7 - 161 stats
Python 3.2 - 555 stats
Python 3.3 - 243 stats

Antoine Pitrou

Oct 27, 2012, 7:00:48 PM

to pytho...@python.org

On Sun, 28 Oct 2012 01:39:42 +0300
Serhiy Storchaka <stor...@gmail.com> wrote:

> On 28.10.12 01:06, Brett Cannon wrote:
> > I really doubt that as the amount of stat calls is significantly reduced
> > in Python 3.3 compared to Python 3.2 (startup benchmarks show Python 3.3
> > is roughly 1.66x faster than 3.2 thanks to caching filenames in a
> > directory).
>
> $ strace ./python -c '' 2>&1 | grep -c stat
>
> Python 2.7 - 161 stats
> Python 3.2 - 555 stats
> Python 3.3 - 243 stats

This will probably depend on the length of sys.path:

$ strace -e stat python2.7 -Sc "" 2>&1 | wc -l
35
$ strace -e stat python3.2 -Sc "" 2>&1 | wc -l
298
$ strace -e stat python3.3 -Sc "" 2>&1 | wc -l
106

$ strace -e stat python2.7 -c "" 2>&1 | wc -l
200
$ strace -e stat python3.2 -c "" 2>&1 | wc -l
726
$ strace -e stat python3.3 -c "" 2>&1 | wc -l
180

Regards

Antoine.
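Counts like these can be scripted. A sketch that runs `strace` (Linux only, assumed to be installed) on an interpreter and tallies stat-family system calls by name. Recent C libraries often issue `newfstatat` or `statx` instead of plain `stat`, so the regular expression here deliberately matches any call whose name contains `stat` (which also catches `statfs`); a `grep -c stat` or `-e stat` filter may therefore give different totals.

```python
import collections
import re
import shutil
import subprocess
import sys

# Matches e.g. 'stat("/usr/lib/python3.3", ...) = 0' or '[pid 123] newfstatat(AT_FDCWD, ...'
STAT_CALL = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(\w*stat\w*)\(")


def count_stat_calls(strace_lines):
    """Tally stat-family syscalls by name from strace output lines."""
    counts = collections.Counter()
    for line in strace_lines:
        match = STAT_CALL.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def trace_startup(python=sys.executable, *flags):
    """Run `strace -f python [flags] -c ''` and count its stat-family calls."""
    proc = subprocess.run(
        ["strace", "-f", python, *flags, "-c", ""],
        capture_output=True, text=True,
    )
    return count_stat_calls(proc.stderr.splitlines())


if __name__ == "__main__":
    if shutil.which("strace"):
        for flags in ((), ("-S",)):
            counts = trace_startup(sys.executable, *flags)
            print(flags or "(default)", sum(counts.values()), dict(counts))
    else:
        print("strace not found; skipping")
```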

Gregory P. Smith

Oct 27, 2012, 11:38:58 PM

to pytho...@python.org

One word: profile.

Looking at stat counts alone rather than measuring the total time spent in all types of system calls from strace and profiling is not really useful. ;)

Another thing to keep an eye out for within a startup profile: how often does the gc collect? our default gc collection thresholds haven't been tuned in ages afaik [or am i forgetting something] and I know of pathological cases at work where simply doing a gc.disable() before importing a bunch of modules (tons of generated protocol buffer code) and re-enabling it afterwards speeds up this application's startup way more significantly than seems healthy in 2.x... that could be related to the particulars of the protobuf module code though.

-gps
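The gc observation is easy to probe. A sketch, assuming Python 3.7+ (`gc.callbacks` dates from 3.3, `contextlib.nullcontext` from 3.7): it counts the collections triggered while a block runs, with and without the collector paused. The allocation-heavy `workload` is a hypothetical stand-in for importing lots of generated code, not a measurement of real protocol buffer modules.

```python
import collections
import contextlib
import gc
import time


@contextlib.contextmanager
def gc_paused():
    """Temporarily disable automatic collection, restoring the previous state."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()


def run_counting_collections(workload, pause_gc):
    """Run `workload()`; return (elapsed seconds, Counter of collections per generation)."""
    counts = collections.Counter()

    def on_gc(phase, info):
        if phase == "start":
            counts[info["generation"]] += 1

    gc.callbacks.append(on_gc)
    try:
        ctx = gc_paused() if pause_gc else contextlib.nullcontext()
        start = time.perf_counter()
        with ctx:
            workload()
        elapsed = time.perf_counter() - start
    finally:
        gc.callbacks.remove(on_gc)
    return elapsed, counts


def workload(n=200_000):
    """Hypothetical stand-in: create many container objects tracked by the gc."""
    return [{"field": i, "children": []} for i in range(n)]


if __name__ == "__main__":
    for pause in (False, True):
        elapsed, counts = run_counting_collections(workload, pause_gc=pause)
        print(f"gc paused={pause}: {elapsed:.3f}s, collections by generation: {dict(counts)}")
```

Restoring the previous state, rather than calling `gc.enable()` unconditionally, keeps the helper safe to use inside code that had already disabled the collector.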

Stefan Behnel

Oct 28, 2012, 3:22:07 AM

to pytho...@python.org

Tim Delaney, 27.10.2012 22:53:

> On 28 October 2012 07:40, Mark Shannon wrote:
>> I suspect that stating and loading the .pyc files is responsible for most
>> of the overhead.
>> PyRun starts up quite a lot faster thanks to embedding all the modules in
>> the executable: http://www.egenix.com/products/python/PyRun/
>>
>> Freezing all the core modules into the executable should reduce start up
>> time.
>
> That suggests a test to me that the Cython guys might be interested in (or
> may well have performed in the past). How much of the stdlib could be
> compiled with Cython and used during the startup process?

We have a Jenkins job set up to run the CPython test suite with a compiled
stdlib:

https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests-pyregr-stdlib/

Basically, we use pyximport as an import hook that tries to compile Python
modules on import and then imports the shared library if it worked or the
original Python module if it failed. A solution that explicitly runs over
the stdlib and compiles it would be substantially cleaner and more stable.

I don't have numbers for Py3.4 because we currently have a hard crash in
one of the tests on that platform when compiling recursively on import
(likely meaning that one of the stdlib modules and/or tests would have to
be excluded from compilation), but I get 434 automatically compiled stdlib
modules for the latest Py2.7 branch out of 744 (excluding the test suite).
And Py3.x code tends to pass at least as well through the compiler, often
better.

Note that quite a number of modules are excluded accidentally because they
are already imported as Python modules when Cython starts working.
Compiling them explicitly would remove that limitation, maybe adding
another (wild guess) 50 modules or so. Another few are not being compiled
because the test module that uses them fails to compile. So missing shared
libraries are not always due to failures to compile that particular Python
module.

I didn't pay much attention to this part of our integration tests so far -
a bit of debugging should get the Py3.4 build working.

> How much of an
> effect would it have on startup times and these benchmarks if
> Cython-compiled extensions were used?

Depends on what and how much code you use. If you compile everything into
one big module that "imports" all of the stdlib when it gets loaded, you'd
likely lose a lot of time because it would take a while to initialise all
that useless code on startup. If you keep it separate, it would likely be a
lot faster because you avoid the interpreter for most of the module startup.

Most Python code runs about 30% faster when compiled, some faster, some
slower. If you want better numbers, you can start optimising the code by
giving Cython static type hints. I did that for difflib a while ago, for
example. Changing two methods made it some 50% faster back then:

http://blog.behnel.de/index.php?p=155

That particular module should compile without changes these days, and you
can provide the type hints externally, i.e. without modifying the Python
code itself.

> I'm thinking here of elimination of .pyc interpretation and execution (stat
> calls would be similar, probably slightly higher).

CPython checks for .so files before looking for .py files and imports are
absolute by default in Py3, so there should be a slight reduction in stat
calls. The net result then obviously also depends on how fast your shared
library loader and linker is, etc., but I doubt that that path is any
slower than loading and running a .pyc file.

BTW, you'd still get nice stack traces for compiled modules as long as your
.py files lie right next to your .so files.

> To be clear - I'm *not* suggesting Cython become part of the required build
> toolchain. But *if* the Cython-compiled extensions prove to be
> significantly faster I'm thinking maybe it could become a semi-supported
> option (e.g. a HOWTO with the caveat "it worked on this particular system").

Sounds reasonable.

Stefan

Stefan Behnel

Oct 28, 2012, 3:37:19 AM

to pytho...@python.org

Stefan Behnel, 28.10.2012 08:22:

> Tim Delaney, 27.10.2012 22:53:
>> How much of an effect would it have on startup times and these benchmarks if
>> Cython-compiled extensions were used?
>
> Depends on what and how much code you use. If you compile everything into
> one big module that "imports" all of the stdlib when it gets loaded, you'd
> likely lose a lot of time because it would take a while to initialise all
> that useless code on startup. If you keep it separate, it would likely be a
> lot faster because you avoid the interpreter for most of the module startup.
>
> Most Python code runs about 30% faster when compiled, some faster, some
> slower.

Some more unoptimised pure-Python benchmarks, just in case:

2.7:

https://sage.math.washington.edu:8091/hudson/job/cython-devel-pybenchmarks-py27/lastSuccessfulBuild/artifact/bench_chart.html

3.3:

https://sage.math.washington.edu:8091/hudson/job/cython-devel-pybenchmarks-py3k/lastSuccessfulBuild/artifact/bench_chart.html

Note that the 3.3 benchmarks are not entirely up to date, the last
successful run was a month ago (likely due to the branch into 3.4 which we
use since then). Didn't have time to fix them yet.

Note also that the variations are pretty high from run to run as the
machine that executes them is not a dedicated benchmark server.

Antoine Pitrou

Oct 28, 2012, 7:11:10 AM10/28/12

to pytho...@python.org

On Sat, 27 Oct 2012 20:38:58 -0700
"Gregory P. Smith" <gr...@krypto.org> wrote:
> One word: profile.
>
> Looking at stat counts alone rather than measuring the total time spent in
> all types of system calls from strace and profiling is not really useful. ;)

Agreed, but I can't seem to cope properly with gprof. Any suggestion?

> Another thing to keep an eye out for within a startup profile: how often
> does the gc collect? our default gc collection thresholds haven't been
> tuned in ages afaik [or am i forgetting something] and I know of
> pathological cases at work where simply doing a gc.disable() before
> importing a bunch of modules (tons of generated protocol buffer code) and
> re-enabling it afterwards speeds up this application's startup way more
> significantly than seems healthy in 2.x... that could be related to the
> particulars of the protobuf module code though.

That's a good suggestion indeed.

Thanks

Antoine.

_______________________________________________
Python-Dev mailing list
Pytho...@python.org
http://mail.python.org/mailman/listinfo/python-dev

Unsubscribe: http://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Hynek Schlawack

Oct 28, 2012, 9:25:40 AM10/28/12

to pytho...@python.org

Am 28.10.2012 um 12:11 schrieb Antoine Pitrou <soli...@pitrou.net>:

>> One word: profile.
>>
>> Looking at stat counts alone rather than measuring the total time spent in
>> all types of system calls from strace and profiling is not really useful. ;)
> Agreed, but I can't seem to cope properly with gprof. Any suggestion?

http://oprofile.sourceforge.net/news/
http://valgrind.org/docs/manual/cl-manual.html

Both are useful; gprof is virtually useless.
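For Python-level hot spots during imports, the standard library's cProfile is another option. This is a swapped-in technique, not what Hynek suggested: unlike oprofile or Callgrind it sees only Python function calls and the builtins they invoke, not time spent inside the C interpreter itself. The helper name `profile_imports` is hypothetical.

```python
import cProfile
import importlib
import io
import pstats


def profile_imports(module_names, limit=10):
    """Import module_names under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        for name in module_names:
            importlib.import_module(name)
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


report = profile_imports(["json", "decimal"])
print(report)
```

Modules already in `sys.modules` return almost immediately, so this is most informative in a fresh interpreter.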

Antoine Pitrou

Oct 28, 2012, 1:38:22 PM10/28/12

to pytho...@python.org

On Sat, 27 Oct 2012 20:38:58 -0700
"Gregory P. Smith" <gr...@krypto.org> wrote:
>

> Another thing to keep an eye out for within a startup profile: how often
> does the gc collect? our default gc collection thresholds haven't been
> tuned in ages afaik [or am i forgetting something] and I know of
> pathological cases at work where simply doing a gc.disable() before
> importing a bunch of modules (tons of generated protocol buffer code) and
> re-enabling it afterwards speeds up this application's startup way more
> significantly than seems healthy in 2.x... that could be related to the
> particulars of the protobuf module code though.

http://bugs.python.org/issue16351 shows us that the number of
collections at 3.4 startup is tiny:

$ ./python -Sc "import gc; print(gc.get_stats())"
[{'collections': 6, 'uncollectable': 0, 'collected': 0},
{'collections': 0, 'uncollectable': 0, 'collected': 0},
{'collections': 0, 'uncollectable': 0, 'collected': 0}]

$ ./python -c "import gc; print(gc.get_stats())"
[{'collected': 0, 'uncollectable': 0, 'collections': 12},
{'collected': 0, 'uncollectable': 0, 'collections': 1},
{'collected': 0, 'uncollectable': 0, 'collections': 0}]

Notably, there are no full collections.
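Greg's `gc.disable()` trick and the collection counts above can be combined in a short sketch. It assumes Python 3.7+ (`gc.get_stats()` is available from 3.4, `contextlib.nullcontext` from 3.7); the helper names and the module list are hypothetical.

```python
import contextlib
import gc
import importlib


def collections_so_far():
    """Collections run so far in each GC generation."""
    return [generation["collections"] for generation in gc.get_stats()]


@contextlib.contextmanager
def gc_paused():
    """Disable automatic collection for the block, then restore the previous state."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()


def count_collections_during_imports(module_names, pause_gc=False):
    """Import module_names and return how many collections each generation ran meanwhile."""
    before = collections_so_far()
    with gc_paused() if pause_gc else contextlib.nullcontext():
        for name in module_names:
            importlib.import_module(name)
    after = collections_so_far()
    return [a - b for a, b in zip(after, before)]


print(count_collections_during_imports(["json", "decimal"]))
print(count_collections_during_imports(["email.mime.text"], pause_gc=True))
```

With the collector paused, every generation's count stays at zero unless imported code calls `gc.collect()` explicitly.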

Regards

Antoine.

Tim Delaney

Oct 28, 2012, 3:48:16 PM10/28/12

to pytho...@python.org

On 28 October 2012 18:22, Stefan Behnel <stef...@behnel.de> wrote:

How much of an effect would it have on startup times and these benchmarks if Cython-compiled extensions were used?

Depends on what and how much code you use. If you compile everything into one big module that "imports" all of the stdlib when it gets loaded, you'd likely lose a lot of time because it would take a while to initialise all that useless code on startup. If you keep it separate, it would likely be a lot faster because you avoid the interpreter for most of the module startup.

I was specifically thinking in terms of the tests Brett ran (that was the full set on speed.python.org, wasn't it?), and having each stdlib module be its own extension i.e. no big import module. A literal 1:1 replacement where possible.

I'm thinking here of elimination of .pyc interpretation and execution (stat
calls would be similar, probably slightly higher).

CPython checks for .so files before looking for .py files and imports are absolute by default in Py3, so there should be a slight reduction in stat calls. The net result then obviously also depends on how fast your shared library loader and linker is, etc., but I doubt that that path is any slower than loading and running a .pyc file.

D'oh. I knew that and still got it backwards.

To be clear - I'm *not* suggesting Cython become part of the required build
toolchain. But *if* the Cython-compiled extensions prove to be
significantly faster I'm thinking maybe it could become a semi-supported
option (e.g. a HOWTO with the caveat "it worked on this particular system").

Sounds reasonable.

I think a stdlib compile script + pre-packaged hints for the 3.3 release would likely help both 3.3 and Cython acceptance.

Putting aside my development interest and looking at it purely from the PoV of a Python *user*, I'd really like to see Cython on speed.python.org eventually (in two modes - one without hints as a baseline and one with hints). Of course the ideal situation would be to have every implementation of Python 3.3 that is capable of running on the hardware contributing numbers e.g. if/when Jython achieves 3.3 compatibility I'd love to see numbers for it.

Tim Delaney

Catalin Iacob

Oct 28, 2012, 4:53:33 PM10/28/12

to pytho...@python.org

On Sat, Oct 27, 2012 at 11:07 PM, Paul Moore <p.f....@gmail.com> wrote:
> Interestingly, I just did a quick test of this: This is on my Windows
> 7 PC, running under Powershell.

snip

> Looks like the normal configuration is over twice as fast as the zipped one...

This result is influenced by zipimport fseek-ing for every file in the
imported zip and fseek flushing buffers in Microsoft's CRT
implementation. There's a patch which avoids the seek in
http://bugs.python.org/issue8745. Reviews welcome!

With that patch, importing takes half as long as it currently does, so
according to your test the zipped and non-zipped configurations would be
roughly equally fast.
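Paul's directory-versus-zip comparison can be approximated with a rough harness like the one below. Everything in it (module names, module contents, the module count) is hypothetical. Both runs are cold imports that include compiling the source, and the directory run also writes `__pycache__` files, so treat the numbers as indicative only; the harness does not isolate the fseek cost described in issue 8745.

```python
import importlib
import os
import sys
import tempfile
import time
import zipfile

MODULE_SOURCE = "VALUE = {i}\n\ndef f(x):\n    return x + VALUE\n"


def write_modules(directory, prefix, count):
    """Write count tiny modules named prefix0, prefix1, ... and return their names."""
    names = []
    for i in range(count):
        name = f"{prefix}{i}"
        with open(os.path.join(directory, name + ".py"), "w") as fh:
            fh.write(MODULE_SOURCE.format(i=i))
        names.append(name)
    return names


def zip_modules(directory, names, zip_path):
    with zipfile.ZipFile(zip_path, "w") as zf:
        for name in names:
            zf.write(os.path.join(directory, name + ".py"), arcname=name + ".py")


def time_imports(path_entry, names):
    """Time importing names from path_entry, then undo every side effect on sys."""
    sys.path.insert(0, path_entry)
    importlib.invalidate_caches()
    try:
        start = time.perf_counter()
        for name in names:
            importlib.import_module(name)
        return time.perf_counter() - start
    finally:
        sys.path.remove(path_entry)
        sys.path_importer_cache.pop(path_entry, None)
        for name in names:
            sys.modules.pop(name, None)


def compare(count=200):
    with tempfile.TemporaryDirectory() as tmp:
        plain_dir = os.path.join(tmp, "plain")
        zip_src_dir = os.path.join(tmp, "zip_src")
        os.mkdir(plain_dir)
        os.mkdir(zip_src_dir)
        plain_names = write_modules(plain_dir, "plainmod_", count)
        zip_names = write_modules(zip_src_dir, "zipmod_", count)
        zip_path = os.path.join(tmp, "mods.zip")
        zip_modules(zip_src_dir, zip_names, zip_path)
        return {
            "directory": time_imports(plain_dir, plain_names),
            "zip": time_imports(zip_path, zip_names),
        }


print(compare())
```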

Brett Cannon

Oct 29, 2012, 9:01:38 AM10/29/12

to Tim Delaney, pytho...@python.org

On Sun, Oct 28, 2012 at 3:48 PM, Tim Delaney <timothy....@gmail.com> wrote:

On 28 October 2012 18:22, Stefan Behnel <stef...@behnel.de> wrote:

How much of an effect would it have on startup times and these benchmarks if Cython-compiled extensions were used?

Depends on what and how much code you use. If you compile everything into one big module that "imports" all of the stdlib when it gets loaded, you'd likely lose a lot of time because it would take a while to initialise all that useless code on startup. If you keep it separate, it would likely be a lot faster because you avoid the interpreter for most of the module startup.

I was specifically thinking in terms of the tests Brett ran (that was the full set on speed.python.org, wasn't it?),

It's not the full set as not all of them can be run on Python 3, but it is as many as can be run.

-Brett

Brett Cannon

Oct 29, 2012, 9:56:57 AM10/29/12

to python-dev

To see if the bad iterative_count and threaded_count results were consistently bad, I ran the benchmark suite on my MacBook Pro to see how "reliable" the benchmarks were. The output is below.

Basically, 6 benchmarks (regex_effbot, queens, startup_nosite, iterative_count, threaded_count, and telco) varied by more than 15% between my two computers, although queens, iterative_count, and threaded_count were the only ones that swung from neutral/good to bad depending on the machine (the rest either went from bad to very bad, or from very good to more very good).

And before Antoine asks, I added a ``sys.modules['markupsafe'] = None`` line to the mako_v2 benchmark locally. =) I still need to either explicitly block it or emit a warning in the code in the repo.
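Setting a `sys.modules` entry to `None` makes any later `import` of that name raise `ImportError`, so code that treats markupsafe as optional falls back to its pure-Python path. A minimal sketch of blocking it temporarily; `blocked_module` and `has_markupsafe` are hypothetical helpers, not part of the benchmark suite.

```python
import contextlib
import sys


@contextlib.contextmanager
def blocked_module(name):
    """Make `import name` fail inside the block, restoring sys.modules afterwards."""
    missing = object()
    previous = sys.modules.get(name, missing)
    sys.modules[name] = None  # a None entry makes the import system raise ImportError
    try:
        yield
    finally:
        if previous is missing:
            del sys.modules[name]
        else:
            sys.modules[name] = previous


def has_markupsafe():
    try:
        import markupsafe  # noqa: F401
    except ImportError:
        return False
    return True


with blocked_module("markupsafe"):
    print(has_markupsafe())  # False, whether or not markupsafe is installed
```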

#########################################

Report on Darwin Darwin Kernel Version 12.2.0: Sat Aug 25 00:48:52 PDT 2012; root:xnu-2050.18.24~1/RELEASE_X86_64 x86_64 i386

Total CPU cores: 8

### 2to3 ###

10.321463 -> 9.525119: 1.08x faster

### call_method ###

Min: 0.466812 -> 0.417812: 1.12x faster

Avg: 0.483324 -> 0.427158: 1.13x faster

Significant (t=28.77)

Stddev: 0.01876 -> 0.01483: 1.2644x smaller

Timeline: b'http://tinyurl.com/8al5lmm'

### call_method_slots ###

Min: 0.484923 -> 0.409452: 1.18x faster

Avg: 0.487877 -> 0.413054: 1.18x faster

Significant (t=131.11)

Stddev: 0.00395 -> 0.00577: 1.4589x larger

Timeline: b'http://tinyurl.com/9zhpg6z'

### call_method_unknown ###

Min: 0.547050 -> 0.406866: 1.34x faster

Avg: 0.550721 -> 0.409359: 1.35x faster

Significant (t=328.32)

Stddev: 0.00415 -> 0.00325: 1.2795x smaller

Timeline: b'http://tinyurl.com/9wxoddz'

### call_simple ###

Min: 0.391213 -> 0.332055: 1.18x faster

Avg: 0.393563 -> 0.335362: 1.17x faster

Significant (t=127.15)

Stddev: 0.00363 -> 0.00427: 1.1764x larger

Timeline: b'http://tinyurl.com/8mmepzw'

### chameleon ###

Min: 0.078505 -> 0.070175: 1.12x faster

Avg: 0.083754 -> 0.071500: 1.17x faster

Significant (t=2.95)

Stddev: 0.05086 -> 0.00119: 42.8425x smaller

Timeline: b'http://tinyurl.com/8bz9hpl'

### chaos ###

Min: 0.353739 -> 0.423587: 1.20x slower

Avg: 0.356297 -> 0.428197: 1.20x slower

Significant (t=-108.44)

Stddev: 0.00200 -> 0.00424: 2.1147x larger

Timeline: b'http://tinyurl.com/98e56le'

### django ###

Min: 0.824149 -> 0.862750: 1.05x slower

Avg: 0.831614 -> 0.869112: 1.05x slower

Significant (t=-21.47)

Stddev: 0.01020 -> 0.00697: 1.4634x smaller

Timeline: b'http://tinyurl.com/8kz8owv'

### fannkuch ###

Min: 1.776913 -> 1.832973: 1.03x slower

Avg: 1.793116 -> 1.915348: 1.07x slower

Significant (t=-11.57)

Stddev: 0.01436 -> 0.07329: 5.1030x larger

Timeline: b'http://tinyurl.com/9ptae4z'

### fastpickle ###

Min: 0.810968 -> 0.739322: 1.10x faster

Avg: 0.818099 -> 0.745148: 1.10x faster

Significant (t=58.02)

Stddev: 0.00577 -> 0.00677: 1.1731x larger

Timeline: b'http://tinyurl.com/8l769dd'

### fastunpickle ###

Min: 0.644198 -> 0.659345: 1.02x slower

Avg: 0.647976 -> 0.666154: 1.03x slower

Significant (t=-18.96)

Stddev: 0.00343 -> 0.00584: 1.7020x larger

Timeline: b'http://tinyurl.com/93xn7el'

### float ###

Min: 0.420888 -> 0.363410: 1.16x faster

Avg: 0.432285 -> 0.376179: 1.15x faster

Significant (t=38.14)

Stddev: 0.00762 -> 0.00708: 1.0766x smaller

Timeline: b'http://tinyurl.com/8bjwka9'

### formatted_logging ###

Min: 0.325707 -> 0.413196: 1.27x slower

Avg: 0.329846 -> 0.418099: 1.27x slower

Significant (t=-119.89)

Stddev: 0.00397 -> 0.00337: 1.1787x smaller

Timeline: b'http://tinyurl.com/8ktbs49'

### genshi ###

Min: 0.254604 -> 0.269696: 1.06x slower

Avg: 0.258585 -> 0.275615: 1.07x slower

Significant (t=-33.39)

Stddev: 0.00283 -> 0.00557: 1.9704x larger

Timeline: b'http://tinyurl.com/8bqvcwl'

### go ###

Min: 0.676453 -> 0.745504: 1.10x slower

Avg: 0.681833 -> 0.752170: 1.10x slower

Significant (t=-48.67)

Stddev: 0.00520 -> 0.00880: 1.6917x larger

Timeline: b'http://tinyurl.com/9d6qj3y'

### hexiom2 ###

Min: 186.378727 -> 172.939507: 1.08x faster

Avg: 186.679821 -> 173.103242: 1.08x faster

Significant (t=39.61)

Stddev: 0.42581 -> 0.23156: 1.8389x smaller

Timeline: b'http://tinyurl.com/9mc3pmg'

### html5lib ###

Min: 11.827770 -> 11.239556: 1.05x faster

Avg: 11.858253 -> 11.370960: 1.04x faster

Significant (t=6.93)

Stddev: 0.02825 -> 0.15466: 5.4746x larger

Timeline: b'http://tinyurl.com/8vl952y'

### iterative_count ###

Min: 0.168182 -> 0.154105: 1.09x faster

Avg: 0.169512 -> 0.155952: 1.09x faster

Significant (t=50.77)

Stddev: 0.00139 -> 0.00128: 1.0899x smaller

Timeline: b'http://tinyurl.com/9eymjtf'

### json_dump_v2 ###

Min: 3.350528 -> 3.795307: 1.13x slower

Avg: 3.369661 -> 3.825400: 1.14x slower

Significant (t=-125.93)

Stddev: 0.01470 -> 0.02095: 1.4250x larger

Timeline: b'http://tinyurl.com/8wyn9qa'

### json_load ###

Min: 0.999717 -> 0.607549: 1.65x faster

Avg: 1.007319 -> 0.613016: 1.64x faster

Significant (t=289.24)

Stddev: 0.00673 -> 0.00690: 1.0240x larger

Timeline: b'http://tinyurl.com/8qxakdw'

### mako_v2 ###

Min: 0.094817 -> 0.279593: 2.95x slower

Avg: 0.096962 -> 0.286479: 2.95x slower

Significant (t=-866.63)

Stddev: 0.00182 -> 0.00454: 2.4945x larger

Timeline: b'http://tinyurl.com/9lufgwz'

### meteor_contest ###

Min: 0.276138 -> 0.243228: 1.14x faster

Avg: 0.279559 -> 0.246018: 1.14x faster

Significant (t=72.30)

Stddev: 0.00298 -> 0.00136: 2.1943x smaller

Timeline: b'http://tinyurl.com/8pj9dnc'

### nbody ###

Min: 0.421698 -> 0.320496: 1.32x faster

Avg: 0.425878 -> 0.323483: 1.32x faster

Significant (t=158.15)

Stddev: 0.00386 -> 0.00247: 1.5638x smaller

Timeline: b'http://tinyurl.com/9fy8dfg'

### normal_startup ###

Min: 0.612120 -> 0.876470: 1.43x slower

Avg: 0.618945 -> 0.885492: 1.43x slower

Significant (t=-280.36)

Stddev: 0.00422 -> 0.00523: 1.2397x larger

Timeline: b'http://tinyurl.com/98ap93d'

### nqueens ###

Min: 0.402125 -> 0.410580: 1.02x slower

Avg: 0.406403 -> 0.414676: 1.02x slower

Significant (t=-12.06)

Stddev: 0.00442 -> 0.00199: 2.2189x smaller

Timeline: b'http://tinyurl.com/8wd3lez'

### pathlib ###

Min: 0.132423 -> 0.164525: 1.24x slower

Avg: 0.136298 -> 0.168843: 1.24x slower

Significant (t=-49.05)

Stddev: 0.00763 -> 0.00720: 1.0586x smaller

Timeline: b'http://tinyurl.com/9o86dc5'

### pidigits ###

Min: 0.387690 -> 0.367871: 1.05x faster

Avg: 0.391308 -> 0.371194: 1.05x faster

Significant (t=32.69)

Stddev: 0.00369 -> 0.00230: 1.6066x smaller

Timeline: b'http://tinyurl.com/9med7ko'

### raytrace ###

Min: 1.650066 -> 1.808829: 1.10x slower

Avg: 1.660110 -> 1.832654: 1.10x slower

Significant (t=-25.26)

Stddev: 0.01165 -> 0.04687: 4.0224x larger

Timeline: b'http://tinyurl.com/8fmyhex'

### regex_compile ###

Min: 0.559449 -> 0.571906: 1.02x slower

Avg: 0.563738 -> 0.580054: 1.03x slower

Significant (t=-8.38)

Stddev: 0.00434 -> 0.01306: 3.0087x larger

Timeline: b'http://tinyurl.com/8g6xcmd'

### regex_effbot ###

Min: 0.074999 -> 0.097456: 1.30x slower

Avg: 0.076343 -> 0.099435: 1.30x slower

Significant (t=-39.79)

Stddev: 0.00147 -> 0.00383: 2.5994x larger

Timeline: b'http://tinyurl.com/9vfaeux'

### regex_v8 ###

Min: 0.087433 -> 0.104053: 1.19x slower

Avg: 0.088804 -> 0.105520: 1.19x slower

Significant (t=-39.48)

Stddev: 0.00115 -> 0.00277: 2.4122x larger

Timeline: b'http://tinyurl.com/8un7vfr'

### richards ###

Min: 0.247208 -> 0.222483: 1.11x faster

Avg: 0.251661 -> 0.225276: 1.12x faster

Significant (t=44.04)

Stddev: 0.00392 -> 0.00161: 2.4275x smaller

Timeline: b'http://tinyurl.com/8b2zv34'

### silent_logging ###

Min: 0.099170 -> 0.095099: 1.04x faster

Avg: 0.099713 -> 0.095892: 1.04x faster

Significant (t=33.32)

Stddev: 0.00045 -> 0.00068: 1.5062x larger

Timeline: b'http://tinyurl.com/9arurw6'

### simple_logging ###

Min: 0.316639 -> 0.392833: 1.24x slower

Avg: 0.320059 -> 0.396853: 1.24x slower

Significant (t=-120.31)

Stddev: 0.00224 -> 0.00392: 1.7450x larger

Timeline: b'http://tinyurl.com/95bfxu7'

### spectral_norm ###

Min: 0.434691 -> 0.379294: 1.15x faster

Avg: 0.437958 -> 0.383761: 1.14x faster

Significant (t=67.75)

Stddev: 0.00410 -> 0.00390: 1.0502x smaller

Timeline: b'http://tinyurl.com/98s9c56'

### startup_nosite ###

Min: 0.209685 -> 0.660867: 3.15x slower

Avg: 0.218654 -> 0.673249: 3.08x slower

Significant (t=-458.50)

Stddev: 0.00646 -> 0.00752: 1.1645x larger

Timeline: b'http://tinyurl.com/9zyerhn'

### telco ###

Min: 0.840453 -> 0.018312: 45.90x faster

Avg: 0.844250 -> 0.019255: 43.85x faster

Significant (t=1088.45)

Stddev: 0.00521 -> 0.00127: 4.0959x smaller

Timeline: b'http://tinyurl.com/924mje7'

### threaded_count ###

Min: 0.197525 -> 0.151649: 1.30x faster

Avg: 0.213657 -> 0.153572: 1.39x faster

Significant (t=52.58)

Stddev: 0.00779 -> 0.00214: 3.6451x smaller

Timeline: b'http://tinyurl.com/8mrrqla'

### unpack_sequence ###

Min: 0.000060 -> 0.000052: 1.16x faster

Avg: 0.000088 -> 0.000069: 1.29x faster

Significant (t=1118.61)

Stddev: 0.00000 -> 0.00000: 1.0022x larger

Timeline: b'http://tinyurl.com/9ejrega'
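The "Nx faster" / "Nx slower" labels above appear to be the larger timing divided by the smaller one. A minimal sketch, assuming that formula (not checked against the benchmark runner's source), reproduces a few Min lines from this report.

```python
def describe_change(old, new):
    """Label an old -> new timing change; lower times are better."""
    if new <= old:
        return f"{old / new:.2f}x faster"
    return f"{new / old:.2f}x slower"


# Min timings copied from the OS X report above.
for name, old, new in [
    ("telco", 0.840453, 0.018312),
    ("mako_v2", 0.094817, 0.279593),
    ("chaos", 0.353739, 0.423587),
]:
    print(f"{name}: {describe_change(old, new)}")
```

The printed timings are rounded to six decimal places, so for very small values the recomputed ratio can be off in the last digit: unpack_sequence's printed Min values give 1.15x rather than the reported 1.16x.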

Antoine Pitrou

Oct 29, 2012, 3:22:34 PM10/29/12

to pytho...@python.org

On Mon, 29 Oct 2012 09:56:57 -0400
Brett Cannon <br...@python.org> wrote:

> To see if the bad iterative_count and threaded_count results were
> consistently bad, I ran the benchmark suite on my MacBook Pro to see how
> "reliable" the benchmarks were. The output is below.
>
> Basically 6 benchmarks (regex_effbot, queens, startup_nosite,
> iterative_count, threaded_count, and telco) had a variance of more than 15%
> performance between my 2 computers, although queens, iterative_count, and
> threaded_count were the only ones that swung between neutral/good to bad
> depending on the machine (the rest either went from bad to very bad, or
> very good to more very good).

This is using different compilers on the 2 computers, right?

Regards

Antoine.

Brett Cannon

Oct 29, 2012, 4:01:18 PM10/29/12

to Antoine Pitrou, pytho...@python.org

On Mon, Oct 29, 2012 at 3:22 PM, Antoine Pitrou <soli...@pitrou.net> wrote:

On Mon, 29 Oct 2012 09:56:57 -0400
Brett Cannon <br...@python.org> wrote:

> To see if the bad iterative_count and threaded_count results were
> consistently bad, I ran the benchmark suite on my MacBook Pro to see how
> "reliable" the benchmarks were. The output is below.
>
> Basically 6 benchmarks (regex_effbot, queens, startup_nosite,
> iterative_count, threaded_count, and telco) had a variance of more than 15%
> performance between my 2 computers, although queens, iterative_count, and
> threaded_count were the only ones that swung between neutral/good to bad
> depending on the machine (the rest either went from bad to very bad, or
> very good to more very good).

This is using different compilers on the 2 computers, right?

Yes: gcc 4.6.3 on Linux and Clang 3.1 on OS X.

Stefan Behnel

Oct 30, 2012, 2:47:19 AM10/30/12

to pytho...@python.org

Tim Delaney, 28.10.2012 20:48:

> On 28 October 2012 18:22, Stefan Behnel wrote:
>>> How much of an effect would it have on startup times and these
>>> benchmarks if Cython-compiled extensions were used?
>>
>> Depends on what and how much code you use. If you compile everything into
>> one big module that "imports" all of the stdlib when it gets loaded, you'd
>> likely lose a lot of time because it would take a while to initialise all
>> that useless code on startup. If you keep it separate, it would likely be a
>> lot faster because you avoid the interpreter for most of the module startup.
>
> I was specifically thinking in terms of the tests Brett ran (that was the
> full set on speed.python.org, wasn't it?), and having each stdlib module be
> its own extension i.e. no big import module. A literal 1:1 replacement
> where possible.

There's also an intermediate solution of linking the top-N modules into the
interpreter core and leaving the rest outside, but I'd rather go for the
straightforward approach of having separate libs first.

Compiling all that can be compiled is easy enough. I fixed up a couple of
things in Cython (so you need the latest GitHub master) and then ran this
setup.py script from the Lib directory with "build_ext -i":

"""
from distutils.core import setup
from Cython.Build import cythonize
from Cython.Compiler import Options

# improve Python compatibility by allowing some broken code
Options.error_on_unknown_names = False

import sys

setup(
name = 'stuff',
ext_modules = cythonize(
["**/*.py"],
exclude=['**/test/**/*.py', '**/tests/**/*.py',
'**/__init__.py',
'idlelib/MultiCall.py'],
exclude_failures=True,
language_level=sys.version_info[0],
compiler_directives=dict(auto_cpdef=True)
),
)
"""

Note that the extra compiler option above disables fatal compile errors on
unknown (usually mistyped) names, of which Cython hits a couple in the
stdlib. pylint should find them as well; they're worth fixing.

The directive at the end enables automatic module-internal C calls, which
usually give a major speed-up by allowing the C compiler to see what happens.

With the above setup, Cython compiles 612 out of 620 Python modules for me,
excluding test modules and __init__.py files. The rest fail to compile due
to either compiler bugs or statically detected bugs in the Python code.
I'll look through them when I find a bit of time.

One major problem I ran into is that the new importlib bootstrap module
crashes with a RuntimeError("maximum recursion depth exceeded while calling
a Python object)" when it hits compiled modules with import cycles (e.g.
shutil and tarfile, or os and posixpath). I guess that's the kind of corner
case you get when working code gets rewritten. Worth giving Py3.2 a try in
comparison.

>>> To be clear - I'm *not* suggesting Cython become part of the required build
>>> toolchain. But *if* the Cython-compiled extensions prove to be
>>> significantly faster I'm thinking maybe it could become a semi-supported
>>> option (e.g. a HOWTO with the caveat "it worked on this particular
>>> system").
>>
>> Sounds reasonable.
>
> I think a stdlib compile script

... see above ...

> + pre-packaged hints for the 3.3 release
> would likely help both 3.3 and Cython acceptance.

That would certainly be a cool feature. This can often be as easy as
putting a .pxd file next to the .py file that overrides the declarations of
functions and classes with static types.

> Putting aside my development interest and looking at it purely from the PoV
> of a Python *user*, I'd really like to see Cython on
> speed.python.org eventually (in two modes - one without hints as a
> baseline and one with
> hints).

I think the above setup.py script, with appropriately adapted glob
patterns, should do the trick well enough for now. Certainly better and
simpler than my initial pyximport configuration. With the obvious caveat
that it takes a bit longer to compile everything, not just the modules that
are actually used. But that's only an install time issue.

Philip Jenvey

Nov 2, 2012, 2:16:25 PM11/2/12

to Brett Cannon, python-dev

On Oct 26, 2012, at 12:14 PM, Brett Cannon wrote:

>
> Worst benchmark is nosite_startup, best is telco. The benchmarks people might want to analyze (i.e. more than 20% slower in Python 3.3) are mako_v2, threaded_count, normal_startup, iterative_count, pathlib, formatted_logging, and simple_logging.

>
> ### mako_v2 ###
> Min: 0.083660 -> 0.243323: 2.91x slower
> Avg: 0.084634 -> 0.247875: 2.93x slower
> Significant (t=-821.55)
> Stddev: 0.00193 -> 0.00400: 2.0737x larger
> Timeline: b'http://tinyurl.com/98n9fab'

So Mike Bayer and I narrowed down mako_v2's slowness to its use of an inline re.

This:

http://www.makotemplates.org/trac/changeset/c1468b12f115ac9e469150ce24ea042aeae5e270

brings it down to around:

### mako_v2 ###
Min: 0.087608 -> 0.066748: 1.31x faster
Avg: 0.091348 -> 0.071224: 1.28x faster
Significant (t=26.10)
Stddev: 0.00312 -> 0.00447: 1.4340x larger
Timeline: http://tinyurl.com/as2zedo

The culprit is the lru_cache on re._compile_typed. Notice functools' numbers from the profiler:

http://paste.ofcode.org/yZRKnJfTsHesFR8hMWfc7f

Mike also noticed that the mako fix above does nothing to 2.7's numbers.
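The difference between an "inline" regex and a precompiled one can be sketched as follows. Passing a pattern string to `re.findall` makes the re module look the pattern up in its internal cache of compiled patterns on every call (in 3.3 that cache went through the `lru_cache` named above; later CPython versions reworked it), while a compiled pattern object skips the lookup. The pattern and text are hypothetical; Mako's real regexes and the actual fix are in the linked changeset.

```python
import re
import timeit

# Hypothetical template-style pattern and input text.
PATTERN = r"\$\{(.+?)\}"
COMPILED = re.compile(PATTERN)
TEXT = "Hello ${name}, you have ${count} new messages"


def inline_findall():
    # Passing the pattern string: re consults its compiled-pattern cache on each call.
    return re.findall(PATTERN, TEXT)


def precompiled_findall():
    # Reusing the compiled pattern object: no cache lookup per call.
    return COMPILED.findall(TEXT)


for func in (inline_findall, precompiled_findall):
    seconds = timeit.timeit(func, number=100_000)
    print(f"{func.__name__}: {seconds:.3f}s")
```

Both functions return the same matches; only the per-call overhead differs, and how large that difference is depends on the Python version.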

--
Philip Jenvey

Brett Cannon

Nov 2, 2012, 2:42:55 PM11/2/12

to Philip Jenvey, python-dev

Issue filed for the performance issue: http://bugs.python.org/issue16390

With that change, and running against the tip of Mako, my laptop now reports mako_v2 as 1.25x slower, which is much better than before. This performance issue might also explain why all of the regex compilation benchmarks are worse under Python 3.3 by a decent margin.

On Fri, Nov 2, 2012 at 2:16 PM, Philip Jenvey <pje...@underboss.org> wrote:

lru_cache on re._compile_typed

Maciej Fijalkowski

Nov 3, 2012, 10:48:25 AM11/3/12

to Brett Cannon, python-dev

I would like to warn you about modifying benchmarks like this (or
frameworks). Why is it relevant anyway?

Brett Cannon

Nov 3, 2012, 12:29:18 PM11/3/12

to Maciej Fijalkowski, python-dev

I'm not modifying any benchmark or framework. At best I will replace Mako 0.7.2 with Mako 0.7.3 in the benchmark suite since no one is historically recording the mako_v2 benchmark yet and it should be running with the newest version until we set it in stone.
