I think the original explanation was cDecimal vs decimal.
I did check that MarkupSafe was not installed. It might just be mako doing something silly.
The threads tests are very synthetic.
And yes, there are more modules at startup. When was the last time we looked at them to make sure we weren't doing needless imports?
I suspect that stat'ing and loading the .pyc files is responsible for most of the overhead.
PyRun starts up quite a lot faster thanks to embedding all the modules in the executable: http://www.egenix.com/products/python/PyRun/
Freezing all the core modules into the executable should reduce start up time.
On 27/10/12 20:21, Antoine Pitrou wrote:
On Sat, 27 Oct 2012 09:20:36 -0400
Brett Cannon <bca...@gmail.com> wrote:
I did check that MarkupSafe was not installed. It might just be mako doing
something silly.
The threads tests are very synthetic.
And yes, there are more modules at startup. When was the last time we
looked at them to make sure we weren't doing needless imports?
The last time was between 3.2 and 3.3. It will be hard to lower the
number of imported modules, given the current semantics (io, importlib,
unicode, site.py, sysconfig...). Python 2's view of the world was much
simpler (naïve?) in comparison.
It would be interesting to know *where* the module import time gets
spent, on a lower level. My gut feeling is that execution of Python
module code is the main contributor.
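A minimal sketch of how one might start measuring this, not something taken from the thread: it launches fresh interpreters, lists the modules present at startup with and without site.py (the `-S` switch), and times an empty program. The helper names `startup_modules` and `startup_time` are made up for illustration.

```python
import subprocess
import sys
import time


def startup_modules(extra_args=()):
    """Return the names of the modules loaded by a fresh interpreter."""
    cmd = [sys.executable, *extra_args, "-c",
           "import sys; print(' '.join(sorted(sys.modules)))"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.split()


def startup_time(extra_args=(), repeats=5):
    """Return the best wall-clock time (seconds) to run an empty program."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run([sys.executable, *extra_args, "-c", "pass"], check=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    with_site = startup_modules()
    without_site = startup_modules(["-S"])
    print(len(with_site), "modules at startup,", len(without_site), "with -S")
    print("pulled in by site.py:", sorted(set(with_site) - set(without_site)))
    print("startup: %.1f ms, with -S: %.1f ms"
          % (startup_time() * 1e3, startup_time(["-S"]) * 1e3))
```

This only gives totals. On Python 3.7 and later, `python -X importtime -c pass` prints a per-module breakdown, which gets closer to the "where" question.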
> How much of an effect would it have on startup times and these benchmarks if
> Cython-compiled extensions were used?
Depends on what and how much code you use. If you compile everything into one big module that "imports" all of the stdlib when it gets loaded, you'd likely lose a lot of time because it would take a while to initialise all that useless code on startup. If you keep it separate, it would likely be a lot faster because you avoid the interpreter for most of the module startup.
> I'm thinking here of elimination of .pyc interpretation and execution (stat
> calls would be similar, probably slightly higher).
CPython checks for .so files before looking for .py files and imports are absolute by default in Py3, so there should be a slight reduction in stat calls. The net result then obviously also depends on how fast your shared library loader and linker are, etc., but I doubt that that path is any slower than loading and running a .pyc file.
> To be clear - I'm *not* suggesting Cython become part of the required build
> toolchain. But *if* the Cython-compiled extensions prove to be
> significantly faster I'm thinking maybe it could become a semi-supported
> option (e.g. a HOWTO with the caveat "it worked on this particular system").
Sounds reasonable.
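The lookup order mentioned above (extension module before source file) can be checked with a small sketch. It assumes a CPython 3 build with at least one entry in `importlib.machinery.EXTENSION_SUFFIXES`; the helper `which_file_wins` and the module name `demo_mod` are made up for illustration. Nothing is imported or executed, since the finder only looks at file names, so an empty file can stand in for the shared library.

```python
import tempfile
from importlib.machinery import (
    EXTENSION_SUFFIXES,
    SOURCE_SUFFIXES,
    ExtensionFileLoader,
    PathFinder,
)
from pathlib import Path


def which_file_wins(name="demo_mod"):
    """Offer NAME as both an extension file and a .py file; return the spec.

    Both files live in a temporary directory. The "extension" is an empty
    placeholder, which is enough because find_spec() never loads it.
    """
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / (name + EXTENSION_SUFFIXES[0])).write_bytes(b"")
        (Path(tmp) / (name + SOURCE_SUFFIXES[0])).write_text("x = 1\n")
        # Use the default path-based finder, restricted to our directory.
        return PathFinder.find_spec(name, [tmp])


if __name__ == "__main__":
    spec = which_file_wins()
    print(type(spec.loader).__name__, spec.origin)
```

If the spec's loader is an `ExtensionFileLoader`, the extension file was preferred over the `.py` file in the same directory. How many stat calls this saves in practice is a separate question that the sketch does not measure.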
On 28 October 2012 18:22, Stefan Behnel <stef...@behnel.de> wrote:
>> How much of an effect would it have on startup times and these benchmarks if
>> Cython-compiled extensions were used?
> Depends on what and how much code you use. If you compile everything into one big module that "imports" all of the stdlib when it gets loaded, you'd likely lose a lot of time because it would take a while to initialise all that useless code on startup. If you keep it separate, it would likely be a lot faster because you avoid the interpreter for most of the module startup.
I was specifically thinking in terms of the tests Brett ran (that was the full set on speed.python.org, wasn't it?),
On Mon, 29 Oct 2012 09:56:57 -0400
Brett Cannon <br...@python.org> wrote:
> To see if the bad iterative_count and threaded_count results were
> consistently bad, I ran the benchmark suite on my MacBook Pro to see how
> "reliable" the benchmarks were. The output is below.
>
> Basically 6 benchmarks (regex_effbot, queens, startup_nosite,
> iterative_count, threaded_count, and telco) varied by more than 15% in
> performance between my 2 computers, although queens, iterative_count, and
> threaded_count were the only ones that swung from neutral/good to bad
> depending on the machine (the rest either went from bad to very bad, or
> very good to more very good).
This is using different compilers on the 2 computers, right?
lru_cache on re._compile_typed