
import glob doesn't work?


Dale Nagata
Jun 15, 1998

Hi,

I don't seem to be able to use the glob module from inside a script.
I'm using Python 1.5 on NT 4.0.

It looks ok when I run python interactively:

>>> import glob
>>> dir(glob)
['__builtins__', '__doc__', '__file__', '__name__', 'fnmatch', 'glob', 'glob1', 'has_magic', 'magic_check', 'os', 're']
>>> type(glob)
<type 'module'>
>>> type(glob.glob)
<type 'function'>

But if I try to use it from inside a script, I get errors:

import sys
import socket
import glob

def test():
    filelist = []
    args = sys.argv[1:]
    for arg in args:
        explist = glob.glob( arg )
        for match in explist:
            filelist.append( match )
    print filelist

if __name__ == '__main__':
    print type(socket)
    print type(socket.socket)
    print type(glob)
    print type(glob.glob)
    print dir(glob)
    print dir(glob.glob)
    test()

V:\tmp>python glob.py *.py
<type 'module'>
<type 'function'>
<type 'module'>
<type 'module'>
['__builtins__', '__doc__', '__file__', '__name__', 'glob', 'socket', 'sys', 'test']
['__builtins__', '__doc__', '__file__', '__name__', 'glob', 'socket', 'sys', 'test']
Traceback (innermost last):
  File "glob.py", line 20, in ?
    test()
  File "glob.py", line 9, in test
    explist = glob.glob( arg )
TypeError: call of non-function

It seems to distinguish the module socket from the function
socket.socket ok, but it seems to think glob or glob.glob
is the global namespace???

Am I doing something wrong???

Thanks,


--
Dale Nagata | tel : +1 604.451.2700 ext. 2254 (UTC -0800)
Software Developer | fax : +1 604.437.9891
Creo Products Inc. | work: dna...@creo.com
Burnaby BC Canada | play: dna...@wimsey.com

Guido van Rossum
Jun 15, 1998

> I don't seem to be able to use the glob module from inside a script.
[...]

> V:\tmp>python glob.py *.py
[...]

> TypeError: call of non-function
>
> It seems to distinguish the module socket from the function
> socket.socket ok, but it seems to think glob or glob.glob
> is the global namespace???
>
> Am I doing something wrong???

You bet :-)

The problem is that you called your script file glob.py, and so your
"import glob" will import your script instead of the standard glob
module. Rename the script and the problem will go away.
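
A quick way to see the collision for yourself (a two-line check; the
__file__ attribute is visible in the dir() output above):

import glob
print glob.__file__   # with the script named glob.py, this prints the
                      # script's own path, not the standard library module's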

--Guido van Rossum (home page: http://www.python.org/~guido/)


Jan Decaluwe
Jun 15, 1998

Guido van Rossum wrote:
>
>
> The problem is that you called your script file glob.py, and so your
> "import glob" will import your script instead of the standard glob
> module. Rename the script and the problem will go away.
>

This reminds me of the problem I had with the "intra-package shortcut".
Because of this, I can't import the standard module re in my
own module easics.re. This goes somewhat against my intuition to
use packages to define hierarchical namespaces.
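
Concretely, the layout that triggers it looks like this (the package name
is from the example above; the submodule doing the import is invented):

easics/
    __init__.py
    re.py        # easics' own module named 're'
    engine.py    # 'import re' here finds easics.re, not the standard re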

A solution could be to build a virtual top-level package into the language
so that I could use the recommended way to import from packages and say:
from standard import re
to unambiguously get what I want.

Regards, Jan

--
===================================================================
Jan Decaluwe === Easics ===
Design Manager === VHDL-based ASIC design services ===
Tel: +32-16-395 600 ===================================
Fax: +32-16-395 619 Interleuvenlaan 86, B-3001 Leuven, BELGIUM
mailto:ja...@easics.be http://www.easics.com

Marc Wachowitz
Jun 17, 1998

Guido van Rossum <gu...@CNRI.Reston.Va.US> writes:
> Think of it this way: you have a local variable 'x' which hides a
> global variable 'x'. How do you access the global variable? You
> don't -- you have to rename the local variable.

This analogy doesn't really fit well for package/module names: the
context of globals vs. locals is confined to one file and rarely changes
independently, whereas the namespace of packages/modules is 1) much larger
and 2) changed by completely independent people (and later changes to
existing names will affect much more code).

--
GiS - Gesellschaft fuer integrierte Systemplanung mbH
Marc Wachowitz Tel. +49-6201-503-38
Junkersstr. 2 Fax +49-6201-503-66
D-69469 Weinheim m.wac...@gis.ibfs.de

Marc Wachowitz
Jun 17, 1998

Guido van Rossum <gu...@CNRI.Reston.Va.US> writes:
> > A solution could be to build a virtual top-level package into the language
> > so that I could use the recommended way to import from packages and say:
> > from standard import re
> > to unambiguously get what I want.
>
> This idea has merit. It's slightly convoluted though.

As long as it isn't forced, so what? Everyone will be able to make a choice
between short, simple names or being safe even in future versions.

> What's the
> reason that you want to name your submodule the same as a standard
> module?

There are many standard modules, and they're growing - thus even if one does
not want to introduce a name clash now, it may silently (at least until run
time) introduce a name clash in a later release of Python, when one already
has many references to (and documentation for) the established module/package.

Some of the standard names are also sufficiently generic that one might want
to have one's own bag of utilities (e.g. local.string or local.html). We'd
really like some notation (e.g. __python__.string) for Python's standard libs,
such that we can use it for all standard modules, and another package (which
is unlikely to be ever used by anyone else) for all of our own modules, and
then never again worry about name space pollution. It would be much overhead
for casual scripting (thus one can still use the direct, "implicit" form for
that purpose), but these concerns are quite relevant for a large, long-term
project.

Btw, it would also be convenient to be able to rename an entity on import,
e.g.

import a = x
from b import c = d

would be similar to

import x
a = x
del x
from b import d
c = d
del d

(except that it shouldn't even touch potential local bindings for x and d).
To avoid namespace pollution and for performance reasons, we're currently
almost always doing something like the following for imported functions (or
generally any non-changing bindings):

import string
_string__split = string.split
_string__join = string.join
del string

This could then simply be written like this

from string import _string__split = split, _string__join = join

which is much more pleasant. (Would it be hard to allow line-breaks after
"import" and "," without needing a backslash? Then one could format it nicely.)

Another idea for optimization: Many top-level bindings are known not to be
changed at any later time; it might be possible to have an annotation which
allows functions to incorporate the value of that binding at definition time,
instead of looking it up in a dictionary dynamically. (Yes, I do know about
the possibility to pass it as default argument, but that's too ugly, and bad
for error checking in case someone passes too many arguments to the function,
or if the function takes keyword or a variable number of arguments).

Guido van Rossum
Jun 17, 1998

[nn]

> > > A solution could be to build a virtual top-level package into the language
> > > so that I could use the recommended way to import from packages and say:
> > > from standard import re
> > > to unambiguously get what I want.

[me]


> > This idea has merit. It's slightly convoluted though.

[Marc Wachowitz]


> As long as it isn't forced, so what? Everyone will be able to make a choice
> between short, simple names or being safe even in future versions.

I bet most people would choose short names.

> > What's the
> > reason that you want to name your submodule the same as a standard
> > module?

> There are many standard modules, and they're growing - thus even if
> one does not want to introduce a name clash now, it may silently (at
> least until run time) introduce a name clash in a later release of
> Python, when one already has many references to (and documentation
> for) the established module/package.

The funny thing is that this is not what happened in the one example
of this that was posted: instead, they wanted to have a module
'mine.re' that extended the standard 're' module. I think that would
be bad style even if there were no technical difficulties (it's like
naming your derived class the same as your base class) so, if
anything, this gives me a reason *not* to pursue this :-)

> Some of the standard names are also sufficiently generic that one
> might want to have one's own bag of utilities (e.g. local.string or
> local.html). We'd really like some notation (e.g. __python__.string)
> for Python's standard libs, such that we can use it for all standard
> modules, and another package (which is unlikely to be ever used by
> anyone else) for all of our own modules, and then never again worry
> about name space pollution. It would be much overhead for casual
> scripting (thus one can still use the direct, "implicit" form for
> that purpose), but these concerns are quite relevant for a large,
> long-term project.

But I don't *like* reusing names in the way you propose. I think in the
long run it will be confusing if you define local modules with the
same name *and functionality* as standard modules.

> Btw, it would also be convenient to be able to rename an entity on import,
> e.g.
>
> import a = x
> from b import c = d

This has been proposed before. It's not worth adding to the language;
renaming is already so easy that I don't find it worth the cost of
adding new things to this level of the language (and to the tutorial,
and to the books, and to the collective minds of all Python users...).

Besides: quick, without looking at the definition; does "import a = b"
mean to import module a and give it local name b, or to import module
b and give it local name a? You chose the second; there have been
proposals "import a as b" which clearly reverse the order. I say it's
too ambiguous to touch with a 7-foot pole.

> would be similar to
>
> import x
> a = x
> del x
> from b import d
> c = d
> del d

> (except that it shouldn't even touch potential local bindings for x and d).
> To avoid namespace pollution and for performance reasons, we're currently
> almost always doing something like the following for imported functions (or
> generally any non-changing bindings):
>
> import string
> _string__split = string.split
> _string__join = string.join
> del string
>
> This could then simply be written like this
>
> from string import _string__split = split, _string__join = join

> which is much more pleasant. (Would it be hard to allow line-breaks
> after "import" and "," without needing a backslash? Then one could
> format it nicely.)

You find it more pleasant; I find it the road to language bloat.
For me, the classic syntax looks more readable.

> Another idea for optimization: Many top-level bindings are known not to be
> changed at any later time; it might be possible to have an annotation which
> allows functions to incorporate the value of that binding at definition time,
> instead of looking it up in a dictionary dynamically. (Yes, I do know about
> the possibility to pass it as default argument, but that's too ugly, and bad
> for error checking in case someone passes too many arguments to the function,
> or if the function takes keyword or a variable number of arguments).

This idea has merit; I've been thinking about this myself. It's
really stupid that a function which uses len() in a for loop suffers
two dict lookups each time len() is called...

Terry Reedy
Jun 17, 1998

gu...@CNRI.Reston.Va.US says...

>This idea has merit; I've been thinking about this myself. It's
>really stupid that a function which uses len() in a for loop suffers
>two dict lookups each time len() is called...

It seems to me that all that is needed is a companion to the 'global'
declaration statement: i.e., 'fixed', 'static', or 'local' (as in 'these
are global vars that I want localized at compile time').

We do not want to go too far down the declaration path, but this one
more would be a definite improvement over the initialization hack:
easier to learn/teach/use/read and less dangerous.
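
For reference, the initialization hack being compared against is the
default-argument idiom; a minimal sketch, with an invented function name:

def total_len(seqs, len=len):   # 'len' is bound once, at definition time
    n = 0
    for s in seqs:
        n = n + len(s)          # a fast local lookup, not two dict probes
    return n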

Terry

Barry A. Warsaw
Jun 17, 1998

>>>>> "MW" == Marc Wachowitz <m.wac...@gis.ibfs.de> writes:

MW> There are many standard modules, and they're growing - thus
MW> even if one does not want to introduce a name clash now, it
MW> may silently (at least until run time) introduce a name clash
MW> in a later release of Python, when one already has many
MW> references to (and documentation for) the established
MW> module/package.

This is actually something I worry about, and one reason why it might
be useful to have an implied standard module name (that could be
stated explicitly, kind of like the exceptions module).

But it might also be useful to start thinking about how the standard
library itself would be packagized. The hard part (aside from
classifying all the existing modules) is dealing with backwards
compatibility.

-Barry

Tim Peters
Jun 18, 1998

[guido]

> This idea has merit; I've been thinking about this myself. It's
> really stupid that a function which uses len() in a for loop
> suffers two dict lookups each time len() is called...

[Terry Reedy]


> It seems to me that all that is needed is a companion to the
> 'global' declaration statement: ie, 'fixed', 'static', or
> 'local' (as in 'these are global vars that I want localized
> at compile time').

Probably most bang for the quick-hack buck to focus on the builtins. There
could be a shared vector of builtin objects, & the compiler could (when
allowed to) then emit a new "LOAD_BUILTIN i" opcode for a reference to the
i'th builtin in the list; at runtime, LOAD_BUILTIN would zoom just as fast
as LOAD_FAST does today.

Then, since this works by replacing builtins with fixed indices, the natural
new keyword would be ... "indexing" <cackle>. It's OK if you don't get the
joke; Guido is groaning & Just is laughing <wink>.

> We do not want to go too far down the declaration path,

I don't think we want to go down the *required* declaration path at all; but
the *optional* declaration path should eventually become a 12-lane
superhighway.

> but this one more would be a definite improvement over the
> initialization hack: easier to learn/teach/use/read and less
> dangerous.

Oh ya! If we stuck to builtins at first, it may even be cooler to have a
compilation switch instead of decls, where the switch means "I promise that
whenever I use a name of a Python builtin, I'm not trying to fool myself.".
I don't think I've ever written a Python program where I've tried to hijack
a builtin name.

and-if-someone-else-has-i-don't-like-them-so-who-cares-ly y'rs - tim

M.-A. Lemburg
Jun 18, 1998

Tim Peters wrote:
>
> [guido]
> > This idea has merit; I've been thinking about this myself. It's
> > really stupid that a function which uses len() in a for loop
> > suffers two dict lookups each time len() is called...

Calling this in module scope saves one lookup...

import sys

def localize_builtins():
    try:
        1/0
    except:
        frame = sys.exc_info()[2].tb_frame.f_back
    builtins = frame.f_builtins
    locals = frame.f_locals
    for k, v in builtins.items():
        if not locals.has_key(k):
            locals[k] = v
    del frame, builtins, locals

...though it will probably not be noticeable in real-world applications:

/home/lemburg> python
Python 1.5 (...)
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> def f():
...     t = ()
...     for i in range(1000000):
...         len(t)
...
>>> import hack
>>> import Tools
>>> hack.clock('f()')
5.720abs 5.720usr sec.
>>> Tools.localize_builtins()
>>> hack.clock('f()')
5.000abs 5.000usr sec.

> > but this one more would be a definite improvement over the
> > initialization hack: easier to learn/teach/use/read and less
> > dangerous.
>
> Oh ya! If we stuck to builtins at first, it may even be cooler to have a
> compilation switch instead of decls, where the switch means "I promise that
> whenever I use a name of a Python builtin, I'm not trying to fool myself.".
> I don't think I've ever written a Python program where I've tried to hijack
> a builtin name.

You've never used variables like id, len or type ? Seriously, a
compilation switch could be introduced to change the order of global
lookups: from "globals, builtins" to "builtins, globals". Or have
optimization level 2 (-OO) indicate this. The next step would be
inlining builtin function calls in the interpreter loop, adding
some more macros, etc. pp.

Some of these ideas, plus a bunch of others can be found in my
1.5 patch:

http://starship.skyport.net/~lemburg/mxPython-1.5.patch.gz

Didn't have time to update it to 1.5.1, will probably wait until 1.5.2
is out.

--
Marc-Andre Lemburg Y2000: 561 days left
---------------------------------------------------------------------
: Python Pages >>> http://starship.skyport.net/~lemburg/ :
---------------------------------------------------------


Guido van Rossum
Jun 18, 1998

> > Oh ya! If we stuck to builtins at first, it may even be cooler to have a
> > compilation switch instead of decls, where the switch means "I promise that
> > whenever I use a name of a Python builtin, I'm not trying to fool myself.".
> > I don't think I've ever written a Python program where I've tried to hijack
> > a builtin name.
>
> You've never used variables like id, len or type ?

Local variables don't present a problem. Even globals that are
actually assigned to in the module are easily detectable. It's when
as part of the API to a module you expect your user to occasionally
write

import foomodule
foomodule.int = my_own_int_function

so as to change the behavior of int() throughout foomodule when you
get in trouble.

> Seriously, a
> compilation switch could be introduced to change the order of global
> lookups: from "globals, builtins" to "builtins, globals". Or have
> optimization level 2 (-OO) indicate this. The next step would be
> inlining builtin function calls in the interpreter loop, adding
> some more macros, etc. pp.

Compilation switches are bad -- this kind of stuff needs to be
settable per module.

M.-A. Lemburg
Jun 18, 1998

Guido van Rossum wrote:
>
> > > Oh ya! If we stuck to builtins at first, it may even be cooler to have a
> > > compilation switch instead of decls, where the switch means "I promise that
> > > whenever I use a name of a Python builtin, I'm not trying to fool myself.".
> > > I don't think I've ever written a Python program where I've tried to hijack
> > > a builtin name.
> >
> > You've never used variables like id, len or type ?
>
> Local variables don't present a problem. Even globals that are
> actually assigned to in the module are easily detectable. It's when
> as part of the API to a module you expect your user to occasionally
> write
>
> import foomodule
> foomodule.int = my_own_int_function
>
> so as to change the behavior of int() throughout foomodule when you
> get in trouble.

The little function I posted works in that sense...

> > Seriously, a
> > compilation switch could be introduced to change the order of global
> > lookups: from "globals, builtins" to "builtins, globals". Or have
> > optimization level 2 (-OO) indicate this. The next step would be
> > inlining builtin function calls in the interpreter loop, adding
> > some more macros, etc. pp.
>
> Compilation switches are bad -- this kind of stuff needs to be
> settable per module.

...and it allows per module usage. Even if you manipulate __builtins__
*after* having called the function, the new entries are still
found -- at the cost of the second lookup.

I think eliminating the one lookup doesn't really pay off, though:
whatever is introduced should produce the same speedup as the keyword
hack and this can, AFAIK, only be done by turning the globals into function
locals (or something similar) at function creation time.

Brainstorming a bit here...

How about introducing a per module global __const__ that contains
a list (*) of global object names that can be considered constant.
[It could be filled in with __builtins__.keys() per default.] When
a function is created all global references are checked against
this table and the ones that are marked "constant" are looked up
in globals() and, upon success, get pre-bound as function locals.

__const__ = ['blah','aie']
blah = 2
def f():
    print blah  # blah is found to be a constant global
                # and prebound as a function local of the same name
    print aie   # aie is supposed to be constant too, but can't
                # be found at function creation time,
                # so it remains a global in the old sense
aie = 3

Then setting

mymodule.int = myint

would only have the desired effect if 'int' is not found
in mymodule.__const__. Which is natural in the sense that 'int'
cannot be considered constant anymore...

(*) A dictionary would probably be better w.r.t. performance.

Marc Wachowitz
Jun 18, 1998

Guido van Rossum <gu...@CNRI.Reston.Va.US> writes:
> But I don't *like* reusing names in the way you propose. I think in the
> long run it will be confusing if you define local modules with the
> same name *and functionality* as standard modules.

I'm not saying that everyone should do this, and I could probably also be
convinced that those few instances where I thought about doing so aren't
a good idea; however, I'm strongly convinced that it's very very bad if
there's any chance that a future addition to the Python standard library
can create a name clash in existing packages, where some future modules
may indeed want to use the new module once it's there, without creating a
conflict with the already existing module and without renaming it. It's a
bit silly to have packages and then still to need some unique prefix for
one's module to be safe against future versions.

> Besides: quick, without looking at the definition; does "import a = b"
> mean to import module a and give it local name b, or to import module
> b and give it local name a? You chose the second; there have been
> proposals "import a as b" which clearly reverse the order.

Minimal surprise: In "x as y", y is the new name, just as usage of English
indicates (and I'd prefer that if it wouldn't introduce a new keyword),
and in "x = y", x is the new name, just like we're used to for assignment.

Guido van Rossum
Jun 18, 1998

> The little function I posted works in that sense...

Sorry, I never saw it. Mea culpa.

> Brainstorming a bit here...
>
> How about introducing a per module global __const__ that contains
> a list (*) of global object names that can be considered constant.
> [It could be filled in with __builtins__.keys() per default.] When
> a function is created all global references are checked against
> this table and the ones that are marked "constant" are looked up
> in globals() and, upon success, get pre-bound as function locals.
>
> __const__ = ['blah','aie']
> blah = 2
> def f():
>     print blah  # blah is found to be a constant global
>                 # and prebound as a function local of the same name
>     print aie   # aie is supposed to be constant too, but can't
>                 # be found at function creation time,
>                 # so it remains a global in the old sense
> aie = 3
>
> Then setting
>
> mymodule.int = myint
>
> would only have the desired effect if 'int' is not found
> in mymodule.__const__. Which is natural in the sense that 'int'
> cannot be considered constant anymore...
>
> (*) A dictionary would probably be better w/r to performance.

I would separate this into two issues. The first one is the
implementation mechanism; the second one is the syntax.

The implementation mechanism could work as follows:

- Somehow the code generator has obtained a list of constant globals.
(Coming up with this list is the second issue.)

- For each name reference inside a function, if the code generator
decides that it's a global name (i.e. there are no assignments to it),
it checks whether it's in the list of constant globals. If so, it
assigns a local variable slot for it and generates code to reference
this local variable slot with a special instruction, say
LOAD_CONST_GLOBAL.

- These additional local variables are initialized to NULL just like
regular locals.

- The LOAD_CONST_GLOBAL instruction implementation in the VM uses the
local variables as a cache. The first time a particular const global
is referenced, the local slot will be NULL. The VM then fetches the
corresponding global or builtin, stores it in the local slot, and then
returns the object.

- If the local slot is not NULL, it returns whatever is found there.

Note that this implementation only requires that const globals don't
change during the function call. It still requires one or two
dictionary lookups per call per const global.
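
A Python model of that slot-as-cache behaviour (a sketch only; the class
and method names here are invented, and the real slot would be a
NULL-initialized local in the frame):

class ConstGlobalSlot:
    def __init__(self, name, globs, bltins):
        self.name = name
        self.globs = globs
        self.bltins = bltins
        self.value = None               # plays the role of the NULL slot
    def load(self):
        # LOAD_CONST_GLOBAL: dict lookups happen at most once per call
        if self.value is None:
            if self.globs.has_key(self.name):
                self.value = self.globs[self.name]
            else:
                self.value = self.bltins[self.name]
        return self.value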

A more aggressive mechanism might use a separate vector, shared
between all calls to the same function, initialized in the same way.
I think this is preferable over prefetching the vector at function
definition time, since it doesn't require the const globals to be
already defined when the function is being defined; it only requires
that when the function is being called. For mutually recursive
functions, this makes the difference between working and not working.
The costs are about the same (the NULL test has to be made anyway and
whatever happens in the branch only happens once per python
invocation).

The second issue is how to come up with the list of const globals.

Here's a strawman that attempts to maximize the amount of stuff we can
optimize this way without requiring any source modifications; all it
requires is some static analysis of the module source.

My observation is that non-const globals are *usually* recognizable by
analyzing just the module's source (not any other source). I would
say that in the following cases, it is *normally* safe to assume that
if there's exactly one definition of a name it will be const global:

import spam

from spam import ham

def foo...

class Bar...

Add to this all builtins that aren't also the target of a global
assignment, and you can optimize almost all globals that normally
occur, without jeopardizing typical global *variables*.

This kind of analysis is not beyond the abilities of the current code
generator.

Note that it's easy to "declare" exceptions without adding new syntax:
the occurrence of

x = x

anywhere is enough to indicate that x is not a const global.
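
A short example of how the strawman reads in practice (the compile-time
effect is hypothetical, of course):

import string           # exactly one definition: assumed const global

import re
re = re                 # opt-out: 're' may be rebound from outside, so it
                        # keeps the ordinary dynamic lookup

def shout(s):
    # 'string' (and the builtin 'len') could use the optimized form here
    return string.upper(s) + '!' * len(s)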

Tim Peters
Jun 19, 1998

[M.-A. Lemburg]

> You've never used variables like id, len or type ?

I'm an anal kind of guy, M-A! Got it into my head years ago that it was a
Bad Idea to hijack common function names in any language.

But rather than defend my personality defects <0.9 wink>, I'd rather endorse
Guido's later, better-considered reply. Python already distinguishes
between local and non-local names, and I remember the days when it didn't;
things are a lot faster now. The only thing stopping it from similarly
distinguishing between the two flavors of non-local is impossibility <wink>.
That extreme dynamicism makes possible some way-cool tricks, but at least
99.5% of my modules never exploit it. So I'd be delighted to dump a "don't
fret, the owls are exactly what they seem!" pragma into most of my modules.

> Seriously, a compilation switch could be introduced to change
> the order of global lookups: from "globals, builtins" to
> "builtins, globals". Or have optimization level 2 (-OO) indicate
> this.

It's too fat a brush to paint details this fine; you generally don't know
which other modules you may be sucking in, and even if you do you generally
don't know which will need to be compiled first. Too iffy for a switch that
*can* cause radical changes in semantics.

> The next step would be inlining builtin function calls
> in the interpreter loop, adding some more macros, etc. pp.

Never ever agitate for the next step before the first step is a done deal --
expanding the scope of a change, and especially in ad hoc directions,
slashes the odds of anything happening at all <0.5 wink>. Seriously, Guido
is God, but he can only deal with an infinite number of changes at any given
infinitesimal instant.

yes-that's-inscrutable-and-that's-the-whole-point<wink>-ly y'rs - tim

Marc Wachowitz
Jun 19, 1998

Guido van Rossum <gu...@CNRI.Reston.Va.US> writes:
[...]

> Here's a strawman that attempts to maximize the amount of stuff we can
> optimize this way without requiring any source modifications; all it
> requires is some static analysis of the module source.
[...]

All this sounds quite interesting. I have another idea (which may or may not
work; I don't know much about the internals, but I trust your in-sights ;-)

Why worry about the constant-ness of global bindings at all, if what we
really want to optimize is the dictionary lookup? Wouldn't it be even
better to optimize name lookup, by having a per-module vector of global
variable slots, where a global name is associated with a fixed slot when
it's first encountered (whether during compilation or at run time), and
the index of the respective slot is remembered in every function which uses
any global variable. Thus the function of a module's name space would change
from a mapping of a name to a value towards a mapping from a name to a slot
index for the vector of values. This would preserve correctness for any
existing code, and (as far as I can tell) at most waste some space for code
which uses a module's global name space as a very dynamic dictionary (which
is probably already quite rare, and perhaps even deserves to be punished ;-)
In case a "del" is used for a global variable, this would keep the slot, but
store some "impossible" value in it (a NULL pointer or some special object,
depending on which needs less dynamic checks). Of course, the "__dict__" of
a module would then cease to be an ordinary dictionary, but if I remember
that correctly, dictionary lookup goes through some C-level function vector,
and yet another internal mapping type with a "magical" assignment method
shouldn't hurt too much.

Tim Peters
Jun 19, 1998

[m-a and guido exchange ideas for speeding access to non-locals]

[guido]
> ...


> A more aggressive mechanism might use a separate vector, shared
> between all calls to the same function, initialized in the same
> way.

More aggressive: locals "live in" a function, and are normally implemented
via a vector attached to the function. globals "live in" a module, so are
normally implemented via a vector attached to ... the module? Sounds
natural to me <wink>.

I didn't really grasp what "constness" had to do with the problem. Sketch:
compile a module, you eventually know every name that was considered to be
global, as module compilation goes along assign a unique index to each
unique global name and append a slot for it to a module-level vector,
replace each LOAD/STORE_GLOBAL opcode with new LOAD/STORE_xxx opcodes that
index into that vector. LOAD_xxx would need to do a lookup in the builtins
if it found the slot was NULL, but e.g. after looking "len" up once every
use of "len" within the module thereafter would get the correct value
straight away.

If the user did "global len; len = 3" in some function, no problem, the
corresponding STORE_xxx would overwrite len's slot in the vector with 3, so
3 is what everyone else would see thereafter. Similarly for
currently-expensive global "count = count + 1" kinds of instrumentation & so
on.

Mutual recursion seems to work naturally too. E.g., when compiling "f" that
refers to a non-local not-seen-before "g", a NULL'ed slot is created for
"g", and is filled in correctly by the subsequent "def g"; while g's use of
"f" etc.

In other words, I'm trying to extend Guido's known-to-be *highly* effective
local-vrbl design in as natural a way as possible.
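
A toy Python rendition of the sketch (all names invented here; the real
mechanism would of course live in C):

class ModuleGlobals:
    def __init__(self, builtins):
        self.builtins = builtins   # fallback namespace for NULL slots
        self.names = []            # slot -> name, assigned during compilation
        self.index = {}            # name -> slot
        self.values = []           # slot -> value; None plays the NULL role

    def slot_for(self, name):
        # "compile time": each distinct global name gets one fixed slot
        if not self.index.has_key(name):
            self.index[name] = len(self.names)
            self.names.append(name)
            self.values.append(None)
        return self.index[name]

    def load(self, i):
        # LOAD_xxx: plain indexing; a NULL slot falls back to the builtins
        v = self.values[i]
        if v is None:
            v = self.builtins[self.names[i]]
            self.values[i] = v
        return v

    def store(self, i, v):
        # STORE_xxx: "global len; len = 3" just overwrites the slot
        self.values[i] = v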

Glossing over many details in the hopes someone will point out the obvious
killer problem and so save me the trouble <wink>. Is the primary hangup
that some other module may do

import M
M.some_global = whatever

? Could it be that's the *only* real hangup? If so, not discounting it,
just pondering whether there may not be ways to deal with it that don't so
grossly warp a "natural" design for a module's non-local accesses.

For direct intramodule "M.xxx [= yyy]" kind of stuff, it seems like a module
object's getattr and setattr could easily map between names and indices,
yes?
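
For instance (toy code, using classic-class __getattr__/__setattr__ hooks;
everything here is an invented illustration):

class SlotModule:
    def __init__(self):
        self.__dict__['_index'] = {}    # name -> slot
        self.__dict__['_values'] = []   # slot -> value
    def __setattr__(self, name, value):
        index, values = self._index, self._values
        if not index.has_key(name):
            index[name] = len(values)
            values.append(None)
        values[index[name]] = value     # M.xxx = yyy writes the slot
    def __getattr__(self, name):
        # reached only when normal attribute lookup fails
        try:
            return self._values[self._index[name]]
        except KeyError:
            raise AttributeError, name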

Assuming that's not completely nuts either, that leaves dealing with code
mucking with module.__dict__ directly. No way to solve that in the current
framework, and relying on someone typing "x = x" to signal it's *possible*
for "x" doesn't really solve it either (notwithstanding that it will likely
work as intended most of the time -- but you don't want a Perl solution
unless you have no choice, and especially not one that attaches profound
meaning to what appears to be a typo <0.4 wink>).

Have I mentioned the notion of a signal-on-write dict before <0.5 wink>? If
more than one design problem settles for a partial solution just because we
can't tell when a magical dict has been modified, making them a little
*more* magical may lead to a direct solution. Or maybe not -- I'm really
sleepy <0.6 wink>.
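
For the record, a signal-on-write dict is easy to sketch on top of
UserDict (the callback protocol here is invented):

import UserDict

class SignalOnWriteDict(UserDict.UserDict):
    def __init__(self, on_write):
        UserDict.UserDict.__init__(self)
        self.on_write = on_write        # called on every rebinding
    def __setitem__(self, key, value):
        self.data[key] = value
        self.on_write(key)              # e.g. invalidate any cached slots
    def __delitem__(self, key):
        del self.data[key]
        self.on_write(key)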

modifications-are-almost-always-rarer-than-references-and-across-
module-boundaries-that's-gotta-be-25x-as-true-ly y'rs - tim

Jan Decaluwe
Jun 19, 1998

Guido van Rossum wrote:
>
>
> The funny thing is that this is not what happened in the one example
> of this that was posted: instead, they wanted to have a module
> 'mine.re' that extended the standard 're' module.

Exactly right.

> I think that would
> be bad style even if there were no technical difficulties (it's like
> naming your derived class the same as your base class) so, if
> anything, this gives me a reason *not* to pursue this :-)

In my thinking the package path is part of the name - hence the
names are unique. When importing, I would always use the full
package name, even inside the package itself - no shortcuts,
hence no ambiguity.

I now understand how the intra-package shortcut can help people
to move existing code to packaged code - thanks for explaining.
I accept that this may be much more relevant in practical terms
than the slightly distorted view on package usage that it causes.



> I think in the
> long run it will be confusing if you define local modules with the
> same name *and functionality* as standard modules.
>

What about the scenario where someone innocently uses a (short!)
module name inside his package and then later wants to make use of
a standard package with unrelated functionality but the "same name".
Would you consider this to be an unrealistic case?

M.-A. Lemburg
Jun 19, 1998

Tim Peters wrote:
>
> [M.-A. Lemburg]
> > You've never used variables like id, len or type ?
>
> I'm an anal kind of guy, M-A! Got it into my head years ago that it was a
> Bad Idea to hijack common function names in any language.

That question wasn't meant to be taken too seriously :)

> > Seriously, a compilation switch could be introduced to change
> > the order of global lookups: from "globals, builtins" to
> > "builtins, globals". Or have optimization level 2 (-OO) indicate
> > this.
>
> It's too fat a brush to paint details this fine; you generally don't know
> which other modules you may be sucking in, and even if you do you generally
> don't know which will need to be compiled first. Too iffy for a switch that
> *can* cause radical changes in semantics.

Agreed. I like the module scope constants much better too...

> > The next step would be inlining builtin function calls
> > in the interpreter loop, adding some more macros, etc. pp.
>
> Never ever agitate for the next step before the first step is a done deal --
> expanding the scope of a change, and especially in ad hoc directions,
> slashes the odds of anything happening at all <0.5 wink>. Seriously, Guido
> is God, but he can only deal with an infinite number of changes at any given
> infinitesimal instant.

Oh, so it's no problem after all ;) Hmm, given the past development,
I think there is some negative exponential upper bound to that
infinite number of changes, though...

--
Marc-Andre Lemburg Y2000: 560 days left

M.-A. Lemburg
Jun 19, 1998

Guido van Rossum wrote:
>
> > MAL:
> >
> > Brainstorming a bit here...
> >
> > [Introducing global constants]

>
> I would separate this into two issues. The first one is the
> implementation mechanism; the second one is the syntax.
>
> The implementation mechanism could work as follows:
>
> - Somehow the code generator has obtained a list of constant globals.
> (Coming up with this list is the second issue.)
>
> - For each name reference inside a function, if the code generator
> decides that it's a global name (i.e. there are no assignments to it),
> it checks whether it's in the list of constant globals. If so, it
> assigns a local variable slot for it and generates code to reference
> this local variable slot with a special instruction, say
> LOAD_CONST_GLOBAL.
>
> - These additional local variables are initialized to NULL just like
> regular locals.
>
> - The LOAD_CONST_GLOBAL instruction implementation in the VM uses the
> local variables as a cache. The first time a particular const global
> is referenced, the local slot will be NULL. The VM then fetches the
> corresponding global or builtin, stores it in the local slot, and then
> returns the object.
>
> - If the local slot is not NULL, it returns whatever is found there.
>
> Note that this implementation only requires that const globals don't
> change during the function call. It still requires one or two
> dictionary lookups per call per const global.
>
> A more aggressive mechanism might use a separate vector, shared
> between all calls to the same function, initialized in the same way.

I like this better. Refilling the cache for every function call
would only speed up globals and builtins used within loops inside
functions. A per-function cache gives you full performance across
many calls. It would also make the global constants behaviour
a lot clearer and prevent the introduction of strange namespace
hacks that could by-pass the semantics of "constants".

> I think this is preferable over prefetching the vector at function
> definition time, since it doesn't require the const globals to be
> already defined when the function is being defined; it only requires
> that when the function is being called. For mutually recursive
> functions, this makes the difference between working and not working.
> The costs are about the same (the NULL test has to be made anyway and
> whatever happens in the branch only happens once per python
> invocation).

Agreed. This is better than what I had in mind.

> The second issue is how to come up with the list of const globals.
>

> Here's a strawman that attempts to maximize the amount of stuff we can
> optimize this way without requiring any source modifications; all it
> requires is some static analysis of the module source.
>

> My observation is that non-const globals are *usually* recognizable by
> analyzing just the module's source (not any other source). I would
> say that in the following cases, it is *normally* safe to assume that
> if there's exactly one definition of a name it will be const global:
>
> import spam
>
> from spam import ham

What about:
from Constants import *

>
> def foo...
>
> class Bar...
>
> Add to this all builtins that aren't also the target of a global
> assignment, and you can optimize almost all globals that normally
> occur, without jeopardizing typical global *variables*.
>
> This kind of analysis is not beyond the abilities of the current code
> generator.
>
> Note that it's easy to "declare" exceptions without adding new syntax:
> the occurrence of
>
> x = x
>
> anywhere is enough to indicate that x is not a const global.

Nice all the way :-)

Donald Beaudry
Jun 19, 1998

"Tim Peters" <tim...@email.msn.com> wrote,

> I didn't really grasp what "constness" had to do with the problem. Sketch:
> compile a module, you eventually know every name that was considered to be
> global, as module compilation goes along assign a unique index to each
> unique global name and append a slot for it to a module-level vector,
> replace each LOAD/STORE_GLOBAL opcode with new LOAD/STORE_xxx opcodes that
> index into that vector. LOAD_xxx would need to do a lookup in the builtins
> if it found the slot was NULL, but e.g. after looking "len" up once every
> use of "len" within the module thereafter would get the correct value
> straight away.

In the MESS I did a fairly interesting implementation of object
attributes. Rather than binding a name to a value, I bound the name
to an attribute. For the purposes of this discussion, you can simply
think of the attribute as the slot number. Assuming that you have a
mechanism to remember that you have already looked it up, you don't
need to look it up again.

> If the user did "global len; len = 3" in some function, no problem, the
> corresponding STORE_xxx would overwrite len's slot in the vector with 3, so
> 3 is what everyone else would see thereafter. Similarly for
> currently-expensive global "count = count + 1" kinds of instrumentation & so
> on.
>
> Mutual recursion seems to work naturally too. E.g., when compiling "f" that
> refers to a non-local not-seen-before "g", a NULL'ed slot is created for
> "g", and is filled in correctly by the subsequent "def g"; while g's use of
> "f" etc.
>
> In other words, I'm trying to extend Guido's known-to-be *highly* effective
> local-vrbl design in as natural a way as possible.
>
> Glossing over many details in the hopes someone will point out the obvious
> killer problem and so save me the trouble <wink>.

I like the details... I'd even bet the killer problems could be worked
out. But... given that all names are interned, I would suspect that a
much simpler caching mechanism would be quite effective. Just
allocate say 3 (or 5 or 7 or 13) slots for each module (or class or
instance or dict) and map the address of the name onto one of those slots.
So long as the cache mechanism is inserted at a point through which
all access to the underlying object must be made, it should work just
fine. As a bonus, all access to the object would be sped up, not just
those that are made by the interpreter.
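
A rough model of that cache (toy Python; in reality it would sit in the C
attribute-lookup path, and all the names here are invented):

NSLOTS = 7                 # one tastefully small prime

class AttrCache:
    def __init__(self):
        self.slots = [None] * NSLOTS   # each entry: (name, value) or None
    def lookup(self, namespace, name):
        # 'name' must be interned, so equal names share one id()
        i = id(name) % NSLOTS
        entry = self.slots[i]
        if entry is not None and entry[0] is name:
            return entry[1]            # hit: no dictionary probe at all
        value = namespace[name]        # miss: ordinary lookup, then cache
        self.slots[i] = (name, value)
        return value

# A real version would also invalidate entries when the namespace is written.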

> Have I mentioned the notion of a signal-on-write dict before <0.5 wink>? If
> more than one design problem settles for a partial solution just because we
> can't tell when a magical dict has been modified, making them a little
> *more* magical may lead to a direct solution. Or maybe not -- I'm really
> sleepy <0.6 wink>.

Signal-on-write dicts.... hmm.... that would be fun too. But why
just on write? The kids in the back seat seem to think that signal-
on-read is useful too: "MOM! She's looking at me!".

--
Donald Beaudry Silicon Graphics
Performance Tools/SSO 1 Cabot Road
do...@sgi.com Hudson, MA 01749
...So much code, so little time...

Jim Fulton
Jun 20, 1998

Tim Peters wrote:
>
> [M.-A. Lemburg]
> > You've never used variables like id, len or type ?
>
> I'm an anal kind of guy, M-A! Got it into my head years ago that it was a
> Bad Idea to hijack common function names in any language.

Well, did you ever use the variable name: "list" before Python 1.5?
Or did you always know it would become a builtin name eventually. ;-)

Jim

Tim Peters
Jun 20, 1998

[M.-A. Lemburg]
> You've never used variables like id, len or type ?

[Timmy]


> I'm an anal kind of guy, M-A! Got it into my head years ago that it was a
> Bad Idea to hijack common function names in any language.

[Jim Fulton]


> Well, did you ever use the variable name: "list" before Python 1.5?
> Or did you always know it would become a builtin name eventually. ;-)

Of course -- the only one that ever caught me by surprise was getrefcount
<wink>.

I just double-checked, and best I can tell I never did use "list" before
1.5 -- but have used it as vrbl name at least twice after! QED for you and
M-A <wink>.

never-mind-ly y'rs - tim


M.-A. Lemburg
Jun 20, 1998

[Donald & Tim on global lookup vectors]

Though the module lookup vector approach seems appealing there are a
few problems, IMHO:

· Keeping the two storage schemes in sync will become problematic
-- the only clean way I can see to implement this is to invent
a special dictionary implementation that also manages the extra
slots (something like a list-dict hybrid)

· Using the special type for module dicts only would cause
'exec code in globals,locals' to fail in case code uses the
special <load global via slot> opcode.

The per-function cache also has problems with the latter, since
the cache might already have been filled using some other global
dictionary which could result in strange behaviour.

That leaves the per-function-call cache approach that Guido
proposed...

[Why introduce "global constants" ?]

Caching non-constant values can only be implemented in the
value serving part of the implementation. Caching constant
values can easily be done on the client side with no further
side-effects. Introducing constants to the language will also
enable many other optimizations, like loop-unrolling, function
inlining, etc.

--
Marc-Andre Lemburg Y2000: 559 days left

Tim Peters
Jun 21, 1998

[M.-A. Lemburg]

> Though the module lookup vector approach seems appealing there are a
> few problems, IMHO:

Mine too!

> · Keeping the two storage schemes in sync will become problematic
> -- the only clean way I can see to implement this is to invent
> a special dictionary implementation that also manages the extra
> slots (something like a list-dict hybrid)

"Clean" I don't care about so much about as "fast". But, yes, picturing a
dlict is a Big Hammer for Smashing Big Doubts. Let's say a dlict is a dict
that accepts either names or little integers as keys, where a 1-to-1
correspondence between name keys and their little-int alias keys is
specified in advance. A dlict promises to keep aliases in synch, such that
dlict[name] and dlict[name-as-int] always yield the same outcome.

A dlict can be implemented in a more-than-less obvious way as a thin wrapper
around a dict (probably clearer to *picture* it as a dict subtype though).
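
Such a thin wrapper might look like this (a sketch; 'Dlict' and its layout
are invented, and writes do the extra synchronizing work):

class Dlict:
    def __init__(self, alias):          # alias: name -> small int, 1-to-1
        self.alias = alias
        self.names = {}                 # reverse map: int -> name
        for name, i in alias.items():
            self.names[i] = name
        self.dict = {}
        self.vector = [None] * len(alias)
    def __getitem__(self, key):
        if type(key) == type(0):
            return self.vector[key]     # dlict[int]: as cheap as vector[int]
        return self.dict[key]           # dlict[name]: ordinary dict lookup
    def __setitem__(self, key, value):
        if type(key) == type(0):
            key = self.names[key]       # normalize an int alias to its name
        self.dict[key] = value          # keep both views in sync
        self.vector[self.alias[key]] = value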

As a mental device, it certainly clarifies what I had in mind. It may be
better than that, though: dlict[string] can be as cheap as dict[string]
(and note that Donald points out that dict[interned-attribute-name] could be
made faster than it is today), while dlict[int] can be as cheap as
vector[int], and if 99% of dynamic references are of the latter form, that's
a huge win.

dlict[key] = yyy (regardless of key type) is probably more expensive than
dict[string] = yyy is today. But reads are generally much more common than
writes in any language, and for globals that's likely overwhelmingly true.

So if "clean" is a big deal <wink>, I don't think "fast" would suffer much
(compared to what's achievable via changing code all over the place to synch
up cleverly and minimally) to actually use honest-to-goodness dlicts in the
implementation.

> · Using the special type for module dicts only would cause
> 'exec code in globals,locals' to fail in case code uses the
> special <load global via slot> opcode.

I expect that code objects would need to hold on to their name->slot
mapping, and that it would be exec's burden to temporarily embed a raw dict
in a dlict (if a dlict is implemented as an object that contains an instance
of a dict, this embedding amounts to an incref & a pointer store, after
which the dict object would be automagically updated by the dlict however
the execution of the code object dictates).

> The per-function cache also has problems with the latter, since
> the cache might already have been filled using some other global
> dictionary which could result in strange behaviour.

Guido has been known to violate the Optimizer's Creed (Thou Shalt Not Alter
Visible Behaviors No Matter How Repulsive Thou Judgeth Them) before, though.
E.g., when locals were switched to use a vector-based implementation,
semantics in some end cases did change! And it was worth it.

I must say I'm uncomfortable basing a scheme on the compiler *guessing*
which globals are "const", though. There's nothing wrong with taking an
educated guess (optimizers do that all the time), the problem is that a
wrong guess in this case is neither harmless nor caught by the runtime; it
will most likely manifest as *subtly* wrong runtime behavior (if someone's
dynamically mucking with another module's globals, they're probably
replacing one function with another on the fly, where both functions "work"
in the sense of not blowing up).

> That leaves the per-function-call cache approach that Guido
> proposed...
>
> [Why introduce "global constants" ?]
>
> Caching non-constant values can only be implemented in the
> value serving part of the implementation. Caching constant
> values can easily be done on the client side with no further
> side-effects.

I do appreciate the quick-bang-for-the-buck attraction in this! Would be
more comfortable if the compiler actually knew which things were const; so
long as it doesn't, all values are in fact (potentially) non-constant, so
client-side-only caching is (potentially) unsafe.

Now if Guido wants to redefine the semantics of the language, that's fine.
Heck, more power to him. But I'm going to have a very hard time catching my
breath if you and I are *both* more concerned about backward compatibility
than he is in the same week <wink>.

> Introducing constants to the language will also enable many
> other optimizations, like loop-unrolling, function inlining, etc.

All that is orthogonal. That is, a const is a const is a const regardless
of how name resolution is done, and you can unroll, CSE, const-fold etc to
your heart's content even if names are resolved by sending queries over the
net to a Python name server -- you only need to know that an expression
*will* return the same object each time, not how that's accomplished.
Indeed, that's why Guido made such a point of separating the implementation
mechanism from how to come up with the list of const names.

So *your* real agenda isn't in jeopardy no matter how "speeding up len in a
loop" gets done <wink>.

or-if-it-wasn't-your-real-agenda-you-shouldn't-have-hidden-it-at-the-
end<wink>-ly y'rs - tim

Tim Peters
Jun 21, 1998

[Donald Beaudry]

> In the MESS I did a fairly interesting implementation of object
> attributes. Rather than binding a name to a value, I bound the name
> to an attribute. For the purposes of this discussion, you can simply
> think of the attribute as the slot number. Assuming that you have a
> mechanism to remember that you have already looked it up, you don't
> need to look it up again.

Happy to take your word for it -- whatever it meant <0.9 wink>. It vaguely
reminds of Smalltalk-inspired "associations" Jim Fulton wrote about, though.

> ...


> I like the details... I'd even bet the killer problems could be
> worked out. But... given that all names are interned, I would
> suspect that a much simpler caching mechanism would be quite
> effective.

Simpler than a fixed unique index? Or are you saying simpler than an
unspecified elaboration of the simple vector would be after working out all
the killer problems?

> Just allocate say 3 (or 5 or 7 or 13) slots for each module (or
> class or instance or dict) map the address of the name onto one
> of those slots.

That's a tasteful collection of what appear to be small primes. I don't
know what they're doing here, but I like them all the same <wink>. Are you
re-inventing dicts here, specialized to a particular key type? Or just
pointing out that Python could be exploiting attribute-name interning much
more aggressively than it is today? I agree wholeheartedly with that last
point, anyway.

> So long as the cache mechanism is inserted at a point through which
> all access to the underlying object must be made, it should work just
> fine. As a bonus, all access to the object would be sped up, not just
> those that are made by the interpreter.

I'm interested in minimizing total time (over all runtime accesses). I'm
vaguely picturing you taking the modulus of an address wrt a small prime
(still ~100 cycles on an Alpha?), then doing some sort of collision
resolution in a too-small hash table <0.7 wink>. Based on that
possibly-total misunderstanding of what you have in mind, I'd much rather
(if possible) just do vector[i], since I expect the latter would suffice for
upwards of 99.9% of dynamic accesses in most programs. Rare operations can
afford to cost more than overwhelmingly common ones.

However it goes, though, part of the hangup in all these hoped-to-be-general
schemes is that there doesn't appear to *be* a "point through which all
access ... [is made]" today. What I understand of your approach still
leaves it *very* dict-like, and that may be a killer advantage: I figure
Guido is willing to say he'll spend two days on this (probably meaning 10
days in reality <2/10 wink>), so anything he doesn't think he can complete
in 3 is probably off the radar screen -- despite that if he signed up for a
7-day project (which would drag out to 23), the whole problem would be
solved once & for all <0.97 wink>.

> Signal-on-write dicts.... hmm.... that would be fun too. But why
> just on write? The kids in the back seat seem to think that signal-
> on-read is useful too: "MOM! She's looking at me!".

From what I've seen, kids come with far too many callbacks installed at the
factory already.

"growing-up"-considered-a-process-of-overriding-one's-base-classes-ly
y'rs - tim

M.-A. Lemburg
Jun 21, 1998

Tim Peters wrote:
>
> [M.-A. Lemburg]
> > · Keeping the two storage schemes in sync will become problematic
> > -- the only clean way I can see to implement this is to invent
> > a special dictionary implementation that also manages the extra
> > slots (something like a list-dict hybrid)
>
> "Clean" I don't care about so much about as "fast". But, yes, picturing a
> dlict is a Big Hammer for Smashing Big Doubts. Let's say a dlict is a dict
> that accepts either names or little integers as keys, where a 1-to-1
> correspondence between name keys and their little-int alias keys is
> specified in advance. A dlict promises to keep aliases in synch, such that
> dlict[name] and dlict[name-as-int] always yield the same outcome.
>
> A dlict can be implemented in a more-than-less obvious way as a thin wrapper
> around a dict (probably clearer to *picture* it as a dict subtype though).

Actually, rather than inventing a new type, the dlict (how do you
pronounce that ? D-List or D-L-Ice-Tea ?) functionality could be included in
the standard dictionary implementation using a separate API for accessing
the values via integer indices.

> As a mental device, it certainly clarifies what I had in mind. It may be
> better than that, though: dlict[string] can be as cheap as dict[string]
> (and note that Donald points out that dict[interned-attribute-name] could be
> made faster than it is today), while dlict[int] can be as cheap as
> vector[int], and if 99% of dynamic references are of the latter form, that's
> a huge win.
>
> dlict[key] = yyy (regardless of key type) is probably more expensive than
> dict[string] = yyy is today. But reads are generally much more common than
> writes in any language, and for globals that's likely overwhelmingly true.

You wouldn't want all items to be interned: it would waste memory because
the int-to-name mapping will have to persist even if the name was deleted
from the dictionary (replacing the corresponding vector slot with NULL).

> So if "clean" is a big deal <wink>, I don't think "fast" would suffer much
> (compared to what's achievable via changing code all over the place to synch
> up cleverly and minimally) to actually use honest-to-goodness dlicts in the
> implementation.
>
> > · Using the special type for module dicts only would cause
> > 'exec code in globals,locals' to fail in case code uses the
> > special <load global via slot> opcode.
>
> I expect that code objects would need to hold on to their name->slot
> mapping, and that it would be exec's burden to temporarily embed a raw dict
> in a dlict (if a dlict is implemented as an object that contains an instance
> of a dict, this embedding amounts to an incref & a pointer store, after
> which the dict object would be automagically updated by the dlict however
> the execution of the code object dictates).

True -- dlict could be implemented using the two existing types dictionary
and list. Yet the lookup opcode would still fail since it would
ask for non-existing ids.

> > The per-function cache also has problems with the latter, since
> > the cache might already have been filled using some other global
> > dictionary which could result in strange behaviour.
>
> Guido has been known to violate the Optimizer's Creed (Thou Shalt Not Alter
> Visible Behaviors No Matter How Repulsive Thou Judgeth Them) before, though.
> E.g., when locals were switched to use a vector-based implementation,
> semantics in some end cases did change! And it was worth it.
>
> I must say I'm uncomfortable basing a scheme on the compiler *guessing*
> which globals are "const", though. There's nothing wrong with taking an
> educated guess (optimizers do that all the time), the problem is that a
> wrong guess in this case is neither harmless nor caught by the runtime; it
> will most likely manifest as *subtly* wrong runtime behavior (if someone's
> dynamically mucking with another module's globals, they're probably
> replacing one function with another on the fly, where both functions "work"
> in the sense of not blowing up).

Though these global module namespace hacks (it would only break code
that dynamically *replaces* an existing function) fall under the
"use at you own risk" clause, they do pose a problem to automatic
constantness recognition...

The global vector approach does look more realistic the more I
think about it. Add to it the per-function-call global id cache,
drop the implicit "const" assumption and we should be able to arrive
at something fast, clean and beautiful.

[...constants...inlining function calls]

> So *your* real agenda isn't in jeopardy no matter how "speeding up len in a
> loop" gets done <wink>.
>
> or-if-it-wasn't-your-real-agenda-you-shouldn't-have-hidden-it-at-the-
> end<wink>-ly y'rs - tim

Oh, my agenda is full of nice things like speeding up global lookups,
writing an ordered mapping type, implementing new C level coercion schemes,
having methods wrap callables, inlining builtin function calls,
etc. etc. [wouldn't fit here without being noticed ;-)]

--
Marc-Andre Lemburg Y2000: 558 days left

Tim Peters

unread,
Jun 22, 1998, 3:00:00 AM6/22/98
to

[tim]
> ...

> A dlict can be implemented in a more-than-less obvious way as a
> thin wrapper around a dict ...

[M.-A. Lemburg]


> Actually, rather than inventing a new type, the dlict (how do you
> pronounce that ? D-List or D-L-Ice-Tea ?)

I've been pronouncing it just like "dict", except with the
unknown-in-English "dl" beginning. Say it just right, and it sounds like a
Klingon vomiting. But since I moved to speech recognition, I find others
don't always share my sense of humor <wink>.

> functionality could be included in the standard dictionary
> implementation using a seperate API for accessing the values via
> integer indices.

Whatever Works(tm)! And that way makes more sense than including it in the
std list implementation with a separate API for accessing values via strings
<wink>.

> You wouldn't want all items to be interned: it would waste memory
> because the int-to-name mapping will have to persist even if name
> was deleted from the dictionary (replacing the corresponding vector
> slot with NULL).

I don't have a feel for the pragmatics of interning; Donald may wish to
address it.

Under a dlict-like scheme "most of the time", or Guido's original scheme
"all of the (applicable) time", it's just ints though.

> > > · Using the special type for module dicts only would cause
> > > 'exec code in globals,locals' to fail in case code uses the
> > > special <load global via slot> opcode.

> > I expect that code objects would need to hold on to their name->slot
> > mapping, and that it would be exec's burden to temporarily embed a

> > raw dict in a dlict ...

> True -- dlict could be implemented using the two existing types dictionary
> and list. Yet the lookup opcode would still fail since it would ask for
> non-existing ids.

I'm actively trying not to get too specific here, because this kind of stuff
Guido will look at and *instantly* know which of yea or nay applies -- would
just waste his time to make him slog thru my guesses about it.

That said, a specific example would make your point much clearer to me.
exec would, of course, need to initialize the "int half" of the dlict to be
consistent with the name->slot mapping carried by the code object, and with
the name->value mapping supplied by the input globals dict. Because of the
first half of that, every slot in the name->slot map would be known to the
dlict before the code object gets control -- in which case I'm not clear on
where a non-existing id could come from. If the code references an id
corresponding to a name that was not in the input globals dict, the
dlict[int] slot would be NULL, so LOAD_xxx would attempt to find the name in
the builtins. If it finds it there, no problem. If it doesn't, it's a
genuine error in the user's code (NameError, specifically).
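
Spelled out as a toy Python function, the lookup rule sketched above might read like this (the calling convention is invented, and None stands in for the C-level NULL):

def load_global_via_slot(slot, slot_names, global_slots, builtins):
    value = global_slots[slot]          # the cheap vector[i] path
    if value is not None:
        return value
    # Empty slot: the name was never bound in this module's globals,
    # so fall back to a name-based lookup in the builtins.
    name = slot_names[slot]
    try:
        return builtins[name]
    except KeyError:
        # Not a global, not a builtin: genuine error in user code.
        raise NameError(name)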

> Though these global module namespace hacks (it would only break code
> that dynamically *replaces* an existing function) fall under the
> "use at you own risk" clause,

I actually have no code that mucks with Module.__dict__ directly, but have
plenty doing stuff like

import M
M.is_whitespace = lambda ch: ch in ' \t'

def g():
    M.is_whitespace = lambda ch: ch in ' \t\n'

I don't see anything in the Language Ref warning that *this* spelling is
undefined, either, no matter how insanely dynamic it gets.

I would actively support changing the language definition to make code like
this illegal (or undefined, or-- best --explicitly decreed to "take effect"
only at well-defined times), but I fear it may be too late for that.

> they do pose a problem to automatic constantness recognition...

That's why I would like to change the language definition -- if possible.
But that's an agenda I'm not going to push now: I've lost count of how many
solid little improvements have gotten dropped on the floor because people
couldn't refrain from burying them under other Grand Schemes. In the U.S.
Congress, legislation can't get passed unless it lines the pockets of
someone in every state in the Union (albeit by emptying the pockets of
someone else -- & usually me <wink>) -- but this is a dictatorship. *Here*
it works better to keep it tightly focused and suck up to the boss a lot
<wink>.

> The global vector approach does look more realistic the more I
> think about it. Add to it the per-function-call global id cache,
> drop the implicit "const" assumption and we should be able to arrive
> at something fast, clean and beautiful.

My hope is that a module-level global vector would render the
per-function-call global id cache superfluous. Achieving that likely hinges
on making dlict[int] as fast as a per-function-call-cache lookup, which
implies that ceval needs to be able to do the dlict[int] part of the dlict
API inline. Violating abstraction in one opcode doesn't bother me.

One other thing *someone* <wink> should think about: if nested scopes were
added to Python, what happens to all these schemes? Offhand I think they
would all suck eggs -- except for Donald's.

> > or-if-it-wasn't-your-real-agenda-you-shouldn't-have-hidden-it-at-the-
> > end<wink>-ly y'rs - tim

> Oh, my agenda is full of nice things like speeding up global lookups,
> writing an ordered mapping type, implementing new C level coercion
> schemes, having methods wrap callables, inlining builtin function calls,
> etc. etc. [wouldn't fit here without being noticed ;-)]

And I generally approve of your agenda, M-A! You don't have to keep hiding
it at the bottom unless you want to <grin>.

then-again-i-usually-hide-my-sigs-at-the-bottom-too-and-they're-
probably-the-only-thing-most-people-read-ly y'rs - tim

Marc Wachowitz

unread,
Jun 22, 1998, 3:00:00 AM6/22/98
to

Hey, three people with roughly similar ideas for optimization - shall we
create a design committee? ;-) Unfortunately, I'm pretty sure I won't have
the time to help implement any of this; at best (or worst, depending on the
reader's view) I can throw in remarks.

"Tim Peters" <tim...@email.msn.com> writes:
> Let's say a dlict is a dict
> that accepts either names or little integers as keys, where a 1-to-1
> correspondence between name keys and their little-int alias keys is
> specified in advance.

Note that "in advance" should only mean "upon first store via the index", not
"once together for all potential future keys". The slot vector should be able
to grow, though it can't ever shrink. To avoid frequent reallocation, the
slot vector should be allocated with some excess slots; e.g. increment the
number of allocated slots by 10 when it overflows. A clever code generator
could allocate the slots for statically known globals in advance, but one
could as well accept some heuristic initial size (based on statistics over
the existing library) and determine the size and the individual index values
incrementally at load time, if that's easier to fit into the existing code.
That will also help to unify it with dynamically created dlicts for "exec"
(see below).
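
A rough Python sketch of that growth policy (class and method names invented; the real thing would of course live in C):

class SlotVector:
    """Slots are handed out on first store and never reclaimed; the
    vector over-allocates by a fixed increment so that it need not
    be reallocated for every new name."""

    GROW_BY = 10

    def __init__(self):
        self.index_of = {}              # name -> slot index, grows only
        self.slots = []

    def slot_for(self, name):
        index = self.index_of.get(name)
        if index is None:               # first store via this name
            index = len(self.index_of)
            self.index_of[name] = index
            if index >= len(self.slots):
                self.slots.extend([None] * self.GROW_BY)
        return index

    def store(self, name, value):
        self.slots[self.slot_for(name)] = value

    def load(self, index):              # the cheap, index-based access
        return self.slots[index]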

Btw, I assume that the index-based access will use some special interface,
and not get mixed into __getitem__ depending on the key's type. That would
be unclean, could mask some bugs, and probably also cost performance.

> dlict[key] = yyy (regardless of key type) is probably more expensive than
> dict[string] = yyy is today.

Apart from the first store with a new key, when the indexed slot needs to be
allocated, there shouldn't be any noticeable overhead. A dlict-entry would be
managed essentially like a dict-entry today, but instead of containing the
value directly, it would contain the index (stored as immediate C long int)
into the slot vector.

> > · Using the special type for module dicts only would cause
> > 'exec code in globals,locals' to fail in case code uses the
> > special <load global via slot> opcode.
>
> I expect that code objects would need to hold on to their name->slot
> mapping, and that it would be exec's burden to temporarily embed a raw dict
> in a dlict

How important (wrt. performance) are such "ordinary dicts as global scope"?
Unless this stuff is already critical in many places, I'd rather leave that
alone, by building a fallback into the optimized opcodes. That is, every
index-based lookup can fall back to a string-based lookup if it doesn't get
its dlict as the respective name space. This check should be fast, and since
it would only hurt performance a bit on failure, and then the full dict-based
lookup is already much slower than the index-based one, I think that's ok.
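
As a sketch, such a fallback could look roughly like this (the attribute test merely stands in for whatever cheap type check the real opcode would use):

def load_global_fast(globals_ns, slot, name):
    slots = getattr(globals_ns, "slots", None)
    if slots is not None:               # optimized namespace: fast path
        return slots[slot]
    return globals_ns[name]             # plain dict from 'exec': slow path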

For new code which wants our optimization (e.g. for creating a sandbox) for
"exec", there may be a built-in function returning a new, empty dlict, which
the code can then populate as it likes. I don't think there's much sense in
hijacking ordinary dictionaries (or even worse, arbitrary mapping objects)
for this optimization.

> I must say I'm uncomfortable basing a scheme on the compiler *guessing*
> which globals are "const", though.

Agreed. I think Python should bite the sour grape and add some _extensible_
declaration mechanism, for which constant-declarations would be one usage,
and which could be used e.g. for type annotations once someone wants to try
that (whether to optimize operations on built-in types or for some (possibly
soft) type-checking). However, this would be an independent optimization,
and improving not-known-to-be-constant global/built-in variables is a worthy
goal by itself, since it can be non-intrusive for existing code and still get
the benefits.

Marc Wachowitz

unread,
Jun 22, 1998, 3:00:00 AM6/22/98
to

"Tim Peters" <tim...@email.msn.com> writes:
> That said, a specific example would make your point much clearer to me.
> exec would, of course, need to initialize the "int half" of the dlict to be
> consistent with the name->slot mapping carried by the code object,

As mentioned in another article; I think we shouldn't really try very hard
to optimize this case, too, and until someone shows good reasons to try it,
I'll rather not think much about it (this isn't meant to say you shouldn't
do so if you like to ;-), or that I'd think there's some unsolvable problem
in this area - I just think this special case isn't worth the complication).

> My hope is that a module-level global vector would render the
> per-function-call global id cache superfluous.

Yes, I think (and hope) so. The code would simply look up (possibly generate,
the first time a variable is seen) and remember the index for "its
own" global module, _together_ with the variable name, and if it's executed
in the proper module, use the remembered index, or do the "traditional" name-
based lookup otherwise.

> Violating abstraction in one opcode doesn't bother me.

IMO, the primary motivation in this thread is the internal optimization; if
that results in something which can also be used by Python source code, that
will be a welcome bonus, but it shouldn't be required. After all, if Python
code wants a dictionary and a list with related contents, it can quite easily
get that via a more general wrapper class which coordinates both.

> One other thing *someone* <wink> should think about: if nested scopes were
> added to Python, what happens to all these schemes?

I assume you're thinking of nested first-class scopes, not nested local-like
scopes? I think the optimization would still work, by storing some kind of
"forwarding-pointer" in the intermediate layers (where a name wasn't found).
Optimized dynamically nested lookups would probably be somewhat slower than
non-nested optimized lookups (and non-optimized nested lookups would be much
slower than non-optimized non-nested lookups), but I'm optimistic that we
could find acceptable solutions for the performance - assuming that someone
can find acceptable solutions for the semantics and reasons for the effort to
add such a feature to Python in the first place, which may be less likely ;-)
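
For what it's worth, a toy rendering of the forwarding-pointer idea (invalidation -- what should happen when an intermediate scope later binds the name -- is deliberately ignored here, and would be the hard part):

class Scope:
    def __init__(self, parent=None):
        self.bindings = {}
        self.parent = parent
        self.forward = {}               # name -> scope that owns it

    def lookup(self, name):
        if name in self.bindings:
            return self.bindings[name]
        owner = self.forward.get(name)
        if owner is None:
            # Slow path: walk the chain once, then remember the owner.
            owner = self.parent
            while owner is not None and name not in owner.bindings:
                owner = owner.parent
            if owner is None:
                raise NameError(name)
            self.forward[name] = owner  # the forwarding pointer
        return owner.bindings[name]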

Marc Wachowitz

unread,
Jun 22, 1998, 3:00:00 AM6/22/98
to

I wrote:
> [...] The code would simply look up (possibly generate,
> the first time a variable is seen) and remember the index for "its
> own" global module, _together_ with the variable name, and if it's executed
> in the proper module, use the remembered index, or do the "traditional" name-
> based lookup otherwise.

Just in case it wasn't clear: The above is supposed to happen at code
generation time or code load time.

Guido van Rossum

unread,
Jun 22, 1998, 3:00:00 AM6/22/98
to

(This thread is long overdue a renaming :-)

I'm fascinated by the proposed solutions. Some of them will probably
make it into Python 2.0. But right now, without a Python Consortium in
place, I shudder at the thought of making this into a 3-day project,
knowing that Tim predicts it will in fact take 23 days.

The only thing that doesn't require me to completely rethink the
structure of the code generator (which currently carries no knowledge
between functions in the same module, nor from the functions to the
module itself) would be the per-function-call caching scheme, which
can be implemented with one new opcode and some changes in the VM.

Here's yet another proposal to subtly change (Tim would say "subvert")
the semantics so as to enable this optimization: any global that isn't
listed in a 'global' statement is already read-only from the
function's point of view; it's only a simple step to assume that it
won't change at all during the call. The only place where this may go
wrong is cases like this:

option = 0 # default

def main():
    ...use option...
    parse_command_line()
    ...use option...

def parse_command_line():
    global option
    ...may assign a value to option...

I wouldn't dare enable this by default in 1.5.2, but I could see
having an interpreter flag that enabled it. It would be a 2-hour
project.
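
For completeness, here is the pitfall as a runnable toy simulation -- the proposed caching is modelled with an ordinary local variable, since the real change would live in the interpreter:

option = 0  # default

def parse_command_line():
    global option
    option = 1                  # pretend a command-line flag was seen

def main():
    cached = option             # models "option won't change during this call"
    parse_command_line()
    assert option == 1          # the module-level binding did change...
    assert cached == 0          # ...but the cached read never sees it

main()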

M.-A. Lemburg

unread,
Jun 22, 1998, 3:00:00 AM6/22/98
to

Guido van Rossum wrote:
>
> I'm fascinated by the proposed solutions. Some of them will probably
> make it into Python 2.0. But right now, without a Python Consortium in
> place, I shudder at the thought of making this into a 3-day project,
> knowing that Tim predicts it will in fact take 23 days.

Are there any plans for such a consortium ?

Aside:

IMHO, a consortium is not going to change much about the current
situation. It is not so much the coding effort that poses a problem
-- there are lots of volunteers out there that would gladly help.
I think the main issue is quality control and maybe getting a
multi-threaded version of the underlying GvR-engine ;-).

We would need a team of people that are willing to test new ideas
(even if they don't get accepted later on because of some showstopper
that wasn't realized beforehand), provide the necessary patches
and some scheme for submitting these ideas for inclusion in the next
Python version.

Back to the subject:

> The only thing that doesn't require me to completely rethink the
> structure of the code generator (which currently carries no knowledge
> between functions in the same module, nor from the functions to the
> module itself) would be the per-function-call caching scheme, which
> can be implemented with one new opcode and some changes in the VM.
>
> Here's yet another proposal to subtly change (Tim would say "subvert")
> the semantics so as to enable this optimization: any global that isn't
> listed in a 'global' statement is already read-only from the
> function's point of view; it's only a simple step to assume that it
> won't change at all during the call. The only place where this may go
> wrong is cases like this:
>
> option = 0 # default
>
> def main():
> ...use option...
> parse_command_line()
> ...use option...
>
> def parse_command_line():
> global option
> ...may assign a value to option...
>
> I wouldn't dare enable this by default in 1.5.2, but I could see
> having an interpreter flag that enabled it. It would be a 2-hour
> project.

That would provide a nice test-bed for future extensions in the
direction of what we've discussed on this thread. The only fear
I have is that introducing a quick hack now could effectively
prevent further improvements -- unless the cache technique
is kept well hidden in the C level implementation or simply
left undocumented w/r to details.

--
Marc-Andre Lemburg Y2000: 557 days left

Guido van Rossum

unread,
Jun 22, 1998, 3:00:00 AM6/22/98
to

[I mentioned the Python Consortium in an aside.]

> Are there any plans for such a consortium ?

Yes. It was discussed during the panel session on the last
conference. We were hoping to have a Python consortium, modeled after
the X11 and W3C consortia, created by the next conference, with CNRI
as the host (like MIT hosted the X11 consortium). I'm not sure that
this schedule is still realistic, since we haven't made as much
progress as we hoped. I'm optimistic that we will get *something*
going though. Details have to be vague because prospective consortium
members don't want to be embarrassed by premature publicity, so please
don't ask about who will join. (On the other hand, if you are
interested in participating as a consortium member, please get in
touch with me! Understand that we expect a serious financial
commitment, of, say, at least an order of magnitude more than a
corporate PSA membership, for a number of years.)

> Aside:
>
> IMHO, a consortium is not going to change much about the current
> situation. It is not so much the coding effort that poses a problem
> -- there are lots of volunteers out there that would gladly help.
> I think the main issue is quality control and maybe getting a
> multi-threaded version of the underlying GvR-engine ;-).

Actually, since I'm a single-CPU device, I would be more productive if
I had fewer threads competing for me. For example, the threads that
require me to do non-Python related research for CNRI (no matter how
interesting!) severely reduce the time I can devote to coding (and,
more importantly, designing the future architecture) for Python.

> We would need a team of people that are willing to test new ideas
> (even if they don't get accepted later on because of some
> showstopper that wasn't realized beforehand), provide the necessary
> patches and some scheme for submitting these ideas for inclusion in
> the next Python version.

A consortium could do a lot of good work in integrating those patches,
and coordinating all those independent efforts. Look at all the
different working groups at the W3C!

Guido van Rossum

unread,
Jun 22, 1998, 3:00:00 AM6/22/98
to

[me]

> > The only thing that doesn't require me to completely rethink the
> > structure of the code generator (which currently carries no knowledge
> > between functions in the same module, nor from the functions to the
> > module itself) would be the per-function-call caching scheme, which
> > can be implemented with one new opcode and some changes in the VM.
> >
> > Here's yet another proposal to subtly change (Tim would say "subvert")
> > the semantics so as to enable this optimization: any global that isn't
> > listed in a 'global' statement is already read-only from the
> > function's point of view; it's only a simple step to assume that it
> > won't change at all during the call. The only place where this may go
> > wrong is cases like this:
> >
> > option = 0 # default
> >
> > def main():
> >     ...use option...
> >     parse_command_line()
> >     ...use option...
> >
> > def parse_command_line():
> >     global option
> >     ...may assign a value to option...
> >
> > I wouldn't dare enable this by default in 1.5.2, but I could see
> > having an interpreter flag that enabled it. It would be a 2-hour
> > project.

[Marc-Andre]


> That would provide a nice test-bed for future extensions in the
> direction of what we've discussed on this thread. The only fear
> I have is that introducing a quick hack now could effectively
> prevent further improvements -- unless the cache technique
> is kept well hidden in the C level implementation or simply
> left undocumented w/r to details.

Hm... What I had in mind doesn't help much at all for the more
elaborate approaches. I've actually spent the last 30 minutes
implementing it, and found that it even has a nice side effect.
Instead of adding a new opcode, I changed the existing LOAD_FAST
opcode to look for a global or built-in variable and cache it as a
local. This added about 12 lines of code to ceval.c. Then I changed
optimize() in compile.c to create a local variable for every global
that was only read -- another 5 lines. This is all it takes.

As a side effect, the changes to ceval.c also "fixes" the following
problem (reported on this list a few times):

x = 1

def f():
    print x
    ...100 lines of unrelated code...
    if something:
        x = 2
    ...code using x...

This currently breaks on the 'print x' because x is a local; after the
change, it does what the programmer expected.

An unanticipated problem with the total set of changes is that it
affects locals(). But I still wonder if locals() really needs to
exist: most postings that reference it seem to focus on some
misunderstanding about it! It is really a remnant of a more dynamic
language than Python needs to be.

While I'm writing this, I'm timing my code. Unfortunately, it seems
to slow down rather than speed up! E.g., pystone goes down from 2257
to 1996 for this particular machine. This may have to do with the
larger volume of memory set aside for locals, without any benefit in
functions that only look up said globals once (or worse, less than
once on average). Or it may have to do with the dreaded cache lines
-- where random code rearrangements may change the performance up or
down by 10%!

I'll include my patches here:

Index: compile.c
===================================================================
RCS file: /projects/cvsroot/python/dist/src/Python/compile.c,v
retrieving revision 2.90
diff -c -r2.90 compile.c
*** compile.c 1998/05/14 02:16:20 2.90
--- compile.c 1998/06/22 18:53:36
***************
*** 3318,3323 ****
--- 3318,3328 ----
case EXEC_STMT:
c->c_flags &= ~CO_OPTIMIZED;
break;
+ case LOAD_NAME:
+ name = GETNAMEOBJ(oparg);
+ if (PyDict_GetItem(c->c_globals, name) == NULL)
+ com_addlocal_o(c, GETNAMEOBJ(oparg));
+ break;
}
}


Index: ceval.c
===================================================================
RCS file: /projects/cvsroot/python/dist/src/Python/ceval.c,v
retrieving revision 2.148
diff -c -r2.148 ceval.c
*** ceval.c 1998/05/22 00:52:29 2.148
--- ceval.c 1998/06/22 20:09:52
***************
*** 1316,1330 ****

case LOAD_FAST:
x = GETLOCAL(oparg);
! if (x == NULL) {
! PyErr_SetObject(PyExc_NameError,
! PyTuple_GetItem(co->co_varnames,
! oparg));
! break;
}
! Py_INCREF(x);
! PUSH(x);
! if (x != NULL) continue;
break;

case STORE_FAST:
--- 1316,1338 ----

case LOAD_FAST:
x = GETLOCAL(oparg);
! if (x != NULL) {
! Py_INCREF(x);
! PUSH(x);
! continue;
}
! w = PyTuple_GetItem(co->co_varnames, oparg);
! x = PyDict_GetItem(f->f_globals, w);
! if (x == NULL)
! x = PyDict_GetItem(f->f_builtins, w);
! if (x != NULL) {
! Py_INCREF(x);
! SETLOCAL(oparg, x);
! Py_INCREF(x);
! PUSH(x);
! continue;
! }
! PyErr_SetObject(PyExc_NameError, w);
break;

case STORE_FAST:

M.-A. Lemburg

unread,
Jun 22, 1998, 3:00:00 AM6/22/98
to

Guido van Rossum wrote:
>
> [I mentioned the Python Consortium in an aside.]
>
> > Are there any plans for such a consortium ?
>
> Yes. It was discussed during the panel session on the last
> conference. We were hoping to have a Python consortium, modeled after
> the X11 and W3C consortia, created by the next conference, with CNRI
> as the host (like MIT hosted the X11 consortium). I'm not sure that
> this schedule is still realistic, since we haven't made as much
> progress as we hoped. I'm optimistic that we will get *something*
> going though. Details have to be vague because prospective consortium
> members don't want to be embarrassed by premature publicity, so please
> don't ask about who will join. (On the other hand, if you are
> interested in participating as a consortium member, please get in
> touch with me! Understand that we expect a serious financial
> commitment, of, say, at least an order of magnitude more than a
> corporate PSA membership, for a number of years.)

This is what I don't like about consortiums: to get in you'll
have to pay lots of money which at least some of the interested
people don't have. And of course paying members will always have more
influence than non-paying ones.

Instead of making money rule the development, I'd rather like to
see some more-or-less democratic way of deciding what goes in and what
not (with you as sole veto-owner). Linux is being managed that way,
so why not do something similar ?

> > IMHO, a consortium is not going to change much about the current
> > situation. It is not so much the coding effort that poses a problem
> > -- there are lots of volunteers out there that would gladly help.
> > I think the main issue is quality control and maybe getting a
> > multi-threaded version of the underlying GvR-engine ;-).
>
> Actually, since I'm a single-CPU device, I would be more productive if
> I had fewer threads competing for me. For example, the threads that
> require me to do non-Python related research for CNRI (no matter how
> interesting!) severely reduce the time I can devote to coding (and,
> more importantly, designing the future architecture) for Python.

What would you think of a python-dev list that takes at least
some of the burdens of integrating patches, testing, etc. off of your
back ? Then you'd have more time for design, etc.

> > We would need a team of people that are willing to test new ideas
> > (even if they don't get accepted later on because of some
> > showstopper that wasn't realized beforehand), provide the necessary
> > patches and some scheme for submitting these ideas for inclusion in
> > the next Python version.
>
> A consortium could do a lot of good work in integrating those patches,
> and coordinating all those independent efforts. Look at all the
> different working groups at the W3C!

Sure, but rather than expecting serious financial commitment, a
serious work commitment would probably result in much better results.

M.-A. Lemburg

unread,
Jun 22, 1998, 3:00:00 AM6/22/98
to

Guido van Rossum wrote:
> GvR: [new idea for optimizing global lookups]
> me: [could prevent further improvements]

>
> Hm... What I had in mind doesn't help much at all for the more
> elaborate approaches.

I meant the introduction of new attributes to code objects, but
you didn't do that, so it's not an issue.

> I've actually spent the last 30 minutes
> implementing it, and found that it even has a nice side effect.
> Instead of adding a new opcode, I changed the existing LOAD_FAST
> opcode to look for a global or built-in variable and cache it as a
> local. This added about 12 lines of code to ceval.c. Then I changed
> optimize() in compile.c to create a local variable for every global
> that was only read -- another 5 lines. This is all it takes.

LOAD_FAST is very touchy about being changed. Even small changes
in size can result in rather drastic slow-downs due to processor
caching strategies, compilers optimizing for wrong targets, etc...

Here is a statistic that gives a rough impression of the importance
of LOAD_FAST:

read 53520000 opcodes
Opcode frequencies:
------------------------------------------------------------------------
LOAD_FAST (124) : 15051586 ================================
LOAD_CONST (100) : 8812325 ==================
STORE_FAST (125) : 7884359 ================
LOAD_ATTR (105) : 3350236 =======
STORE_ATTR ( 95) : 2890260 ======
CALL_FUNCTION (131) : 1899818 ====
POP_TOP ( 1) : 1700232 ===
STORE_SUBSCR ( 60) : 1128940 ==
LOAD_NAME (101) : 851747 =
RETURN_VALUE ( 83) : 797362 =

> As a side effect, the changes to ceval.c also "fixes" the following
> problem (reported on this list a few times):
>
> x = 1
>
> def f():
>     print x
>     ...100 lines of unrelated code...
>     if something:
>         x = 2
>     ...code using x...
>
> This currently breaks on the 'print x' because x is a local; after the
> change, it does what the programmer expected.

Nice ;-)

> An unanticipated problem with the total set of changes is that it
> affects locals(). But I still wonder if locals() really needs to
> exist: most postings that reference it seem to focus on some
> misunderstanding about it! It is really a remnant of a more dynamic
> language than Python needs to be.

In what way does it affect locals() ? The cached entries don't
show up -- but that's a pleasant side-effect: people can't rely
on them being there...

> While I'm writing this, I'm timing my code. Unfortunately, it seems
> to slow down rather than speed up! E.g., pystone goes down from 2257
> to 1996 for this particular machine. This may have to do with the
> larger volume of memory set aside for locals, without any benefit in
> functions that only look up said globals once (or worse, less than
> once on average). Or it may have to do with the dreaded cache lines
> -- where random code rearrangements may change the performance up or
> down by 10%!

In my patched 1.5 version I isolated the LOAD_FAST opcode from
the switch statement. It's worth a try.

Greg Ewing

unread,
Jun 23, 1998, 3:00:00 AM6/23/98
to

Optimising access to globals in the current module
is well and good, but what about when the global
is in another module?

Or is this case not considered worth optimising?

Seems to me it could be quite important if you
like to use functions from other modules a lot
without importing their names into your own module.

--
Greg Ewing, Computer Science Dept, | The address below is not spam-
University of Canterbury, | protected, so as not to waste
Christchurch, New Zealand | the time of Guido van Rossum.
gr...@cosc.canterbury.ac.nz

Guido van Rossum

unread,
Jun 23, 1998, 3:00:00 AM6/23/98
to

> Optimising access to globals in the current module
> is well and good, but what about when the global
> is in another module?
>
> Or is this case not considered worth optimising?
>
> Seems to me it could be quite important if you
> like to use functions from other modules a lot
> without importing their names into your own module.

Or how about optimizing instance attribute access...

One approach with a much wider applicability would be to have a
super-optimized dictionary subclass that only takes string arguments.
(JPython uses this.)

Just van Rossum

unread,
Jun 23, 1998, 3:00:00 AM6/23/98
to

At 4:58 PM -0400 6/22/98, Guido van Rossum wrote:
>An unanticipated problem with the total set of changes is that it
>affects locals().

Would that mean that frameobject.f_locals is also affected?
If yes: I consider that attribute essential for proper debugging (for the
DebuggingProgram, not the DebuggingUser obviously), and I fear that if
f_locals also contains some globals as well it will confuse anyone who is
not entirely familiar with Python's guts. I mean, I have wonderful "locals"
and "globals" object browser widgets in my debugger, and I would hate to
see these things "break" due to some obscure optimization...

>But I still wonder if locals() really needs to
>exist: most postings that reference it seem to focus on some
>misunderstanding about it! It is really a remnant of a more dynamic
>language than Python needs to be.

I don't care much about locals(), but frameobject.f_locals I can't do
without...

Just

Andrew Kuchling

unread,
Jun 23, 1998, 3:00:00 AM6/23/98
to

M.-A. Lemburg writes:
>Instead of making money rule the development, I'd rather like to
>see some more-or-less democratic way of deciding what goes in and what
>not (with you as sole veto-owner). Linux is being managed that way,
>so why not do something similar ?

Guiding development is not the problem that a consortium
attempts to solve; that's fairly easily done by newsgroup and mailing
list chatter. The problem is actually implementing the new features.
Python maintenance takes a fair amount of GvR's time, and CNRI isn't
willing to just pay Guido's salary and then just let him mess around
on whatever he likes. (And, I think I speak for all Guido's
co-workers when I say that, if CNRI *were* willing to do that, we'd
have to kill that lucky @#$%^. So it's really for the best...)

There are already outstanding problems which seem to be
important, but no one really has the time to work on them. (I'm
thinking of Unicode here, but I'm sure you can come up with your own
list. Try it; it's fun!) As in most free software projects,
contributing running code will probably continue to be a very
effective low-cost way of affecting Python's development.

--
A.M. Kuchling http://starship.skyport.net/crew/amk/
Thou hast made the Furies cry, Orpheus. They will never forgive you for that.
-- Queen Persephone, in SANDMAN: "The Song of Orpheus"


Fred L. Drake

unread,
Jun 23, 1998, 3:00:00 AM6/23/98
to

Andrew Kuchling writes:
> on whatever he likes. (And, I think I speak for all Guido's
> co-workers when I say that, if CNRI *were* willing to do that, we'd
> have to kill that lucky @#$%^. So it's really for the best...)


I think we could arrange for that long-awaited bus...


;-)


-Fred

--
Fred L. Drake, Jr. <fdr...@acm.org>
Corporation for National Research Initiatives
1895 Preston White Dr. Reston, VA 20191

Guido van Rossum

unread,
Jun 23, 1998, 3:00:00 AM6/23/98
to

[me]

> >An unanticipated problem with the total set of changes is that it
> >affects locals().

[my brother]


> Would that mean that frameobject.f_locals is also affected?
> If yes: I consider that attribute essential for proper debugging (for the
> DebuggingProgram, not the DebuggingUser obviously), and I fear that if
> f_locals also contains some globals as well it will confuse anyone who is
> not entirely familiar with Python's guts. I mean, I have wonderful "locals"
> and "globals" object browser widgets in my debugger, and I would hate to
> see these things "break" due to some obscure optimization...
>
> >But I still wonder if locals() really needs to
> >exist: most postings that reference it seem to focus on some
> >misunderstanding about it! It is really a remnant of a more dynamic
> >language than Python needs to be.
>
> I don't care much about locals(), but frameobject.f_locals I can't do
> without...

Yes, it would also affect f_locals. However, in a real implementation
it would be easy to make some kind of distinction for the debugger
between "real" locals and cached globals, e.g. by adding a field to
the code object that gives how many locals there really are (the
'co_varnames' gives the local variable names first and the cached
globals second).

I would think that the proposed semantic change requires the debugger
to change anyway, because it can cause occasional confusion. It would
be helpful if the debugger could inform the user that a particular
local variable shadows a global...
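
A sketch of the debugger side, assuming such a count were added (the n_real argument would come from the hypothetical new code-object field; it does not exist today):

def split_locals(frame, n_real):
    """Split f_locals into true locals and cached globals, relying on
    co_varnames listing the real local names first."""
    names = frame.f_code.co_varnames
    f_locals = frame.f_locals
    real, cached = {}, {}
    for i, name in enumerate(names):
        if name in f_locals:
            (real if i < n_real else cached)[name] = f_locals[name]
    return real, cached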

Guido van Rossum

unread,
Jun 23, 1998, 3:00:00 AM6/23/98
to

[me]

> > Hm... What I had in mind doesn't help much at all for the more
> > elaborate approaches.

[Marc-Andre]


> I meant the introduction of new attributes to code objects, but
> you didn't do that, so it's not an issue.

It's about time that someone started work on a framework for a real
Python bytecode optimizer... This shouldn't be particularly hard, and
might be *very* fruitful. Search Deja News for old posts by Tim
Peters for suggestions on how to proceed. (E.g. make many simple
optimizer passes that each do one simple, obviously correct
transformation, then run all passes repeatedly until nothing changes.
A better data structure to hold the intermediate byte code would be an
array of fixed-size structures each representing one instruction,
rather than a string of actual bytecodes; this would simplify
traversing by the optimizer passes enormously.)
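
A bare-bones sketch of that framework -- the instruction record and the single sample pass are purely illustrative (dropping NOPs is just the simplest obviously-correct transformation):

class Instr:
    """Fixed-size instruction record -- much easier for passes to
    traverse than a raw bytecode string."""
    def __init__(self, op, arg=None):
        self.op = op
        self.arg = arg

def drop_nops(code):
    out = [ins for ins in code if ins.op != "NOP"]
    return out, len(out) != len(code)   # (new code, anything changed?)

def run_passes(code, passes):
    changed = True
    while changed:                      # repeat until nothing changes
        changed = False
        for one_pass in passes:
            code, did_change = one_pass(code)
            changed = changed or did_change
    return code

code = [Instr("LOAD_FAST", 0), Instr("NOP"), Instr("RETURN_VALUE")]
code = run_passes(code, [drop_nops])
assert [i.op for i in code] == ["LOAD_FAST", "RETURN_VALUE"]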

> > I've actually spent the last 30 minutes
> > implementing it, and found that it even has a nice side effect.
> > Instead of adding a new opcode, I changed the existing LOAD_FAST
> > opcode to look for a global or built-in variable and cache it as a
> > local. This added about 12 lines of code to ceval.c. Then I changed
> > optimize() in compile.c to create a local variable for every global
> > that was only read -- another 5 lines. This is all it takes.
>

> LOAD_FAST is very touchy about being changed. Even small changes
> in size can result in rather drastic slow-downs due to processor
> caching strategies, compilers optimizing for wrong targets, etc...

[and later]


> In my patched 1.5 version I isolated the LOAD_FAST opcode from
> the switch statement. It's worth a try.

Hm... Perhaps that happened to help in your case because it caused
the resulting code to hit the cache lines in a different pattern. For
me, today, it made the pystone rating go *down* 10%. I think I'm not
going to trust performance hacks that claim to change the pystone
rating by up to 10% any more...

> > An unanticipated problem with the total set of changes is that it
> > affects locals(). But I still wonder if locals() really needs to
> > exist: most postings that reference it seem to focus on some
> > misunderstanding about it! It is really a remnant of a more dynamic
> > language than Python needs to be.
>

> In what way does it affect locals() ? The cached entries don't
> show up -- but that's a pleasent side-effect: people can't rely
> on them being there...

Hm? The cached entries *do* show up (for me, anyway).

Guido van Rossum

unread,
Jun 23, 1998, 3:00:00 AM6/23/98
to

[Marc-Andre]

> This is what I don't like about consortiums: to get in you'll
> have to pay lots of money which at least some of the interested
> people don't have. And of course paying members will always have more
> influence than non-paying ones.
>
> Instead of making money rule the development, I'd rather like to
> see some more-or-less democratic way of deciding what goes in and what
> not (with you as sole veto-owner). Linux is being managed that way,
> so why not do something similar ?

And who will pay my salary? If I was paid by a consortium, I would be
able to do my job as "benevolent dictator" much better than now (with
other non-Python, CNRI-related responsibilities). We'd have to offer
the consortium members something tangible for their money, such as
early access to new code developed by me, but at least it would be
Python related, and eventually given back to the public (e.g. with a 6
month delay). Most Python code I write for CNRI is never released.

> What would you think of a python-dev list that takes at least
> some of the burdens of integrating patches, testing, etc. off of your
> back ? Then you'd have more time for design, etc.

I don't expect that the formation of a python-dev list would make my life
easier -- there are plenty of people making contributions now, and we
don't seem to have a communication problem. Testing is done by alpha
and beta testing anyway. Public discussion of proposals happens on
the newsgroup and in the SIGs. On the other hand, I'm reluctant to
give too much control away over what should go into the language --
many current users trust my judgement and appreciate my desire to keep
the language (relatively) small and simple.

> > A consortium could do a lot of good work in integrating those patches,
> > and coordinating all those independent efforts. Look at all the
> > different working groups at the W3C!
>
> Sure, but rather than expecting serious financial commitment, a
> serious work commitment would probably result in much better results.

Maybe, but a financial commitment is harder to back out of. There are
contracts and official signatures, and lawyers. If an individual
"commits" to write a module to do X, what recourse do I have when they
change their mind? I'd rather receive funding so I can *pay* someone
to write that module. All within the open source model, of course.
(See http://www.opensource.org/.)

M.-A. Lemburg

unread,
Jun 23, 1998, 3:00:00 AM6/23/98
to

Just van Rossum wrote:
>
> At 4:58 PM -0400 6/22/98, Guido van Rossum wrote:
> >An unanticipated problem with the total set of changes is that it
> >affects locals().
>
> Would that mean that frameobject.f_locals is also affected?
> If yes: I consider that attribute essential for proper debugging (for the
> DebuggingProgram, not the DebuggingUser obviously), and I fear that if
> f_locals also contains some globals as well it will confuse anyone who is
> not entirely familiar with Python's guts. I mean, I have wonderful "locals"
> and "globals" object browser widgets in my debugger, and I would hate to
> see these things "break" due to some obscure optimization...

The answer is yes, but I wouldn't really talk about breaking anything
(see below)...

import sys

# A global attribute
opt = 1

def print_opt():
    print 'locals():',locals()
    try:
        1/0
    except:
        frame = sys.exc_info()[2].tb_frame
        print 'f_locals:',frame.f_locals
        print 'f_globals:',frame.f_globals
        del frame

    lopt = opt + 1
    print
    print 'opt was used...'
    print

    print 'locals():',locals()
    try:
        1/0
    except:
        frame = sys.exc_info()[2].tb_frame
        print 'f_locals:',frame.f_locals
        print 'f_globals:',frame.f_globals
        del frame

print_opt()

This prints:

"""
patched/Python-1.5.1+globalopt> ./python -i testglobopt.py
locals(): {'locals': <built-in function locals>}
f_locals: {'locals': <built-in function locals>, 'frame': <frame object
at 80cd588>, 'sys': <module 'sys'>}
f_globals: {'sys': <module 'sys'>, '__doc__': None, 'opt': 1,
'print_opt': <function print_opt at 80cd7f0>, '__name__': '__main__',
'__builtins__': <module '__builtin__'>}

opt was used...

locals(): {'locals': <built-in function locals>, 'lopt': 2, 'opt': 1,
'sys': <module 'sys'>}
f_locals: {'locals': <built-in function locals>, 'lopt': 2, 'frame':
<frame object at 80cd588>, 'opt': 1, 'sys': <module 'sys'>}
f_globals: {'sys': <module 'sys'>, '__doc__': None, 'opt': 1,
'print_opt': <function print_opt at 80cd7f0>, '__name__': '__main__',
'__builtins__': <module '__builtin__'>}
"""

opt is found in the globals *and* the locals. But I wouldn't worry
about it too much, since the semantics don't change: the cached
globals are true locals in all aspects -- the only difference is
that they are not explicitly being assigned to.

Note that the patch produces byte code that will fail with the
original 1.5.1 interpreter. A change of the byte code magic is
also necessary.

As for Guido's measurement: I get these results (still well within
the 10% grey zone, but pointing in the right direction :)...

Original 1.5.1 with -O:

Pystone(1.1) time for 10000 passes = 2.98
This machine benchmarks at 3355.7 pystones/second

Patched 1.5.1 with -O:

Pystone(1.1) time for 10000 passes = 2.79
This machine benchmarks at 3584.23 pystones/second

I'll add some special tests to my benchmark suite and see what the
overall result looks like.

Here is the whole patch with modified PYC-file magic (the one Guido
posted didn't work for 1.5.1, at least not for me):

--- Python/orig/compile.c Tue Jun 23 15:08:19 1998
+++ Python/compile.c Tue Jun 23 14:39:14 1998
@@ -3313,10 +3313,15 @@ optimize(c)
case STORE_NAME:
case DELETE_NAME:
case IMPORT_FROM:
com_addlocal_o(c, GETNAMEOBJ(oparg));
break;
+ case LOAD_NAME:
+ name = GETNAMEOBJ(oparg);
+ if (PyDict_GetItem(c->c_globals, name) == NULL)
+ com_addlocal_o(c, GETNAMEOBJ(oparg));
+ break;
case EXEC_STMT:
c->c_flags &= ~CO_OPTIMIZED;
break;
}
}

--- Python/orig/ceval.c Tue Jun 23 15:08:19 1998
+++ Python/ceval.c Tue Jun 23 14:40:25 1998
@@ -1318,20 +1318,28 @@ eval_code2(co, globals, locals,
PUSH(x);
break;

case LOAD_FAST:
x = GETLOCAL(oparg);

- if (x == NULL) {
- PyErr_SetObject(PyExc_NameError,
- PyTuple_GetItem(co->co_varnames,
- oparg));
- break;
- }
- Py_INCREF(x);
- PUSH(x);
- if (x != NULL) continue;
- break;
+ if (x != NULL) {
+ Py_INCREF(x);
+ PUSH(x);
+ continue;
+ }
+ w = PyTuple_GetItem(co->co_varnames, oparg);
+ x = PyDict_GetItem(f->f_globals, w);
+ if (x == NULL)
+ x = PyDict_GetItem(f->f_builtins, w);
+ if (x != NULL) {
+ Py_INCREF(x);
+ SETLOCAL(oparg, x);
+ Py_INCREF(x);
+ PUSH(x);
+ continue;
+ }
+ PyErr_SetObject(PyExc_NameError, w);
+ break;

case STORE_FAST:
v = POP();
SETLOCAL(oparg, v);
continue;
--- Python/orig/import.c Tue Jun 23 15:08:19 1998
+++ Python/import.c Tue Jun 23 15:07:05 1998
@@ -78,11 +78,11 @@ extern long PyOS_GetLastModificationTime
a .pyc file in text mode the magic number will be wrong; also, the
Apple MPW compiler swaps their values, botching string constants */
/* XXX Perhaps the magic number should be frozen and a version field
added to the .pyc file header? */
/* New way to come up with the magic number: (YEAR-1995), MONTH, DAY */
-#define MAGIC (20121 | ((long)'\r'<<16) | ((long)'\n'<<24))
+#define MAGIC (30723 | ((long)'\r'<<16) | ((long)'\n'<<24))

/* See _PyImport_FixupExtension() below */
static PyObject *extensions = NULL;

/* This table is defined in config.c: */


--
Marc-Andre Lemburg Y2000: 556 days left

M.-A. Lemburg

unread,
Jun 23, 1998, 3:00:00 AM6/23/98
to

Andrew Kuchling wrote:

>
> M.-A. Lemburg writes:
> >Instead of making money rule the development, I'd rather like to
> >see some more-or-less democratic way of deciding what goes in and what
> >not (with you as sole veto-owner). Linux is being managed that way,
> >so why not do something similar ?
>
> Guiding development is not the problem that a consortium
> attempts to solve; that's fairly easily done by newsgroup and mailing
> list chatter. The problem is actually implementing the new features.

But that's exactly what I'm aiming at: let people contribute code
and ideas. If a company wants Unicode, let *them* contract a programmer
to have it implemented.

> Python maintenance takes a fair amount of GvR's time, and CNRI isn't
> willing to just pay Guido's salary and then just let him mess around
> on whatever he likes. (And, I think I speak for all Guido's
> co-workers when I say that, if CNRI *were* willing to do that, we'd
> have to kill that lucky @#$%^. So it's really for the best...)
>

> There are already outstanding problems which seem to be
> important, but no one really has the time to work on them. (I'm
> thinking of Unicode here, but I'm sure you can come up with your own
> list. Try it; it's fun!)

I know ;-) ...
http://starship.skyport.net/~lemburg/CoercionProposal.html
http://starship.skyport.net/~lemburg/CallableMethodsProposal.html
http://starship.skyport.net/~lemburg/mxPython-1.5.patch.gz

The problem with these patches is that adapting them to each new
available Python (sub-)version simply takes too long. If there
were a clear way to vote for inclusion the development could be
sped up. Guido could then spend more time evaluating ideas (though I
figure he has just as much fun coding new ideas as we do ;-) and
designing Python's future. Well, my two cents, anyway.

> As in most free software projects,
> contributing running code will probably continue to be a very
> effective low-cost way of affecting Python's development.

--

M.-A. Lemburg

unread,
Jun 23, 1998, 3:00:00 AM6/23/98
to

Guido van Rossum wrote:
>
> It's about time that someone started work on a framework for a real
> Python bytecode optimizer...

Skip Montanaro has something along those lines...

> This shouldn't be particularly hard, and
> might ve *very* fruitful. Search Deja News for old posts by Tim
> Peters for suggestions on how to proceed. (E.g. make many simple
> optimizer passes that each do one simple, obviously correct
> transformation, then run all passes repeatedly until nothing changes.
> A better data structure to hold the intermediate byte code would be an
> array of fixed-size structures each representing one instruction,
> rather than a string of actual bytecodes; this would simplify
> traversing by the optimizer passes enormously.)

This is pretty much what Skip's optimizer does.

> > LOAD_FAST is very touchy about being changed. Even small changes
> > in size can result in rather drastic slow-downs due to processor
> > caching strategies, compilers optimizing for wrong targets, etc...
> [and later]
> > In my patched 1.5 version I isolated the LOAD_FAST opcode from
> > the switch statement. It's worth a try.
>
> Hm... Perhaps that happened to help in your case because it caused
> the resulting code to hit the cache lines in a different pattern. For
> me, today, it made the pystone rating go *down* 10%. I think I'm not
> going to trust performance hacks that claim to change the pystone
> rating by up to 10% any more...

I get different readings (see below) but they too lie in the 10%
range.

BTW: Even though single performance patches may well lie within the
10% grey zone, applying several of them may still lead the way into
safe water, e.g. after applying all my patches to 1.5, I get
an interpreter that clocks at around 4150 pystones compared to
3350 for the original 1.5.1 code.

> > > An unanticipated problem with the total set of changes is that it
> > > affects locals(). But I still wonder if locals() really needs to
> > > exist: most postings that reference it seem to focus on some
> > > misunderstanding about it! It is really a remnant of a more dynamic
> > > language than Python needs to be.
> >

> > In what way does it affect locals() ? The cached entries don't
> > show up -- but that's a pleasent side-effect: people can't rely
> > on them being there...
>
> Hm? The cached entries *do* show up (for me, anyway).

Well, yes, but only after they have been used... I should have
written: ...the to-be-cached entries don't show up until they
are used... my fault, sorry.

Will Ware

unread,
Jun 24, 1998, 3:00:00 AM6/24/98
to

M.-A. Lemburg (m...@lemburg.com) wrote:
: Guido van Rossum wrote:
: > > Are there any plans for such a consortium ?
: > Yes...
: This is what I don't like about consortiums: to get in you'll
: have to pay lots of money ... paying members will always have more
: influence than non-paying ones.
: [propose some sort of Linux-like semi-democratic contribution process]
: ... rather than expecting serious financial commitment, a

: serious work commitment would probably result in much better results.

In a perfect world, people could make decent livings writing free
software. But it's not at all obvious how to generate money from
this activity that could pay developers' salaries. Any experiment
that attempts to do so is a worthwhile exercise. There are a few
successful business models (O'Reilly, Red Hat, Cygnus) but none of
them are wildly insanely profitable. I think many programmers who
admire free software (myself definitely among them) would love to
see somebody get really offensively wealthy in the process of
writing/managing/distributing/whatever free software.

It's interesting (but of dubious utility) to think about scenarios
where things could go wrong. The one that jumps right to mind is
where one consortium member with deep pockets makes Guido an offer
he can't refuse to drag Python in some weirdly application-specific
direction, e.g. making it the world's most carefully-tuned microwave
oven control language.
--
Will Ware email: wware[at]world[dot]std[dot]com
PGP fp (new key 07/15/97) 67683AE2 173FE781 A0D99636 0EAE6117
God: "Sum id quod sum." Descartes: "Cogito ergo sum."
Popeye: "Sum id quod sum et id totum est quod sum."

Guido van Rossum

unread,
Jun 24, 1998, 3:00:00 AM6/24/98
to

> Skip Montanaro has something along those lines...
[...]

> This is pretty much what Skip's optimizer does.

So why isn't anyone else interested in this? Why isn't there a horde
of volunteers working on the Ultimate Python Optimizer? Because it
isn't glamorous enough? Because Python is fast enough? I'd happily
add LOAD_FAST_TWO and other interesting instructions that the
optimizers would really need, if an external optimizer came into
existence.

> BTW: Even though single performance patches may well lie within the
> 10% grey zone, applying several of them may still lead the way into
> safe water, e.g. after applying all my patches to 1.5, I get
> an interpreter that clocks at around 4150 pystones compared to
> 3350 for the original 1.5.1 code.

But how do you know that there isn't one patch that has no effect or
that has an adverse effect...?

(PS, I knew about the needed change in magic, just didn't want to
present a patch that looked too "perfect". Glad you figured it out.)

Richard Jones

Jun 24, 1998

Guido van Rossum wrote:
> > Skip Montanaro has something along those lines...
> [...]
> > This is pretty much what Skip's optimizer does.
>
> So why isn't anyone else interested in this? Why isn't there a horde
> of volunteers working on the Ultimate Python Optimizer? Because it
> isn't glamorous enough? Because Python is fast enough? I'd happily
> add LOAD_FAST_TWO and other interesting instructions that the
> optimizers would really need, if an external optimizer came into
> existence.

Speaking entirely for myself, I'm not working on the optimisation stuff
because:

a. I have too many other projects that I'm working on,
b. I can only really work on Python stuff at work [and even there
it's not officially endorsed :(], and
c. It's all black magic anyway :)

Though it does sound VERY interesting...


Richard

Guido van Rossum

Jun 24, 1998

> > Guiding development is not the problem that a consortium
> > attempts to solve; that's fairly easily done by newsgroup and mailing
> > list chatter. The problem is actually implementing the new features.
>
> But that's exactly what I'm aiming at: let people contribute code
> and ideas. If a company wants Unicode, let *them* contract a programmer
> to have it implemented.

And keep it proprietary? The consortium agreement guarantees that all
work done by the consortium will be made public. This also applies to
work done by visiting engineers for the consortium.

> I know ;-) ...
> http://starship.skyport.net/~lemburg/CoercionProposal.html
> http://starship.skyport.net/~lemburg/CallableMethodsProposal.html
> http://starship.skyport.net/~lemburg/mxPython-1.5.patch.gz
>
> The problem with these patches is that adapting them to each new
> available Python (sub-)version simply takes too long.

No, the real problem with these patches is that they are not mature
enough to be included.

> If there were a clear way to vote for inclusion the development
> could be sped up. Guido could then spend more time evaluating ideas
> (though I figure he has just as much fun coding new ideas as we do
> ;-) and designing Python's future. Well, my two cents, anyway.

Python is not a democracy. (Nor is Linux, or Perl, or Tcl, or Apache
by the way.)

Tim Peters

Jun 24, 1998

[Marc-Andre]

> BTW: Even though single performance patches may well lie within the
> 10% grey zone, applying several of them may still lead the way into
> safe water, e.g. after applying all my patches to 1.5, I get
> an interpreter that clocks at around 4150 pystones compared to
> 3350 for the original 1.5.1 code.

[Guido]


> But how do you know that there isn't one patch that has no effect or
> that has an adverse effect...?

I think the only answer to that is faith based on best-effort analysis.

About a year and a half ago I wrote up what may be the only truly useful
advice I'll ever manage to pass on. But it was in email, so millions of
optimizer wannabes were denied its rich blessing <ahem>. Here, then, is
Timmy's Lost Legacy:

<<<BEGIN SNIPPETS FROM OLD EMAIL>>>

Best general advice I have for portable low-level optimization:

Step 1: reduce the operation count!
Step 2: goto Step 1

Make that trip often enough and Good Things will happen, albeit in bursts.
Along the way, & under any given HW+SW combination, timing tests will
sometimes show no improvements or even regressions. That's just Satan
trying to tempt you off the path! Pay no attention. If you know you're
doing one less add, you are, and getting rid of one more subtract 2 weeks
later will suddenly make it appear worthwhile ...

...

Well, for potential speedups in the ~<= 5% range, I don't think it helps to
run timing tests at all, provided you're sure you're reducing the operation
count. Getting rid of a compare, or a jump, or whatever, simply can't hurt!
But it can fool the compiler du jour on the architecture du jour into
generating slightly worse code, for any number of bad reasons. But those
reasons will go away on another machine, or another release of the compiler
on the same machine, or very often mysteriously as a side-effect of making
some other trivial code change. You can't always understand this game (with
reasonable effort), but it's easy to win it over time <0.5 wink>. [ed note:
today I would remove the wink]

In fancier cases (say, interning all strings -- what's *that* all about
<wink>?!), you may have no idea whether the overall operation count will
decrease, and then timing is your only hope of getting an answer. But
timing is so frustrating when looking at small & "obvious" speedups
(especially, indeed, with those mail daemons ...), that about all you can
really get out of timing is a gross sanity check.

...

>> Anyway, an old odd truth is that if you paste together ten 1%
>> improvements, it usually does more good than finding one 10%
>> improvement.

[but sometimes a 10% improvement is so much easier to get!]

Pick the cherries, sure! I don't want you to give up the 10%s.

There are the "state lottery" and "penny saved is a penny earned" schools of
optimization: both have their place, but everyone tries to win the lottery,
few succeed, and it's the folks who develop a liking for pinching pennies
who consistently get richer year after year.

...

> [a mysterious slowdown with WITH_THREAD *dis*abled]

We should track one of these down to the bitter end just to disabuse you of
your prejudice that the world should make sense <snort>. One plausible
scenario: with the WITH_THREAD code missing, there are fewer external
calls, which in turn puts less pressure on the register set, which in turn
causes the optimizer to try to carry more things in registers, which in turn
screws up by kicking something truly valuable out of a register in order to
carry 4 new related things (but less valuable things) in registers. Or
removing the WITH_THREAD code suddenly puts the top of the loop at a less
favorable position with respect to the I cache (e.g., maybe it now maps to
the same line as sigcheck <wink -- but I've seen stranger things happen!>).
Or ... modern architectures are too damn complex to outguess reliably.

...

I don't know enough about the specifics of Python's name resolution
strategies to say anything useful here -- except that it's obvious it could
be a lot faster <grin>.

<<<END SNIPPETS FROM OLD EMAIL>>>


And that wickedly clever little patch of Guido's *did* make it faster --
even if some brainless stopwatch insists it runs slower.

wouldn't-go-so-far-as-to-say-"*especially*-if"-though<wink>-ly y'rs - tim

Tim Peters

Jun 24, 1998

[Guido]

> Or how about optimizing instance attribute access...
>
> One approach with a much wider applicability would be to have a
> super-optimized dictionary subclass that only takes string arguments.
> (JPython uses this.)

Note too that Donald Beaudry raised the notion of a super-optimized dict
specialized to *interned* strings (i.e., that need only look at addresses).
Some interesting tradeoffs there!
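
(To see why addresses can suffice -- a tiny illustration using the
intern() builtin that 1.5 already ships; just the idea, nothing like
the proposed C dict itself:)

>>> a = intern("att" + "ribute")   # built at run time, then interned
>>> b = intern("attribute")
>>> a == b, a is b                 # one shared object: pointer compare works
(1, 1)

A dict specialized to interned keys could thus skip the
character-by-character compare on every probe.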

mostly-just-encouraging-don-to-come-out-of-hiding-again<wink>-ly y'rs - tim

Tim Peters

Jun 24, 1998

[Guido van Rossum]
> ...

> An unanticipated problem with the total set of changes is that it
> affects locals(). But I still wonder if locals() really needs to
> exist: most postings that reference it seem to focus on some
> misunderstanding about it! It is really a remnant of a more dynamic
> language than Python needs to be.

Suggest a different approach: first decide how dynamic Python needs to be,
then deduce what follows from that <0.5 wink>.

But since we're going to do it your way <grin>, I'll just add that the only
thing I use locals() for that I'd really miss is in conjunction with string-%
(like

print "i %(i)d j %(j)d" % locals()

). Suspect that's pretty common, too.

> While I'm writing this, I'm timing my code. Unfortunately, it seems
> to slow down rather than speed up! E.g., pystone goes down from 2257
> to 1996 for this particular machine.

That settles it for me: introduces a peculiar form of dynamic scoping
(cleverly sold as a "bug fix") *and* slows things down? I'll take two
<wink>.

> This may have to do with the larger volume of memory set aside for
> locals, without any benefit in functions that only look up said globals
> once (or worse, less than once on average). Or it may have to do with
> the dreaded cache lines -- where random code rearrangements may change
> the performance up or down by 10%!

I have not tried the patch yet (too, too swamped! within a few days,
though -- has anyone else tried it under Win95/Pentium/MSVC5?).

Offhand it seems the amount of extra memory should be trivial -- there must
be fewer than 40 global names in the whole program, and most functions
access only a few of them. Looks certain to be well under a few extra Kb no
matter how I exaggerate the possibilities.

Would be interesting to figure out how many global accesses *are* "one
shot" -- I recall that pystone did a lot of LOAD_GLOBALs (proportionally
more than any other program I ever looked at), but I don't see a lot of
"multi shots" in eyeballing the code (most times it looks like a global is
accessed just once per call).

The interesting thing there is that one-shots actually should be slower
after the patch, because they incur a GETLOCAL and failing x != NULL
test-and-jump sequence they didn't have to pay before. But that's the only
hope I see so far of finding a "good reason" for the slowdown -- and it's
not all that promising <0.5 wink>.

Ah ... am also worried that the caching *may* be semantically incorrect in
pystone, and so causing it to do more stuff than before. For example, Proc0
calls Proc4, which in turn assigns to the global Char2Glob -- but Proc0 also
*references* Char2Glob inside a loop and will use its *cached* value
(Char2Glob is a "const global" so far as Proc0 is concerned). In this
specific case it looks like the cached Char2Glob value will be correct
anyway, but more by accident than anything else. Maybe some other spot
isn't getting away with it. Perhaps you or M-A could conveniently count the
total number of opcodes executed before and after (they should be identical!
if they differ, the semantics changed)?

too-early-to-panic<wink>-ly y'rs - tim

M.-A. Lemburg

Jun 24, 1998

Guido van Rossum wrote:
>
> > > Guiding development is not the problem that a consortium
> > > attempts to solve; that's fairly easily done by newsgroup and mailing
> > > list chatter. The problem is actually implementing the new features.
> >
> > But that's exactly what I'm aiming at: let people contribute code
> > and ideas. If a company wants Unicode, let *them* contract a programmer
> > to have it implemented.
>
> And keep it proprietary? The consortium agreement guarantees that all
> work done by the consortium will be made public. This also applies to
> work done by visiting engineers for the consortium.

No, let the company pay the programmer, not the consortium, and then
have the outcome returned to the community. This is what e.g. SuSE
in Germany is doing for X display drivers: they hire engineers to do
the work and then return the code under GPL to the public.

> > I know ;-) ...
> > http://starship.skyport.net/~lemburg/CoercionProposal.html
> > http://starship.skyport.net/~lemburg/CallableMethodsProposal.html
> > http://starship.skyport.net/~lemburg/mxPython-1.5.patch.gz
> >
> > The problem with these patches is that adapting them to each new
> > available Python (sub-)version simply takes too long.
>
> No, the real problem with these patches is that they are not mature
> enough to be included.

Would be nice to hear some details about the maturity level then :-).
I'd be happy to make the necessary changes, but without feedback I
don't see any way of getting near the "maturity level" you have in mind.

> > If there were a clear way to vote for inclusion the development
> > could be sped up. Guido could then spend more time evaluating ideas
> > (though I figure he has just as much fun coding new ideas as we do
> > ;-) and designing Python's future. Well, my two cents, anyway.
>
> Python is not a democracy. (Nor is Linux, or Perl, or Tcl, or Apache
> by the way.)

I didn't propose a democracy. To the contrary: I think that the
"One Man -- Cast Of Thousands" work model is a very efficient one.
The problem is getting feedback... topics like the two proposals above
are of a highly technical nature and probably bore most people
on this list.

--
Marc-Andre Lemburg Y2000: 555 days left

M.-A. Lemburg

Jun 24, 1998

Guido van Rossum wrote:
>
> > Skip Montanaro has something along those lines...
> [...]

> > This is pretty much what Skip's optimizer does.
>
> So why isn't anyone else interested in this? Why isn't there a horde
> of volunteers working on the Ultimate Python Optimizer? Because it
> isn't glamorous enough? Because Python is fast enough? I'd happily
> add LOAD_FAST_TWO and other interesting instructions that the
> optimizers would really need, if an external optimizer came into
> existence.

Probably simply because people didn't know about it... now they do ;-)

> > BTW: Even though single performance patches may well lie within the
> > 10% grey zone, applying several of them may still lead the way into
> > safe water, e.g. after applying all my patches to 1.5, I get
> > an interpreter that clocks at around 4150 pystones compared to
> > 3350 for the original 1.5.1 code.
>

> But how do you know that there isn't one patch that has no effect or
> that has an adverse effect...?

Every single patch passed the speedup test on my machine (even though
they were gaining only 2-5%). Some of the patches are obscure (like
reformatting the mainloop) and probably don't show much effect on
other machines, but there are also some other ideas in there, like
enhancing allocation of small dictionaries, inlining attribute lookup
and builtin function calls, getting rid of some PyErr_Clear()s, adding
a few more macros (eliminating C function calls) etc. IMHO, all of
these pass the Tim-Test (that he explained in his other posting).

Anyway, this discussion is getting a little too heated up for my taste.
Let's get back to the subject...

How do you plan to enable the new functionality with an interpreter
flag? Wouldn't that only work with an extra opcode?

--
Marc-Andre Lemburg Y2000: 555 days left

M.-A. Lemburg

Jun 24, 1998

Tim Peters wrote:
>
> Ah ... am also worried that the caching *may* be semantically incorrect in
> pystone, and so causing it to do more stuff than before. For example, Proc0
> calls Proc4, which in turn assigns to the global Char2Glob -- but Proc0 also
> *references* Char2Glob inside a loop and will use its *cached* value
> (Char2Glob is a "const global" so far as Proc0 is concerned). In this
> specific case it looks like the cached Char2Glob value will be correct
> anyway, but more by accident than anything else. Maybe some other spot
> isn't getting away with it. Perhaps you or M-A could conveniently count the
> total number of opcodes executed before and after (they should be identical!
> if they differ, the semantics changed)?

There you go...

Original 1.5.1 running ./python -O Lib/test/pystone.py

Opcode frequencies:
------------------------------------------------------------------------
LOAD_FAST (124) : 1439748 ================================
LOAD_GLOBAL (116) : 539884 ===========
LOAD_CONST (100) : 470152 ==========
STORE_FAST (125) : 449908 =========
POP_TOP ( 1) : 289967 ======
JUMP_IF_FALSE (111) : 269952 =====
COMPARE_OP (106) : 249961 =====
CALL_FUNCTION (131) : 219994 ====
BINARY_ADD ( 23) : 179966 ===
RETURN_VALUE ( 83) : 170052 ===
STORE_ATTR ( 95) : 159972 ===
LOAD_ATTR (105) : 139987 ===
BINARY_SUBSCR ( 25) : 99976 ==
JUMP_ABSOLUTE (113) : 79991 =
STORE_SUBSCR ( 60) : 69984 =
BINARY_SUBTRACT ( 24) : 69981 =
STORE_GLOBAL ( 97) : 59988 =
FOR_LOOP (114) : 50006 =
JUMP_FORWARD (110) : 49997 =
SETUP_LOOP (120) : 49995 =
POP_BLOCK ( 87) : 40001
BINARY_MULTIPLY ( 20) : 29994
UNARY_NOT ( 12) : 19995
JUMP_IF_TRUE (112) : 10003
DUP_TOP ( 4) : 10002
BREAK_LOOP ( 80) : 9997
BINARY_DIVIDE ( 21) : 9997
STORE_NAME ( 90) : 194
MAKE_FUNCTION (132) : 109
LOAD_NAME (101) : 69
SLICE ( 30) : 51
BUILD_TUPLE (102) : 27
BUILD_CLASS ( 89) : 26
LOAD_LOCALS ( 82) : 26
IMPORT_NAME (107) : 11
SETUP_EXCEPT (121) : 7
32 ( 32) : 6
31 ( 31) : 6
UNARY_NEGATIVE ( 11) : 6
BUILD_LIST (103) : 4
IMPORT_FROM (108) : 3
DELETE_NAME ( 91) : 2
BUILD_MAP (104) : 1
UNPACK_LIST ( 93) : 1
BINARY_AND ( 64) : 1
------------------------------------------------------------------------

With globopt patch running ./python -O Lib/test/pystone.py

Opcode frequencies:
------------------------------------------------------------------------
LOAD_FAST (124) : 1909644 ================================
LOAD_CONST (100) : 470152 =======
STORE_FAST (125) : 449908 =======
POP_TOP ( 1) : 289967 ====
JUMP_IF_FALSE (111) : 269952 ====
COMPARE_OP (106) : 249961 ====
CALL_FUNCTION (131) : 219994 ===
BINARY_ADD ( 23) : 179966 ===
RETURN_VALUE ( 83) : 170052 ==
STORE_ATTR ( 95) : 159972 ==
LOAD_ATTR (105) : 139987 ==
BINARY_SUBSCR ( 25) : 99976 =
JUMP_ABSOLUTE (113) : 79991 =
LOAD_GLOBAL (116) : 69988 =
STORE_SUBSCR ( 60) : 69984 =
BINARY_SUBTRACT ( 24) : 69981 =
STORE_GLOBAL ( 97) : 59988 =
FOR_LOOP (114) : 50006
JUMP_FORWARD (110) : 49997
SETUP_LOOP (120) : 49995
POP_BLOCK ( 87) : 40001
BINARY_MULTIPLY ( 20) : 29994
UNARY_NOT ( 12) : 19995
JUMP_IF_TRUE (112) : 10003
DUP_TOP ( 4) : 10002
BREAK_LOOP ( 80) : 9997
BINARY_DIVIDE ( 21) : 9997
STORE_NAME ( 90) : 194
MAKE_FUNCTION (132) : 109
LOAD_NAME (101) : 69
SLICE ( 30) : 51
BUILD_TUPLE (102) : 27
BUILD_CLASS ( 89) : 26
LOAD_LOCALS ( 82) : 26
IMPORT_NAME (107) : 11
SETUP_EXCEPT (121) : 7
32 ( 32) : 6
31 ( 31) : 6
UNARY_NEGATIVE ( 11) : 6
BUILD_LIST (103) : 4
IMPORT_FROM (108) : 3
DELETE_NAME ( 91) : 2
BUILD_MAP (104) : 1
UNPACK_LIST ( 93) : 1
BINARY_AND ( 64) : 1
------------------------------------------------------------------------

I'll leave the analysis to you ;-) You're much better at that anyways.

hin...@cnrs-orleans.fr

Jun 24, 1998

wware-...@world.std.com (Will Ware) writes:

> In a perfect world, people could make decent livings writing free
> software. But it's not at all obvious how to generate money from
> this activity that could pay developers' salaries. Any experiment

There is one model that has worked quite well for many scientific
packages: they are written by people who have jobs in publicly funded
research institutions, and produce public-domain code as part of
their work. A good example is LAPACK (and its predecessors LINPACK
and EISPACK), but there are many others.

One could argue that certain fundamental software should be provided
by public institutions as a public service, much as public
institutions provide roads and other infrastructure which a
private-enterprise business model could not produce. There are
different opinions around the world as to what constitutes an
essential public service, but as far as I know software has never been
a candidate. Nevertheless, I think competition in the computer
business would be much healthier if operating systems were provided
free of charge by public institutions, rather than by Microsoft. And
programming tools, being of little use to end users but an important
infrastructure for the whole industry, would be potential candidates
as well.
--
-------------------------------------------------------------------------------
Konrad Hinsen | E-Mail: hin...@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2 | Deutsch/Esperanto/English/
France | Nederlands/Francais
-------------------------------------------------------------------------------

Vladimir Marangozov

Jun 24, 1998, to Guido van Rossum

>
> [M.-A. Lemburg]

> > Are there any plans for such a consortium ?
>

[Guido]
>
> Yes. It was discussed ...
> ... Details have to be vague because prospective consortium
> members don't want to be embarrassed by premature publicity,
> so please don't ask about who will join.

Although Guido has reminded us of some of the public plans, note that there
are secret plans too. I can safely reveal one of them, just to
warn you that unpredictable things may happen in the future!

A highly-optimized real-time Python implementation is currently
running a script in the background, which is expected to find the
"magic" combination for LOTO machines.

As soon as we get the *BIG* money, a very high salary will be
proposed to Guido (CNRI's * 10) and his return to Europe
will be organized. It is planned that he will benefit from
the comfort of a wonderful Chateau, where he will enjoy living
and working peacefully on the Global Python Strategic Development.

[Guido]


>
> Actually, since I'm a single-CPU device, I would be more
> productive if I had fewer threads competing for me.

All this will be granted. And more. I am not allowed to reveal
the details though. Only one obvious thing: it has been reported
that Guido nearly reached his maximum productivity level and he's
still doing very well with the amount of tasks he has to deal with.
Actually, all this happens in America, where his power supply is
tuned to 50Hz. In Europe, the standard is 60Hz, and one single
thread will be scheduled in the Chateau: Python. Thus, we expect
yet another improvement in Guido's productivity.

This is only one of the secret plans, among many others.
You have been warned! ;-}

P.S. Just to reassure the community: all plans include
an explicit clause related to safety and liveness properties.
Access to public transport units and places will be prevented
(busses, bus stations, trams, trolls and the like).

P.P.S. If you think it is still possible to make alternate plans,
like offering a salary superior to the money earned from the LOTO,
you must know that a long time ago, judicious people here started
to collect every "two cents" that have been so generously thrown in
c.l.py. By now, this is a fortune that Guido can and will enjoy!!!
--
Vladimir MARANGOZOV | Vladimir....@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

Guido van Rossum

Jun 24, 1998

> Every single patch passed the speedup test on my machine (even though
> they were gaining only 2-5%). Some of the patches are obscure (like
> reformatting the mainloop) and probably don't show much effect on
> other machines, but there are also some other ideas in there, like
> enhancing allocation of small dictionaries, inlining attribute lookup
> and builtin function calls, getting rid of some PyErr_Clear()s, adding
> a few more macros (eliminating C function calls) etc. IMHO, all of
> these pass the Tim-Test (that he explained in his other posting).

I agree that most of those changes are fine. I'm still reluctant to
do too many of them if they cause the code to become less readable --
at some point I get worried that the code becomes unmaintainable or so
fragile that changes can break it too easily. Therefore I'll want to
have a veto over which things to accept.

But I was thinking of putting LOAD_FAST in front of the switch.
Whether this is a win or not depends on how good the compiler is at
generating the switch code, as well as on the frequency of LOAD_FAST
opcodes. It can easily go either way. For me, this one probably
isn't worth the added irregularity.
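
(In Python pseudo-form, the idea is just to peel the hot case off ahead
of the general dispatch. A sketch only, with an invented instruction
format -- not the actual ceval.c loop:)

def run(insns, handlers):
    stack = []
    for op, arg in insns:
        if op == 'LOAD_FAST':          # the common case, tested first
            stack.append(arg)
            continue
        handlers[op](stack, arg)       # the "switch" for everything else
    return stack

def binary_add(stack, arg):
    left, right = stack[-2], stack[-1]
    del stack[-2:]
    stack.append(left + right)

print run([('LOAD_FAST', 2), ('LOAD_FAST', 3), ('BINARY_ADD', None)],
          {'BINARY_ADD': binary_add})  # prints [5]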

> Anyway, this discussion is getting a little too heated up for my taste.
> Let's get back to the subject...
>
> How do you plan to enable the new functionality with an interpreter
> flag? Wouldn't that only work with an extra opcode?

Sorry, I've lost the context. What new functionality? I don't recall
mentioning a compiler flag.

Guido van Rossum

Jun 24, 1998

> No, let the company pay the programmer, not the consortium, and then
> have the outcome returned to the community. This is what e.g. SuSE
> in Germany is doing for X display drivers: they hire engineers to do
> the work and then return the code under GPL to the public.

There's more than one business model, and every situation is different.

> Would be nice to hear some details about the maturity level then :-).
> I'd be happy to make the necessary changes, but without feedback I
> don't see any way of getting near the "maturity level" you have in mind.

In a separate message.

> I didn't propose a democracy. To the contrary: I think that the
> "One Man -- Cast Of Thousands" work model is a very efficient one.
> The problem is getting feedback... topics like the two proposals above
> are of a highly technical nature and probably bore most people
> on this list.

That includes me, at times :-)

Guido van Rossum

Jun 24, 1998

> Suggest a different approach: first decide how dynamic Python needs to be,
> then deduce what follows from that <0.5 wink>.

I think it ought to be as dynamic as it is, and I would rather not
have to mess with this kind of optimization. Besides, so far it
sounds like even if I replaced *all* dict lookups with indexing
operations, it wouldn't speed things up by more than 15-20%. That's a
lot for some of you, but not enough to make a difference in language
comparisons, and I think it's not worth to move from a simple,
easy-to-understand dictionary model to a messy dlict implementation.

> But since we're going to do it your way <grin>, I'll just add that the only
> thing I use locals() for that I'd really miss is in conjunction with string-%
> (like
>
> print "i %(i)d j %(j)d" % locals()
>
> ). Suspect that's pretty common, too.

Good thing you mentioned that. To be even more useful, locals()
shouldn't return a dictionary then, but something that does a name
lookup in the current scope, using the standard
locals-first-then-globals rule. That would make it clear that it's
read-only (which the manual already warns about).
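
A sketch of the sort of object I mean (pure Python and only an
approximation -- the real thing would live in C and know about the
current frame; the name Scope is invented here):

class Scope:
    # read-only mapping: locals first, then globals, then builtins
    def __init__(self, loc, glob, blt):
        self.maps = (loc, glob, blt)
    def __getitem__(self, key):
        for m in self.maps:
            try:
                return m[key]
            except KeyError:
                pass
        raise KeyError, key

n = 42

def f():
    i, j = 1, 2
    # __builtins__ is a module in __main__; elsewhere it's already a dict
    print "i %(i)d j %(j)d n %(n)d" % Scope(vars(), globals(),
                                            __builtins__.__dict__)

f()    # prints: i 1 j 2 n 42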

> The interesting thing there is that one-shots actually should be slower
> after the patch, because they incur a GETLOCAL and failing x != NULL
> test-and-jump sequence they didn't have to pay before. But that's the only
> hope I see so far of finding a "good reason" for the slowdown -- and it's
> not all that promising <0.5 wink>.

That was my theory why pystone was slowed down. I stick to it for now.

> Ah ... am also worried that the caching *may* be semantically incorrect in
> pystone, and so causing it to do more stuff than before. For example, Proc0
> calls Proc4, which in turn assigns to the global Char2Glob -- but Proc0 also
> *references* Char2Glob inside a loop and will use its *cached* value
> (Char2Glob is a "const global" so far as Proc0 is concerned). In this
> specific case it looks like the cached Char2Glob value will be correct
> anyway, but more by accident than anything else. Maybe some other spot
> isn't getting away with it. Perhaps you or M-A could conveniently count the
> total number of opcodes executed before and after (they should be identical!
> if they differ, the semantics changed)?

Thanks to Marc-Andre, we now know not to worry about this one.

sk...@calendar.com

Jun 24, 1998

In article <3590213C...@lemburg.com>,
"M.-A. Lemburg" <m...@lemburg.com> wrote:

> The problem with these patches is that adapting them to each new
> available Python (sub-)version simply takes too long. If there
> were a clear way to vote for inclusion the development could be
> sped up. Guido could then spend more time evaluating ideas (though I
> figure he has just as much fun coding new ideas as we do ;-) and
> designing Python's future. Well, my two cents, anyway.

I think I prefer the current situation where Guido decides what is and isn't
included in the language. I really like the idea that Guido's there with the
will to put the brakes on things. (I remember when C++ first saw the light
of day lots of people worried about how monstrous Ada was going to be to
implement thought C++ was a much better (smaller, easier to grasp, easier to
implement) alternative. I think the tables may have turned on that
particular pair of languages...) Having consortium members vote on inclusion
of features seems to me like a good way to follow in the footsteps of X, Ada
or C++. Perhaps I misunderstand the use of the term "vote for inclusion"
however.

Where I think Guido could probably use more help is in maintaining the
existing distribution (testing submitted bug fixes to make sure they work on
all platforms, writing test cases, updating documentation, etc.) That way
Guido can focus more on Python n.m+1 and Python n+1. For that sort of thing
I'm not sure you need a consortium.

--
Skip Montanaro
(sk...@calendar.com)

-----== Posted via Deja News, The Leader in Internet Discussion ==-----
http://www.dejanews.com/ Now offering spam-free web-based newsreading

sk...@calendar.com

Jun 24, 1998

[Guido]

> It's about time that someone started work on a framework for a real
> Python bytecode optimizer... This shouldn't be particularly hard, and
> might be *very* fruitful. Search Deja News for old posts by Tim
> Peters for suggestions on how to proceed.

I still have much of my byte code optimizer lying about. It doesn't
currently work -- I blew away a couple files during a previous Python upgrade
:-( -- but I doubt it would take much effort to get it going again. It uses
Tim's model of making a bunch of passes over the code, one per optimization.
Each optimization is a separately instantiated class. Most optimizations are
just a few lines of Python code.
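
To give the flavor (over a made-up instruction representation, not my
actual code):

class FoldConstants:
    # one pass: LOAD_CONST a; LOAD_CONST b; BINARY_ADD -> LOAD_CONST a+b
    def run(self, insns):
        out = []
        for insn in insns:
            if (insn[0] == 'BINARY_ADD' and len(out) >= 2 and
                    out[-1][0] == 'LOAD_CONST' and
                    out[-2][0] == 'LOAD_CONST'):
                folded = ('LOAD_CONST', out[-2][1] + out[-1][1])
                del out[-2:]
                out.append(folded)
            else:
                out.append(insn)
        return out

insns = [('LOAD_CONST', 2), ('LOAD_CONST', 3), ('BINARY_ADD', None)]
for p in [FoldConstants()]:            # one instantiated class per pass
    insns = p.run(insns)
print insns                            # [('LOAD_CONST', 5)]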

sk...@calendar.com

Jun 24, 1998

In article <3590C964...@lemburg.com>,
"M.-A. Lemburg" <m...@lemburg.com> wrote:

> Probably simply because people didn't know about it... now they do ;-)

Oh, no... I guess I'll really have to get it working again...

Watch this space...

Skip

sk...@calendar.com

Jun 24, 1998

[Guido]

> Besides, so far it
> sounds like even if I replaced *all* dict lookups with indexing
> operations, it wouldn't speed things up by more than 15-20%. That's a
> lot for some of you, but not enough to make a difference in language
> comparisons,

So, perhaps our optimization efforts should be redirected back to
enhancements to Python (optional or otherwise) that would make Python->C
conversion much more worthwhile, performance-wise. Wasn't there something in
the Perl/Python thread about needing an order of magnitude improvement to
cause people to switch from whatever has market/mind share to something new?
If generating .o files was as easy (or nearly as easy) as generating .pyc
files and they were an order of magnitude faster when it mattered (however
you define that), Python would have a very nice advantage.

On the other hand, at what rate is Perl or Tcl performance improving? (They
are the "competition" if you want to do language comparisons, right?) What
happened to the Perl->C effort? Did it yield fruit or was the speedup not
worth the effort?

--
Skip Montanaro
(sk...@calendar.com)

Terry Reedy

Jun 24, 1998

In article <1998062403...@eric.CNRI.Reston.Va.US>,
gu...@CNRI.Reston.Va.US says...

>And keep it proprietary? The consortium agreement guarantees that all
>work done by the consortium will be made public. This also applies to
>work done visiting engineers for the consortium.

Sounds good. Six months after sometime is a lot sooner than never.


Guido van Rossum

Jun 25, 1998

> > Probably simply because people didn't know about it... now they do ;-)
>
> Oh, no... I guess I'll really have to get it working again...

Can you make it to the conference? This sounds like it would make a
perfect conference paper submission...

Tim Peters

Jun 25, 1998

[tim]

> Suggest a different approach: first decide how dynamic Python
> needs to be, then deduce what follows from that <0.5 wink>.

[Guido]


> I think it ought to be as dynamic as it is, and I would rather not
> have to mess with this kind of optimization.

I certainly believe the latter. I'm skeptical that you mean the former,
though: you've *often* pondered (even without anyone nagging you <wink>)
whether it might not be an improvement to clamp down on how free other
modules are to reach into the current module and change its attributes
without notice.

This isn't about your patch or my head-dlicts either: the semantics of the
language in this area put severe constraints on what anyone can ever do in
the way of optimizing for Python, even given major resources to tackle it.
Right now, it's as if every non-local in a module were declared extern in C,
so that every call site must assume that every non-local is killed by the
call (and where "call site" includes mere attribute access too). A faithful
module-level optimizer can't even look at g in

def f():
    return 3

def g(x):
    while len(x) > 7:
        x = x[:-f()]
    return x

and conclude "ah, the call to f will have no effect on len!", because for
all it can tell the call to len may change the binding of f!

Now call me telepathic <wink>, but I don't think you ever had deliberate
intent to design a language *that* dynamic. It simply fell out of the
implementation. It would (IMO) be good to back off from this extreme --
which can be done by changing words in the Ref Manual alone.

> Besides, so far it sounds like even if I replaced *all* dict lookups
> with indexing operations, it wouldn't speed things up by more than
> 15-20%.

I wouldn't be surprised, but you have to understand that I have, at times,
been assigned to work 3 months full-time to get 5%. So I see "15-20%" and
think "wow! that's a year's salary!" <0.9 wink>. I'm also more concerned
about the constraining effects of the current semantics on all conceivable
future optimizations, not so much just name resolution (although 15-20% is
nothing to sneeze at), and really not at all for 1.5.2.

BTW, does your "*all* dict lookups" include resolving instance attributes &
methods? Offhand I'd bet 20% would be at the low end for what replacing
dict-chain lookups with straight indexing could do for heavily OO Python
programs.

BTW 2, you and I have both stated publicly that we expected an ambitious
peephole optimizer to max out in the 20% range too. If you go on to pooh-pooh
speedups in that range, it might answer your question about why nobody is
lining up to take over Skip's fine start <wink>. Seriously, if you pile up
enough 20% improvements, you can get as much as a 25% speedup in the end
<grin>.

> That's a lot for some of you, but not enough to make a difference
> in language comparisons, and I think it's not worth moving from
> a simple, easy-to-understand dictionary model to a messy dlict
> implementation.

But the intent of the dlict was to reproduce current semantics exactly. It
needed more work to achieve that (most glaringly it didn't "pass along"
changes made to the builtins), but that was its goal: leave *users'* simple
& ultra-dynamic dictionary model entirely alone. Something worked before,
it works after: no visible difference.

"Faithful" optimizations often get messy in the implementation; fact o'
life; users don't care; I'm a Python user <0.7 wink>.

>> print "i %(i)d j %(j)d" % locals()

> Good thing you mentioned that. To be even more useful, locals()


> shouldn't return a dictionary then, but something that does a name
> lookup in the current scope, using the standard locals-first-then-
> globals rule. That would make it clear that it's read-only (which
> the manual already warns about).

Cool idea! I'm unclear on why you want to change locals() now, though. But
if you do, should no-argument vars() change in tandem?

Long ago there were threads arguing for "dict merge" syntax so that people
could pass the union of locals() and globals() to %, so at least in the dim
past the functionality was desired by several. A mapping object that did
the whole locals->globals->builtin bit would be much better than that -- but
isn't "locals()" a surprising name for such a beast?

>> ... worried that caching *may* be semantically incorrect in pystone

> Thanks to Marc-Andre, we now know not to worry about this one.

Oh yes! I'm grateful to M-A for that too (thanks, Marc-Andre!). And I'm
sure it's just coincidence that his figures showed that LOAD_GLOBAL
accounted for about 10% of all opcodes, while your figures showed pystone
slowing down about 10% <wink>.

shoulda-beena-numerologist-ly y'rs - tim

Chris Wright

Jun 25, 1998

Could I add the view of a non-developer (in the sense of not contributing
to the development of the internals of Python)?

I would be very wary of "voting for the inclusion" of X.

I DON'T want "There's more than one way to do it". I don't want feature
accretion.
I know there are arguments for all the proposals that various people put up.
I'm not usually familiar enough with the implications of the arguments to
have an informed view. So I proxy to Guido, who always gets it right anyway
(by definition).

I can also sympathise with those who do wonderful work that they believe
will enhance the language, and then find the incorporation and/or adoption
being delayed or denied. I'm sure that many (?most) of these proposals have
great merit, and won't break any of my code. (Can't remember the last time I
used meta-programming in python ;-) [In prolog, it was Tuesday!])

So, I've got no idea what the right answer is. I'm just expressing the view
of an "end-user" level of programmer who uses python a fair bit, and wan't
to see it stay small.

cheers

chris

>The problem with these patches is that adapting them to each new
>available Python (sub-)version simply takes too long. If there
>were a clear way to vote for inclusion the development could be
>sped up. Guido could then spend more time evaluating ideas (though I
>figure he has just as much fun coding new ideas as we do ;-) and
>designing Python's future. Well, my two cents, anyway.
>

>> As in most free software projects,
>> contributing running code will probably continue to be a very
>> effective low-cost way of affecting Python's development.
>

>--
>Marc-Andre Lemburg Y2000: 556 days left

Guido van Rossum

Jun 25, 1998

[Tim has a whole bunch to say.]

I can't read as fast as you can type (and I type far slower than I
read), so I'll only address a few things.

> Now call me telepathic <wink>, but I don't think you ever had deliberate
> intent to design a language *that* dynamic. It simply fell out of the
> implementation. It would (IMO) be good to back off from this extreme --
> which can be done by changing words in the Ref Manual alone.

Not sure. The very dynamic nature makes some interesting things
possible. We're still thinking about optional static typing (perhaps
another round of discussion will be scheduled at the developers' day)
and perhaps that would be a better way to reduce the language's
dynamicism, rather than altering the reference manual (which nobody
reads) and then later changing the semantics, and claiming that
anybody who was using the previous semantics didn't read the manual...

> BTW, does your "*all* dict lookups" include resolving instance attributes &
> methods? Offhand I'd bet 20% would be at the low end for what replacing
> dict-chain lookups with straight indexing could do for heavily OO Python
> programs.

I've just started thinking along those lines. The assumption that
self is an instance of the current class (or a subclass) might make
this possible.

> BTW 2, you and I have both stated publicly that we expected an ambitious
> peephole optimizer to max out in the 20% range too. If you go on to pooh-pooh
> speedups in that range, it might answer your question about why nobody is
> lining up to take over Skip's fine start <wink>. Seriously, if you pile up
> enough 20% improvements, you can get as much as a 25% speedup in the end
> <grin>.

I was mostly concerned about having to rewrite a vast amount of code
for only 20% improvement. (I don't get paid for this, you know :-)

> > Good thing you mentioned that. To be even more useful, locals()
> > shouldn't return a dictionary then, but something that does a name
> > lookup in the current scope, using the standard locals-first-then-
> > globals rule. That would make it clear that it's read-only (which
> > the manual already warns about).

> Cool idea! I'm unclear on why you want to change locals() now,
> though.

Because I've needed this a few times: needed string formatting to
reference both locals and globals.

> But if you do, should no-argument vars() change in tandem?

Yes. Or perhaps vars() should be the only one changed. Or perhaps a
new function should be introduced, e.g. scope(). Just thinking aloud...

Dirk Heise

Jun 25, 1998

> [Tim has a whole bunch to say.]
>
Guido answers:

> I can't read as fast as you can type (and I type far slower than I
> read), so I'll only address a few things.

He DOESN'T type - he's using superior technology! He SPEAKS!
We can't beat him!

Dirk Heise, Braunschweig, Germany
dhe...@metronet.de


Vladimir Marangozov

Jun 26, 1998

> [Tim Peters, pushing hard with good arguments]


>
> > I wouldn't be surprised, but you have to understand that I have, at times,
> > been assigned to work 3 months full-time to get 5%. So I see "15-20%" and
> > think "wow! that's a year's salary!" <0.9 wink>.

> > ...


> > BTW 2, you and I have both stated publicly that we expected an ambitious
> > peephole optimizer to max out in the 20% range too. If you go on to pooh-pooh
> > speedups in that range, it might answer your question about why nobody is
> > lining up to take over Skip's fine start <wink>. Seriously, if you pile up
> > enough 20% improvements, you can get as much as a 25% speedup in the end
> > <grin>.

> > ...


> > "Faithful" optimizations often get messy in the implementation; fact o'
> > life; users don't care; I'm a Python user <0.7 wink>.

[Guido, masking his real potential]


>
> I was mostly concerned about having to rewrite a vast amount of code
> for only 20% improvement. (I don't get paid for this, you know :-)

Don't worry about this. Just do it! ;-) Such a change would be very
rewarding, in terms of the new application domains Python would conquer
if it were faster. (Or tell us how to do it and we'll get back to you.
People have already expressed their willingness to contribute to the
sources -- better to honor them at some point ;-)

Seriously, I'm tempted to forward the readership to a technical document
(of general interest) which discusses C program optimization. It has a nice
abstract, reminding us that the optimized use of the available resources
is a responsibility. It's a good summary of some common practices & tricks.

http://www.ontek.com/mikey/current.html

Amit Patel

Jun 26, 1998

Guido van Rossum <gu...@CNRI.Reston.Va.US> wrote:
| > Skip Montanaro has something along those lines...
| [...]
| > This is pretty much what Skip's optimizer does.
|
| So why isn't anyone else interested in this? Why isn't there a horde
| of volunteers working on the Ultimate Python Optimizer? Because it
| isn't glamorous enough? Because Python is fast enough? I'd happily
| add LOAD_FAST_TWO and other interesting instructions that the
| optimizers would really need, if an external optimizer came into
| existence.

Python really is fast enough for what I use it for. :-) Someone
could make my Python code run ten times as fast and I wouldn't be able
to tell.

I have one project (a game) where I need speed, and I use C++ for that.

- Amit

Tim Peters

Jun 27, 1998

[tim]
> ... [pathologically dynamic example] ...

> Now call me telepathic <wink>, but I don't think you ever had
> deliberate intent to design a language *that* dynamic. It simply
> fell out of the implementation. It would (IMO) be good to back
> off from this extreme -- which can be done by changing words in
> the Ref Manual alone.

[guido]
> Not sure.

All right -- *that* I believe.

> The very dynamic nature makes some interesting things possible.

I suspect there's a pragmatic distinction to be made here, though. People
certainly accomplish cool things by exploiting the dynamic nature of class
and instance namespaces. But I'm not sure I've seen a good use of mucking
with another module's globals. Have one in mind?

At best it seems to lead to brittle design. Like regex's set/get_syntax --
and even that one doesn't allow an external module to reach into regex's
guts and change its globals *directly*. Restricting module attribute
modifications to be made via calls to a module's exported functions doesn't
hurt strong optimization: it's not rebinding that hurts optimization, it's
the optimizer's complete ignorance about when rebinding may occur (which
forces it to make worst-case assumptions at every call site, attribute
access, and operator symbol). IOW, it's not the dynamicism that hurts, it's
the "magical behind your back"ness of it.

> We're still thinking about optional static typing (perhaps
> another round of discussion will be scheduled at the developers' day)
> and perhaps that would be a better way to reduce the language's
> dynamicism,

Yes, when you *don't* have a hammer, it's *especially* tempting to view
every problem as a nail <wink>.

One useful thought occurs to me: the best hope for a "high performance"
Python in the foreseeable future probably lies with JPython, leveraging off
the millions of bucks being poured into fast JVMs. I know more about the
latter than I'm supposed to <wink>, and the intellectual capital being
devoted to it is staggering (e.g., they're arguably putting more effort into
it than I put into the 1.5.2 list.sort() <wink>). So I should learn more
about JPython & rethink it all from that view.

> rather than altering the reference manual (which nobody
> reads) and then later changing the semantics, and claiming that
> anybody who was using the previous semantics didn't read the manual...

That's inevitable, though. If Python survives, it will eventually be
standardized, and you can think about the semantics you want it to have
before then, or let some committee decide for you when the time comes.
While the first incarnation of a stds committee typically treats the
documents it inherits with great respect, anything left fuzzy is seized upon
as a chance for "creative clarification". Python is in no danger of being
standardized right now though, so I don't claim it's crucial to resolve this
now. Something to ponder in spare moments -- the ref manual eventually
determines the future of the language, whether end-users read it or not.

> > ... Offhand I'd bet 20% would be at the low end for what replacing
> > dict-chain lookups with straight indexing could do for heavily OO Python
> > programs.

> I've just started thinking along those lines. The assumption that
> self is an instance of the current class (or a subclass) might make
> this possible.

Yes, that definitely makes sense. You'll end up with two ways of getting at
an attribute: an index and a name. The two views need to be kept in synch.
This should sound familiar <0.9 wink>.
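
(A toy model of the two synchronized views -- a per-class name->index
map plus a slot list per instance. This captures only the semantics,
none of the speed, and every name in it is invented:)

class Record:
    _index = {'x': 0, 'y': 1}            # the name view, fixed per class

    def __init__(self):
        self.__dict__['_slots'] = [None] * len(self._index)

    def __getattr__(self, name):         # reached only on failed lookups
        try:
            return self.__dict__['_slots'][self._index[name]]
        except KeyError:
            raise AttributeError, name

    def __setattr__(self, name, value):
        self.__dict__['_slots'][self._index[name]] = value

p = Record()
p.x = 3
print p.x, p.y                           # 3 None -- via indexed slots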

> > BTW 2, you and I have both stated publicly that we expected
> > an ambitious peephole optimizer to max out in the 20% range too.
> > If you go on to pooh-pooh speedups in that range, it might answer
> > your question about why nobody is lining up to take over Skip's
> > fine start <wink>. ...

> I was mostly concerned about having to rewrite a vast amount of code
> for only 20% improvement. (I don't get paid for this, you know :-)

Sorry, my mistake: I didn't realize the people shooting for a 20%
improvement via the more-difficult peephole optimizer route *will* be
getting paid <ahem>.

In any case, if speeding OO attribute access competes with speeding global
access, it seems clear a priori the former is more valuable.

> [about introducing a mapping object encapsulating the current
> global->local->builtin name->value lookup -- change locals(),
> or vars(), or ... ?]

> Yes. Or perhaps vars() should be the only one changed. Or perhaps a
> new function should be introduced, e.g. scope(). Just thinking aloud...

I like a new function best (who could guess everything locals() and vars()
may be used for today? especially since they're probably being abused!).
scope() is suggestive without precluding a possible future generalization to
lexical nesting, so that's a good name. vars() is a friendlier name,
though, and really seems *natural* for this. Also just thinking aloud: if I
were you, I'd introduce scope(); but if I were me, I'd just change vars()
<wink>.

always-delighted-to-be-of-clarifying-service-ly y'rs - tim

Tim Peters

Jun 27, 1998

[Amit Patel]

> Python really is fast enough for what I use it for. :-) Someone
> could make my Python code run ten times as fast and I wouldn't be able
> to tell.

You're not alone in that, Amit! One of the paradoxes of Python Life is that
for most of its history, the people best able to speed it up have been the
least interested in doing so. Guido in particular is not a speed freak at
heart, although for short periods he can be buffaloed into pulling off a
very credible impersonation of one <wink>. Perhaps if the Python Consortium
idea takes off, it could afford to fund a tiny group who really *love* doing
that kind of thing.

and-as-a-bonus-the-code-would-become-so-convoluted-guido-would-
*have*-to-move-on-to-python2-ly y'rs - tim
