Other notes

bearoph...@lycos.com

Dec 28, 2004, 9:23:40 PM

Here are some questions and suggestions about the language that I've
collected over the last few weeks (please note that some (most?) of them
are probably wrong/useless/silly, but I've seen that such notes help me
understand a lot of things and find my weak spots).

1) I've seen that list pop/append is amortised O(1) at the tail, but not
at the head. For that there is deque. But I think dynamic arrays can be
made with some buffer at both head and tail (but you need to keep an
index S to skip the head buffer, and you have to use this index on every
access to the elements of the list). I think that the most important
design goal of Python's built-in data types is flexibility (and safety)
rather than just speed (but dictionaries are speedy ^_^), so why are
there deques instead of lists with amortised operations at both head and
tail? (My hypothesis: to keep the list implementation a bit simpler, to
avoid wasting memory on the head buffer, and to keep lists a little
faster by avoiding the use of the skip index S).
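
A minimal sketch of the head-and-tail-buffer idea from point 1, written in present-day Python 3. The class name HeadTailList and all of its methods are hypothetical, chosen only for illustration; this is not how CPython's list or collections.deque is implemented, and the growth policy (about three times the current size, re-centred) is just one assumed choice.

```python
class HeadTailList:
    """Toy dynamic array with spare slots at both ends (illustrative only)."""

    def __init__(self):
        self._buf = [None] * 8   # backing storage
        self._s = 4              # skip index S: slot of the first element
        self._n = 0              # number of stored elements

    def __len__(self):
        return self._n

    def __getitem__(self, i):
        if not 0 <= i < self._n:
            raise IndexError(i)
        return self._buf[self._s + i]   # every access pays for the "+ S"

    def _regrow(self):
        # Copy into a larger buffer, leaving free slots on both sides.
        cap = max(8, 3 * self._n)
        start = (cap - self._n) // 2
        buf = [None] * cap
        buf[start:start + self._n] = self._buf[self._s:self._s + self._n]
        self._buf, self._s = buf, start

    def append(self, x):
        if self._s + self._n == len(self._buf):   # tail buffer exhausted
            self._regrow()
        self._buf[self._s + self._n] = x
        self._n += 1

    def appendleft(self, x):
        if self._s == 0:                          # head buffer exhausted
            self._regrow()
        self._s -= 1
        self._buf[self._s] = x
        self._n += 1

    def popleft(self):
        if self._n == 0:
            raise IndexError("pop from empty list")
        x = self._buf[self._s]
        self._buf[self._s] = None
        self._s += 1
        self._n -= 1
        return x
```

The sketch makes the trade-offs in the hypothesis visible: an extra field, unused slots at the head, and an addition on every element access.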

2) I usually prefer explicit, verbose syntax to cryptic symbols
(like the decorator syntax), but I like the infix Pascal syntax ".." for
specifying a closed interval (a tuple?) of integers or letters (this
syntax isn't meant to deprecate the range function). It reminds me of the
... syntax sometimes used in mathematics to define a sequence.
Examples:

assert 1..9 == tuple(range(1,10))
for i in 1..12: pass
for c in "a".."z": pass
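
For comparison, the same closed intervals can be had today from an ordinary function. This is only a sketch in present-day Python 3; the name closed_interval is hypothetical, and restricting string endpoints to single characters is an assumption of the sketch, not something the proposal specifies.

```python
def closed_interval(first, last):
    """Closed interval of integers or of single characters, as a tuple."""
    if isinstance(first, str) and isinstance(last, str):
        if len(first) != 1 or len(last) != 1:
            raise ValueError("character endpoints must have length 1")
        # Walk the code points from first to last, both included.
        return tuple(chr(c) for c in range(ord(first), ord(last) + 1))
    return tuple(range(first, last + 1))


digits = closed_interval(1, 9)        # plays the role of 1..9
letters = closed_interval("a", "z")   # plays the role of "a".."z"
```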

3) I think a way to define infix functions could be useful, like this
imaginary decorator syntax:

@infix
def interval(x, y): return range(x, y+1) # 2 parameters needed

This may allow:
assert 5 interval 9 == interval(5,9)
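
Something close to this can be approximated without new syntax by overloading an existing operator. The sketch below (present-day Python 3) uses "|" so that the call reads x |interval| y; the class name Infix is hypothetical, and the trick only works when the left operand's own "|" gives up on the wrapper object, as int does.

```python
class Infix:
    """Wrap a two-argument function so it can be written as  x |f| y."""

    def __init__(self, func):
        self.func = func

    def __ror__(self, left):
        # Handles  left | self : remember the left operand.
        return Infix(lambda right: self.func(left, right))

    def __or__(self, right):
        # Handles  (left | self) | right : supply the right operand.
        return self.func(right)

    def __call__(self, *args):
        # Keep the ordinary call form  f(x, y)  working too.
        return self.func(*args)


@Infix
def interval(x, y):
    return range(x, y + 1)


result = 5 |interval| 9
```

The precedence here is simply that of "|", which binds more tightly than "==", so 5 |interval| 9 == interval(5, 9) compares the two ranges.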

4) The printf-style formatting is powerful, but I still think it's
quite complex for everyday purposes, and I usually have to look up its
syntax in the docs. I think the Pascal syntax is nice and simpler to
remember (especially for someone with a little Pascal/Delphi experience
^_^): it uses two ":" to format floating point numbers (the second
:number is optional). For example this Delphi program:

{$APPTYPE CONSOLE}
const a = -12345.67890;
begin
writeln(a);
writeln(a:2:0);
writeln(a:4:2);
writeln(a:4:20);
writeln(a:12:2);
end.

Gives:
-1.23456789000000E+0004
-12346
-12345.68
-12345.67890000000000000000
-12345.68
(The last line starts with 3 spaces)
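
The two-number forms above map directly onto printf-style width and precision. A small sketch, with a hypothetical helper name pascal_format, assuming the first number is the minimum field width and the second the number of decimals:

```python
def pascal_format(value, width, decimals):
    """Mimic Pascal's  value:width:decimals  with % formatting."""
    # "*" takes the width and the precision from the argument tuple.
    return "%*.*f" % (width, decimals, value)


a = -12345.67890
lines = [
    pascal_format(a, 2, 0),    # like writeln(a:2:0)
    pascal_format(a, 4, 2),    # like writeln(a:4:2)
    pascal_format(a, 12, 2),   # like writeln(a:12:2), padded to 12 columns
]
```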

5) From the docs about round:
Values are rounded to the closest multiple of 10 to the power minus n;
if two multiples are equally close, rounding is done away from 0 (so,
for example, round(0.5) is 1.0 and round(-0.5) is -1.0).
Example:
a = [0.05 + x/10.0 for x in range(10)]
for x in a: print x,
print
for x in a: print str(round(x, 1)) + " ",

Gives:
0.05 0.15 0.25 0.35 0.45 0.55 0.65 0.75 0.85 0.95
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

But to avoid a bias toward rounding up there is another way to do this:
If the digit immediately to the right of the last sig. fig. is more
than 5, you round up.
If the digit immediately to the right of the last sig. fig. is less
than 5, you round down.
If the digit immediately to the right of the last sig. fig. is equal to
5, you round up if the last sig. fig. is odd. You round down if the
last sig. fig. is even. You round up if 5 is followed by nonzero
digits, regardless of whether the last sig. fig. is odd or even.
http://www.towson.edu/~ladon/roundo~1.html
http://mathforum.org/library/drmath/view/58972.html
http://mathforum.org/library/drmath/view/58961.html
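
The rule described above is usually called round-half-even. The decimal module (new in Python 2.4) offers it as a rounding mode; the sketch below uses present-day Python 3 syntax, a hypothetical helper name round_decimal, and string inputs so that the ties are exact rather than binary floating-point approximations.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_decimal(text, places, mode):
    """Round the decimal number written in `text` to `places` decimals."""
    quantum = Decimal(10) ** -places          # places=1 gives Decimal("0.1")
    return Decimal(text).quantize(quantum, rounding=mode)


values = ["0.05", "0.15", "0.25", "0.35", "0.45",
          "0.55", "0.65", "0.75", "0.85", "0.95"]
half_up = [str(round_decimal(v, 1, ROUND_HALF_UP)) for v in values]
half_even = [str(round_decimal(v, 1, ROUND_HALF_EVEN)) for v in values]
```

With ROUND_HALF_UP every tie moves away from zero; with ROUND_HALF_EVEN half of the ties go down, which removes the upward bias.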

6) map(function, list, ...) applies function to every item of list and
returns a list of the results. If list is a nested data structure, map
applies function to just the top-level objects.
In Mathematica there is another parameter to specify the "level" of the
apply.

So:
map(str, [[1,[2]], 3])
==>
['[1, [2]]', '3']

With a hypothetical level (default level = 1, it gives the normal
Python map):

map(str, [[1,[2]], 3], level=1)
==>
['[1, [2]]', '3']

map(str, [[1,[2]], 3], level=2)
==>
['1', '[2]', '3']

I think these semantics can be extended:
level=0 means that the map is performed all the way down to the leaves
(0 means ad infinitum; this isn't nice, but it can be useful because I
think Python doesn't contain a built-in Infinity).
level=-1 means that the map is performed at the level just before the
leaves.
level=-n means that the map is performed n levels before the leaves.
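
A sketch of the positive-level behaviour shown in the examples above, in present-day Python 3. The names map_level and _flatten are hypothetical. Two assumptions of the sketch: only list objects count as nested structure, and level=None (rather than 0) stands for "down to the leaves"; negative levels are not handled.

```python
def _flatten(items, depth):
    """Yield items, descending into nested lists at most `depth` times
    (depth=None means all the way down)."""
    for item in items:
        if isinstance(item, list) and (depth is None or depth > 0):
            yield from _flatten(item, None if depth is None else depth - 1)
        else:
            yield item


def map_level(func, items, level=1):
    """Apply func to the objects found `level` levels deep, as one flat list."""
    depth = None if level is None else level - 1
    return [func(x) for x in _flatten(items, depth)]


data = [[1, [2]], 3]
level_1 = map_level(str, data, level=1)
level_2 = map_level(str, data, level=2)
leaves = map_level(str, data, level=None)
```

Written this way it is apparent that the proposal combines two operations, flattening to a given depth and then mapping.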

7) Maybe it would be useful to extend the semantics of reload(module):
reload(module [, recurse=False])
If recurse=True it reloads the module and, recursively, all the modules
that it imports.

8) Why is reload a function while import is a statement? (Why isn't
reload a statement too, or why aren't both functions?)

9) Functions without a return statement return None:
def foo(x): print x
I think the compiler/interpreter can give a "compilation warning" where
such function results are assigned to something:
y = foo(x)
(I know that some of such cases cannot be spotted at compilation time,
but the other cases can be useful too).
I don't know if PyChecker does this already. Generally speaking I'd
like to see some of those checks built into the normal interpreter.
Instructions like:
open = "hello"
Are legal, but maybe a "compilation warning" can be useful here too
(and maybe even a runtime warning if a Verbose flag is set).
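
The second kind of check, rebinding a built-in name, is easy to prototype outside the interpreter. The sketch below uses the ast module of present-day Python 3; the function name shadowed_builtins is hypothetical, and it only looks at plain name bindings such as assignments and loop targets, not at def, class, import or parameter names.

```python
import ast
import builtins


def shadowed_builtins(source):
    """Return sorted (line, name) pairs where a built-in name is rebound."""
    builtin_names = set(dir(builtins))
    hits = []
    for node in ast.walk(ast.parse(source)):
        # ast.Store marks a name that is being assigned to.
        if (isinstance(node, ast.Name)
                and isinstance(node.ctx, ast.Store)
                and node.id in builtin_names):
            hits.append((node.lineno, node.id))
    return sorted(hits)


report = shadowed_builtins('open = "hello"\nx = 1\nfor list in []: pass\n')
```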

10) There can be something in the middle between the def statement and
the lambda. For example it can be called "fun" (or it can be called
"def" still). With it maybe both def and lambdas aren't necessary
anymore. Examples:
cube = fun x:
    return x**3

sorted(data, fun x,y: return x-y)
(Probably now it's almost impossible to modify this in the language.)

11) This is just a wild idea for an alternative syntax to specify a
global variable inside a function. From:

def foo(x):
    global y
    y = y + 2

To:

def foo(x): global.y = global.y + 2

Besides global.y, maybe there could be a syntax like upper.y or
caller.y that means the name y in the upper context, then upper.upper.y,
etc.

12) Mathematica's interactive IDE suggests possible spelling errors;
this feature is often useful, works with the names of built-in functions
too, and can be switched off.
In[1]:= sin2 = N[Sin[2]]
Out[1]= 0.909297

In[2]:= sina2
General::"spell1": "Possible spelling error: new symbol name "sina2"
is similar to existing symbol "sin2".
Out[2]= sina2

I don't know if some Python IDEs (or IPython) do this already, but it
can be useful in Pythonwin.

13) In the Mathematica language, = has the same meaning as in Python, but
:= is different:

lhs := rhs assigns rhs to be the delayed value of lhs. rhs is
maintained in an unevaluated form. When lhs appears, it is replaced by
rhs, evaluated afresh each time.

I don't know if this can be useful...

------------------

14) In one of my last emails of notes, I've tried to explain the
Pattern Matching programming paradigm of Mathematica. Josiah Carlson
answered me:

http://groups-beta.google.com/group/comp.lang.python/msg/e15600094cb281c1
> In the C/C++ world, that is called polymorphism.
> You can do polymorphism with Python, and decorators may make it
> easier...

This kind of programming is like using a kind of regular expression
on the parameters of functions. Here is a quick list of some operators,
from the (copyrighted) online help:

_ or Blank[ ] is a pattern object that can stand for any Mathematica
expression.
For example this info comes from:
http://documents.wolfram.com/mathematica/functions/Blank
This is used for example in the definition of functions:
f[x_] := x^2

__ (two _ characters) or BlankSequence[ ] is a pattern object that can
stand for any sequence of one or more Mathematica expressions.

___ (three _ characters) or BlankNullSequence[ ] is a pattern object
that can stand for any sequence of zero or more Mathematica
expressions.
___h or BlankNullSequence[h] can stand for any sequence of expressions,
all of which have head h.

p1 | p2 | ... is a pattern object which represents any of the patterns
pi

s:obj represents the pattern object obj, assigned the name s. When a
transformation rule is used, any occurrence of s on the righthand
side is replaced by whatever expression it matched on the lefthand
side. The operator : has a comparatively low precedence. The expression
x:_+_ is thus interpreted as x:(_+_), not (x:_)+_.

p:v is a pattern object which represents an expression of the form p,
which, if omitted, should be replaced by v. Optional is used to specify
"optional arguments" in functions represented by patterns. The pattern
object p gives the form the argument should have, if it is present. The
expression v gives the "default value" to use if the argument is
absent. Example: the pattern f[x_, y_:1] is matched by f[a], with x
taking the value a, and y taking the value 1. It can also be matched by
f[a, b], with y taking the value b.

p.. is a pattern object which represents a sequence of one or more
expressions, each matching p.

p... is a pattern object which represents a sequence of zero or more
expressions, each matching p.

patt /; test is a pattern which matches only if the evaluation of test
yields True.
Example: f[x_] := fp[x] /; x > 1 defines a function in the case when x > 1.
lhs := Module[{vars}, rhs /; test] allows local variables to be shared
between test and rhs. You can use the same construction with Block and
With.

p?test is a pattern object that stands for any expression which matches
p, and on which the application of test gives True. Ex:
p1[x_?NumberQ] := Sqrt[x]
p2[x_?NumericQ] := Sqr[x]

Verbatim[expr] represents expr in pattern matching, requiring that expr
be matched exactly as it appears, with no substitutions for blanks or
other transformations. Verbatim[x_] will match only the actual
expression x_. Verbatim is useful in setting up rules for transforming
other transformation rules.

HoldPattern[expr] is equivalent to expr for pattern matching, but
maintains expr in an unevaluated form.

Orderless is an attribute that can be assigned to a symbol f to
indicate that the elements ei in expressions of the form f[e1, e2, ...]
should automatically be sorted into canonical order. This property is
accounted for in pattern matching.

Flat is an attribute that can be assigned to a symbol f to indicate
that all expressions involving nested functions f should be flattened
out. This property is accounted for in pattern matching.

OneIdentity is an attribute that can be assigned to a symbol f to
indicate that f[x], f[f[x]], etc. are all equivalent to x for the
purpose of pattern matching.

Default[f], if defined, gives the default value for arguments of the
function f obtained with a _. pattern object.
Default[f, i] gives the default value to use when _. appears as the
i-th argument of f.

Cases[{e1, e2, ...}, pattern] gives a list of the ei that match the
pattern.
Cases[{e1, e2, ...}, pattern -> rhs] gives a list of the values of rhs
corresponding to the ei that match the pattern.

Position[expr, pattern] gives a list of the positions at which objects
matching pattern appear in expr.

Select[list, crit] picks out all elements ei of list for which crit[ei]
is True.

DeleteCases[expr, pattern] removes all elements of expr which match
pattern.
DeleteCases[expr, pattern, levspec] removes all parts of expr on levels
specified by levspec which match pattern.
Example : DeleteCases[{1, a, 2, b}, _Integer] ==> {a, b}

Count[list, pattern] gives the number of elements in list that match
pattern.

MatchQ[expr, form] returns True if the pattern form matches expr, and
returns False otherwise.
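
For readers who know Python better than Mathematica, the list functions above have rough counterparts as comprehensions. This is a loose sketch in present-day Python 3: the function names are hypothetical, a "pattern" is reduced to a type test (the analogue of a head pattern such as _Integer), and positions are 0-based where Mathematica's are 1-based. It does not attempt general pattern matching.

```python
def cases(items, head):
    """Rough analogue of Cases[list, _head]."""
    return [x for x in items if isinstance(x, head)]


def delete_cases(items, head):
    """Rough analogue of DeleteCases[list, _head]."""
    return [x for x in items if not isinstance(x, head)]


def count(items, head):
    """Rough analogue of Count[list, _head]."""
    return sum(1 for x in items if isinstance(x, head))


def position(items, head):
    """Rough analogue of Position[list, _head], with 0-based indices."""
    return [i for i, x in enumerate(items) if isinstance(x, head)]


def select(items, crit):
    """Rough analogue of Select[list, crit]."""
    return [x for x in items if crit(x)]


mixed = [1, "a", 2, "b"]
without_ints = delete_cases(mixed, int)   # cf. DeleteCases[{1, a, 2, b}, _Integer]
```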

It may look strange, but an expert can even use it to write small
complete programs... But usually these are used only when necessary.
Note that I'm not suggesting adding all of those to Python.

------------------

15) NetLogo is a kind of logo derived from StarLogo, implemented in
Java.
http://ccl.northwestern.edu/netlogo/
I think it contains some ideas that can be useful for Python too.
- It has some built-in high-level data structures, like the usual turtle
(but usually you use LOTS of turtles at the same time, in parallel),
and the patch (programmable cellular automata layers; each cell can be
programmed and can interact with nearby cells or nearby turtles).
- It contains built-in graphics, because they're often useful for people
who are starting to program, and for lots of other things. In
Python, a tiny and easy "fast" graphics library could be useful
(Tkinter can be used too, but something simpler would help for some
quick&dirty graphics. Maybe this library could also be faster than
Tkinter's pixel plotting and pixel matrix visualisation).
- It contains a few types of built-in graphs to plot variables, etc. (for
Python there are many external plotters).
- Its built-in widgets are really easy to use (they are defined inside
NetLogo and StarLogo source), but they probably look too much toy-like
for Python programs...
- This language contains lots of other nice ideas. Some of them
probably look too much toy-like, still some examples:
http://ccl.northwestern.edu/netlogo/models/
Show that this language is only partially a toy, and it can be useful
to understand and learn nonlinear dynamics of many systems.

This is a source, usually some parts of it (like widget positioning and
parameters) are managed by the IDE:
http://ccl.northwestern.edu/netlogo/models/models/Sample%20Models/Biology/Fur.nlogo
Bye,
bear hugs,
Bearophile

Doug Holton

Dec 28, 2004, 10:17:49 PM

bearoph...@lycos.com wrote:

> for i in 1..12: pass
> for c in "a".."z": pass

....

> @infix
> def interval(x, y): return range(x, y+1) # 2 parameters needed

> assert 5 interval 9 == interval(5,9)

....

> 10) There can be something in the middle between the def statement and
> the lambda.

These will likely not appear in CPython standard, but Livelogix runs on
top of the CPython VM and supports ".." sequences and custom infix
operators: http://logix.livelogix.com/tutorial/5-Standard-Logix.html

> 11) This is just a wild idea for an alternative syntax to specify a
> global variable inside a function. From:
>
> def foo(x):
> global y
> y = y + 2
> (the last two lines are indented)
>
> To:
>
> def foo(x): global.y = global.y + 2
>
> Beside the global.y, maybe it can exist a syntax like upper.y or
> caller.y that means the name y in the upper context. upper.upper.y etc.

This will also likely never appear in Python. I like your idea though.
I implemented the same exact thing a couple months ago. One
difference though, you only need to type out the full "global.y" if you
want to differentiate it from a local variable with the same name.

> 15) NetLogo is a kind of logo derived from StarLogo, implemented in
> Java.

> Show that this language is only partially a toy, and it can be useful
> to understand and learn nonlinear dynamics of many systems.

If you want to do something like Netlogo but using Python instead of
Logo, see: http://repast.sourceforge.net/
You can script repast in jython or you can script repast.net.

Also, you might request the NetLogo and StarLogo developers to support
Jython (in addition to Logo) scripting in their next version (which is
already in development and supports 3D).

Andrew Dalke

Dec 28, 2004, 11:10:23 PM

bearophileHUGS:
[on Python's O(n) list insertion/deletion at any place other than the tail]

> (My hypothesis: to keep list implementation a bit simpler, to avoid
> wasting memory for the head buffer, and to keep them a little faster,
> avoiding the use of the skip index S).

Add its relatively infrequent need.

> 2) I usually prefer explicit verbose syntax, instead of cryptic symbols
> (like decorator syntax), but I like the infix Pascal syntax ".." to
> specify a closed interval (a tuple?) of integers or letters

> assert 1..9 == tuple(range(1,10))
> for i in 1..12: pass
> for c in "a".."z": pass

It's been proposed several times. I thought there was a PEP
but I can't find it. One problem with it; what does

for x in 1 .. "a":

do? (BTW, it needs to be 1 .. 12 not 1..12 because 1. will be
interpreted as the floating point value "1.0".)
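
The tokenizing problem can be observed directly with the standard tokenize module. A short sketch in present-day Python 3 (the helper name number_tokens is hypothetical):

```python
import io
import tokenize


def number_tokens(source):
    """Return the text of every NUMBER token the tokenizer finds."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type == tokenize.NUMBER]


glued = number_tokens("1..12")     # the two dots are absorbed into two floats
spaced = number_tokens("1 .. 12")  # the two dots remain separate tokens
```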

What does
a = MyClass()
b = AnotherClass()
for x in a .. b:
    print x

do? That is, what's the generic protocol? In Pascal it
works because you also specify the type and Pascal has
an incr while Python doesn't.

> 3) I think it can be useful a way to define infix functions, like this
> imaginary decorator syntax:
>
> @infix
> def interval(x, y): return range(x, y+1) # 2 parameters needed
>
> This may allow:
> assert 5 interval 9 == interval(5,9)

Maybe you could give an example of when you need this in
real life?

What does

1 + 2 * 3 interval 9 / 3 - 7

do? That is, what's the precedence? Does this only work
for binary or is there a way to allow unary or other
n-ary (including 0-ary?) functions?

> 4) The printf-style formatting is powerful, but I still think it's
> quite complex for usual purposes, and I usually have to look its syntax
> in the docs. I think the Pascal syntax is nice and simpler to remember
> (especially for someone with a little Pascal/Delphi experience ^_^),

But to someone with C experience or any language which derives
its formatting string from C, Python's is easier to understand than
your Pascal one.

A Python view is that there should be only one obvious way to do
a task. Supporting both C and Pascal style format strings breaks
that. Then again, having both the old % and the new PEP 292 string
templates means there's already two different ways to do string
formatting.

> For example this Delphi program:

...
> const a = -12345.67890;
...
> writeln(a:4:20);
...
> Gives:
...
> -12345.67890000000000000000
Python says that's

>>> "%.20f" % -12345.67890
'-12345.67890000000079453457'

I don't think Pascal is IEEE enough.

Note also that the Pascal-style formatting strings are less
capable than Python's, though few people use features like

>>> "% 2.3f" % -12.34
'-12.340'
>>> "% 2.3f" % 12.34
' 12.340'

> 5) From the docs about round:

...

> But to avoid a bias toward rounding up there is another way do this:

There are even more ways than that. See
http://www.python.org/peps/pep-0327.html#rounding-algorithms

The solution chosen was not to change 'round' but to provide
a new data type -- Decimal. This is in Python 2.4.

> 6) map( function, list, ...) applies function to every item of list and
> return a list of the results. If list is a nested data structure, map
> applies function to just the upper level objects.
> In Mathematica there is another parameter to specify the "level" of the
> apply.

..

> I think this semantic can be extended:

A real-life example would also be helpful here.

What does
map(len, "Blah", level = 200)
return?

In general, most people prefer to not use map and instead
use list comprehensions and (with 2.4) generator expressions.

> level=0 means that the map is performed up to the leaves (0 means
> infinitum, this isn't nice, but it can be useful because I think Python
> doesn't contain a built-in Infinite).

You need to learn more about the Pythonic way of thinking
of things. The usual solution for this is to have "level = None".

> 7) Maybe it can be useful to extended the reload(module) semantic:
> reload(module [, recurse=False])
> If recurse=True it reloads the module and recursively all the modules
> that it imports.

It isn't that simple. Reloading modules isn't sufficient.
Consider

import spam
a = spam.Eggs()
reload(spam)
print isinstance(a, spam.Eggs)

This will print False because a contains a reference to
the old Eggs which contains a reference to the old spam module.
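
The effect can be reproduced with a throwaway module. The sketch below uses present-day Python 3, where reload lives in importlib; the module name spam_demo and the temporary directory are assumptions made purely so the example is self-contained.

```python
import importlib
import pathlib
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for the "spam" module in the example above.
    pathlib.Path(tmp, "spam_demo.py").write_text("class Eggs:\n    pass\n")
    sys.path.insert(0, tmp)
    try:
        import spam_demo
        a = spam_demo.Eggs()
        old_class = spam_demo.Eggs
        importlib.reload(spam_demo)        # re-executes the module's code
        # The class statement ran again, so spam_demo.Eggs is a new object.
        same_class = old_class is spam_demo.Eggs
        still_instance = isinstance(a, spam_demo.Eggs)
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("spam_demo", None)
```

The instance a still belongs to old_class, which is why a recursive reload alone would not update existing objects.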

As I recall, Twisted has some super-reload code that may be
what you want. See
http://twistedmatrix.com/documents/current/api/twisted.python.rebuild.html

> 8) Why reload is a function and import is a statement? (So why isn't
> reload a statement too, or both functions?)

import uses the underlying __import__ function.

Consider using the __import__ function directly

math = __import__("math")

The double use of the name "math" is annoying and error prone.

It's more complicated with module hierarchies.

>>> xml = __import__("xml.sax.handler")
>>> xml.sax.handler
<module 'xml.sax.handler' from '/sw/lib/python2.3/xml/sax/handler.pyc'>
>>> xml.sax.saxutils
Traceback (most recent call last):
File "<stdin>", line 1, in ?
AttributeError: 'module' object has no attribute 'saxutils'
>>> __import__("xml.sax.saxutils")
<module 'xml' from '/sw/lib/python2.3/xml/__init__.pyc'>
>>> xml.sax.saxutils
<module 'xml.sax.saxutils' from '/sw/lib/python2.3/xml/sax/saxutils.pyc'>
>>>
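
In present-day Python 3 the awkwardness with dotted names is usually avoided with importlib.import_module, which did not exist when this thread was written. A brief sketch of the difference:

```python
import importlib

# __import__ with a dotted name returns the top-level package...
top = __import__("xml.sax.handler")

# ...while importlib.import_module returns the submodule that was named.
leaf = importlib.import_module("xml.sax.handler")

names = (top.__name__, leaf.__name__)
```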

Reload takes a reference to the module to reload. Consider
import UserString
x = UserString
reload(x)

This reloads UserString. It could be a statement but there's
no advantage to doing so.

> 9) Functions without a return statement return None: def foo(x): print x
> I think the compiler/interpreter can give a "compilation warning" where
> such function results are assigned to something: y = foo(x)

You might think so but you'd be wrong.

Consider

import random

def vikings():
    pass

def f():
    global vikings
    def vikings():
        print "Spammity spam!"
        return 1.0

if random.random() > 0.5:
    f()

x = vikings()

More realistically I've done

class DebugCall:
    def __init__(self, obj):
        self.__obj = obj
    def __call__(self, *args, **kwargs):
        print "Calling", self.__obj, "with", args, kwargs
        x = self.__obj(*args, **kwargs)
        print "Returned with", x
        return x

from wherever import f
f = DebugCall(f)

I don't want it generating a warning for those cases where
the implicit None is returned.

A more useful check (IMHO) would be to have PyChecker look for cases
where both explicit and implicit returns can occur in the
same function. I don't know if it does that already.

Why haven't you looked at PyChecker to see what it does?

> (I know that some of such cases cannot be spotted at compilation time,
> but the other cases can be useful too). I don't know if PyChecker does
> this already. Generally speaking I'd like to see some of those checks
> built into the normal interpreter. Instructions like:
> open = "hello"
> Are legal, but maybe a "compilation warning" can be useful here too (and
> maybe even a runtime warning if a Verbose flag is set).

There's a PEP for that. See
http://www.python.org/peps/pep-0329.html

Given your questions it would be appropriate for you to read all the
PEPs.

> 10) There can be something in the middle between the def statement and
> the lambda. For example it can be called "fun" (or it can be called
> "def" still). With it maybe both def and lambdas aren't necessary
> anymore. Examples:
> cube = fun x:
> return x**3
> (the last line is indented)
>
> sorted(data, fun x,y: return x-y)
> (Probably now it's almost impossible to modify this in the language.)

It's been talked about. Problems are:
- how is it written?
- how big is the region between a lambda and a def?

Try giving real world examples of when you would
use this 'fun' and compare it to lambda and def forms.
You'll find there's at most one extra line of code
needed. That doesn't seem worthwhile.

>
> 11) This is just a wild idea for an alternative syntax to specify a
> global variable inside a function. From:
>
> def foo(x):
> global y
> y = y + 2
> (the last two lines are indented)
>
> To:
>
> def foo(x): global.y = global.y + 2
>
> Beside the global.y, maybe it can exist a syntax like upper.y or
> caller.y that means the name y in the upper context. upper.upper.y etc.

It does have the advantage of being explicit rather than
implicit.

> 12) Mathematica's interactive IDE suggests possible spelling errors;

I don't know anything about the IDEs. I have enabled the
tab-complete in my interactive Python session which is nice.

I believe IPython is the closest to giving a Mathematica-like feel.
There's also tmPython.

> 13) In Mathematica language the = has the same meaning of Python, but
> := is different:
>
> lhs := rhs assigns rhs to be the delayed value of lhs. rhs is maintained
> in an unevaluated form. When lhs appears, it is replaced by rhs,
> evaluated afresh each time.
>
> I don't know if this can be useful...

Could you explain why it might be useful?

> 14) In one of my last emails of notes, I've tried to explain the Pattern
> Matching programming paradigm of Mathematica. Josiah Carlson answered
> me:

Was there a question in all that?

You are proposing Python include a Prolog-style (or
CLIPS or Linda or ..) programming idiom, yes? Could you
also suggest a case when one would use it?

> 15) NetLogo is a kind of logo derived from StarLogo, implemented in
> Java.

How about the "turtle" standard library?

I must say it's getting pretty annoying to say things like
"when would this be useful?" and "have you read the documentation?"
for your statements.

> Maybe this library can also be faster than the Tkinter pixel
> plotting and the pixel matrix visualisation).

See also matplotlib, chaco, and other libraries that work hard
to make this simple. Have you done any research on what Python
can do or do you think ... no, sorry, I'm getting snippy.

Andrew
da...@dalkescientific.com

Paul Rubin

Dec 28, 2004, 11:20:17 PM

Andrew Dalke <da...@dalkescientific.com> writes:
> What does
> a = MyClass()
> b = AnotherClass()
> for x in a .. b:
> print x
>
> do? That is, what's the generic protocol?

".." just becomes an operator like + or whatever, which you can define
in your class definition:

class MyClass:
    def __dotdot__(self, other):
        return xrange(self.whatsit(), other.whatsit())

The .. operation is required to return an iterator.
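
Since no ".." operator exists, the protocol can only be simulated. In the sketch below (present-day Python 3) the function dotdot stands in for the hypothetical expression a .. b and dispatches to the proposed __dotdot__ method; the class Stop and its whatsit method are made-up names that follow the example above, with range replacing xrange.

```python
def dotdot(a, b):
    """Stand-in for the proposed  a .. b : dispatch to a.__dotdot__(b)."""
    # Look the method up on the type, as Python does for operator methods,
    # and insist on an iterator as the proposal requires.
    return iter(type(a).__dotdot__(a, b))


class Stop:
    def __init__(self, n):
        self.n = n

    def whatsit(self):
        return self.n

    def __dotdot__(self, other):
        return range(self.whatsit(), other.whatsit())


visited = list(dotdot(Stop(2), Stop(5)))
```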

Steven Bethard

Dec 28, 2004, 11:37:15 PM

bearoph...@lycos.com wrote:
> 4) The printf-style formatting is powerful, but I still think it's
> quite complex for usual purposes, and I usually have to look its syntax
> in the docs. I think the Pascal syntax is nice and simpler to remember
> (especially for someone with a little Pascal/Delphi experience ^_^), it
> uses two ":" to format the floating point numbers (the second :number
> is optional). For example this Delphi program:
>
> {$APPTYPE CONSOLE}
> const a = -12345.67890;
> begin
> writeln(a);
> writeln(a:2:0);
> writeln(a:4:2);
> writeln(a:4:20);
> writeln(a:12:2);
> end.
>
> Gives:
> -1.23456789000000E+0004
> -12346
> -12345.68
> -12345.67890000000000000000
> -12345.68
> (The last line starts with 3 spaces)

Even after looking at your example, I have no idea what the two numbers
on each side of the :'s do. The last number appears to be the number of
decimal places to round to, but I don't know what the first number does.

Since I can't figure it out intuitively (even with examples), I don't
think this syntax is any less inscrutable than '%<width>.<decimals>f'.
My suspicion is that you're just biased by your previous use of Pascal.
(Note that I never used Pascal or enough C to use string formatting
before I used Python, so I'm less biased than others may be in this
situation.)

> 6) map( function, list, ...) applies function to every item of list and
> return a list of the results. If list is a nested data structure, map
> applies function to just the upper level objects.
> In Mathematica there is another parameter to specify the "level" of the
> apply.
>
> So:
> map(str, [[1,[2]], 3])
> ==>
> ['[1, [2]]', '3']
>
> With a hypothetical level (default level = 1, it gives the normal
> Python map):
>
> map(str, [[1,[2]], 3], level=1)
> ==>
> ['[1, [2]]', '3']
>
> map(str, [[1,[2]], 3], level=2)
> ==>
> ['1', '[2]', '3']
>
> I think this semantic can be extended:
> level=0 means that the map is performed up to the leaves (0 means
> infinitum, this isn't nice, but it can be useful because I think Python
> doesn't contain a built-in Infinite).
> level=-1 means that the map is performed to the level just before the
> leaves.
> Level=-n means that the map is performed n levels before the leaves.

This packs two things into map -- the true mapping behavior (applying a
function to a list) and the flattening of a list. Why don't you lobby
for a builtin flatten instead? (Also, Google for flatten in the
python-list -- you should find a recent thread about it.)

> 10) There can be something in the middle between the def statement and
> the lambda. For example it can be called "fun" (or it can be called
> "def" still). With it maybe both def and lambdas aren't necessary
> anymore. Examples:
> cube = fun x:
> return x**3
> (the last line is indented)
>
> sorted(data, fun x,y: return x-y)
> (Probably now it's almost impossible to modify this in the language.)

Google the python-list for 'anonymous function' or 'anonymous def' and
you'll find a ton of discussion about this kind of thing. I'll note
only that your first example gains nothing over

def cube(x):
    return x**3

and that your second example gains nothing over

sorted(data, lambda x, y: x-y)

or

sorted(data, operator.sub)

> 11) This is just a wild idea for an alternative syntax to specify a
> global variable inside a function. From:
>
> def foo(x):
> global y
> y = y + 2
> (the last two lines are indented)
>
> To:
>
> def foo(x): global.y = global.y + 2

Well, you can do:

def foo(x):
    globals()['y'] = globals()['y'] + 2

Not exactly the same syntax, but pretty close.

> 13) In Mathematica language the = has the same meaning of Python, but
> := is different:
>
> lhs := rhs assigns rhs to be the delayed value of lhs. rhs is
> maintained in an unevaluated form. When lhs appears, it is replaced by
> rhs, evaluated afresh each time.
>
> I don't know if this can be useful...

You can almost get this behavior with lambdas, e.g.:

x = lambda: delayed_expression()

then you can get a new instance of the expression by simply doing:

new_instance = x()

I know this isn't exactly what you're asking for, but this is one
current possibility that does something similar. You might also look at:

http://www.python.org/peps/pep-0312.html

which suggests a simpler syntax for this kind of usage.
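
A tiny sketch of the lambda approach in present-day Python 3, showing that the wrapped expression is evaluated afresh on each call; the variable names are only illustrative.

```python
rate = 2
delayed = lambda: rate * 10   # nothing is computed at this point

first = delayed()             # evaluates rate * 10 while rate is 2
rate = 3
second = delayed()            # evaluated again, now with rate equal to 3
```

The visible difference from Mathematica's := is the explicit call: the name delayed alone refers to the function, not to its value.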

Steve

Steven Bethard

Dec 28, 2004, 11:54:11 PM

Andrew Dalke wrote:
> I must say it's getting pretty annoying to say things like
> "when would this be useful?" and "have you read the documentation?"
> for your statements.

I'll second that. Please, "Bearophile", do us the courtesy of checking

(1) Google groups archive of the mailing list:
http://groups-beta.google.com/group/comp.lang.python

and

(2) The Python Enhancement Proposals:
http://www.python.org/peps/

before posting another such set of questions. While most of the people
on this list are nice enough to answer your questions anyway, the
answers are already out there for at least half of your questions, if
you would do us the courtesy of checking first.

Thanks!

Steve

Andrew Dalke

Dec 29, 2004, 2:41:36 AM

Paul Rubin wrote:
> ".." just becomes an operator like + or whatever, which you can define
> in your class definition:
>
> class MyClass:
> def __dotdot__(self, other):
> return xrange(self.whatsit(), other.whatsit())
>
> The .. operation is required to return an iterator.

Ahh, I see.

This should be put into a PEP. Some obvious questions:
- ".." or "..." ? The latter is already available for
use in slices

- If "..." should the name be "__ellipsis__"? If ".."
should the name be "__range__"?

- Should range(x, y) first attempt x.__range__(y)?

- Can it be used outside of a for statement? Ie, does
x = "a" ... "b"
return a generic iterator? Almost certainly as that
fits in nicely with the existing 'for' syntax.

- What's the precedence? Given
x = a .. b .. c
x = 1 + 2 .. 5 * 3
x = 1 ** 5 .. 4 ** 2
etc., what is meant? Most likely .. should have the
lowest precedence, evaluated left to right.

- is there an "__rdotdot__"?

- any way to specify "use the normal beginning"? Like
for x in .. 5: # same as 0 .. 5
-or (the oft rejected)-
for x in 5:

- any way to specify "ad infinitum"? Like
for x in 0 .. Infinity:
-or-
for x in 0 ... :

- is "for x in 10 .. 0" the same as xrange(10,0) (what
you propose) or xrange(10, 0, -1)?

- do strings work for len(s) > 1? Eg, "A000" .. "A999"?

- What do you think of (assuming the use of "...")
for x in 0.....100:?

- What's the advantage of ".." over, say a function or
method? That is, why does the new binary operator
prove so useful? In one of my gedanken experiments
I considered getting the 10th through 20th prime
for x in primes(10) .. primes(20):
but that seems clumsier than
for x in primes_between(10, 20):
in part because it requires "primes(10)" to have some
meaning. I suppose
for x in primes(10) .. 20:
could also work though that should in my mind generate
primes up to the number 20 and not the 20th prime.
Note that primes(10) cannot return the 10th prime as
the value 29.
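
The reversed-interval question in the list above already has an analogue in the existing built-in. A minimal sketch, assuming Python 3 (where range behaves like the xrange of 2.4); it says nothing about what a ".." operator would or should do:

```python
# range() never infers a direction: a start above the stop yields nothing
# unless a negative step is passed explicitly.
ascending_only = list(range(10, 0))
explicit_descent = list(range(10, 0, -1))

print(ascending_only)
print(explicit_descent)
```

Any ".." proposal would have to pick one of these two behaviours, or define a third.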

Andrew
da...@dalkescientific.com

Mike Meyer

Dec 29, 2004, 12:42:00 PM

to

bearoph...@lycos.com writes:

> @infix
> def interval(x, y): return range(x, y+1) # 2 parameters needed
>
> This may allow:
> assert 5 interval 9 == interval(5,9)

I don't like the idea of turning words into operators. I'd much rather
see something like:

@infix('..')
def interval(x, y):
    return range(x, y + 1)

assert 5 .. 9 == interval(5, 10)

This would also allow us to start working on doing away with the magic
method names for current operators, which I think would be an
improvement.

As others have pointed out, you do need to do something about operator
precedence. For existing operators, that's easy - they keep their
precedence. For new operators, it's harder.

You also need to worry about binding order. At the very least, you
can specify that all new operators bind left to right. But that might
not be what you want.

<mike
--
Mike Meyer <m...@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.

Steve Holden

Dec 29, 2004, 1:11:43 PM

to

Mike Meyer wrote:

> bearoph...@lycos.com writes:
>
>
>>@infix
>>def interval(x, y): return range(x, y+1) # 2 parameters needed
>>
>>This may allow:
>>assert 5 interval 9 == interval(5,9)
>
>
> I don't like the idea of turning words into operators. I'd much rather
> see something like:
>
> @infix('..')
> def interval(x, y):
>     return range(x, y + 1)
>
> assert 5 .. 9 == interval(5, 10)
>
> This would also allow us to start working on doing away with the magic
> method names for current operators, which I think would be an
> improvement.
>

Well, perhaps you can explain how a change that's made at run time
(calling the decorator) can affect the parser's compile time behavior,
then. At the moment, IIRC, the only way Python code can affect the
parser's behavior is in the __future__ module, which must be imported at
the very head of a module.

> As others have pointed out, you do need to do something about operator
> precedence. For existing operators, that's easy - they keep their
> precedence. For new operators, it's harder.
>

Clearly you'd need some mechanism to specify precedence, either
relatively or absolutely. I seem to remember Algol 68 allowed this.

> You also need to worry about binding order. At the very least, you
> can specify that all new operators bind left to right. But that might
> not be what you want.
>

Associativity and precedence will also have to affect the parsing of the
code, of course. Overall this would be a very ambitious change.

regards
Steve
--
Steve Holden http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/
Holden Web LLC +1 703 861 4237 +1 800 494 3119

Jp Calderone

Dec 29, 2004, 1:24:53 PM

to pytho...@python.org

On Wed, 29 Dec 2004 11:42:00 -0600, Mike Meyer <m...@mired.org> wrote:
>bearoph...@lycos.com writes:
>
> > @infix
> > def interval(x, y): return range(x, y+1) # 2 parameters needed
> >
> > This may allow:
> > assert 5 interval 9 == interval(5,9)
>
> I don't like the idea of turning words into operators. I'd much rather
> see something like:
>

Really? I like "not", "and", "or", "is", and "in". It would not be nice
if they were replaced with punctuation.

This aside, not even Python 3.0 will be flexible enough to let you define
an infix decorator. The language developers are strongly against supporting
macros, which is what an infix decorator would amount to.

Now, they might be convinced to add a new syntax that makes a function
into an infix operator. Perhaps something like this:

def &(..)(x, y):
    return range(x, y + 1)

This is much better than supporting macros because it will lead to a 6
month long debate over the exact syntax and adds exactly zero other
features to the language (whereas macros let you pick your own syntax
and give you essentially unlimited other abilities).

Jp

Mike Meyer

Dec 29, 2004, 1:38:56 PM

to

Steve Holden <st...@holdenweb.com> writes:

> Mike Meyer wrote:
>
>> bearoph...@lycos.com writes:
>>
>>>@infix
>>>def interval(x, y): return range(x, y+1) # 2 parameters needed
>>>
>>>This may allow:
>>>assert 5 interval 9 == interval(5,9)
>> I don't like the idea of turning words into operators. I'd much
>> rather
>> see something like:
>> @infix('..')
>> def interval(x, y):
>> return range(x, y + 1)
>> assert 5 .. 9 == interval(5, 10)
>> This would also allow us to start working on doing away with the
>> magic
>> method names for current operators, which I think would be an
>> improvement.
>>
> Well, perhaps you can explain how a change that's made at run time
> (calling the decorator) can affect the parser's compile time behavior,
> then. At the moment, IIRC, the only way Python code can affect the
> parser's behavior is in the __future__ module, which must be imported
> at the very head of a module.

By modifying the parser's grammar at runtime. After all, it's just a
data structure that's internal to the compiler.

Mike Meyer

Dec 29, 2004, 1:38:02 PM

to

Jp Calderone <exa...@divmod.com> writes:

> On Wed, 29 Dec 2004 11:42:00 -0600, Mike Meyer <m...@mired.org> wrote:
>>bearoph...@lycos.com writes:
>>
>> > @infix
>> > def interval(x, y): return range(x, y+1) # 2 parameters needed
>> >
>> > This may allow:
>> > assert 5 interval 9 == interval(5,9)
>>
>> I don't like the idea of turning words into operators. I'd much rather
>> see something like:
>
> Really? I like "not", "and", "or", "is", and "in". It would not be nice
> if they were replaced with punctuation.

They can't be turned into operators - they already are.

> This aside, not even Python 3.0 will be flexible enough to let you define
> an infix decorator. The language developers are strongly against supporting
> macros, which is what an infix decorator would amount to.

Could you please explain how allowing new infix operators amount to
supporting macros?

> Now, they might be convinced to add a new syntax that makes a function
> into an infix operator. Perhaps something like this:
>
> def &(..)(x, y):
>     return range(x, y + 1)

And while you're at it, explain how this method of defining new infix
operators differs from using decorators in such a way that it doesn't
amount to supporting macros.

Jp Calderone

Dec 29, 2004, 1:54:04 PM

to pytho...@python.org

On Wed, 29 Dec 2004 12:38:02 -0600, Mike Meyer <m...@mired.org> wrote:
>Jp Calderone <exa...@divmod.com> writes:
>
> > On Wed, 29 Dec 2004 11:42:00 -0600, Mike Meyer <m...@mired.org> wrote:
> >>bearoph...@lycos.com writes:
> >>
> >> > @infix
> >> > def interval(x, y): return range(x, y+1) # 2 parameters needed
> >> >
> >> > This may allow:
> >> > assert 5 interval 9 == interval(5,9)
> >>
> >> I don't like the idea of turning words into operators. I'd much rather
> >> see something like:
> >
> > Really? I like "not", "and", "or", "is", and "in". It would not be nice
> > if they were replaced with punctuation.
>
> They can't be turned into operators - they already are.
>

They weren't operators at some point (if necessary, select this point
prior to the creation of the first programming language). Later, they
were. Presumably in the interim someone turned them into operators.

> > This aside, not even Python 3.0 will be flexible enough to let you define
> > an infix decorator. The language developers are strongly against supporting
> > macros, which is what an infix decorator would amount to.
>
> Could you please explain how allowing new infix operators amount to
> supporting macros?

You misread - I said "what an infix decorator would amount to". Adding
new infix operators is fine and in no way equivalent to macros.

As you said in your reply to Steve Holden in this thread, one way support
for @infix could be done is to allow the decorator to modify the parser's
grammar. Doesn't this sound like a macro to you?

>
> > Now, they might be convinced to add a new syntax that makes a function
> > into an infix operator. Perhaps something like this:
> >
> > def &(..)(x, y):
> >     return range(x, y + 1)
>
> And while you're at it, explain how this method of defining new infix
> operators differs from using decorators in such a way that it doesn't
> amount to supporting macros.

Simple. You can't do anything except define a new infix operator with
the hypothetical "def &( <operator> )" syntax. With real macros, you can
define new infix operators, along with any other syntactic construct your
heart desires.

Jp

Steve Holden

Dec 29, 2004, 2:33:16 PM

to

Mike Meyer wrote:

> Steve Holden <st...@holdenweb.com> writes:
>
[...]

>>
>>Well, perhaps you can explain how a change that's made at run time
>>(calling the decorator) can affect the parser's compile time behavior,
>>then. At the moment, IIRC, the only way Python code can affect the
>>parser's behavior is in the __future__ module, which must be imported
>>at the very head of a module.
>
>
> By modifying the parser's grammar at runtime. After all, it's just a
> data structure that's internal to the compiler.
>

But the parser executes before the compiled program runs, was my point.
What strange mixture of compilation and interpretation are you going to
use so the parser actually understands that ".." (say) is an operator
before the operator definition has been executed?

regards
Steve
--
Steve Holden http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/

Holden Web LLC +1 703 861 4237 +1 800 494 3119

Terry Reedy

Dec 29, 2004, 2:59:36 PM

to pytho...@python.org

"Steven Bethard" <steven....@gmail.com> wrote in message
news:TLqAd.719331$mD.298058@attbi_s02...

> I'll second that. Please, "Bearophile", do us the courtesy of checking
>
> (1) Google groups archive of the mailing list:
> http://groups-beta.google.com/group/comp.lang.python
>
> and
>
> (2) The Python Enhancement Proposals:
> http://www.python.org/peps/
>
> before posting another such set of questions. While most of the people
> on this list are nice enough to answer your questions anyway, the answers
> are already out there for at least half of your questions, if you would
> do us the courtesy of checking first.

I also suggest perusing the archived PyDev (Python Development mailing
list) summaries for the last couple of years (see python.org). Every two
weeks, Brett Cannon has condensed everything down to a few pages. You can
easily focus on the topics of interest to you. For instance, there was
discussion of making lists truly double-ended, but it was decided to settle
for certain improvements in the list implementation while adding
collections.deque (sp?) in 2.4.
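
For readers who have not met it yet, a minimal sketch of the double-ended behaviour that collections.deque provides (the variable names are only illustrative):

```python
from collections import deque

d = deque([2, 3])
d.appendleft(1)      # cheap insertion at the head, unlike list.insert(0, ...)
d.append(4)          # appending at the tail works as it does for lists
head = d.popleft()   # removes and returns the leftmost item
tail = d.pop()       # removes and returns the rightmost item
remaining = list(d)

print(head, tail, remaining)
```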

Terry J. Reedy

Terry Reedy

Dec 29, 2004, 3:19:50 PM

to pytho...@python.org

"Mike Meyer" <m...@mired.org> wrote in message
news:86acrxt...@guru.mired.org...

> Steve Holden <st...@holdenweb.com> writes:
>> Well, perhaps you can explain how a change that's made at run time
>> (calling the decorator) can affect the parser's compile time behavior,
>> then. At the moment, IIRC, the only way Python code can affect the
>> parser's behavior is in the __future__ module, which must be imported
>> at the very head of a module.
>
> By modifying the parser's grammar at runtime. After all, it's just a
> data structure that's internal to the compiler.

Given that xx.py is parsed in its entirety *before* runtime, that answer is
no answer at all. Runtime parser changes (far, far from trivial) could
only affect the result of exec and eval.
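
The ordering can be checked directly. A small sketch, assuming Python 3 and a made-up two-line module body: the whole source is compiled first, so the syntax error on the second line prevents the first line from ever running.

```python
executed = []

# Hypothetical module body: one valid statement, then the proposed operator.
source = (
    "executed.append('first statement ran')\n"
    "x = 5 .. 9\n"
)

try:
    code = compile(source, "<demo>", "exec")
except SyntaxError:
    code = None   # parsing failed before any statement could execute

if code is not None:
    exec(code, {"executed": executed})

print(code, executed)
```

A decorator defined earlier in the same file would be in the same position as the first statement here: it cannot run before the parser has already rejected the file.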

Terry J. Reedy

Bengt Richter

Dec 29, 2004, 9:38:55 PM

to

On Wed, 29 Dec 2004 11:42:00 -0600, Mike Meyer <m...@mired.org> wrote:

>bearoph...@lycos.com writes:
>
>> @infix
>> def interval(x, y): return range(x, y+1) # 2 parameters needed
>>
>> This may allow:
>> assert 5 interval 9 == interval(5,9)
>
>I don't like the idea of turning words into operators. I'd much rather
>see something like:
>
>@infix('..')
>def interval(x, y):
> return range(x, y + 1)
>
>assert 5 .. 9 == interval(5, 10)
>

I like that for punctuation-character names.

OTOH, there is precedent in e.g. fortran (IIRC) for named operators of the
form .XX. -- e.g., .GE. for >= so maybe there could be room for both.

A capitalization _convention_ would make such infix operators pretty readable
even if the names weren't always the best, e.g.,

for x in 5 .UP_TO_AND_INCLUDING. 9:
    ...

Hm, you could make

x .infix. y

syntactic sugar in general (almost like a macro) for

infix(x, y)

and you could generalize that to permit .expression. with no embedded spaces, e.g.,

x .my_infix_module.infix. y
for
my_infix_module.infix(x, y)

or to illustrate expression generality ridiculously,

a = x .my_infix_module.tricky_func_factory_dict[random.choice(range(4))](time.time()). y

for
a = my_infix_module.tricky_func_factory_dict[random.choice(range(4))](time.time())(x, y)

<aside>
I wonder if it's a good idea to post ridiculous but functionally illustrative uses
of possibly good ideas, or if the net effect detracts from the reception of the ideas.
Also verbiage like this ;-/
</aside>

If '..' were special-cased as a synonym for __nullname__ you could handle it
by def __nullname__(x, y): whatever, but since it's punctuation-only, your @infix('..')
seems preferable.

Hm, ... if single (to exclude .infix.) missing dotted names defaulted to __nullname__,
I wonder what that would open up ;-) obj. would be obj.__nullname__, which could be
defined as a concise way of referring to the one special attribute or property.
And .name would be __nullname__.name -- which with the appropriate descriptor definition
in the function class could make .name syntactic sugar for self.name (or whatever the first arg
name was). ... just musing ;-)

>This would also allow us to start working on doing away with the magic
>method names for current operators, which I think would be an
>improvement.
>
>As others have pointed out, you do need to do something about operator
>precedence. For existing operators, that's easy - they keep their
>precedence. For new operators, it's harder.
>
>You also need to worry about binding order. At the very least, you
>can specify that all new operators bind left to right. But that might
>not be what you want.

Yes. My .expression. infix if implemented essentially as a left-right
rewrite-as-soon-as-you-recognize macro would create the precedence rules
of the macro-rewritten source. Which might cover a fair amount of useful
ground. Needs to be explored though. E.g.,

x .op1. y .op2. z => op2(op1(x, y), z)

But you could override with parens in the ordinary way:

x .op1. (y .op2. z) => op1(x, op2(y, z))

But
2 * x + 3 .op1. y => 2 * x + op1(3, y)

Etc. Well, something to think about ;-)
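
The left-to-right rewrite for the simplest case can be sketched as a source-to-source helper. This is only an illustration under strong assumptions: rewrite_infix is a made-up name, it works on text rather than on tokens, it handles flat chains of operands only, and it does not reproduce the parenthesised or tighter-binding cases shown above.

```python
import re

# Matches " .name. " with whitespace on both sides and captures the name.
_INFIX = re.compile(r"\s+\.(\w+)\.\s+")

def rewrite_infix(expr):
    """Rewrite 'a .f. b .g. c' into 'g(f(a, b), c)', folding left to right."""
    parts = _INFIX.split(expr.strip())   # [operand, op, operand, op, operand, ...]
    result = parts[0]
    for op, operand in zip(parts[1::2], parts[2::2]):
        result = "%s(%s, %s)" % (op, result, operand)
    return result

rewritten = rewrite_infix("x .op1. y .op2. z")
print(rewritten)
```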

Regards,
Bengt Richter

Andrew Dalke

Dec 29, 2004, 10:37:38 PM

to

Bengt Richter:

> OTOH, there is precedent in e.g. fortran (IIRC) for named operators of the
> form .XX. -- e.g., .GE. for >= so maybe there could be room for both.

> Hm, you could make
>
> x .infix. y

> x .op1. y .op2. z => op2(op1(x, y), z)

The problem being that that's already legal syntax

>>> class Xyzzy:
...     def __init__(self):
...         self.op1 = self.op2 = self.y = self
...         self.z = "Nothing happens here"
...
>>> x = Xyzzy()
>>> x .op1. y .op2. z
'Nothing happens here'
>>>
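
The same point can be confirmed without defining a class. A small sketch, assuming Python 3's ast module: the spaced form parses to exactly the same tree as ordinary attribute access.

```python
import ast

tree = ast.parse("x .op1. y", mode="eval")
outer = tree.body      # the final ".y" attribute lookup
inner = outer.value    # the ".op1" lookup on the name x

# Whitespace around the dots makes no difference to the parser.
same_tree = ast.dump(tree) == ast.dump(ast.parse("x.op1.y", mode="eval"))
print(type(outer).__name__, outer.attr, inner.attr, same_tree)
```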

Andrew
da...@dalkescientific.com

Bengt Richter

Dec 29, 2004, 10:55:12 PM

to

On Wed, 29 Dec 2004 13:11:43 -0500, Steve Holden <st...@holdenweb.com> wrote:

>Mike Meyer wrote:
>
>> bearoph...@lycos.com writes:
>>
>>
>>>@infix
>>>def interval(x, y): return range(x, y+1) # 2 parameters needed
>>>
>>>This may allow:
>>>assert 5 interval 9 == interval(5,9)
>>
>>
>> I don't like the idea of turning words into operators. I'd much rather
>> see something like:
>>
>> @infix('..')
>> def interval(x, y):
>> return range(x, y + 1)
>>
>> assert 5 .. 9 == interval(5, 10)
>>
>> This would also allow us to start working on doing away with the magic
>> method names for current operators, which I think would be an
>> improvement.
>>
>Well, perhaps you can explain how a change that's made at run time
>(calling the decorator) can affect the parser's compile time behavior,
>then. At the moment, IIRC, the only way Python code can affect the
>parser's behavior is in the __future__ module, which must be imported at
>the very head of a module.

Good point, which I didn't address in my reply. (in fact I said I liked
@infix('..') for punctuation-char-named ops, but I was too busy with my
idea to think about that implementation ;-)

Potentially, you could do it dynamically with a frame flag (to limit the damage)
which said, check all ops against a dict of overloads and infix definitions while
executing byte code for this frame. Of course, the compiler would have to defer
some kinds of syntax error 'til run time. I.e., '..' would have to be translated
to OP_POSSIBLE_CUSTOM_INFIX or such. I doubt if it would be worth doing.

OTOH, I think my suggestion might be ;-) I.e., just do a macro-like (not a general
macro capability for this!!) translation of expressions with dots at both ends and
no embedded spaces (initial thought, to make things easier) thus:
x .expr. y => expr(x, y)

when expr is a simple name, you can use that expression format to call a two-arg function
of that name, e.g.,

def interval(x, y): return xrange(x, y+1)
for i in x .interval. y: print i, # same as for i in interval(x, y): print i,

but you could also write stuff like

def GE(x,y): return x is MY_INFINITY or x >= y
if x .GE. y: print 'x is greater than y'

The .expr. as expression would allow module references or tapping into general
expression and attribute magic etc. I.e., (untested)

from itertools import chain as CHAIN
for k,v in d1.items() .CHAIN. d2.items(): print k, v

or if you had itertools imported and liked verbose infix spelling:

for k,v in d1.items() .itertools.chain. d2.items(): print k, v

or define a hidden-attribute access operation using an object's dict

def HATTR(obj, i):
    try: return vars(obj)[i]
    except KeyError: raise AttributeError('No such attribute: %r' % (i,))

if thing .HATTR. 2 == 'two': print 'well spelled'

or
from rational import rat as RAT

if x .RAT. y > 1 .RAT. 3: do_it()

or
your turn ;-)
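
For comparison, something close to a named infix call can be had today without any syntax change, by overloading an existing operator. This is a different technique from the .expr. proposal above, not an implementation of it; the Infix class and the |name| spelling are assumptions made for illustration, and the result inherits the precedence of "|".

```python
class Infix:
    """Wrap a two-argument function so it can be written as  a |op| b."""

    def __init__(self, func):
        self.func = func

    def __ror__(self, left):
        # Handles "left | op": remember the left operand.
        return Infix(lambda right: self.func(left, right))

    def __or__(self, right):
        # Handles "(left | op) | right": supply the right operand and call.
        return self.func(right)

@Infix
def interval(x, y):
    return range(x, y + 1)

result = list(5 |interval| 9)
print(result)
```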

>
>> As others have pointed out, you do need to do something about operator
>> precedence. For existing operators, that's easy - they keep their
>> precedence. For new operators, it's harder.
>>
>Clearly you'd need some mechanism to specify precedence, either
>relatively or absolutely. I seem to remember Algol 68 allowed this.
>
>> You also need to worry about binding order. At the very least, you
>> can specify that all new operators bind left to right. But that might
>> not be what you want.
>>
>Associativity and precedence will also have to affect the parsing of the
>code, of course. Overall this would be a very ambitious change.
>

My suggestion if implemented with left-right priority would be easy to
implement (I think ;-) And you could always override order with parens.

Regards,
Bengt Richter

Bengt Richter

Dec 29, 2004, 11:46:25 PM

to

On Thu, 30 Dec 2004 03:37:38 GMT, Andrew Dalke <da...@dalkescientific.com> wrote:

>Bengt Richter:
>> OTOH, there is precedent in e.g. fortran (IIRC) for named operators of the
>> form .XX. -- e.g., .GE. for >= so maybe there could be room for both.
>
>> Hm, you could make
>>
>> x .infix. y
>
>
>> x .op1. y .op2. z => op2(op1(x, y), z)
>
>The problem being that that's already legal syntax

D'oh ;-)

>
>>>> class Xyzzy:
>...     def __init__(self):
>...         self.op1 = self.op2 = self.y = self
>...         self.z = "Nothing happens here"
>...
>>>> x = Xyzzy()
>>>> x .op1. y .op2. z
>'Nothing happens here'
>>>>

Ok, well, that's happened to me before ;-)
We'll have to find a way to make it illegal, but it's not likely to be quite as clean.

x ..OP y
x ./OP y
x .<OP y
x .<OP>. y
X ._OP_. y
x ..OP.. y # for symmetry ;-)

X .(OP). y # emphasizes the .expression. returning a function accepting two args

That might be the best one.

OTOH some time ago I was thinking of .(statements). as a possible tokenizer-time star-gate into
a default-empty tokenizer-dynamically-created module which would essentially exec the statements
in that module and return the value of the last expression as a string for insertion into the token
source code stream at that point being tokenized.

Thus e.g. you could have source that said

compile_time = .(__import__('time').ctime()).

and get a time stamp string into the source text at tokenization time.

I had also thought obj.(expr) could be syntactic sugar for obj.__dict__[expr]
but that would also interfere ;-)

So maybe .(OP). should be for infix, and .[stargate exec args]. should be for that ;-)

Regards,
Bengt Richter

Bengt Richter

Dec 29, 2004, 11:55:40 PM

to

On Thu, 30 Dec 2004 03:55:12 GMT, bo...@oz.net (Bengt Richter) wrote:
[.. buncha stuff alzheimersly based on x<spaces>.attr not being parsed as x.attr ;-/ ]

> from rational import rat as RAT
>
> if x .RAT. y > 1 .RAT. 3: do_it()
>
>or
> your turn ;-)
>

Andrew got there first ;-)
Still, see my reply to his for more opportunities ;-)

Regards,
Bengt Richter

Bengt Richter

Dec 30, 2004, 12:28:41 AM

to

On Thu, 30 Dec 2004 04:46:25 GMT, bo...@oz.net (Bengt Richter) wrote:
[...]

>Ok, well, that's happened to me before ;-)
>We'll have to find a way to make it illegal, but it's not likely to be quite as clean.
>
> x ..OP y
> x ./OP y
> x .<OP y
> x .<OP>. y
> X ._OP_. y

Bzzzt! ;-/

> x ..OP.. y # for symmetry ;-)
>
> X .(OP). y # emphasizes the .expression. returning a function accepting two args
>
>That might be the best one.

Regards,
Bengt Richter

Steve Holden

Dec 30, 2004, 8:07:35 AM

to

Bengt Richter wrote:

> On Wed, 29 Dec 2004 13:11:43 -0500, Steve Holden <st...@holdenweb.com> wrote:
>

[...]

>>
>>Well, perhaps you can explain how a change that's made at run time
>>(calling the decorator) can affect the parser's compile time behavior,
>>then. At the moment, IIRC, the only way Python code can affect the
>>parser's behavior is in the __future__ module, which must be imported at
>>the very head of a module.
>
> Good point, which I didn't address in my reply. (in fact I said I liked
> @infix('..') for punctuation-char-named ops, but I was too busy with my
> idea to think about that implementation ;-)
>

Well, that explains the lack of detail. I realize that you are more
likely than most to be able to come up with an implementation.

> Potentially, you could do it dynamically with a frame flag (to limit the damage)
> which said, check all ops against a dict of overloads and infix definitions while
> executing byte code for this frame. Of course, the compiler would have to defer
> some kinds of syntax error 'til run time. I.e., '..' would have to be translated
> to OP_POSSIBLE_CUSTOM_INFIX or such. I doubt if it would be worth doing.
>

Right. I can't say I think deferring syntax errors until runtime is a
good idea.

Well, I can see that Fortran programmers might appreciate it :-). And I
suppose that the syntax is at least regular enough to be lexed with the
current framework, give or take. It would lead to potential breakage due
to the syntactic ambiguity between

module .function. attribute

and

module.function.attribute

though I don't honestly think that most people currently insert
gratuitous spaces into attribute references.

>
[precedence and associativity]

>
> My suggestion if implemented with left-right priority would be easy to
> implement (I think ;-) And you could always override order with parens.

Now you're just trying to make it easy :-)

regards
Steve
--
Steve Holden http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/

Holden Web LLC +1 703 861 4237 +1 800 494 3119

Mike Meyer

Dec 30, 2004, 4:16:42 PM

to

Jp Calderone <exa...@divmod.com> writes:

> On Wed, 29 Dec 2004 12:38:02 -0600, Mike Meyer <m...@mired.org> wrote:
>>Jp Calderone <exa...@divmod.com> writes:
>> > This aside, not even Python 3.0 will be flexible enough to let you define
>> > an infix decorator. The language developers are strongly against supporting
>> > macros, which is what an infix decorator would amount to.
>>
>> Could you please explain how allowing new infix operators amount to
>> supporting macros?
>
> You misread - I said "what an infix decorator would amount to". Adding
> new infix operators is fine and in no way equivalent to macros.

You misread, I said "allowing new infix operators amount to supporting
macros?"

>> > Now, they might be convinced to add a new syntax that makes a function
>> > into an infix operator. Perhaps something like this:
>> >
>> > def &(..)(x, y):
>> >     return range(x, y + 1)
>>
>> And while you're at it, explain how this method of defining new infix
>> operators differs from using decorators in such a way that it doesn't
>> amount to supporting macros.
>
> Simple. You can't do anything except define a new infix operator with
> the hypothetical "def &( <operator> )" syntax. With real macros, you can
> define new infix operators, along with any other syntactic construct your
> heart desires.

You failed to answer the question. We have two proposed methods for
adding new infix operators. One uses decorators, one uses a new magic
syntax for infix operators. Neither allows you to do anything except
declare new decorators. For some reason, you haven't explained yet,
you think that using decorators to declare infix operators would
amount to macros, yet using a new syntax wouldn't. Both require
modifying the grammar of the language accepted by the parser. How is
it that one such modification "amounts to macros", whereas the other
doesn't?

Mike Meyer

Dec 30, 2004, 4:18:54 PM

to

Steve Holden <st...@holdenweb.com> writes:

> Mike Meyer wrote:
>
>> Steve Holden <st...@holdenweb.com> writes:
>>
> [...]
>>>
>>>Well, perhaps you can explain how a change that's made at run time
>>>(calling the decorator) can affect the parser's compile time behavior,
>>>then. At the moment, IIRC, the only way Python code can affect the
>>>parser's behavior is in the __future__ module, which must be imported
>>>at the very head of a module.
>> By modifying the parser's grammar at runtime. After all, it's just a
>> data structure that's internal to the compiler.
>>
> But the parser executes before the compiled program runs, was my
> point. What strange mixture of compilation and interpretation are you
> going to use so the parser actually understands that ".." (say) is an
> operator before the operator definition has been executed?

Ok, current decorators won't do. Clearly, any support for adding infix
operators is going to require compiler support.

Mike Meyer

Dec 30, 2004, 4:21:00 PM

to

"Terry Reedy" <tjr...@udel.edu> writes:

and import. I.e., you could do:

import french
import python_with_french_keywords

Jp Calderone

Dec 30, 2004, 6:39:43 PM

to pytho...@python.org

On Thu, 30 Dec 2004 15:16:42 -0600, Mike Meyer <m...@mired.org> wrote:
>Jp Calderone <exa...@divmod.com> writes:
>
> > On Wed, 29 Dec 2004 12:38:02 -0600, Mike Meyer <m...@mired.org> wrote:
> >>Jp Calderone <exa...@divmod.com> writes:
> >> > This aside, not even Python 3.0 will be flexible enough to let you define
> >> > an infix decorator. The language developers are strongly against supporting
> >> > macros, which is what an infix decorator would amount to.
> >>
> >> Could you please explain how allowing new infix operators amount to
> >> supporting macros?
> >
> > You misread - I said "what an infix decorator would amount to". Adding
> > new infix operators is fine and in no way equivalent to macros.
>
> You misread, I said "allowing new infix operators amount to supporting
> macros?"

Clearly neither of us is following the point of the other here.
Let me start over.

Defining an infix decorator means changing the language in such
a way that one function can change the syntax rules used in the
definition of another function. It has nothing to do with the '@'
syntax, since:

@x
def y(...): ...

is no different than:

def y(...): ...
y = x(y)

If one works to define an infix operator, both should. Otherwise,
what is being used is not a "decorator" as it is currently defined.

So my initial point was that If @infix("..") somehow works but
y = infix("..")(y) does not, it is not a decorator. If it does,
then I believe what you have added to Python is macros.
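
The equivalence this argument rests on is easy to check. A minimal sketch with a made-up decorator named tag: the "@" form and the explicit rebinding produce the same kind of object, and neither has any influence on how the function body was parsed.

```python
def tag(func):
    # Hypothetical decorator: marks the function and hands it back unchanged.
    func.tagged = True
    return func

@tag
def with_syntax(a, b):
    return a + b

def without_syntax(a, b):
    return a + b
without_syntax = tag(without_syntax)   # what the "@tag" line is sugar for

print(with_syntax.tagged, without_syntax.tagged)
```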

>
> >> > Now, they might be convinced to add a new syntax that makes a function
> >> > into an infix operator. Perhaps something like this:
> >> >
> >> > def &(..)(x, y):
> >> >     return range(x, y + 1)
> >>
> >> And while you're at it, explain how this method of defining new infix
> >> operators differs from using decorators in such a way that it doesn't
> >> amount to supporting macros.
> >
> > Simple. You can't do anything except define a new infix operator with
> > the hypothetical "def &( <operator> )" syntax. With real macros, you can
> > define new infix operators, along with any other syntactic construct your
> > heart desires.
>
> You failed to answer the question. We have two proposed methods for
> adding new infix operators. One uses decorators, one uses a new magic
> syntax for infix operators. Neither allows you to do anything except
> declare new decorators. For some reason, you haven't explained yet,

              ^^^^^^^^^^
I assume you mean "infix operators" here. If so, this is likely
the point on which we disagree.

If you are suggesting adding "@infix('[punctuation]')" to the Python
grammar, then I understand how you would see that as something
less than macros. This is not how I interpreted the remarks.

> you think that using decorators to declare infix operators would
> amount to macros, yet using a new syntax wouldn't. Both require
> modifying the grammar of the language accepted by the parser. How is
> it that one such modification "amounts to macros", whereas the other
> doesn't?

Adding a new syntax for infix operator declaration is the same
as the addition of any other syntax. It has nothing to do with
macros, it is just new syntax.

Adding the ability for arbitrary functions to modify the syntax
used to define other arbitrary functions, what would have to
happen for @infix('..') to work, is adding macros.

Hope I have expressed things more clearly,

Jp

bearoph...@lycos.com

Jan 5, 2005, 7:40:47 PM

to

Thank you to all the gentle people that have given me some comments, and
sorry for bothering you...

Doug Holton:

>This will also likely never appear in Python.

I know, that's why I've defined it "wild".

>Also, you might request the NetLogo and StarLogo developers to support
Jython (in addition to Logo) scripting in their next version<

I was suggesting the idea of adding the complex data structures of
NetLogo (patches, etc) to the normal Python.

-------------

Andrew Dalke:

>(BTW, it needs to be 1 .. 12 not 1..12 because 1. will be interpreted
as the floating point value "1.0".)<

Uhm, I have to fix my ignorance about parsers.
Cannot a second "." after the first tell that the first "." isn't in
the middle of a floating point number?
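
As far as I can tell the tokenizer is greedy: it takes the longest number it can before looking at what follows, so "1." is already a complete float when the second dot is reached. A small sketch, assuming Python 3's tokenize module (number_tokens is just a helper name made up here):

```python
import io
import tokenize

def number_tokens(text):
    """Return the source text of every NUMBER token found in `text`."""
    readline = io.StringIO(text + "\n").readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type == tokenize.NUMBER]

glued = number_tokens("1..12")     # read as the float "1." followed by ".12"
spaced = number_tokens("1 .. 12")  # two integers with two separate dots between

# The same greediness is why this is already legal: float 1.0, then .__add__
float_sum = eval("1..__add__(2)")
print(glued, spaced, float_sum)
```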

>In Pascal it works because you also specify the type and Pascal has an
incr while Python doesn't.<

The Pascal succ function (that works on chars and all integer types) is
often useful (it's easy to define it in Python too).
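
A minimal sketch of such a succ in present-day Python 3, together with a closed-interval helper built on it. The names succ and closed_interval are illustrative only (not an existing library API), and only integers and single-character strings are handled:

```python
def succ(x):
    """Return the successor of an integer or of a one-character string."""
    if isinstance(x, str):
        # Step to the next code point, e.g. "a" -> "b".
        return chr(ord(x) + 1)
    return x + 1


def closed_interval(first, last):
    """Return a tuple with every value from first to last, both included."""
    items = []
    current = first
    while current <= last:
        items.append(current)
        current = succ(current)
    return tuple(items)


print(closed_interval(1, 9))
print("".join(closed_interval("a", "e")))
```

With these definitions closed_interval(1, 9) plays the role of the proposed 1..9, and closed_interval("a", "z") that of "a".."z".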

>>This may allow: assert 5 interval 9 == interval(5,9)

>Maybe you could give an example of when you need this in real life?<

Every time you have a function with 2 parameters, you can choose to use
it infix.

>Does this only work for binary or is there a way to allow unary or
other n-ary (including 0-ary?) functions?<

It's for binary functions only.
For unary the normal fun(x) syntax is good enough, and for n-ary the
syntax becomes unnecessarily complex, and you can use the normal function
syntax again.
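
For comparison, something close to infix calls of binary functions can be emulated in unmodified Python by overloading an existing operator. The sketch below uses "|"; the Infix class and the interval function are illustrative names, and the surrounding "|" characters are still required, so this is not the bare "5 interval 9" syntax proposed above:

```python
class Infix:
    """Wrap a two-argument function so it can be written as  x |f| y."""

    def __init__(self, func):
        self.func = func

    def __ror__(self, left):
        # Handles "left | f": remember the left operand.
        return Infix(lambda right: self.func(left, right))

    def __or__(self, right):
        # Handles "(left | f) | right": supply the right operand and call.
        return self.func(right)

    def __call__(self, x, y):
        # The normal prefix call still works.
        return self.func(x, y)


@Infix
def interval(x, y):
    return list(range(x, y + 1))  # 2 parameters needed


assert (5 |interval| 9) == interval(5, 9)
```

This works only because int does not know how to "or" itself with an Infix object, so Python falls back to Infix.__ror__.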

>But to someone with C experience or any language which derives its
formatting string from C, Python's is easier to understand than your
Pascal one.<

You may be right, but "understanding" and "being already used to
something" are still two different things :-)

>A Python view is that there should be only one obvious way to do a
task. Supporting both C and Pascal style format strings breaks that.<

Okay.

>I don't think Pascal is IEEE enough.<

Well, if you assign that FP number inside the program (and not as a
constant) Delphi 4 too gives a more "correct" answer ^_^ (I don't
know/remember why there are such differences inside Delphi between
constants and variables).

>note also that the Pascal-style formatting strings are less capable
than Python's,<

I know...

>though few people use features like<

Right.

>A real-life example would also be helpful here.<

(I was parsing trees for Genetic Programming).
People here can probably suggest 10 alternative ways to do what I
was trying to do :-)
Like list comprehensions, that suggestion of mine cannot solve new kinds
of problems, it's just another way of doing things.

>What does map(len, "Blah", level = 200) return?<

Well:
"c" == "c"[0][0][0][0][0][0]
map(len, "blah") == [1, 1, 1, 1]
So I think that answer is still [1, 1, 1, 1].

>You need to learn more about the Pythonic way of thinking of things.
The usual solution for this is to have "level = None".<

To Steven Bethard I've suggested an alternative syntax (using a
level-ed flatten with a standard map command).

>There's also tmPython.<

Thank you, the screenshots are quite nice :-)
http://dkbza.org/11.0.html

--------------

Steven Bethard:

>Since I can't figure it out intuitively (even with examples), I don't
think this syntax is any less inscrutable than '%<width>.<decimals>f'.<

Well, I haven't given a textual explanation, but just a few examples.

>My suspicion is that you're just biased by your previous use of
Pascal.<

This is possible.

>This packs two things into map -- the true mapping behaviour (applying
a function to a list) and the flattening of a list.<

Okay, then this is my suggestion for the syntax of an iterable
xflatten:
xflatten(sequence, level=-1, monadtuple=False, monadstr=True,
         safe=False)

- level specifies the flattening level:
level=0 no flattening.
level=1 the flattening is applied to the first and second level.
Etc.
And like in the indexing of lists:
level=-1 (default) means the flattening is applied up to the leaves.
level=-2 flattens up to pre-leaves.
Etc.
- monadtuple (default False) if True tuples are monads.
- monadstr (default True) if False then strings with len>1 are
sequences too.
- safe (default False) if True it checks (with something like an
iterative isrecursive) for recursive references inside the sequence.
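
A rough Python 3 sketch of the interface described above, under simplifying assumptions: only level >= 0 and level=-1 are handled (level=-2, "up to pre-leaves", is not implemented), the safe check for recursive references is left out, and "monad" is read as "treated as a leaf". In this reading level=1 removes one layer of nesting:

```python
def xflatten(sequence, level=-1, monadtuple=False, monadstr=True):
    """Lazily flatten nested iterables (simplified sketch of the proposal)."""
    if level < -1:
        raise ValueError("this sketch only supports level >= -1")

    def is_leaf(obj):
        if isinstance(obj, str):
            # A one-character string is always a leaf; otherwise iterating
            # over it would recurse forever.
            return monadstr or len(obj) <= 1
        if isinstance(obj, tuple):
            return monadtuple
        return not hasattr(obj, "__iter__")

    def walk(items, depth):
        for item in items:
            if depth == 0 or is_leaf(item):
                yield item
            else:
                # With level=-1 the depth never reaches 0, so the walk
                # continues down to the leaves.
                yield from walk(item, depth - 1)

    return walk(sequence, level)


nested = [1, [2, (3, [4, "ab"])]]
print(list(xflatten(nested)))
print(list(xflatten(nested, level=1)))
print(list(xflatten(nested, monadstr=False)))
```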

>(Also, Google for flatten in the python-list -- you should find a
recent thread about it.)<

I've discussed it too in the past :-)
http://groups-beta.google.com/group/comp.lang.python/browse_thread/thread/d0ba195d98f35f66/

>and that your second example gains nothing over<

Right, but maybe with that you can unify the def and lambda into just
one thing ^_^

>def foo(x):
> globals()['y'] = globals()['y'] + 2
>Not exactly the same syntax, but pretty close.

Thank you for this idea.

>I'll second that. Please, "Bearophile", do us the courtesy of checking
[...] before posting another such set of questions. While most of the
people on this list are nice enough to answer your questions anyway,
the answers are already out there for at least half of your questions,
if you would do us the courtesy of checking first.<

I've read many documents, articles and PEPs, but I'm still new, so
I've missed many things. I'm sorry... I'm doing my best.

-----------

Terry J. Reedy

>I also suggest perusing the archived PyDev (Python Development mailing
list) summaries for the last couple of years (see python.org). Every
two weeks, Brett Cannon has condensed everything down to a few pages.<

Okay, thank you.

Bear hugs,
Bearophile

Timo Virkkala

Jan 6, 2005, 5:08:13 AM

bearoph...@lycos.com wrote:
> Andrew Dalke:
>>(BTW, it needs to be 1 .. 12 not 1..12 because 1. will be interpreted
>> as the floating point value "1.0".)<
> Uhm, I have to fix my ignorance about parsers.
> Cannot a second "." after the first tell that the first "." isn't in
> the middle of a floating point number?

Python uses an LL(1) parser. From Wikipedia:
""" LL(1) grammars, although fairly restrictive, are very popular because the
corresponding LL parsers only need to look at the next token to make their
parsing decisions."""

>>>This may allow: assert 5 interval 9 == interval(5,9)
>>Maybe you could give an example of when you need this in real life?<
> Every time you have a function with 2 parameters, you can choose to use
> it infix.

But why would you want to? What advantage does this give over the standard
syntax? Remember, in Python philosophy, there should be one obvious way to do
it, and preferably only one. Adding a whole other way of calling functions
complicates things without adding much advantage. Especially so because you
suggest it is only used for binary, i.e. two-parameter functions.

Steve Holden

Jan 6, 2005, 8:45:21 AM

Timo Virkkala wrote:

> bearoph...@lycos.com wrote:
>
>> Andrew Dalke:
>>
>>> (BTW, it needs to be 1 .. 12 not 1..12 because 1. will be interpreted
>>> as the floating point value "1.0".)<
>>
>> Uhm, I have to fix my ignorance about parsers.
>> Cannot a second "." after the first tell that the first "." isn't in
>> the middle of a floating point number?
>
>
> Python uses an LL(1) parser. From Wikipedia:
> """ LL(1) grammars, although fairly restrictive, are very popular
> because the corresponding LL parsers only need to look at the next token
> to make their parsing decisions."""
>

Indeed, but if ".." is defined as an acceptable token then there's
nothing to stop a strict LL(1) parser from disambiguating the cases in
question. "Token" is not the same thing as "character".

>>>> This may allow: assert 5 interval 9 == interval(5,9)
>>>
>>> Maybe you could give an example of when you need this in real life?<
>>
>> Every time you have a function with 2 parameters, you can choose to use
>> it infix.
>
>
> But why would you want to? What advantage does this give over the
> standard syntax? Remember, in Python philosophy, there should be one
> obvious way to do it, and preferably only one. Adding a whole other
> way of calling functions complicates things without adding much
> advantage. Especially so because you suggest it is only used for binary,
> i.e. two-parameter functions.

This part of your comments I completely agree with. However, we are used
to people coming along and suggesting changes to Python on
comp.lang.python. Ironically it's often those with less experience of
Python who suggest it should be changed to be more like some other language.

One of the things I like best about c.l.py is its (almost) unfailing
politeness to such posters, often despite long stream-of-consciousness
posts suggesting fatuous changes (not necessarily the case here, by the
way). The level of debate is so high, and so rational, that the change
requesters are often educated as to why their suggested changes wouldn't
be helpful or acceptable, and having come to jeer they remain to
whitewash, to use an analogy from "Tom Sawyer" [1].

All in all a very pleasant change from "F*&% off and die, noob".

regards
Steve

[1]: http://www.cliffsnotes.com/WileyCDA/LitNote/id-2,pageNum-10.html

Andrew Dalke

Jan 6, 2005, 2:24:52 PM

Me

>>>> (BTW, it needs to be 1 .. 12 not 1..12 because 1. will be interpreted
>>>> as the floating point value "1.0".)<

Steve Holden:

> Indeed, but if ".." is defined as an acceptable token then there's
> nothing to stop a strict LL(1) parser from disambiguating the cases in
> question. "Token" is not the same thing as "character".

Python's tokenizer is greedy and doesn't take part in the
lookahead. When it sees 1..12 the longest match is for "1."
which is a float. What remains is ".12". That also gets tokenized
as a float. <float> <float> is not allowed in Python so the
parser raises a compile-time SyntaxError exception.

>>> 1..12
File "<stdin>", line 1
1..12
^
SyntaxError: invalid syntax
>>>

Consider the alternative of "1..a". Again "1." is tokenized
as a float. What remains is ".a". The longest match is "."
with "a" remaining. Then the next token is "a". The token
stream looks like
<float 1.0><dot><name "a">
which gets converted to the same thing as
getattr(1.0, "a")

That is legal syntax but floats don't have the "a" property
so Python raises an AttributeError at run-time.

>>> 1..a
Traceback (most recent call last):
File "<stdin>", line 1, in ?
AttributeError: 'float' object has no attribute 'a'
>>>

Here's a property that does exist

>>> 1..__abs__
<method-wrapper object at 0x547d0>
>>>

Because of the greedy lexing it isn't possible to do
"1.__abs__" to get the __abs__ method of an integer.
That's because the token stream is
<float 1.0><name "__abs__">
which is a syntax error.

>>> 1.__abs__
File "<stdin>", line 1
1.__abs__
^
SyntaxError: invalid syntax
>>>

One way to avoid that is to use "1 .__abs__". See the
space after the "1"? The tokenizer for this case creates
<integer 1><dot><name "__abs__">
which creates code equivalent to getattr(1, "__abs__") and
is valid syntax

>>> 1 .__abs__
<method-wrapper object at 0x54ab0>
>>>

Another option is to use parentheses: (1).__abs__

I prefer this latter option because the () is easier to
see than a space. But I prefer using getattr even more.
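
The same greedy behaviour can be observed from Python itself with the standard tokenize module. This is a small sketch for current Python 3 (the helper name token_strings is made up for illustration); it keeps only number, name and operator tokens:

```python
import io
import tokenize


def token_strings(source):
    """Return the text of the number, name and operator tokens in source."""
    wanted = (tokenize.NUMBER, tokenize.NAME, tokenize.OP)
    readline = io.StringIO(source).readline
    return [tok.string
            for tok in tokenize.generate_tokens(readline)
            if tok.type in wanted]


print(token_strings("1..12"))    # ['1.', '.12']
print(token_strings("1..a"))     # ['1.', '.', 'a']
print(token_strings("1 .. 12"))  # ['1', '.', '.', '12']
```

Only the version with spaces yields the integer 1 followed by two separate dots.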

Andrew
da...@dalkescientific.com

Bengt Richter

Jan 6, 2005, 4:09:45 PM

On Thu, 06 Jan 2005 19:24:52 GMT, Andrew Dalke <da...@dalkescientific.com> wrote:

>Me
>>>>> (BTW, it needs to be 1 .. 12 not 1..12 because 1. will be interpreted
>>>>> as the floating point value "1.0".)<
>
>Steve Holden:
>> Indeed, but if ".." is defined as an acceptable token then there's
>> nothing to stop a strict LL(1) parser from disambiguating the cases in
>> question. "Token" is not the same thing as "character".
>
>Python's tokenizer is greedy and doesn't take part in the
>lookahead. When it sees 1..12 the longest match is for "1."
>

But it does look ahead to recognize += (i.e., it doesn't generate two
successive also-legal tokens of '+' and '=')
so it seems it should be a simple fix.

>>> for t in tokenize.generate_tokens(StringIO.StringIO('a=b+c; a+=2; x..y').readline):print t
...
(1, 'a', (1, 0), (1, 1), 'a=b+c; a+=2; x..y')
(51, '=', (1, 1), (1, 2), 'a=b+c; a+=2; x..y')
(1, 'b', (1, 2), (1, 3), 'a=b+c; a+=2; x..y')
(51, '+', (1, 3), (1, 4), 'a=b+c; a+=2; x..y')
(1, 'c', (1, 4), (1, 5), 'a=b+c; a+=2; x..y')
(51, ';', (1, 5), (1, 6), 'a=b+c; a+=2; x..y')
(1, 'a', (1, 7), (1, 8), 'a=b+c; a+=2; x..y')
(51, '+=', (1, 8), (1, 10), 'a=b+c; a+=2; x..y')
(2, '2', (1, 10), (1, 11), 'a=b+c; a+=2; x..y')
(51, ';', (1, 11), (1, 12), 'a=b+c; a+=2; x..y')
(1, 'x', (1, 13), (1, 14), 'a=b+c; a+=2; x..y')
(51, '.', (1, 14), (1, 15), 'a=b+c; a+=2; x..y')
(51, '.', (1, 15), (1, 16), 'a=b+c; a+=2; x..y')
(1, 'y', (1, 16), (1, 17), 'a=b+c; a+=2; x..y')
(0, '', (2, 0), (2, 0), '')

Regards,
Bengt Richter

Andrew Dalke

Jan 7, 2005, 1:04:01 AM

Bengt Richter:

> But it does look ahead to recognize += (i.e., it doesn't generate two
> successive also-legal tokens of '+' and '=')
> so it seems it should be a simple fix.

But that works precisely because of the greedy nature of tokenization.
Given "a+=2" the longest token it finds first is "a" because "a+"
is not a valid token. The next token is "+=". It isn't just "+"
because "+=" is valid. And the last token is "2".

Compare to "a+ =2". In this case the tokens are "a", "+", "=", "2"
and the result is a syntax error.
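
The longest-match rule can be sketched with a toy tokenizer. This is an illustration only, not CPython's tokenizer; the operator set and the function name munch are assumptions made for the example:

```python
OPERATORS = ("+=", "+", "=")


def munch(source):
    """Split source into tokens, always taking the longest possible match."""
    tokens = []
    pos = 0
    while pos < len(source):
        ch = source[pos]
        if ch.isspace():
            # Whitespace separates tokens but is not a token itself.
            pos += 1
            continue
        if ch.isalnum() or ch == "_":
            # Names and numbers: extend as far as the characters allow.
            end = pos
            while end < len(source) and (source[end].isalnum() or source[end] == "_"):
                end += 1
        else:
            # Operators: try the longest candidates first.
            for op in sorted(OPERATORS, key=len, reverse=True):
                if source.startswith(op, pos):
                    end = pos + len(op)
                    break
            else:
                raise ValueError("unexpected character %r" % ch)
        tokens.append(source[pos:end])
        pos = end
    return tokens


print(munch("a+=2"))   # ['a', '+=', '2']
print(munch("a+ =2"))  # ['a', '+', '=', '2']
```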

> >>> for t in tokenize.generate_tokens(StringIO.StringIO('a=b+c; a+=2; x..y').readline):print t
> ...

This reinforces what I'm saying, no? Otherwise I don't understand
your reason for showing it.

> (51, '+=', (1, 8), (1, 10), 'a=b+c; a+=2; x..y')

As I said, the "+=" is found as a single token, and not as two
tokens merged into __iadd__ by the parser.

After some thought I realized that a short explanation may be helpful.
There are two stages in parsing a data file, at least in the standard
CS way of viewing things. First, tokenize the input. This turns
characters into words. Second, parse the words into a structure.
The result is a parse tree.

Both steps can do a sort of look-ahead. Tokenizers usually only look
ahead one character. These are almost invariably based on regular
expressions. There are many different parsing algorithms, with
different tradeoffs. Python's is an LL(1) parser. The (1) means it
can look ahead one token to resolve ambiguities in a language.
(The LL is part of a classification scheme which summarizes how
the algorithm works.)

Consider if 1..3 were to be legal syntax. Then the tokenizer
would need to note the ambiguity that the first token could be
a "1." or a "1". If "1." then the next token could be a "."
or a ".3". In fact, here is the full list of possible choices

<1.> <.> <3>      same as getattr(1., 3)
<1> <.> <.> <3>   not legal syntax
<1.> <.3>         not legal syntax
<1> <..> <3>      legal with the proposed syntax.

Some parsers can handle this ambiguity, but Python's
deliberately does not. Why? Because people also find
it tricky to resolve ambiguity (hence problems with
precedence rules). After all, should 1..2 be interpreted
as 1. . 2 or as 1 .. 2? What about 1...2? (Is it 1. .. 2,
1 .. .2 or 1. . .2 ?)

Andrew
da...@dalkescientific.com

Steve Holden

Jan 7, 2005, 6:36:31 AM

Andrew Dalke wrote:

> Bengt Richter:
>
>>But it does look ahead to recognize += (i.e., it doesn't generate two
>>successive also-legal tokens of '+' and '=')
>>so it seems it should be a simple fix.
>
>
> But that works precisely because of the greedy nature of tokenization.
> Given "a+=2" the longest token it finds first is "a" because "a+"
> is not a valid token. The next token is "+=". It isn't just "+"
> because "+=" is valid. And the last token is "2".
>

[...]

You're absolutely right, of course, Andrew, and personally I don't think
that this is worth trying to fix. But the original post I responded to
was suggesting that an LL(1) grammar couldn't disambiguate "1." and
"1..3", which assertion relied on a slight fuzzing of the lines between
lexical and syntactical analysis that I didn't want to leave unsharpened.

The fact that Python's existing tokenizer doesn't allow multi-character
tokens beginning with a dot after a digit (roughly speaking) is what
makes the whole syntax proposal infeasibly hard to adapt to.

regards
Steve

Bengt Richter

Jan 7, 2005, 4:31:25 PM

On Fri, 07 Jan 2005 06:04:01 GMT, Andrew Dalke <da...@dalkescientific.com> wrote:

>Bengt Richter:
>> But it does look ahead to recognize += (i.e., it doesn't generate two
>> successive also-legal tokens of '+' and '=')
>> so it seems it should be a simple fix.
>
>But that works precisely because of the greedy nature of tokenization.

So what happens if you apply greediness to a grammar that has both . and ..
as legal tokens? That's the point I was trying to make. The current grammar
unfortunately IMHO tries to tokenize floating point numbers, which creates a problem
for both numbers and (if you don't isolate it with surrounding spaces) the .. token.

There would UIAM be no problem recognizing 1 .. 2 but 1..2 has the problem that
the current tokenizer recognizes 1. as number format. Otherwise the greediness would
work to solve the 1..2 "problem."

IMHO it is a mistake to form floating point at the tokenizer level, and a similar mistake
follows at the AST level in using platform-specific floating point constants (doubles) to
represent what really should preserve full representational accuracy to enable translation
to other potential native formats e.g. if cross compiling. Native floating point should IMO
not be formed until the platform to which it is supposed to be native is identified.

IOW, I think there is a fix: keep tokenizing greedily and tokenize floating point as
a sequence of integers and operators, and let <integer><dot><integer> be translated by
the compiler to floating point, and <integer><dotdot><integer> be translated to the
appropriate generator expression implementation.
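
For what it is worth, a regex-based toy tokenizer can separate the two cases without splitting floats into integers and dots. The sketch below is a different and simpler technique than the one proposed above: it keeps float tokens but uses a negative lookahead so that digits followed by ".." are not consumed as a float. The token names and the pattern are assumptions for illustration, not Python's actual grammar:

```python
import re

TOKEN_RE = re.compile(r"""
    (?P<float>  \d+\.(?!\.)\d*(?:[eE][+-]?\d+)?   # 1.  1.5  1.e3, never digits + two dots
              | \.\d+(?:[eE][+-]?\d+)?            # .5
              | \d+[eE][+-]?\d+ )                 # 1e3
  | (?P<int>    \d+ )
  | (?P<dotdot> \.\. )
  | (?P<dot>    \. )
  | (?P<name>   [A-Za-z_]\w* )
  | (?P<ws>     \s+ )
""", re.VERBOSE)


def tokens(source):
    """Return (kind, text) pairs for source, skipping whitespace."""
    result = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError("unexpected character at position %d" % pos)
        if match.lastgroup != "ws":
            result.append((match.lastgroup, match.group()))
        pos = match.end()
    return result


print(tokens("1..2"))  # [('int', '1'), ('dotdot', '..'), ('int', '2')]
print(tokens("1.5"))   # [('float', '1.5')]
```

The price is that "1." directly followed by a dot can no longer be a float, so an expression like 1..__abs__ would change meaning under such a rule.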

>Given "a+=2" the longest token it finds first is "a" because "a+"
>is not a valid token. The next token is "+=". It isn't just "+"
>because "+=" is valid. And the last token is "2".
>

Exactly. Or am I missing something? (which does happen ;-)

>Compare to "a+ =2". In this case the tokens are "a", "+", "=", "2"
>and the result is a syntax error.

Sure.

>
>> >>> for t in tokenize.generate_tokens(StringIO.StringIO('a=b+c; a+=2; x..y').readline):print t
>> ...
>
>This reinforces what I'm saying, no? Otherwise I don't understand
>your reason for showing it.

It does reinforce your saying the matching is greedy, yes. But that led me to
think that .. could be recognized without a problem, given a grammar fix.

>
>> (51, '+=', (1, 8), (1, 10), 'a=b+c; a+=2; x..y')
>
>As I said, the "+=" is found as a single token, and not as two
>tokens merged into __iadd__ by the parser.

No argument.

>
>After some thought I realized that a short explanation may be helpful.
>There are two stages in parsing a data file, at least in the standard
>CS way of viewing things. First, tokenize the input. This turns
>characters into words. Second, parse the words into a structure.
>The result is a parse tree.
>
>Both steps can do a sort of look-ahead. Tokenizers usually only look
>ahead one character. These are almost invariably based on regular
>expressions. There are many different parsing algorithms, with
>different tradeoffs. Python's is a LL(1) parser. The (1) means it
>can look ahead one token to resolve ambiguities in a language.
>(The LL is part of a classification scheme which summarizes how
>the algorithm works.)
>
>Consider if 1..3 were to be legal syntax. Then the tokenizer
>would need to note the ambiguity that the first token could be
>a "1." or a "1". If "1." then the next token could be a "."
>or a ".3". In fact, here is the full list of possible choices
>
> <1.> <.> <3> same as getattr(1., 3)
> <1> <.> <.> 3 not legal syntax
> <1.> <.3> not legal syntax
> <1> <..> <3> legal with the proposed syntax.
>

Right, but a grammar fix to handle floating point "properly" (IMHO ;-)
would resolve that. Only the last would be legal at the tokenizer stage.

>Some parsers can handle this ambiguity, but Python's
>deliberately does not. Why? Because people also find

I'm not sure what you mean. If it has a policy of greedy matching,
that does handle some ambiguities in a particular way.

>it tricky to resolve ambiguity (hence problems with
>precedence rules). After all, should 1..2 be interpreted
>as 1. . 2 or as 1 .. 2? What about 1...2? (Is it 1. .. 2,
                 ^^^^^^[1]                        ^^^^^^^[2]
[1] plainly (given my grammar changes ;-)
[2] as plainly not, since <1> would not greedily accept '.' to make <1.>

>1 .. .2 or 1. . .2 ?)
 ^^^^^^[3]  ^^^^^^^[4]
[3] no, because greed would recognize an ellipsis <...>
[4] ditto. Greedy tokenization would produce <1> <...> <2> for the compiler.

Regards,
Bengt Richter

Nick Coghlan

Jan 8, 2005, 3:22:53 AM

to Python List

Bengt Richter wrote:
> IOW, I think there is a fix: keep tokenizing greedily and tokenize floating point as
> a sequence of integers and operators, and let <integer><dot><integer> be translated by
> the compiler to floating point, and <integer><dotdot><integer> be translated to the
> appropriate generator expression implementation.

That would be:

<int-literal><dot><int-literal> -> float(<int-literal> + "." + <int-literal>)
<int-literal><dot><identifier> -> getattr(int(<int-literal>), <identifier>)
<int-literal><dot><dot><int-literal> -> xrange(<int-literal>, <int-literal>)

However, the problem comes when you realise that 1e3 is also a floating point
literal, as is 1.1e3.

Cheers,
Nick.

--
Nick Coghlan | ncog...@email.com | Brisbane, Australia
---------------------------------------------------------------
http://boredomandlaziness.skystorm.net

Bengt Richter

Jan 8, 2005, 5:55:36 AM

On Sat, 08 Jan 2005 18:22:53 +1000, Nick Coghlan <ncog...@iinet.net.au> wrote:

>Bengt Richter wrote:
>> IOW, I think there is a fix: keep tokenizing greedily and tokenize floating point as
>> a sequence of integers and operators, and let <integer><dot><integer> be translated by
>> the compiler to floating point, and <integer><dotdot><integer> be translated to the
>> appropriate generator expression implementation.
>
>That would be:
>
><int-literal><dot><int-literal> -> float(<int-literal> + "." + <int-literal>)
><int-literal><dot><identifier> -> getattr(int(<int-literal>), <identifier>)
><int-literal><dot><dot><int-literal> -> xrange(<int-literal>, <int-literal>)
>
>However, the problem comes when you realise that 1e3 is also a floating point
>literal, as is 1.1e3.
>

Ok, that requires a little more imagination ;-)

I think it can be solved, but I haven't explored it all the way through ;-)

The key seems to be to condition the recognition of tokens as if recognizing
an old potentially floating point number, but emitting number-relevant separate tokens
so long as there are no embedded spaces. When a number ends, emitting an end marker should
permit the compiler to deal with the various compositions.

We still aren't looking ahead more than one, but we are carrying context, just as we do
to accumulate digits of an integer or characters of a name, but the context may continue
and determine what further tokens are emitted. E.g. the 'e' in the embedded numeric context,
becomes <fexp> rather than a name. In the following, <eon> :== end of number token

1.1    -> <1><dot><1><eon>
1 .1   -> <1><eon><dot><1><eon>
1.e1   -> <1><dot><fexp><1><eon>
1 .e1  -> <1><eon><dot><e1>
1.2e3  -> <1><dot><2><fexp><3><eon>
1..2   -> <1><eon><doubledot><2><eon>
1. .2  -> <1><dot><eon><dot><2><eon> (syntax error)
1 ..2  -> <1><eon><doubledot><2><eon>

I just have the feeling that there is a solution, whatever the hiccups ;-)

Regards,
Bengt Richter

beli...@aol.com

Jan 8, 2005, 9:18:39 AM

Bengt Richter wrote:

>OTOH, there is precedent in e.g. fortran (IIRC) for named operators of the
>form .XX. -- e.g., .GE. for >= so maybe there could be room for both.

Yes, but in Fortran 90 "==", ">=" etc. are equivalent to ".EQ." and
".GE.". It is also possible to define operators on native and
user-defined types, so that

Y = A .tx. B

can be written instead of the expression with the F90 intrinsic
functions

Y = matmul(transpose(A),B)

The Fortran 95 package Matran at
http://www.cs.umd.edu/~stewart/matran/Matran.html uses this approach to
simplify the interface of the Lapack library and provide syntax similar
to that of Matlab and Octave.

I don't know if the syntax of your idea clashes with Python, but it is
viable in general.