
ANN: Stackless Python 0.3


Christian Tismer

Jul 12, 1999

ANNOUNCING:

Stackless Python 0.3
A Python Implementation Which
Does Not Use The C Stack

What is it?
A plug-in replacement for core Python.
It should run any program which runs under Python 1.5.2,
but it does not need space on the C stack.

Why did I write it?
Stackless Python had never been written before (AFAIK), since it
was said to be impossible without major rewrites of core Python.
I am proving the contrary:
it is easy to write, just hard to think through.

Recent changes:
Version 0.3 was written while developing a continuation
module which is about to ship. A couple of enhancements and
changes were necessary to make continuations fly. Some drastic
changes to frame management have been made. Stackless Python
is now also able to keep track of recursive interpreter
incarnations and to detect/resolve possible collisions
between their frames. As a proof of concept, continuationsmodule.c
will ship in a few days.

Who needs it?
At the moment, this is only useful for C programmers
who want to try certain new ideas. Hardcore stuff.
It allows you to modify the current execution state by
changing the frame stack chain without restrictions,
and it allows for pluggable interpreters on a per-frame basis.

The possibilities are for instance:

Continuations, with which we will build Coroutines and Generators

Restartable exceptions and persistent execution state
are the next topics to think about. They are possible.

Stackless extension modules can be built. The new builtin
stackless "map" function is a small example of this.

Coroutines are able to run at the speed of
a single C function call, which makes them a serious
alternative in certain algorithms.
This is no longer speculation, since I have working
coroutine prototypes, written with continuations.

Status of the project:
Stackless-ness has been implemented and tested with pystone.
pystone works correctly and is about 10% slower than
with standard Python.

What I need at the moment is
- time to do the right design of coroutines on top of continuations
- your input, your testing, criticism and hints.

Some still-rough documentation is available at
http://www.pns.cc/stackless/stackless.htm

Source code and a VC++ 6.0 build for Windows can be reached
from the document or downloaded directly:
ftp://ftp.pns.cc/pub/stackless_990712.zip
ftp://ftp.pns.cc/pub/stackless_990712_win32.zip

cheers - chris

--
Christian Tismer :^) <mailto:tis...@appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net
10553 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
we're tired of banana software - shipped green, ripens at home

Michael Hudson

Jul 13, 1999

Christian Tismer <tis...@appliedbiometrics.com> writes:

> ANNOUNCING:
>
> Stackless Python 0.3
> A Python Implementation Which
> Does Not Use The C Stack
>
> What is it?
> A plug-in replacement for core Python.
> It should run any program which runs under Python 1.5.2,
> but it does not need space on the C stack.
>

Very nice, but

Python 1.5.42a2 (#5, Jul 13 1999, 07:42:11) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> l=range(10)
>>> l.reverse()
>>> apply(l.sort,())
Traceback (innermost last):
File "<stdin>", line 1, in ?
SystemError: error return without exception set
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

apply isn't totally hosed; it works for Python functions (I think) but
not for builtins.

This causes lots of make test to fail, though not test_builtin,
helpfully.

I've had a poke through the source, but don't have the time just now
to delve into its inner workings. I'll trace it with gdb sometime.

HTH
Michael

PS: To get it to compile I had to remove "static" from in front of
PyFrame_FiniChainFunc * PyFrame_FiniChain = NULL;
in Objects/frameobject.c.

Christian Tismer

Jul 13, 1999

Michael Hudson wrote:
>
> Christian Tismer <tis...@appliedbiometrics.com> writes:
>
> > ANNOUNCING:
> >
> > Stackless Python 0.3
> > A Python Implementation Which
> > Does Not Use The C Stack
> >
> > What is it?
> > A plug-in replacement for core Python.
> > It should run any program which runs under Python 1.5.2,
> > but it does not need space on the C stack.
> >
>
> Very nice, but

Ouchh!

> >>> apply(l.sort,())
> Traceback (innermost last):
> File "<stdin>", line 1, in ?
> SystemError: error return without exception set
> >>> l
> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>
> apply isn't totally hosed; it works for Python functions (I think) but
> not for builtins.

Does apply_orig behave well?

> This causes lots of make test to fail, though not test_builtin,
> helpfully.
>
> I've had a poke through the source, but don't have the time just now
> to delve into its inner workings. I'll trace it with gdb sometime.

Thank you, I will look into it soon.
I should run all the tests, not only pystone, before I deliver.

> PS: To get it to compile I had to remove "static" from in front of
> PyFrame_FiniChainFunc * PyFrame_FiniChain = NULL;
> in Objects/frameobject.c.

Gaah! Again! I should also build with GCC, not only MS VC++, I think.

thanks a lot - chris

Christian Tismer

Jul 13, 1999

Michael Hudson wrote:

> Very nice, but

> >>> apply(l.sort,())
> Traceback (innermost last):
> File "<stdin>", line 1, in ?
> SystemError: error return without exception set
> >>> l
> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Here is the patch.
It was one of those "=" instead of "==" bugs.

Line 131 of bltinmodule.c should now read

if (retval == Py_UnwindToken) {
-----------!!------------------

I'm updating the files right now, without changing the version number.

Thanks, Michael!

Michael Hudson

Jul 13, 1999

On Tue, 13 Jul 1999, Christian Tismer wrote:
> Michael Hudson wrote:
>
> > Very nice, but
>
> > >>> apply(l.sort,())
> > Traceback (innermost last):
> > File "<stdin>", line 1, in ?
> > SystemError: error return without exception set
> > >>> l
> > [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>
> Here is the patch.
> It was one of those "=" instead of "==" bugs.
>
> Line 131 of bltinmodule.c should now read
>
> if (retval == Py_UnwindToken) {
> -----------!!------------------
>
> I'm updating the files right now, without changing the version number.
>
> Thanks, Michael!
>
>

A pleasure!
Passes all of make test now - except for rfc822, which is odd, and happens
with both stackless and vanilla python, so presumably isn't your fault.

Now, about continuations...

this-is-posted-from-Pine-rather-than-my-normal-gnus-so-I-hope-
it-all-works-OK-ly y'rs
Michael

PS: If anyone wants a patch from latest CVS source to stackless-python
0.3, building a binary called stackless-python (on unix), drop me a line
and I'll email one to you.

--
Oh, very funny. Very sar-cah-stic. http://www.ntk.net
http://www.ntk.net/doh/options.html - ho ho


Michael P. Reilly

Jul 13, 1999

Christian Tismer <tis...@appliedbiometrics.com> wrote:

: Here is the patch.
: It was one of those "=" instead of "==" bugs.

And newbies wonder why Python doesn't have assignment expressions. ;)

-Arcege


Christian Tismer

Jul 13, 1999

You said it. And by default, MSVC doesn't consider this worth a warning.

I was tempted, but decided to leave this remark to you :-)

Paul Prescod

Jul 13, 1999

"Michael P. Reilly" wrote:
>
> And newbies wonder why Python doesn't have assignment expressions. ;)

Cheap shot. It's a problem with C's syntax, not the assignment expression
concept.

--
Paul Prescod - ISOGEN Consulting Engineer speaking for only himself
http://itrc.uwaterloo.ca/~papresco

To me programming is more than an important practical art. It is
also a gigantic undertaking in the foundations of knowledge.
- Grace Hopper

Neel Krishnaswami

Jul 14, 1999

In article <378A40AD...@appliedbiometrics.com>,
Christian Tismer <tis...@appliedbiometrics.com> wrote:
>
> Stackless Python 0.3
>
>The possibilities are for instance:
>
>Continuations, with which we will build Coroutines and Generators
>
>Coroutines are able to run at the speed of a single C function call,
>which makes them a considerable alternative in certain algorithms.
>This is no longer a speculation since I have working coroutine
>prototypes, written with continuations.

I've been looking at Icon, and it occurs to me that if coroutines and
generators were available at the Python level, it might yield a way of
doing string processing that is more "Pythonic" than regexps.

Regexps are nice because when you have a pattern that they can
directly represent, then you can simply specify a pattern and then you
don't have to worry about the tedious bookkeeping of looping over the
string and keeping track of the state variable.

However, when you need to match a pattern that a regexp can't match,
then suddenly you need to break the string into tokens with regexps
and then loop over the pieces and keep track of a bunch of state
variables that don't necessarily correspond to the pieces you are
actually interested in.

This is unpleasant, because a) clumsy bookkeeping is bad, and b)
two different sorts of syntax are needed to do basically
similar tasks. If we could compose generators just like functions,
then the bookkeeping can be abstracted away and the same approach
will work for arbitrarily complicated parsing tasks.

So I think it would be nice if these lovely toys were available at
the Python level. Is this a possibility?

(I'd love to be able to define coroutines in Python like so:

def evens(z):
    for elt in z:
        if elt % 2 == 0:
            suspend(elt)

It would probably require some rethinking of Python's iteration
protocol though.)
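
To make the composition point concrete, here is a rough sketch in the
same hypothetical suspend() protocol -- pseudocode for the proposal,
since no suspend() exists in today's Python:

def squares(z):
    # composes with evens() above like an ordinary function call,
    # consuming its stream with no explicit state bookkeeping
    for elt in z:
        suspend(elt * elt)

for e in squares(evens(range(10))):
    print e        # 0, 4, 16, 36, 64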


Neel

Tim Peters

Jul 15, 1999
to pytho...@python.org

[Neel Krishnaswami]
> ...
> I've been looking at Icon, and it occurs to me that if coroutines and
> generators were available at the Python level, it might yield a way of
> doing string processing that is more "Pythonic" than regexps.

Yes, and you can find much more about this in the very early days of the
String SIG archives.

> Regexps are nice because when you have a pattern that they can
> directly represent, then you can simply specify a pattern and then you
> don't have to worry about the tedious bookkeeping of looping over the
> string and keeping track of the state variable.
>
> However, when you need to match a pattern that a regexp can't match,
> then suddenly you need to break the string into tokens with regexps
> and then loop over the pieces and keep track of a bunch of state
> variables that don't necessarily correspond to the pieces you are
> actually interested in.
>
> This is unpleasant, because a) clumsy bookkeeping is bad, and b)
> there's two different sorts of syntax needed to do basically
> similar tasks.

The latter is one of the arguments Griswold gave in the paper that
introduced Icon, contrasting Icon's uniform approach to SNOBOL4's distinct
pattern and procedural sublanguages.

I don't think he'd make the same argument today! Icon is an idiosyncratic
delight, but most string pattern matching was in fact easier (to write,
modify, or read) in SNOBOL4. Once you start using Icon for complex pattern
matching, you'll soon discover that it's very hard to go back to old code
and disentangle "the pattern" from "the processing" -- *because* everything
looks the same, and it's all intermixed. The distinct sublanguages in
SNOBOL4 enforced a clean separation; and that was more often an aid than a
burden.

Have noted before that writing string matching code in Icon feels a lot like
writing in an assembly language for SNOBOL4. Of course the latter didn't
have Icon's power in other areas (e.g. it's at least possible to program
structural pattern-matchers in Icon, using generators to match e.g. tree
patterns; SNOBOL4's patterns started & ended with strings).

> If we could compose generators just like functions, then the bookkeeping
> can be abstracted away and the same approach will work for arbitrarily
> complicated parsing tasks.

The real advantage of regexps is speed, & that's probably why they'll
always be much more popular. SNOBOL4 and Icon didn't define "bal" builtins
because you couldn't code that pattern yourself <wink>. bal is beyond a
regexp's abilities, but it's still a simple kind of pattern, and just runs
too darned slow if implemented via the recursive definition as run with the
general backtracking machinery.

> So I think it would be nice if these lovely toys were available at
> the Python level. Is this a possibility?

I've been bugging Guido about generators since '91, but for the billion and
one uses other than string matching. Anything is possible -- except that
I'll ever stop bugging him <wink>.

> (I'd love to be able to define coroutines in Python like so:
>
> def evens(z):
>     for elt in z:
>         if elt % 2 == 0:
>             suspend(elt)

How do you intend to resume it?

> It would probably require some rethinking of Python's iteration
> protocol though.)

That's not a hangup; Guido already knows how to do it cleanly. Like any Big
New Feature there are real questions about costs vs benefits, and so far
this stuff has been widely viewed as "too esoteric". I think it's natural
as breathing, to the point that a *non*-resumable function strikes me as
never inhaling <wink>.

For now, there's a working implementation of a generator protocol (somewhat
more flexible than Icon's) in the source distribution, under
Demo/threads/Generator.py. I also posted a general (thread-based) coroutine
implementation a bit over 5 years ago. Building these on Christian's
stackless Python instead should run much faster.
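
For the curious, the thread trick can be sketched in a few lines: run
the producer in its own thread and hand each value to the consumer
through a one-slot queue. This is an illustration of the idea only,
with made-up names -- it is not the actual Demo/threads/Generator.py
API:

import threading, Queue

class ThreadGenerator:
    def __init__(self, func):
        self.queue = Queue.Queue(1)     # one-slot handoff channel
        t = threading.Thread(target=self._run, args=(func,))
        t.setDaemon(1)
        t.start()

    def _run(self, func):
        func(self._put)                 # producer calls _put to "suspend"
        self.queue.put((0, None))       # sentinel: stream exhausted

    def _put(self, value):
        self.queue.put((1, value))      # blocks until the consumer takes it

    def get(self):                      # consumer side: "resume"
        more, value = self.queue.get()
        if not more:
            raise EOFError, "generator exhausted"
        return value

def evens(put):
    for i in range(10):
        if i % 2 == 0:
            put(i)

g = ThreadGenerator(evens)
print g.get(), g.get(), g.get()         # prints: 0 2 4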

BTW, generators are much easier to implement than coroutines -- the former
don't require threads or stacklessness or continuations under the covers,
just some relatively straightforward changes to Python's current
implementation (Steven Majewski noted this 5 years ago). This is why
generators work in all ports of Icon, but coroutines are an optional feature
that's supported only if a platform guru has taken the time to write the
platform-dependent context-switching assembly code Icon coexps require.

degeneratoredly y'rs - tim

Michael Hudson

Jul 15, 1999
to Christian Tismer

On Tue, 13 Jul 1999, Christian Tismer wrote:

>
>
> Michael Hudson wrote:
>
> > Very nice, but
>
> > >>> apply(l.sort,())
> > Traceback (innermost last):
> > File "<stdin>", line 1, in ?
> > SystemError: error return without exception set
> > >>> l
> > [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>
> Here is the patch.
> It was one of those "=" instead of "==" bugs.
>
> Line 131 of bltinmodule.c should now read
>
> if (retval == Py_UnwindToken) {
> -----------!!------------------
>
> I'm updating the files right now, without changing the version number.
>
> Thanks, Michael!
>
>

More problems, I'm afraid.

It would seem to be impossible for a script executed from the command line
(i.e., like

stackless-python foo.py

) to exit with any exit status other than 1, except by reaching the end.

I'm sorry, I'm not explaining this very well. If a script tries to exit by
calling sys.exit, sys._exit or raising SystemExit, the exit status is
always 1. os._exit works as advertised, I've just noticed.

They all work when interactive too.

Another, maybe connected phenomenon is that

echo "fudge" | stackless-python

does *not* print an error!

echo "fudge" | stackless-python -i

does!

As you can see I have built stackless python into an executable called
stackless-python. This required extensive hackery of the build process, so
that could be at fault. This makes even less sense to me.

A consequence of all this is that stackless-python cannot install itself
without error on my system (compileall.py "fails").

Ideas?

TIA
Michael

PS: Did you see Stackless Python got a mention on Linux Weekly News today?
Cool.

Christian Tismer

Jul 15, 1999

Michael Hudson wrote:

...

> echo "fudge" | stackless-python
>
> does *not* print an error!
>
> echo "fudge" | stackless-python -i
>
> does!

Yes, thanks. This was found yesterday; it was an oversight
in PyRun_SimpleFile. There I call the _nr version, which actually
doesn't execute its code but returns a Py_UnwindToken.
I always felt that I had overdone it here, but it was simple to solve.
I just had to repeat the error-printing code, and it works.
In other words, the program merely fails to print the error;
everything else is fine. I haven't updated the source since
I'm again making a huge change, version 0.4, this week. This
one solves all continuation problems which can occur, and
the final goal (continuationmodule.c) is going to become public.

Here is the patch:

pythonrun.c must be changed this way, from line 548 on:
--------------------------------------------------------
int
PyRun_SimpleFile(fp, filename)
    FILE *fp;
    char *filename;
{
    int retval;
    PyObject *result = PyRun_SimpleFile_nr(fp, filename);

    if (result == Py_UnwindToken) {
        result = PyEval_Frame_Dispatch();
        /* proper handling added 990714 */
        if (result == NULL)
            PyErr_Print();
        if (Py_FlushLine())
            PyErr_Clear();
    }
    if (result == NULL)
        retval = -1;
    else
        retval = 0;
    Py_XDECREF(result);
    return retval;
}
--------------------------------------------------------


>
> As you can see I have built stackless python into an executable called
> stackless-python. This required extensive hackery of the build process, so
> that could be at fault. This makes even less sense to me.

Really so hard? I thought you could just rename it when it's ready,
at install time?

> A consequence of all this is that stackless-python cannot install itself
> without error on my system (compileall.py "fails").

> PS: Did you see stackless python got a mention on Linux Weekly News Today?
> Cool.

Oh, they did it again?
What will they say when I claim full-blown first-class
continuations for Python, next week? :-))

ciao - chris

Graham Matthews

Jul 15, 1999

[Neel Krishnaswami]
: > If we could compose generators just like functions, then the bookkeeping
: > can be abstracted away and the same approach will work for arbitrarily
: > complicated parsing tasks.
Tim Peters (tim...@email.msn.com) wrote:
: The latter is one of the arguments Griswold gave in the paper that
: introduced Icon, contrasting Icon's uniform approach to SNOBOL4's distinct
: pattern and procedural sublanguages.
:
: I don't think he'd make the same argument today! Icon is an idiosyncratic
: delight, but most string pattern matching was in fact easier (to write,
: modify, or read) in SNOBOL4. Once you start using Icon for complex pattern
: matching, you'll soon discover that it's very hard to go back to old code
: and disentangle "the pattern" from "the processing" -- *because* everything
: looks the same, and it's all intermixed. The distinct sublanguages in
: SNOBOL4 enforced a clean separation; and that was more often an aid than a
: burden.

As far as I understand this thread I think Neel is right, and Tim and
Griswold are wrong. I have been recently doing a lot of programming with
parser combinators in Clean and Haskell. They are something like what
Neel is talking about (as the name "parser combinators" suggests you
write small "string parsers" as functions, and then have "parser
combinators" that stitch these together). Note that in this parser
combinator approach there is no distinction between the code for "the
pattern" and the code for "the processing" -- they are all written in
the same language (Clean/Haskell in my case). At first I found the
idea of having the two types of code (the pattern and the processing)
in the same language (rather than in 2 distinct languages as in the
traditional way) a bit difficult. But I persevered and I am glad I
did. I found that once you reach a certain level of familiarity with
combinators you can write new pattern and processing code very
quickly and very reliably. The trick is to get the combinators right.
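
To give a Python flavour of what "combinator" means here: model a
parser as a callable from a string to a list of (result,
remaining-input) pairs, with the empty list meaning failure; the
combinators are then ordinary higher-order values. A rough sketch
with made-up names, not any particular library:

class Char:
    # primitive parser: match one literal character
    def __init__(self, c): self.c = c
    def __call__(self, s):
        if s[:1] == self.c: return [(self.c, s[1:])]
        return []

class Seq:
    # run p, then run q on whatever input p left over
    def __init__(self, p, q): self.p, self.q = p, q
    def __call__(self, s):
        results = []
        for r1, rest1 in self.p(s):
            for r2, rest2 in self.q(rest1):
                results.append(((r1, r2), rest2))
        return results

class Alt:
    # try both alternatives; backtracking falls out of the
    # list-of-successes representation for free
    def __init__(self, p, q): self.p, self.q = p, q
    def __call__(self, s):
        return self.p(s) + self.q(s)

ab_or_ac = Alt(Seq(Char('a'), Char('b')), Seq(Char('a'), Char('c')))
print ab_or_ac('ac')        # [(('a', 'c'), '')]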

graham
--
This ain't no technological breakdown
Oh no, this is the road to hell

Neel Krishnaswami

Jul 15, 1999

In article <000601bece8f$b549a420$51a22299@tim>,
Tim Peters <tim...@email.msn.com> wrote:
>[Neel Krishnaswami]
>
>> (I'd love to be able to define coroutines in Python like so:
>>
>> def evens(z):
>>     for elt in z:
>>         if elt % 2 == 0:
>>             suspend(elt)
>
>> It would probably require some rethinking of Python's iteration
>> protocol though.)
>
>How do you intend to resume it?

I left this unspecified because I don't know the answer. :) I figured
that it could be rolled up into the iteration protocol and I could
use it without having to completely understand how it all works.

[The iteration protocol]

>That's not a hangup; Guido already knows how to do it cleanly.

Good -- now I don't have to worry at all.

>BTW, generators are much easier to implement than coroutines -- the former
>don't require threads or stacklessness or continuations under the covers,
>just some relatively straightforward changes to Python's current
>implementation (Steven Majewski noted this 5 years ago).

What's the difference, exactly? AFAICT you need to save the execution
state when suspending both coroutines and generators, but I might be
missing something obvious....


Neel

Neel Krishnaswami

Jul 15, 1999

In article <7mkrrt$a81$1...@cronkite.cc.uga.edu>,
Graham Matthews <gra...@sloth.math.uga.edu> wrote:
>
>[Parser Combinators]
>
>They are something like what Neel is talking about (as the name
>"parser combinators" suggests you write small "string parsers" as
>functions, and then have "parser combinators" that stitch these
>together). Note that in this parser combinator approach there is no
>distinction between the code for "the pattern" and the code for "the
>processing" -- they are all written in the same language
>(Clean/Haskell in my case).

Thanks for the tip -- it led me to do some digging on the web and I
found a nice description of this in a paper called "Monadic Parser
Combinators" by Graham Hutton and Erik Meijer. Its URL is:

http://www.cs.nott.ac.uk/Department/Staff/gmh/monparsing.ps

>At first I found the idea of having the two types of code (the
>pattern and the processing) in the same language (rather than in 2
>distinct languages as in the traditional way) a bit difficult. But I
>persevered and I am glad I did. I found that once you reach a
>certain level of familiarity with combinators you can write new
>pattern and processing code very quickly and very reliably. The
>trick is to get the combinators right.

I think that this weekend I'm going to try and put together a basic
version of this parsing system in Python based on the ideas in this
paper -- basically just write a simple AST class and the combinators
they describe. Then I can add proper support for backtracking and lazy
evaluation and all the stuff with serious hack value when Christian
Tismer makes continuations available to us. :)

This really seems like a clean and straightforward way of doing
things. My experience with recursive descent parsers has been pretty
bad, but now I suspect that this is because I wasn't organizing my
work in a systematic enough fashion.


Neel

Tim Peters

Jul 15, 1999
to pytho...@python.org

[Tim]
> BTW, generators are much easier to implement than coroutines -- ...

[Neel]
> What's the difference, exactly? AFAICT you need to save the execution
> state when suspending both coroutines and generators, but I might be
> missing something obvious....

Just that a generator (in the Icon sense) is a semi-coroutine: it always
suspends directly to its caller, at the point it was called. A world of
difference follows from that:

1) From the caller's point of view, it's an ordinary call. So on *that* end
of it, generators are no harder than vanilla procedure calls.

2) From the generator's point of view, "the execution state" is a single
frame (its own). A coroutine can transfer to any other at any time, with
intervening calls piling up an arbitrary number of frames in the state.

3) From a recursive interpreter's point of view, the implementation language
(C, in Python's case) can't get any of its own frames stuck between caller
and callee during suspension of a generator (the generator *returns* to its
caller; it doesn't *resume* it; this asymmetry is the heart of it).

It's #3 (dealing with an intertwined C stack) that requires
platform-specific hackery to make coexps work in Icon, while the lack of C
stack pollution is why generators can work without platform-specific cruft.

Continuations and threads are harder still to implement (efficiently) -- the
less restrictive the rules, the less the implementation can exploit to make
its life easier.

a-phenomenon-you-may-observe-again-some-day<wink>-ly y'rs - tim

Greg Ewing

Jul 16, 1999

Neel Krishnaswami wrote:
>
> What's the difference, exactly? AFAICT you need to save the execution
> state when suspending both coroutines and generators, but I might be
> missing something obvious....

The crucial difference is that a generator is never resumed
after its caller has returned. This means that the generator's
state can be pushed onto the same stack as the caller's.

A coroutine, on the other hand, can outlive the context in
which it was created, and therefore needs a stack all of
its own.

Another way to think about it is that a generator call is
equivalent to an ordinary call where one of the parameters
is a procedure. For example, where in "generator python"
you might write

def even_numbers_up_to(n):
    for i in range(n):
        if i % 2 == 0:
            yield(i)

for e in even_numbers_up_to(42):
    print e, "is an even number!"

you could do the same thing in ordinary Python as

def for_each_even_number_up_to(n, do_it):
    for i in range(n):
        if i % 2 == 0:
            do_it(i)

def print_it(e):
    print e, "is an even number!"

for_each_even_number_up_to(42, print_it)

which clearly can be executed quite happily using
a single stack.

If Python had something akin to Smalltalk code blocks,
generators wouldn't be needed. It would be nifty to
be able to write something like

for_each_even_number_up_to(42) -> (e):
    print e, "is an even number!"

Greg

Christian Tismer

Jul 16, 1999

Greg Ewing wrote:
>
> Neel Krishnaswami wrote:
> >
> > What's the difference, exactly? AFAICT you need to save the execution
> > state when suspending both coroutines and generators, but I might be
> > missing something obvious....
>
> The crucial difference is that a generator is never resumed
> after its caller has returned. This means that the generator's
> state can be pushed onto the same stack as the caller's.

Fine.

> A coroutine, on the other hand, can outlive the context in
> which it was created, and therefore needs a stack all of
> its own.

Smell of danger...

> Another way to think about it is that a generator call is
> equivalent to an ordinary call where one of the parameters
> is a procedure. For example, where in "generator python"
> you might write

...

> If Python had something akin to Smalltalk code blocks,
> generators wouldn't be needed. It would be nifty to
> be able to write something like
>
> for_each_even_number_up_to(42) -> (e):
>     print e, "is an even number!"

Traceback (innermost last):
File "<inbox:Re: Stackless & String-processing>", line 22, in claim
SemanticMisconceptionError:
things should be made as simple as possible, but not any simpler
>>> #:-)

>>> # sorry did you really mean that?
>>> # Where is your generator's state?
>>> # how would you do backtracking?

hoping-that-I'm-misconcepted-and-not-you-ly y'rs - chris

Graham Matthews

Jul 16, 1999

Graham Matthews <gra...@sloth.math.uga.edu> wrote:
: >[Parser Combinators]
Neel Krishnaswami (ne...@brick.cswv.com) wrote:
: Thanks for the tip -- it led me to do some digging on the web and I
: found a nice description of this in a paper called "Monadic Parser
: Combinators" by Graham Hutton and Erik Meijer. Its URL is:
:
: http://www.cs.nott.ac.uk/Department/Staff/gmh/monparsing.ps

Yes this is a nice place to start. Another nice place to look is in
the Clean documentation. Somewhere in their docs (I think in the old
book) there is a Clean implementation of combinators (without all the
"do" notation).

The trickiest part for me was to get the error handling to work well,
since Clean/Haskell don't have exceptions. This should be "easier"
in Python, although you are welcome to see my code if you want. Also
feel free to email me questions.

Neel Krishnaswami (ne...@brick.cswv.com) wrote:
: This really seems like a clean and straightforward way of doing
: things. My experience with recursive descent parsers has been pretty
: bad, but now I suspect that this is because I wasn't organizing my
: work in a systematic enough fashion.

I had similar experiences with recursive descent parsers. By the way,
this technique is not limited to parsing. I use essentially the same
code to implement a type checker too. A parser takes as input a string
or a file, and outputs an AST. The type checker has the same structure
as a parser, but its input (the thing it parses) is an AST and its
output is an AST annotated with types. Combinators are more general
than just parsing and patterns. They are a form of sequencing
construct.

Graham Matthews

Jul 16, 1999

Fernando Pereira (per...@research.att.com) wrote:
: but, as far as I know, nothing beats
: the automata-theoretic optimization techniques for pattern matching and
: parsing, of which the typical modern compiler for regular expressions
: is just the best-known instance.

In what sense are you using the words "nothing beats"? I am sure that
the automata-theoretic approach produces the fastest pattern matchers
and parsers. So in this sense "nothing beats them". But the class of
languages recognisable by automata doesn't cover all the languages of
interest (in fact my guess is it covers very few ... context ...). So
in this sense lots of things "beat" the automata approach. And then
there are software engineering issues ...

Graham Matthews

Jul 16, 1999

Neel Krishnaswami (ne...@brick.cswv.com) wrote:
: I think that this weekend I'm going to try and put together a basic
: version of this parsing system in Python based on the ideas in this
: paper -- basically just write a simple AST class and the combinators
: they describe. Then I can add proper support for backtracking and lazy
: evaluation and all the stuff with serious hack value when Christian
: Tismer makes continuations available to us. :)

It seems to me that you might also be able to make the regular expression
module available through this interface. All you have to do is wrap the
regular expression up in a function/object that can be combined like the
other parsers. Indeed you will need "primitive parsers" for things like
numbers, and these are really just a special case of a "primitive"
regular expression parser.
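
A sketch of how that wrapping might look, continuing the
list-of-successes representation from the combinator sketch above;
the names are made up, and only the standard re module is assumed:

import re

class Regex:
    # primitive parser: let the regex automaton do the matching, then
    # report the match and the leftover input in the same
    # (result, rest) shape the combinators expect
    def __init__(self, pattern):
        self.pattern = re.compile(pattern)
    def __call__(self, s):
        m = self.pattern.match(s)
        if m: return [(m.group(0), s[m.end():])]
        return []

number = Regex('[0-9]+')
print number('12+34')       # [('12', '+34')]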

Tim Peters

Jul 16, 1999
to pytho...@python.org

[Neel Krishnaswami]
> ...
> I found a nice description of this in a paper called "Monadic Parser
> Combinators" by Graham Hutton and Erik Meijer. Its URL is:
> http://www.cs.nott.ac.uk/Department/Staff/gmh/monparsing.ps

[Graham Matthews]
> Yes this is a nice place to start. Another nice place to look is in
> the Clean documentation. Somewhere in their docs (I think in the old
> book) there is a Clean implementation of combinators (without all the
> "do" notation).

A nice paper on Clean parsing combinators can be downloaded as item 5 from
the "Case Studies" list at

http://www.cs.kun.nl/~clean/

Interestingly enough, the next paper on the list ("An Interpreter") declines
to use combinators for its parsing task, instead building a conventional
"by hand" recursive descent parser.

another-two-years-and-they'll-discover-lex/yacc<wink>-ly y'rs - tim

Graham Matthews

Jul 17, 1999

Tim Peters (tim...@email.msn.com) wrote:
: A nice paper on Clean parsing combinators can be downloaded as item 5
: from the "Case Studies" list at http://www.cs.kun.nl/~clean/
:
: Interestingly enough, the next paper on the list ("An Interpreter") declines
: to use combinators for its parsing task, instead building a conventional
: "by hand" recursive descent parser.

Yes, it's kind of strange that they don't use the combinators to build
the interpreter. One of Peyton-Jones' papers has a good (but small)
example of making an interpreter with combinators (it's probably the
first paper of his on monads).

Tim Peters (tim...@email.msn.com) wrote:
: another-two-years-and-they'll-discover-lex/yacc<wink>-ly y'rs - tim

I'll take that as a BIG <wink>. Once you use these combinators you will
never go back to lex/yacc (unless you want LALR(n) grammars).

graham
--
The girls is all salty, the boys is all sweet,
the food ain't too shabby
an' they piss in the streets
In France, way down in France, way on down, way on down in France

Graham Matthews

Jul 17, 1999

<gra...@sloth.math.uga.edu> wrote:
: > But the class of
: > languages recognisable by automata doesn't cover all the languages of
: > interest (in fact my guess is it covers very few ... context ...).
Fernando Pereira (per...@research.att.com) wrote:
: In what sense of "very few"? Most programming languages are designed to
: have an LR(k) or LL(k) backbone, and regular lexical structure

So the regular expressions help you with the regular lexical structure.
But they don't help with the LR(n) bit. They also don't help with context
dependencies. That leaves out a lot of languages.

<gra...@sloth.math.uga.edu> wrote:
: > And then
: > there are software engineering issues ...

Fernando Pereira (per...@research.att.com) wrote:
: Agreed. However, more traditional lexer- and parser-generators had
: sensible answers to many of those issues, that might not be as elegant
: as monadic approaches, but that paid serious attention to optimization
: in the regular and CF backbones.

I don't really see why using parser combinators means that you cannot
optimise. You are not precluded from using regular expression parsers
as primitive parsers, nor are you precluded from using LALR(n) parsers
as primitive parsers. Hence you can optimise those parts of your
parsing with automata if you want. So I don't really understand your
criticism.

Also as I said parser combinators are really more of a sequencing
technique -- one that applies nicely to parsing.

Fernando Pereira (per...@research.att.com) wrote:
: By the way, I find it faintly amusing
: to see the functional programming community rediscovering and dressing
: in categorial garb a simple idea that was a main motivator for the
: development of logic programming in the early 70s (see
: <http://akpublic.research.att.com:9000/~pereira/bib.html#Pereira+Warren-
: 80:DCGs> for a survey).

Yes, combinators are much like definite clause grammars in Prolog. But if
I remember correctly from my use of DCGs, they gave you a fixed set of
combinators? Moreover they were limited to string parsing. The combinator
approach in the fp world has neither limitation.

Fernando Pereira (per...@research.att.com) wrote:
: It *is* a good idea for rapid prototyping, but
: interspersing pattern matching/parsing with arbitrary computations
: (even purely functional ones) effectively blocks optimization, which is
-------------------------------

Can you explain why? I don't see it, since you can have reg expr parsers
and other such fast parsers as primitive parsers that you combine ...

Graham Matthews

Jul 18, 1999

<gra...@sloth.math.uga.edu> wrote:
> I don't really see why using parser combinators means that you cannot
> optimise. You are not precluded from using regular expression parsers
> as primitive parsers, nor are you precluded from using LALR(n) parsers
> as primitive parsers. Hence you can optimise those parts of your
> parsing with automata if you want. So I don't really understand your
> criticism.
Fernando Pereira (per...@research.att.com) wrote:
: The monadic approach mixes up the search structure for pattern matching
: or parsing (what is elsewhere represented as an automaton) with other
: data processing. Unless your optimizer can solve the halting problem,
: it will not be able to determinize otherwise determinizable control
: structures because it cannot figure out the data dependencies.

You have totally misunderstood what I said. I said "you can have regular
expression parsers as primitive parsers". There is no need for the compiler
to "solve the halting problem". The programmer is free to use regular
expression/automaton techniques where he/she wants. Such parsers take
strings and return ASTs. They can then be combined with other parsers
as if they were primitive parsers. This is purely a programming issue
and has nothing to do with the halting problem, compiler optimisations,
etc. To make it clearer, a programmer can do

p = some_parser_that_uses_automaton :: String -> Ast
q = some_parser_that_uses_some_other_technique :: String -> Ast
pq = p <&> q
where <&> is the combinator that sequences parsers

Where does the halting problem enter into this? The programmer has to write
the automaton code him/herself, but there are fp tools that do this, much
like Yacc, for example.
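
In the Python rendering sketched earlier in the thread (names still
illustrative), the same composition reads:

p = Regex('[0-9]+')     # primitive parser backed by the regex automaton
q = Char('+')           # any other parser value
pq = Seq(p, q)          # <&> corresponds to the Seq combinator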

<gra...@sloth.math.uga.edu> wrote:
: > Also as I said parser combinators are really more of a sequencing
: > technique -- one that applies nicely to parsing.
Fernando Pereira (per...@research.att.com) wrote:
: Sure, but so what? Automata too are a sequencing technique, usable for
: many other things besides pattern matching and parsing. The fundamental
: difference is that automata-theoretic approaches separate neatly the
: representation of sequences from what one wants to do with the parts of
: those sequences, allowing optimizations that are difficult to obtain
: within general-purpose programs.

I still don't understand why you think the combinator approach is so
inefficient, given that you can have automaton-based parsers embedded
in it at will.

The point of saying that combinators are more of a sequencing tool than
a parsing tool is to point out that they are more general than automata.
You can program much richer sequencing with combinators than you can
with automata (not surprisingly). For those parts of your parser or
sequencing where you only need automata, you use automata as outlined
above. But where you need more powerful sequencing you have it in the
combinator approach.


<gra...@sloth.math.uga.edu> wrote:
: > Can you explain why? I don't see it, since you can have reg expr parsers
: > and other such fast parsers as primitive parsers that you combine ...

Fernando Pereira (per...@research.att.com) wrote:
: Either you put all the interesting stuff in separate primitive parsers,
: in which case I'm not sure what's the point of the monadic approach,

Err, isn't this what I said you do ... I said "you can have regular
expression parsers as primitive parsers". The point of the monadic
approach is that it allows you to cleanly program arbitrary sequencing
when you need it. Moreover it makes for sound software engineering,
etc. That too is what I said in the last mail ....

Tim Peters

Jul 19, 1999
to pytho...@python.org

[Graham Matthews]
> You have totally misunderstood what I said.
> ...

[Fernando Pereira]
> One last try. I understood your point from the beginning. However, you
> did not understand *mine*.
> ...

Trust me on this one, fellows: you each understand the other, and have for
several (interesting) msgs already. Graham has an unusual way of expressing
his appreciation for a good discussion <wink>.

http://www.deja.com/getdoc.xp?AN=316262255

leaping-thru-hoops-ly y'rs - tim

Graham Matthews

Jul 19, 1999

[Graham Matthews]
: > You have totally misunderstood what I said.
[Fernando Pereira]
: > One last try. I understood your point from the beginning. However, you
: > did not understand *mine*.
Tim Peters (tim...@email.msn.com) wrote:
: Trust me on this one, fellows: you each understand the other, and have for
: several (interesting) msgs already. Graham has an unusual way of expressing
: his appreciation for a good discussion

Tim, I don't think that we are agreeing: Fernando seems to want to
view combinators as parsing constructs and hence says things like
"the combinators don't have to do much if you are using automata-based
black boxes" (to paraphrase). But combinators are more than
parsing constructs -- they are sequencing constructs used for a wide
variety of tasks where automata techniques are not available.

Graham Matthews

Jul 19, 1999

<gra...@sloth.math.uga.edu> wrote:
> I said "you can have regular
> expression parsers as primitive parsers". There is no need for the compiler
> to "solve the halting problem". The programmer is free to use regular
> expression/automaton techniques where he/she wants. Such parsers take
> strings and return ASTs. They can then be combined with other parsers
> as if they were primitive parsers.
Fernando Pereira (per...@research.att.com) wrote:
: One last try. I understood your point from the beginning. However, you
: did not understood *mine*. If you push the complex
: pattern-matching/parsing machinery into black boxes, there's not much
: for the monadic glue to do, is there?

What your combinators do depends on your application. If all you are
doing is parsing using primitive automaton based parsers then your
combinators won't have much to do. But if you are doing more than
that then they will have more to do, and hence be more complex. For
example you can use the same combinators you define for parsing to do
type checking, model state, etc -- things I would not want to do with
an automaton. That is why I said that combinators are really sequencing
constructs.

Fernando Pereira (per...@research.att.com) wrote:
: And even then, some
: optimizations, those involving the various black boxes, will be
: blocked. Here's a simple example. Assume that there are two black boxes
: A and B, and that you use monadic glue to say "A or B". A and B may be
: deterministic, but the result will not be, and exhaustive search will
: be required in the glue. Sure, I can recognize such cases by hand and
: create a new black box "A or B" which is optimized, but that is an
: error-prone hand optimization. And one may not notice subtler
: inefficiencies.

Your point appears to be that if you were programming only using automata
for parsing you wouldn't miss the above optimisation, but that somehow
using combinators means you would miss the above optimisation. You
and I must program differently then. If I were using automata, even
in the context of also using combinators, I would be aware of the
potential for such optimisations and look for them.

Graham Matthews

Jul 20, 1999

<gra...@sloth.math.uga.edu> wrote:
> What your combinators do depends on your application. If all you are
> doing is parsing using primitive automaton based parsers then your
> combinators won't have much to do. But if you are doing more than
> that then they will have more to do, and hence be more complex. For
> example you can use the same combinators you define for parsing to do
> type checking, model state, etc -- things I would not want to do with
> an automaton. That is why I said that combinators are really sequencing
> constructs.
Fernando Pereira (per...@research.att.com) wrote:
: *What* I've been saying is that there is a generality-efficiency
: tradeoff, and that the monadic approach goes for generality at the
: expense of efficiency, by precluding optimizations in those very
: important cases in which combinators are used for parsing.

And I am saying that this claim that generality implies loss of
efficiency is a red herring. If you are writing a program that uses
combinator-based parsing there is nothing to stop you using
automaton-based techniques in your program. You just add the appropriate
automaton-based primitives (e.g. for regular expressions). You
might end up adding in so many automaton-based primitives that
you don't need to do much with your combinators, but whether or
not that happens depends entirely on your application and your
needs. The point is that combinator-based parsing doesn't prevent
you from using other more efficient techniques as and where
appropriate!

Have you ever written a combinator based parsing program?

<gra...@sloth.math.uga.edu> wrote:
> Your point appears to be that if you were programming only using automata
> for parsing you wouldn't miss the above optimisation, but that somehow
> using combinators means you would miss the above optimisation. You
> and I must program differently then. If I were using automata, even
> in the context of also using combinators, I would be aware of the
> potential for such optimisations and look for them.

Fernando Pereira (per...@research.att.com) wrote:
: You might be, by looking at the code and changing it by hand. But a
: program would not, because in general the optimizable skeleton of
: sequence, alternation and Kleene closure combinators is interspersed
: with arbitrary functional code.

Could you please point out to me where I claimed that "a program could
optimise your combinator program"? If you can't find such a quote then
what is your point?

Fernando Pereira (per...@research.att.com) wrote:
: But from my readings (those mentioned in the thread
: and others) as well as your comments, the whole point of monadic
: combinators for parsing is to interweave the grammatical structure with
: other computations, thus making the underlying grammatical structure
: less accessible to optimization. As I suggested above, it would be
: interesting to define classes of monadic programs that still allow (an
: appropriate extension of) those optimizations. But AFAIK that has not
: been done.

It hasn't been done, it won't likely be done, and I never said it
could be done. So what is your point?
