Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Pratt-parser

249 views
Skip to first unread message

Rod Pemberton

unread,
Nov 5, 2009, 2:50:02 AM11/5/09
to

Is anyone here familiar with Vaughn Pratt's concept of "Top Down Operator
Precedence" (1973) or Pratt-parsers?

I'd like to know more about the technique. Unfortunately, the articles
below are programming language heavy.

Apparently, Pratt-parsers are recursive descent parsers modified by top down
operator precedence. It seems that this dramatically reduces the code size.

AFAICT, Pratt-parsers are used with CGOL, LISP, and Scheme. Otherwise, the
concept was obscure until recently. However, it has been rediscovered
apparently by Crockford:

for Javascript, by Douglas Crockford
http://javascript.crockford.com/tdop/tdop.html

for Python, by Fredrik Lundh
http://effbot.org/zone/simple-top-down-parsing.htm


Rod Pemberton


Dmitry A. Kazakov

unread,
Nov 5, 2009, 4:04:51 AM11/5/09
to
On Thu, 5 Nov 2009 02:50:02 -0500, Rod Pemberton wrote:

> Is anyone here familiar with Vaughn Pratt's concept of "Top Down Operator
> Precedence" (1973) or Pratt-parsers?

I don't have the paper, but I remember a Pratt's book (1975) where the
parsing technique was described.

> I'd like to know more about the technique. Unfortunately, the articles
> below are programming language heavy.
>
> Apparently, Pratt-parsers are recursive descent parsers modified by top down
> operator precedence. It seems that this dramatically reduces the code size.
>
> AFAICT, Pratt-parsers are used with CGOL, LISP, and Scheme. Otherwise, the
> concept was obscure until recently. However, it has been rediscovered
> apparently by Crockford:
>
> for Javascript, by Douglas Crockford
> http://javascript.crockford.com/tdop/tdop.html
>
> for Python, by Fredrik Lundh
> http://effbot.org/zone/simple-top-down-parsing.htm

I am using Pratt's method for decades. I am not sure if he invented it, he
didn't say it in the book, maybe he was. I have built several compilers on
it.

In my view it is simple and efficient. One of the compilers I built was in
Assembler for a 32K machine, which illustrates how simple the method is.

I also extended the methods towards two asymmetric priorities. You can find
a description and implementation here

http://www.dmitry-kazakov.de/ada/components.htm#Parsers_etc

The section 11.2 contains an explanation how the techinique works for most
complex expressions one can find in programming languages and mathematics
notation:

http://www.dmitry-kazakov.de/ada/components.htm#11.2

One of the crucial advantages of the technique is that it does not require
grammar, and as a consequence nothing need to be precompiled/generated. The
parser can be made 100% table-driven. It perfectly suitable for OO design.

I am glad to see that the method finally gains attention it certainly
deserves.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

Rod Pemberton

unread,
Nov 5, 2009, 5:17:46 AM11/5/09
to
"Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> wrote in message
news:1jgp7sm64eejy$.1tz85slyasyb7$.dlg@40tude.net...

Thank you!

Hmm, your page lists "T. Pratt" and it seems Terrence Pratt wrote the book
you mentioned. But, from the descriptions, it sounds like they do same
thing... or very similar.


RP


Dmitry A. Kazakov

unread,
Nov 5, 2009, 5:44:53 AM11/5/09
to
On Thu, 5 Nov 2009 05:17:46 -0500, Rod Pemberton wrote:

> Hmm, your page lists "T. Pratt" and it seems Terrence Pratt wrote the book
> you mentioned. But, from the descriptions, it sounds like they do same
> thing... or very similar.

Once I asked in comp.compilers if anybody knew the author of the method,
nobody was certain about it.

Grammar-based approaches have been dominating compiler construction for
many years being in my eyes inferior. History of software is full of such
examples. For example, regular expressions is a quite weak class of
languages incapable to recognize simple bracket constructs, very difficult
and uncomfortable to use. Nevertheless it is basically the only pattern
language known, except for wild-cards patterns. Far better and simpler
SNOBOL patterns are forgotten.

Ben Pfaff

unread,
Nov 5, 2009, 12:07:52 PM11/5/09
to
"Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> writes:

> For example, regular expressions is a quite weak class of
> languages incapable to recognize simple bracket constructs,
> very difficult and uncomfortable to use. Nevertheless it is
> basically the only pattern language known, except for
> wild-cards patterns. Far better and simpler SNOBOL patterns are
> forgotten.

That's very interesting. I have never heard of SNOBOL patterns
before. Wikipedia says:

A SNOBOL pattern can be very simple or extremely complex. A
simple pattern is just a text string (e.g. "ABCD"), but a
complex pattern may be a large structure describing, for
example, the complete grammar of a computer language. It is
possible to implement a language interpreter in SNOBOL almost
directly from a Backus-Naur form expression of it, with few
changes.

I see that there is a SNOBOL pattern extension for Python:
http://snopy.sourceforge.net/user-guide.html
--
"Writing is easy.
All you do is sit in front of a typewriter and open a vein."
--Walter Smith

Ben Bacarisse

unread,
Nov 5, 2009, 1:20:24 PM11/5/09
to
Ben Pfaff <b...@cs.stanford.edu> writes:

> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> writes:
>
>> For example, regular expressions is a quite weak class of
>> languages incapable to recognize simple bracket constructs,
>> very difficult and uncomfortable to use. Nevertheless it is
>> basically the only pattern language known, except for
>> wild-cards patterns. Far better and simpler SNOBOL patterns are
>> forgotten.
>
> That's very interesting. I have never heard of SNOBOL patterns
> before.

They are very powerful but a little "operational" in that some of the
time you feel you are telling the matcher what to _do_ rather than
what to match. None the less, SNOBOL left a deep impression on me. I
recall an implementation of Russel's paradox in SNOBOL: a pattern, R,
that matches only those patterns that don't match themselves. You
then ask if R matches R!

Have you come across Icon[1]? Designed, at least in part, by Griswold
of SNOBOL fame. It have some very elegant ideas. It is a shame it is
not more widely know.

[1] http://www.cs.arizona.edu/icon/

<snip>
--
Ben.

Ben Pfaff

unread,
Nov 5, 2009, 1:36:17 PM11/5/09
to
Ben Bacarisse <ben.u...@bsb.me.uk> writes:

> Ben Pfaff <b...@cs.stanford.edu> writes:
>
>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> writes:
>>
>>> For example, regular expressions is a quite weak class of
>>> languages incapable to recognize simple bracket constructs,
>>> very difficult and uncomfortable to use. Nevertheless it is
>>> basically the only pattern language known, except for
>>> wild-cards patterns. Far better and simpler SNOBOL patterns are
>>> forgotten.
>>
>> That's very interesting. I have never heard of SNOBOL patterns
>> before.
>
> They are very powerful but a little "operational" in that some of the
> time you feel you are telling the matcher what to _do_ rather than
> what to match. None the less, SNOBOL left a deep impression on me. I
> recall an implementation of Russel's paradox in SNOBOL: a pattern, R,
> that matches only those patterns that don't match themselves. You
> then ask if R matches R!

I looked around the web for a while and found a few superficial
descriptions of SNOBOL patterns. I also found a few SNOBOL
reference manuals, but again their descriptions of patterns were
very brief and I found it difficult to figure out how they were
used and why exactly they were so powerful.

Can you give a few examples?

> Have you come across Icon[1]? Designed, at least in part, by Griswold
> of SNOBOL fame. It have some very elegant ideas. It is a shame it is
> not more widely know.
>
> [1] http://www.cs.arizona.edu/icon/

I'm scanning through the book now:
http://www.cs.arizona.edu/icon/ftp/doc/lb1up.pdf
--
"...dans ce pays-ci il est bon de tuer de temps en temps un amiral
pour encourager les autres."
--Voltaire, _Candide_

Rod Pemberton

unread,
Nov 5, 2009, 5:58:14 PM11/5/09
to
"Ben Bacarisse" <ben.u...@bsb.me.uk> wrote in message
news:0.0d2ddf5bbc876993d58c.2009...@bsb.me.uk...

> Have you come across Icon[1]? Designed, at least in part, by Griswold
> of SNOBOL fame. It have some very elegant ideas. It is a shame it is
> not more widely know.
>
> [1] http://www.cs.arizona.edu/icon/
>

Icon was used to write GBURG parser by Chris Fraser and Todd Proebsting in
"Finite-State Code Generation". One or both of them and/or David Hanson
worked on other similarly named parsers BURG, IBURG. Chris Fraser and David
Hanson are the ones who created the LCC C compiler.


Rod Pemberton


Ben Bacarisse

unread,
Nov 5, 2009, 11:24:24 PM11/5/09
to
Ben Pfaff <b...@cs.stanford.edu> writes:

> Ben Bacarisse <ben.u...@bsb.me.uk> writes:
>
>> Ben Pfaff <b...@cs.stanford.edu> writes:
>>
>>> "Dmitry A. Kazakov" <mai...@dmitry-kazakov.de> writes:
>>>
>>>> For example, regular expressions is a quite weak class of
>>>> languages incapable to recognize simple bracket constructs,
>>>> very difficult and uncomfortable to use. Nevertheless it is
>>>> basically the only pattern language known, except for
>>>> wild-cards patterns. Far better and simpler SNOBOL patterns are
>>>> forgotten.
>>>
>>> That's very interesting. I have never heard of SNOBOL patterns
>>> before.
>>
>> They are very powerful but a little "operational" in that some of the
>> time you feel you are telling the matcher what to _do_ rather than
>> what to match. None the less, SNOBOL left a deep impression on me. I
>> recall an implementation of Russel's paradox in SNOBOL: a pattern, R,
>> that matches only those patterns that don't match themselves. You
>> then ask if R matches R!
>
> I looked around the web for a while and found a few superficial
> descriptions of SNOBOL patterns. I also found a few SNOBOL
> reference manuals, but again their descriptions of patterns were
> very brief and I found it difficult to figure out how they were
> used and why exactly they were so powerful.
>
> Can you give a few examples?

The main feature is that patterns are built from pattern primitives
using operators. Patterns are named so you can write:

DIGITS = SPAN('0123456789')
NUM = DIGITS | ( '+' | '-' ) DIGITS

The name is substituted with its value immediately unless you use what
is called an unevaluated expression:

P = *P 'a' *P | 'z'

matches the strings 'z', 'zaz', 'zazazaz' and so on. The classic
example being that you can match strings with balanced parentheses:

PAIRED = NOTANY('()') | '(' ARBNO(*PAIRED) ')'
BALANCED = PAIRED ARBNO(PAIRED)

NOTANY matches a string without any on the characters in the string
argument ([^()]* as a regular expression) and ARBNO matches an
arbitrary number of repetitions of a pattern (* in the more common RE
notation).

My comment about it being rather operational comes from the fact that
there are primitives that interact directly with the scanning
algorithm. For example, FAIL always simply fails to match causing the
scanner to backtrack if it can and a lot of the power comes from
assignments done mid-match, using the $ operator.

For example,

LEN(1) $ IT *IT

matches and repeated character. LEN(1) matches any single character
and $ IT assigns the matched character to the variable IT at that
point in the scan. *IT refers to the character stored, thereby
matching only double letters.

Output is done but assigning to the variable OUTPUT, so to illustrate
FAIL one could write:

'MISSISSIPPI' (LEN(1) $ X *X) $ OUTPUT FAIL

to print:

SS
SS
PP

Without the FAIL, only the first pair would be printed since the
pattern would have succeeded.

--
Ben.

James Dow Allen

unread,
Nov 6, 2009, 3:07:10 AM11/6/09
to
On Nov 6, 1:36 am, Ben Pfaff <b...@cs.stanford.edu> wrote:
> I looked around the web for a while and found a few superficial
> descriptions of SNOBOL patterns....

>
> Can you give a few examples?

Another nice feature of the SNOBOL programming language
is that the string matching a subpattern can be saved in
a variable. IIRC,
A B.X C
is the same as
A B C
but the string matching the subpattern B is saved in X.

I've often wanted to use this feature when sed'ing, e.g.
sed "sqhas [0-9]* wivesqwill have N happy widowsq"
where "N" is the matching [0-9]* remembered.

James Dow Allen

Ben Bacarisse

unread,
Nov 6, 2009, 7:13:22 AM11/6/09
to
James Dow Allen <jdall...@yahoo.com> writes:

> On Nov 6, 1:36 am, Ben Pfaff <b...@cs.stanford.edu> wrote:
>> I looked around the web for a while and found a few superficial
>> descriptions of SNOBOL patterns....
>>
>> Can you give a few examples?
>
> Another nice feature of the SNOBOL programming language
> is that the string matching a subpattern can be saved in
> a variable. IIRC,
> A B.X C

You recall the operator correctly but, unless it is my turn to
misremember, SNOBOL is very fussy about spaces and you must write:

A B . X C

> is the same as
> A B C
> but the string matching the subpattern B is saved in X.

In case anyone is puzzled, SNOBOL has 2 of these assignment
operators. The . assigns on a successful match and $ (that I
illustrated elsewhere) does so, repeatedly, during the match.

> I've often wanted to use this feature when sed'ing, e.g.
> sed "sqhas [0-9]* wivesqwill have N happy widowsq"
> where "N" is the matching [0-9]* remembered.

Which, of course, you can do. Using a name is nicer, but \1 does the
job. In SNOBOL, you use pretty much anything as the replacement:

LINE SPAN('0123456789') . N = N + 1

increments the first number found in the variable LINE.

--
Ben.

Christian Gollwitzer

unread,
Nov 6, 2009, 10:24:17 AM11/6/09
to
James Dow Allen schrieb:

> I've often wanted to use this feature when sed'ing, e.g.
> sed "sqhas [0-9]* wivesqwill have N happy widowsq"
> where "N" is the matching [0-9]* remembered.

Use backreferences:

$ sed 'sqhas \([0-9]*\) wivesqwill have \1 happy widowsq'
has 9 wives
will have 9 happy widows
has 4 wives
will have 4 happy widows

Christian


Ben Pfaff

unread,
Nov 6, 2009, 12:20:29 PM11/6/09
to
Ben Bacarisse <ben.u...@bsb.me.uk> writes:

> Ben Pfaff <b...@cs.stanford.edu> writes:
>> I looked around the web for a while and found a few superficial
>> descriptions of SNOBOL patterns. I also found a few SNOBOL
>> reference manuals, but again their descriptions of patterns were
>> very brief and I found it difficult to figure out how they were
>> used and why exactly they were so powerful.
>>
>> Can you give a few examples?
>
> The main feature is that patterns are built from pattern primitives
> using operators. Patterns are named so you can write:

Thank you for the examples.
--
"Long noun chains don't automatically imply security."
--Bruce Schneier

James Dow Allen

unread,
Nov 7, 2009, 2:53:02 AM11/7/09
to
On Nov 6, 10:24 pm, Christian Gollwitzer <Christian.Gollwit...@uni-

Thanks! I had a hunch sed had such a facility and thought it easier
to ask here than read the man. :-)

Odd that I remembered a feature (forgetting the spaces) of a language
(SNOBOL) which I've not used for *40 years* ... yet can't remember
what I had for breakfast yesterday.

James

0 new messages