Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

portable hygienic pattern matching

112 views
Skip to first unread message

Alex Shinn

unread,
Nov 29, 2006, 11:19:27 AM11/29/06
to
Runtime pattern matching is an extremely useful tool that
can make code both more concise and easier to follow.
Most implementations provide a MATCH macro, however
in many cases it is low-level, and thus breaks easily in
combination with hygienic macros.

I couldn't find any portable hygienic versions, so I've
written my own and made it available at:

http://synthcode.com/scheme/match.scm
http://synthcode.com/scheme/match-test.scm

The code is public domain. At less than 100 lines, it
aims for conciseness rather than efficiency, but could
easily be optimized. At least in the single pattern case
(i.e. destructuring bind) it is just as fast as the competitors.

The macro is almost entirely compatible with Andrew
Wright's pattern matcher. Without something like SRFI-42
or the (... ...) escape, ... can't be used in patterns, so you
have to use the alternate ___ symbol.

The basic usage is

(match expr clause ...)

where each clause is one of

(pattern body ...)
(pattern (=> next) body ...)

Expr is evaluated, and matched against each pattern in turn.
The first to succeed causes the corresponding body to be
evaluated, and the remaining patterns are ignored. However,
if the NEXT identifier is labeled, it may be used as a continuation
to continue matching, allowing for additional runtime tests on
the pattern.

The syntax for the patterns is:

patvar ;; matches anything, and binds patvar
| _ ;; matches anything
| literal ;; matches with equal?
| 'sexp ;; an s-expression
| 'symbol ;; a symbol (special case of s-expr)
| (pat1 ... patN) ;; list of n elements
| (pat1 ... patN . patN+1) ;; list of n or more
| (pat1 ... patN patN+1 ___) ;; list of n or more, each element
;; of remainder must match patN+1
| #(pat1 ... patN) ;; vector of n elements
| #(pat1 ... patN patN+1 ___) ;; vector of n or more, each element
;; of remainder must match patN+1
| (and pat ...) ;; if all of pats match
| (or pat ...) ;; if any of pats match
| (not pat ...) ;; if all pats don't match at all
| (? pred pat ...) ;; if pred true and all pats match

See the documentation in many implementations for a
more detailed discussion. The test suite can be used for
examples (tested in psyntax and mzscheme).

--
Alex, burned by an unhygienic MATCH for the last time today

David Van Horn

unread,
Nov 29, 2006, 12:13:11 PM11/29/06
to
Alex Shinn wrote:
> Runtime pattern matching is an extremely useful tool that
> can make code both more concise and easier to follow.
> Most implementations provide a MATCH macro, however
> in many cases it is low-level, and thus breaks easily in
> combination with hygienic macros.
>
> I couldn't find any portable hygienic versions, so I've
> written my own and made it available at:
>
> http://synthcode.com/scheme/match.scm
> http://synthcode.com/scheme/match-test.scm

Very nice.

I noticed the following bug, though:

(match 'x ((or y) y)) => unbound variable y

There was an extensible, portable pattern matcher that Andre van Tonder
and I worked out, which you might be interested in.


http://groups.google.com/groups?selm=4166F2D5.6813B1B9%40het.brown.edu

David

samth

unread,
Nov 29, 2006, 2:45:14 PM11/29/06
to

Alex Shinn wrote:
> Runtime pattern matching is an extremely useful tool that
> can make code both more concise and easier to follow.
> Most implementations provide a MATCH macro, however
> in many cases it is low-level, and thus breaks easily in
> combination with hygienic macros.

I'm not sure what you mean here. For example, the PLT match.ss library
is certainly implemented using the full power of the PLT Scheme syntax
system, including syntax-case and much else. However, it is not
unhygenic in the sense of introducing bindings that capture references
in the users code (or in the sense of introducing references to local
bindings). Further, I don't know of any problematic interactions with
macros written on top of match, and many such macros exist.

Can you explain the problems you've run into?

sam th

Gary Baumgartner

unread,
Nov 29, 2006, 5:14:56 PM11/29/06
to
In article <1164829514.0...@j44g2000cwa.googlegroups.com>,

I posted to the scsh mailing list about a problem using Version 1.18,
July 17, 1995 of match-s48.scm from:

ftp://ftp.cs.indiana.edu/pub/scheme-repository/code/match/match.tar.gz

(define-syntax m
(syntax-rules ()
((_) (match 1 (x 2)))))
(m)

Error: syntax error in pattern
#{Generated x 1455}

Gary Baumgartner

Alex Shinn

unread,
Nov 29, 2006, 9:17:21 PM11/29/06
to
David Van Horn のメッセージ:

> I noticed the following bug, though:
>
> (match 'x ((or y) y)) => unbound variable y

This was actually intentional. I wasn't sure how to handle the case:

(match 'x ((or (y) z) (list y z))

Which of Y and Z are bound in the body? My first instinct was that
if you're matching any one of a number of patterns, you're not
interested in destructuring them, since they may be different. So
my matcher just ignores all bindings in OR.

However, I checked and while Andrew Wright's match will give an
error for the above case, it allows things like

(match '(x . 1) ((or (y . 1) (y . 2)) y))

I can see where you might want this, but such cases can always
be written as

(match '(x . 1) ((y . (or 1 2)) y))

which is in addition simpler and more concise.

However, I'll consider adding support for named bindings in OR.

> There was an extensible, portable pattern matcher that Andre van Tonder
> and I worked out, which you might be interested in.
>
> http://groups.google.com/groups?selm=4166F2D5.6813B1B9%40het.brown.edu

Thanks! I looked around and couldn't find any other portable matchers.
I knew Andre had done a lot of work with matching, but could only
find his SRFI-57 matcher, which alas is not portable. I'll take a look
at this.

--
Alex

samth

unread,
Nov 30, 2006, 9:14:08 AM11/30/06
to

Alex Shinn wrote:
> David Van Horn のメッセージ:
>
> > I noticed the following bug, though:
> >
> > (match 'x ((or y) y)) => unbound variable y
>
> This was actually intentional. I wasn't sure how to handle the case:
>
> (match 'x ((or (y) z) (list y z))
>
> Which of Y and Z are bound in the body? My first instinct was that
> if you're matching any one of a number of patterns, you're not
> interested in destructuring them, since they may be different. So
> my matcher just ignores all bindings in OR.
>
> However, I checked and while Andrew Wright's match will give an
> error for the above case, it allows things like
>
> (match '(x . 1) ((or (y . 1) (y . 2)) y))
>
> I can see where you might want this, but such cases can always
> be written as
>
> (match '(x . 1) ((y . (or 1 2)) y))
>
> which is in addition simpler and more concise.

What about this code?

(match a [(or (1 . x) (x . _)) x])

I don't see an easy way to pull the bindings out of the `or'.

sam th

Alex Shinn

unread,
Nov 30, 2006, 9:29:20 AM11/30/06
to
samth wrote:
>
> What about this code?
>
> (match a [(or (1 . x) (x . _)) x])
>
> I don't see an easy way to pull the bindings out of the `or'.

I can implement this easily, I'm just not sure I want to :)
The cases where you really want this seem ill-defined, and
it seems too likely occur as an accident.

Do you have any realistic examples where you'd want this
sort of match?

--
Alex

samth

unread,
Nov 30, 2006, 10:18:43 AM11/30/06
to

After a bit of investigation (I don't use Scheme 48, and the PLT
matcher has been evolving for 11 years since 1995), I think the problem
here is that Scheme48 generates a fresh identifier for x in the output
of m, which isn't a symbol (because of how Scheme 48 handles hygenic
macro expansion. That doesn't play nicely with match, which assumes
that pattern variables are symbols, and thus doesn't understand the
pattern.

Assigning blame for this problem isn't obvious, since there are two
issues - first, defmacro is fundamentally broken (which is why the PLT
match library, which is based on Wright's code, was ported to use
syntax-case). Second, Scheme48 is providing explicit renaming, but a
version that behaves differently that one might expect (although not
differently than specified in the explicit renaming paper, which I
believe allows identifiers to not always be symbols).

However, none of this is because of the "low" or "high" level ness of
any of these macros. For example:

> (define-syntax m3 (lambda (e r c) `(match 1 (,(r 'x) 2))))
#{Unspecific}
> (m3)


Error: syntax error in pattern

#{Generated x 412}
(&error)

This is precisely the same error, using "low-level" explicit renaming
macros.

Further, here is the error, without any use of defmacro (or any other
hygiene-breaking tool):

> (define-syntax mylet (lambda (e r c) (let ((var (cadr e))) (if (symbol? var) `(,(r 'let)((,var ,(caddr e))) ,(cadddr e)) ''wrong))))
#{Unspecific}
> (define-syntax breaks (syntax-rules () ((_) (mylet z 1 2))))
#{Unspecific}
> (breaks)
'wrong

The problem is that instead of symbol?, macros should use identifier?,
or some similar predicate. I don't know of any such predicate in
Scheme48 though.

Note that in Larceny, which also uses explicit renaming, but uses
symbols for all identifiers (I believe) these macros work.

sam th

samth

unread,
Nov 30, 2006, 10:26:28 AM11/30/06
to

Alex Shinn wrote:
> samth wrote:
> >
> > What about this code?
> >
> > (match a [(or (1 . x) (x . _)) x])
> >
> > I don't see an easy way to pull the bindings out of the `or'.
>
> I can implement this easily, I'm just not sure I want to :)
> The cases where you really want this seem ill-defined, and
> it seems too likely occur as an accident.

The important thing is to check that all the parts of the or bind
precisely the same variables (otherwise very strange things happen).
That's harder to check with syntax-rules, though (the obvious way I can
think of requires quadratic code blowup).

> Do you have any realistic examples where you'd want this
> sort of match?

I don't use it very much, but I've gotten bug reports about these sorts
of patterns in the PLT matcher. So it does get used. I think the
primary use case is when you have several different structures, each of
which contains, among other things, the one piece of info you want.
For example, say you have a heap, with the max value at the top,
implemented as two structures (leaf and node):

(define-struct leaf (val))
(define-struct node (left val right))

(define (max-val t)
(match t
[(or ($ leaf v) ($ node _ v _)) v]))

In this case, it's easy to pull apart into two clauses, but if the
patterns are nested, then you have to duplicate a bunch of code.

sam th

Jens Axel Søgaard

unread,
Nov 30, 2006, 10:44:47 AM11/30/06
to
Alex Shinn skrev:
> David Van Horn のメッセージ:

>> There was an extensible, portable pattern matcher that Andre van Tonder
>> and I worked out, which you might be interested in.
>>
>> http://groups.google.com/groups?selm=4166F2D5.6813B1B9%40het.brown.edu
>
> Thanks! I looked around and couldn't find any other portable matchers.
> I knew Andre had done a lot of work with matching, but could only
> find his SRFI-57 matcher, which alas is not portable. I'll take a look
> at this.

A naïve pattern (but well documented) pattern matcher can be found here:

<http://www.google.com/codesearch?hl=en&q=+file:(.scm%7C.ss%7C.sch)+s%C3%B8gaard+show:lWtXh347560:SFTNDMqdVbc:TY3sq8wfuQc&sa=N&cd=2&ct=rc&cs_p=http://download.gna.org/spells/bundles/spells.tar.gz&cs_f=spells/match.scm#a0>

or here

<http://masl.to/?L56A35A4E>

--
Jens Axel Søgaard

Alex Shinn

unread,
Nov 30, 2006, 11:15:50 AM11/30/06
to
samth wrote:

> Alex Shinn wrote:
> > Most implementations provide a MATCH macro, however
> > in many cases it is low-level, and thus breaks easily in
> > combination with hygienic macros.
>
> I'm not sure what you mean here.

Oops, sorry, I meant to say unhygienic, not low-level.
That is to say, many implementations use the public
domain MATCH essentially unmodified in its unhygienic
defmacro form. Obviously, if you use defmacro you'll
get hygiene problems (*especially* if you mix it with
hygienic macros).

You can adapt the defmacro MATCH to work with your
hygienic macro system of choice (as PLT did), but
this isn't portable.

--
Alex

Jens Axel Søgaard

unread,
Nov 30, 2006, 12:19:41 PM11/30/06
to
Alex Shinn skrev:

The implementation of match changed a few years back,
but the surface syntax were kept. Bruce Hauman
started the change, and Sam took over at some point.

<http://www.cs.wcu.edu/~bhauman/scheme/pattern.php>

That is, not much code from the original implementation
is left.


--
Jens Axel Søgaard

Andre

unread,
Nov 30, 2006, 1:46:34 PM11/30/06
to
David Van Horn wrote:

> There was an extensible, portable pattern matcher that Andre van Tonder
> and I worked out, which you might be interested in.
>
> http://groups.google.com/groups?selm=4166F2D5.6813B1B9%40het.brown.edu

The trick to make it extensible using only syntax-rules
was a bit of a hack, though. Although it works in some
r5rs toplevels, I would not recommend using this trick.
This kind of thing really calls for procedural macros.

Cheers
Andre

Lauri Alanko

unread,
Nov 30, 2006, 2:18:23 PM11/30/06
to
In article <1164896960....@j72g2000cwa.googlegroups.com>,
Alex Shinn <alex...@gmail.com> wrote:

> samth wrote:
> > (match a [(or (1 . x) (x . _)) x])

> Do you have any realistic examples where you'd want this
> sort of match?

See page 4 in http://www.eecs.usma.edu/webs/people/okasaki/jfp99.ps


Lauri

samth

unread,
Nov 30, 2006, 6:14:30 PM11/30/06
to

Alex Shinn wrote:
> samth wrote:
> > Alex Shinn wrote:
> > > Most implementations provide a MATCH macro, however
> > > in many cases it is low-level, and thus breaks easily in
> > > combination with hygienic macros.
> >
> > I'm not sure what you mean here.
>
> Oops, sorry, I meant to say unhygienic, not low-level.
> That is to say, many implementations use the public
> domain MATCH essentially unmodified in its unhygienic
> defmacro form. Obviously, if you use defmacro you'll
> get hygiene problems (*especially* if you mix it with
> hygienic macros).

Again, I'm not sure what you mean here. Wright's original matcher is
obviously not hygenic in the sense that bindings it introduces might
capture those in the original program, and references might be captured
by bindings in the original program. However, it is hygenic in the
sense that it does not *intentionally* capture any user references.
Further, there aren't any hygiene problems with match in particular
(that I know of) as opposed to general problems shared by all defmacro
macros (which are serious, of course).

I thought you were suggesting that there was something about matchers
that was either unhygenic, or that was particulary bad about matchers
implemented in defmacro (as opposed to the or macro being implemented
with defmacro). I still don't believe either of those statements to be
true. If all you are saying is that defmacro is a problematic macro
system, and that we shouldn't still be using ancient (11 years old!)
code that uses it, I totally agree.

sam th

Alex Shinn

unread,
Dec 1, 2006, 9:38:01 AM12/1/06
to
samth wrote:
>
> That's harder to check with syntax-rules, though (the obvious way I can
> think of requires quadratic code blowup).

Your lack of faith is disturbing. :)

SYNTAX-RULES is unable to create or destructure scheme atoms
(chars, numbers, strings and symbols). That is its only limitation.
Anything else it can do, even straightforwardly if you're comfortable
with CPS-style programming, and for tasks like this it's actually
fairly natural.

The new version supports bound identifiers in OR patterns,
so long as each branch uses the same identifiers. There's
no duplication of code (although there are more optimizations
I can do to trim the generated code in general).

In addition, you can refer to perviously bound identifiers to
require an EQUAL? match, so that, e.g.

(x _ x)

matches any list of three elements where the first and third
elements are the same.

Quasiquoting has been added, GET! and SET! are now
supported, MATCH-LET is now optimal (Wright's version is not),
and the vector matching has been optimized to not use
intermediate lists.

Real ellipses are now optionally supported if you comment out
the right section of code. This is a little tricky to do portably.
R5RS states clearly that ... can't be used as an identifier, but
does not prohibit its use as a literal, so one would assume
that

(define-syntax syntax-ellipse?
(syntax-rules (...)
((_ (x ...) sk fk) sk)
((_ x sk fk) fk)
))

would work, allowing

(syntax-ellipse? (foo ...) #t #f) => #t
(syntax-ellipse? (foo bar) #t #f) => #f

however this only works in Scheme48, Larceny and MIT-Scheme.
PLT works if you quote the ... in the body as (... ...). Psyntax
doesn't seem to provide any way at all to match the ellipse without
resorting to SYNTAX-CASE. If your Scheme doesn't provide any
way to match ..., you'll have to stick with ___.

In the meantime I'm working on optimizing the generated output
better for patterns that share structure, and then will clean up the
code, add better error messages, and consider making the whole
thing extensible like Andre and David's matcher. I'm also looking into
possible tree matching extensions, since to me this is a lot more
natural to use for SXML than something like XPATH, and being
compiled it's also substantially faster.

The new version is available at the same location

http://synthcode.com/scheme/match.scm

which at 350 lines has gotten a fair bit bigger (though a good deal
smaller than the competitors), so I've put the original at

http://synthcode.com/scheme/match-simple.scm

--
Alex

David Van Horn

unread,
Dec 1, 2006, 10:10:46 AM12/1/06
to
Alex Shinn wrote:
> In the meantime I'm working on optimizing the generated output
> better for patterns that share structure, and then will clean up the
> code, add better error messages, and consider making the whole
> thing extensible like Andre and David's matcher.

I would say don't bother. The extensibility trick is obtained by
expanding into a re-definition of match and it works poorly in practice
because it must be used at the top-level and most implementations'
module systems will disallow it. It is a nice idea (due to Andre), but
it is all but unworkable in real Scheme systems. It's still not clear
to me that a well-defined, extensible matcher can be expressed in R6RS,
either.

David

samth

unread,
Dec 1, 2006, 12:44:04 PM12/1/06
to

Alex Shinn wrote:
> samth wrote:
> >
> > That's harder to check with syntax-rules, though (the obvious way I can
> > think of requires quadratic code blowup).
>
> Your lack of faith is disturbing. :)
>
> SYNTAX-RULES is unable to create or destructure scheme atoms
> (chars, numbers, strings and symbols). That is its only limitation.
> Anything else it can do, even straightforwardly if you're comfortable
> with CPS-style programming, and for tasks like this it's actually
> fairly natural.

First, this is false. Syntax-rules is a very limited system, which is
rarely suitable for any 'production' macro (meaning any macro that has
customers other than the author). Here are a few things that
syntax-rules can't do

1. Report intelligent syntax errors
2. Use literal identifiers without regard for their bindings
3. Take arguments that might be abstracted by macros

There are lots more, of course. This is even leaving aside things that
are inconvenient of expensive to do in syntax-rules.

Second, your or patterns consume quadratic space while expanding.
Consider the expansion the following expression: (match-extract-vars
(or a ...) () () ()) where a ... represents n variables.

After ~ n/2 * 6 steps (6 being an irrelevant constant factor here),
this will expand into n/2 letrec-syntaxes, each of which contains at
least n/2 of the original n variables. Therefore, the total size of
these letrec-syntaxes is at least n/2*n/2 = n^2/4. Therefore, the
expansion of an or pattern takes worst-case quadratic space (and thus
time).

sam th

Alex Shinn

unread,
Dec 1, 2006, 7:22:37 PM12/1/06
to
samth wrote:
> Alex Shinn wrote:
> >
> > SYNTAX-RULES is unable to create or destructure scheme atoms
> > (chars, numbers, strings and symbols). That is its only limitation.
> > Anything else it can do, even straightforwardly if you're comfortable
> > with CPS-style programming, and for tasks like this it's actually
> > fairly natural.
>
> First, this is false. Syntax-rules is a very limited system, which is
> rarely suitable for any 'production' macro (meaning any macro that has
> customers other than the author).

Wow. That must be a big shock to all the production
SYNTAX-RULES macros out there.

> Here are a few things that syntax-rules can't do

> 1. Report intelligent syntax errors

(define-syntax syntax-error
(syntax-rules ()
((_))))

(syntax-error "invalid pattern")

works portably on all systems.

> 2. Use literal identifiers without regard for their bindings

Do you mean it can't break hygiene? This is false, as Al Petrofsky
and Oleg Kielyov already proved.

http://groups.google.com/group/comp.lang.scheme/browse_frm/thread/6501ea8c54b1a07b/058b12b4a7bdc8b1

> 3. Take arguments that might be abstracted by macros

????

--
Alex

Jens Axel Søgaard

unread,
Dec 1, 2006, 9:18:24 PM12/1/06
to
Alex Shinn skrev:

>> Here are a few things that syntax-rules can't do
>
>> 1. Report intelligent syntax errors
>
> (define-syntax syntax-error
> (syntax-rules ()
> ((_))))
>
> (syntax-error "invalid pattern")
>
> works portably on all systems.

Reporting an error isn't enough. The key in a production
system is to provide precise error messages. With *many*
syntax-rules macros in use today, you just get "an error",
when something is wrong.

Consider this example, which is typical of the current
usage of syntax-rules:

; first let is defined in terms of lambda
; (we ignore named lambda)

(define-syntax let
(syntax-rules ()
((_ ((x v) ...) e1 e2 ...)
((lambda (x ...) e1 e2 ...) v ...))))

Now we want to define let1 in terms of let:

(define-syntax let1
(syntax-rules ()
((_ x e e1 ...)
(let ((x e)) e1 ...))))

An unsuspecting beginner now writes:

(let1 1 2 3)

What error should he get? Well, he ought to be told,
that let1 expects its first argument to be an identifier.

What happens in real life?


$ ./scheme48.bat
Welcome to Scheme 48 1.3 (made by js on 22-05-2005 16:30)
Copyright (c) 1993-2005 by Richard Kelsey and Jonathan Rees.
Please report bugs to scheme-...@s48.org.
Get more information at http://www.s48.org/.
Type ,? (comma question-mark) for help.

> (define-syntax let
(syntax-rules ()
((_ ((x v) ...) e1 e2 ...)
((lambda (x ...) e1 e2 ...) v ...))))
#{Unspecific}

> (define-syntax let1
(syntax-rules ()
((_ x e e1 ...)
(let ((x e)) e1 ...))))
#{Unspecific}

> (let1 1 2 3)

Warning: invalid expression
(#{Name #{Generated lambda 349}} (1) 3)
(&syntax-error)

Warning: non-procedure in operator position
('syntax-error 2)
(procedure: #{Type :other #f :symbol})
(&warning)

Error: attempt to call a non-procedure
('syntax-error 2)
1>


The error message wasn't in terms of let1, but in terms
of the macro to which let1 expands.

It is probably possible to write a better let1 macro
with the syntax-rules system (where better means
the error is reported in terms of let), but in practice
this is seldom done. (As an aside, how would you
write let1 with syntax-rules?)

In non-syntax-rules macros systems, it is usually easier
add error-checking than it is in the syntax-rules system.


Consider the following syntax-case implementation of let:

(require-for-syntax (lib "list.ss")) ; contains filter

(define-syntax (let stx)
(syntax-case stx ()
; intended usage
[(_ ((x v) ...) e1 e2 ...)
(andmap identifier? (syntax->list #'(x ...)))
#'((lambda (x ...) e1 e2 ...) v ...)]
; error: xs must be identifiers
[(_ ((x v) ...) e1 e2 ...)
(not (andmap identifier? (syntax->list #'(x ...))))
(let ([first-non-identifier
(car (filter (lambda (s) (not (identifier? s)))
(syntax->list #'(x ...))))])
(raise-syntax-error #f "identifier expected"
stx first-non-identifier))]
; error: empty body
[(_ ((x v) ...))
(raise-syntax-error #f
"at least one expression must follow the bindings" stx)]
; error: other situations
[_
(raise-syntax-error #f
"wrong syntax, expected (let ((<var> <expr>) ...) <expr> ...)"
stx)]))

Let's try it in DrScheme on some syntax errors:

> (let ((1 2)) 3) ; <- the 1 is colored red
. let: identifier expected in: 1

> (let ((x 1))) ; <- the entire let-expression is colored red
. let: at least one expression must follow the bindings in:
(let ((x 1)))

> (let 1)
. let: wrong syntax, expected (let ((<var> <expr>) ...) <expr> ...) in:
(let 1)


The part doing the same work as the syntax-rules macro was:

[(_ ((x v) ...) e1 e2 ...)
(andmap identifier? (syntax->list #'(x ...)))
#'((lambda (x ...) e1 e2 ...) v ...)]

which is more or less the same as the syntax-rules macro.

In R6RS fenders were added to the syntax-rules system, and
that will help syntax-rules writes to provide better
error messages in the future.

--
Jens Axel Søgaard

Jens Axel Søgaard

unread,
Dec 1, 2006, 9:24:37 PM12/1/06
to
Jens Axel Søgaard skrev:

An extra error case check ought be added: With help of
check-duplicate-identifier, we can check that all the xs are different.

--
Jens Axel Søgaard

samth

unread,
Dec 1, 2006, 11:01:08 PM12/1/06
to

Alex Shinn wrote:
> samth wrote:
> > Alex Shinn wrote:
> > >
> > > SYNTAX-RULES is unable to create or destructure scheme atoms
> > > (chars, numbers, strings and symbols). That is its only limitation.
> > > Anything else it can do, even straightforwardly if you're comfortable
> > > with CPS-style programming, and for tasks like this it's actually
> > > fairly natural.
> >
> > First, this is false. Syntax-rules is a very limited system, which is
> > rarely suitable for any 'production' macro (meaning any macro that has
> > customers other than the author).
>
> Wow. That must be a big shock to all the production
> SYNTAX-RULES macros out there.

C is an unsuitable language for lots of tasks it's used for too.

> > Here are a few things that syntax-rules can't do
>
> > 1. Report intelligent syntax errors
>
> (define-syntax syntax-error
> (syntax-rules ()
> ((_))))
>
> (syntax-error "invalid pattern")
>
> works portably on all systems.

Jens answered this well.

>
> > 2. Use literal identifiers without regard for their bindings
>
> Do you mean it can't break hygiene? This is false, as Al Petrofsky
> and Oleg Kielyov already proved.
>

First, what Al and Oleg (very impressively) prove is that hygiene
doesn't mean what we might intuitively think it means. But since the
only definition of hygiene is "what the hygenic expander does", the
expander can't be proved to be unhygenic (modulo bugs).

Second, I mean the following:

(let ([else #f])
(mycond
[else 3]))
; => 3

which you can't do in syntax-rules.

> > 3. Take arguments that might be abstracted by macros
>
> ????

This is what the the extensible version of match described earlier in
this thread is trying to do.

sam th

Alex Shinn

unread,
Dec 2, 2006, 8:35:11 AM12/2/06
to
Jens Axel Søgaard wrote:
> Alex Shinn skrev:

>
> Reporting an error isn't enough. The key in a production
> system is to provide precise error messages. With *many*
> syntax-rules macros in use today, you just get "an error",
> when something is wrong.

I couldn't find any that didn't report the full error message
with my SYNTAX-ERROR. I did a quick check of PLT,
psyntax (which covers many implementations), Scheme48
and Gauche, and every one of them reported the syntax-error
message.

> It is probably possible to write a better let1 macro
> with the syntax-rules system (where better means
> the error is reported in terms of let), but in practice
> this is seldom done. (As an aside, how would you
> write let1 with syntax-rules?)

In this case the matter of error checking and reporting
is unrelated to the macro system, and purely the
responsibility of the macro author to produce intelligent
errors where appropriate.

I believe this gives the error message you want:

(define-syntax let1
(syntax-rules ()
((_ var val body0 body1 ...)
(syntax-symbol? var
(let ((var val)) body0 body1 ...)
(syntax-error "LET1 expected a variable but got" var)))))

--
Alex

Alex Shinn

unread,
Dec 2, 2006, 8:58:10 AM12/2/06
to
samth wrote:
> Alex Shinn wrote:
> >
> > [...] Obviously, if you use defmacro you'll

> > get hygiene problems (*especially* if you mix it with
> > hygienic macros).
>
> I thought you were suggesting that there was something about matchers
> that was either unhygenic, or that was particulary bad about matchers
> implemented in defmacro (as opposed to the or macro being implemented
> with defmacro).

No, I don't think there's anything inherently special about matchers.
The
problem (apart from the inherent unhygienic nature of defmacro) is
specifically the *mixing* of both hygienic and unhygienic macros in the
same code. You can get unexpected behavior when identifiers are
hygienically relabeled inconsistently.

The thing that broke for me was trying to add unhygienic pattern
matching to my hygienic loop macro:

http://www.call-with-current-continuation.org/eggs/loopy-loop.html

As a simplified example, consider an EACH macro, that iterates
a variable over each element of an abstract sequence:

(define-syntax each
(syntax-rules ()
((_ var (seq args ...) . body)
(let-syntax
((gen
(syntax-rules ()
((_ init cursor ref next done?)
(let loop ((cursor init))
(if (not (done? cursor))
(let1 var (ref cursor)
(begin . body)
(loop (next cursor)))))))))
(seq (gen) args ...)))))

We'll provide one sequence type, IN-LIST:

(define-syntax in-list
(syntax-rules ()
((_ k ls)
(in-list k ls tmp))
((_ (k ...) ls tmp)
(k ... ls tmp car cdr null?))
))

Now we can write code like

> (each x (in-list '(a b c)) (write x) (newline))
a
b
c

Note we used LET1 in the expansion. This is fine with
Jens' hygienic LET1, but if we use an unhygienic version:

(define-macro (let1 var val . body)
`(let ((,var ,val))
,@body))

the macro breaks (in different ways in both psyntax and PLT),
and this is an extremely simple macro that doesn't even introduce
any identifiers.

--
Alex

Jens Axel Søgaard

unread,
Dec 2, 2006, 8:58:44 AM12/2/06
to
Alex Shinn skrev:
> Jens Axel Søgaard wrote:

>> It is probably possible to write a better let1 macro
>> with the syntax-rules system (where better means
>> the error is reported in terms of let), but in practice
>> this is seldom done. (As an aside, how would you
>> write let1 with syntax-rules?)
>
> In this case the matter of error checking and reporting
> is unrelated to the macro system, and purely the
> responsibility of the macro author to produce intelligent
> errors where appropriate.
>
> I believe this gives the error message you want:
>
> (define-syntax let1
> (syntax-rules ()
> ((_ var val body0 body1 ...)
> (syntax-symbol? var
> (let ((var val)) body0 body1 ...)
> (syntax-error "LET1 expected a variable but got" var)))))


What is the definition of syntax-symbol?

--
Jens Axel Søgaard

Alex Shinn

unread,
Dec 2, 2006, 9:09:02 AM12/2/06
to
samth wrote:
>
> This is what the the extensible version of match described earlier in
> this thread is trying to do.

Yes, but it _can_ do this. It is only in the presense of (non-R5RS)
module systems that (R5RS) syntax-rules has problems with this.

--
Alex

Alex Shinn

unread,
Dec 2, 2006, 9:14:37 AM12/2/06
to
Jens Axel Søgaard wrote:
>
> What is the definition of syntax-symbol?

Oleg called this symbol??:

http://groups.google.com/group/comp.lang.scheme/browse_frm/thread/db4f3af8cec39059/803ac7d550c1f685

But I suspect you know this, since you used it in your own
matcher. You couldn't write a MATCH-style pattern matcher
without it.

--
Alex

Jens Axel Søgaard

unread,
Dec 2, 2006, 9:12:42 AM12/2/06
to
Jens Axel Søgaard skrev:

And was this the version of syntax-error, you were using?

(define-syntax syntax-error
(syntax-rules ()
((_))))

--
Jens Axel Søgaard

Alex Shinn

unread,
Dec 2, 2006, 9:26:30 AM12/2/06
to

Well, it works in the default Chicken extension mechanism which
loads things into the top-level. It could be very useful to have a
standard match library, and external helpers for things like
convenient matching of SXML trees.

Extensible macros are easy if you're willing to stay within
a single namespace. For instance, my LOOP macro posted
earlier is extensible, where the iterator macros use the convention
of being name IN-<FOO>. In the case of a matcher, you could
have a special pattern that dispatches to helper macros:

(: match-permutation <pat1> <pat2> ...)

where the : is a special symbol and MATCH-PERMUTATION
is passed the appropriate continuation for expansion.

--
Alex

Alex Shinn

unread,
Dec 2, 2006, 9:29:34 AM12/2/06
to
Jens Axel Søgaard wrote:
>
> And was this the version of syntax-error, you were using?
>
> (define-syntax syntax-error
> (syntax-rules ()
> ((_))))

Almost:

(define-syntax syntax-error
(syntax-rules ()
((_) (0))))

since some implementations require a template.

--
Alex

Jens Axel Søgaard

unread,
Dec 2, 2006, 9:36:50 AM12/2/06
to
Alex Shinn skrev:

I found syntax-symbol? via Google after I posted. I am just trying
to get a complete example, so we can discuss pros and cons of
the error messages produced.

Using the syntax-error macro from JRM (the other contains
no template, and won't compile), is this the complete example?

(define-syntax let
(syntax-rules ()
((_ ((x v) ...) e1 e2 ...)
((lambda (x ...) e1 e2 ...) v ...))))

(define-syntax syntax-symbol?
(syntax-rules ()
((syntax-symbol? (?x . ?y) ?sk ?fk) ?fk)
((syntax-symbol? #(?x ...) ?sk ?fk) ?fk)
((syntax-symbol? ?x ?sk ?fk)
(let-syntax ((magical-mystery-macro
(syntax-rules ()
((magical-mystery-macro ?x ??sk ??fk) ??sk)
((magical-mystery-macro ?y ??sk ??fk) ??fk))))
(magical-mystery-macro the-walrus ?sk ?fk)))))

(define-syntax syntax-error
(syntax-rules ()
((syntax-error)
(syntax-error "Bad use of syntax error!"))))

(define-syntax let1
(syntax-rules ()
((_ var val body0 body1 ...)
(syntax-symbol? var
(let ((var val)) body0 body1 ...)
(syntax-error "LET1 expected a variable but got"
var)))))

(let1 1 2 3)


Let's try it (I saved the example to "c:/tmp/n.scm") :

Welcome to Scheme 48 1.3 (made by js on 22-05-2005 16:30)
Copyright (c) 1993-2005 by Richard Kelsey and Jonathan Rees.
Please report bugs to scheme-...@s48.org.
Get more information at http://www.s48.org/.
Type ,? (comma question-mark) for help.

> (load "c:/tmp/n.scm")
c:/tmp/n.scm

Warning: use of macro doesn't match definition
(syntax-error "LET1 expected a variable but got" 1)
(&syntax-error)
'syntax-error
>

This is an improvement. But - which let1-expression in my
program provoked the error?

And can I be sure to look for a let1-expression? What if
the let1 stems from the expansion of another macro?
That is, how can I get a syntax-rules expander to report the
original let1-expressions as the source of the error?


--
Jens Axel Søgaard

Alex Shinn

unread,
Dec 2, 2006, 9:42:33 AM12/2/06
to
samth wrote:
>
> Second, your or patterns consume quadratic space while expanding.

Thanks for pointing this out. I've only hastily thrown the macro
together in the past few days, and am putting more emphasis
into optimizing the runtime results rather than the compile-time.
It's unfortunate that existing syntax-rules imlementations
don't reclaim unused let-syntax patterns.

The easy fix is to chain the identifier checks, so that each one
only matches a single identifier and then passes to the previous
check. But I don't expect the number of pattern identifiers to
ever get that large, especially in OR patterns where the only
valid uses I've seen are destructuring of structures which share
common elements.

If it does ever get sluggish I'll fix it, but in the meantime I'll
leave it as is, since my tests are all quite speedy and I'm
still holding out for better syntax-rules implementations :)

--
Alex

Alex Shinn

unread,
Dec 2, 2006, 9:55:32 AM12/2/06
to
Jens Axel Søgaard wrote:
> > (load "c:/tmp/n.scm")
> c:/tmp/n.scm
>
> Warning: use of macro doesn't match definition
> (syntax-error "LET1 expected a variable but got" 1)
> (&syntax-error)
> 'syntax-error
> >
>
> This is an improvement. But - which let1-expression in my
> program provoked the error?
>
> And can I be sure to look for a let1-expression? What if
> the let1 stems from the expansion of another macro?
> That is, how can I get a syntax-rules expander to report the
> original let1-expressions as the source of the error?

Again, these same questions would apply to any macro
system (or non-macro system). If you have an error in
FOO that originated in bad data from BAR, it is BAR's
responsibility to catch and report that error.

If one system provides nice info such as expansion traces
and line numbers, that's just a well designed system, and is
not a property of the general macro approach itself, whether
you use defmacro, syntax-rules, syntax-case, explicit
renaming, syntactic-closures, or whatever.

--
Alex

Jens Axel Søgaard

unread,
Dec 2, 2006, 10:27:33 AM12/2/06
to
Alex Shinn skrev:

> Jens Axel Søgaard wrote:
>> > (load "c:/tmp/n.scm")
>> c:/tmp/n.scm
>>
>> Warning: use of macro doesn't match definition
>> (syntax-error "LET1 expected a variable but got" 1)
>> (&syntax-error)
>> 'syntax-error
>> >
>>
>> This is an improvement. But - which let1-expression in my
>> program provoked the error?
>>
>> And can I be sure to look for a let1-expression? What if
>> the let1 stems from the expansion of another macro?
>> That is, how can I get a syntax-rules expander to report the
>> original let1-expressions as the source of the error?
>
> Again, these same questions would apply to any macro
> system (or non-macro system). If you have an error in
> FOO that originated in bad data from BAR, it is BAR's
> responsibility to catch and report that error.

Thus we need tools, so that we easily can add write
error checks in BAR.

I am saying the tools in syntax-rules aren't powerful
ebnough today. It *is* a problem in syntax-rules to report
errors early.

In the let1 example, we want to check whether x were an
identifier or not. That were possible due to Oleg's magical
macro. However, it isn't particularly difficult to make up
stuff that is difficult to check with syntax-rules. Even
simple stuff can be difficult to check. Is a macro argument
a number? Is a macro argument a natural number?
Is it an even number? A prime larger than 1000? Or
a more realistic example: Are there duplicate identifiers
in a series of identifiers to be bound? (as in let).

Some simple question can indeed be answered within
the syntax-rules system, but in current practice it
is seldom done. Most syntax-rules macros assume
their argument are of the correct type, and expand
happily - then later another macro breaks, and
seldom is correct source location reported.

With effort we can write better syntax-rules macros,
but the portable syntax-error trick looses the source
location of the expression from which the error originated.

In the example of a syntax-case implementation of
let, I was careful to report the error in terms of the
original syntax. And it wasn't difficult to do so.

I know from #scheme that you aren't fond of syntax-case
macros, but for simple macros they are just as
easy to write as syntax-rules macros. In R6RS
we are in the fortunate situation that fenders
were added, so in the future we can expect improved
error reporting from syntax-rules macros.

--
Jens Axel Søgaard

Andre

unread,
Dec 2, 2006, 12:21:06 PM12/2/06
to
David Van Horn wrote:

> It's still not clear
> to me that a well-defined, extensible matcher can be expressed in R6RS,
> either.

I believe it can, by using an expand-time table. Here is the
basic trick, reproducing the gist of the example in Matthew
Flatt's "...you want it when" paper. For demonstration purposes,
here record-switch currently just checks whether the record type
is registered in the table. In a full-fledged matcher, it would
do some more interesting stuff with the registered information.

There is some freedom in the draft r6rs semantics for libraries,
but this should work in all the allowed variants.

The implementation is at

http://www.het.brown.edu/people/andre/macros/index.htm

;;======================================================
;;
;; Example from Matthew Flatt's paper.
;; Illustrates use of expand-time state.
;;
;;======================================================

(library records-helper
(export register! registered?)
(import r6rs)
(define table '())
(define (register! name)
(set! table (cons name table)))
(define (registered? name)
(memp (lambda (entry) (free-identifier=? name entry))
table)))

(library records
(export define-record record-switch)
(import r6rs (for records-helper expand))
(define-syntax define-record
(lambda (form)
(syntax-case form ()
((_ name)
(syntax
(begin
(define name 'record-type-descriptor)
(define-syntax dummy
(begin
(register! (syntax name))
(lambda (form) 'never-used)))))))))
(define-syntax record-switch
(lambda (form)
(syntax-case form ()
((_ exp (name consequence))
(if (registered? (syntax name))
(syntax (if (eq? exp 'name) consequence "no match"))
(syntax-violation #f "Invalid record type" (syntax name))))))))

(library zoo
(export zebra)
(import records)
(define-record zebra))

(library metrics
(export)
(import r6rs zoo records)
(display
(record-switch 'zebra (zebra 'zebra)))) ;==> zebra

Cheers
Andre

Alex Shinn

unread,
Dec 3, 2006, 8:36:59 PM12/3/06
to
Jens Axel Søgaard wrote:
>
> I know from #scheme that you aren't fond of syntax-case
> macros, but for simple macros they are just as
> easy to write as syntax-rules macros. In R6RS
> we are in the fortunate situation that fenders
> were added, so in the future we can expect improved
> error reporting from syntax-rules macros.

This has nothing to do with syntax-case. I may not be
fond of it, but to be honest I'm not really fond of anything :)

I haven't said anything against syntax-case this whole
discussion, because my whole point was to have something
*portable* that I can use right now. The macro I wrote is
fully portable, supports a superset of the features of the
MATCH I had been using previously, and is better optimized
for the use cases I'm immediately concerned with. I'm
pretty darned happy with it :)

I think fenders are an interesting idea, and certainly
syntax-rules in general has room for improvement,
but to be honest I think the error reporting and tests
available are sufficient for 99% of all macros. Macros
work on structure, and syntax-rules can check structure.
Anywhere you might want, say, a natural integer, you'd
most likely want the macro to support any expression
that evaluates to a natural integer, so INTEGER? isn't
much use until runtime.

--
Alex

0 new messages