Why R6RS is controversial

Alex Shinn

May 29, 2007, 1:56:52 AM

I've seen a few links about R6RS on c.l.s, LtU, reddit, and other
forums, and many of the comments seem to be along the lines of
"cool, it would be great if Scheme had those new features." If that
is your opinion, you can just choose any Scheme implementation and it
will have all those features and more. If you are curious why R6RS
is controversial at all, why there is a community vote, and why
calls for boycotts and for alternatives have been made, read on.

[Disclaimer: I dropped out of the R6RS discussions fairly early
because it was hopelessly far away from what I would even consider,
and because the editors were not responsive to my suggestions. I am
_not_ an unbiased reporter, but I doubt very much you'll be able to
find one who actually knows what s/he is talking about. And despite
an initial intention of writing this as a justification for a "no"
vote, I found that those more patient than I have fought on and the
editors became more responsive, and by the latest draft (r5.93rs)
R6RS had improved quite a bit. As the editors are still working and
this isn't even the final draft, I can't even be sure at this point
that I will vote "no."]

As a bit of background, the basis for much of the criticism can be
found in the first sentence of the introduction to the report:

Programming languages should be designed not by piling feature
on top of feature, but by removing the weaknesses and
restrictions that make additional features appear necessary.

Thus, as opposed to almost any other language where a rich feature
set is considered a good thing, the goal in Scheme is to find the
simplest and most general way to be able to express any given
feature. For example, instead of specifying any core looping
constructs like FOR or WHILE loops, Scheme made only the "proper
tail recursion" requirement, allowing loops to be expressed as
recursive procedure calls. This is at the same time simpler and
more general than any single loop syntax, and allows us to build any
number of efficient loop syntaxes with macros. Likewise, Scheme
does not provide any core forms for local variables because they can
be implemented as syntactic sugar for procedure application.
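For concreteness, here is a minimal sketch in R7RS-small Scheme (the names are illustrative, not taken from any report): a loop written as a tail-recursive procedure, and a LET shown next to the LAMBDA application it abbreviates.

```scheme
(import (scheme base))

;; A loop with no FOR or WHILE: the recursive call is in tail
;; position, so proper tail recursion keeps the stack from growing.
(define (count-down n acc)
  (if (= n 0)
      (reverse acc)
      (count-down (- n 1) (cons n acc))))

;; LET is sugar for applying a LAMBDA to the initial values.
(define via-let    (let ((x 1) (y 2)) (+ x y)))
(define via-lambda ((lambda (x y) (+ x y)) 1 2))
```

R5RS itself specifies DO as a derived form that expands into a tail-recursive procedure of roughly this shape.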

In a sense, the point is to be the most axiomatic language possible.
Like mathematicians who struggled to reduce geometry beyond Euclid's
5 postulates, Scheme tries to reduce programming to its most
essential components (while still being high-level and generally
useful and not reducing so far as Turing machines or SK combinators,
which could be thought of as non-Euclidean programming :)

This does not mean all features should be rejected out of hand, but
rather that any new feature should be viewed with initial suspicion.
We should attempt to break that feature into the smallest and most
general essential primitives, so that the new feature can be
expressed as a library. If it's not clear yet how to reduce the
feature, it probably requires more research and isn't ready for the
official standard that all Schemes must conform to. If it is
nonetheless essential it is usually best to provide a high-level
interface with the low-level details left unspecified, so that it
could later be re-defined in terms of some simpler primitive. For
example, a simple TCP/IP interface could be specified that may later
be defined in terms of BSD sockets.

Some desired features can be reduced in more than one way, none of
which seems any more axiomatic than the others. A module system,
for instance, can be expressed entirely with lexical scope and
macros, usually with a few minor extensions to the macro system.
Alternatively, it can be expressed as simple namespace management,
with a few primitives for handling that. Or modules can exist
outside the code, rather than being built within it. None of these
are clearly better. However, a high-level module syntax can be
defined which can be implemented on top of any of these, and indeed
the R6RS modules are so defined.
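As a rough illustration of the first approach, stripped of macros entirely, here is a hypothetical "counter module" in R7RS-small built only from lexical scope: the state is private to a LET, and only the procedures assigned to top-level names are exported. A macro-based module system would presumably generate this kind of boilerplate for you.

```scheme
(import (scheme base))

;; Exported names of the hypothetical module.
(define counter-next #f)
(define counter-reset! #f)

;; COUNT is visible only inside this LET, so it is module-private.
(let ((count 0))
  (set! counter-next
        (lambda () (set! count (+ count 1)) count))
  (set! counter-reset!
        (lambda () (set! count 0))))
```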

[Note that R6RS makes a core language vs. library distinction, but it
isn't especially meaningful, as some of the libraries imply semantic
changes to the language as a whole.]

A consequence of this axiomatic approach is a reluctance to
over-specify things, because over-specification prevents further
exploration of new features and feature reduction, and restricts
implementation strategy. As a result, the Scheme community is rich in
ideas and implementations.
Many of the complaints against R6RS are precisely that it
over-specifies.

Below, for the record, I summarize some of the more controversial
issues people have with R6RS (based on the r5.93rs draft). I
include both complaints I agree with and those I don't, but have
undoubtedly missed many and misstated others, so if you have an
issue not mentioned please reply with it (preferably in summarized,
non-ranting form if you can restrain yourself).

------------------------------------------------------------------------

* IDENTIFIER-SYNTAX

Identifier syntax means that the macro system can expand a single
identifier even when not the first symbol in an expression. Thus
when you see an identifier, it may not actually be a real variable
reference, which can be confusing both for humans and other macros
which want to analyze code. It is a substantial complication to
the semantics of the language, with arguable benefits.
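A minimal R6RS sketch of what this means (NEXT-ID and COUNTER are hypothetical names): NEXT-ID is never called, yet each bare mention of it runs code.

```scheme
#!r6rs
(import (rnrs))

(define counter 0)

;; NEXT-ID looks like a variable, but it is an identifier macro:
;; every bare reference is replaced by the BEGIN expression below.
(define-syntax next-id
  (identifier-syntax
   (begin (set! counter (+ counter 1)) counter)))

(define first-ref next-id)    ; counter becomes 1
(define second-ref next-id)   ; counter becomes 2
```

Someone reading `(define first-ref next-id)` on its own has no way of telling that it is not a plain variable reference, which is the concern raised above.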

* Exceptions

Many things specified simply as "errors" in R5RS (with unspecified
behavior) are now required to signal exceptions. The exceptions
themselves fit within a complex hierarchy.
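A small R6RS sketch of the behavior described (CLASSIFY is a hypothetical helper): in safe code, taking the CAR of a non-pair raises a condition of type &assertion, which GUARD can tell apart from one raised by ERROR.

```scheme
#!r6rs
(import (rnrs))

;; In R6RS, &assertion is a subtype of &violation, and &violation and
;; &error are both subtypes of &serious. CLASSIFY reports which branch
;; of that hierarchy a raised condition falls on.
(define (classify thunk)
  (guard (e ((assertion-violation? e) 'assertion)
            ((error? e) 'error)
            (else 'other))
    (thunk)
    'no-exception))

;; CAR of a non-pair: R6RS requires an &assertion condition here.
(define r1 (classify (lambda () (car 1))))
;; ERROR raises a condition of type &error.
(define r2 (classify (lambda () (error 'demo "something failed"))))
```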

* Module system

- The enforced phase separation disallows some implementation
strategies and extensions.

- The versioning system is complex and not obviously necessary.
Something like it could always be added later.

- Libraries need to be wrapped in an extra layer of parentheses,
as opposed to a single definition at the top of the file.

* Unicode

The standard makes Scheme (non-optionally) Unicode-specific, and
defines the character data-type as Unicode scalar values. This
prevents small implementations which only want to deal with ASCII
(e.g. in embedded systems), implementations which want to support
Unicode but want a different meaning of character (e.g. grapheme
clusters), and implementations which want to support a different
character set altogether. There are a number of alternative proposals
to Unicode including Mojikyo, eKanji, TRON, and UTF-2000. Scheme
has been around for a long time, Lisp even longer, and many are
hesitant to wed themselves to a single character set forever.

Behavior for Unicode for those implementations which use it could
always be specified optionally (or even in a separate report).

* STRING-REF is recommended to be constant time

This discourages a number of implementation strategies that use
variable width character encodings or alternate string
representations such as ropes or trees. It is easy to provide a
string API that is convenient to use and efficient for both
traditional and alternative string representations.

* Safety

R6RS makes a (possibly too) strong claim about safety, and
introduces an exception type for implementation restrictions.

* Square brackets

Making [] identical to (), apart from all the arguments about
which looks better, breaks the entire axiomatic spirit and
prevents alternative extensions from using []. It also introduces
a trivial stylistic distinction where none existed before, and
puts Scheme among the ranks of languages where programmers need to
agree on a style guideline (there are many variations already)
before collaborating on a project.

* IEEE-754 values (-0.0, infinities, and NaN)

People disagree about whether these values belong in the language, what
their exactness is, and what the behavior of various operations on
them should be. The latest draft makes them optional, so this
isn't likely to be terribly controversial.

* CALL/CC

Although CALL/CC is a common abbreviation for
call-with-current-continuation when talking about the operator,
and is already supported by some systems, some argue the operator
is used rarely enough (and is supposed to be) that the
abbreviation isn't needed. At any rate, no other procedure in the
language has two names.

* Comment Syntax

#; expression comments and #| ... |# block comments have been
added to the language, though they are not needed.

* Bytevector Syntax

#vu8(...) reads as a bytevector. Bytevectors themselves are not
so controversial, though people disagree on the names and any
external representation.

------------------------------------------------------------------------

The following are technically part of the library specification,
though many of them imply core semantic changes.

* Ignored SRFIs

The SRFI process was specifically intended as a testbed for new
features for possible inclusion in future standards. R6RS was of
course under no obligation to use any SRFIs; however, in a few
places it seemed to deliberately ignore the progress made by
SRFIs. SRFI-1, in particular, is almost universally supported,
is exceedingly popular, and in fact how to access SRFI-1 in a
given implementation is one of the most frequent beginner
questions asked. Yet despite this the R6RS draft uses
gratuitously incompatible names and APIs for many SRFI-1
procedures. As R6RS claims to emphasize the community role, it
seems strange that it should ignore a previous community effort,
and it seems to discourage future SRFI work.

* STRING-NORMALIZE-*

Normalization is hideously complicated, and may require many
manual conversions back and forth, including after any operation that may
not have preserved normalization. A huge simplification very much
worth consideration is a system that maintains all internal
strings with a single consistent normalization, but explicitly
allowing conversion to any of a number of specific normalization
forms prevents this approach.
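To see why the conversions pile up, here is a short R6RS sketch using STRING-NORMALIZE-NFC: two strings that display as the same "é" compare unequal until one of them is explicitly normalized.

```scheme
#!r6rs
(import (rnrs))

;; The same visible character spelled two ways: precomposed U+00E9,
;; and "e" followed by U+0301 COMBINING ACUTE ACCENT.
(define precomposed "\xE9;")
(define decomposed  "e\x301;")

;; Different character sequences, so STRING=? tells them apart...
(define same-raw? (string=? precomposed decomposed))
;; ...until one side is explicitly converted to a normalization form.
(define same-nfc? (string=? precomposed
                            (string-normalize-nfc decomposed)))
```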

A simpler API could just provide a single STRING-NORMALIZE
procedure, which would normalize to a preferred internal
normalization form, and in the case of an automatically
normalizing implementation would just be the identity function.

As a separate extension it would be possible to provide utilities
to normalize to bytevectors with specific normalization forms, for
interaction with external tools.

* CHAR-*CASE

Case mapping is an incompletely defined operation on characters
when they are defined as Unicode scalar values, so it is likely
that any algorithm using individual character case mappings
instead of string case mappings is broken.
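The usual example, sketched in R6RS: the German sharp s (U+00DF) has no single-character uppercase form, so mapping CHAR-UPCASE over a string gives a different answer from STRING-UPCASE.

```scheme
#!r6rs
(import (rnrs))

;; "Stra\xDF;e" spells the German word for "street"; the escape avoids
;; depending on the source file's encoding.
(define word "Stra\xDF;e")

;; Character by character: sharp s has no simple uppercase mapping,
;; so it passes through unchanged.
(define by-char
  (list->string (map char-upcase (string->list word))))

;; Whole-string case mapping may change the length: sharp s -> "SS".
(define by-string (string-upcase word))
```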

* Records

The records library is very large and complex, and cannot be
implemented as a portable library.

* Enumerations

These are highly un-Lisp-like and unpopular, and would not be
needed if not for several other controversial features.

* ENDIANNESS

There is no reason or rationale given for making this syntax. Since
macros are not first class and are less flexible than procedures,
it is generally accepted style to avoid making anything a macro
unless absolutely necessary, yet this doesn't have so much as a
justification.
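For reference, a short R6RS sketch: whatever the reason for making ENDIANNESS syntax, the report says the form simply evaluates to a symbol, so a quoted symbol can be passed in its place. The byte values are just an example.

```scheme
#!r6rs
(import (rnrs))

(define bv #vu8(1 2))   ; two bytes: #x01 #x02

;; (endianness big) evaluates to the symbol BIG.
(define as-big    (bytevector-u16-ref bv 0 (endianness big)))
(define as-little (bytevector-u16-ref bv 0 (endianness little)))

;; A plain quoted symbol works just as well.
(define as-big-quoted (bytevector-u16-ref bv 0 'big))
```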

* FILE-OPTIONS, BUFFER-MODE, EOL-STYLE, ERROR-HANDLING-MODE

These optional arguments to opening file ports are also defined as
syntax without reason. Four optional positional arguments is also
unwieldy to some. Others would rather have EOL-STYLE managed by
operations on a port (e.g. READ-LINE) rather than the port itself.
The EOL-style itself arguably shouldn't bother with support for
NEL or LS, and possibly should allow automatic detection. The
ERROR-HANDLING-MODE is complex.

* Binary vs. Text port distinction

Some want it, some don't. The primary argument in favor of the
distinction is efficient buffering of transcoded ports. The
compromise would seem to be to make the distinction but allow
implementations to optionally allow mixing procedures on both.
The current draft makes a distinction, but does not specify what
happens when binary procedures are applied to text ports and vice
versa.

* Pair and string mutation moved to separate libraries

SET-CAR!, SET-CDR! and STRING-SET! have been moved to separate
libraries. Pairs and strings are still mutable, so this does
nothing to change the semantics of the language or even to help
optimizations (it would require a global compiler to detect that
these modules were never imported, but at that point it's trivial
for the compiler to simply detect that these individual procedures
aren't used). It is thus simply a gesture of moving towards a
more functional Scheme. Some people disagree, others think the
gesture is silly.
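For reference, this is roughly what it looks like in an R6RS program: the composite (rnrs) library does not export SET-CAR! or STRING-SET!, so code that mutates pairs or strings has to import the separate libraries explicitly.

```scheme
#!r6rs
;; (rnrs) alone does not provide SET-CAR! or STRING-SET!.
(import (rnrs) (rnrs mutable-pairs) (rnrs mutable-strings))

(define p (list 1 2 3))        ; a fresh, mutable list
(set-car! p 'one)

(define s (make-string 3 #\a)) ; a fresh, mutable string (literals are not)
(string-set! s 0 #\b)
```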

------------------------------------------------------------------------

For anyone intending to vote, I hope you do read the final draft
carefully before making your decision.

Thanks to the editors for all their hard work, however it turns out.

--
Alex

Chris Rathman

May 29, 2007, 11:41:22 AM

From a minimalist standpoint, which of these methods used to construct
a list is axiomatic and which could theoretically be relegated to a
library?

(cons 1 (cons 2 (cons 3 '())))
(list 1 2 3)
'(1 2 3)
'(1 . (2 . (3 . ())))
'(1 2 3 . ())

An uneducated guess would be that the list function could be defined
in a library using car and cdr. Being a Scheme newbie, I still
haven't quite figured out how the dot pairing operator is distinct
from car and cdr. And SICP seems to show us how to define car and cdr
in terms of lambda, so perhaps they could be defined as library
functions?

Chris Rathman

May 29, 2007, 12:04:50 PM

On May 29, 10:41 am, Chris Rathman <Chris.Rath...@tx.rr.com> wrote:

> An uneducated guess would be that the list function could be defined
> in a library using car and cdr. Being a Scheme newbie, I still
> haven't quite figured out how the dot pairing operator is distinct
> from car and cdr. And SICP seems to show us how to define car and cdr
> in terms of lambda, so perhaps they could be defined as library
> functions?

Doh! S/car and cdr/cons/ in all the above.

Anyhow, the question I had is how the Scheme community goes about
drawing the line between minimalism and compromise. From the outside
looking in, I don't see Scheme as purely driven towards PLT axioms -
and if I'm not mistaken, one doesn't have to go very deep into the
language to realize that the language designers did not take a purist
approach.

I think minimalism is more what you'd call "guidelines" than actual
rules.

Anton van Straaten

May 29, 2007, 1:09:51 PM

Chris Rathman wrote:
> From a minimalist standpoint, which of these methods used to construct
> a list is axiomatic and which could theoretically be relegated to a
> library?
>
> (cons 1 (cons 2 (cons 3 '())))
> (list 1 2 3)
> '(1 2 3)
> '(1 . (2 . (3 . ())))
> '(1 2 3 . ())
>
> An uneducated guess would be that the list function could be defined
> in a library using car and cdr. Being a Scheme newbie, I still
> haven't quite figured out how the dot pairing operator is distinct
> from car and cdr.

Dotted pairs are a literal syntax for (literal) pairs. On their own,
they don't replace cons, since something like quasiquote would be needed
to support constructing pairs from variables (e.g. `(,a . ,b)).

An extreme minimal approach would typically only provide cons. Literal
syntax for data types isn't necessary.
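A quick R7RS-small check of the above (MY-LIST and the variable names are illustrative): all five spellings from the question build equal lists, quasiquote builds a pair from variables, and LIST needs no primitive of its own.

```scheme
(import (scheme base))

;; The five spellings from the question, with the empty list quoted.
(define forms
  (list (cons 1 (cons 2 (cons 3 '())))
        (list 1 2 3)
        '(1 2 3)
        '(1 . (2 . (3 . ())))
        '(1 2 3 . ())))

;; Dotted notation is literal syntax; building a pair from variables
;; needs CONS or quasiquote.
(define a 'x)
(define b 'y)
(define built (cons a b))
(define quasi `(,a . ,b))

;; A variadic LAMBDA already collects its arguments into a fresh list.
(define (my-list . items) items)
```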

> And SICP seems to show us how to define car and cdr
> in terms of lambda, so perhaps they could be defined as library
> functions?

Exactly! And integers can also be defined in terms of lambda (e.g.
Church numerals). So we can eliminate all that pesky literal syntax for
integers that Scheme supports, along with that pointless and
theoretically non-primitive integer datatype.

But after we do that, I suspect we'll soon start discovering the
exceptions to the rule raised in another comment, "raw computation speed
is very seldom important at all."
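For anyone who wants to try it, here is a sketch in R7RS-small of both ideas (all names are made up for the example and prefixed to avoid clobbering the built-ins): pairs represented as closures, in the style of SICP, and Church numerals with a conversion back to native integers.

```scheme
(import (scheme base))

;; Pairs from nothing but closures.
(define (lcons x y) (lambda (pick) (pick x y)))
(define (lcar p) (p (lambda (x y) x)))
(define (lcdr p) (p (lambda (x y) y)))

;; Church numerals: the numeral n means "apply f n times".
(define zero (lambda (f) (lambda (x) x)))
(define (succ n) (lambda (f) (lambda (x) (f ((n f) x)))))
(define (add m n) (lambda (f) (lambda (x) ((m f) ((n f) x)))))

;; Back to a native integer. Every step is now a chain of closure
;; calls, which is where the raw speed goes.
(define (church->integer n) ((n (lambda (k) (+ k 1))) 0))
```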

Anton

Ray Dillinger

May 29, 2007, 2:01:28 PM

Alex Shinn wrote:

> Below, for the record, I summarize some of the more controversial
> issues people have with R6RS (based on the r5.93rs draft). I
> include both complaints I agree with and those I don't, but have
> undoubtedly missed many and misstated others, so if you have an
> issue not mentioned please reply with it (preferably in summarized,
> non-ranting form if you can restrain yourself).

Wow. This is excellent work you've done here, collecting all
this stuff in one place and explaining why it gives cause
for concern. I agree with most of these criticisms, actually.

> * IDENTIFIER-SYNTAX
>
> Identifier syntax means that the macro system can expand a single
> identifier even when not the first symbol in an expression. Thus
> when you see an identifier, it may not actually be a real variable
> reference, which can be confusing both for humans and other macros
> which want to analyze code. It is a substantial complication to
> the semantics of the language, with arguable benefits.

This is one of the things that gave me misgivings, but I
wasn't able to form a cogent argument against it. It is a
powerful weapon in the "obfuscated scheme" programming
contestant's arsenal, but it's not clear to me that most
programmers will use it that badly.

> * Exceptions
>
> Many things specified simply as "errors" in R5RS (with unspecified
> behavior) are now required to signal exceptions. The exceptions
> themselves fit within a complex hierarchy.

A complex and highly overspecified hierarchy. I am strongly
of the opinion that a very different and much simpler method
for handling such things is better. The one expressed in the
R6RS candidate appears to have semantics mostly copied from
other languages, and does not suit most of the other programming
paradigms that Scheme otherwise supports.

> * Module system
>
> - The enforced phase separation disallows some implementation
> strategies and extensions.
>
> - The versioning system is complex and not obviously necessary.
> Something like it could always be added later.
>
> - Libraries need to be wrapped in an extra layer of parentheses,
> as opposed to a single definition at the top of the file.

Valid points all. An additional point is that the module
system becomes an additional barrier to the use of scheme
as a pedagogic language, because it's something that beginners
have to deal with before much of anything else works, and
long before it is possible to explain to them why.

> * Unicode

> The standard makes Scheme (non-optionally) Unicode-specific, and
> defines the character data-type as Unicode scalar values. This
> prevents small implementations which only want to deal with ASCII
> (e.g. in embedded systems), implementations which want to support
> Unicode but want a different meaning of character (e.g. grapheme
> clusters), and implementations which want to support a different
> character set altogether. There are a number of alternative proposals
> to Unicode including Mojikyo, eKanji, TRON, and UTF-2000. Scheme
> has been around for a long time, Lisp even longer, and many are
> hesitant to wed themselves to a single character set forever.

I think that largely covers it. I do want to point out that the
behavior of grapheme-cluster characters under most linguistic
operations is *far* more reasonable, consistent, and logical,
from the POV of actual linguistics and what a student of those
natural languages would expect, than the codepoint characters
selected by the committee. Further, I strongly feel that
behavior which is more reasonable, consistent and logical to
users of natural languages written in those characters is much
more likely to be implementable in other representations of those
characters.

The standard should specify binary I/O and primitives for
using binary I/O to build character ports, and then have unicode
I/O as a standard library - which need not be loaded for a
particular implementation or application. Unicode case operations
and other semantics should be another standard library, probably
a superset of the unicode I/O library.

> * STRING-REF is recommended to be constant time
>
> This discourages a number of implementation strategies that use
> variable width character encodings or alternate string
> representations such as ropes or trees. It is easy to provide a
> string API that is convenient to use and efficient for both
> traditional and alternative string representations.

Agree, again. Ropes with copy-on-write nodes are more efficient
as the strings grow longer. Once you're doing corpus linguistics,
there really is no alternative. This guarantees all atomic string
operations in either constant or logarithmic time with respect to
the length of the string, *and* automatically enables shared storage
for the actual character sequences when new strings are created by
minor modifications from old strings. Array strings, as implied
by this wording in the R6RS candidate, are more efficient only if
your strings are mostly under three kilobytes long.
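A minimal sketch of the idea in R7RS-small, assuming the names below (they are hypothetical, not a proposed API). It does no balancing and no copy-on-write editing; it only shows concatenation that shares its inputs and indexing that walks one path down the tree.

```scheme
(import (scheme base))

;; A rope is either a leaf (an ordinary string) or a concat node
;; that caches the total length of its two children.
(define-record-type <concat>
  (make-concat left right len)
  concat?
  (left concat-left)
  (right concat-right)
  (len concat-len))

(define (rope-length r)
  (if (concat? r) (concat-len r) (string-length r)))

;; One new node; both children are shared, no characters copied.
(define (rope-append a b)
  (make-concat a b (+ (rope-length a) (rope-length b))))

;; Cost is proportional to tree depth (logarithmic if kept balanced).
(define (rope-ref r i)
  (cond ((not (concat? r)) (string-ref r i))
        ((< i (rope-length (concat-left r)))
         (rope-ref (concat-left r) i))
        (else
         (rope-ref (concat-right r)
                   (- i (rope-length (concat-left r)))))))

(define (rope->string r)
  (if (concat? r)
      (string-append (rope->string (concat-left r))
                     (rope->string (concat-right r)))
      r))
```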

The standard should not forbid either of these implementation
strategies; it should presume that the implementors (or the
users, if the implementor gives them a choice) know what they're
using the language for and can make a considered choice. It
should specify an API for strings, period.

> * Safety
>
> R6RS makes a (possibly too) strong claim about safety, and
> introduces an exception type for implementation restrictions.

Exceptions again. Highly overspecified again.

> * Square brackets
>
> Making [] identical to (), apart from all the arguments about
> which looks better, breaks the entire axiomatic spirit and
> prevents alternative extensions from using []. It also introduces
> a trivial stylistic distinction where none existed before, and
> puts Scheme among the ranks of languages where programmers need to
> agree on a style guideline (there are many variations already)
> before collaborating on a project.

Agree, again. I don't like them unless they mean something.
Given my druthers, they'd mean a simple vector instead of a list
in data and a syntax call instead of a procedure call in code.
But that would be a very fundamental change indeed and I don't
know if the resulting language would really be the same language.

> * CALL/CC
>
> Although CALL/CC is a common abbreviation for
> call-with-current-continuation when talking about the operator,
> and is already supported by some systems, some argue the operator
> is used rarely enough (and is supposed to be) that the
> abbreviation isn't needed. At any rate, no other procedure in the
> language has two names.

I strongly suspect that the longer name will be disappearing
with R7RS or R8RS. Moreover, both names are now incorrect:
what the routine actually does could more accurately be
expressed by call/wc or call-with-winding-continuation.
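For readers wondering what the "winding" refers to, a short R7RS-small sketch (call/wc above is only a suggested name, not an existing procedure; TRACE and NOTE! are illustrative): escaping through a continuation runs the AFTER thunk of any DYNAMIC-WIND extent being left.

```scheme
(import (scheme base))

(define trace '())
(define (note! x) (set! trace (cons x trace)))

;; Jumping out through K leaves the DYNAMIC-WIND extent, so its
;; AFTER thunk runs on the way out; the rest of the body does not.
(define result
  (call-with-current-continuation
    (lambda (k)
      (dynamic-wind
        (lambda () (note! 'before))
        (lambda () (k 'escaped) (note! 'not-reached))
        (lambda () (note! 'after))))))
```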

> * Comment Syntax
>
> #; expression comments and #| ... |# block comments have been
> added to the language, though they are not needed.

The "need" for expression comments, as far as I'm concerned,
just points out a(nother) limitation of our macrology, i.e.,
that one macro call can expand only into a single expression.
What the expression comment does is expand to zero expressions.
We ought to be able to define a macro that does that, or
expands to multiple expressions, easily.
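A small R7RS-small sketch of the limitation (COMMENT-OUT is a hypothetical macro): the datum comment removes an argument entirely, while a macro use can only be replaced by some single expression.

```scheme
(import (scheme base))

;; A would-be "comment" macro. Its use occupies one expression slot
;; and must expand into exactly one expression, so the best it can do
;; is yield an unspecified value.
(define-syntax comment-out
  (syntax-rules ()
    ((_ form ...) (if #f #f))))

(define with-datum-comment (list 1 #;2 3))           ; two elements
(define with-macro         (list 1 (comment-out 2) 3)) ; still three
```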

The "need" for block comments, on the other hand, is not
really addressable by the language. You don't need them
if you have an editor that understands comment prefixes,
and you do if you don't.

> * Bytevector Syntax
>
> #vu8(...) reads as a bytevector. Bytevectors themselves are not
> so controversial, though people disagree on the names and any
> external representation.

Actually I object to these on the grounds that they
introduce de facto static typing to scheme. I think that
type should be an annotation or assertion added to an
otherwise correct procedure rather than something which
changes or specifies semantics.

> * STRING-NORMALIZE-*
>
> Normalization is hideously complicated, and may require many
> manual conversions back and forth and after any operation that may
> not have preserved normalization. A huge simplification very much
> worth consideration is a system that maintains all internal
> strings with a single consistent normalization, but explicitly
> allowing conversion to any of a number of specific normalization
> forms prevents this approach.
>
> A simpler API could just provide a single STRING-NORMALIZE
> procedure, which would normalize to a preferred internal
> normalization form, and in the case of an automatically
> normalizing implementation would just be the identity function.

Absolutely. It hugely overcomplicates things if your internal
strings are other than "a sequence of characters," full stop.
By overspecifying this, the R6RS candidate is setting up users
and implementors for endless hair and bugs. I had not considered
a string-normalize! procedure; my thought was simply that
normalization ought to have no semantics anywhere except in the
code implementing character I/O ports or converting strings
to/from bitvectors. Seriously: a string is just a sequence
of characters. Normalization doesn't mean anything on characters.
Normalization only means something on a particular representation
of characters, and nothing outside your I/O port code or conversion
to/from binary code ought to have to deal with the idiosyncrasies
of that particular representation. If for any reason you want to
write invalid data (a non-normalized string) to a character stream,
you are clearly not using them as "characters" - you are doing
something that would make more sense as binary I/O. Conversely,
if you read something and want the exact binary sequence, as
opposed to the sequence of characters in a normalized string,
you are clearly not reading "characters." Once again, you are
doing something that would make more sense as binary I/O.

> As a separate extension it would be possible to provide utilities
> to normalize to bytevectors with specific normalization forms, for
> interaction with external tools.

Inside the code that implements character I/O ports and
binary-to-string and string-to-binary conversions. Never in
anything the users ought to be expected to write.

> * CHAR-*CASE
>
> Case mapping is an incompletely defined operation on characters
> when they are defined as Unicode scalar values, so it is likely
> that any algorithm using individual character case mappings
> instead of string case mappings is broken.

But drastically less broken if you are using grapheme-clusters
as characters rather than codepoints as characters. There is only
one extant case in unicode in which case mapping does not work
as a one-to-one mapping on grapheme-cluster characters.

If the standard requires codepoint characters only, then it would
be best to remove these procedures altogether. If the standard
permits representations on which the case relationships are less
broken, it would be better to keep them.

> * FILE-OPTIONS, BUFFER-MODE, EOL-STYLE, ERROR-HANDLING-MODE
>
> These optional arguments to opening file ports are also defined as
> syntax without reason. Four optional positional arguments is also
> unwieldy to some. Others would rather have EOL-STYLE managed by
> operations on a port (e.g. READ-LINE) rather than the port itself.
> The EOL-style itself arguably shouldn't bother with support for
> NEL or LS, and possibly should allow automatic detection. The
> ERROR-HANDLING-MODE is complex.

What these really amount to is dynamic-environment variables.
If we're going to keep introducing dynamic-environment variables
then clearly what we need is a reasonable semantics for dynamic
environments. After that, all of this stuff is just libraries
and more stuff like it, if desired, can be user-implemented.
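A sketch of that approach in R7RS-small, using the MAKE-PARAMETER and PARAMETERIZE parameter objects that R7RS adopted from SRFI 39 (CURRENT-EOL-STYLE and WRITE-LINE-TO-STRING are hypothetical names): the setting lives in the dynamic environment rather than in special syntax or a positional port argument.

```scheme
(import (scheme base))

;; Hypothetical EOL-style setting, defaulting to LF.
(define current-eol-style (make-parameter 'lf))

(define (eol-string)
  (case (current-eol-style)
    ((lf) "\n")
    ((crlf) "\r\n")
    (else (error "unknown eol style" (current-eol-style)))))

;; Consults the parameter at call time, not when it was defined.
(define (write-line-to-string text)
  (let ((out (open-output-string)))
    (write-string text out)
    (write-string (eol-string) out)
    (get-output-string out)))
```

Usage: `(parameterize ((current-eol-style 'crlf)) (write-line-to-string "hi"))` yields `"hi\r\n"`, and the default is restored once the PARAMETERIZE body returns.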

> * Binary vs. Text port distinction
>
> Some want it, some don't. The primary argument in favor of the
> distinction is efficient buffering of transcoded ports. The
> compromise would seem to be to make the distinction but allow
> implementations to optionally allow mixing procedures on both.
> The current draft makes a distinction, but does not specify what
> happens when binary procedures are applied to text ports and vice
> versa.

I think the standard did the right thing, here. You've got to
have text ports distinct from (or built by layering code on top
of) binary ports in order to support more than one way of
reading and writing characters. Since Unicode has three
normalization forms in two endiannesses and (at least) four
character encodings, there are at least 24 different ways to
interpret binary data as characters just in Unicode alone! If
you want an entity someone can call just to say "read a
character" it's got to be a closure over the encoding
information as well as whatever buffering is necessary.
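In R6RS terms this layering is TRANSCODED-PORT, which wraps a binary port in a transcoder built from a codec. A short sketch (DECODE is a hypothetical helper and the byte values are just an example): the same four bytes read as three characters under UTF-8 and as four under Latin-1.

```scheme
#!r6rs
(import (rnrs))

;; "hi" followed by U+00E9 encoded in UTF-8 (bytes 195 169).
(define bytes #vu8(104 105 195 169))

;; A textual port is a binary port plus a transcoder; the codec
;; decides how bytes become characters.
(define (decode codec)
  (get-string-all
   (transcoded-port (open-bytevector-input-port bytes)
                    (make-transcoder codec))))

(define as-utf-8   (decode (utf-8-codec)))
(define as-latin-1 (decode (latin-1-codec)))
```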

> * Pair and string mutation moved to separate libraries
>
> SET-CAR!, SET-CDR! and STRING-SET! have been moved to separate
> libraries. Pairs and strings are still mutable, so this does
> nothing to change the semantics of the language or even to help
> optimizations (it would require a global compiler to detect that
> these modules were never imported, but at that point it's trivial
> for the compiler to simply detect that these individual procedures
> aren't used). It is thus simply a gesture of moving towards a
> more functional Scheme. Some people disagree, others think the
> gesture is silly.

I think the gesture is silly. Oh, maybe there's a rationale
in that if you want guarantees that code is purely functional
you can just forbid the use of this library (and vectors, and
several other things). But it's silly. If you want a functional
lisp, you can do that. But that's not what scheme is for.
Scheme is for "any paradigm you've got, you can use scheme to
program in it."

Bear

Anton van Straaten

May 29, 2007, 2:44:39 PM

Ray Dillinger wrote:
>> * Pair and string mutation moved to separate libraries
>>
>> SET-CAR!, SET-CDR! and STRING-SET! have been moved to separate
>> libraries. Pairs and strings are still mutable, so this does
>> nothing to change the semantics of the language or even to help
>> optimizations (it would require a global compiler to detect that
>> these modules were never imported, but at that point it's trivial
>> for the compiler to simply detect that these individual procedures
>> aren't used). It is thus simply a gesture of moving towards a
>> more functional Scheme. Some people disagree, others think the
>> gesture is silly.
>
>
> I think the gesture is silly. Oh, maybe there's a rationale
> in that if you want guarantees that code is purely functional
> you can just forbid the use of this library (and vectors, and
> several other things). But it's silly. If you want a functional
> lisp, you can do that. But that's not what scheme is for.
> Scheme is for "any paradigm you've got, you can use scheme to
> program in it."

I think this misses the real motivation. Pairs are quite central to
Scheme, and as a result Scheme implementors are hamstrung by pairs being
mutable by default. This is unfortunate, given that in a large
proportion of cases, that mutability isn't actually needed or used.

PLT Scheme is currently experimenting with making pairs immutable by
default. Here's a message about this by Matthew Flatt:

http://groups.google.com/group/plt-scheme/msg/482bcab20116530d

The goal is not to make a pure functional language, it's to make a
better language. A language which forces you to pay for features that
you're not using, as default-mutable pairs do, is not an ideal platform
for implementing "any paradigm you've got".

Anton

Pascal Costanza

May 29, 2007, 6:21:24 PM

Ray Dillinger wrote:

>> * IDENTIFIER-SYNTAX
>>
>> Identifier syntax means that the macro system can expand a single
>> identifier even when not the first symbol in an expression. Thus
>> when you see an identifier, it may not actually be a real variable
>> reference, which can be confusing both for humans and other macros
>> which want to analyze code. It is a substantial complication to
>> the semantics of the language, with arguable benefits.
>
> This is one of the things that gave me misgivings, but I
> wasn't able to form a cogent argument against it. It is a
> powerful weapon in the "obfuscated scheme" programming
> contestant's arsenal, but it's not clear to me that most
> programmers will use it that badly.

Identifier syntax is actually a good idea IMHO. It allows you, for
example, to express object-oriented extensions where variables are
automatically taken from an implicit message receiver, roughly like this:

(define-method print <person> ()
  (display this.name) (newline)
  (display this.address) (newline))

Here, this.name and this.address are supposedly taken from the implicit
this argument for such a method. This is not easily expressible without
identifier syntax.

The argument that this may make code obfuscated is the same argument
other folks hold up against macros in general. The question is whether
there are good uses of such a feature, and there are.

Pascal

--
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/
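The following is a minimal sketch of one way Pascal's this.name idea might be realized with R6RS identifier syntax. The person record type, the define-person-method macro and the describe procedure are hypothetical names made up for illustration, and only two fields are handled. The macro uses datum->syntax to create this, this.name and this.address in the caller's context, then binds each dotted name to an identifier-syntax transformer.

```scheme
(import (rnrs))

;; Hypothetical record type, used only for this sketch.
(define-record-type person (fields name address))

;; Hypothetical macro: (define-person-method NAME BODY ...) defines a
;; one-argument procedure whose BODY may refer to `this`, `this.name`
;; and `this.address`.
(define-syntax define-person-method
  (lambda (stx)
    (syntax-case stx ()
      [(k name body ...)
       ;; Build the three identifiers in the caller's lexical context
       ;; (taken from the keyword k), so references written in BODY
       ;; resolve to the bindings introduced below.
       (with-syntax ([this (datum->syntax #'k 'this)]
                     [this.name (datum->syntax #'k 'this.name)]
                     [this.address (datum->syntax #'k 'this.address)])
         #'(define (name this)
             ;; Each dotted name is an identifier macro that expands
             ;; into an accessor call on the implicit receiver.
             (let-syntax ([this.name (identifier-syntax (person-name this))]
                          [this.address (identifier-syntax (person-address this))])
               body ...)))])))

(define-person-method describe
  (string-append this.name ", " this.address))

(display (describe (make-person "Ada" "London")))   ; prints Ada, London
(newline)
```

Because identifier-syntax is used without a set! clause here, (set! this.name ...) inside a method would be a syntax error. Supporting assignment would need make-variable-transformer.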

wayo.c...@gmail.com

May 29, 2007, 7:46:18 PM

On May 29, 5:21 pm, Pascal Costanza <p...@p-cos.net> wrote:

> Identifier syntax is actually a good idea IMHO. It allows you, for
> example, to express object-oriented extensions where variables are
> automatically taken from an implicit message receiver, roughly like this:
>
> (define-method print <person> ()
>   (display this.name) (newline)
>   (display this.address) (newline))

I implemented something like this just the other day (with a lot of
help from Abdulaziz!).

(define-class point (x y z))

(define p (make-point 10 20 30))

(with-point p)

(list p.x p.y p.z)

;; expands to

(list (point-x p) (point-y p) (point-z p))

;; set the x

(p.x! 1)

;; expands to

(set-point-x! p 1)

;; The slots can be "typed"

(define-class airplane ((pos point) (vel point)))

(define a (make-airplane (make-point 1 2 3) (make-point 4 5 6)))

(with-airplane a)

(list a.pos a.vel)

;; expands to

(list (airplane-pos a) (airplane-vel a))

;; there is syntax for the components of the pos and vel as well:

(list a.pos.x a.pos.y a.pos.z)

;; expands to

(list (point-x (airplane-pos a))
      (point-y (airplane-pos a))
      (point-z (airplane-pos a)))

;; set the z of the vel

(a.vel.z! 10)

;; expands to

(set-point-z! (airplane-vel a) 10)

http://dharmatech.onigirihouse.com/scheme/class/class.scm

I've used it with Gambit-C.

Ed

Ray Blaak

May 30, 2007, 12:37:24 AM

"wayo.c...@gmail.com" <wayo.c...@gmail.com> writes:
> (define p (make-point 10 20 30))
>
> (with-point p)
>
> (list p.x p.y p.z)
>
> ;; expands to
>
> (list (point-x p) (point-y p) (point-z p))

What happens when you do:

(list (make-point 1 2 3).x)

--
Cheers, The Rhythm is around me,
The Rhythm has control.
Ray Blaak The Rhythm is inside me,
rAYb...@STRIPCAPStelus.net The Rhythm has my soul.

Kjetil S. Matheussen

May 30, 2007, 3:54:38 AM

On Wed, 29 May 2007, wayo.c...@gmail.com wrote:

> On May 29, 5:21 pm, Pascal Costanza <p...@p-cos.net> wrote:
>
>> Identifier syntax is actually a good idea IMHO. It allows you, for
>> example, to express object-oriented extensions where variables are
>> automatically taken from an implicit message receiver, roughly like this:
>>
>> (define-method print <person> ()
>>   (display this.name) (newline)
>>   (display this.address) (newline))
>
> I implemented something like this just the other day (with a lot of
> help from Abdulaziz!).
>

<>

>

> http://dharmatech.onigirihouse.com/scheme/class/class.scm
>
> I've used it with Gambit-C.
>

Cool. :-)
It works with SISC as well.

Alex Shinn

May 30, 2007, 10:18:03 PM

Pascal Costanza wrote:
>
> Identifier syntax is actually a good idea IMHO. It allows you, for
> example, to express object-oriented extensions where variables are
> automatically taken from an implicit message receiver, roughly like this:
>
> (define-method print <person> ()
>   (display this.name) (newline)
>   (display this.address) (newline))
>
> Here, this.name and this.address are supposedly taken from the implicit
> this argument for such a method. This is not easily expressible without
> identifier syntax.
>
> The argument that this may make code obfuscated is the same argument
> other folks hold up against macros in general. The question is whether
> there are good uses of such a feature, and there are.

That's only one of the arguments. The real problem is that it
complicates the semantics of the language. Specifically, other macros
cannot know when they see an identifier if it really is a variable
reference or not.

As an example, consider a fast-math macro:

(fast-math (+ (* a b) (* a c)))
=> (* a (+ b c))

(fast-math (+ (* a a) (* a a a)))
=> (let ((t (* a a))) (+ t (* a t)))

That is, it takes an arithmetic expression, refactors and simplifies it,
and performs common subexpression elimination to find an optimal
equivalent expression (compilers can't do this very well - even
GCC doesn't). Now, in the presence of identifier-syntax, we don't
actually know which of the identifiers are simple variable references,
or which may even expand into further arithmetic expressions, so our
optimization assumptions are off. Worse, we don't know if any of them
are actually side-effecting, so we can't safely do any rewriting at
all.

This is from a real example I wrote a while back. Other examples
include simple optimizations you may want to include in regexp syntax
or pattern matchers.

So, in effect, by adding identifier-syntax you make _all_ macros less
powerful because they know less about the language they are expanding.

Now, also considering that anything you do with identifier-syntax can be
trivially implemented with normal syntax by just wrapping the identifier
in a pair of parentheses, is it really worth including this as a
required feature of all standard Scheme implementations and irreversibly
complicating the core language?

--
Alex

Alex Shinn

May 30, 2007, 10:57:58 PM

Ray Dillinger wrote:

> Alex Shinn wrote:
>
> > * Bytevector Syntax
> >
> > #vu8(...) reads as a bytevector. Bytevectors themselves are not
> > so controversial, though people disagree on the names and any
> > external representation.
>
> Actually I object to these on the grounds that they
> introduce de facto static typing to scheme. I think that
> type should be an annotation or assertion added to an
> otherwise correct procedure rather than something which
> changes or specifies semantics.

Ideally perhaps yes, but we do want some common ground solution when
working with binary I/O, and plain vectors would just be far too
inefficient in many implementations.

I threw this in because I included all syntactic changes, but this and
the new comment syntax are pretty minor - I don't think anyone is that
opposed to either. But the ever-increasing amount of #foo syntax
sometimes worries me, and if SRFI-10 had been adopted then we could
have had #,(vu8 ...) portably without altering the reader.

What I forgot to mention was the #!r6rs syntax which is just hideous.

> > * Binary vs. Text port distinction

> > [...]

>
> I think the standard did the right thing, here. You've got to
> have text ports distinct from (or built by layering code on top
> of) binary ports in order to support more than one way of
> reading and writing characters.

Personally I was arguing to make this distinction from the beginning,
and was quite happy when I saw it. I do think the standard should be a
little more clear and say something along the lines of "it is an error
to use a textual operation on a binary port or a binary operation on a
textual port." Or say it's unspecified, or even specify the error.
Right now it just says "a binary port [...] does not support textual
I/O."

--
Alex

William D Clinger

May 31, 2007, 6:46:55 AM

Alex Shinn gave us an excellent explanation of why
the R6RS is controversial:

> Below, for the record, I summarize some of the more controversial
> issues people have with R6RS (based on the r5.93rs draft). I
> include both complaints I agree with and those I don't, but have
> undoubtedly missed many and misstated others, so if you have an
> issue not mentioned please reply with it (preferably in summarized,
> non-ranting form if you can restrain yourself).

Here is Alex's paragraph on records:

* Records

The records library is very large and complex, and cannot be
implemented as a portable library.

I would add that, despite its complexity, the
syntactic layer is strictly less general than
the procedural layer, and there are two distinct
failures of interoperability between the two
layers. The editors' rationale for this is
given in their response to Formal Comment #90
( http://www.r6rs.org/formal-comments/comment-90.txt ).

On a more trivial level, I would add:

* disappearance of #\newline

The #\newline character syntax of all previous
reports is to be replaced by the #\linefeed
syntax.

Will

Tom Lord

May 31, 2007, 2:24:23 PM

On May 30, 7:18 pm, Alex Shinn <alexsh...@gmail.com> wrote:

> That's only one of the arguments. The real problem is
> that it complicates the semantics of the language.
> Specifically, other macros cannot know when they
> see an identifier if it really is a variable
> reference or not.

That's a good point.

My "R6 counterproposal" (imprecisely stated, though
it is) suggests adding not just fexprs and environments,
but also making the reader extensible. In Pascal's
example, he wanted to bind an identifier like "this.speed"
to an identifier macro that would generate a reference
to an object field rather than to a location made
lexically apparent by lambda (including let, etc.).

An alternative is to do that expansion in the reader,
so that programs might contain "#.speed" which is
read as "(self speed)" -- with "self" defined as an
ordinary macro.

That's a little bit awkward. For example, one would
not expect "(set! #.speed 'full-ahead)" to work since
the set! special form expects a named location
in the first subexpression.

I wonder how Schemers would feel about getting into
the habit of using

(setf #.new-under-the-sun '())

-t

Pascal Costanza

May 31, 2007, 2:33:07 PM

Tom Lord wrote:
> On May 30, 7:18 pm, Alex Shinn <alexsh...@gmail.com> wrote:
>
>> That's only one of the arguments. The real problem is
>> that it complicates the semantics of the language.
>> Specifically, other macros cannot know when they
>> see an identifier if it really is a variable
>> reference or not.
>
> That's a good point.

No, it's not. Those macros also cannot know whether a regular macro is
actually a variable reference or not. The only way out here is to
provide something like macroexpand with which you can check what a
respective form actually expands into. That would be a general solution
because macroexpand could be applied both to identifiers and regular
macro invocations. (That's, at least, the case in Common Lisp.)

Jens Axel Søgaard

May 31, 2007, 4:28:45 PM

Tom Lord wrote:

> That's a little bit awkward. For example, one would
> not expect "(set! #.speed 'full-ahead)" to work since
> the set! special form expects a named location
> in the first subexpression.

If foo is an identifier macro, then it is the responsibility
of foo to make sure (set! foo ...) works.

The example 12.4 shows what to do:

(define p (cons 4 5))
(define-syntax p.car
  (make-variable-transformer
    (lambda (x)
      (syntax-case x (set!)
        [(set! _ e) #'(set-car! p e)]
        [(_ . rest) #'((car p) . rest)]
        [_ #'(car p)]))))
(set! p.car 15)
p.car => 15
p => (15 . 5)

Identifier macros are not a new invention. They have been
in the various syntax-case systems for a long time, and
to my knowledge haven't caused any problems.

--
Jens Axel Søgaard

Ray Dillinger

May 31, 2007, 7:28:08 PM

Alex Shinn wrote:
> Pascal Costanza wrote:

>> (define-method print <person> ()
>>   (display this.name) (newline)
>>   (display this.address) (newline))

>> Here, this.name and this.address are supposedly taken from the implicit
>> this argument for such a method. This is not easily expressible without
>> identifier syntax.

> That's only one of the arguments. The real problem is that it
> complicates the semantics of the language. Specifically, other macros
> cannot know when they see an identifier if it really is a variable
> reference or not.

The only real solution for this is to partition the set
of identifiers into mutually exclusive "base" and "extended"
identifiers, where base identifiers are literal expressions
in themselves, and extended identifiers require identifier-
syntax macros to transform into expressions.

For example, if our base identifiers could not naturally
contain the period or dot character, then the reader could
know, on reading an extended identifier like "this.name,"
that it was not looking at a base identifier - and then
check it against its identifier macro patterns.

Indeed, this is something like what readers now do with the
prefix octothorpe. The octothorpe means it cannot be read
as a base identifier, so the reader has to look at its other
definitions. The proposal amounts to expanding the number
of ways something can be marked as "not a base identifier"
and telling the reader what to do about it.

But, there is still some controversy in my mind about it.
It does not extend language semantics at all, so it is in
some sense "trivial" and "unnecessary."

Also, murk arising from ambiguity in infix syntax-marker
expansion would have to be clarified before this would be
a hard enough proposal for a programming language. Consider
for example if some well-meaning person defines FOO-BAR as
identifier syntax for (- FOO BAR). Now what is the reader
to make of FOO-BAR-BAZ or similar? The result is ambiguous
depending on which *instance* of "-" the macroexpander
expands first.

The abbreviations could be very handy, but would the
resulting language have the clarity that is the virtue of
scheme?

Bear

Alex Shinn

May 31, 2007, 9:41:53 PM

Pascal Costanza wrote:
> Tom Lord wrote:
> > On May 30, 7:18 pm, Alex Shinn <alexsh...@gmail.com> wrote:
> >
> >> That's only one of the arguments. The real problem is
> >> that it complicates the semantics of the language.
> >> Specifically, other macros cannot know when they
> >> see an identifier if it really is a variable
> >> reference or not.
> >
> > That's a good point.
>
> No, it's not. Those macros also cannot know whether a regular macro is
> actually a variable reference or not.

You misunderstand, I'm talking about those cases where you see a
lone identifier in an evaluated position. Currently this couldn't
possibly be anything other than a variable reference, but with
identifier-syntax you don't even know that anymore.

A simpler example:

(cond (foo (bar))
(foo (baz)))

Right now COND could provide a warning that the second branch is
unreachable. With identifier-syntax it can't.

Also, it's important to understand that identifier-syntax doesn't
make the language any more expressive. All it does is shave off a pair
of parens, letting you write this.name instead of (this.name).

But I'm not interested in arguing the pros and cons of all these new
features. Yes, obviously every new feature has uses, and many of
them have been used previously in other languages or Schemes. But
they involve tradeoffs, and people should stop and think seriously
before adopting those tradeoffs into the core Scheme standard.

--
Alex

samth

Jun 1, 2007, 9:34:00 AM

On May 31, 9:41 pm, Alex Shinn <alexsh...@gmail.com> wrote:
> Pascal Costanza wrote:
> > Tom Lord wrote:
> > > On May 30, 7:18 pm, Alex Shinn <alexsh...@gmail.com> wrote:
>
> > >> That's only one of the arguments. The real problem is
> > >> that it complicates the semantics of the language.
> > >> Specifically, other macros cannot know when they
> > >> see an identifier if it really is a variable
> > >> reference or not.
>
> > > That's a good point.
>
> > No, it's not. Those macros also cannot know whether a regular macro is
> > actually a variable reference or not.
>
> You misunderstand, I'm talking about those cases where you see a
> lone identifier in an evaluated position. Currently this couldn't
> possibly be anything other than a variable reference, but with
> identifier-syntax you don't even know that anymore.
>
> A simpler example:
>
> (cond (foo (bar))
> (foo (baz)))
>
> Right now COND could provide a warning that the second branch is
> unreachable. With identifier-syntax it can't.

But the same argument applies to macros, and to procedures. If foo is
a macro, then we can't do this warning:

(cond [(foo) (bar)]
[(foo) (baz)])

Because the foos might expand differently.

But even if foo was a procedure, we're still out of luck, because it
might refer to mutable state, and return different values each time.

>
> Also, it's important to understand that identifier-syntax doesn't
> make the language any more expressive. All it does is shave off a pair
> of parens, letting you write this.name instead of (this.name).

This is not the case. All systems that have identifier macros (that I
know of) allow you to define the identifier foo such that

(set! foo bar)

does the 'appropriate' thing. This can't be done with other features
of the language (short of rebinding set!).

sam th

wayo.c...@gmail.com

Jun 1, 2007, 4:28:28 PM

Ray Blaak wrote:

> What happens when you do:
>
> (list (make-point 1 2 3).x)

Yeah... that's where things break down.

Jens Axel Søgaard

Jun 1, 2007, 5:00:30 PM

Ray Blaak skrev:

> "wayo.c...@gmail.com" <wayo.c...@gmail.com> writes:
>> (define p (make-point 10 20 30))
>>
>> (with-point p)
>>
>> (list p.x p.y p.z)
>>
>> ;; expands to
>>
>> (list (point-x p) (point-y p) (point-z p))
>
> What happens when you do:
>
> (list (make-point 1 2 3).x)

You get a "reference to undefined identifier: .x" error.

The macro call (with-point p) binds the
three identifiers p.x p.y and p.z to identifier macros.

The identifier macro p.x expands to, say, (point-x p).

In

(make-point 1 2 3).x

the reader will first read the sexpr (make-point 1 2 3),
and then read the identifier .x . Since .x is unbound,
you get a "reference to undefined identifier: .x" error.

--
Jens Axel Søgaard

Ray Blaak

Jun 2, 2007, 2:02:55 PM

Jens Axel Søgaard <use...@soegaard.net> writes:
> Ray Blaak skrev:

> > What happens when you do:
> > (list (make-point 1 2 3).x)
>
> You get a "reference to undefined identifier: .x" error.
>
> The macro call (with-point p) binds the
> three identifiers p.x p.y and p.z to identifier macros.

Well, exactly. I guess my main point here is to show my dissatisfaction with
these kinds of approaches for pretending scheme can have an infix field
selection notation for record-like values.

We either build it in properly to scheme, or we do it the "proper" sexpr way.

So, what are the ways it can be properly done, and what are the tradeoffs?

E.g.

(point-x p)
(field p x)

The last is perhaps more general.
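One possible reading of the (field p x) form is sketched below. It assumes R6RS records whose types are not opaque, so that the record inspection procedures (record-rtd, record-type-field-names, record-type-parent) and record-accessor can locate the field at run time. The names field and field-ref are hypothetical. Unlike the p.x identifier macros, the record operand here can be any expression, including (make-point 1 2 3).

```scheme
(import (rnrs))

;; Hypothetical helper: return the value of the field called NAME
;; (a symbol) in record REC, searching REC's record type first and
;; then its parent types. Requires a non-opaque record type.
(define (field-ref rec name)
  (let search ([rtd (record-rtd rec)])
    (let ([names (record-type-field-names rtd)])
      (let loop ([k 0])
        (cond
          [(= k (vector-length names))
           ;; Not a field of this type; try the parent type, if any.
           (let ([parent (record-type-parent rtd)])
             (if parent
                 (search parent)
                 (error 'field-ref "no such field" name)))]
          [(eq? (vector-ref names k) name)
           ((record-accessor rtd k) rec)]
          [else (loop (+ k 1))])))))

;; Hypothetical syntax: (field EXPR NAME) quotes NAME, so it reads
;; like (field p x) above.
(define-syntax field
  (syntax-rules ()
    [(_ rec name) (field-ref rec 'name)]))

(define-record-type point (fields x y z))

(define p (make-point 10 20 30))
(display (field p y))                    ; prints 20
(newline)
(display (field (make-point 1 2 3) x))   ; any expression works here; prints 1
(newline)
```

The trade-off is a run-time search by field name on every access, whereas point-x goes straight to the field.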

Jens Axel Søgaard

Jun 3, 2007, 9:52:40 AM

Ray Blaak skrev:

> Jens Axel Søgaard <use...@soegaard.net> writes:
>> Ray Blaak skrev:
>>> What happens when you do:
>>> (list (make-point 1 2 3).x)
>> You get a "reference to undefined identifier: .x" error.
>>
>> The macro call (with-point p) binds the
>> three identifiers p.x p.y and p.z to identifier macros.
>
> Well, exactly. I guess my main point here is to show my dissatisfaction with
> these kinds of approaches for pretending scheme can have an infix field
> selection notation for record-like values.

I was of the impression that the main theme of discussion was
identifier macros in general and not a specific application of them.

It is not clear to me whether your argument is:

"infix field notation is a bad idea"
therefore "identifier macros are a bad idea".

If so, I can't help comparing it to the classical argument against
macros:

"It is possible to write bad macros, hence they are a bad idea."

Of course identifier macros can be used to write unreadable code.
Of course macros can be used to write unreadable code.

--
Jens Axel Søgaard

Jens Axel Søgaard

Jun 3, 2007, 10:17:17 AM

Alex Shinn wrote:

> * IDENTIFIER-SYNTAX
>
> Identifier syntax means that the macro system can expand a single
> identifier even when not the first symbol in an expression. Thus
> when you see an identifier, it may not actually be a real variable
> reference, which can be confusing both for humans and other macros
> which want to analyze code. It is a substantial complication to
> the semantics of the language, with arguable benefits.

Two points:

1) "Thus when you see an identifier, it may not actually be a real
variable reference, which can be confusing both for humans and
other macros which want to analyze code."

This would make a fine first paragraph in the "Identifier Macro
Style Guide". There is one thing that identifier macros are perfect
for: The implementation of various types of variable references.

Today, if you write a macro that doesn't play fair (say, it introduces
variables unhygienically), you blame not macros but the author of the
macro. The same applies to identifier macros.

2) "It is a substantial complication to the semantics of the language,
with arguable benefits."

Why do you see this as a "substantial complication"?
After the macro expansion phase all uses of identifier macros
are gone - so the standard semantics apply.

Is the macro expansion process more complicated?
Very little.

PS: I hope you don't find this post "extremely defensive".
Although you didn't say identifier macros are bad,
you did write "substantial complication", which
in my book is worse than bad.

--
Jens Axel Søgaard

William D Clinger

Jun 3, 2007, 10:37:00 AM

Jens Axel Søgaard quoting Alex Shinn:

> 2) "It is a substantial complication to the semantics of the language,
> with arguable benefits."
>
> Why do you see this as a "substantial complication"?

The problem with identifier macros is that they
break a specific invariant on which R5RS can rely,
and on which some R5RS macros have relied:

================================================================
R5RS invariant:

A reference to an identifier has no side effects.
================================================================

With identifier macros as proposed for the R6RS, this
R5RS-enforced invariant becomes a matter of programming
style, which I can only hope programmers will continue
to follow. Continuing to use an R5RS hygienic macro
that depends on identifier references having no side
effects would therefore become a matter of hope in the
R6RS.

That seems odd, given the R6RS predilection for
mandating the enforcement of stylistic issues that
matter considerably less than whether references can
have side effects. Identifier macros introduce an
entirely new way for programmers to break macros
that, under the R5RS, were correct in all contexts.

It is not unreasonable to ask whether the benefits
of identifier macros justify their costs. In my
opinion, they don't.

Will
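A small R6RS sketch of the invariant being broken follows. The names tick, counter and between? are hypothetical. between? is written in the R5RS style described above: it copies its middle argument into two places, on the assumption that referencing an identifier twice is harmless. When that argument is an identifier macro with a side effect, the effect happens once per copy.

```scheme
(import (rnrs))

;; Hypothetical counter, bumped every time `tick` is referenced.
(define counter 0)

;; Hypothetical identifier macro: a bare reference to `tick` expands into
;; an expression that increments `counter` and returns its new value.
(define-syntax tick
  (identifier-syntax
    (begin (set! counter (+ counter 1)) counter)))

;; An R5RS-style macro that duplicates its middle argument, relying on
;; the invariant that referencing an identifier has no side effects.
(define-syntax between?
  (syntax-rules ()
    [(_ lo x hi) (and (< lo x) (< x hi))]))

;; Expands to (and (< 0 tick) (< tick 10)), so `tick` runs twice:
;; the result is #t, but counter ends up at 2 rather than 1.
(define result (between? 0 tick 10))
```

Under R5RS, the x position of between? could receive only a variable (harmless to reference twice) or a compound expression (which the macro author can detect and bind with let). With identifier macros, even a bare identifier can expand into side-effecting code.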

Pascal Costanza

Jun 3, 2007, 11:33:11 AM

William D Clinger wrote:
> Jens Axel Søgaard quoting Alex Shinn:
>> 2) "It is a substantial complication to the semantics of the language,
>> with arguable benefits."
>>
>> Why do you see this as a "substantial complication"?
>
> The problem with identifier macros is that they
> break a specific invariant on which R5RS can rely,
> and on which some R5RS macros have relied:
>
> ================================================================
> R5RS invariant:
>
> A reference to an identifier has no side effects.
> ================================================================
>
> With identifier macros as proposed for the R6RS, this
> R5RS-enforced invariant becomes a matter of programming
> style, which I can only hope programmers will continue
> to follow. Continuing to use an R5RS hygienic macro
> that depends on identifier references having no side
> effects would therefore become a matter of hope in the
> R6RS.

Are you saying that such macros cannot be expressed anymore? Or rather,
that they cannot give such strong guarantees anymore? Why wouldn't it be
possible to solve such issues with more programming?

What would be a good example of such a macro?

William D Clinger

Jun 3, 2007, 2:26:04 PM

Pascal Costanza wrote:
> Are you saying that such macros cannot be expressed anymore?

No.

> Or rather, that they cannot give such strong guarantees anymore?

No. Although that may be true, it wasn't what I was saying.

> Why wouldn't it be possible to solve such issues with more programming?

It might be. That misses the point.

Expressibility is not the only, or even the primary,
criterion by which to judge a programming language.

Ditto for strength of guarantees.

My point is that the identifier macros that have
been proposed for R6RS interfere with the macro
system of the R5RS in specific ways. This would
break backward compatibility, most obviously, but
it would also make a certain class of problems
more difficult to solve: there are macros that,
under the R5RS guarantees, could be written using
syntax-rules, but would require syntax-case in
R6RS even if they could be written at all.

If simplicity is one of the criteria for judging
a programming language, then a proposed feature
that makes certain things more difficult than
they have been in the past should be questioned.

> What would be a good example of such a macro?

Here's an example. It probably isn't a good
one, but it is certainly an example of an idiom
that has been fairly common in R5RS macros, but
would be broken by the addition of identifier
macros.

; My homemade range-checking macro that guarantees:
;   left-to-right evaluation
;   evaluation of expressions occurs before any range checking
;
; Syntax:
;   (test-range <expr1> <op1> <expr2> <op2> <expr3>)
; where
;   <expr1>, <expr2>, and <expr3> are arbitrary expressions
;   <op1> and <op2> must be the identifiers < or <=
;
; Because many people still use interpreters or
; simple compilers that don't do copy propagation,
; this macro is careful not to introduce unnecessary
; let bindings for constants or simple variables.

(define-syntax test-range
  (syntax-rules (< <=)
    ((_ (?expr ...) . ?rest)
     (let ((lo (?expr ...)))
       (test-range lo . ?rest)))

    ((_ lo < (?expr ...) . ?rest)
     (let ((i (?expr ...)))
       (test-range lo < i . ?rest)))
    ((_ lo <= (?expr ...) . ?rest)
     (let ((i (?expr ...)))
       (test-range lo <= i . ?rest)))

    ((_ lo < i < (?expr ...))
     (let ((hi (?expr ...)))
       (test-range lo < i < hi)))
    ((_ lo < i <= (?expr ...))
     (let ((hi (?expr ...)))
       (test-range lo < i <= hi)))
    ((_ lo <= i < (?expr ...))
     (let ((hi (?expr ...)))
       (test-range lo <= i < hi)))
    ((_ lo <= i <= (?expr ...))
     (let ((hi (?expr ...)))
       (test-range lo <= i <= hi)))

    ((_ lo < i < hi)
     (and (< lo i) (< i hi)))
    ((_ lo < i <= hi)
     (and (< lo i) (<= i hi)))
    ((_ lo <= i < hi)
     (and (<= lo i) (< i hi)))
    ((_ lo <= i <= hi)
     (and (<= lo i) (<= i hi)))))

Will

Alex Shinn

Jun 3, 2007, 9:28:31 PM

Jens Axel Søgaard wrote:
>
> Although you didn't say identifier macros are bad,
> you did write "substantial complication", which
> in my book is worse than bad.

The adjective is in the eye of the beholder. It breaks some of my code,
and not in a trivial "darn, I need to fix this for R6RS" sort of way,
but in a fundamental way - I can no longer write that kind of macro at
all. So I wrote "substantial." The point I was trying to make - that
identifier-syntax warrants some serious thought before inclusion in the
standard - would hold just as much with an adjective such as "not
insignificant."

At any rate, that doesn't necessarily make it "bad." An even greater
complication to the language is the introduction of unhygienic macros,
but I didn't list that as an issue because to my knowledge just about
everyone wants these, and every implementation that supports any kind of
macros has unhygienic macros. The choice of syntax-case is slightly
more controversial, but since it's been relegated to a library and I'm
pretty sure it can be implemented on top of either explicit renaming or
syntactic-closures, I didn't bother to mention it.

--
Alex

Alex Shinn

Jun 3, 2007, 9:35:17 PM

samth wrote:
> On May 31, 9:41 pm, Alex Shinn <alexsh...@gmail.com> wrote:
> >
> > A simpler example:
> >
> > (cond (foo (bar))
> > (foo (baz)))
> >
> > Right now COND could provide a warning that the second branch is
> > unreachable. With identifier-syntax it can't.
>
> But the same argument applies to macros, and to procedures. If foo is
> a macro, then we can't do this warning:
>
> (cond [(foo) (bar)]
> [(foo) (baz)])

One could make the observation that a simple, old-fashioned screwdriver
has some advantages over an electric screwdriver. For example, the long
neck and sturdy construction allow you to use it to pry open things
like paint cans. To then counter with the claim that there are things
too difficult for even the simple screwdriver to pry open would not make
the original point any less valid.

--
Alex

William D Clinger

Jun 3, 2007, 10:12:42 PM

I apologize for posting an incorrect macro.
This version is simpler and closer to being
correct. The point of this example is that
it illustrates an idiom that works reliably
with R5RS macros, but would become unreliable
if identifier macros were added to the language
(as proposed for R6RS).

; My homemade range-checking macro that guarantees:
;   left-to-right evaluation
;   evaluation of expressions occurs before range checking
;
; Syntax:
;   (test-range <expr1> <op1> <expr2> <op2> <expr3>)
; where
;   <expr1>, <expr2>, and <expr3> are arbitrary expressions
;   <op1> and <op2> must be the identifiers < or <=
;
; Because many people still use interpreters or
; simple compilers that don't do copy propagation,
; this macro is careful not to introduce unnecessary
; let bindings for constants or simple variables.

(define-syntax test-range
  (syntax-rules (< <=)

    ((_ ?lo ?op1 ?i ?op2 (?expr ...))
     (let* ((lo ?lo)
            (i ?i)
            (hi (?expr ...)))
       (test-range lo ?op1 i ?op2 hi)))
    ((_ ?lo ?op1 (?expr ...) ?op2 ?hi)
     (let* ((lo ?lo)
            (i (?expr ...)))
       (test-range lo ?op1 i ?op2 ?hi)))

    ; ?i and ?hi must be constants or variables

    ((_ ?lo < ?i < ?hi)
     (and (< ?lo ?i) (< ?i ?hi)))
    ((_ ?lo < ?i <= ?hi)
     (and (< ?lo ?i) (<= ?i ?hi)))
    ((_ ?lo <= ?i < ?hi)
     (and (<= ?lo ?i) (< ?i ?hi)))
    ((_ ?lo <= ?i <= ?hi)
     (and (<= ?lo ?i) (<= ?i ?hi)))))

Will

Ray Blaak

Jun 4, 2007, 12:58:20 AM

Jens Axel Søgaard <use...@soegaard.net> writes:
> It is not clear to me whether your argument is:
>
> "infix field notation is a bad idea"
> therefore "identifier macros are a bad idea".

No, no nothing like that. Identifier macros are fine.

It is simply that attempts at p.x macros and the like are a pet peeve of mine:
they attempt to make it seem that one can have field accessors, but it is an
illusion as any slightly more complex expression involving them shows.

I actually would not be against the experiment of truly implementing infix
field accessors in scheme, just to see if it can work decently with the scheme
syntax style.

Jens Axel Søgaard

Jun 4, 2007, 4:14:56 AM

Ray Blaak wrote:
> Jens Axel Søgaard <use...@soegaard.net> writes:
>> It is not clear to me whether your argument is:
>>
>> "infix field notation is a bad idea"
>> therefore "identifier macros are a bad idea".
>
> No, no nothing like that. Identifier macros are fine.
>
> It is simply that attempts at p.x macros and the like are a pet peeve of mine:
> they attempt to make it seem that one can have field accessors, but it is an
> illusion as any slightly more complex expression involving them shows.

That makes sense.

> I actually would not be against the experiment of truly implementing infix
> field accessors in scheme, just to see if it can work decently with the scheme
> syntax style.

Something in the same style as how quote and quasiquote are handled?

The reader sees (foo bar).(baz quz)
and returns (dot (foo bar) (baz quz))

The tricky part is that . is already present in the literal syntax for
pairs, and allowed as a character in identifiers.

It's worth an experiment in an implementation with a configurable
reader.

--
Jens Axel Søgaard

Jens Axel Søgaard

Jun 4, 2007, 4:52:44 AM

William D Clinger wrote:
> Jens Axel Søgaard quoting Alex Shinn:
>> 2) "It is a substantial complication to the semantics of the language,
>> with arguable benefits."
>>
>> Why do you see this as a "substantial complication"?
>
> The problem with identifier macros is that they
> break a specific invariant on which R5RS can rely,
> and on which some R5RS macros have relied:
>
> ================================================================
> R5RS invariant:
>
> A reference to an identifier has no side effects.
> ================================================================
>
> With identifier macros as proposed for the R6RS, this
> R5RS-enforced invariant becomes a matter of programming
> style, which I can only hope programmers will continue
> to follow. Continuing to use an R5RS hygienic macro
> that depends on identifier references having no side
> effects would therefore become a matter of hope in the
> R6RS.
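The invariant can be made concrete. Below is a minimal sketch as an R6RS top-level program (the names counter, tick, square, two-refs and sq are hypothetical): an identifier macro whose "references" have side effects, and a plain syntax-rules macro that duplicates its operand, which is harmless under the R5RS guarantee but stops behaving as expected once an identifier macro is passed to it.

```scheme
;; Sketch only (R6RS top-level program).  `counter`, `tick`, `square`,
;; `two-refs` and `sq` are hypothetical names chosen for illustration.
(import (rnrs))

(define counter 0)

;; An identifier macro: every reference to `tick` increments `counter`
;; and yields its new value, so a "variable reference" has a side effect.
(define-syntax tick
  (identifier-syntax
   (begin (set! counter (+ counter 1)) counter)))

;; An ordinary syntax-rules macro that duplicates its operand.  Under the
;; R5RS invariant this is harmless when the operand is a variable.
(define-syntax square
  (syntax-rules ()
    ((_ x) (* x x))))

;; Two references to `tick` give two different values; because argument
;; evaluation order is unspecified, this list is either (1 2) or (2 1).
(define two-refs (list tick tick))

;; `square` now multiplies 3 by 4: the result, 12, is not a square.
(define sq (square tick))
```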

Agreed. The thing is, I am not too worried about
programmers breaking this invariant, based on
experience with identifier macros in the current
syntax-case implementations.

> That seems odd, given the R6RS predilection for
> mandating the enforcement of stylistic issues that
> matter considerably less than whether references can
> have side effects. Identifier macros introduce an
> entirely new way for programmers to break macros
> that, under the R5RS, were correct in all contexts.

Your argument makes perfect sense. For some reason
I can't put my finger on, I still think identifier
macros should be included. They at least allow
the user to experiment with some interesting
concepts:

- use-once variables (aka linear, unshared, ...)
- identifier aliases
- logging on set!
- persistence with non-intrusive syntax
- seamless integration of variables <-> db
(the last may be a bad idea)
- Knuth's solve-equations-on-reference idea (METAFONT)
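As one concrete instance of "logging on set!" from the list above, here is a minimal R6RS sketch using the two-clause form of identifier-syntax; the names x, x-store and set-log are hypothetical.

```scheme
;; Sketch only (R6RS).  `x`, `x-store` and `set-log` are hypothetical names.
(import (rnrs))

(define x-store 0)   ; where the value of `x` actually lives
(define set-log '()) ; every value assigned to `x`, most recent first

;; `x` looks like a variable: a reference reads `x-store`, and an
;; assignment (set! x e) is intercepted to record the new value first.
(define-syntax x
  (identifier-syntax
   (_ x-store)
   ((set! _ e)
    (let ((v e))
      (set! set-log (cons v set-log))
      (set! x-store v)))))

(set! x 10)
(set! x (+ x 5))  ; reads 10 through `x`, then logs and stores 15
```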

--
Jens Axel Søgaard

Pascal Costanza

Jun 4, 2007, 5:25:03 AM

William D Clinger wrote:

> My point is that the identifier macros that have
> been proposed for R6RS interfere with the macro
> system of the R5RS in specific ways. This would
> break backward compatibility, most obviously, but
> it would also make a certain class of problems
> more difficult to solve: there are macros that,
> under the R5RS guarantees, could be written using
> syntax-rules, but would require syntax-case in
> R6RS even if they could be written at all.
>
> If simplicity is one of the criteria for judging
> a programming language, then a proposed feature
> that makes certain things more difficult than
> they have been in the past should be questioned.

Thanks a lot for the very clear explanation.

Matthew Flatt

Jun 4, 2007, 6:12:26 AM

On Jun 4, 10:12 am, William D Clinger <cesur...@yahoo.com> wrote:
> ; My homemade range-checking macro that guarantees:
> ; left-to-right evaluation
> ; evaluation of expressions occurs before range checking
> ;
> ; Syntax:
> ; (test-range <expr1> <op1> <expr2> <op2> <expr3>)
> ; where
> ; <expr1>, <expr2>, and <expr3> are arbitrary expressions
> ; <op1> and <op2> must be the identifiers < or <=

In

(letrec ([x (test-range 1 < 0 < x)])
  x)

is #f the result that you intended? I naively expected
this to be an error, like

(letrec ([x (< 1 0 x)])
  x)

but I'm often very naive, indeed, on this sort of point.

Thanks,
Matthew

William D Clinger

Jun 4, 2007, 7:31:43 AM

Matthew Flatt pointed out that my example macro
still doesn't satisfy its specification:

> In
>
> (letrec ([x (test-range 1 < 0 < x)])
> x)
>
> is #f the result that you intended?

Good point. Please revise the specification and the
macro itself for left-to-right short-circuiting, which
will simplify it further by omitting the first rule.
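One possible reading of that revision, sketched as an R6RS program (a hypothetical reconstruction, not Will's own code): drop the first rule, so that <expr3> is evaluated only if the first comparison succeeds. Like the original, it still leans on the R5RS invariant: in (< i <expr3>) the order in which i and <expr3> are evaluated is unspecified, which is harmless only because referencing i has no side effects. It also assumes <expr3> does not assign to a variable passed as <expr2>.

```scheme
;; Hypothetical revision of test-range: left-to-right evaluation with
;; short-circuiting.  The original first rule (for a compound <expr3>)
;; is omitted, so <expr3> is evaluated only if the first test is true.
(import (rnrs))

(define-syntax test-range
  (syntax-rules (< <=)
    ;; <expr2> is compound: evaluate <expr1>, then <expr2>, into
    ;; temporaries and retry with a simple middle operand.
    ((_ ?lo ?op1 (?expr ...) ?op2 ?hi)
     (let* ((lo ?lo)
            (i (?expr ...)))
       (test-range lo ?op1 i ?op2 ?hi)))
    ;; ?i is now a constant or variable; ?hi may be any expression.
    ((_ ?lo < ?i < ?hi)   (and (< ?lo ?i) (< ?i ?hi)))
    ((_ ?lo < ?i <= ?hi)  (and (< ?lo ?i) (<= ?i ?hi)))
    ((_ ?lo <= ?i < ?hi)  (and (<= ?lo ?i) (< ?i ?hi)))
    ((_ ?lo <= ?i <= ?hi) (and (<= ?lo ?i) (<= ?i ?hi)))))
```

With this version, Matthew's (letrec ([x (test-range 1 < 0 < x)]) x) returning #f is the specified behavior rather than a bug, since x is never evaluated.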

This was just an example I whipped up to illustrate
the idiom. I know of real-world examples, but they
are more complicated or would require explanation
of their context.

Will

Ray Dillinger

Jun 4, 2007, 10:28:20 AM

Pascal Costanza wrote:
> William D Clinger wrote:
>
>> My point is that the identifier macros that have
>> been proposed for R6RS interfere with the macro
>> system of the R5RS in specific ways. This would
>> break backward compatibility, most obviously, but
>> it would also make a certain class of problems
>> more difficult to solve: there are macros that,
>> under the R5RS guarantees, could be written using
>> syntax-rules, but would require syntax-case in
>> R6RS even if they could be written at all.
>>
>> If simplicity is one of the criteria for judging
>> a programming language, then a proposed feature
>> that makes certain things more difficult than
>> they have been in the past should be questioned.
>
>
> Thanks a lot for the very clear explanation.
>

I'm going to repeat something I said earlier, though I
don't know if anybody actually read it:

This confusion goes away if you partition the
set of identifiers into "base" and "extended"
where an "extended" identifier can only be
created by a syntax macro.

That is, if we want identifier syntax with infix
periods, we have to give up infix periods in our
base identifiers.

This allows the reader to know which is which,
and makes possible macro-writing that doesn't
suffer from the distinction. This implies
(and enables) all the identifier syntax to be
expanded before other macros are expanded.

Bear

Ray Blaak

Jun 4, 2007, 12:40:35 PM

Ray Dillinger <be...@sonic.net> writes:
> This allows the reader to know which is which,
> and makes possible macro-writing that doesn't
> suffer from the distinction. This implies
> (and enables) all the identifier syntax to be
> expanded before other macros are expanded.

I don't understand something. Doesn't a macro control the interpretation of
everything in its expression?

E.g. (case ... else ...) interprets the else, not 'else on its own, right?

Or is it the case that identifier macros are expanded before the enclosing
macro? That seems completely backwards to me, and completely breaks any of those
macros that introduce their own keywords into their expressions, i.e. the
"little languages", so to speak.

Would this break Olin Shivers's regex macros, for example?

It seems to me that macros should have full control of their content, and
identifier macros should only be interpreted *after* everything else, if they
are not otherwise consumed.
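For reference, R6RS expansion (like the syntax-case expanders it is modeled on) proceeds outside-in: an enclosing macro use is expanded before the forms inside it, so it sees identifier macros in its operands as ordinary identifiers, and only what it leaves in evaluated positions gets expanded afterwards. A minimal R6RS sketch (the names tick and name-of are hypothetical):

```scheme
;; Sketch only (R6RS).  `tick` and `name-of` are hypothetical names.
(import (rnrs))

;; An identifier macro: a reference to `tick` expands to 42.
(define-syntax tick (identifier-syntax 42))

;; A macro that quotes its operand rather than evaluating it.
(define-syntax name-of
  (syntax-rules ()
    ((_ id) 'id)))

;; `name-of` is expanded first and sees `tick` as a plain identifier,
;; so the identifier macro never fires here: the result is a symbol.
(define inside (name-of tick))

;; Outside any such macro, `tick` is expanded as usual.
(define outside tick)
```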

Alex Shinn

Jun 4, 2007, 10:31:19 PM

Alex Shinn wrote:
>
> As an example, consider a fast-math macro:
>
> (fast-math (+ (* a b) (* a c)))
> => (* a (+ b c))

For those curious, I've made this macro publicly available at

http://synthcode.com/scheme/fast-math.scm

It uses the term-optimizer library from

http://synthcode.com/scheme/term-optimizer.scm

You'll have to excuse the poor quality of that code - it was
originally written in one day as a throw-away script for someone
who needed to optimize a C program. You can test the original
expression used with:

(define (f1 A B C D E F G H I J)
  (fast-math
   (/ (+ (* -1 D D F F) (* 2 C D F G) (* -1 C C G G) (* D D E H)
         (* -2 B D G H) (* A G G H) (* -2 C D E I) (* 2 B D F I)
         (* 2 B C G I) (* -2 A F G I) (* -1 B B I I) (* A E I I)
         (* C C E J) (* -2 B C F J) (* A F F J) (* B B H J)
         (* -1 A E H J))
      (+ (* C C E) (* -2 B C F) (* A F F) (* B B H) (* -1 A E H)))))

This made the original C program about twice as fast, if I recall
correctly. A quick test in Chicken shows about a 30% speed
improvement. It doesn't do a complete search of the problem space
(which is exponentially large) and it's too expensive for
compilers to perform these operations in general.

The code works on optimizing the four arithmetic operators +, -, *
and /. It matches these hygienically (using the syntactic
closures macro system), so it won't break if you shadow any of
these, though you could match unhygienically if you wanted
(there's also a defmacro version commented out at the end of the
file). Any expression in parentheses other than those four
operators will be bound once to a temporary variable. So using
the macro is simple and safe (modulo any of the undoubtedly many
bugs in the code) - anywhere you want to optimize an arithmetic
expression, just wrap it in fast-math (and be *very* patient while
it macro-expands).
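To make the kind of rewrite fast-math performs concrete, here is a toy R6RS sketch. It is not the fast-math code (which uses syntactic closures and a real search); factor-sum is a hypothetical macro that only rewrites (+ (* a b) (* a c)) into (* a (+ b c)) when both products start with the same identifier. The rewrite references a once instead of twice, which is only transparent if a variable reference has no side effects; the counting identifier macro at the end shows the difference becoming observable.

```scheme
;; Toy sketch (R6RS), not the fast-math implementation.  `factor-sum`,
;; `plain`, `refs`, `n` and `factored` are hypothetical names.
(import (rnrs))

;; Rewrite (+ (* a b) (* a d)) into (* a (+ b d)) when both products start
;; with the same identifier; any other operand is returned unchanged.
(define-syntax factor-sum
  (lambda (stx)
    (syntax-case stx (+ *)
      ((_ (+ (* a b) (* c d)))
       (and (identifier? #'a)
            (identifier? #'c)
            (free-identifier=? #'a #'c))
       #'(* a (+ b d)))
      ((_ e) #'e))))

;; With an ordinary variable the rewrite is invisible: 2*3 + 2*4 = 2*(3+4).
(define plain (let ((a 2)) (factor-sum (+ (* a 3) (* a 4)))))

;; With a counting identifier macro it is not: the factored form
;; references `n` once, where the original expression referenced it twice.
(define refs 0)
(define-syntax n
  (identifier-syntax (begin (set! refs (+ refs 1)) 2)))
(define factored (factor-sum (+ (* n 3) (* n 4))))
```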

--
Alex

Michael Sperber

Jun 15, 2007, 7:58:31 AM

Alex Shinn <alex...@gmail.com> writes:

> Jens Axel Søgaard wrote:
>>
>> Although you didn't say identifier macros are bad,
>> you did write "substantial complication", which
>> in my book is worse than bad.
>
> The adjective is in the eye of the beholder. It breaks some of my
> code, and not in a trivial "darn, I need to fix this for R6RS" sort of
> way, but in a fundamental way - I can no longer write that kind of
> macro at all.

Folks - if you want something to change wrt. `identifier-syntax', you
need to submit a formal comment. (And do it soon.) Please note that
`identifier-syntax' was moved *into* the base library as a response to a
formal comment. (#114)

--
Cheers =8-} Mike
Peace, international understanding, and all that blah blah