Order of primitive cons

skynet

unread,

Nov 16, 2000, 3:00:00 AM11/16/00

to

Hi,

there is something about cons that I do not get.
In my example below I defined p and q are being defines, both as being a
pair.

I discovered that changing the order of the attributes of the cons did
strange things.

For me the evaluation of q seems the most logic: two pair, one is (0 . 1)
and 2, both separated by a dot. Why isn't this the case in p? In p the pairs
are 0 and 1 . 2, but why are the brackets around the second pair and the dot
omitted?

+++++ [edscheme] +++++++++++
(define p (cons 0 (cons 1 2)))

>p
=>(0 1 . 2)

(define q (cons (cons 0 1) 2))
(0 1 . 2)

>q
=>((0 . 1) . 2)
+++++++++++++++++++++++++++

thanks a lot,
Vincent

Richard Cobbe

unread,

Nov 16, 2000, 3:00:00 AM11/16/00

to

"skynet" <v...@namahn.com> writes:

It's basically a shortcut that the Scheme printer uses to make lists
more tractable.

Let A and B be any arbitrary Scheme values.

(define C (cons A B))

In general, C's value will be printed

(A . B)

However, there are a number of exceptions:

* If B is the empty list, then C's value will be printed as

(A)

* If B is itself the value of another cons expression (as in your value for
p above), then we omit the dot and remove the parentheses around B:

(A B)

It's primarily so that printing lists works out neatly:

(cons 1 (cons 2 (cons 3 '())))

or, equivalently,

(list 1 2 3)

will print as

(1 2 3)

rather than the more verbose

(1 . (2 . (3 . ())))

The two written representations, however, are equivalent. (In addition,
you can type either one at the scheme prompt---quoted, of course---and
you'll get the same result.)

It's a little confusing when it's written out like this, yes, but most
people like it---it sure beats writing code like

(define . (q . ((cons . ((cons . (0 . (1 . ()))) . (2 . ()))) . ())))

(And, yes, that actually does work.)

Richard

Jacob J. A. Koot

unread,

Nov 16, 2000, 3:00:00 AM11/16/00

to

skynet <v...@namahn.com> schreef in berichtnieuws
8v0gpf$aot$1...@news0.skynet.be...

> Hi,
>
> there is something about cons that I do not get.
> In my example below I defined p and q are being defines, both as being a
> pair.
>

> I discovered that changing the order of the *attributes*

These are usually called *arguments*

> of the cons did strange things.
> For me the evaluation of q seems the most logic: two pair, one is (0 . 1)
> and 2, both separated by a dot. Why isn't this the case in p? In p the
pairs
> are 0 and 1 . 2, but why are the brackets around the second pair and the
dot
> omitted?
>
> +++++ [edscheme] +++++++++++
> (define p (cons 0 (cons 1 2)))
>
> >p
> =>(0 1 . 2)

This is just another notation for (0 . (1 . 2)).
Call it shorthand if you like. See section 6.3.2 of R5RS.

Regards, Jacob J. A. Koot

Lauri Alanko

unread,

Nov 16, 2000, 3:00:00 AM11/16/00

to

In article <871ywcx...@minbar.directlink.net>,

Richard Cobbe <co...@directlink.net> wrote:
>It's a little confusing when it's written out like this, yes, but most
>people like it---it sure beats writing code like
>
>(define . (q . ((cons . ((cons . (0 . (1 . ()))) . (2 . ()))) . ())))
>
>(And, yes, that actually does work.)

Maybe on your implementation. However, in R5RS the syntax for expressions is
not exactly the same as the syntax for external representations.

For instance:

<definition> ---> (define <variable> <expression>)
| (define (<variable> <def formals>) <body>)
| (begin <definition>*)

<list> ---> (<datum>*) | (<datum>+ . <datum>) | <abbreviation>

So, an external representation of a list can have a dot, but a definition in
a program can not.

Yes, it is confusing. The fact that every well-formed scheme expression is
also a well-formed external representation (that can even be read), doesn't
help to make things clearer. However, the program (expressions) and the data
therein (literals) definitely _are_ different beasts, more so in scheme than
in common lisp, I think.

Say, how about this?

q := cons(cons(0, 1), 2);

This is still essentially scheme, IMHO. Just an alternative syntax for
(define q (cons (cons 0 1) 2)). Might make the distinction clearer for some
people.

Other ideas (no, I'm not advocating these, just food for thought):

(begin a b c ..) -> { a ; b ; c .. }

(set! a b) -> a = b

(lambda (a b ..) c d ..) -> fun(a, b ..) { c ; d .. }

Et cetera... The point here is that this is just a _syntactic_
transformation, the expressive power of the language doesn't go anywhere.
It's still scheme.

Of course, the _real_ reason that the syntax of scheme's expressions is so
similar to the syntax of its literals is that it makes implementing eval so
much nicer. No need to write a dedicated parser, read will do just as
well...

Naturally, before R5RS eval wasn't "standard", but everyone used it
nevertheless.

I guess I got sort of carried away. I just wanted to emphasize that just as
in C, the string literal "1 + 2" is _worlds_ away from the expression 1 + 2,
and in a similar fashion, in scheme, '(+ 1 2) is a completely different
thing from (+ 1 2), and it's only the very magical (and, until recently,
optional) gadget called "eval" that takes you from one to the other. It took
me some while to learn, at least.

Lauri Alanko
l...@iki.fi

Joe Marshall

unread,

Nov 16, 2000, 3:00:00 AM11/16/00

to

Lauri Alanko <l...@iki.fi> writes:

> In article <871ywcx...@minbar.directlink.net>,
> Richard Cobbe <co...@directlink.net> wrote:
> >It's a little confusing when it's written out like this, yes, but most
> >people like it---it sure beats writing code like
> >
> >(define . (q . ((cons . ((cons . (0 . (1 . ()))) . (2 . ()))) . ())))
> >
> >(And, yes, that actually does work.)
>
> Maybe on your implementation. However, in R5RS the syntax for expressions is
> not exactly the same as the syntax for external representations.
>
> For instance:
>
> <definition> ---> (define <variable> <expression>)
> | (define (<variable> <def formals>) <body>)
> | (begin <definition>*)
>
> <list> ---> (<datum>*) | (<datum>+ . <datum>) | <abbreviation>
>
> So, an external representation of a list can have a dot, but a definition in
> a program can not.
>
> Yes, it is confusing. The fact that every well-formed scheme expression is
> also a well-formed external representation (that can even be read), doesn't
> help to make things clearer. However, the program (expressions) and the data
> therein (literals) definitely _are_ different beasts, more so in scheme than
> in common lisp, I think.

This depends on how you read the production rules in syntax section.
If you assume that the right-hand side of the rules is referring to
the printed representation, then you are correct.

However, if the right-hand side of the rules is *using* the printed
representation of a data structure as a shorthand, and that data
structure is constructed by READ, then

(define . (q . ((cons . ((cons . (0 . (1 . ()))) . (2 . ()))) . ())))

would indeed be a legitimate printed representation of an object that
was syntactically a definition.

I have always assumed that the later case holds. My reasons are
tradition (historically, lisp has always used READ to read the source
code before EVAL), implementation (I've never seen a lisp that did not
use READ to read the source code), and intuition (my belief is that
the R5RS authors intended that READ be used to read source code).

Also, note that the section `lexical structure', which concerns itself
with how a sequence of characters is interpreted as the printed
representation of objects, is separate from the `programs and
definitions' section, which I believe concerns itself with how lists
(a data structure with many legitimate printed representations) are
interpreted as program elements.

-----= Posted via Newsfeeds.Com, Uncensored Usenet News =-----
http://www.newsfeeds.com - The #1 Newsgroup Service in the World!
-----== Over 80,000 Newsgroups - 16 Different Servers! =-----

Martin Rodgers

unread,

Nov 17, 2000, 2:58:44 AM11/17/00

to

Voice in the desert: Quiet, isn't it, Joe Marshall?

> However, if the right-hand side of the rules is *using* the printed
> representation of a data structure as a shorthand, and that data
> structure is constructed by READ, then
>
> (define . (q . ((cons . ((cons . (0 . (1 . ()))) . (2 . ()))) . ())))
>
> would indeed be a legitimate printed representation of an object that
> was syntactically a definition.

I've just tried that:

Gambit Version 3.0

> '(define . (q . ((cons . ((cons . (0 . (1 . ()))) . (2 . ()))) . ())))

(define q (cons (cons 0 1) 2))

> (define . (q . ((cons . ((cons . (0 . (1 . ()))) . (2 . ()))) . ())))

*** ERROR IN (stdin)@2.1 -- Ill-formed special form: define
>

Hmm. I understand why this happens, but it screws this example.
--
Email address intentially munged | You can never browse enough
will write code that writes code that writes code for food
<URL:http://www.wildcard.demon.co.uk>

Richard Cobbe

unread,

Nov 17, 2000, 3:00:00 AM11/17/00

to

Lauri Alanko <l...@iki.fi> writes:

> In article <871ywcx...@minbar.directlink.net>,
> Richard Cobbe <co...@directlink.net> wrote:
> >It's a little confusing when it's written out like this, yes, but most
> >people like it---it sure beats writing code like
> >

> >(define . (q . ((cons . ((cons . (0 . (1 . ()))) . (2 . ()))) . ())))
> >

> >(And, yes, that actually does work.)
>
> Maybe on your implementation.

True. I should have said that I tried that with MzScheme v103, and it does
in fact work. However, this lack of portability across scheme
implementations should not be a cause for concern, because nobody should
actually write code like that in real life.

Richard

Kellom{ki Pertti

unread,

Nov 17, 2000, 3:00:00 AM11/17/00

to

Richard Cobbe <co...@directlink.net> writes:
> > >(define . (q . ((cons . ((cons . (0 . (1 . ()))) . (2 . ()))) . ())))

> However, this lack of portability across scheme
> implementations should not be a cause for concern, because nobody should
> actually write code like that in real life.

Just for the record, the Revised^5 Report on Scheme does not
allow for programs in the above form. For example, the formal syntax
for a definition is

<definition> ---> (define <variable> <expression>)
| (define (<variable> <def formals>) <body>)
| (begin <definition>*)

which does not permit a dot after define. I guess all of the Scheme
reports have deliberately made program syntax a subset of syntax
allowed by read and friends (too lazy to check). Many implementations
use the Scheme read procedure to read in program text, with the
obvious implications on what they accept.

This has actually been discussed here fairly recently.
--
Pertti Kellom\"aki, Tampere Univ. of Technology, Software Systems Lab

Joe Marshall

unread,

Nov 17, 2000, 3:00:00 AM11/17/00

to

Martin Rodgers <m...@thiswildcardaddressintentiallyleftmunged.demon.co.uk> writes:

> Voice in the desert: Quiet, isn't it, Joe Marshall?

Damned straight. Why do you think I hang out here?

> > However, if the right-hand side of the rules is *using* the printed
> > representation of a data structure as a shorthand, and that data
> > structure is constructed by READ, then
> >
> > (define . (q . ((cons . ((cons . (0 . (1 . ()))) . (2 . ()))) . ())))
> >
> > would indeed be a legitimate printed representation of an object that
> > was syntactically a definition.
>
> I've just tried that:
>
> Gambit Version 3.0
>
> > '(define . (q . ((cons . ((cons . (0 . (1 . ()))) . (2 . ()))) . ())))
> (define q (cons (cons 0 1) 2))
> > (define . (q . ((cons . ((cons . (0 . (1 . ()))) . (2 . ()))) . ())))
> *** ERROR IN (stdin)@2.1 -- Ill-formed special form: define
> >
>
> Hmm. I understand why this happens, but it screws this example.

Obviously, gambit is interpreting the right-hand of those production
rules as a token sequences rather than as the printed represention of
list structure.

I don't see any good reason to do this, though. It means that you
need a special `code' reader that can reject token sequences that the
regular reader would accept. Furthemore, you would need to switch to
the regular reader in order to read the datum sequence in a quoted
form or macro expression. You would be unable to precisely mimic the
built-in special forms by macros (because you need to alter the
reader). Finally the `derived' expressions could *not* be implemented
as macros (or conversely not checked for correct token sequences)
because macro production would lose that information. In fact, the
sample macro expansions in the section that follow would have to be
considered incorrect.

This would be hard to implement, bizarre in the extreme and trivial
to hide.

(let-syntax ((define (syntax-rules ()
((define ...) (define ...)))))

(define . (q . ((cons . ((cons . (0 . (1 . ()))) . (2 . ()))) . ())))

)

Martin Rodgers

unread,

Nov 18, 2000, 3:00:00 AM11/18/00

to

Voice in the desert: Quiet, isn't it, Joe Marshall?

> I don't see any good reason to do this, though.

Error reporting? It helps to have line and character number info.

> This would be hard to implement, bizarre in the extreme and trivial
> to hide.

See the Gambit source code. ;) ISTR Stalin also works this way.

Algiers Petrofsky

unread,

Nov 19, 2000, 3:00:00 AM11/19/00

to

Joe Marshall <j...@content-integrity.com> writes:
> I don't see any good reason to do this, though. It means that you
> need a special `code' reader that can reject token sequences that the
> regular reader would accept. Furthemore, you would need to switch to
> the regular reader in order to read the datum sequence in a quoted
> form or macro expression. You would be unable to precisely mimic the
> built-in special forms by macros (because you need to alter the
> reader). Finally the `derived' expressions could *not* be implemented
> as macros (or conversely not checked for correct token sequences)
> because macro production would lose that information. In fact, the
> sample macro expansions in the section that follow would have to be
> considered incorrect.

No, r5rs requires very little error-checking. Accepting non-standard
usages does not make an implementation incorrect. If the standard
labels something an error, that puts a burden on programmers not to
use the construction (if they want their code to be accepted by all
standard scheme systems). It puts no burden on scheme implementors;
rather, it frees them to react to that construction however they would
like.

-al

Joe Marshall

unread,

Nov 20, 2000, 3:00:00 AM11/20/00

to

Algiers Petrofsky <Alg...@Petrofsky.Berkeley.CA.US> writes:

But there is no operational difference between accepting the
right-hand side of those rules as list structure and accepting them as
non-validated token structure that is isomorphic to list structure.

Joe Marshall

unread,

Nov 20, 2000, 3:00:00 AM11/20/00

to

Martin Rodgers <m...@thiswildcardaddressintentiallyleftmunged.demon.co.uk> writes:

> Voice in the desert: Quiet, isn't it, Joe Marshall?
>

> > I don't see any good reason to do this, though.
>

> Error reporting? It helps to have line and character number info.

That doesn't preclude using READ to read source code.

> > This would be hard to implement, bizarre in the extreme and trivial
> > to hide.
>
> See the Gambit source code. ;) ISTR Stalin also works this way.

That's extremely bizarre.

Admittedly, the standard doesn't say what the scheme command-line does
(or ever if it has one), but having a command line that distinguishes
between

(define . (foo 22)) and (define foo 22)

when *nothing else* is permitted to, is simply being pedantic.

Martin Rodgers

unread,

Nov 20, 2000, 3:00:00 AM11/20/00

to

Voice in the desert: Quiet, isn't it, Joe Marshall?

> > Error reporting? It helps to have line and character number info.

>
> That doesn't preclude using READ to read source code.

READ loses the line and character number info.

ste...@pcrm.win.tue.nl

unread,

Nov 23, 2000, 3:00:00 AM11/23/00

to

On Mon, 20 Nov 2000 18:51:34 -0000, Martin Rodgers
<m...@thiswildcardaddressintentiallyleftmunged.demon.co.uk> wrote:
>Voice in the desert: Quiet, isn't it, Joe Marshall?
>

>> > Error reporting? It helps to have line and character number info.
>>
>> That doesn't preclude using READ to read source code.
>
>READ loses the line and character number info.

There's nothing stopping READ from putting this info in extra
"hidden fields" in cons cells and vectors (but not in symbols, obviously).

Stephan
--
ir. Stephan H.M.J. Houben
tel. +31-40-2474358 / +31-40-2743497
e-mail: step...@win.tue.nl

Richard Cobbe

unread,

Nov 23, 2000, 3:00:00 AM11/23/00

to

ste...@pcrm.win.tue.nl () writes:

> On Mon, 20 Nov 2000 18:51:34 -0000, Martin Rodgers
> <m...@thiswildcardaddressintentiallyleftmunged.demon.co.uk> wrote:

> >Voice in the desert: Quiet, isn't it, Joe Marshall?
> >

> >> > Error reporting? It helps to have line and character number info.
> >>
> >> That doesn't preclude using READ to read source code.
> >
> >READ loses the line and character number info.
>
> There's nothing stopping READ from putting this info in extra
> "hidden fields" in cons cells and vectors (but not in symbols, obviously).

Why obviously not in symbols? Couldn't you use something like Common
Lisp's property lists? Granted, this would be an extension to the
language, but I don't see this as being a bigger change than the extra
"hidden fields" you mention.

Unless, of course, there's something in R5RS which explicitly prohibits
this sort of thing. (I'm not familiar enough with the spec to be able to
find such a statement quickly.)

Richard

ste...@pcrm.win.tue.nl

unread,

Nov 24, 2000, 2:56:23 AM11/24/00

to

On 23 Nov 2000 08:19:02 -0600, Richard Cobbe <co...@directlink.net> wrote:

>ste...@pcrm.win.tue.nl () writes:
>> There's nothing stopping READ from putting this info in extra
>> "hidden fields" in cons cells and vectors (but not in symbols, obviously).
>
>Why obviously not in symbols?

Uhm, I did some thinking and realize now my statement is wrong.

>Couldn't you use something like Common
>Lisp's property lists?

I'm not aware of this particular feature of CL, so I cannot comment
on how it would integrate with Scheme

>Unless, of course, there's something in R5RS which explicitly prohibits
>this sort of thing. (I'm not familiar enough with the spec to be able to
>find such a statement quickly.)

Well, I thought there was, since R5RS requires READ to produce the
same (in the sense of EQ?) symbol upon encountering the same
input text. But I was wrong in concluding that this means that
both symbols are physically the same. EQ? is not required to be
"pointer equality", it's just a strong equivalence relation.

So it is possible that READ produces different symbol objects with
different source coordinates in their "hidden fields", which are
nevertheless EQ?ual.

Martin Rodgers

unread,

Nov 24, 2000, 3:00:00 AM11/24/00

to

Voice in the desert: Quiet, isn't it, ste...@pcrm.win.tue.nl?

> There's nothing stopping READ from putting this info in extra
> "hidden fields" in cons cells and vectors (but not in symbols, obviously).

Is that more or less work, for an implementor, than a special READ?
Can you point me to the source code for such an implementation?

As I said earlier, I can understand why some implemetors use a special
parser. It may sometimes be a pain, but there are compensations.

Kellom{ki Pertti

unread,

Nov 24, 2000, 3:00:00 AM11/24/00

to

Martin Rodgers <m...@thiswildcardaddressintentiallyleftmunged.demon.co.uk> writes:
> Voice in the desert: Quiet, isn't it, ste...@pcrm.win.tue.nl?
> > There's nothing stopping READ from putting this info in extra
> > "hidden fields" in cons cells and vectors (but not in symbols, obviously).
>
> Is that more or less work, for an implementor, than a special READ?

More importantly, I would not be very happy with an implementation
that wastes memory for storing such information in every conceivable
object in the hope it might come handy some day. Of course one could
have two kinds of objects, ones with the extra information and ones
without. However, that would eat up one tag bit in the usual implementation
techniques, with the associated performance implications.

felix

unread,

Nov 24, 2000, 7:04:10 PM11/24/00

to

Martin Rodgers wrote in message ...

>Voice in the desert: Quiet, isn't it, ste...@pcrm.win.tue.nl?
>
>> There's nothing stopping READ from putting this info in extra
>> "hidden fields" in cons cells and vectors (but not in symbols,
obviously).
>
>Is that more or less work, for an implementor, than a special READ?

>Can you point me to the source code for such an implementation?
>

I'm not sure wether this is an answer to your question, but in CHICKEN
one can utilize a crude mechanism to decorate symbols or pairs with
extra information, like this:

(define-record-type pair-with-line-number
car ; <x>
cdr ; <y>
count) ; <integer>

(define (read-with-line-number-information . port)
(%%read
(if (pair? port)
(car port)
(current-input-port) )
(lambda (class data val)
(if (eq? 'list-info class)
(make-pair-with-line-number (car data) (cdr data) %%read-line-counter)
data) ) ) )

(define (with-line-number-information thunk)
(fluid-let ([%%read-line-counter 1])
(thunk) ) )

The basic idea is: the builtin procedure "%%read" accepts a callback
procedure that is called for each symbol or pair, with the "class" parameter
being either 'list-info or 'symbol-info and the "data" parameter being
the actual object read. See CHICKEN's source code ("library.scm")
for more information.
Very simple, not overly elegant, but sufficient to give a little helpful
line-number information in compiled code.

cheers,
felix

Martin Rodgers

unread,

Nov 25, 2000, 3:00:00 AM11/25/00

to

Voice in the desert: Quiet, isn't it, felix?

[code omitted]

> The basic idea is: the builtin procedure "%%read" accepts a callback
> procedure that is called for each symbol or pair, with the "class" parameter
> being either 'list-info or 'symbol-info and the "data" parameter being
> the actual object read. See CHICKEN's source code ("library.scm")
> for more information.

That's a neat hook. It reminds me of the Forth way of doing this,
except that Forth parsers work in a single pass so info like this is
simply held in variables like >IN.

> Very simple, not overly elegant, but sufficient to give a little helpful
> line-number information in compiled code.

Implementing such a parser hook is certainly easy. The parser in
Gambit's _source.scm is as big as your library.scm, possibly because
of the read table support. From a casual inspection, it looks like
Gambit only uses the file position info for parse errors.

Stalin, however, uses an s-expression structure with a line-position
field. This is used by the syntax-error function. Special list
functions like sx-car are used throughout the compiler. This is, I
suspect, the real cost of saving and using source position info. As
this is really just a matter of prefixing a lot of standard functions
with "sx-", the gain may be greater than the cost.

So, I can understand why some compiler writers do this. I can also
understand why others don't. It's your choice.