cons-specific optimizations?

Milo Turner

unread,

Nov 27, 2018, 8:24:36 PM11/27/18

to Racket Users

Hello all,

There's something I have been very curious of lately, about low level aspects of Racket's compiler.

First of all, I have always considered "cons" to be essentially the same as a struct:

(struct cons [hd tl] #:transparent)
(define car (procedure-rename cons-hd 'car))
(define cdr (procedure-rename cons-tl 'cdr))

However I would not be surprised if cons had special treatment in the compiler, runtime, etc.

Is this the case? How frequently does Racket contain special optimizations for cons cells? Does cons have a special representation at runtime or at the bytecode level?
Thanks.

Vishesh Yadav

unread,

Nov 27, 2018, 9:06:49 PM11/27/18

to iita...@gmail.com, racket...@googlegroups.com

I recall reading some 6.x code for RacketScript, and in 6.x world pair
was simply a pair of pointers in C structure[1] and `cons` was again a
primitive function that created those C objects[2]. No special
representation in byte-code [all values are opaque to bytecode (except
ints and floats I think)?]. My impression was it didn't do anything
special with it, at-least explicitly. Things can be different now with
Chez.

[1] https://github.com/racket/racket/blob/v6.12/racket/src/racket/include/scheme.h#L347
[2] https://github.com/racket/racket/blob/v6.12/racket/src/racket/src/list.c#L1050

> --
> You received this message because you are subscribed to the Google Groups "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to racket-users...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Matthew Flatt

unread,

Nov 27, 2018, 9:43:26 PM11/27/18

to iita...@gmail.com, Vishesh Yadav, racket...@googlegroups.com

Pairs are special in both traditional Racket and Chez Scheme.

In the traditional Racket implementation, every allocated object has a
16-bit tag, and one of those is "pair". For a structure, the tag is
"structure", and then there's another word for the structure type. So,
the test for a pair is faster (e.g., in `car`) than the test for a
structure type (e.g., in a structure accessor). Pairs don't turn out to
be any more compact, though, due to alignment constraints.

Chez Scheme has similar difference, only better. Pairs are tagged using
low bits in the pointer that refers to a pair, so a `pair?` test
doesn't even need an indirection. A pair is also allocated more
compactly --- as exactly two words --- and it's treated specially by
the GC in other ways. A structure instance has a tag in the allocated
object, but the tag does at least combine the fact that the object a
structure and the structure type that it instantiates.

The compilers in both cases know some facts about how `cons` relates to
other primitives, but they also know how structure constructors and
accessors relate, so it's not as big a difference at that level.

Tony Garnock-Jones

unread,

Nov 28, 2018, 6:13:17 AM11/28/18

to Racket Users

On Wednesday, November 28, 2018 at 2:43:26 AM UTC, Matthew Flatt wrote:

The compilers in both cases know some facts about how `cons` relates to
other primitives, but they also know how structure constructors and
accessors relate, so it's not as big a difference at that level.

There are also some runtime differences, right? For example, the optimisation that reduces m invocations of `length` on the same length-n list from O(mn) to O(m).

Is there anything else interesting like that in there?

Tony

Matthew Flatt

unread,

Nov 28, 2018, 8:16:01 AM11/28/18

to Tony Garnock-Jones, Racket Users

At Wed, 28 Nov 2018 03:13:16 -0800 (PST), Tony Garnock-Jones wrote:
> On Wednesday, November 28, 2018 at 2:43:26 AM UTC, Matthew Flatt wrote:
> >
> > The compilers in both cases know some facts about how `cons` relates to
> > other primitives, but they also know how structure constructors and
> > accessors relate, so it's not as big a difference at that level.
> >
>
> There are also some runtime differences, right? For example, the
> optimisation that reduces m invocations of `length` on the same length-n
> list from O(mn) to O(m).

Yes, that's special handling for pairs in the sense that the
traditional Racket implementation takes advantage of leftover bits in a
pair object, and it uses two of them for "is a list" and "not a list".

Racket-on-Chez doesn't have the extra bits to work with, so "is a list"
and "not a list" information is recorded separately in an `eq?`-based
hash table.

> Is there anything else interesting like that in there?

I can't think of any other way that the representation of pairs is
exploited like that.

Some list primitives take advantage of special support or knowledge
about the GC. For example, Chez Scheme can allocate a small list using
a single pointer bump for allocation, making a contiguous sequence of
pairs and then filling in those pairs to chain them together.

Alexis King

unread,

Nov 28, 2018, 12:15:13 PM11/28/18

to Matthew Flatt, Tony Garnock-Jones, Racket Users

> On Nov 28, 2018, at 07:15, Matthew Flatt <mfl...@cs.utah.edu> wrote:
>
> Yes, that's special handling for pairs in the sense that the
> traditional Racket implementation takes advantage of leftover bits in a
> pair object, and it uses two of them for "is a list" and "not a list".
>
> Racket-on-Chez doesn't have the extra bits to work with, so "is a list"
> and "not a list" information is recorded separately in an `eq?`-based
> hash table.

Why does keeping track of “is a list” and “not a list” require two bits? It seems like a pair either is or is not a list, so one bit of information would be sufficient. Are there situations where the system can neither be sure that something is or is not a list? Circular lists, or something like that?

(This is not really important, of course, I’m just curious.)

Alexis

Robby Findler

unread,

Nov 28, 2018, 12:16:47 PM11/28/18

to Alexis King, Matthew Flatt, Racket Users, Tony Garnock-Jones

Also "don't know yet".

Robby

George Neuner

unread,

Nov 28, 2018, 12:26:31 PM11/28/18

to Alexis King, racket users

I'm guessing that one of those bits actually is just the "pair" bit, and
that the "list" bit differentiates a pair that is in a "proper" list
from one that is not.

George

Matthew Flatt

unread,

Nov 28, 2018, 1:26:17 PM11/28/18

to Robby Findler, Alexis King, Racket Users, Tony Garnock-Jones

Right. An alternative would be to set decide eagerly, but `list?` tests
are infrequent relative to pair constructions.

Then again, a function like `list` can and does set the "is a list" bit
eagerly and cheaply on newly allocated pairs.

Gustavo Massaccesi

unread,

Nov 28, 2018, 4:36:44 PM11/28/18

to Milo Turner, Matthew Flatt, Robby Findler, Alexis King, Tony Garnock-Jones, racket...@googlegroups.com

Some additional comments about this subject.

A big difference is that in Chez Scheme the cons are mutable, and that makes it almost impossible to make any optimization with the lists at the Chez Scheme level.

With the current Racket implementation there are a few reductions at compile time for the lists, for example

(car (list 1 2 3)) ; ==> 1

(when (list? l)

(let ([x (cdr l)])

(list? x) ;==> #t

))

The optimizer "understands" lists, but it doesn't understand custom structs that have are list like nesting.

Also, there was an attempt to implement the mcons using a struct instead of a special construction, but some programs were too slow with the modification, so it was never merged. https://github.com/racket/racket/pull/1316

Gustavo

Gustavo Massaccesi

unread,

Nov 28, 2018, 5:22:42 PM11/28/18

to Milo Turner, Matthew Flatt, Robby Findler, Alexis King, Tony Garnock-Jones, racket...@googlegroups.com

The last example should be

(when (and (list? l) (pair? l))

(let ([x (cdr l)])

(list? x) ;==> #t

))

Gustavo

Technical note: The optimizer wants to be sure that (cdr l) is safe before asserting that x is a list?. It should be ok to mark x as a list?, because if there is an error the inner code will not be run anyway. I'll try to remember why I made this, and send a PR if necessary.

Reply all

Reply to author

Forward