[racket] first vs car ; last vs cdr ; empty? vs null?

573 views
Skip to first unread message

Daniel Carrera

unread,
Mar 7, 2014, 5:45:23 AM3/7/14
to users
Hello,

Is there any difference between `first` and `car`, or between `last` and `cdr`, or between `empty? and null?` ?

I had assumed that these were just synonyms, added by Racket because they might be more memorable to a student. But apparently Racket doesn't think they are equal:

-> (equal? first car)
#f
-> (equal? last cdr)
#f
-> (equal? empty? null?)
#f


I suppose that they could be separate functions that happen to do the same thing, but if so, my next question would be why they aren't just aliases. As in:

-> (define myfirst car)
-> (equal? myfirst car)
#t

Cheers,
Daniel.
--
When an engineer says that something can't be done, it's a code phrase that means it's not fun to do.

Jens Axel Søgaard

unread,
Mar 7, 2014, 6:16:21 AM3/7/14
to Daniel Carrera, users
For lists first/rest works the same as car/cdr.
For non-lists there is a difference: first and rest signals an error.
The names first and rest makes it easier for a human reader of
a piece of code to see that the program works on lists only.

For the curious, the definition of first is:

(define (first x)
(if (and (pair? x) (list? x))
(car x)
(raise-argument-error 'first "(and/c list? (not/c empty?))" x)))

I found this definition like this:
1. Entered this program in DrRacket:
#lang racket
first
2. Clicked the "Check Syntax" button
3. Right clicked the identifier first and chose "Open defining file"
4. Chose "first" in the definition-drop-down in the upper left corner.

/Jens Axel

> ____________________
> Racket Users list:
> http://lists.racket-lang.org/users
>

--
--
Jens Axel Søgaard

____________________
Racket Users list:
http://lists.racket-lang.org/users

Ryan Davis

unread,
Mar 7, 2014, 6:19:31 AM3/7/14
to Daniel Carrera, users

On Mar 7, 2014, at 2:45, Daniel Carrera <dcar...@gmail.com> wrote:

> Hello,
>
> Is there any difference between `first` and `car`, or between `last` and `cdr`, or between `empty? and null?` ?
>
> I had assumed that these were just synonyms, added by Racket because they might be more memorable to a student. But apparently Racket doesn't think they are equal:
>
> -> (equal? first car)
> #f
> -> (equal? last cdr)
> #f
> -> (equal? empty? null?)
> #f
>
>
> I suppose that they could be separate functions that happen to do the same thing, but if so, my next question would be why they aren't just aliases. As in:
>
> -> (define myfirst car)
> -> (equal? myfirst car)

For many/most things in racket, you can bring up the definition for something inside of DrRacket:

(define (first x)
(if (and (pair? x) (list? x))
(car x)
(raise-argument-error 'first "(and/c list? (not/c empty?))" x)))

I couldn't for car, so I'm assuming it is considered a primitive.

last and cdr aren't synonymous:

(define (last l)
(if (and (pair? l) (list? l))
(let loop ([l l] [x (cdr l)])
(if (pair? x)
(loop x (cdr x))
(car l)))
(raise-argument-error 'last "(and/c list? (not/c empty?))" l)))


(define (empty? l) (null? l))

null? seems to be a primitive as well. Not sure why they're not direct synonyms in this case.

Jens Axel Søgaard

unread,
Mar 7, 2014, 6:27:53 AM3/7/14
to Ryan Davis, users
The (define (empty? l) (null? l)) is needed due to object-name.

> (object-name null?)
'null?
> (object-name empty?)
'empty?
> (define my-empty? empty?)
> (object-name my-empty?)
'empty?

/Jens Axel

--
--
Jens Axel Søgaard

____________________

Laurent

unread,
Mar 7, 2014, 7:11:46 AM3/7/14
to Jens Axel Søgaard, users
Is this really a need? I really wouldn't bother to have
> (object-name empty?)
'null?

(I have no strong opinion on the subject though, I'm just wondering if there is a pratical different too)

Laurent

Jens Axel Søgaard

unread,
Mar 7, 2014, 7:18:00 AM3/7/14
to Laurent, users
Error messages use the name reported by object-name.

Laurent

unread,
Mar 7, 2014, 7:34:33 AM3/7/14
to Jens Axel Søgaard, users
good point, thanks.

Daniel Carrera

unread,
Mar 7, 2014, 7:40:30 AM3/7/14
to Jens Axel Søgaard, users
Thanks. That's a very useful tip (being able to get at the source code). I am a bit confused by the condition "(and (pair? x) (list? x))". It seems to me that this could just be replaced with "(pair? x)". The "list?" doesn't add anything. Am I wrong?

Also, I don't see exactly how "first" and "car" behave different on a non-list. They both raise an error. The errors are just worded differently.

On the same file, I found the definition of empty?

(define empty? (lambda (l) (null? l)))

Wouldn't it be more economical to write "(define empty? null?)" and allow them to be synonyms?

Cheers,
Daniel.

Daniel Carrera

unread,
Mar 7, 2014, 7:41:29 AM3/7/14
to Jens Axel Søgaard, users
Ah. Thanks. Please disregard the final question on my last email. You just answered it.

Jens Axel Søgaard

unread,
Mar 7, 2014, 7:49:42 AM3/7/14
to Daniel Carrera, users
The value (cons 3 42) is not a list. The function car will extract 3,
but first will fail.

/Jens Axel

Daniel Carrera

unread,
Mar 7, 2014, 8:39:36 AM3/7/14
to Jens Axel Søgaard, users
What is (cons 3 2) ? What is the definition of a list? I thought that a list was defined as either '() or a pair.

Cheers,
Daniel.

Jon Zeppieri

unread,
Mar 7, 2014, 8:43:57 AM3/7/14
to Daniel Carrera, Jens Axel Søgaard, users

Jon Zeppieri

unread,
Mar 7, 2014, 8:46:02 AM3/7/14
to Daniel Carrera, Jens Axel Søgaard, users
Oops, sorry about that empty message. I was going to say that your
definition of a list is close, but it's missing something, A list is
either:

- the empty list; or
- a pair, the second element of which is a list

(cons 3 2) is a pair, and sometimes non-list pairs are called
"improper lists," but they don't satisfy list?.

Daniel Carrera

unread,
Mar 7, 2014, 10:16:56 AM3/7/14
to Jon Zeppieri, Jens Axel Søgaard, users
Ok. That makes sense. A list is either '() or something plus a list.

Thanks.

Cheers,
Daniel.

Eric Dong

unread,
Mar 7, 2014, 11:52:27 AM3/7/14
to Daniel Carrera, Jens Axel Søgaard, users
Forgive me if I am super terribly wrong. Isn't it the case that an improper list is only known to be improper if we walk to the end and find something other than an empty? So wouldn't that mean "first" and "rest" take linear time since they must make sure the argument is a list? Clearly that doesn't happen. What am I missing?

Matthias Felleisen

unread,
Mar 7, 2014, 12:07:32 PM3/7/14
to Eric Dong, Jens Axel Søgaard, users

In a world of immutable cons-es you can cache the result so that testing list? becomes an O(1) operation.

David Van Horn

unread,
Mar 7, 2014, 12:02:16 PM3/7/14
to us...@racket-lang.org, yd2...@uwaterloo.ca
On 3/7/14, 11:52 AM, Eric Dong wrote:
> Forgive me if I am super terribly wrong. Isn't it the case that an improper
> list is only known to be improper if we walk to the end and find something
> other than an empty? So wouldn't that mean "first" and "rest" take linear
> time since they must make sure the argument is a list? Clearly that doesn't
> happen. What am I missing?


That's correct.

What you're missing is that the cost is amortized. It gets cheaper to
check the more times you check it. In the limit, it's a constant time
operation.

Eg:

(let ((ls (make-list 50000000 0)))
(time (list? ls))
(time (list? ls))
(time (list? ls))
(time (list? ls))
(time (list? ls)))

David

Deren Dohoda

unread,
Mar 7, 2014, 12:28:17 PM3/7/14
to Matthias Felleisen, Jens Axel Søgaard, users

Does this mean we shouldn't cdr functional lists but only use rest?

Matthias Felleisen

unread,
Mar 7, 2014, 12:56:34 PM3/7/14
to Deren Dohoda, Jens Axel Søgaard, users

No, why?

Deren Dohoda

unread,
Mar 7, 2014, 1:36:41 PM3/7/14
to Matthias Felleisen, Jens Axel Søgaard, users

Well one because rest is more specific, as above. Call it a style question.  But two because cdr might destroy any information literally. Though I guess cdr couldn't turn a structure already determined to be a list into an improper list. I just wondered if it interfered with any caching.

Daniel Carrera

unread,
Mar 7, 2014, 1:40:05 PM3/7/14
to Deren Dohoda, Jens Axel Søgaard, users, Matthias Felleisen
On the contrary, car and cdr are GUARANTEED to be O(1).

car == return the first element in a pair.
cdr == return the second element in a pair.

The fact that the cdr of a list returns the rest of the list is simply an incidental side-effect of the definition of list.

Cheers,
Daniel.

Matthias Felleisen

unread,
Mar 7, 2014, 1:57:43 PM3/7/14
to Daniel Carrera, users, Jens Axel Søgaard

I prefer first/rest for readability.

When I write (first (second (first x))) though, my old Lisp heart comes back and I switch to (caadar x) or something like that.

Harry Spier

unread,
Mar 7, 2014, 2:22:50 PM3/7/14
to users
See also this previous thread on the Racket list.

Daniel Carrera

unread,
Mar 7, 2014, 2:25:57 PM3/7/14
to Matthias Felleisen, users, Jens Axel Søgaard
I just discovered that SRFI-1 defines first, second, third, ... eight. Except that first = car, second = cadr, and so on.

Cheers,
Daniel.

Daniel Carrera

unread,
Mar 7, 2014, 2:31:40 PM3/7/14
to Harry Spier, users
Thanks.


On 7 March 2014 20:22, Harry Spier <vasisht...@gmail.com> wrote:
See also this previous thread on the Racket list.

____________________
  Racket Users list:
  http://lists.racket-lang.org/users

Sean Kanaley

unread,
Mar 7, 2014, 4:02:30 PM3/7/14
to Daniel Carrera, Harry Spier, users
Pardon me, but going back a bit where the blog about switching to immutable lists was mentioned...I saw an interesting comment about immutability and eq?.  Have any mathematicians come up with an answer for the meaning of eq? on permanently distinct but equivalent and immutable objects, e.g. (eq? (cons 3 2) (cons 3 2))?  What about on immutable objects that have mutable internal state, e.g. memoizing functions?  Naive-but-memoized (fib 1000) is not necessarily the same as (fib 1000), unless the answer doesn't depend on the number of universes that have bigbanged meanwhile.

Eric Dong

unread,
Mar 7, 2014, 4:40:16 PM3/7/14
to Sean Kanaley, Harry Spier, users
So if code is using first/rest instead of car/cdr when I already know the lists are proper, it could incur a significant performance penalty? After all, memoization still needs a table lookup each time I do a first/rest.

(i.e. implementing datastructures in terms of lists should use car/cdr for performance right? any benchmarks?)

Carl Eastlund

unread,
Mar 7, 2014, 11:03:57 PM3/7/14
to Eric Dong, Harry Spier, users
The first/rest operations do not use a memoization table.  They test using the list? primitive, which is built in and actually has a couple of tag bits reserved on every cons object for memoizing its results.  So the operation really is quite fast.

Carl Eastlund

David T. Pierson

unread,
Mar 8, 2014, 12:33:42 PM3/8/14
to us...@racket-lang.org
On Fri, Mar 07, 2014 at 11:03:57PM -0500, Carl Eastlund wrote:
> The first/rest operations do not use a memoization table. They test using
> the list? primitive, which is built in and actually has a couple of tag
> bits reserved on every cons object for memoizing its results. So the
> operation really is quite fast.

I take this to mean there is a bit in the cons object for storing
whether it is a list, and that this bit is set via `list?'.

Why wouldn't the bit be set via `cons'?

Also, when I do:

$ racket
Welcome to Racket v6.0.0.2.
> (let ((ls (make-list 50000000 0)))
(time (list? ls))
(time (list? ls))
(time (list? ls))
(time (list? ls))
(time (list? ls)))

cpu time: 220 real time: 220 gc time: 0
cpu time: 112 real time: 112 gc time: 0
cpu time: 60 real time: 58 gc time: 0
cpu time: 28 real time: 29 gc time: 0
cpu time: 16 real time: 15 gc time: 0
#t

Why does the time decrease gradually, rather than the 2nd and subsequent
times being roughly equivalent?

Thanks.

David

Matthias Felleisen

unread,
Mar 8, 2014, 12:47:22 PM3/8/14
to David T. Pierson, us...@racket-lang.org

You want cons (the frequent action) to take constant time: allocate pointer, set two fields.

Carl Eastlund

unread,
Mar 8, 2014, 12:52:27 PM3/8/14
to Matthias Felleisen, Racket Users, David T. Pierson
If every cons set those bits, it would still take constant time because you'd only have to look one deep.  However, the important part is that currently cons never has to inspect its arguments.  Given that cons may be called frequently in a program that never uses list?, one does not want to pay for inspecting those arguments until needed.

Carl Eastlund

Matthias Felleisen

unread,
Mar 8, 2014, 12:53:06 PM3/8/14
to Carl Eastlund, Racket Users, David T. Pierson

I sit corrected :-) (I had it right at some point)

Matthew Flatt

unread,
Mar 8, 2014, 1:00:28 PM3/8/14
to Matthias Felleisen, Carl Eastlund, Racket Users, David T. Pierson
I've never measured carefully enough to be sure that it's better to put
the cost in `list?` instead of `cons`, but my guess is that the cost is
better in `list?`.

There's an `unsafe-cons-list` function sets the is-a-list bit when
creating a pair. That is, it always sets the bit without a test on the
second argument.

Some function, such as `list`, internally use `unsafe-cons-list`, since
the pairs that the function creates always form a list. In some limited
cases, the compiler effectively replaces `cons` with `unsafe-cons-list`
to set the bit early where no run-time test is needed. Currently,
`make-list` does not use `unsafe-cons-list` and the compiler is not
smart enough to replace its `cons` with `unsafe-cons-list`.

Alexander McLin

unread,
Mar 8, 2014, 1:03:01 PM3/8/14
to David T. Pierson, us...@racket-lang.org
I'm joining in this thread because I'm now wondering about the same questions.

It seems to me that if a list is created via list or make-list then the bits should have been set on the car of the new list, so all the list predicates have to do is check the first element in O(1) time.

Likewise cons can check to see if the second element is a list and correspondingly set bits on the first element.

Thanks to immutable pairs, it seems it should be possible to provide guarantees on list? being always O(1) by using the above technique without needing to cache results and converging to O(1) in the limit.

Matthew Flatt

unread,
Mar 8, 2014, 1:10:20 PM3/8/14
to David T. Pierson, us...@racket-lang.org
At Sat, 08 Mar 2014 12:33:42 -0500, "David T. Pierson" wrote:
> Also, when I do:
>
> $ racket
> Welcome to Racket v6.0.0.2.
> > (let ((ls (make-list 50000000 0)))
> (time (list? ls))
>
> (time (list? ls))
>
> (time (list? ls))
>
> (time (list? ls))
>
> (time (list? ls)))
>
> cpu time: 220 real time: 220 gc time: 0
> cpu time: 112 real time: 112 gc time: 0
> cpu time: 60 real time: 58 gc time: 0
> cpu time: 28 real time: 29 gc time: 0
> cpu time: 16 real time: 15 gc time: 0
> #t
>
> Why does the time decrease gradually, rather than the 2nd and subsequent
> times being roughly equivalent?

When `list?` explores N pairs to determine whether its argument is a
list, it sets a bit once after N/2 pairs. That choice avoids the cost
of setting bits everywhere while providing amortized constant time for
`rest` in something like

(define (len l)
(if (empty? l)
0
(add1 (len (rest l)))))

Setting the bit at N/2 pairs also happens to reuse references that are
already in flight for tortoise--hare cycle detection. (Whether it's
worth having immutable cyclic pairs is yet another question, though.)

Alexander McLin

unread,
Mar 8, 2014, 1:20:28 PM3/8/14
to Matthew Flatt, Carl Eastlund, Racket Users, Matthias Felleisen, David T. Pierson
I see, thanks for the detailed explanation. I understand the issue is whether the cost associated with cons checking arguments prior to creating the pair is acceptable or not but truly if cons is checking for list? by whether a bit is set or not, wouldn't that be negligible compared to the cost of creating the pair in the first place?

Now I'm curious enough to want to try modifying cons on my own and seeing what happens. Do you mind giving me pointers on how to do that? I imagine it'd require changing the source code and compiling racket?

Greg Hendershott

unread,
Mar 8, 2014, 1:35:11 PM3/8/14
to us...@racket-lang.org
As for length:

(for ([i 3]) (collect-garbage))
(let ((ls (time (make-list 50000000 0))))
(time (length ls))
(time (length ls))
(time (length ls))
(void))

It's what I would expect:

cpu time: 6733 real time: 6744 gc time: 6419
cpu time: 141 real time: 141 gc time: 0
cpu time: 145 real time: 145 gc time: 0
cpu time: 142 real time: 142 gc time: 0

- - - - -

Next, thinking about 50000000 elements made me think about vectors,
vector? and vector-length.

This:

(for ([i 3]) (collect-garbage))
(let ((v (time (make-vector 50000000 0))))
(time (vector-length v))
(time (vector-length v))
(time (vector-length v))
(void))

prints:

cpu time: 182 real time: 183 gc time: 1
cpu time: 315 real time: 315 gc time: 314
cpu time: 0 real time: 0 gc time: 0
cpu time: 0 real time: 0 gc time: 0

Huh? The first call to vector-length takes even longer than making
the vector (and, seems to be almost entirely gc).

I would have guessed that make-vector would create a structure with
the length already stored in it, and all vector-length calls would be
essentially 0.

Laurent

unread,
Mar 8, 2014, 1:57:14 PM3/8/14
to Greg Hendershott, users
I get :

> (let ((v (time (make-vector 50000000 0))))
    (time (vector-length v))
    (time (vector-length v))
    (time (vector-length v))
    (void))
cpu time: 188 real time: 191 gc time: 12

cpu time: 0 real time: 0 gc time: 0
cpu time: 0 real time: 0 gc time: 0
cpu time: 0 real time: 0 gc time: 0

Which seems consistent to me.

Are you trying on the command line?

Laurent

David T. Pierson

unread,
Mar 8, 2014, 2:31:37 PM3/8/14
to us...@racket-lang.org
On Sat, Mar 08, 2014 at 11:10:20AM -0700, Matthew Flatt wrote:
> At Sat, 08 Mar 2014 12:33:42 -0500, "David T. Pierson" wrote:
> > cpu time: 220 real time: 220 gc time: 0
> > cpu time: 112 real time: 112 gc time: 0
> > cpu time: 60 real time: 58 gc time: 0
> > cpu time: 28 real time: 29 gc time: 0
> > cpu time: 16 real time: 15 gc time: 0
> > #t
> >
> > Why does the time decrease gradually, rather than the 2nd and subsequent
> > times being roughly equivalent?
>
> When `list?` explores N pairs to determine whether its argument is a
> list, it sets a bit once after N/2 pairs. That choice avoids the cost
> of setting bits everywhere while providing amortized constant time for

Ah, that makes sense. And now that I look closely, I see that the time
was decreasing by about half each time.

Thanks.

David

Greg Hendershott

unread,
Mar 8, 2014, 3:03:07 PM3/8/14
to Laurent, users
Before posting, I had repeated this multiple times, running this code
inside a sandbox.

After seeing your reply, I tried but couldn't repro it at the command line.

I could still repro it in the sandbox. But with a fresh sandbox, now I
can't repro it anymore. I guess some mysterious bit of state was
causing this (but only for the first call to vector-length??).

I know the rule -- only do serious profiling at the command line. In
this case I didn't expect it would matter. I figured "some number vs.
some smaller number" could be due to environment overhead, but not
"some number vs. 0". However I was wrong.

Matthew Flatt

unread,
Mar 9, 2014, 11:55:30 AM3/9/14
to Greg Hendershott, users
The GC in that first call to `vector-length` is really paying for work
related to allocating the vector.

Normally, a call to `vector-length` wouldn't present an opportunity for
the GC to fire. Combine it with `time`, though, and there are some
allocation points to set up timing. So, unfortunately, the
instrumentation to time the call to `vector-length` is what allows a GC
to happen and produce a misleading timing result.

Meanwhile, since you're running `(collect-garbage)` three times before
starting, you might not expect a GC to be needed after allocating one
big vector (that's still live). But the way the GC works, it's
essentially because a big chunk of memory has been allocated since the
last GC that can make the system perform a major GC --- particularly if
anything else at all is happening in the Racket process, such as a
sandbox-related thread.

In this case, putting a separate `(collect-garbage)` before each `time`
would probably help measure what you intended to measure.
Reply all
Reply to author
Forward
0 new messages