A nil puzzle

11 views
Skip to first unread message

Mark Engelberg

unread,
Jan 23, 2009, 3:35:47 AM1/23/09
to clo...@googlegroups.com
The power set of a set S is defined as the set of all subsets of S.

Here is one possible implementation of a powerset function. To stay
true to the spirit of the above definition, I've written it in a way
that manipulates the sets in set form as much as possible (not quite
an easy task since things like rest and map return sequences, not
sets).

(defn powerset [s]
(if (empty? s) #{#{}}
(let [element (first s),
rest-s (disj s element),
powerset-rest-s (powerset rest-s)]
(clojure.set/union powerset-rest-s
(into #{} (map #(conj % element) powerset-rest-s))))))

Example: (powerset #{1 2 3}) prints as #{#{} #{1} #{2} #{3} #{1 2} #{1
3} #{2 3} #{1 2 3}}

Now, here's the puzzle. Let's say you want to convert this idea over
to working with lists, or perhaps sequences in general.

Should (powerset '(1 2 3)) print as:
(() (1) (2) (3) (1 2) (1 3) (2 3) (1 2 3))
or
(nil (1) (2) (3) (1 2) (1 3) (2 3) (1 2 3))

I can think of several excellent arguments for and against each
option. I'd like to know whether the Clojure community is as
conflicted on this point as I am. I think that the way you reason
about this question says a lot about how you view the relationship
between lists and sequences and the role of nil and the empty list, so
I look forward to hearing the responses. Let the debating begin!

Konrad Hinsen

unread,
Jan 23, 2009, 4:17:45 AM1/23/09
to clo...@googlegroups.com
On 23.01.2009, at 09:35, Mark Engelberg wrote:

> Now, here's the puzzle. Let's say you want to convert this idea over
> to working with lists, or perhaps sequences in general.
>
> Should (powerset '(1 2 3)) print as:
> (() (1) (2) (3) (1 2) (1 3) (2 3) (1 2 3))
> or
> (nil (1) (2) (3) (1 2) (1 3) (2 3) (1 2 3))

My vote is for (), because it specifically represents the empty
subset of a list, rather than a generic "nothing in here" value,
which is my interpretation of nil.

> between lists and sequences and the role of nil and the empty list, so
> I look forward to hearing the responses. Let the debating begin!

This reminds me of the long debates in the APL community about the
shapes of empty nested arrays :-)

Konrad.

MikeM

unread,
Jan 23, 2009, 6:32:00 AM1/23/09
to Clojure
The power set function for lists should use () as the empty set. If
you use nil, then it seems that you would have

(powerset nil) => (nil)

but this is wrong, since the only subset of an empty set is the empty
set. You could try to fix this by making

(powerset nil) => nil

but then your function would return a list for all cases except when
presented with the empty set, which doesn't seem right.

For sequences in general, Rich has said there is no such thing as an
empty seq:

http://groups.google.com/group/clojure/browse_thread/thread/966bd0d4bb18a4a2/b56470cbc8b4123e?lnk=raot&fwc=1

so I think you couldn't use seqs to represent sets (unless there is a
notion of sets where there is no empty set).

lpetit

unread,
Jan 23, 2009, 7:20:45 AM1/23/09
to Clojure
+1.

#{} is the empty set, () is the empty list, {} is the empty map.

Cheers,

--
Laurent

Stephen C. Gilardi

unread,
Jan 23, 2009, 10:34:03 AM1/23/09
to clo...@googlegroups.com
There was a recent suggestion here:

http://groups.google.com/group/clojure/msg/32d323e15f8ce624

about the proper value of:

(clojure.contrib.lazy-seqs/combinations)

(and perhaps by extension (clojure.contrib.lazy-seqs/combinations '())
(or any number of empty seq arguments))

Does a decision to use empty collections in the powerset case fit well
with the suggestion that

(clojure.contrib.lazy-seqs/combinations) return ([])

or are they in conflict or unrelated?

--Steve

> --~--~---------~--~----~------------~-------~--~----~
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> To unsubscribe from this group, send email to clojure+u...@googlegroups.com
> For more options, visit this group at http://groups.google.com/group/clojure?hl=en
> -~----------~----~----~----~------~----~------~--~---
>

Jason Wolfe

unread,
Jan 23, 2009, 12:32:17 PM1/23/09
to Clojure
See also this thread:

http://groups.google.com/group/clojure/browse_thread/thread/134642cc76de17f7?hl=en#

where I also proposed putting power-set in clojure.contrib. I'm just
waiting to hear back from Rich about a few potential changes to core,
and then I will dump all of my remaining proposals on the contrib
issues page.

As far as the return values, mine returns vectors ... I don't think it
really matters if its lists or vectors or seqs, so long as it's
consistent. (I would actually vote against using sets though, since
with the other options it's slightly more general as it will also work
with multi-sets):

(defn power-set
"Returns a lazy seq of possible subvectors of seq s."
[s]
(loop [s (seq s), sets [[]]]
(if s
(recur (rest s) (lazy-cat sets (map #(conj % (first s))
sets)))
sets)))

user> (power-set '(1 1))
([] [1] [1] [1 1])

-Jason

Jason Wolfe

unread,
Jan 23, 2009, 12:41:10 PM1/23/09
to Clojure
On Jan 23, 7:34 am, "Stephen C. Gilardi" <squee...@mac.com> wrote:
> There was a recent suggestion here:
>
>        http://groups.google.com/group/clojure/msg/32d323e15f8ce624
>
> about the proper value of:
>
>         (clojure.contrib.lazy-seqs/combinations)
>
> (and perhaps by extension (clojure.contrib.lazy-seqs/combinations '())  
> (or any number of empty seq arguments))

No no no :). (combinations '()) should be nil/empty coll, but
(combinations) should be a [[]] (or any coll/seq containing only an
nil/empty coll, I don't care which types).

This can be argued several ways, but the clearest is to think about
the length of the sequence returned by combinations. It should be the
product of the lengths of the input sequences. Thus, if any of the
inputs to combinations are empty lists, (* ? 0 ? ?) ==> 0, the return
value should be nil. But, if the input is *no lists at all*, then we
have (*) ==> 1 and the output should be a list containing the empty
list.


> Does a decision to use empty collections in the powerset case fit well  
> with the suggestion that
>
>         (clojure.contrib.lazy-seqs/combinations) return ([])
>
> or are they in conflict or unrelated?

Well, you can reason in the same way about power-set; if the input has
n elements, the output should have 2^n elements. Thus, (power-set
nil) should return [[]] as well (or, again, any coll/seq containing
only an empty coll/seq, I don't care which type).

Or, you could just look at Wiki: "the power set of the empty set is
the set containing the empty set"

http://en.wikipedia.org/wiki/Power_set

-Jason

e

unread,
Jan 23, 2009, 8:04:33 PM1/23/09
to clo...@googlegroups.com
sented with the empty set, which doesn't seem right.

For sequences in general, Rich has said there is no such thing as an
empty seq:

http://groups.google.com/group/clojure/browse_thread/thread/966bd0d4bb18a4a2/b56470cbc8b4123e?lnk=raot&fwc=1

so the answer is you don't convert the idea over to lists, maybe.




Mark Engelberg

unread,
Jan 24, 2009, 3:43:38 AM1/24/09
to clo...@googlegroups.com
Now that everyone's had a day to mull this puzzle over, here are my thoughts.

One way to think about this, is to think about what the statement of
purpose / contract of a generalized powerset function would be. In
general, most functions on seq-able objects return sequences. For
example, (rest [1 2 3]) yields (2 3). So we'd want to make powerset
consistent with Clojure's general behavior. Also, like most Clojure
core functions, we'd expect sequences in the output to be lazy. I
think there are two basic lines of reasoning:

1. The most straightforward generalization is:
The powerset of a seq-able S is the (lazy) sequence of all (lazy)
subsequences of (seq S).
Thinking in this vein, you might figure that
(powerset [1 2 3]) should be (nil (1) (2) (3) (1 2) (1 3) (2 3) (1 2 3)).
One way to rationalize it is that we really need to have an empty
sequence in the sequence of answers, but there is no such thing as an
empty sequence, so we must use nil.
But the obvious flaw here is that this output violates our contract.
We said we'd generate a sequence of sequences, but (seq? nil) is
false. nil is not the empty sequence, nor is it a sequence at all, so
we haven't accurately captured the function's intention with this
output.

2. So one way to try to "fix" this problem is to put a concrete empty
object in the output sequence. Since sequences are inspired by lists,
perhaps it makes sense to replace the nil with the empty list (). So,
(powerset [1 2 3]) yields (() (1) (2) (3) (1 2) (1 3) (2 3) (1 2 3)).
But this has problems as well. First, even though the result prints
as a list of lists, it is important to remember that these things are
not lists, but sequences derived from vectors (since the input is a
vector in our example). For example, (cons 1 [2 3]) prints as (1 2
3), but (list? (cons 1 [2 3])) is false.
So it's difficult to even imagine what the contract would be for such
a function. Perhaps, "The powerset of a seq-able S is the (lazy)
sequence of the empty list and all the (lazy) subsequences of (seq
S)." would be one way to word the statement of purpose. But if the
input is a vector, it seems really odd to see an empty list in the
output. Why a list?

So I think that the answer is that there is no good answer. We really
need an empty sequence here, and Clojure doesn't have such a notion.

Clojure's choice to not have an empty sequence doesn't really cause
any inconvenience when dealing with flat sequences, but I have found
that it results in these sorts of dilemmas almost any time you deal
with sequences of sequences. In discussions about generating a
sequence of all permutations, all combinations, etc., you inevitably
end up with arguments over issues like what (permutations nil) should
be (some say (nil), some say (()), and some say nil). If you have an
empty sequence, these issues go away.

So ultimately, you just have to make a choice between nil and () and
go with it. It all comes down to a gut feeling about which one is a
better stand-in for the notion of an empty sequence, even though
neither one truly is the empty sequence. In practice, it doesn't seem
to matter too much which choice you make. nil and () are almost
entirely interchangeable.
(cons 1 nil) is the actual concrete list (1), just as (cons 1 '()) is.
(conj nil 1) is the same as (conj '() 1)
(first nil) and (first '()) both yield nil.
(rest nil) and (rest '()) both yield nil.
(empty? nil) and (empty? '()) both yield true
(seq nil) and (seq '()) both yield nil.

There are certainly a few differences, of course. nil and '() respond
differently to nil?, list? and not, for example. But I haven't found
these sorts of tests to be common in code operating on nested
sequences.

Even though it doesn't make a huge difference, the lack of a clearcut
correct choice means that people will make different choices in
different codebases, which could prove problematic.

I have seen Rich's talk in which he explains that he chose to not have
an empty sequence because which empty sequence should it be? Empty
list? Empty vector? Why should one empty object be privileged? But
this reasoning never resonated with me. Sequences and lists have a
special connection. Sequences are intended to be a list-like
interface. So any sequence-producing function produces either a
concrete list, or something that looks, feels, and tastes like a list.
I've already gotten used to seeing (rest [1 2 3]) producing (2 3) in
the REPL (even though technically it's producing something list-like
and not actually producing a list). So to me, it would be perfectly
natural to see (rest [1]) producing (). () could reasonably serve as
both the empty list and the empty sequence since lists and sequences
are so closely related.

I assume that a design decision as fundamental as the choice to have
no empty sequence is probably set in stone at this point, so I think
it's safe to assume that this dilemma is here to stay. At the moment,
I intend to advocate the use of nil over () as the closest surrogate
for an empty sequence. Clojure is designed to not give one empty
collection privileged status, so putting () inside a sequence of
sequences feels more wrong to me. Furthermore, including nil is often
more practical to code. None of Clojure's built-ins actually return
(), so it's only feasible to have () represent the empty sequence if
it's the result of a separately-coded base case. For many algorithms
where the sequence is produced by mapping a function over a bunch of
values, or by 'rest'ing a sequence until reaching empty, the algorithm
will most naturally include nil in the sequence. So for consistency,
it's probably better to always choose nil over () in sequences of
sequences.

Anyone want to try to convince me otherwise? If you think my logic is
flawed, I'd love to hear your take on this problem.

Konrad Hinsen

unread,
Jan 24, 2009, 5:48:52 AM1/24/09
to clo...@googlegroups.com
On 24.01.2009, at 09:43, Mark Engelberg wrote:

> Anyone want to try to convince me otherwise? If you think my logic is
> flawed, I'd love to hear your take on this problem.

It's not a question of logic but of priorities. As you explained very
well, there is no obviously ideal solution. So what you look for is
the "best" compromise.

One question I would ask in such a situation is: what are the typical
use cases for a power set? Which choice is most practical? I can't
answer it because I have never needed power sets, but perhaps you can
give some examples.

Konrad.

e

unread,
Jan 24, 2009, 10:31:00 AM1/24/09
to clo...@googlegroups.com
this may be a silly argument, but:

 (list? '()) returns true

(list? nil) returns false
(list? 'nil) returns false.  I don't even know what it means.

therefore nil isn't a list.

Then again, I already chimed in.  a list isn't a set.  You said so in the introduction.  Maybe set's need there own print representation, like <> ..... uh oh, starting to look like C++

That or find the unicode character for the zero with the line through it.

Rich Hickey

unread,
Jan 24, 2009, 12:06:56 PM1/24/09
to Clojure


On Jan 24, 3:43 am, Mark Engelberg <mark.engelb...@gmail.com> wrote:
> Now that everyone's had a day to mull this puzzle over, here are my thoughts.
>
> One way to think about this, is to think about what the statement of
> purpose / contract of a generalized powerset function would be. In
> general, most functions on seq-able objects return sequences. For
> example, (rest [1 2 3]) yields (2 3). So we'd want to make powerset
> consistent with Clojure's general behavior. Also, like most Clojure
> core functions, we'd expect sequences in the output to be lazy. I
> think there are two basic lines of reasoning:
>
> 1. The most straightforward generalization is:
> The powerset of a seq-able S is the (lazy) sequence of all (lazy)
> subsequences of (seq S).

This is already wrong, right? Don't you mean (set s)? You haven't said
what (powerset [1 2 3 2 1]) is going to be. subsequences != subsets.
That doesn't follow.

Basically it's not hard to make a promise and behavior that match if
you don't keep trying to redefine what sequence means. You've said
you'd generate sequences but haven't defined what they are (in your
mind), then criticize as if they were logical flaws problems with your
formulations that have more to do with terminology mismatches than
logic.

You gotten tied up, here and elsewhere, conflating sequence with
container, and almost all of your problems stem from wanting to say
sequence and have it mean container. If you want to put the contents
of a (possibly empty) set into a sequential container, use a
sequential container - () or []. The sequential? predicate will cover
both of those as well as lazy sequences of set contents.

seq? tests if something implements ISeq, no more, no less - it is not
the defining predicate of sequence. Is that what you want to promise?

Or do you want to promise you'll return something on which (seq x)
works? It works for nil and () and [], and the user wouldn't look for
any one specifically.

Or do you want to promise that it returns the result of seq. It is
this notion that is most commonly meant by sequence in Clojure,
whether you like it or not.

Hmmm...

(defn sequence? [x] (or (nil? x) (seq? x)))

Take your pick, they all could work.

Most important, why promise more than you need to?

What you can't do is say - look, I promised sequence but it's broken
because (seq? nil) -> false.

>
> Clojure's choice to not have an empty sequence doesn't really cause
> any inconvenience when dealing with flat sequences, but I have found
> that it results in these sorts of dilemmas almost any time you deal
> with sequences of sequences. In discussions about generating a
> sequence of all permutations, all combinations, etc., you inevitably
> end up with arguments over issues like what (permutations nil) should
> be (some say (nil), some say (()), and some say nil). If you have an
> empty sequence, these issues go away.
>

Really? Don't you still have a problem with "some say (empty-seq),
some say empty-seq" - why would that be different?

> So ultimately, you just have to make a choice between nil and () and
> go with it. It all comes down to a gut feeling about which one is a
> better stand-in for the notion of an empty sequence, even though
> neither one truly is the empty sequence. In practice, it doesn't seem
> to matter too much which choice you make. nil and () are almost
> entirely interchangeable.
> (cons 1 nil) is the actual concrete list (1), just as (cons 1 '()) is.
> (conj nil 1) is the same as (conj '() 1)
> (first nil) and (first '()) both yield nil.
> (rest nil) and (rest '()) both yield nil.
> (empty? nil) and (empty? '()) both yield true
> (seq nil) and (seq '()) both yield nil.
>

> There are certainly a few differences, of course. nil and '() respond
> differently to nil?, list? and not, for example. But I haven't found
> these sorts of tests to be common in code operating on nested
> sequences.
>
> Even though it doesn't make a huge difference, the lack of a clearcut
> correct choice means that people will make different choices in
> different codebases, which could prove problematic.
>

I don't see how this follows from "In practice, it doesn't seem to
matter" and "I haven't found these sorts of tests to be common". In
practice it doesn't matter, as people just call (seq x) on it and get
on with their lives.

I'm not arguing against a convention here, and am amenable to your
argument about the practicality of nil, which is covered by the
promises "something on which (seq x) works" and "the result of seq".

It is only when code overpromises re: types etc that consumers become
brittle as they marry implementation details, i.e., even if you return
embedded nils, don't promise embedded nils.

Rich

Nathan Kitchen

unread,
Jan 24, 2009, 12:23:53 PM1/24/09
to Clojure


On Jan 24, 7:31 am, e <evier...@gmail.com> wrote:
> Then again, I already chimed in.  a list isn't a set.  You said so in the
> introduction.  Maybe set's need there own print representation, like <>
> ..... uh oh, starting to look like C++

Like this?:

user=> (hash-set 1 2 3)
#{1 2 3}

> On Sat, Jan 24, 2009 at 3:43 AM, Mark Engelberg <mark.engelb...@gmail.com>wrote:
> > Anyone want to try to convince me otherwise?  If you think my logic is
> > flawed, I'd love to hear your take on this problem.

My intuition is that the subsets returned by powerset should have the
same type as the argument:

user=> (powerset #{1 2})
(#{} #{1} #{2} #{1 2})

user=> (powerset [1 2])
([] [1] [2] [1 2])

user=> (powerset '(1 2))
(() (1) (2) (1 2))

For non-set collections, the name "powerset" might be misleading.
Maybe the right name for the concept is something more like
"subcolls".

-- Nathan

Mark Engelberg

unread,
Jan 24, 2009, 5:00:49 PM1/24/09
to clo...@googlegroups.com
On Sat, Jan 24, 2009 at 9:06 AM, Rich Hickey <richh...@gmail.com> wrote:
> This is already wrong, right? Don't you mean (set s)? You haven't said
> what (powerset [1 2 3 2 1]) is going to be. subsequences != subsets.

Despite the name "powerset" being inspired by sets, when I posed the
puzzle of generalizing the behavior to sequences, I meant:
(powerset [1 2 1]) to mean something like (nil (1) (2) (1) (1 2) (1 1)
(2 1) (1 2 1))
Because sets are the only things that care about uniqueness, this is
how I'd expect sequences to behave. I didn't specify this behavior
clearly, and I can see why my open-ended question could cause
confusion. But I don't consider this to be essential to my main
points, though.

> You gotten tied up, here and elsewhere, conflating sequence with
> container, and almost all of your problems stem from wanting to say
> sequence and have it mean container. If you want to put the contents
> of a (possibly empty) set into a sequential container, use a
> sequential container - () or []. The sequential? predicate will cover
> both of those as well as lazy sequences of set contents.

There are a lot of different collections in Clojure (can the words
collection and container be used interchangeably in Clojure?). But
they are all connected in the sense that they all implement a
list-like view of their contents. This is IMO one of the coolest
things about Clojure. Because most of the core Clojure functions use
this list-like view to manipulate collections, as soon as you use one
of these functions, you get back something different. When I apply
cons, lazy-cons, first, rest, map, filter, etc. to something like [1 2
3], #{1 2 3} or '(1 2 3), I get back something that acts like a list,
but isn't necessarily a list. What should I call this general concept
of what these functions return? I've been calling this general
concept a sequence (which seems to be supported by the fact that these
functions are listed in the sequence-in sequence-out section of the
docs). And yes, I also think of these non-specific sequences as
collections, because regardless of how they are implemented, they act
like lists (which are containers). How can I not think of (rest [1 2
3]) or (lazy-cons 2 #{3}) as both a sequence, as well as some sort of
collection of 2 and 3? That's certainly how they act.

So yes, I agree that I'm conflating sequence with container, but I'm
still looking for a better way to think about and classify these
inputs and outputs. I'd certainly like to have a better
classification, because as I've tried to illustrate, when you approach
things with this viewpoint, the lack of an empty sequence feels a bit
bizarre - an odd divergence in the way that sequences mirror lists.
So maybe it all comes down to needing better terminology to discuss
the inputs and outputs of Clojure's sequence functions...

When analyzing any function, we need to understand what its domain
(set of inputs) and range (set of outputs) are. So what are the
domain and range of Clojure's core sequence functions?

The domain is the set of things on which (seq x) works. Is there a
standard term for this (I've been using the term seq-able)? Is there
a built-in predicate to test this important classification?

The range is (I think) a proper subset of the domain. The outputs of
Clojure's sequence functions are all seq-able, but certain things are
missing from the set of outputs. You won't ever get back a vector,
for example, or any empty collection. Is there a standard term for
the range of seq? Is there a built-in predicate to test this
important classification?

> seq? tests if something implements ISeq, no more, no less - it is not
> the defining predicate of sequence. Is that what you want to promise?
>
> Or do you want to promise you'll return something on which (seq x)
> works? It works for nil and () and [], and the user wouldn't look for
> any one specifically.
>
> Or do you want to promise that it returns the result of seq. It is
> this notion that is most commonly meant by sequence in Clojure,
> whether you like it or not.

I think that the promise I want to make is one that is as consistent
as possible with Clojure's core sequence functions.

So: the domain of powerset should be the same as the domain of seq.
The range of powerset should be a homogeneous sequence whose elements
all belong to the range of seq.

> What you can't do is say - look, I promised sequence but it's broken
> because (seq? nil) -> false.

Agreed. But I think it is reasonable to say - hey, this thing should
return a homogeneous sequence, but I can't find a label that
accurately captures the commonality of the elements as well as the
spirit of the problem. All the standard Clojure concepts and
predicates don't quite work, so this is an awkward decision.

But although I didn't see it at the time of my initial post, it seems
now that the crux of my problem is that we lack good labels and
predicates for the domain and range of seq, even though these are
things we use all the time. I was complaining that predicates like
seq?, sequential?, and list? didn't provide reasonable promises, when
in fact, these predicates don't match Clojure's internal promises
either. It just becomes harder to ignore these details when you deal
with nested sequences, and you have to figure out exactly what value
belongs inside.

Right now, a certain amount of handwaving seems fine. But as bigger
projects get written and Clojure, and as people develop automated
tools like QuickCheck to randomly generate test cases, and test
contracts, being able to specify these promises clearly starts to
matter.

I think the concepts I'm looking for would be something like this:
Domain of seq (called seqable).
(defn seqable? [s] (or nil (sequential? s)))
Range of seq (called sequence)
(defn sequence? [s] (or nil (seq? s)))

So, using this terminology, generalized powerset takes a seqable and
returns a sequence of sequences. Does this sound right?

But would people be able to remember the distinction between
similar-sounding seq?, sequential?, seqable?, and sequence? Probably
not.

> I'm not arguing against a convention here, and am amenable to your
> argument about the practicality of nil, which is covered by the
> promises "something on which (seq x) works" and "the result of seq".
>

Yes. Now that I've clarified my thought processes in terms of
matching the domain and range of seq, I'm all the more confident that
(nil (1) (2) (3) (1 2) (1 3) (2 3) (1 2 3)) is the "right choice" for
this particular example, and similar problems.

>> In discussions about generating a
>> sequence of all permutations, all combinations, etc., you inevitably
>> end up with arguments over issues like what (permutations nil) should
>> be (some say (nil), some say (()), and some say nil). If you have an
>> empty sequence, these issues go away.
>>
>
> Really? Don't you still have a problem with "some say (empty-seq),
> some say empty-seq" - why would that be different?

This is getting away from my main point, but since you asked, I'd like
to respond that no, you wouldn't have this problem with an empty
sequence. Here's why:
Standard definition of permutations of lists:
The permutations of a list S is a list of all the arrangements of S.
So, the permutations of the empty list is a list of all the
arrangements of the empty list.
The only arrangement of the empty list is the empty list.
Therefore, the permutations of the empty list is the list of the empty list.

This falls straight out of the definition (and also happens to be the
natural base case for a recursive definition). No ambiguity. But as
soon as you try to define this for Clojure sequences, you run into
problems. If you try to mirror the list-based definition, you'd
expect (permutations nil) to yield (nil) (and that's certainly how I'd
code it). However, it has been said repeatedly that "nil is not the
empty sequence... nil is nothing." So people can reasonably argue
that
(permutations nil) should be nil because it doesn't even make sense to
try to find the arrangements of "nothing". nil has three purposes in
Clojure: it serves as an empty-like base case (e.g., (empty? nil) is
true, and (cons 1 nil) is (1)), it serves as a false value, and it
serves as a Java-like null "nothing" value. This triple purpose
causes quite a bit of ambiguity that an empty sequence just doesn't
have. To answer the question what does (permutations nil) mean, we
first have to figure out which conception of nil applies here.

Rich Hickey

unread,
Jan 24, 2009, 6:46:49 PM1/24/09
to Clojure


On Jan 24, 5:00 pm, Mark Engelberg <mark.engelb...@gmail.com> wrote:
> On Sat, Jan 24, 2009 at 9:06 AM, Rich Hickey <richhic...@gmail.com> wrote:

> > You gotten tied up, here and elsewhere, conflating sequence with
> > container, and almost all of your problems stem from wanting to say
> > sequence and have it mean container. If you want to put the contents
> > of a (possibly empty) set into a sequential container, use a
> > sequential container - () or []. The sequential? predicate will cover
> > both of those as well as lazy sequences of set contents.
>
> There are a lot of different collections in Clojure (can the words
> collection and container be used interchangeably in Clojure?). But
> they are all connected in the sense that they all implement a
> list-like view of their contents.

That's not quite true. Sometimes a seq is made available by Clojure
for things not participating in Clojure, e.g. Strings.

> This is IMO one of the coolest
> things about Clojure. Because most of the core Clojure functions use
> this list-like view to manipulate collections, as soon as you use one
> of these functions, you get back something different. When I apply
> cons, lazy-cons, first, rest, map, filter, etc. to something like [1 2
> 3], #{1 2 3} or '(1 2 3), I get back something that acts like a list,
> but isn't necessarily a list. What should I call this general concept
> of what these functions return? I've been calling this general
> concept a sequence (which seems to be supported by the fact that these
> functions are listed in the sequence-in sequence-out section of the
> docs).

So does Clojure - so let's call them sequences.

> And yes, I also think of these non-specific sequences as
> collections, because regardless of how they are implemented, they act
> like lists (which are containers).

No, they don't, e.g. pop is the 'removal' function for lists:

(pop '(1))
-> ()

(pop (seq [1]))
-> ClassCastException

> How can I not think of (rest [1 2
> 3]) or (lazy-cons 2 #{3}) as both a sequence, as well as some sort of
> collection of 2 and 3? That's certainly how they act.
>

You can by avoiding any preconceived notion of what sequence ought to
mean. As similar as two things might be, they are free to be unalike
in some areas. Similar != same.

> So yes, I agree that I'm conflating sequence with container, but I'm
> still looking for a better way to think about and classify these
> inputs and outputs.

Sequence works fine as long as you don't read more into it.

> I'd certainly like to have a better
> classification, because as I've tried to illustrate, when you approach
> things with this viewpoint, the lack of an empty sequence feels a bit
> bizarre - an odd divergence in the way that sequences mirror lists.
> So maybe it all comes down to needing better terminology to discuss
> the inputs and outputs of Clojure's sequence functions...
>

Sequence works for me. They either return an ISeq or nothing (nil).

> When analyzing any function, we need to understand what its domain
> (set of inputs) and range (set of outputs) are. So what are the
> domain and range of Clojure's core sequence functions?
>
> The domain is the set of things on which (seq x) works. Is there a
> standard term for this (I've been using the term seq-able)?

Just about anything composite is theoretically "seq-able".

> Is there
> a built-in predicate to test this important classification?
>

No, and why should there be? I have never had a need for such a thing.
There was so much pressure for these darn predicates and yet they
really serve little purpose but to allow people to write less general
code than they should. There was a time when Strings weren't
"seqable", now they are. I'd much rather leave it at: the set of
things for which seq works is the set of things for which (seq x)
works. I have yet to need (seqable? x)

> The range is (I think) a proper subset of the domain. The outputs of
> Clojure's sequence functions are all seq-able, but certain things are
> missing from the set of outputs. You won't ever get back a vector,
> for example, or any empty collection. Is there a standard term for
> the range of seq?

Right now there isn't, it is implied by the range of rest - nil/ISeq.
I'm currently working on more generalization of this, but it won't
lead to a sentinel empty seq - sequence fns may just return "things
supporting seq/stream". In some ways that will mean the "sequence"
functions simply take/return the broadest possible notion of
collections, which they process sequentially.

> Is there a built-in predicate to test this
> important classification?
>

No, and why should there be? I'd rather it be the most general
possible notion. Right now it is (or (nil? x) (seq? x)), but as I
said, there is a potential future world in which the range would be
just - "things supporting seq and stream". I don't have a name for
that yet.

> > seq? tests if something implements ISeq, no more, no less - it is not
> > the defining predicate of sequence. Is that what you want to promise?
>
> > Or do you want to promise you'll return something on which (seq x)
> > works? It works for nil and () and [], and the user wouldn't look for
> > any one specifically.
>
> > Or do you want to promise that it returns the result of seq. It is
> > this notion that is most commonly meant by sequence in Clojure,
> > whether you like it or not.
>
> I think that the promise I want to make is one that is as consistent
> as possible with Clojure's core sequence functions.
>

Then just say sequence and use nil - there really is no problem or
ambiguity here.

> So: the domain of powerset should be the same as the domain of seq.
> The range of powerset should be a homogeneous sequence whose elements
> all belong to the range of seq.
>

Why should it be homogeneous? What does it add to promise that?

> > What you can't do is say - look, I promised sequence but it's broken
> > because (seq? nil) -> false.
>
> Agreed. But I think it is reasonable to say - hey, this thing should
> return a homogeneous sequence, but I can't find a label that
> accurately captures the commonality of the elements as well as the
> spirit of the problem. All the standard Clojure concepts and
> predicates don't quite work, so this is an awkward decision.
>

I think sequence works just fine. Your only problem with sequence was
one of not admitting nil as a member.

> But although I didn't see it at the time of my initial post, it seems
> now that the crux of my problem is that we lack good labels and
> predicates for the domain and range of seq, even though these are
> things we use all the time.

I disagree. People use the sequence functions and expect either nil or
something with a first.

> I was complaining that predicates like
> seq?, sequential?, and list? didn't provide reasonable promises, when
> in fact, these predicates don't match Clojure's internal promises
> either. It just becomes harder to ignore these details when you deal
> with nested sequences, and you have to figure out exactly what value
> belongs inside.
>

Predicates are inherently weak, as they try to put the world in boxes,
but the world is not clearly separable. The 'cool thing' you noted
about Clojure before is it really downplays the importance of concrete
types.

> Right now, a certain amount of handwaving seems fine. But as bigger
> projects get written and Clojure, and as people develop automated
> tools like QuickCheck to randomly generate test cases, and test
> contracts, being able to specify these promises clearly starts to
> matter.
>
> I think the concepts I'm looking for would be something like this:
> Domain of seq (called seqable).
> (defn seqable? [s] (or nil (sequential? s)))

This is wrong - maps are not sequential but are "seqable".

> Range of seq (called sequence)
> (defn sequence? [s] (or nil (seq? s)))
>
> So, using this terminology, generalized powerset takes a seqable and
> returns a sequence of sequences. Does this sound right?
>

Go all the way (and future-proof) - takes a seqable and returns a
seqable of seqables, where seqable means: (seq x) works. Note how the
utility is not in any way reduced, but the specificity has evaporated.
You are chasing something whose value is quite limited, at least in
Clojure.

> But would people be able to remember the distinction between
> similar-sounding seq?, sequential?, seqable?, and sequence? Probably
> not.
>

I recommend people avoid these predicates as much as possible. But
what they do is clearly defined. I don't intend to add seqable?, and
sequence?, at least as we've discussed here.

> > I'm not arguing against a convention here, and am amenable to your
> > argument about the practicality of nil, which is covered by the
> > promises "something on which (seq x) works" and "the result of seq".
>
> Yes. Now that I've clarified my thought processes in terms of
> matching the domain and range of seq, I'm all the more confident that
> (nil (1) (2) (3) (1 2) (1 3) (2 3) (1 2 3)) is the "right choice" for
> this particular example, and similar problems.
>

The domain of seq and range of rest. That really is simplest.

> nil has three purposes in
> Clojure: it serves as an empty-like base case (e.g., (empty? nil) is
> true, and (cons 1 nil) is (1)), it serves as a false value, and it
> serves as a Java-like null "nothing" value. This triple purpose
> causes quite a bit of ambiguity that an empty sequence just doesn't
> have.

I don't disagree. nil-punning might die for other reasons. But for
now, best to follow along with seq/rest.

Rich

e

unread,
Jan 24, 2009, 8:20:35 PM1/24/09
to clo...@googlegroups.com


I recommend people avoid these predicates as much as possible. But
what they do is clearly defined. I don't intend to add seqable?, and
sequence?, at least as we've discussed here.


This sounds right to me, too, but I wonder if my reasons are the same.  This isn't Eiffel.  There's no section for defining a precondition.  So I think it's not a matter of having to catch exceptions (I don't even know if you can do that in Clojure) since the predicates don't exist.  The caller just shouldn't send in non-sequables to things that take sequables.  It's not the callee's job to be paranoid and check input.  Ok, to make this work, though, the caller either needs access to the implementation or they need a well documented API . . . . or they need a good variable name to help the callee out.  If there is exception handling, the caller can chose whether or not to use it.
Reply all
Reply to author
Forward
0 new messages