Here is one possible implementation of a powerset function. To stay
true to the spirit of the above definition, I've written it in a way
that manipulates the sets in set form as much as possible (not quite
an easy task since things like rest and map return sequences, not
sets).
(defn powerset [s]
(if (empty? s) #{#{}}
(let [element (first s),
rest-s (disj s element),
powerset-rest-s (powerset rest-s)]
(clojure.set/union powerset-rest-s
(into #{} (map #(conj % element) powerset-rest-s))))))
Example: (powerset #{1 2 3}) prints as #{#{} #{1} #{2} #{3} #{1 2} #{1
3} #{2 3} #{1 2 3}}
Now, here's the puzzle. Let's say you want to convert this idea over
to working with lists, or perhaps sequences in general.
Should (powerset '(1 2 3)) print as:
(() (1) (2) (3) (1 2) (1 3) (2 3) (1 2 3))
or
(nil (1) (2) (3) (1 2) (1 3) (2 3) (1 2 3))
I can think of several excellent arguments for and against each
option. I'd like to know whether the Clojure community is as
conflicted on this point as I am. I think that the way you reason
about this question says a lot about how you view the relationship
between lists and sequences and the role of nil and the empty list, so
I look forward to hearing the responses. Let the debating begin!
> Now, here's the puzzle. Let's say you want to convert this idea over
> to working with lists, or perhaps sequences in general.
>
> Should (powerset '(1 2 3)) print as:
> (() (1) (2) (3) (1 2) (1 3) (2 3) (1 2 3))
> or
> (nil (1) (2) (3) (1 2) (1 3) (2 3) (1 2 3))
My vote is for (), because it specifically represents the empty
subset of a list, rather than a generic "nothing in here" value,
which is my interpretation of nil.
> between lists and sequences and the role of nil and the empty list, so
> I look forward to hearing the responses. Let the debating begin!
This reminds me of the long debates in the APL community about the
shapes of empty nested arrays :-)
Konrad.
http://groups.google.com/group/clojure/msg/32d323e15f8ce624
about the proper value of:
(clojure.contrib.lazy-seqs/combinations)
(and perhaps by extension (clojure.contrib.lazy-seqs/combinations '())
(or any number of empty seq arguments))
Does a decision to use empty collections in the powerset case fit well
with the suggestion that
(clojure.contrib.lazy-seqs/combinations) return ([])
or are they in conflict or unrelated?
--Steve
> --~--~---------~--~----~------------~-------~--~----~
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> To unsubscribe from this group, send email to clojure+u...@googlegroups.com
> For more options, visit this group at http://groups.google.com/group/clojure?hl=en
> -~----------~----~----~----~------~----~------~--~---
>
sented with the empty set, which doesn't seem right.
For sequences in general, Rich has said there is no such thing as an
empty seq:
http://groups.google.com/group/clojure/browse_thread/thread/966bd0d4bb18a4a2/b56470cbc8b4123e?lnk=raot&fwc=1
One way to think about this, is to think about what the statement of
purpose / contract of a generalized powerset function would be. In
general, most functions on seq-able objects return sequences. For
example, (rest [1 2 3]) yields (2 3). So we'd want to make powerset
consistent with Clojure's general behavior. Also, like most Clojure
core functions, we'd expect sequences in the output to be lazy. I
think there are two basic lines of reasoning:
1. The most straightforward generalization is:
The powerset of a seq-able S is the (lazy) sequence of all (lazy)
subsequences of (seq S).
Thinking in this vein, you might figure that
(powerset [1 2 3]) should be (nil (1) (2) (3) (1 2) (1 3) (2 3) (1 2 3)).
One way to rationalize it is that we really need to have an empty
sequence in the sequence of answers, but there is no such thing as an
empty sequence, so we must use nil.
But the obvious flaw here is that this output violates our contract.
We said we'd generate a sequence of sequences, but (seq? nil) is
false. nil is not the empty sequence, nor is it a sequence at all, so
we haven't accurately captured the function's intention with this
output.
2. So one way to try to "fix" this problem is to put a concrete empty
object in the output sequence. Since sequences are inspired by lists,
perhaps it makes sense to replace the nil with the empty list (). So,
(powerset [1 2 3]) yields (() (1) (2) (3) (1 2) (1 3) (2 3) (1 2 3)).
But this has problems as well. First, even though the result prints
as a list of lists, it is important to remember that these things are
not lists, but sequences derived from vectors (since the input is a
vector in our example). For example, (cons 1 [2 3]) prints as (1 2
3), but (list? (cons 1 [2 3])) is false.
So it's difficult to even imagine what the contract would be for such
a function. Perhaps, "The powerset of a seq-able S is the (lazy)
sequence of the empty list and all the (lazy) subsequences of (seq
S)." would be one way to word the statement of purpose. But if the
input is a vector, it seems really odd to see an empty list in the
output. Why a list?
So I think that the answer is that there is no good answer. We really
need an empty sequence here, and Clojure doesn't have such a notion.
Clojure's choice to not have an empty sequence doesn't really cause
any inconvenience when dealing with flat sequences, but I have found
that it results in these sorts of dilemmas almost any time you deal
with sequences of sequences. In discussions about generating a
sequence of all permutations, all combinations, etc., you inevitably
end up with arguments over issues like what (permutations nil) should
be (some say (nil), some say (()), and some say nil). If you have an
empty sequence, these issues go away.
So ultimately, you just have to make a choice between nil and () and
go with it. It all comes down to a gut feeling about which one is a
better stand-in for the notion of an empty sequence, even though
neither one truly is the empty sequence. In practice, it doesn't seem
to matter too much which choice you make. nil and () are almost
entirely interchangeable.
(cons 1 nil) is the actual concrete list (1), just as (cons 1 '()) is.
(conj nil 1) is the same as (conj '() 1)
(first nil) and (first '()) both yield nil.
(rest nil) and (rest '()) both yield nil.
(empty? nil) and (empty? '()) both yield true
(seq nil) and (seq '()) both yield nil.
There are certainly a few differences, of course. nil and '() respond
differently to nil?, list? and not, for example. But I haven't found
these sorts of tests to be common in code operating on nested
sequences.
Even though it doesn't make a huge difference, the lack of a clearcut
correct choice means that people will make different choices in
different codebases, which could prove problematic.
I have seen Rich's talk in which he explains that he chose to not have
an empty sequence because which empty sequence should it be? Empty
list? Empty vector? Why should one empty object be privileged? But
this reasoning never resonated with me. Sequences and lists have a
special connection. Sequences are intended to be a list-like
interface. So any sequence-producing function produces either a
concrete list, or something that looks, feels, and tastes like a list.
I've already gotten used to seeing (rest [1 2 3]) producing (2 3) in
the REPL (even though technically it's producing something list-like
and not actually producing a list). So to me, it would be perfectly
natural to see (rest [1]) producing (). () could reasonably serve as
both the empty list and the empty sequence since lists and sequences
are so closely related.
I assume that a design decision as fundamental as the choice to have
no empty sequence is probably set in stone at this point, so I think
it's safe to assume that this dilemma is here to stay. At the moment,
I intend to advocate the use of nil over () as the closest surrogate
for an empty sequence. Clojure is designed to not give one empty
collection privileged status, so putting () inside a sequence of
sequences feels more wrong to me. Furthermore, including nil is often
more practical to code. None of Clojure's built-ins actually return
(), so it's only feasible to have () represent the empty sequence if
it's the result of a separately-coded base case. For many algorithms
where the sequence is produced by mapping a function over a bunch of
values, or by 'rest'ing a sequence until reaching empty, the algorithm
will most naturally include nil in the sequence. So for consistency,
it's probably better to always choose nil over () in sequences of
sequences.
Anyone want to try to convince me otherwise? If you think my logic is
flawed, I'd love to hear your take on this problem.
> Anyone want to try to convince me otherwise? If you think my logic is
> flawed, I'd love to hear your take on this problem.
It's not a question of logic but of priorities. As you explained very
well, there is no obviously ideal solution. So what you look for is
the "best" compromise.
One question I would ask in such a situation is: what are the typical
use cases for a power set? Which choice is most practical? I can't
answer it because I have never needed power sets, but perhaps you can
give some examples.
Konrad.
Despite the name "powerset" being inspired by sets, when I posed the
puzzle of generalizing the behavior to sequences, I meant:
(powerset [1 2 1]) to mean something like (nil (1) (2) (1) (1 2) (1 1)
(2 1) (1 2 1))
Because sets are the only things that care about uniqueness, this is
how I'd expect sequences to behave. I didn't specify this behavior
clearly, and I can see why my open-ended question could cause
confusion. But I don't consider this to be essential to my main
points, though.
> You gotten tied up, here and elsewhere, conflating sequence with
> container, and almost all of your problems stem from wanting to say
> sequence and have it mean container. If you want to put the contents
> of a (possibly empty) set into a sequential container, use a
> sequential container - () or []. The sequential? predicate will cover
> both of those as well as lazy sequences of set contents.
There are a lot of different collections in Clojure (can the words
collection and container be used interchangeably in Clojure?). But
they are all connected in the sense that they all implement a
list-like view of their contents. This is IMO one of the coolest
things about Clojure. Because most of the core Clojure functions use
this list-like view to manipulate collections, as soon as you use one
of these functions, you get back something different. When I apply
cons, lazy-cons, first, rest, map, filter, etc. to something like [1 2
3], #{1 2 3} or '(1 2 3), I get back something that acts like a list,
but isn't necessarily a list. What should I call this general concept
of what these functions return? I've been calling this general
concept a sequence (which seems to be supported by the fact that these
functions are listed in the sequence-in sequence-out section of the
docs). And yes, I also think of these non-specific sequences as
collections, because regardless of how they are implemented, they act
like lists (which are containers). How can I not think of (rest [1 2
3]) or (lazy-cons 2 #{3}) as both a sequence, as well as some sort of
collection of 2 and 3? That's certainly how they act.
So yes, I agree that I'm conflating sequence with container, but I'm
still looking for a better way to think about and classify these
inputs and outputs. I'd certainly like to have a better
classification, because as I've tried to illustrate, when you approach
things with this viewpoint, the lack of an empty sequence feels a bit
bizarre - an odd divergence in the way that sequences mirror lists.
So maybe it all comes down to needing better terminology to discuss
the inputs and outputs of Clojure's sequence functions...
When analyzing any function, we need to understand what its domain
(set of inputs) and range (set of outputs) are. So what are the
domain and range of Clojure's core sequence functions?
The domain is the set of things on which (seq x) works. Is there a
standard term for this (I've been using the term seq-able)? Is there
a built-in predicate to test this important classification?
The range is (I think) a proper subset of the domain. The outputs of
Clojure's sequence functions are all seq-able, but certain things are
missing from the set of outputs. You won't ever get back a vector,
for example, or any empty collection. Is there a standard term for
the range of seq? Is there a built-in predicate to test this
important classification?
> seq? tests if something implements ISeq, no more, no less - it is not
> the defining predicate of sequence. Is that what you want to promise?
>
> Or do you want to promise you'll return something on which (seq x)
> works? It works for nil and () and [], and the user wouldn't look for
> any one specifically.
>
> Or do you want to promise that it returns the result of seq. It is
> this notion that is most commonly meant by sequence in Clojure,
> whether you like it or not.
I think that the promise I want to make is one that is as consistent
as possible with Clojure's core sequence functions.
So: the domain of powerset should be the same as the domain of seq.
The range of powerset should be a homogeneous sequence whose elements
all belong to the range of seq.
> What you can't do is say - look, I promised sequence but it's broken
> because (seq? nil) -> false.
Agreed. But I think it is reasonable to say - hey, this thing should
return a homogeneous sequence, but I can't find a label that
accurately captures the commonality of the elements as well as the
spirit of the problem. All the standard Clojure concepts and
predicates don't quite work, so this is an awkward decision.
But although I didn't see it at the time of my initial post, it seems
now that the crux of my problem is that we lack good labels and
predicates for the domain and range of seq, even though these are
things we use all the time. I was complaining that predicates like
seq?, sequential?, and list? didn't provide reasonable promises, when
in fact, these predicates don't match Clojure's internal promises
either. It just becomes harder to ignore these details when you deal
with nested sequences, and you have to figure out exactly what value
belongs inside.
Right now, a certain amount of handwaving seems fine. But as bigger
projects get written and Clojure, and as people develop automated
tools like QuickCheck to randomly generate test cases, and test
contracts, being able to specify these promises clearly starts to
matter.
I think the concepts I'm looking for would be something like this:
Domain of seq (called seqable).
(defn seqable? [s] (or nil (sequential? s)))
Range of seq (called sequence)
(defn sequence? [s] (or nil (seq? s)))
So, using this terminology, generalized powerset takes a seqable and
returns a sequence of sequences. Does this sound right?
But would people be able to remember the distinction between
similar-sounding seq?, sequential?, seqable?, and sequence? Probably
not.
> I'm not arguing against a convention here, and am amenable to your
> argument about the practicality of nil, which is covered by the
> promises "something on which (seq x) works" and "the result of seq".
>
Yes. Now that I've clarified my thought processes in terms of
matching the domain and range of seq, I'm all the more confident that
(nil (1) (2) (3) (1 2) (1 3) (2 3) (1 2 3)) is the "right choice" for
this particular example, and similar problems.
>> In discussions about generating a
>> sequence of all permutations, all combinations, etc., you inevitably
>> end up with arguments over issues like what (permutations nil) should
>> be (some say (nil), some say (()), and some say nil). If you have an
>> empty sequence, these issues go away.
>>
>
> Really? Don't you still have a problem with "some say (empty-seq),
> some say empty-seq" - why would that be different?
This is getting away from my main point, but since you asked, I'd like
to respond that no, you wouldn't have this problem with an empty
sequence. Here's why:
Standard definition of permutations of lists:
The permutations of a list S is a list of all the arrangements of S.
So, the permutations of the empty list is a list of all the
arrangements of the empty list.
The only arrangement of the empty list is the empty list.
Therefore, the permutations of the empty list is the list of the empty list.
This falls straight out of the definition (and also happens to be the
natural base case for a recursive definition). No ambiguity. But as
soon as you try to define this for Clojure sequences, you run into
problems. If you try to mirror the list-based definition, you'd
expect (permutations nil) to yield (nil) (and that's certainly how I'd
code it). However, it has been said repeatedly that "nil is not the
empty sequence... nil is nothing." So people can reasonably argue
that
(permutations nil) should be nil because it doesn't even make sense to
try to find the arrangements of "nothing". nil has three purposes in
Clojure: it serves as an empty-like base case (e.g., (empty? nil) is
true, and (cons 1 nil) is (1)), it serves as a false value, and it
serves as a Java-like null "nothing" value. This triple purpose
causes quite a bit of ambiguity that an empty sequence just doesn't
have. To answer the question what does (permutations nil) mean, we
first have to figure out which conception of nil applies here.
I recommend people avoid these predicates as much as possible. But
what they do is clearly defined. I don't intend to add seqable?, and
sequence?, at least as we've discussed here.