I'd use map-when.
> (defn iterate-while [f x]
> (take-while identity (iterate f x)))
This one too.
It raises a question, though -- how much functionality should a
function provide to be worth making everyone who reads the code learn
the new vocabulary? I've written each of these inline when I've
needed them. Are they better as idioms or functions?
--Chouser
Nick nailed that one.
> The general question it raises for _me_ is this: why is such a basic,
> useful and generally applicable function like 'chunk not included in
> the core? Or 'random-element?
'chunk' is very similar to the built-in 'partition', the difference
being how the last group is handled.
user=> (partition 2 '[a b c d e])
((a b) (c d))
--Chouser
Those are good arguments. I guess what I fear is some of the reaction
I've had to reading Common Lisp code. The difference in feeling
between CL and Clojure is important to me because I tried repeatedly
and failed to adopt CL into my toolbox, but Clojure went quite
smoothly.
One of the things I found difficult with CL was the extremely large
number of builtin functions -- a large vocabulary with subtle
differences between nearly synonymous-sounding words. It meant that
at first glance, a particular block of could would look simple -- 5
lines, how hard could it be? But in reading it I would discover I
only new 2 of the 15 functions used. So I'd fight through it, feeling
like I was learning something useful, but the next function would use
15 completely different functions.
Now perhaps I'm just whining, perhaps I needed better reference tools,
perhaps I'm mis-identifying the problem. But it is a real source of
my hesitancy to ask for more built-in functions.
> For instance, I
> find myself needing to use map-map or merge-reduce fairly frequently,
> and the (reduce (fn [m [k v]] (assoc m k (f v))) {} m) type idioms
> they replace are much more difficult to read than calls to the
> utilities IMO.
Yes, that's a mouthful. I'll need to study merge-reduce and map-map a
bit more to see when to use them instead of merge, merge-with, and
into.
--Chouser
My comments below are of course highly influence by my personal
experiences using Clojure. I'm quite willing to admit that even if
I've never needed a function, it's certainly possible that everyone
else has.
On Mon, Jan 12, 2009 at 6:48 PM, Jason Wolfe <jaw...@berkeley.edu> wrote:
>
> ;;; Sequences.
>
> (defn combinations "Take a seq of seqs and return a lazy seq of
> ordered combinations (pick 1 from each seq)"
> [seqs]
> (if (empty? seqs) '(())
> (forcat [item (first seqs)]
> (map #(cons item %) (combinations (rest seqs))))))
I don't think I've ever needed this. When I've needed similar, I've
used 'for', but being a macro it can be harder to compose and requires
knowing the depth of the nesting ahead of time. I'm not aware of
anything other builtin that provides this kind of nesting behavior, so
I'd tend to say this pays for itself.
Almost identical to clojure.contrib.lazy-seqs/combinations
(require '[clojure.contrib.lazy-seqs :as ls])
user=> (combinations [[1 2] [4 5] [7 8]])
((1 4 7) (1 4 8) (1 5 7) (1 5 8) (2 4 7) (2 4 8) (2 5 7) (2 5 8))
user=> (ls/combinations [1 2] [4 5] [7 8])
([1 4 7] [1 4 8] [1 5 7] [1 5 8] [2 4 7] [2 4 8] [2 5 7] [2 5 8])
> (defn power-set
> "Returns a lazy seq of possible subvectors of seq s."
> [s]
> (loop [s (seq s), sets [[]]]
> (if s
> (recur (rest s) (lazy-cat sets (map #(conj % (first s))
> sets)))
> sets)))
Never needed it, but looks like a good candidate for clojure-contrib
I certainly wouldn't want to create it from scratch if I did need it.
> (defn random-permutation [s]
> "Return a random permutation of this seq."
> (let [arr (to-array s) len (alength arr)]
> (dotimes [i (dec len)]
> (let [r (+ i (rand-int (- len i))),
> prev (aget arr i)]
> (aset arr i (aget arr r))
> (aset arr r prev)))
> (seq arr)))
>
> (defn random-element [s]
> "Return a random element of this seq"
> (nth s (rand-int (count s))))
I wouldn't argue against those.
> (defn maximal-elements [f s]
> "Return a seq of elements of s maximizing (f elt)."
> (when (seq s)
> (loop [max-elts (first s),
> max-val (f (first s)),
> rest-elts (rest s)]
> (if (empty? rest-elts)
> max-elts
> (let [next-val (f (first rest-elts))]
> (cond (< next-val max-val) (recur max-elts max-val (rest rest-
> elts))
> (= next-val max-val) (recur (cons (first rest-elts) max-elts) max-
> val (rest rest-elts))
> (> next-val max-val) (recur [(first rest-elts)] next-val (rest rest-
> elts))))))))
I'm having a hard time imagining when maximal-elements would be
useful. What have you used it for? Looks like a good candidate for
application code. :-)
> (import '(java.util HashSet))
> (defn distinct-elts? "Are all of the elements of this sequence
> distinct? Works on infinite sequences with repititions, making it
> useful for, e.g., detecting cycles in graphs."
> [s]
> (let [hs (HashSet.)]
> (loop [s (seq s)]
> (cond (empty? s) true
> (.contains hs (first s)) false
> :else (do (.add hs (first s)) (recur (rest s)))))))
Is there any reason the builtin 'distinct?' couldn't handle these
cases as well? What does "elts" stand for?
> (defn concat-elts "Lazily concatenate elements of a seq of seqs." [s]
> (when (seq s) (lazy-cat (first s) (concat-elts (rest s)))))
>
> (defn lazy-mapcat "Like mapcat but is lazy like map" [f s]
> (concat-elts (map f s)))
>
> (defn map-when "Like map but discards logically false entries"
> [fn & seqs]
> (filter identity (apply map fn seqs)))
Sufficient laziness is another conversation entirely. :-)
> (defn iterate-while [f x]
> (take-while identity (iterate f x)))
As I said, I've written this out myself. If it was included I would
certainly use it. And as you pointed out, the words in its name mean
exactly what they mean in the body. I'd vote for this.
> (defn chunk "Lazily break s into chunks of length n (or less, for the
> final chunk)."
> [n s]
> (when (seq s)
> (lazy-cons (take n s)
> (chunk n (drop n s)))))
This is so close to 'partition', that I think I'd prefer there be only
one, or at least have related names.
> (defn mevery? "Like every but takes multiple seq args like map."
> ([f & seqs]
> (or (some empty? seqs)
> (and (apply f (map first seqs))
> (recur f (map rest seqs))))))
Is that the same as this?
(defn my-mevery? [f & seqs]
(every? identity (apply map f seqs)))
Someone suggested all the functions that currently take a predicate
and one seq ought to take multiple seqs and work like 'map'. I
suppose this would include: every? any? some etc. Seems pretty
convenient, and I've not heard arguments against it. But without
that, I think I'd rather use 'identity' and 'apply map' than have an
m-version of all the seq functions.
> ;;; Maps
>
> (defn map-map "Like map, but expects f to return pairs/map entries
> that are combined to make a map return value."
> [f & maps] (reduce #(conj %1 %2) {} (apply map f maps)))
As discussed, perhaps (into {} (map f ...)) is good enough.
> (defmacro lazy-get "Like get but lazy about evaluating the default
> value"
> [m k d]
> `(if-let [pair# (find ~m ~k)]
> (second pair#)
> ~d))
I've never needed this. How expensive is your default value?
> (defn safe-get "Like get but throws an exception if not found"
> [m k]
> (lazy-get m k (throw (IllegalArgumentException. (format "Key %s not
> found" k)))))
Ah, I see now. Unless you're storing nils in your map, this is the
same as (or (m k) (throw ...))
I think I'd rather this be a feature of the map itself, rather than
depend on each 'get' to be a 'safe-get'. Perhaps if 'get' finds
nothing and no default value is given, it could check the map's
metadata for a :default-value or :throw-unless-found item.
> (defn merge-agree "Like merge but returns nil if there are
> inconsistent key/value pairs."
> ([] {})
> ([map] map)
> ([m1 m2 & maps]
> (when (every? (fn [[k v]] (= v (get m1 k v))) m2)
> (apply merge-agree (merge m1 m2) maps))))
Maybe if it accepted the = function as a arg (like merge-with) it
would be a little more general?
> (defn merge-reduce "Combines maps, reducing sets of values with same
> key. Assumes nil value = not present. The first map entry must be a
> real map, but the remaining arguments can be seqs of map entries/key-
> value pairs."
> ([f ] {})
> ([f m1 & maps]
> (reduce (fn [m [k v]]
> (if-let [v2 (get m k)]
> (assoc m k (f v2 v))
> (assoc m k v)))
> m1
> (concat-elts maps))))
As discussed, this is like the builtin merge-with.
> (defn prln "Print all arguments and return the first argument"
> [& args] (do (println (apply print-str args)) (first args)))
You use this for debugging?
> (defn xor [& args]
> (odd? (count (filter identity args))))
Never needed it.
> (defmacro forcat
> "Like for, but concatenates the results."
> [& args] `(concat-elts (for ~@args)))
Maybe instead of adding 'forcat' for symmetry with 'mapcat', we should
ask for 'mapcat' to be removed.
--Chouser
Right on the first try! :-)
user=> (seq-chunk 2 [1 2 3 4 5])
((1 2) (3 4) (5))
user=> (seq?-chunk 2 [1 2 3 4 5])
nil
This is because a vector is not itself a seq, though it is seq-able.
Thus 'seq?' returns false, which 'seq' returns a sequence as long as
the vector is not empty.
> 2. My intuition is wrong, and 'seq? is not more efficient than 'seq.
Both are usually very fast -- fast enough to not worry about them, and
certainly fast enough that you should use the correct one rather than
the fast one. :-)
However, for the record, depending on the type of object being
examined, either may be faster. 100,000 times each, fastest to
slowest:
(seq '(1 2 3)))) ==> "Elapsed time: 8.086614 msecs"
(seq? '(1 2 3)))) ==> "Elapsed time: 11.290486 msecs"
(seq? [1 2 3]))) ==> "Elapsed time: 19.127055 msecs"
(seq [1 2 3]))) ==> "Elapsed time: 20.471575 msecs"
--Chouser
If you don't know, you can always just ask the language :)
user> (ancestors (class [1 2 3]))
#{clojure.lang.IPersistentStack clojure.lang.Streamable
java.io.Serializable java.lang.Runnable clojure.lang.IFn
clojure.lang.IObj clojure.lang.Associative clojure.lang.Reversible
clojure.lang.Obj clojure.lang.Sequential java.util.RandomAccess
java.util.List clojure.lang.IPersistentVector
clojure.lang.APersistentVector clojure.lang.AFn java.lang.Object
clojure.lang.IPersistentCollection java.lang.Iterable
java.lang.Comparable java.util.Collection java.util.concurrent.Callable}
user> (contains? (ancestors (class [1 2 3])) clojure.lang.ISeq)
false
But, I agree; the seq/coll distinction can be very confusing at first,
especially when faced with seemingly contradictory outputs like
user> (= '(1) [1])
true
user> (= '() [])
false
user> {'(1) true}
{(1) true}
user> (get *1 [1])
true
user> (hash-map '(1) true)
{(1) true}
user> (get *1 [1])
nil
-Jason
Also there's 'isa?':
user=> (isa? (class [1 2 3]) clojure.lang.ISeq)
false
user=> (isa? (class (seq [1 2 3])) clojure.lang.ISeq)
true
And 'instance?':
user=> (instance? clojure.lang.ISeq [1 2 3])
false
user=> (instance? clojure.lang.ISeq (seq [1 2 3]))
true
> But, I agree; the seq/coll distinction can be very confusing at first,
> especially when faced with seemingly contradictory outputs like
>
> user> (= '(1) [1])
> true
>
> user> (= '() [])
> false
Hm. That does seem rather odd. I wonder if that paper defining
'egal' address this kind of issue. I haven't read it yet -- perhaps
the time has come.
> user> {'(1) true}
> {(1) true}
>
> user> (get *1 [1])
> true
>
> user> (hash-map '(1) true)
> {(1) true}
>
> user> (get *1 [1])
> nil
The different map types use different kinds of equality, and not all
of these are even defined for all object types: Hash-maps use hash
functions, sorted-maps use a comparator, and array-maps use =
--Chouser
Oh, I always assumed this was intentional ... I guess I never tried
switching the order of arguments. Well, that makes a bit more sense
then :).
Good idea!
>> > (defn concat-elts "Lazily concatenate elements of a seq of seqs." [s]
>> > (when (seq s) (lazy-cat (first s) (concat-elts (rest s)))))
>>
>> > (defn lazy-mapcat "Like mapcat but is lazy like map" [f s]
>> > (concat-elts (map f s)))
>
> Yes. I would be happy to drop concat-elts and lazy-mapcat, if the
> built-in concat (which seems to be used by mapcat) was changed to be
> less eager (in the apply sense just mentioned above).
>
> user> (take 0 (apply concat (report-seq "" [[1] [2] [3] [4] [5]])))
> (first [1] )
> (rest after [1] )
> (first [2] )
> (rest after [2] )
> (first [3] )
> (rest after [3] )
> nil
>
> This can end up being a real drag in situations like (apply concat
> (map #(recurive-call ...) ...)), since it may result in an exponential
> amount of extra work being done. Any chance of this being changed?
> (It should be possible to get rid of the last 4 evaluations here,
> without getting into the territory of our last conversation about
> laziness :))
I don't see why the built-in concat couldn't be defined like yours.
>> > (defn chunk "Lazily break s into chunks of length n (or less, for the
>> > final chunk)."
>> > [n s]
>> > (when (seq s)
>> > (lazy-cons (take n s)
>> > (chunk n (drop n s)))))
>>
>> This is so close to 'partition', that I think I'd prefer there be only
>> one, or at least have related names.
>
> OK, I agree the name should be closer to partition (which I'd
> forgotten about), but I think this one is at least as useful as
> partition (You can recover partition from this one by (take-when #(=
> (count %) n) (chunk n s))) but not vice-versa.) Any suggestions on a
> name?
Perhaps 'partition-all' ?
I'd vote to have this in contrib.
>> (defn my-mevery? [f & seqs]
>> (every? identity (apply map f seqs)))
>
> Yes, I believe so ... that's a much nicer way of writing it.
>
>> Someone suggested all the functions that currently take a predicate
>> and one seq ought to take multiple seqs and work like 'map'. I
>> suppose this would include: every? any? some etc. Seems pretty
>> convenient, and I've not heard arguments against it. But without
>> that, I think I'd rather use 'identity' and 'apply map' than have an
>> m-version of all the seq functions.
>
> Yes, I would of course prefer that the built-in versions take multiple
> seqs and work like map.
> If not, I guess I agree with you that it's better not to create m-
> versions of everything.
These could be changed to act like 'my-mevery?' above. Is this a
complete list? --> every? not-every? some not-any?
I believe functions that return the values from the input seq are NOT
candidates for this? That includes:
filter remove take-while drop-while split-with
Instead the first three might suggest corollaries:
map-when map-when-not map-while
>> I think I'd rather this be a feature of the map itself, rather than
>> depend on each 'get' to be a 'safe-get'. Perhaps if 'get' finds
>> nothing and no default value is given, it could check the map's
>> metadata for a :default-value or :throw-unless-found item.
>
> I think I'd be equally happy with "safe maps" as you suggest as using
> safe-get. But, one or the other is definitely useful ... I've been
> bitten several times by misspelled keys, which can create difficult-to-
> find bugs.
I know people have asked previously for maps that know their own
default value. I suppose the metadata could be supplied directly:
#^{:default "not found"} (hash-map :a 1, :b 2)
That's a bit ugly, but you could use it on {} literals. Otherwise I
guess you'd need extra constructors.
(hash-map-default "not found" :a 1, :b 2)
(sorted-map-throw :a 1, :b 2)
(safe-array-map :a 1, :b 2)
Hm. Is it worth it? Any naming ideas?
>> > (defn merge-agree "Like merge but returns nil if there are
>> > inconsistent key/value pairs."
>> > ([] {})
>> > ([map] map)
>> > ([m1 m2 & maps]
>> > (when (every? (fn [[k v]] (= v (get m1 k v))) m2)
>> > (apply merge-agree (merge m1 m2) maps))))
>>
>> Maybe if it accepted the = function as a arg (like merge-with) it
>> would be a little more general?
>
> Well, then which value would you use (if two values are my-equal but
> not =)?
Oh. Good point. I guess I was thinking it would be like
'merge-with', but if the conflict resolution function returned nil or
false, you'd get back a nil instead of a map.
> With merge-with, I think I would have to say
> (apply merge-with + accounts (map (fn [[k v]] (hash-map k v)) (concat
> deposits withdrawls)))
>
> which is considerably uglier and less clear IMO. Any chance of
> changing merge-with in this way?
Yes, it would just require replacing a use of 'key' and 'value' with
'first' and 'second' in the definition of 'merge-with'. I can't think
of a good reason not to do this.
--Chouser
OK, great. I then ask that the core concat be changed to
(defn my-concat [& args]
(when (seq args)
(lazy-cat (first args) (my-concat (rest args)))))
which has more or less the desired effect:
user> (take 0 (apply my-concat (report-seq "" [[1] [2] [3] [4]])))
(first [1] )
(rest after [1] )
nil
Actually, this is still slightly more eager than concat-elts:
user> (take 0 (concat-elts (report-seq "" [[1] [2] [3] [4]])))
(first [1] )
nil
which seems to be due to the definition of apply. Why does the rest
get evaluated above?
>>>> (defn chunk "Lazily break s into chunks of length n (or less, for
>>>> the
>>>> final chunk)."
>>>> [n s]
>>>> (when (seq s)
>>>> (lazy-cons (take n s)
>>>> (chunk n (drop n s)))))
>>>
>>> This is so close to 'partition', that I think I'd prefer there be
>>> only
>>> one, or at least have related names.
>>
>> OK, I agree the name should be closer to partition (which I'd
>> forgotten about), but I think this one is at least as useful as
>> partition (You can recover partition from this one by (take-when #(=
>> (count %) n) (chunk n s))) but not vice-versa.) Any suggestions
>> on a
>> name?
>
> Perhaps 'partition-all' ?
> I'd vote to have this in contrib.
Works for me, thanks.
>>> (defn my-mevery? [f & seqs]
>>> (every? identity (apply map f seqs)))
>>
> These could be changed to act like 'my-mevery?' above. Is this a
> complete list? --> every? not-every? some not-any?
Sounds perfect ... I think that's all of them.
> I believe functions that return the values from the input seq are NOT
> candidates for this? That includes:
> filter remove take-while drop-while split-with
>
> Instead the first three might suggest corollaries:
> map-when map-when-not map-while
Perfect.
>>> I think I'd rather this be a feature of the map itself, rather than
>>> depend on each 'get' to be a 'safe-get'. Perhaps if 'get' finds
>>> nothing and no default value is given, it could check the map's
>>> metadata for a :default-value or :throw-unless-found item.
>>
>> I think I'd be equally happy with "safe maps" as you suggest as using
>> safe-get. But, one or the other is definitely useful ... I've been
>> bitten several times by misspelled keys, which can create difficult-
>> to-
>> find bugs.
>
> I know people have asked previously for maps that know their own
> default value. I suppose the metadata could be supplied directly:
>
> #^{:default "not found"} (hash-map :a 1, :b 2)
>
> That's a bit ugly, but you could use it on {} literals. Otherwise I
> guess you'd need extra constructors.
>
> (hash-map-default "not found" :a 1, :b 2)
> (sorted-map-throw :a 1, :b 2)
> (safe-array-map :a 1, :b 2)
>
> Hm. Is it worth it? Any naming ideas?
Well, I've thought about this a bit more.
I guess I'm a bit wary about your proposal, since as far as I know
none of the existing core functions (except metay-things like ns,
doc, ..) have their behavior changed by (especially, explicitly user-
set) metadata. (Well, to be more precise, I don't know much of
anything about how metadata is used by the core). Right now, I use
metadata in a couple places in my code to store information about
objects that I don't want to affect their equality semantics. If I
could accidentally clobber metadata used internally by Clojure in
doing so, that could be a bad thing. Is there a list somewhere of
"reserved metadata keywords" that I shouldn't be using in this way?
If not, after adding this it seems one would definitely be needed.
On the other hand, part of me still likes safe-get, since the meaning
is clear in the code: I will get the object associated with this key,
or die trying. I may want to write a function that takes an arbitrary
map and safe-gets some specific key, regardless of where the map came
from. Or, I might like to sometimes "safe-get" and sometimes just
"get" from the same map (i.e., if there are some required and some
optional keys). Basically, it seems to me that most of the time,
whether you want get or safe-get is a function of the *use* of the map
and not the map itself.
That being said, I guess there may be times when you really want the
map itself to be safe (e.g., structs-maps?) or have a particular
default value (come to think of it, I think I wanted this just
yesterday). So maybe, both couldn't hurt? The names you give seem
fine, or maybe (hash-map-no-default ...) or (hash-map-safe ...) and
(hash-map-with-default def ...).
>>>> (defn merge-agree "Like merge but returns nil if there are
>>>> inconsistent key/value pairs."
>>>> ([] {})
>>>> ([map] map)
>>>> ([m1 m2 & maps]
>>>> (when (every? (fn [[k v]] (= v (get m1 k v))) m2)
>>>> (apply merge-agree (merge m1 m2) maps))))
>>>
>>> Maybe if it accepted the = function as a arg (like merge-with) it
>>> would be a little more general?
>>
>> Well, then which value would you use (if two values are my-equal but
>> not =)?
>
> Oh. Good point. I guess I was thinking it would be like
> 'merge-with', but if the conflict resolution function returned nil or
> false, you'd get back a nil instead of a map.
Hmmm, but sometimes I like to use nil/false values ... anyway, I'll
agree that this utility has fairly narrow use cases and perhaps isn't
worth putting in contrib.
>> With merge-with, I think I would have to say
>> (apply merge-with + accounts (map (fn [[k v]] (hash-map k v)) (concat
>> deposits withdrawls)))
>>
>> which is considerably uglier and less clear IMO. Any chance of
>> changing merge-with in this way?
>
> Yes, it would just require replacing a use of 'key' and 'value' with
> 'first' and 'second' in the definition of 'merge-with'. I can't think
> of a good reason not to do this.
>
That would be perfect.
-Jason
I'm sorry, Jason, I really thought I answered this the first time you
asked. But I can't find any such answer in the archives, so I must
have been mistaken.
> I can try to make patches for the changes in core, and/or improve
> and document other utilities for clojure.contrib, if that would be
> helpful.
I think it might be most useful to split up the changes into smaller
chunks, perhaps as small as a single function each, or maybe a couple
of related functions together. Each of these chunks can be posted as
a feature-request on the issues page. This would allow any further
discussion of implementation details and any patches to be collected,
and would allow Rich to accept or reject each chunk independently.
--Chouser
No worries, thanks for all your help with this.
>> I can try to make patches for the changes in core, and/or improve
>> and document other utilities for clojure.contrib, if that would be
>> helpful.
>
> I think it might be most useful to split up the changes into smaller
> chunks, perhaps as small as a single function each, or maybe a couple
> of related functions together. Each of these chunks can be posted as
> a feature-request on the issues page. This would allow any further
> discussion of implementation details and any patches to be collected,
> and would allow Rich to accept or reject each chunk independently.
OK, I'll get on this then. It this just for changes to core, or
should I post proposed functions for contrib there/ on the contrib
issues page too? If not, what should I do with them?
Thanks!
Jason
OK, I'll get on this then. It this just for changes to core, or
should I post proposed functions for contrib there/ on the contrib
issues page too? If not, what should I do with them?
I agree.
> My understanding of the issue policy for Clojure is that Rich would still
> like to approve them either here or on #clojure (irc) before they're
> entered. (ref: his recent posting on the topic.) I'm not aware of whether or
> not he has approved entering issue(s) in this case.
That's a very interesting point. My impression has been that lack of
objection from him here (or on IRC) is sufficient approval to post
something on the issues page. If he was completely opposed to these
proposals, he's had several days to make his opinion known.
Of course that's no guarantee that the issue or any particular patch
will be approved, but it makes sure that issues and proposed patches
aren't lost in archives.
Hopefully Rich will clarify his wishes on this.
--Chouser
Thanks Rich. But, I think this answers only one of the questions at
hand (about clojure.contrib issues). The other question (to which I
think Chouser was referring above) was about issues for Clojure core,
and whether or not an explicit sign-off from you was desired before
they are posted there. This came up in this thread since after
discussing with Chouser, several of my utilities seemed better-suited
as changes/additions to Clojure core rather than contrib.
-Jason
I didn't find any of them compelling enough for core just yet. Putting
them in contrib first lets people try them out and refine them, point
out redundancies and conflicts etc.
As a general rule I haven't pulled many simple combining functions
from contrib, as they just pollute the namespaces. Plus, I don't think
things are as cut and dried as you make out, for instance I'd expect
map-when and iterate-while to take predicates.
Rich