I'm pretty sure what he wants is for the intersection to be "as many
1s as are common to both lists; as many 2s as are common to both
lists; as many 3s ...", so [1 2 3 3 3 4 5] intersect [1 1 3 3 4 6]
would be [1 3 3 4], for instance.
For that sort of thing, what you really want is a "bag" datatype (an
unordered collection with duplicates that is stored and searched more
efficiently than a vector or a list), but here's quick-and-dirty
intersect that does what he wants on anything seqable:
(defn intersect [s1 s2]
(let [f1 (frequencies s1)
f2 (frequencies s2)
d1 (apply dissoc f1 (keys f2))
d2 (apply dissoc f2 (keys f1))
f1 (apply dissoc f1 (keys d1))
f2 (apply dissoc f2 (keys d2))]
(mapcat
(fn [[k v]]
(repeat v k))
(merge-with min f1 f2))))
user=> (intersect [1 2 3 3 3 4 5] [1 1 3 3 4 6])
(1 3 3 4)
user=> (apply str
(intersect
"the quick brown fox jumped over the lazy dog"
"it was the best of times, it was the worst of times"))
" abeeeefhhimooorttw"
(defprotocol SetOps
(disjoin* [this obj])
(has* [this obj])
(total [this obj])
(counts [this]))
(defn disjoin
([s] s)
([s obj]
(disjoin* s obj))
([s obj & more]
(apply disjoin (disjoin s obj) more)))
(defn difference
([s] s)
([s other]
(apply disjoin s other))
([s other & more]
(apply difference (difference s other) more)))
(defn intersection
([s] s)
([s other]
(difference s (difference s other)))
([s other & more]
(apply intersection (intersection s other) more)))
(defn union
([s] s)
([s other]
(if (empty? other) s (apply conj s other)))
([s other & more]
(apply union (union s other) more)))
(defn has?
([s] true)
([s obj]
(has* s obj))
([s obj & more]
(and (has? s obj) (apply has? s more))))
(extend-type clojure.lang.IPersistentSet
SetOps
(disjoin* [this obj] (disj this obj))
(has* [this obj] (contains? this obj))
(total [this obj] (if (contains? this obj) 1 0))
(counts [this] (zipmap this (repeat 1))))
(deftype Bag [m c]
clojure.lang.IObj
(withMeta [this, mtd]
(Bag. (with-meta m mtd) c))
(meta [this] (meta m))
clojure.lang.IPersistentCollection
(count [this] c)
(empty [this] (Bag. {} 0))
(equiv [this other] (= (seq this) other))
(seq [this]
(mapcat
(fn [[k v]]
(repeat v k))
m))
(cons [this obj]
(Bag. (merge-with + m {obj 1}) (inc c)))
SetOps
(disjoin* [this obj]
(if-let [n (m obj)]
(if (= 1 n)
(Bag. (dissoc m obj) (dec c))
(Bag. (assoc m obj (dec n)) (dec c)))
this))
(has* [this obj] (contains? m obj))
(total [this obj] (m obj 0))
(counts [this] m)
Object
(toString [this]
(apply str
(interpose " "
(seq this)))))
(defn bag [& objects]
(Bag. (frequencies objects) (count objects)))
user=> (difference (bag 1 6 3 6 8 4 5 6 6 9 9) (bag 9 9 6 8 1))
#<Bag 6 6 6 3 4 5>
The difference, union, disjoin, intersection etc. functions are also
made to work on Clojure sets, and via extend-type or extend-protocol
can be extended to other types. (I didn't make Bag implement
IPersistentSet because I considered nonduplication to be part of the
latter's contract. So SetOps is a broader umbrella: unordered
collection with efficient membership test.)
The has? function is like contains?, but works on bags and sets rather
than on sets and maps and may take more arguments, in which case it
checks for all being contained. Note that (has? (bag 1) 1 1 1) will
return true; to check for containing at least a certain number of
occurrences use (not (empty? (difference bag-1 bag-2))) where bag-2
has the items we want to check for in the appropriate quantities.
Well, except that count and empty are broken for some reason:
user=> (.count (Bag. {} 0))
0
user=> (count (Bag. {} 0))
1
I don't understand what's causing this, but empty bags are always
returning a count of 1 (and false from empty?) although the .count
method correctly returns zero. I've traced the problem as far as the
Java method clojure.lang.RT/count but no farther at this time.
Argh. This same bug causes union to blow up in the case of (union (bag
1) (bag)). It works with (union #{1} #{}), by contrast. It traces back
to empty? wrongly returning true, and thus to count wrongly returning
1 even though .count correctly returns 0.
Simply adding "clojure.lang.Counted", without any methods, to the
protocol list makes it work again. The count function in RT is this
odd little thing:
public static int count(Object o){
if(o instanceof Counted)
return ((Counted) o).count();
return countFrom(Util.ret1(o, o = null));
}
If it's not counted, countFrom is called, and for some reason does not
work on empty Bags even though it apparently works correctly on, among
others, empty LazySeqs:
user=> (instance? clojure.lang.Counted (map + '()))
false
user=> (count (map + '()))
0
Taking the countFrom expression apart, Util.ret1 just seems to return
its first argument. I suppose it's being used to null o before
countFrom does all of its work, to avoid holding onto the head of a
lazy sequence that gets passed in. It's not the source of the problem.
The cause seems to be this in countFrom:
else if(o instanceof IPersistentCollection) {
ISeq s = seq(o);
o = null;
int i = 0;
for(; s != null; s = s.next()) {
if(s instanceof Counted)
return i + s.count();
i++;
}
return i;
}
If "o instanceof IPersistentCollection" then o has a .count method,
yet this doesn't call it. Instead it calls seq, and if seq returns a
non-null, empty seq that loop returns 1 instead of 0. So another fix
that could be made to the original bag would be to call seq on the
return value of mapcat. (Why mapcat is not returning nil for an empty
input seq is itself somewhat mysterious.) I'm not sure whether the
loop above should be considered correct (seq should never return a
non-nil, empty seq) or incorrect (the corner case of seq's return
being empty but not null should be handled). I'm not even sure whether
that loop should exist at all, or should instead call the .count
method of IPersistentCollection (with the default implementation being
the loop above, of course). If the .count method of
IPersistentCollection is not going to be used, on the other hand,
unless the thing is also a Counted, then it should be removed from the
former interface.
Regardless, it looks like (count a-bag) is going to be inefficient
despite the .count method implementation unless the Counted type is
added to the protocol list, and seq should probably be returning nil
for empty bags.
So the version I'd suggest using is:
(deftype Bag [m c]
clojure.lang.IObj
(withMeta [this, mtd]
(Bag. (with-meta m mtd) c))
(meta [this] (meta m))
clojure.lang.IPersistentCollection
(count [this] c)
(empty [this] (Bag. {} 0))
(equiv [this other] (= (seq this) other))
(seq [this]
(seq
(mapcat
(fn [[k v]]
(repeat v k))
m)))
(cons [this obj]
(Bag. (merge-with + m {obj 1}) (inc c)))
SetOps
(disjoin* [this obj]
(if-let [n (m obj)]
(if (= 1 n)
(Bag. (dissoc m obj) (dec c))
(Bag. (assoc m obj (dec n)) (dec c)))
this))
(has* [this obj] (contains? m obj))
(total [this obj] (m obj 0))
(counts [this] m)
clojure.lang.Counted
You're welcome.
I solved the count problem independently, but thanks for the advice
anyway. The presence of the count method in the specification of
IPersistentCollection puzzles me, if it's never called unless an
object also implements Counted (why not ICounted?), which specifies
the same method anyway.
just so we have analogues of both vec and vector here.