If the sequence is always sorted as in the example:
(loop [s the-input-seq o [] last nil]
(if (empty? s)
o
(let [n (first s)]
(if (or (= n (second s)) (= n last))
(recur (rest s) (conj o n) n)
(recur (rest s) o n)))))
It's non-lazy but oughta do the job. :)
(let [x (map #(if (= %1 %2) [%1]) s (rest s))]
(mapcat #(if (nil? %1) %2 %1) x (concat (rest x) [nil])))
does it lazily. :)
Am 10.11.2010 um 21:28 schrieb David Jacobs:
> That is, if I have (1 2 3 4 4 5 5 5 5 5 6 9 12 12), I want to get back
> the sequence (4 4 5 5 5 5 5 12 12).
Low-level solution using lazy-seq. As lazy as possible:
(defn ties
[coll]
(let [step (fn step [seen s]
(lazy-seq
(loop [s (seq s)]
(when-first [fst s]
(if (contains? seen fst)
(cons fst (step seen (rest s)))
(when-let [sn (next s)]
(let [snd (first sn)]
(if (= fst snd)
(cons fst (cons snd (step (conj seen fst)
(rest sn))))
(recur sn)))))))))]
(step #{} coll)))
Sincerely
Meikel
As expected: loop/recur fastest, frequencies in betwee, fully-lazy
slowest. The loop/recur version neither creates temporary data
structures (frequencies generates a map, the fully-lazy implementation
an intermediate seq) nor traverses the input twice (both other
versions do -- the frequencies implementation traverses once to build
the frequencies map and again to weed out singletons and the
fully-lazy one weeds the singletons and the first of each run into an
intermediate seq, then goes over the intermediate seq prepending to
each run one more copy, and for the given input the intermediate seq
is nearly as long as the input).
There are other considerations though. The fully-lazy one uses the
least memory; the fully-lazy one's performance is probably better
relative to the other two when the input is sparse in ties (but
probably still slower than the other two). The frequencies one will
work on an input like '(5 1 2 4 5 5 7) whereas one of the 5s will be
missed by the other two. (Besides frequencies, the only implementation
that occurs to me that will work on arbitrary unsorted input is one
that calls sort on its input first and then uses any of the algorithms
already posted. Then it will have n log n performance, so worse than
any of the others on long enough inputs since everything we already
posted is O(n), and will choke on inputs whose elements aren't all
pairwise Comparable. Sorting by #(.compare (.hashCode %1) (.hashCode
%2)) fixes the latter but not the O(n log n) performance and
introduces a tiny chance of error (if 5 and 4 in the above input had
the same hashcode the sort could produce '(1 2 5 4 5 5 7) for instance
and a 5 still gets lost). Sorting with #(try (.compare %1 %2) (catch
ClassCastException e (.compare (.toString (.getClass %1)) (.toString
(.getClass %2))))) should work, but is still n log n. In fact that
last comparator might be useful as a general-purpose comparator for
heterogeneous colls. It won't consider e.g. 5 (Integer) and 5.0
(Double) as a tie here though, which may or may not be desirable
behavior.)
Actually the fully-lazy one might be best having the [nil] near the
end changed to [(Object.)] as well, to be sure.
Best,
David
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient with your first post.
> To unsubscribe from this group, send email to
> clojure+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en