Find all ties in a sorted list

David Jacobs

nieprzeczytany,

10 lis 2010, 15:28:1210.11.2010

do Clojure

I have a sorted list, and I'd like to derive from that a list
containing all "ties" in the original.

That is, if I have (1 2 3 4 4 5 5 5 5 5 6 9 12 12), I want to get back
the sequence (4 4 5 5 5 5 5 12 12).

My first thought was to try to filter down the list, but filter takes
each element of a list in step. One hack might be to pass the original
list through a partition function[0], then filter that list using
(filter #(> (count %) 1)). Seems messy, though.

Is there a standard solution to this problem that I'm not aware of?

Thanks,
David

[0]: http://groups.google.com/group/clojure/browse_thread/thread/f1593f492759e66b/2046629d40a1fdf7

Ken Wesson

nieprzeczytany,

10 lis 2010, 15:34:2010.11.2010

do clo...@googlegroups.com

On Wed, Nov 10, 2010 at 3:28 PM, David Jacobs
<deve...@allthingsprogress.com> wrote:
> I have a sorted list, and I'd like to derive from that a list
> containing all "ties" in the original.
>
> That is, if I have (1 2 3 4 4 5 5 5 5 5 6 9 12 12), I want to get back
> the sequence (4 4 5 5 5 5 5 12 12).
>
> My first thought was to try to filter down the list, but filter takes
> each element of a list in step. One hack might be to pass the original
> list through a partition function[0], then filter that list using
> (filter #(> (count %) 1)). Seems messy, though.
>
> Is there a standard solution to this problem that I'm not aware of?

If the sequence is always sorted as in the example:

(loop [s the-input-seq o [] last nil]
(if (empty? s)
o
(let [n (first s)]
(if (or (= n (second s)) (= n last))
(recur (rest s) (conj o n) n)
(recur (rest s) o n)))))

It's non-lazy but oughta do the job. :)

Benny Tsai

nieprzeczytany,

10 lis 2010, 15:37:3210.11.2010

do Clojure

I don't know about a standard solution, but my take is that you can
use the built-in 'frequencies' function to build up a frequency table
for the list, then filter on anything that occurs more than once:

(defn get-ties [coll]
(let [freqs (frequencies coll)]
(filter #(> (freqs %) 1) coll)))

On Nov 10, 1:28 pm, David Jacobs <develo...@allthingsprogress.com>
wrote:

> [0]:http://groups.google.com/group/clojure/browse_thread/thread/f1593f492...

Ken Wesson

nieprzeczytany,

10 lis 2010, 15:40:3910.11.2010

do clo...@googlegroups.com

And

(let [x (map #(if (= %1 %2) [%1]) s (rest s))]
(mapcat #(if (nil? %1) %2 %1) x (concat (rest x) [nil])))

does it lazily. :)

Meikel Brandmeyer

nieprzeczytany,

10 lis 2010, 15:55:5110.11.2010

do clo...@googlegroups.com

HI,

Am 10.11.2010 um 21:28 schrieb David Jacobs:

> That is, if I have (1 2 3 4 4 5 5 5 5 5 6 9 12 12), I want to get back
> the sequence (4 4 5 5 5 5 5 12 12).

Low-level solution using lazy-seq. As lazy as possible:

(defn ties
[coll]
(let [step (fn step [seen s]
(lazy-seq
(loop [s (seq s)]
(when-first [fst s]
(if (contains? seen fst)
(cons fst (step seen (rest s)))
(when-let [sn (next s)]
(let [snd (first sn)]
(if (= fst snd)
(cons fst (cons snd (step (conj seen fst)
(rest sn))))
(recur sn)))))))))]
(step #{} coll)))

Sincerely
Meikel

Ken Wesson

nieprzeczytany,

10 lis 2010, 15:59:0110.11.2010

do clo...@googlegroups.com

user=> (defn foo [the-input-seq] (loop [s the-input-seq o [] last nil]

(if (empty? s)
o
(let [n (first s)]
(if (or (= n (second s)) (= n last))
(recur (rest s) (conj o n) n)

(recur (rest s) o n))))))
#'user/foo
user=> (defn bar [coll]

(let [freqs (frequencies coll)]
(filter #(> (freqs %) 1) coll)))

#'user/bar
user=> (defn baz [s] (let [x (map #(if (= %1 %2) [%1]) s (rest s))]
(mapcat #(if (nil? %1) %2 %1) x (concat (rest x) [nil]))))
#'user/baz
user=> (time (doseq [x (range 10000)] (doall (foo '(1 2 3 4 4 5 5 5 5
5 6 9 12 12)))))
"Elapsed time: 128.51932 msecs"
nil
user=> (time (doseq [x (range 10000)] (doall (bar '(1 2 3 4 4 5 5 5 5
5 6 9 12 12)))))
"Elapsed time: 199.70168 msecs"
nil
user=> (time (doseq [x (range 10000)] (doall (baz '(1 2 3 4 4 5 5 5 5
5 6 9 12 12)))))
"Elapsed time: 731.5074 msecs"

As expected: loop/recur fastest, frequencies in betwee, fully-lazy
slowest. The loop/recur version neither creates temporary data
structures (frequencies generates a map, the fully-lazy implementation
an intermediate seq) nor traverses the input twice (both other
versions do -- the frequencies implementation traverses once to build
the frequencies map and again to weed out singletons and the
fully-lazy one weeds the singletons and the first of each run into an
intermediate seq, then goes over the intermediate seq prepending to
each run one more copy, and for the given input the intermediate seq
is nearly as long as the input).

There are other considerations though. The fully-lazy one uses the
least memory; the fully-lazy one's performance is probably better
relative to the other two when the input is sparse in ties (but
probably still slower than the other two). The frequencies one will
work on an input like '(5 1 2 4 5 5 7) whereas one of the 5s will be
missed by the other two. (Besides frequencies, the only implementation
that occurs to me that will work on arbitrary unsorted input is one
that calls sort on its input first and then uses any of the algorithms
already posted. Then it will have n log n performance, so worse than
any of the others on long enough inputs since everything we already
posted is O(n), and will choke on inputs whose elements aren't all
pairwise Comparable. Sorting by #(.compare (.hashCode %1) (.hashCode
%2)) fixes the latter but not the O(n log n) performance and
introduces a tiny chance of error (if 5 and 4 in the above input had
the same hashcode the sort could produce '(1 2 5 4 5 5 7) for instance
and a 5 still gets lost). Sorting with #(try (.compare %1 %2) (catch
ClassCastException e (.compare (.toString (.getClass %1)) (.toString
(.getClass %2))))) should work, but is still n log n. In fact that
last comparator might be useful as a general-purpose comparator for
heterogeneous colls. It won't consider e.g. 5 (Integer) and 5.0
(Double) as a tie here though, which may or may not be desirable
behavior.)

David Jacobs

nieprzeczytany,

10 lis 2010, 16:01:0410.11.2010

do Clojure

Wow, thanks for the quick replies, guys :) I can't decide which one I
like best.

Ken Wesson

nieprzeczytany,

10 lis 2010, 16:01:4010.11.2010

do clo...@googlegroups.com

Oh, and another difference: the frequencies implementation handles nil
in the input. The fully-lazy one also does. The loop/recur will screw
up if the input starts with nil. Changing the literal nil in the loop
initial bindings to (Object.) will fix it though.

Ken Wesson

nieprzeczytany,

10 lis 2010, 16:08:2410.11.2010

do clo...@googlegroups.com

Actually the fully-lazy one might be best having the [nil] near the
end changed to [(Object.)] as well, to be sure.

Juha Arpiainen

nieprzeczytany,

10 lis 2010, 16:27:1310.11.2010

do Clojure

On Nov 10, 10:28 pm, David Jacobs <develo...@allthingsprogress.com>
wrote:

> I have a sorted list, and I'd like to derive from that a list
> containing all "ties" in the original.

One more solution assuming input is sorted:

(defn seq-ties [coll]
(mapcat #(cons (first %) %) (keep next (partition-by identity
coll))))

--
Juha Arpiainen

Alan

nieprzeczytany,

10 lis 2010, 17:13:4510.11.2010

do Clojure

Juha's is my favorite by a mile. And if you're content with a list of
[element, count] pairs, you could do this:

(defn seq-ties [coll]
(->> coll (partition-by identity) (filter next) (map (juxt first
count)))

(seq-ties [1 1 1 1 2 3 3 4 5 7 7])
;==>([1 4] [3 2] [7 2])

David Jacobs

nieprzeczytany,

10 lis 2010, 19:12:2810.11.2010

do clo...@googlegroups.com

Thanks for all of the options, and Ken, thanks for the detailed comparison. This will be extremely useful. (I'm trying to port some statistical tests to Clojure for work.)

Best,
David

> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient with your first post.
> To unsubscribe from this group, send email to
> clojure+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en

nicola...@gmail.com

nieprzeczytany,

11 lis 2010, 04:42:3111.11.2010

do clo...@googlegroups.com

A solution based on unfold with the last element read (or initially
nil) and the rest of the list as a seed.

Odpowiedz wszystkim

Odpowiedz autorowi

Przekaż