How can I improve this?

312 views
Skip to first unread message

Colin Yates

unread,
Jan 10, 2014, 9:59:10 AM1/10/14
to clo...@googlegroups.com
I have a sequence of file names and I want to make them unique.  (uniquify ["a" "b" "c" "a"]) => ["a" "b" "c" "a_1"])

This is what I have come up with, but surely there is a better way?

What would you all do?  Feedback welcome (including the word 'muppet' as I am sure I have missed something simple) :)

(defn uniquify
  "Return a sequence, in the same order as s containing every element
  of s. If s (which is presumed to be a string) occurs more than once
  then every subsequent occurrence will be made unique.

  Items will be updated to include an incrementing numeric count using
  the specified formatter function. The formatter function will be
  given the name and the number and should return a combination of the
  two.

  The set of unique s's in the returned sequence will be the count of
  s's in s."  
  ([s] (uniquify s (fn [item duplicates] (str item "_" duplicates))))
  ([s formatter]
     (let [occurrences (atom {})
           register-occurrence (fn [item]
                                 (if (get @occurrences item)
                                   (swap! (get @occurrences item) inc)
                                   (swap! occurrences assoc item (atom 1)))
                                 @(get @occurrences item))
           process (fn [item]
                     (let [duplicates (dec (register-occurrence item))]
                       (if (> duplicates 0)
                         (formatter item duplicates)
                         item)))
           unique-s (map process s)]
       unique-s)))

Guru Devanla

unread,
Jan 10, 2014, 10:10:22 AM1/10/14
to clo...@googlegroups.com
Hi Colin,

Clojure has a distinct function that does this. I may be missing some context on what you what out of your new method that 'distinct' does not provide. The distinct functions source code is a good reference.

Thanks


--
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Alex Miller

unread,
Jan 10, 2014, 10:12:27 AM1/10/14
to clo...@googlegroups.com
I would not use an atom. Think about it as doing a reduce while passing along a set of the names you've seen so far. You might also look at the implementation of "distinct" in clojure.core which is similar (you want to detect duplicates in the same way, but emit new names instead of omitting dupes).

Colin Yates

unread,
Jan 10, 2014, 10:14:37 AM1/10/14
to clo...@googlegroups.com
The missing context is that distinct removes duplicates, I want duplicates to be made unique by changing them in some way: (uniquify ["a" "b" "c" "a"]) => ["a" "b" "c" "a_1"]) *not* (uniquify ["a" "b" "c" "a"]) => ["a" "b" "c"])

Not sure what else to put that isn't in the original post to be honest...


Date: Fri, 10 Jan 2014 07:10:22 -0800
Subject: Re: How can I improve this?
From: grd...@gmail.com
To: clo...@googlegroups.com
You received this message because you are subscribed to a topic in the Google Groups "Clojure" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/clojure/rt-l_X3gK-I/unsubscribe.
To unsubscribe from this group and all its topics, send an email to clojure+u...@googlegroups.com.

Colin Yates

unread,
Jan 10, 2014, 10:17:15 AM1/10/14
to clo...@googlegroups.com
Good call.  

I keep discounting reduce as I am not 'reducing' anything, only transforming (i.e. map) it - my mistake.

Thanks.

Col

Ray Miller

unread,
Jan 10, 2014, 10:19:03 AM1/10/14
to clo...@googlegroups.com
On 10 January 2014 14:59, Colin Yates <colin...@gmail.com> wrote:
I have a sequence of file names and I want to make them unique.  (uniquify ["a" "b" "c" "a"]) => ["a" "b" "c" "a_1"])

This is what I have come up with, but surely there is a better way?


I would do something like:

 (defn uniquify
  ([xs]
     (uniquify xs (fn [x n] (str x "_" n))))
  ([xs formatter]
     (letfn [(uniquify* [xs seen]
               (lazy-seq
                (when (not-empty xs)
                  (let [x (first xs)
                        n (seen x 0)]
                    (if (> n 0)
                      (cons (formatter x n) (uniquify* (rest xs) (assoc seen x (inc n))))
                      (cons x (uniquify* (rest xs) (assoc seen x (inc n)))))))))]
       (uniquify* xs {}))))

Laurent PETIT

unread,
Jan 10, 2014, 10:22:29 AM1/10/14
to clo...@googlegroups.com
Hi,

Use frequencies to get a map of path => nb of occurrences, then for each entry of the map, create unique names.
Cannot provide an impl on the uPhine, sorry
--

Colin Yates

unread,
Jan 10, 2014, 10:23:11 AM1/10/14
to clo...@googlegroups.com
Love it.  Much more readable without any nasty persistent state.

Colin Yates

unread,
Jan 10, 2014, 10:24:11 AM1/10/14
to clo...@googlegroups.com
I did consider that but I want to preserve the order of the incoming sequence.

For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscribe@googlegroups.com.

Guru Devanla

unread,
Jan 10, 2014, 10:24:59 AM1/10/14
to clo...@googlegroups.com
Ok. My bad. Should not be reading and responding to such posts from the phone. Somehow, I overlooked that part of the mail. I think as Alex stated, a simple way of holding on to seen objects in the set and reducing the list should be good.

Laurent PETIT

unread,
Jan 10, 2014, 10:28:10 AM1/10/14
to clo...@googlegroups.com
Then call frequencies for the nbr of occurrences per entry, then reduce over the initial sequence, including the map in the accumulator, decrement info the nbr every time you find an occurrence ?

--
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com

For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.

Jim - FooBar();

unread,
Jan 10, 2014, 10:28:17 AM1/10/14
to clo...@googlegroups.com
I quickly put together this which seems to preserver the orderof the original seq:

(defn uniquify [coll]
 (let [post-fn #(group-by first (-> % meta :encountered))]
  (loop [unique (with-meta [] {:encountered []})
        [f & more] coll]
   (if (nil? f) (flatten (concat unique (reduce #(conj % (map str (second %2) (range))) [] (post-fn unique)))) 
     (recur (if-not (some #{f} unique) (conj unique f) (vary-meta unique update-in [:encountered] conj f)) more)))))


HTH,
Jim

For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.

Jim - FooBar();

unread,
Jan 10, 2014, 10:34:32 AM1/10/14
to clo...@googlegroups.com
actually `post-fn` should be #(group-by identity (-> % meta :encountered))

Jim

Jim - FooBar();

unread,
Jan 10, 2014, 10:40:28 AM1/10/14
to clo...@googlegroups.com
oops! my fn will not keep the original ordering...sorry Colin

Jim

Stefan Kanev

unread,
Jan 10, 2014, 11:20:25 AM1/10/14
to clo...@googlegroups.com
I came up with the following version:

(defn uniquify [words]
(loop [encountered {}
result []
remaining words]
(if (seq remaining)
(let [word (first remaining)
occurences (get encountered word)
modified (if occurences
(str word "_" occurences)
word)]
(recur (update-in encountered [word] (fnil inc 0))
(conj result modified)
(rest remaining)))
result)))

It is a bit Scheme-ish. It builds a map of number of occurences as it
builds a vector, containing the result. It uses the map to figure out
whether to add a suffix or not. It preserves the original order of the
names. The variable names could use some love, but I don't have the
time for it now.

If you want a lazy version, some modification is needed.
--
Stefan Kanev � @skanev � http://skanev.com/
Giving up on assembly language was the apple in our Garden of Eden: Languages
whose use squanders machine cycles are sinful. The LISP machine now permits
LISP programmers to abandon bra and fig-leaf.

Laurent PETIT

unread,
Jan 10, 2014, 11:43:34 AM1/10/14
to clo...@googlegroups.com



2014/1/10 Stefan Kanev <stefan...@gmail.com>
Note that the lazy version would preserve doing too much computation ahead of time, but would not be any more space efficient than a non lazy version, even if the consumer of the lazy sequence does not retain the head. That is because at the same time that the remaining seq is consumed, the encountered map is updated. In the worst case scenario where all coll items are different, the map when then last lazy seq item is produced will be of the same size as the initial sequence.

 
--
Stefan Kanev Åš @skanev Åš http://skanev.com/

Giving up on assembly language was the apple in our Garden of Eden: Languages
whose use squanders machine cycles are sinful.  The LISP machine now permits
LISP programmers to abandon bra and fig-leaf.

Jonas

unread,
Jan 10, 2014, 11:45:04 AM1/10/14
to clo...@googlegroups.com
Here's a version using reduce:

  (defn uniquify [items]
    (first
      (reduce (fn [[result count-map] item]
                (let [n (inc (count-map item 0))]
                  [(conj result (str item "_" n))
                   (assoc count-map item n)]))
              [[] {}]
              items)))

Replace (str item "_" n) with (if (zero? n) item (str item "_" n) if you don't want to append "_0" the first time you encounter a new string

Stefan Kanev

unread,
Jan 10, 2014, 12:11:39 PM1/10/14
to clo...@googlegroups.com
Somehow I totally forgot I could use destructuring. Here's a slightly
shorter version:

(defn uniquify [words]
(loop [encountered {}
result []
[word & remaining] words]
(if (seq remaining)
(let [occurences (get encountered word)
modified (if occurences
(str word "_" occurences)
word)]
(recur (update-in encountered [word] (fnil inc 0))
(conj result modified)
remaining))
result)))

--
Stefan Kanev ¦ @skanev ¦ http://skanev.com/
You can measure a programmer's perspective by noting his attitude on the
continuing vitality of FORTRAN.

Laurent PETIT

unread,
Jan 10, 2014, 12:20:22 PM1/10/14
to clo...@googlegroups.com
no you have a bug in this last version, it now skips the last result


2014/1/10 Stefan Kanev <stefan...@gmail.com>

Laurent PETIT

unread,
Jan 10, 2014, 12:29:20 PM1/10/14
to clo...@googlegroups.com
What about this one?

Inspired by Stefan's, with more destructuring in loop, format-fn as a function, initial call to (seq) then (next) instead of (rest), placing the exit argument first so that it's not lost at the end of the function, renamed word as item since this function does not depend on the type of items.

(defn uniquify [in format-fn]
  (loop [[item :as in] (seq in)
         {n item :as item-nbrs} {}
         out []]
    (if-not in
      out
      (let [format-fn (if n format-fn (constantly item))]
        (recur (next in)
               (update-in item-nbrs [item] (fnil inc 0))
               (conj out (format-fn item n)))))))



2014/1/10 Laurent PETIT <lauren...@gmail.com>

Mark Engelberg

unread,
Jan 10, 2014, 1:33:29 PM1/10/14
to clojure
If all you need is unqiueness, why not just number all the filenames in sequential order, something like:

(defn uniqueify [filenames]
  (map (fn [filename number] (str filename \_ number)) filenames (iterate inc 1)))


Stefan Kanev

unread,
Jan 10, 2014, 1:34:23 PM1/10/14
to clo...@googlegroups.com
On 10/01/14, Laurent PETIT wrote:
> What about this one?
>
> Inspired by Stefan's, with more destructuring in loop, format-fn as a
> function, initial call to (seq) then (next) instead of (rest), placing the
> exit argument first so that it's not lost at the end of the function,
> renamed word as item since this function does not depend on the type of
> items.
>
> (defn uniquify [in format-fn]
> (loop [[item :as in] (seq in)
> {n item :as item-nbrs} {}
> out []]
> (if-not in
> out
> (let [format-fn (if n format-fn (constantly item))]
> (recur (next in)
> (update-in item-nbrs [item] (fnil inc 0))
> (conj out (format-fn item n)))))))

Yeah, that's better. I like using `if-not` instead of `if` - it puts
the base case first, which I find preferable. An adequate format
function can be given by defining a single-parameter version of
`uniquify`.
--
Stefan Kanev ¦ @skanev ¦ http://skanev.com/
Think of all the psychic energy expended in seeking a fundamental distinction
between "algorithm" and "program".

Sean Corfield

unread,
Jan 10, 2014, 2:03:23 PM1/10/14
to clo...@googlegroups.com
signature.asc

Guru Devanla

unread,
Jan 10, 2014, 2:15:10 PM1/10/14
to clo...@googlegroups.com
But, for the given problem this may not be directly helpful. This method only returns the next unique name for the given list. The logic would have to traverse the list n times for n elements (and some factor of no of duplicates) to get the desired result, correct?

Thanks
Guru 

Colin Yates

unread,
Jan 10, 2014, 2:15:55 PM1/10/14
to clo...@googlegroups.com
This and Jonas' are my current favourites, for what that's worth.

Keep the suggestions coming! 

Guru Devanla

unread,
Jan 10, 2014, 2:16:30 PM1/10/14
to clo...@googlegroups.com
Actually, you might have meant line 267?


On Fri, Jan 10, 2014 at 11:03 AM, Sean Corfield <se...@corfield.org> wrote:

Mark Engelberg

unread,
Jan 10, 2014, 2:39:45 PM1/10/14
to clojure
Technically, all these solutions are flawed.

With the input
["a" "a" "a_1"]
you'll get back
["a" "a_1" "a_1"]

To truly address this, you need to also add the newly formatted filename into the "seen" map, which none of the suggested solutions do.

Jonas Enlund

unread,
Jan 10, 2014, 2:43:40 PM1/10/14
to Clojure
That's why I wrote my solution like I did, i.e., concatenate "_1" when a new string is found. This would result in the vector ["a_1" "a_2" "a_1_1"]
 

--
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to a topic in the Google Groups "Clojure" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/clojure/rt-l_X3gK-I/unsubscribe.
To unsubscribe from this group and all its topics, send an email to clojure+u...@googlegroups.com.

Mark Engelberg

unread,
Jan 10, 2014, 2:49:38 PM1/10/14
to clojure
On Fri, Jan 10, 2014 at 11:43 AM, Jonas Enlund <jonas....@gmail.com> wrote:
That's why I wrote my solution like I did, i.e., concatenate "_1" when a new string is found. This would result in the vector ["a_1" "a_2" "a_1_1"]

Right, I agree that works, as does my "tack unique numbers onto the end of everything" solution.  That doesn't appear to be what the OP wants, though, so I'm just saying that if he really insists on having "naked names" as much as possible, then it's important to also compare the names not only against previous filenames, but against the modified filenames that have come before.

Colin Yates

unread,
Jan 10, 2014, 2:52:01 PM1/10/14
to clo...@googlegroups.com
way to take the wind out of our sails!  Well spotted :).

Cedric Greevey

unread,
Jan 10, 2014, 3:06:32 PM1/10/14
to clo...@googlegroups.com
On Fri, Jan 10, 2014 at 10:22 AM, Laurent PETIT <lauren...@gmail.com> wrote:
Hi,

Use frequencies to get a map of path => nb of occurrences, then for each entry of the map, create unique names.
Cannot provide an impl on the uPhine, sorry

"uPhine"? :)

Mark Engelberg

unread,
Jan 10, 2014, 3:16:28 PM1/10/14
to clojure

On Fri, Jan 10, 2014 at 11:52 AM, Colin Yates <colin...@gmail.com> wrote:
> way to take the wind out of our sails!  Well spotted :).


It's not too hard to fix.   Here's an adapted version of Jonas' solution that should do the trick:

(defn uniqueify [items]
  (first
    (reduce (fn [[results count-map] item]
              (let [n (count-map item 0)]
                (if (zero? n)                   
                  [(conj results item) (assoc count-map item (inc n))]
                  (recur [results (assoc count-map item (inc n))]
                         (str item \_ n)))))           
            [[] {}]
            items)))

=> (uniqueify ["a" "a" "a" "a" "b" "a_2" "a_3" "a_3_1" "a_3_1" "a"])
["a" "a_1" "a_2" "a_3" "b" "a_2_1" "a_3_1" "a_3_1_1" "a_3_1_2" "a_4"]

Colin Yates

unread,
Jan 10, 2014, 3:55:04 PM1/10/14
to clo...@googlegroups.com
I thought I would have a go myself without copying (although having read them earlier) the other functions and this is what I came up with:

(first (reduce (fn [[results seen] item]
                      (let [occurrences ((fnil identity 0) (get seen item))
                            seen (assoc seen item (inc occurrences))
                            item (if (> occurrences 0) (format-fn item occurrences) item)]
                        [(conj results item) seen])) [[] {}] (seq items)))

This doesn't solve the problem you mention, but baby steps.

Being really anal I could claim the original a_2 should remain a_2 and the third instance of a jump to being a_3.  I thought about this and couldn't see how to do this with reduce because I really want to say "oh, I have a new name, recurse into the function again with the new proposed name", i.e. loop the generation of the proposed name until it is unique, but I haven't got that far yet (without potentially blowing the stack!)

Then I saw your 'recur' used outside of a loop which points the way...

Thanks!

Colin Yates

unread,
Jan 10, 2014, 3:59:00 PM1/10/14
to clo...@googlegroups.com
Sorry - wrong c/p:

(first (reduce (fn [[results seen] item]
                      (let [cnt (get seen item 0)
                            item (if (> cnt 0) (format-fn item cnt) item)]
                        [(conj results item) (assoc seen item (inc cnt))]))
                    [[] {}]
                    items))

Mark Engelberg

unread,
Jan 10, 2014, 4:04:19 PM1/10/14
to clojure
On Fri, Jan 10, 2014 at 12:55 PM, Colin Yates <colin...@gmail.com> wrote:
Being really anal I could claim the original a_2 should remain a_2 and the third instance of a jump to being a_3.
 
 Sure, but that would require two passes.  Otherwise, there's no way when you encounter the third "a" to know there's an "a_2" somewhere later in the stream.

Colin Yates

unread,
Jan 10, 2014, 4:06:31 PM1/10/14
to clo...@googlegroups.com
Gosh - my public humiliation continues.  Here is one that actually works:

(first (reduce (fn [[results seen] item]
                      (let [cnt (get seen item 0)]
                        [(conj results (if (> cnt 0) (format-fn item cnt) item))
                         (assoc seen item (inc cnt))]))
                    [[] {}]
                    items))
(fact "strings can be made unique"
  (s/uniquify ["a" "b" "c"]) => ["a" "b" "c"]
  (s/uniquify ["a" "b" "a" "c" "b" "b" "b" "b" "a"]) => ["a" "b" "a_1" "c" "b_1" "b_2" "b_3" "b_4" "a_2"])

Laurent PETIT

unread,
Jan 10, 2014, 4:34:28 PM1/10/14
to clo...@googlegroups.com
okay, new take solving the issue raised by Mark:

(defn uniquify [in format-fn]
  (loop [[item :as in] (seq in)
         {n item :as item-nbrs} {}
         out []]
    (if-not in
      out
      (let [format-fn (if n format-fn (constantly item))
            new-item (format-fn item n)]
        (recur (next in)
               (merge-with (fnil + 0)
                           item-nbrs
                           (hash-map item 1 new-item 1))
               (conj out new-item))))))

=> (uniquify ["a" "b" "c" "a" "a_1" "a_1" "a"] #(str %1 "_" %2))
["a" "b" "c" "a_1" "a_1_1" "a_1_2" "a_2"]



2014/1/10 Colin Yates <colin...@gmail.com>

--
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.

Mark Engelberg

unread,
Jan 10, 2014, 4:55:16 PM1/10/14
to clojure
Laurent, your approach doesn't quite work:
=> (uniquify ["a_1" "a" "a"] (fn [s n] (str s \_ n)))
["a_1" "a" "a_1"]

Laurent PETIT

unread,
Jan 10, 2014, 5:08:03 PM1/10/14
to clo...@googlegroups.com
Indeed, I should definitely recur as you do

Ralf Schmitt

unread,
Jan 10, 2014, 6:03:22 PM1/10/14
to clo...@googlegroups.com
Colin Yates <colin...@gmail.com> writes:

> This and Jonas' are my current favourites, for what that's worth.
>
> Keep the suggestions coming!

here's another one that uses reductions. formatter would need to be
adapted a bit, since it currently also adds "_0" for the first name
seen:


(defn count-name
[name->count name]
(assoc name->count name (inc (get name->count name 0))))

(defn uniquify
([s] (uniquify s (fn [item duplicates] (str item "_" duplicates))))
([s formatter]
(map (fn [name name->count]
(formatter name (get name->count name 0)))
s
(reductions count-name {} s))))

Sean Corfield

unread,
Jan 10, 2014, 7:20:45 PM1/10/14
to clo...@googlegroups.com
I meant the code that starts at line 257 (and continues to line 274): two functions, the second one calls the first one.

Luckily, java.jdbc's code seems to pass all the test cases posted to this thread so far (arguably more intuitively, the second occurrence gets "_2" appended, the third "_3" etc):


Sean
signature.asc

Alan Forrester

unread,
Jan 10, 2014, 8:03:39 PM1/10/14
to clo...@googlegroups.com
On 10 January 2014 21:06, Colin Yates <colin...@gmail.com> wrote:
> Gosh - my public humiliation continues. Here is one that actually works:
>
> (first (reduce (fn [[results seen] item]
> (let [cnt (get seen item 0)]
> [(conj results (if (> cnt 0) (format-fn item cnt)
> item))
> (assoc seen item (inc cnt))]))
> [[] {}]
> items))
> (fact "strings can be made unique"
> (s/uniquify ["a" "b" "c"]) => ["a" "b" "c"]
> (s/uniquify ["a" "b" "a" "c" "b" "b" "b" "b" "a"]) => ["a" "b" "a_1" "c"
> "b_1" "b_2" "b_3" "b_4" "a_2"])

My first two suggestions produce unique titles. They don't do what you
want but they are brief:
(defn uniquify [a] (map #(str (gensym %)) a))
(defn uniquify [a] (map-indexed (fn [ind el] (str el "_" ind)) a))

My last suggestion does what you want
(defn uniquify [a]
(loop [res [(first a)] leftover (rest a)]
(if-not (empty? leftover)
(let [freqy (frequencies res)
fl (first leftover)
ffl (freqy fl)]
(if ffl
(recur (concat res [(str fl "_" ffl)]) (rest leftover))
(recur (concat res [fl]) (rest leftover))))
res)))

Alan

Alan Forrester

unread,
Jan 10, 2014, 8:14:34 PM1/10/14
to clo...@googlegroups.com
On 11 January 2014 01:03, Alan Forrester
Actually, it doesn't work.

Alan

Alan Forrester

unread,
Jan 10, 2014, 8:30:36 PM1/10/14
to clo...@googlegroups.com
On 11 January 2014 01:14, Alan Forrester
This version works:

(defn uniquify [a]
(loop [res [(first a)] intermed [(first a)] leftover (rest a)]
(if-not (empty? leftover)
(let [freqy (frequencies intermed)
fl (first leftover)
ffl (freqy fl)]
(if ffl
(recur (concat res [(str fl "_" ffl)]) (concat intermed
[fl]) (rest leftover))
(recur (concat res [fl]) (concat intermed [fl]) (rest leftover))))
res)))

Alan

HÃ¥kan RÃ¥berg

unread,
Jan 10, 2014, 11:10:33 PM1/10/14
to clo...@googlegroups.com
Another style, using channels for local state, but could been plain old iterators, slight golf warning:

(require '[clojure.core.async :refer [to-chan <!!]])

(defn uniquify [s formatter]
  (let [g (memoize #(to-chan (cons % (map (partial formatter %) (next (range))))))]
    (map (fn f [x] ((some-fn #{x} f) (<!! (g x)))) s)))

(uniquify ["a" "a" "a" "a" "b" "a_2" "a_3" "a_3_1" "a_3_1" "a"] #(str %1 "_" %2))
;=> ["a" "a_1" "a_2" "a_3" "b" "a_2_1" "a_3_1" "a_3_1_1" "a_3_1_2" "a_4"]


On Friday, 10 January 2014 14:59:10 UTC, Colin Yates wrote:
I have a sequence of file names and I want to make them unique.  (uniquify ["a" "b" "c" "a"]) => ["a" "b" "c" "a_1"])

This is what I have come up with, but surely there is a better way?

What would you all do?  Feedback welcome (including the word 'muppet' as I am sure I have missed something simple) :)

(defn uniquify
  "Return a sequence, in the same order as s containing every element
  of s. If s (which is presumed to be a string) occurs more than once
  then every subsequent occurrence will be made unique.

  Items will be updated to include an incrementing numeric count using
  the specified formatter function. The formatter function will be
  given the name and the number and should return a combination of the
  two.

  The set of unique s's in the returned sequence will be the count of
  s's in s."  
  ([s] (uniquify s (fn [item duplicates] (str item "_" duplicates))))
  ([s formatter]
     (let [occurrences (atom {})
           register-occurrence (fn [item]
                                 (if (get @occurrences item)
                                   (swap! (get @occurrences item) inc)
                                   (swap! occurrences assoc item (atom 1)))
                                 @(get @occurrences item))
           process (fn [item]
                     (let [duplicates (dec (register-occurrence item))]
                       (if (> duplicates 0)
                         (formatter item duplicates)
                         item)))
           unique-s (map process s)]
       unique-s)))

Mark Engelberg

unread,
Jan 11, 2014, 3:32:20 AM1/11/14
to clojure
Very clever!


--

Alex Baranosky

unread,
Jan 11, 2014, 5:22:18 AM1/11/14
to clo...@googlegroups.com
I've been finding uses for Brandon Bloom's transduce mini-library left and right lately. There is a class of problems where you want to track some state as you process a seq, and transduce.lazy/map-state enables you to do that (https://github.com/brandonbloom/transduce).

Here's my solution using it. It is concise and stateless:

(defn uniqify [coll]
  (map-state (fn [elem->times-seen x]
               (let [x (if-let [times-seen (elem->times-seen x)]
                         (str x "_" times-seen)
                         x)
                     elem->times-seen' (update-in elem->times-seen [x] (fnil inc 0))]
                 [elem->times-seen' x]))
             {}
             coll))

Alex Osborne

unread,
Jan 11, 2014, 2:47:39 AM1/11/14
to clo...@googlegroups.com
On Sat, January 11, 2014 9:22 pm, Alex Baranosky wrote:
> There is a class of problems where you want to track some state as you
process a seq

This is an interesting exercise. I find said class of problem comes up
fairly frequently and I usually end up with something scary-looking using
reduce or lazy-seq that I'm never quite satisfied with. I often feel
there's some missing flow control mechanism just out of reach.

As another point of comparison here's a straightforward imperative version
(of the simplified problem):

(defn unique [words]
(let [occurrences (java.util.HashMap.)]
(for [word words]
(let [n (get occurrences word 0)]
(.put occurrences word (inc n))
(if (zero? n) word (str word "_" n))))))

i.e. "if a tree falls in the woods..." style mutation. :-)

> and transduce.lazy/map-state enables you to do that
(https://github.com/brandonbloom/transduce)

The imperative solution led me towards something like the "for"-like form
of map-state. A macro that's a hybrid of "for" and "loop" such that it
takes both for and loop style bindings:

(defn unique [words]
(for-loop [word words] [occurrences {}]
(let [n (get occurences word 0)]
(yield (if (zero? n) word (str word "_" n))
(assoc occurrences word (inc n))))))

Imagine "yield" is somewhat like "recur" in that it takes new values to
bind for the next iteration but its first argument is the sequence element
to return for the current step. Like recur yield must be in the tail call
position.

The above might expand into this:

(defn uniquify [words format]
((fn uniquify* [occurrences words]
(lazy-seq
(let [[word & rest] words
n (get occurrences word 0)]
(when word
(cons (if (zero? n) word (str word "_" n))
(uniquify* (assoc occurrences word (inc n))
rest))))))
{} words))



Reply all
Reply to author
Forward
0 new messages