Duplicate key exception reading map that was written to a file


Dave Kincaid

Nov 25, 2015, 11:04:11 PM
to Clojure
I have something very strange going on when I try to write a map out to a file and read it back in. It's a perfectly fine hash-map with 8,054,160 key/value pairs (so it's pretty big). When I write the map out to a file using

(spit "/tmp/mednotes6153968756847768349/repl-write.edn" (pr-str phrases))

and then read it back in with

(edn/read (PushbackReader. (io/reader "/tmp/mednotes6153968756847768349/repl-write.edn")))

I am getting a duplicate key exception indicating that "? 5" is duplicated. phrases is a clojure.lang.PersistentHashMap. The keys of the map are strings and the values are numbers. When I get the value for "? 5" from the map it returns 352.

I tried grepping the file for occurrences of the key "? 5" (with the 30 characters before and after it), and it returns 4 matches. The second one is the real entry from the map, but I have no idea where the other 3 are coming from.

[/tmp/mednotes6153968756847768349]> egrep -o ".{30}\"\? 5\" .{30}" repl-write.edn 
hasing a toothbrush for" 160, "? 5" 32, ". ) during his /" 32, "to
 "is intact with sutures" 32, "? 5" 352, "4.81 pounds" 128, "ceren
udden" 32, "being up all" 32, "? 5" 32, "limited financial means" 
, "count , everytime she" 32, "? 5" 32, "had a partial mandibulect

Does anyone have an idea what might be happening when the map is written out to the file? How is that key getting duplicated?

I have tried a few slightly different ways of writing to the file including

(spit "/tmp/mednotes6153968756847768349/repl-write.edn" (binding [*print-dup* true] (pr-str phrases)))

and

(spit "/tmp/mednotes6153968756847768349/repl-write.edn" (.toString phrases))

based on some Stack Overflow answers I found. They all seem to produce the same result.
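For reference, here is a self-contained sketch of the same round trip, with the requires and imports the snippets above assume. It makes two small changes, both assumptions rather than a known fix: the map is streamed to the file with `pr` under a rebound `*out*` instead of first being built into one large string with `pr-str`, and the REPL print limits (`*print-length*`, `*print-level*`) are cleared so the output cannot be truncated. `clojure.java.io` already defaults to UTF-8, so the `:encoding` option only makes that explicit. `write-edn!` and `read-edn` are hypothetical helper names.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.java.io :as io])
(import '(java.io PushbackReader))

;; Hypothetical helper: stream the printed map straight to the file with `pr`
;; instead of building one very large string with `pr-str` first.
(defn write-edn! [path m]
  (with-open [w (io/writer path :encoding "UTF-8")]
    ;; Rebind *out* to the file and clear any REPL print limits so the
    ;; printed map is never cut short with "...".
    (binding [*out* w
              *print-length* nil
              *print-level* nil]
      (pr m))))

;; Hypothetical helper: read one EDN form from `path`, closing the file afterwards.
(defn read-edn [path]
  (with-open [r (PushbackReader. (io/reader path :encoding "UTF-8"))]
    (edn/read r)))

;; Usage on a small map in a temporary file; returns true if nothing was lost.
(let [path (.getPath (java.io.File/createTempFile "repl-write" ".edn"))
      m    {"? 5" 352, "4.81 pounds" 128}]
  (write-edn! path m)
  (= m (read-edn path)))
```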

Here is the exception stack trace.

1. Caused by java.lang.IllegalArgumentException
   Duplicate key: ? 5

        PersistentHashMap.java:   67  clojure.lang.PersistentHashMap/createWithCheck
                       RT.java: 1538  clojure.lang.RT/map
                EdnReader.java:  631  clojure.lang.EdnReader$MapReader/invoke
                EdnReader.java:  142  clojure.lang.EdnReader/read
                EdnReader.java:  108  clojure.lang.EdnReader/read
                       edn.clj:   35  clojure.edn/read
                       edn.clj:   33  clojure.edn/read
                      AFn.java:  154  clojure.lang.AFn/applyToHelper
                      AFn.java:  144  clojure.lang.AFn/applyTo
                 Compiler.java: 3623  clojure.lang.Compiler$InvokeExpr/eval
                 Compiler.java:  439  clojure.lang.Compiler$DefExpr/eval
                 Compiler.java: 6787  clojure.lang.Compiler/eval
                 Compiler.java: 6745  clojure.lang.Compiler/eval
                      core.clj: 3081  clojure.core/eval
                      main.clj:  240  clojure.main/repl/read-eval-print/fn
                      main.clj:  240  clojure.main/repl/read-eval-print
                      main.clj:  258  clojure.main/repl/fn
                      main.clj:  258  clojure.main/repl
                   RestFn.java: 1523  clojure.lang.RestFn/invoke
        interruptible_eval.clj:   58  clojure.tools.nrepl.middleware.interruptible-eval/evaluate/fn
                      AFn.java:  152  clojure.lang.AFn/applyToHelper
                      AFn.java:  144  clojure.lang.AFn/applyTo
                      core.clj:  630  clojure.core/apply
                      core.clj: 1868  clojure.core/with-bindings*
                   RestFn.java:  425  clojure.lang.RestFn/invoke
        interruptible_eval.clj:   56  clojure.tools.nrepl.middleware.interruptible-eval/evaluate
        interruptible_eval.clj:  191  clojure.tools.nrepl.middleware.interruptible-eval/interruptible-eval/fn/fn
        interruptible_eval.clj:  159  clojure.tools.nrepl.middleware.interruptible-eval/run-next/fn
                      AFn.java:   22  clojure.lang.AFn/run
       ThreadPoolExecutor.java: 1142  java.util.concurrent.ThreadPoolExecutor/runWorker
       ThreadPoolExecutor.java:  617  java.util.concurrent.ThreadPoolExecutor$Worker/run
                   Thread.java:  745  java.lang.Thread/run
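The trace shows the exception is raised by the reader's duplicate-key check (`createWithCheck`) while it builds a map from the file's text, not by the in-memory map. A minimal sketch that reproduces the same message without the large map: the EDN reader rejects any map literal whose text contains the same key twice.

```clojure
(require '[clojure.edn :as edn])

;; The text of this map literal contains the key "? 5" twice, which is
;; exactly what the reader's duplicate check rejects.
(try
  (edn/read-string "{\"? 5\" 32, \"? 5\" 352}")
  (catch IllegalArgumentException e
    (.getMessage e)))
;; => "Duplicate key: ? 5"
```

So the question becomes how the printed text ended up containing "? 5" more than once.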




Dave Kincaid

Nov 25, 2015, 11:07:34 PM
to Clojure
The number of keys in the map is 8,054,160.

Ghadi Shayban

Nov 25, 2015, 11:27:29 PM
to Clojure
While the map is in memory, before writing, are the hash codes for the "duplicate" keys the same? You can call (hash) on the keys. I'm thinking there is perhaps an issue with Unicode string serialization... Are the question marks a particular character?

If you can find the similar strings in memory, before they are written, call:
(map int the-string)
to see the actual Unicode characters behind the question marks.
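One way to look for such keys in memory is sketched below. It assumes, without confirmation from the thread, that the problem is in the string-to-bytes encoding step: when a string contains a character that cannot be encoded, such as a lone UTF-16 surrogate, the JDK's encoders in practice substitute `?`, so two distinct in-memory keys could become identical text on disk. `utf8-round-trip`, `encoding-collisions`, and `describe-collisions` are hypothetical helper names.

```clojure
;; Encode a string to UTF-8 bytes and decode it again, roughly what writing it
;; to a UTF-8 file and reading it back does. Characters that cannot be encoded
;; (for example a lone UTF-16 surrogate) come back as \?.
(defn utf8-round-trip [^String s]
  (String. (.getBytes s "UTF-8") "UTF-8"))

;; Hypothetical helper: group the keys of `m` by their round-tripped text and
;; keep only the groups where several distinct keys end up identical.
(defn encoding-collisions [m]
  (->> (keys m)
       (group-by utf8-round-trip)
       (filter (fn [[_ ks]] (> (count ks) 1)))
       (into {})))

;; Hypothetical helper: list the character codes (UTF-16 code units) and hash
;; of each colliding key, in the spirit of (map int the-string).
(defn describe-collisions [m]
  (for [[encoded ks] (encoding-collisions m)
        k ks]
    {:encoded encoded, :hash (hash k), :code-points (map int k)}))
```

For comparison, `(map int "? 5")` returns `(63 32 53)`, so a "? 5" key whose first character code is anything other than 63 is not a literal question mark.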

Dave Kincaid

Nov 25, 2015, 11:40:53 PM
to Clojure
The question marks are actual question marks. I'm not sure how to find the "duplicate" keys in the in-memory map. As far as I can tell, there is only one "? 5" key in it.

I thought computing the frequencies of the hash values of the keys and looking for any hash value shared by more than one key might find them, but this code:

read-notes> (def dupes (filter #(> (second %) 1) (frequencies (map hash (keys phrases)))))
#'read-notes/dupes
read-notes> (count dupes)
8911

seems to indicate 8,911 hash values that are shared by more than one key.
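As a sanity check on that number, it can be compared with what a uniformly distributed 32-bit hash would produce. With n keys and m = 2^32 possible hash values, the birthday-problem estimate of colliding pairs is roughly n(n-1)/(2m). This is a back-of-the-envelope sketch that assumes uniform hashing; `expected-collision-pairs` is a hypothetical name. Shared hash values are not duplicate keys: PersistentHashMap keeps colliding keys side by side and tells them apart with equality.

```clojure
;; Hypothetical helper: birthday-problem estimate of the number of key pairs
;; that share a hash value, assuming n keys hashed uniformly into m values.
(defn expected-collision-pairs [n m]
  (/ (* n (dec n)) (* 2.0 m)))

;; 8,054,160 keys into the 2^32 possible 32-bit hash values.
(expected-collision-pairs 8054160 (Math/pow 2 32))
;; => about 7551.8
```

The observed 8,911 is in the same ballpark, somewhat higher; one plausible reason is that Clojure's hash for strings is derived from `String.hashCode`, which is not a uniformly random function. Either way, collisions at this scale are expected and do not by themselves point to duplicate keys.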

Dave Kincaid

Nov 26, 2015, 12:04:06 AM
to Clojure
I just tried writing the map out to an Avro file and reading it back in. This works fine, which tells me something is wrong with the way I'm writing the EDN file.

Here is the code I used to output to Avro and read back:

(require '[abracad.avro :as avro])
(def schema (avro/parse-schema {:type :map :values :long}))
(with-open [out-file (avro/data-file-writer schema "/tmp/mednotes6153968756847768349/repl-write.avro")] (.append out-file phrases))
(def ps (with-open [in-file (avro/data-file-reader "/tmp/mednotes6153968756847768349/repl-write.avro")] (doall (seq in-file))))

I'm using the excellent abracad library, required with the alias avro.

Ghadi Shayban

Nov 26, 2015, 12:07:35 AM
to Clojure
Does the phrases value in memory exactly match the payload roundtripped through Avro?
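A sketch of one way to check that, assuming `ps` from the Avro snippet above holds the map as its first element, and that abracad hands back string keys and numeric values comparable with `=` (abracad's exact return types are not verified here). `map-differences` is a hypothetical helper.

```clojure
;; Hypothetical helper: compare two maps and report keys found in only one of
;; them, plus keys present in both whose values differ.
(defn map-differences [a b]
  {:only-in-a      (remove #(contains? b %) (keys a))
   :only-in-b      (remove #(contains? a %) (keys b))
   :value-mismatch (filter #(and (contains? b %)
                                 (not= (get a %) (get b %)))
                           (keys a))})

;; Usage, assuming `phrases` and `ps` from the snippets above; all-zero counts
;; mean the two maps agree:
;; (let [d (map-differences phrases (first ps))]
;;   (into {} (for [[k v] d] [k (count v)])))
```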