There are several differences that could be factors. For example, the
Java version uses StreamTokenizer, while your Clojure version uses
String.split with a regex that gets recompiled for each line read.
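
To make the regex point concrete, here is a minimal sketch. It assumes you are splitting on runs of whitespace, which may not match your actual pattern, and the names word-sep and split-line are made up for the example. A #"..." literal is compiled once when the code is read, so the same Pattern object gets reused for every line:

```clojure
(import '(java.util.regex Pattern))

;; Hypothetical separator; the #"..." literal is compiled once, at read time.
(def ^Pattern word-sep #"\s+")

(defn split-line
  "Splits line on word-sep, reusing the already-compiled Pattern."
  [^String line]
  (seq (.split word-sep line)))

(split-line "the quick  brown fox")
;; => ("the" "quick" "brown" "fox")
```

Whether this accounts for much of the gap is something you'd have to measure; it is only one of the differences.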

> I've also noticed that there is a significant speed difference between
> conj and assoc, why is that?
> If I understand correctly both should only create the delta of the new
> elements and the old structure, however assoc appears to perform much
> better.

user=> (let [c 1000000 p [1 1]]
         (time (reduce #(conj % [%2 %2]) {} (range c)))
         (time (reduce #(assoc % %2 %2) {} (range c)))
         nil)
"Elapsed time: 1544.180472 msecs"
"Elapsed time: 1894.318809 msecs"
nil
user=> (let [c 1000000 p [1 1]]
         (time (reduce #(conj % [%2 %2]) {} (range c)))
         (time (reduce #(assoc % %2 %2) {} (range c)))
         nil)
"Elapsed time: 1549.159812 msecs"
"Elapsed time: 1594.18912 msecs"

That's a million items added to a hash-map each way in roughly 1.5 to
1.9 seconds -- not too shabby. The speeds for conj and assoc are
very close; if anything, I'm seeing a slight advantage for conj.
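
For what it's worth, a small sketch of why I'd expect the two to be close. On a map, conj takes a [key value] pair while assoc takes the key and value as separate arguments, and as far as I know both end up doing the same kind of update and return an equal map. The var names here are just for illustration:

```clojure
;; conj on a map takes a [key value] pair;
;; assoc takes the key and the value as separate arguments.
(def via-conj  (conj  {:a 1} [:b 2]))
(def via-assoc (assoc {:a 1} :b 2))

;; Both produce a new map with the extra entry; the original {:a 1} is untouched.
(= via-conj via-assoc {:a 1, :b 2})
;; => true
```

The one obvious difference in the timing test above is that the conj version also builds a two-element vector for every item.
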
And I'm sorry for what follows -- it's like a compulsion for me, and I
hope it doesn't put you off. Each of these functions takes the same
input and produces the same output as your original code, but each is
implemented a bit more succinctly:

(import '(java.io BufferedReader InputStreamReader))

(defn inc-count [words word]
  (if (seq word)
    (assoc words word (inc (words word 0)))
    words))

(defn sort-words [words]
  (reverse (sort (map (fn [[k v]] [v k]) words))))

(defn print-words [words]
  (doseq [head words]
    (println head)))

(defn read-words [words line]
  (reduce inc-count words line))

(defn read-input []
  (with-open [buf (BufferedReader. (InputStreamReader. System/in))]
    (let [words (for [line (line-seq buf)] (.split line " "))]
      (print-words (sort-words (reduce read-words {} words))))))

(time (read-input))

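
If you want to poke at these at the REPL without piping a file into stdin, here is a sketch that runs the same reduce over a couple of in-memory lines. The sample lines and the helper count-lines are invented for the example; the other three functions are copied from above so the snippet stands on its own:

```clojure
(defn inc-count [words word]
  (if (seq word)
    (assoc words word (inc (words word 0)))
    words))

(defn sort-words [words]
  (reverse (sort (map (fn [[k v]] [v k]) words))))

(defn read-words [words line]
  (reduce inc-count words line))

;; Hypothetical helper: same pipeline as read-input, minus the stdin plumbing
;; and the printing.
(defn count-lines [lines]
  (sort-words (reduce read-words {} (map #(.split ^String % " ") lines))))

;; The double space in the second line yields an empty string,
;; which the (seq word) check in inc-count skips.
(count-lines ["a b a" "b  a"])
;; => ([3 "a"] [2 "b"])
```
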
--Chouser