Wide Finder

6 views
Skip to first unread message

Rich Hickey

unread,
Dec 13, 2007, 3:48:04 PM12/13/07
to Clojure
You might have seen the dialog surrounding Tim Bray's WideFinder
project ( http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder
) a few months ago. I thought it might be interesting to try it in
Clojure, because it was implemented in so many languages and it is fun
to see the differences:


(import '(java.util.regex Pattern Matcher)
'(java.io BufferedReader InputStreamReader FileInputStream))

(defn wf [f]
(with-open rdr (new BufferedReader (new InputStreamReader (new
FileInputStream f) "US-ASCII"))
(let [p (. Pattern (compile "GET /ongoing/When/\\d\\d\\dx/(\\d\\d\
\d\\d/\\d\\d/\\d\\d/[^ .]+)"))
inc-match (fn [cnts line]
(let [m (. p (matcher line))]
(if (. m (find))
(let [k (. m (group))]
(assoc cnts k (inc (get cnts k 0))))
cnts)))
counts (reduce inc-match {} (line-seq rdr))]
(doseq cnt (take 10 (reverse (sort-by val counts)))
(prn (strcat (val cnt) ": " (key cnt)))))))

I haven't applied any effort to compete for speed with some of the
other implementations. Part of the 'challenge' was to parallelize the
solution in order to achieve a speedup on multi-core hardware. Many
people chimed in saying it wouldn't be faster because the task was too
granular and IO bound, and sure enough it isn't faster in parallel in
Clojure (using agents), but the transformation was easy, and I think
is interesting:

(defn wf* [f n]
(with-open rdr (new BufferedReader (new InputStreamReader (new
FileInputStream f) "US-ASCII"))
(let [p (. Pattern (compile "GET /ongoing/When/\\d\\d\\dx/(\\d\\d\
\d\\d/\\d\\d/\\d\\d/[^ .]+)"))
inc-match (fn [cnts line]
(let [m (. p (matcher line))]
(if (. m (find))
(let [k (. m (group))]
(assoc cnts k (inc (get cnts k 0))))
cnts)))
agents (map agent (replicate n {}))
dispatch (fn [agents-cycle line]
(! (first agents-cycle) inc-match line)
(rest agents-cycle))]
(reduce dispatch (cycle agents) (line-seq rdr))
(apply await agents)
(let [counts (apply merge-with + (map deref agents))]
(doseq cnt (take 10 (reverse (sort-by val counts)))
(prn (strcat (val cnt) ": " (key cnt))))))))

Note in particular how the inc-match function of the first version
became the agent action in the second without any change at all. If
one had a more compute-intensive task at hand, this would be all it
would take to parallelize it in Clojure.

Rich
Reply all
Reply to author
Forward
0 new messages