Clojure example code snippets

dokondr

May 19, 2011, 10:43:46 AM
to Clojure
Hi!
I am thinking about using Clojure for distributed NLP.
As an absolute newbie in Clojure, I am looking for nice, expressive code
snippets.

For example, I need an easy way to read text files such as in the
following Python code:
>>> for line in open("file.txt"):
...     for word in line.split():
...         if word.endswith('ing'):
...             print word

What would be an equivalent of this code in Clojure?

Thanks,
Dmitri

Meikel Brandmeyer

May 19, 2011, 10:52:33 AM
to clo...@googlegroups.com
Hi,

something like the following should work.

(with-open [rdr (java.io.FileReader. "file.txt")]
  (doseq [line (line-seq rdr)
          word (.split line "\\s")]
    (when (.endsWith word "ing")
      (println word))))

Sincerely
Meikel

Jonathan Fischer Friberg

May 19, 2011, 11:00:46 AM
to clo...@googlegroups.com
There is clojure.contrib.duck-streams/read-lines
http://clojuredocs.org/clojure_contrib/clojure.contrib.duck-streams/read-lines

Then it's a matter of

(filter (partial re-matches #".*ing") (read-lines "/path/to/file"))

Jonathan



--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Benny Tsai

May 19, 2011, 11:33:22 AM
to clo...@googlegroups.com
I think there can be multiple words on each line, so they have to be split into words first.  Maybe something like:

(ns example
  (:use [clojure.contrib.duck-streams :only (read-lines)]))

(let [lines (read-lines "file.txt")
      words (mapcat #(.split % "\\s") lines)
      ing-words (filter (partial re-matches #".*ing") words)]
  (doseq [ing-word ing-words]
    (println ing-word)))
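
For readers on a Clojure version without clojure.contrib, the same split-then-filter pipeline can be sketched with only clojure.string. This is an illustration under assumptions, not code from the thread: the function name ing-words is hypothetical, and re-find with an end anchor is swapped in for re-matches (which must match the whole string, hence the ".*" prefix above).

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper (not from the thread): takes a seq of lines and
;; returns a lazy seq of the whitespace-separated words ending in "ing".
(defn ing-words
  [lines]
  (->> lines
       ;; split every line on runs of whitespace and concatenate the results
       (mapcat #(str/split % #"\s+"))
       ;; re-find only needs a partial match, so anchor it at the end
       (filter #(re-find #"ing$" %))))

;; Usage:
;; (ing-words ["I am singing" "and dancing now"])
;; => ("singing" "dancing")
```

Because the function takes a seq of lines rather than a filename, it works with read-lines, line-seq, or a literal vector in the REPL.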

Benny Tsai

May 19, 2011, 11:36:03 AM
to clo...@googlegroups.com
I think line-seq needs a java.io.BufferedReader instead of a java.io.FileReader.  clojure.java.io has a reader function that constructs a java.io.BufferedReader from a filename, so this worked for me:

(ns example
  (:use [clojure.java.io :only (reader)]))

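(with-open [rdr (reader "file.txt")]
  (doseq [line (line-seq rdr)
          word (.split line "\\s")]
    (when (.endsWith word "ing")
      (println word))))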

Benny Tsai

May 19, 2011, 11:41:52 AM
to clo...@googlegroups.com
Oops.  Just noticed that the original was not quoted in either of my previous emails, which makes things really confusing.  My first reply (the one using read-lines) was an extension of odyssomay/Jonathan's code, and the second (with reader) was an extension of Meikel's code.  Sorry guys.

dokondr

May 19, 2011, 2:40:59 PM
to Clojure
Thanks everybody! The short one from Meikel (above) looks nice to
me :)

And the one from ClojureDocs too:

(defn read-lines
  "Like clojure.core/line-seq but opens f with reader. Automatically
  closes the reader AFTER YOU CONSUME THE ENTIRE SEQUENCE."
  [f]
  (let [read-line (fn this [^BufferedReader rdr]
                    (lazy-seq
                     (if-let [line (.readLine rdr)]
                       (cons line (this rdr))
                       (.close rdr))))]
    (read-line (reader f))))

Armando Blancas

May 19, 2011, 3:12:28 PM
to Clojure
Just in case I'll mention that Meikel's use of (with-open) will
automatically close the reader.
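
One consequence worth spelling out: since with-open closes the reader as soon as its body returns, any lazy sequence built from line-seq has to be fully realized inside the body. The sketch below is an illustration only; the function name ing-words-in-file is hypothetical, and it assumes clojure.java.io/reader is available, as in Benny's message.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Hypothetical helper (not from the thread): returns all words ending in
;; "ing" from the file at path, and closes the reader before returning.
(defn ing-words-in-file
  [path]
  (with-open [rdr (io/reader path)]   ; io/reader returns a BufferedReader
    (->> (line-seq rdr)
         (mapcat #(str/split % #"\s+"))
         (filter #(.endsWith ^String % "ing"))
         ;; doall forces the lazy seq while the reader is still open;
         ;; without it the caller would read from a closed stream
         (doall))))
```

By contrast, read-lines keeps the reader open until the whole sequence has been consumed, so taking only the first few lines leaves the file handle open.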

Andreas Kostler

May 19, 2011, 5:47:28 PM
to clo...@googlegroups.com
Hi Armando,
I'm working on a Clojure library for sentiment analysis. It doesn't contain everything you'd want for NLP, but it has quite a nice subset: input modules (plain-text corpora, RSS feeds, HTML, etc.), tokenising/normalising filters (noise removal, Porter stemmer, etc.), distance/similarity metrics (Euclidean, cosine, Pearson's, Jaccard/Tanimoto), bag-of-words vector representation, clustering (hierarchical, k-means), classification (NN, Bayes, kNN), and some other little tidbits to tie up loose ends. There will be a first release in about 2-3 weeks' time. If you're planning on doing work in that direction, maybe we could join forces :)
Kind Regards
Andreas

--
"Test-driven Dentistry (TDD!) - Not everything should be test driven"
- Michael Fogus

Ken Wesson

May 19, 2011, 5:51:32 PM
to clo...@googlegroups.com
On Thu, May 19, 2011 at 5:47 PM, Andreas Kostler
<andreas.koe...@gmail.com> wrote:
> Hi Armando,
> I'm working on a Clojure library for sentiment analysis. It doesn't contain everything you'd want for NLP, but it has quite a nice subset: input modules (plain-text corpora, RSS feeds, HTML, etc.), tokenising/normalising filters (noise removal, Porter stemmer, etc.), distance/similarity metrics (Euclidean, cosine, Pearson's, Jaccard/Tanimoto), bag-of-words vector representation, clustering (hierarchical, k-means), classification (NN, Bayes, kNN), and some other little tidbits to tie up loose ends. There will be a first release in about 2-3 weeks' time. If you're planning on doing work in that direction, maybe we could join forces :)
> Kind Regards
> Andreas

Whoa, what the heck are you doing, trying to build Skynet? :)

Then again, Lisp HAS traditionally been the preferred language of AI hackers...

--
Protege: What is this seething mass of parentheses?!
Master: Your father's Lisp REPL. This is the language of a true
hacker. Not as clumsy or random as C++; a language for a more
civilized age.
