with-open and for


Steven D. Arnold

Jun 8, 2013, 12:53:05 AM
to clo...@googlegroups.com
Hi, I am trying to write a function to extract words from a file that are four characters or more, and nine characters or less.  Many words could appear on a single line, so the implementation needs to combine the words from all the lines.  I wrote a function to do this, but I am getting an error that the stream is closed.

The function and error are below.  The idea is that we iterate over the lines in the file, splitting each line on whitespace and selecting those that meet the length requirements, giving us a list of lists, which is then flattened into a single list.

Most examples online of 'with-open' use it in conjunction with 'doseq', but I don't think doseq returns the value of expressions like for does.
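For readers new to the distinction raised here, a quick REPL check (using two words from the wordlist as sample data) confirms it: 'doseq' runs its body only for side effects and returns nil, while 'for' returns a lazy sequence of the body's values.

```clojure
;; doseq evaluates its body for side effects and always returns nil
(doseq [w ["ha-ha" "splenetic"]]
  (count w))
;=> nil

;; for returns a lazy sequence of the body's values
(for [w ["ha-ha" "splenetic"]]
  (count w))
;=> (5 9)
```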

As a Clojure newb, I'd welcome any feedback, but in particular I'm interested in what's going on with the closed stream.  Any thoughts?

(defn filter-file
 []
 (with-open [rdr (reader "/Users/thoth/wordlist.txt")]
   (flatten
     (for
       [line (line-seq rdr)]
       (filter
         (and #(<= (count %) 9)
              #(>= (count %) 4))
         (split line #"\s+"))))))
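A side issue in the code above, separate from the stream error: the two length tests are combined with 'and' outside the anonymous functions. Every function is truthy, so '(and f g)' simply evaluates to g. Only the '>= 4' test is applied, and words longer than nine characters slip through. A small sketch with made-up sample words:

```clojure
;; Any fn is truthy, so (and f g) evaluates to g. This "combined"
;; predicate is therefore only the (>= (count %) 4) check.
(def combined
  (and #(<= (count %) 9)
       #(>= (count %) 4)))

(filter combined ["ab" "word" "extraordinary"])
;=> ("word" "extraordinary")   ; the 9-character limit is ignored

;; Putting both tests inside one fn (here via variadic <=) fixes it:
(filter #(<= 4 (count %) 9) ["ab" "word" "extraordinary"])
;=> ("word")
```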

The error is below.  The 'ha-ha' and 'splenetic' are the first couple words in the wordlist.  In case you're wondering, they come from dictionary.com's previous words of the day.

[ 09:39 PM (6) theorem:thoth ~/Source/clojure/test ] > lein run
(ha-ha splenetic Exception in thread "main" java.io.IOException: Stream closed
[...lengthy traceback omitted, but the "for" line seems to be the one that triggers the error]
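The same exception can be reproduced without a wordlist. The sketch below assumes only clojure.java.io; the helper lazy-lines and the sample text are made up for illustration. Because 'line-seq' reads its first line eagerly and defers the rest, the first line survives the close while reading anything after it fails, which may explain why part of the output is printed before the error.

```clojure
(require '[clojure.java.io :as io])

(defn lazy-lines
  "Hypothetical helper that mirrors the bug: it returns (line-seq rdr)
  from inside with-open without forcing it, so the reader is already
  closed by the time the caller uses the seq."
  [text]
  (with-open [rdr (io/reader (java.io.StringReader. text))]
    (line-seq rdr)))

(def lines (lazy-lines "first line\nsecond line\n"))

;; line-seq read its first line eagerly, inside with-open, so this works:
(first lines)
;=> "first line"

;; Realizing the rest calls .readLine on the closed reader:
(try
  (doall lines)
  (catch java.io.IOException e
    (.getMessage e)))
;=> "Stream closed"
```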

Thanks in advance!
steven

Lars Nilsson

Jun 8, 2013, 1:02:42 AM
to clo...@googlegroups.com
On Sat, Jun 8, 2013 at 12:53 AM, Steven D. Arnold
<thoth.a...@gmail.com> wrote:
> (defn filter-file
>  []
>  (with-open [rdr (reader "/Users/thoth/wordlist.txt")]
>    (flatten
>      (for
>        [line (line-seq rdr)]
>        (filter
>          (and #(<= (count %) 9)
>               #(>= (count %) 4))
>          (split line #"\s+"))))))
>
> The error is below. The 'ha-ha' and 'splenetic' are the first couple words
> in the wordlist. In case you're wondering, they come from dictionary.com's
> previous words of the day.

Wrap flatten in a doall so the lazy sequence is realized before the
with-open closes the stream.
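A minimal sketch of that fix, assuming reader and split come from clojure.java.io and clojure.string. It also takes the path as a parameter and uses a single length predicate, '#(<= 4 (count %) 9)'; both changes are illustrative rather than part of the suggestion above, and the temp-file contents are made up.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn filter-file
  "Returns a fully realized seq of the 4-to-9-character words in filename."
  [filename]
  (with-open [rdr (io/reader filename)]
    ;; doall walks the whole lazy seq while rdr is still open,
    ;; so nothing is left to read after with-open closes it.
    (doall
      (flatten
        (for [line (line-seq rdr)]
          (filter #(<= 4 (count %) 9)
                  (str/split line #"\s+")))))))

;; Usage with a throwaway file (contents are illustrative only):
(def sample (doto (java.io.File/createTempFile "wordlist" ".txt")
              (.deleteOnExit)))
(spit sample "ha-ha splenetic\nan extraordinary word list\n")

(filter-file (.getPath sample))
;=> ("ha-ha" "splenetic" "word" "list")
```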

Lars Nilsson

Thomas Heller

Jun 8, 2013, 5:28:12 AM
to clo...@googlegroups.com
Hey,

for produces a lazy sequence (as does flatten), which is what's hurting you here. You could wrap everything in a doall, but I'd recommend using reduce since that's technically what you want here. I'd probably go for something like:

(require '[clojure.java.io :as io]
         '[clojure.string :as str]
         '[clojure.set :as set])

(defn filter-file [filename]
  (with-open [rdr (io/reader filename)]
    (reduce (fn [words line]
              (->> (str/split line #"\s+")
                   (filter #(and (<= (count %) 9)
                                 (>= (count %) 4)))
                   (set)
                   (set/union words)))
            #{}
            (line-seq rdr))))

Whether you want a set or a vector is up to you; a set seems more logical to me here. I didn't check how expensive set/union is; if there are a lot of lines, it might be better to use concat and build the set once after the reduce.

Anyways, use reduce. ;)

HTH,
/thomas

Steven D. Arnold

Jun 8, 2013, 10:16:21 PM
to clo...@googlegroups.com
Thanks for the responses!  As suggested, wrapping in 'doall' does work.


On Jun 8, 2013, at 3:28 AM, Thomas Heller <th.h...@gmail.com> wrote:

(defn filter-file [filename] 
 (with-open [rdr (io/reader filename)] 
   (reduce (fn [words line]
             (->> (str/split line #"\s+")
                  (filter #(and (<= (count %) 9) 
                                (>= (count %) 4)))
                  (set)
                  (set/union words)))
           #{}
           (line-seq rdr)))) 

That code is really graceful and clean.  I like it a lot.  But for some reason I've never loved 'reduce' before, which probably means I've never used it where it is called for.  Reduce just seems so generic... it's what you say when you haven't got anything better to say, something like "all right, do this."

But, having said that, I'd pick your implementation over mine, because I think it's conceptually cleaner (as recursive algorithms often are).  Nice.  Thanks!

steven

Alan Thompson

Jun 10, 2013, 12:01:49 PM
to clo...@googlegroups.com
Hey Thomas - How'd you get the nice syntax highlighting in your post?
Alan


--
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Thomas Heller

Jun 10, 2013, 1:20:49 PM
to clo...@googlegroups.com
Hey,

I pasted the code into a gist ( https://gist.github.com/thheller/5734642 ) and copy-pasted that into the post.

Cheers,
/thomas


Ray Miller

Jun 11, 2013, 5:04:14 AM
to clo...@googlegroups.com
The 'reduce' solution is very elegant, but you can simplify it further:

(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn filter-file [filename]
  (with-open [rdr (io/reader filename)]
    (reduce (fn [words line]
              (into words (filter #(<= 4 (count %) 9) (str/split line #"\s+"))))
            #{}
            (line-seq rdr))))

Ray.

Jean Niklas L'orange

Jun 11, 2013, 9:09:42 AM
to clo...@googlegroups.com
Nobody in this thread has used multiple bindings in a for comprehension yet, so perhaps it's worth pointing out that this is possible:

(defn filter-file [filename]
  (with-open [rdr (io/reader filename)]
    (set
     (for [line (line-seq rdr)
           word (str/split line #"\s+")
           :when (<= 4 (count word) 9)]
       word))))

I believe this is one of the more obvious ways to write it. If you want to keep duplicates, replace set with doall.
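A sketch of the duplicate-keeping variant just mentioned, assuming the io and str aliases used elsewhere in the thread; the name words-in-file and the sample file contents are hypothetical. 'doall' is needed because 'for' is lazy, whereas 'set' already forces the whole sequence inside 'with-open'.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn words-in-file
  "Hypothetical variant of filter-file that keeps duplicates and file
  order. doall realizes the lazy for before with-open closes rdr."
  [filename]
  (with-open [rdr (io/reader filename)]
    (doall
      (for [line (line-seq rdr)
            word (str/split line #"\s+")
            :when (<= 4 (count word) 9)]
        word))))

;; Usage with a throwaway file (contents are illustrative only):
(def sample (doto (java.io.File/createTempFile "words" ".txt")
              (.deleteOnExit)))
(spit sample "word list\nword again\n")

(words-in-file (.getPath sample))
;=> ("word" "list" "word" "again")

(count (set (words-in-file (.getPath sample))))
;=> 3
```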

 -- JN

Meikel Brandmeyer (kotarak)

Jun 11, 2013, 9:25:40 AM
to clo...@googlegroups.com
Or another one:

(defn filter-lines
  [rdr]
  (->> (line-seq rdr)
    (mapcat #(str/split % #"\s+"))
    (filter #(<= 4 (count %) 9))
    (into #{})))

(defn filter-file
  [filename]
  (with-open [rdr (io/reader filename)]
    (filter-lines rdr)))

Meikel

John D. Hume

Jun 11, 2013, 10:00:40 AM
to clo...@googlegroups.com

On Jun 11, 2013 8:25 AM, "Meikel Brandmeyer (kotarak)" <m...@kotka.de> wrote:
> Or another one:
>
> (defn filter-lines
>   [rdr]
>   (->> (line-seq rdr)
>     (mapcat #(str/split % #"\s+"))
>     (filter #(<= 4 (count %) 9))
>     (into #{})))
>
> (defn filter-file
>   [filename]
>   (with-open [rdr (io/reader filename)]
>     (filter-lines rdr)))

I like this split a lot, though I'd prefer to pass the line-seq to filter-lines. Then you have an extremely simple impure fn and all your logic in an easy-to-test, easy-to-reuse pure function.
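One way that suggestion might look, as a sketch: filter-lines takes any seq of lines, so it can be exercised with a plain vector and no file at all. The 'into #{}' call is eager, which is what keeps it safe to call inside 'with-open'. The io and str aliases are assumed as in the earlier posts, and the sample lines are made up.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn filter-lines
  "Pure: takes any seq of strings, returns the set of 4-to-9-char words."
  [lines]
  (->> lines
       (mapcat #(str/split % #"\s+"))
       (filter #(<= 4 (count %) 9))
       (into #{})))

(defn filter-file
  "Impure shell: owns the reader. into #{} inside filter-lines is eager,
  so the set is complete before with-open closes rdr."
  [filename]
  (with-open [rdr (io/reader filename)]
    (filter-lines (line-seq rdr))))

;; The pure part can be checked with a plain vector, no file needed:
(= #{"ha-ha" "splenetic" "word"}
   (filter-lines ["ha-ha splenetic" "an extraordinary word"]))
;=> true
```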
