Interesting. I find it odd, though, that that would result in
expensive-function being called 32 times. I'd expect mapcat to cause
it to be called once; so you'd realize the first 32 elements of (range
0 4000 100) and consume one, and realize the first 100 elements of
(mapcat expensive-function (range 0 4000 100)) and consume five.
Is mapcat also semi-eager, then?
Chunking is dependent upon the type of seq being traversed, which is in turn dependent upon the type of collection underlying the seq. Ranges always produce chunked seqs, as do non-empty vectors for example. If chunking is a concern, you can always fall back to seqs "grounded" in lists, which are always unchunked (and therefore yield one-at-a-time behaviour with e.g. map):
=> (->> (range 50)
(map println)
first)
0
1
2
...
31
nil
vs…
=> (->> (range 50)
(mapcat list)
(map println)
first)
0
nil
I'm not sure that this means that map, concat, etc are eager in practical terms. I view the chunking in much the same light as transients vis á vis immutability of persistent data structures -- the result is a relative perf improvement that doesn't impact semantics in the vast majority of cases. In my experience, chunking is only detrimental when mapping side-effecting (usually IO-related) functions across collections, as you're doing. Given that, using a (mapcat list) interstitial to get unchunked seqs is inconsequential w.r.t. perf, etc. I'd be interested in hearing any different perspectives.
FYI, chunked-seq? can be used to determine if a seq supports chunking or not.
- Chas
> I spent a long time debugging some Clojure code yesterday. The
> essence of it looked similar to this:
>
> (defn items []
> (mapcat expensive-function (range 0 4000 100)))
>
> . . . (take 5 (items)) . . .
I tried to distill the problem down by defining a non-chunking range
function (a simple variant), then working outward from there to see what
else is evaluating a surprising number of times.
,----
| (defn non-chunked-range
| [start end]
| (prn (format "In non-chunked-range with %d and %d." start end))
| (lazy-seq
| (when-not (= start end)
| (cons start (non-chunked-range (inc start) end)))))
`----
Note that `mapcat` is defined in terms of `map` and `concat`. First
let's confirm that `map` is not eager:
,----
| ;; Draws one, and evaluates lazy sequence function twice:
| (take 1
| (map #(list %)
| (non-chunked-range 0 10)))
`----
Experimenting with the argument to `take` shows that the lazy sequence
function is evaluated as expected: a number of times equal to the
argument plus one for the terminal case (n + 1).
Now add `concat` into the mix to make sure it's not eager:
,----
| ;; Draws two, and evaluates lazy sequence function three times:
| (concat (take 2 (non-chunked-range 0 10)))
`----
That works as expected. Now add `apply` to `concat` as `mapcat` does to
flatten the input lists:
,----
| ;; Draws one, and evaluates lazy sequence five times:
| (take 1
| (apply concat
| (map #(list %)
| (non-chunked-range 0 10))))
`----
Whoah! Where did the extra three evaluations of the lazy sequence
function come from? Note that this one calls on the function /five/
times.
Here is the mapping of the argument to `take` and the number of times
the function is called:
take calls
==== =====
0 5
1 5
2 5
3 5
4 6
5 7
...
n n + 2
I read the source for `concat`, but I don't see what it's doing to force
the extra evaluations both below four arguments and the extra one
(yielding n + 2) with four or more arguments. What's responsible for
this difference in behavior?
--
Steven E. Harris