Yup, that's it. The memory tap gets serialized into the JobConf.
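You can get around this with lazy-generator from cascalog.ops. It takes a temporary path and a sequence, pours the sequence into a SequenceFile at that path, and returns an hfs-seqfile tap over that data. Here's an example for a sequence called src-seq:
(use 'cascalog.api)                        ; ?<- and stdout
(use '[cascalog.io :only [with-fs-tmp]])   ; with-fs-tmp
(require '[cascalog.ops :as c])            ; lazy-generator
(with-fs-tmp [_ tmp-path]
  (let [lazy-tap (c/lazy-generator tmp-path src-seq)]
    (?<- (stdout)
         [?field1 ?field2]             ; ... plus any other output fields
         (lazy-tap ?field1 ?field2)))) ; ... plus the rest of your predicates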
You'll run into memory problems if you use a sequence that's bound to a var, like this:
(def my-seq (for [x (range 1000000) y (range 1000000)] [x y]))
(with-fs-tmp ,,,using my-seq with lazy-generator,,,)
Since a var holds on to the head of its sequence, walking that sequence realizes every element and none of them can be garbage-collected, so the entire sequence ends up in memory. It's best to create the lazy sequence and pass it in as a function argument, or create it in a let binding.
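Here's a minimal, Cascalog-free sketch of that advice. It keeps the recipe for the sequence in a function (the hypothetical pairs, standing in for my-seq above) and builds the seq right where it's consumed. consume! is a hypothetical stand-in for lazy-generator so the sketch runs without Hadoop; it just walks the seq and counts the items.

```clojure
(defn pairs
  "Returns a fresh lazy seq of [x y] pairs for x and y in (range n).
  Nothing is realized until the seq is walked."
  [n]
  (for [x (range n) y (range n)] [x y]))

(defn consume!
  "Hypothetical stand-in for lazy-generator: walks the seq once and
  returns how many items it saw. reduce steps through the seq without
  keeping its head, so visited items can be garbage-collected."
  [s]
  (reduce (fn [cnt _] (inc cnt)) 0 s))

;; The seq is built inside the call, so no var or outer local holds
;; its head while consume! walks it.
(println (consume! (pairs 1000))) ; prints 1000000
```

In the real query you'd pass (pairs n) straight to c/lazy-generator inside with-fs-tmp, so nothing outside that call holds on to the head.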
Let me know if that all makes sense!