Thanks for this. I'm still not quite there yet, though. My data is
tab-delimited and I ultimately only care about the 1st, 2nd, 5th, and
7th fields. There are 50-100 columns per line. I'm unsure how to use
destructuring to pull out just those four.
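To make the goal concrete, here is roughly what I would do in plain Clojure, outside of any Cascalog query. This is only a sketch: `pick-fields` is a name I made up for illustration, and I don't know how to express the same thing inside the query itself.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper, not part of Cascalog: split one tab-delimited
;; line and keep only the 1st, 2nd, 5th and 7th fields.
(defn pick-fields [line]
  ;; A limit of -1 keeps trailing empty fields instead of dropping them.
  ;; `_` skips the columns I don't need; anything past the 7th is ignored.
  (let [[f1 f2 _ _ f5 _ f7] (str/split line #"\t" -1)]
    [f1 f2 f5 f7]))

(pick-fields "a\tb\tc\td\te\tf\tg\th")
```

With 50-100 columns the extra fields just fall off the end of the destructuring form, which is the behaviour I'm after.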
Also, I'm unclear on how to invoke the function with the right kind of
quoting around the s3:// path.
So far, I have this (basically unaltered from your example of the
follows-data function)...
(defn get-my-data [dir]
  (let [source (hfs-textline dir)]
    (<- [?p ?p2]
        (source ?line)
        (c/re-parse [#"[^\s]+"] ?line :> ?p ?p2)
        (:distinct false))))
Calling it at the REPL just returns this:
user=> (get-my-data "s3://path.to.my.bucket/folder/part-*")
{:type :generator,
 :id "34a7407c-b28f-4987-a594-ad4415b86d42",
 :ground? true,
 :sourcemap {"892d9ba0-28bb-4808-a772-5de1099921a4"
             #<Hfs Hfs["TextLine[['line']->[ALL]]"]["s3://path.to.my.bucket/folder/part-*"]"]>},
 :pipe #<Each Each(892d9ba0-28bb-4808-a772-5de1099921a4)[Identity[decl:ARGS]]>,
 :outfields ["?p__gen8" "?p2__gen9"]}
user=>
What am I doing wrong?
Many thanks,
~A