CSV example

Chris Dean

May 29, 2012, 12:50:56 PM
to cascal...@googlegroups.com
I have a bunch of files that are comma separated with the first line
being the field name of each column in the file. The files don't all
have the same fields, but they all have a field named "price".

What we would like to do is process the files and create JSON output.
I'd also like to convert the field named "price" from a string to an
integer.

What has me stuck is how to get the field names into the json
automatically. I'd assumed I'd use lfs-delimited and gen-nullable-vars
but can't figure it out. If anyone could provide an example or outline
that would be appreciated.

Cheers,
Chris Dean
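
For reference, a minimal sketch of the transformation described above, written in plain Python rather than Cascalog so that it is self-contained. The function name csv_to_json_lines and the sample data are hypothetical; the only details taken from the post are that the first line holds the field names and that "price" should become an integer.

```python
import csv
import io
import json


def csv_to_json_lines(text):
    """Turn CSV text with a header row into one JSON string per record."""
    # DictReader takes the field names from the first line of the input.
    reader = csv.DictReader(io.StringIO(text))
    lines = []
    for row in reader:
        # Assumes every file has a "price" column holding whole numbers.
        row["price"] = int(row["price"])
        lines.append(json.dumps(row))
    return lines


# Hypothetical sample file: the columns other than "price" may differ per file.
sample = "sku,price\nA1,100\nB2,250\n"
for line in csv_to_json_lines(sample):
    print(line)
```

Because the keys come from each file's own header, files with different columns need no per-file configuration; only "price" is treated specially.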

Andy Xue

May 29, 2012, 5:06:10 PM
to cascalog-user
i think you are better off just storing the field names in a separate
file that your app can read

also look into the TextDelimited class in Cascading ... you can
pass a list of field names and types -- if you set the "price"
field as integer, it will automatically coerce it from string to
integer
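
The idea of supplying field names and types separately from the data can be sketched as follows. This is plain Python, not the TextDelimited API; SCHEMA and read_with_schema are hypothetical names, and the schema is written inline here although it could equally be loaded from a separate file.

```python
import csv
import io

# Hypothetical schema kept outside the data file: (field name, converter) pairs.
SCHEMA = [("sku", str), ("price", int)]


def read_with_schema(text, schema, skip_header=True):
    """Parse CSV text, naming and coercing each column according to schema."""
    reader = csv.reader(io.StringIO(text))
    if skip_header:
        next(reader, None)  # drop the header row; names come from the schema
    records = []
    for row in reader:
        # Pair each value with its declared name and converter, by position.
        records.append(
            {name: convert(value) for (name, convert), value in zip(schema, row)}
        )
    return records


print(read_with_schema("sku,price\nA1,100\n", SCHEMA))
```

The trade-off is that each file layout needs its own schema, which is what the header-driven approach tries to avoid.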

Chris Dean

May 29, 2012, 8:58:46 PM
to cascal...@googlegroups.com
Andy Xue <and...@lumoslabs.com> writes:
> i think you are better off just storing the field names in a separate
> file that your app can read

Ok, thanks. I'll give that a try.

> also look into the TextDelimited class in Cascading ... you can
> pass a list of field names and types -- if you set the "price"
> field as integer, it will automatically coerce it from string to
> integer

I had hoped that TextDelimited would find the vars for me in some way.
In Cascading 2.0 the docs say:

It is assumed if sink/source fields is set to either Fields.ALL or
Fields.UNKNOWN and skipHeader or hasHeader is true, the field names
will be retrieved from the header of the file and used during
planning. The header will parsed with the same rules as the body of
the file.

Cheers,
Chris Dean

Chris K Wensel

May 30, 2012, 9:44:32 AM
to cascal...@googlegroups.com
> I had hoped that TextDelimited would find the vars for me in some way.
> In Cascading 2.0 the docs say:
>
> It is assumed if sink/source fields is set to either Fields.ALL or
> Fields.UNKNOWN and skipHeader or hasHeader is true, the field names
> will be retrieved from the header of the file and used during
> planning. The header will parsed with the same rules as the body of
> the file.

This should work well in 2.0 (latest WIP builds). If it doesn't, let me know on the Cascading user list.

ckw

--
Chris K Wensel
ch...@concurrentinc.com
http://concurrentinc.com
