Join data from compressed file and uncompressed file


Kang Tu

Mar 7, 2013, 4:49:31 PM
to cascal...@googlegroups.com
Hi,

I need to join two files. One is a compressed sequence file (maybe I should use the hfs-seqfile tap) and the other is an uncompressed, tab-delimited file (maybe I should use hfs-delimited).

Can I do this in Cascalog?

Thanks in advance

Kang

David Kincaid

Mar 7, 2013, 5:01:15 PM
to cascal...@googlegroups.com
I think you answered your own question. Create two taps: hfs-seqfile for the compressed file and hfs-delimited for the tab-delimited file. Then write a query that uses both taps and does your join.
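
A minimal, untested sketch of what that could look like. The paths, the variable names, and the two-fields-per-record layout are assumptions for illustration, and hfs-delimited is assumed to come from the cascalog.more-taps namespace. Hadoop records the codec in a sequence file's header, so reading the compressed file should not need extra job configuration.

```clojure
(ns example.join
  (:use cascalog.api)
  (:require [cascalog.more-taps :refer [hfs-delimited]]))

(defn join-query
  "Joins a sequence file and a tab-delimited file on their first field.
   Assumes both sources hold two fields per record (hypothetical layout)."
  [seq-path tsv-path]
  (let [seq-tap (hfs-seqfile seq-path)                      ; compressed sequence file
        tsv-tap (hfs-delimited tsv-path :delimiter "\t")]   ; plain tab-delimited text
    ;; Using ?id in both generators makes Cascalog join on it.
    (<- [?id ?a ?b]
        (seq-tap ?id ?a)
        (tsv-tap ?id ?b))))

;; Hypothetical paths; prints the joined tuples to stdout.
(comment
  (?- (stdout) (join-query "/data/compressed-seq" "/data/plain.tsv")))
```

One thing to watch: hfs-delimited yields strings unless told otherwise, so the join key in the sequence file has to be a string as well (or be converted) for the rows to match.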

Dave

Sam Ritchie

Mar 7, 2013, 5:01:30 PM
to cascal...@googlegroups.com
Yup, totally possible.

March 7, 2013 1:49 PM
--
You received this message because you are subscribed to the Google Groups "cascalog-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascalog-use...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

--
Sam Ritchie, Twitter Inc
703.662.1337
@sritchie

Kang Tu

Mar 7, 2013, 5:47:47 PM
to cascal...@googlegroups.com
Hi Dave,

Thanks for replying. Here is what I am not sure about:

Is hfs-seqfile compressed by default?

If it is not, how can I set the "compressed" option for one tap and the "non-compressed" option for the other? I know there might be an option in with-job-conf, but it looks like a global setting that cannot be applied to individual taps.

Thanks

Kang

Paul Lam

Mar 10, 2013, 4:27:20 AM
to cascal...@googlegroups.com
Hi Kang,

hfs-delimited is uncompressed by default. For a general solution, say you have one hfs-seqfile that is compressed and another hfs-seqfile that is uncompressed or uses a different compression method: you can use cascalog-checkpoint and have each sourcing step use its own (with-job-conf) to set its compression properties.
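
A rough, untested sketch of the idea. To keep it short it uses plain sequential `?-` calls rather than the cascalog-checkpoint workflow macro, so there is no restart-from-checkpoint behaviour here. All paths, the function name, the conf maps, and the two-field record layout are hypothetical.

```clojure
(ns example.staged-join
  (:use cascalog.api))

(defn staged-join!
  "Copies each source to a staging dir under its own job conf, then joins
   the staged copies. conf-a and conf-b are maps of Hadoop properties
   supplied by the caller (hypothetical arguments)."
  [src-a conf-a src-b conf-b staging-a staging-b out-path]
  ;; Step 1: source A runs with only conf-a in effect.
  (with-job-conf conf-a
    (let [tap-a (hfs-seqfile src-a)]
      (?- (hfs-seqfile staging-a)
          (<- [?id ?a] (tap-a ?id ?a)))))
  ;; Step 2: source B runs with only conf-b in effect.
  (with-job-conf conf-b
    (let [tap-b (hfs-seqfile src-b)]
      (?- (hfs-seqfile staging-b)
          (<- [?id ?b] (tap-b ?id ?b)))))
  ;; Step 3: join the two staged copies on ?id.
  (let [staged-a (hfs-seqfile staging-a)
        staged-b (hfs-seqfile staging-b)]
    (?- (hfs-seqfile out-path)
        (<- [?id ?a ?b]
            (staged-a ?id ?a)
            (staged-b ?id ?b)))))
```

Because with-job-conf only applies to the jobs launched inside its body, each step gets its own settings, for example a map like {"mapred.output.compress" "true"} for a step whose output should be compressed. With cascalog-checkpoint, each of the three steps would become a named workflow step instead.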



Paul

Kang Tu

Mar 10, 2013, 8:11:55 PM
to cascal...@googlegroups.com
Great suggestion. Thank you Paul.
