RCFile


dmnk

Dec 5, 2011, 2:36:12 AM
to cloudba...@googlegroups.com
Hi,

Is it possible to use RCFile[1] with CloudBase?

dmnk

Dec 21, 2011, 5:07:10 AM
to cloudba...@googlegroups.com
Hi,

I want to implement RCFile support for CloudBase.
Could I get some pointers on where to start?


Tarandeep

Dec 21, 2011, 3:12:45 PM
to CloudBase
It might not be very hard, depending on how you store your data in
RCFile.

If you are planning to store a text line as a record, then there won't
be much of a problem implementing RCFile support. However, if you are
planning to store serializable objects, there might be a little extra
work.

All MapReduce jobs (in CloudBase) that implement the SQL logic use
TextInputFormat as their input format, so you can go ahead and change
them to RCFileInputFormat (assuming there is one). As long as each
record still arrives as a line of text with fields separated by the
field separator character, you are fine.
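
To make that contract concrete, here is a rough sketch in plain Java of what the mappers expect from each record: one line of text that splits cleanly on the field separator. The class name DelimitedRecord and the fields method are made up for illustration and are not CloudBase code; the separator is passed in because the actual character depends on your table setup.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, not part of CloudBase: splits one text record into fields. */
final class DelimitedRecord {
    private DelimitedRecord() {}

    /**
     * Splits a record line on the given separator character.
     * The -1 limit keeps trailing empty fields, so the column count
     * stays the same even when the last columns are empty.
     */
    static String[] fields(String line, char separator) {
        String quoted = Pattern.quote(String.valueOf(separator));
        return line.split(quoted, -1);
    }
}
```

If the record reader behind the new input format hands the mapper a value that survives this kind of split with the expected number of columns, the existing SQL logic should not need to know where the line came from.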

If you are storing serialized objects in your RCFile, then you can do
one of two things:

1) have your record reader convert your fields into a tab/comma
separated string and pass it to the mapper (not efficient)
2) modify the mappers to accept your serialized object as the value and
then get the individual fields (columns) out of it
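
The two options can be sketched side by side in plain Java. Everything here is an assumption for illustration: ColumnarRow is a stand-in for whatever object your record reader produces (one byte array per column, UTF-8 encoded), and RowAdapters is not an existing CloudBase or Hadoop class.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical stand-in for one deserialized row: one byte[] per column. */
final class ColumnarRow {
    private final List<byte[]> columns;

    ColumnarRow(List<byte[]> columns) {
        this.columns = columns;
    }

    int size() {
        return columns.size();
    }

    /** Decodes a single column; assumes the bytes are UTF-8 text. */
    String column(int index) {
        return new String(columns.get(index), StandardCharsets.UTF_8);
    }
}

final class RowAdapters {
    private RowAdapters() {}

    /** Option 1: decode every column and join into one delimited line. */
    static String toDelimitedLine(ColumnarRow row, char separator) {
        StringJoiner joiner = new StringJoiner(String.valueOf(separator));
        for (int i = 0; i < row.size(); i++) {
            joiner.add(row.column(i));
        }
        return joiner.toString();
    }

    /** Option 2: decode only the columns the mapper actually needs. */
    static String[] selectColumns(ColumnarRow row, int[] wanted) {
        String[] out = new String[wanted.length];
        for (int i = 0; i < wanted.length; i++) {
            out[i] = row.column(wanted[i]);
        }
        return out;
    }
}
```

Option 1 decodes every column and builds a string that the mapper then has to split again, which is why it is the less efficient route. Option 2 touches only the requested column indices, at the cost of changing the mapper's value type.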

If you start writing code for this, let me know. I will be happy to
help.

-Tarandeep

dmnk

unread,
Jan 4, 2012, 3:22:47 AM1/4/12
to cloudba...@googlegroups.com


Is it possible to get the column names/numbers that appear in the select statement from within the TextDataLoader.loadData function, so that I can load only the columns I will operate on?


I have checked out the source from SVN on SourceForge and am working on branch 2.0.