HBase tap modifications to read all columns from a column family


Pranav

Jul 21, 2012, 3:48:43 PM
to cascadi...@googlegroups.com
Hello, 

My HBase table schema is such that each row has different columns in the same column family (the timestamp is part of the column name), and I want to read all columns of a particular column family. 
Cascading/maple does not seem to support this yet, so I am implementing it in a fork. 

I have been looking through the Cascading source code, and it seems that the 'source' method in cascading.scheme.Scheme does the actual reading from the database. 
Assuming the column names can be extracted for each row using the HBase Java API, is it possible to set the columns to be read dynamically for each row? 
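For reference, the HBase client API has Result.getFamilyMap(byte[] family), which returns a NavigableMap<byte[], byte[]> of qualifier to (latest) value for one row, so the column names in a family can be discovered per row without listing them up front. The sketch below assumes that map shape but uses a plain TreeMap as a stand-in (keys ordered as unsigned bytes, as HBase orders qualifiers) so it runs without HBase on the classpath; the class name and the qualifiers such as "event_1342800000" are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class FamilyColumns {
    // Stand-in for the qualifier -> value map returned by Result.getFamilyMap(family)
    // for a single row. Keys are ordered as unsigned bytes, like HBase qualifiers.
    static NavigableMap<byte[], byte[]> newFamilyMap() {
        Comparator<byte[]> unsignedBytes = (x, y) -> Arrays.compareUnsigned(x, y);
        return new TreeMap<>(unsignedBytes);
    }

    // Lists whatever column qualifiers happen to exist in this row's family.
    static List<String> qualifiers(NavigableMap<byte[], byte[]> familyMap) {
        List<String> names = new ArrayList<>();
        for (byte[] q : familyMap.keySet()) {
            names.add(new String(q, StandardCharsets.UTF_8));
        }
        return names;
    }

    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two rows of the same family with different (hypothetical) timestamped columns.
        NavigableMap<byte[], byte[]> row1 = newFamilyMap();
        row1.put(b("event_1342900000"), b("a"));
        row1.put(b("event_1342800000"), b("b"));

        NavigableMap<byte[], byte[]> row2 = newFamilyMap();
        row2.put(b("event_1343000000"), b("c"));

        System.out.println(qualifiers(row1)); // [event_1342800000, event_1342900000]
        System.out.println(qualifiers(row2)); // [event_1343000000]
    }
}
```

Because each row's map carries its own qualifiers, the column list never has to be fixed when the scheme is configured; only the family does.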


Pranav

Jul 23, 2012, 5:54:29 AM
to cascadi...@googlegroups.com
OK, if I create one Field in Cascading and dump the data from all columns into that field, that will work too. The only thing I need to figure out is how to get Cascading to read data from all HBase columns in a column family, rather than reading it using the specified field (which does not correspond to any actual column).

Any suggestions on how to do that? 

Pranav

Jul 24, 2012, 3:21:53 AM
to cascadi...@googlegroups.com
I would appreciate it if someone could let me know whether what I am trying to do is possible at all, i.e. reading data from all columns in the table and putting that data into one Cascading Field. 

Thanks!

Chris K Wensel

Jul 24, 2012, 9:58:01 AM
to cascadi...@googlegroups.com
If you aren't getting a suitable answer, it's likely that no one watching the list has attempted this.

That said, once you sort it out, I would love to hear how it went, so others can search this list if they have a similar issue.

ckw

--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To view this discussion on the web visit https://groups.google.com/d/msg/cascading-user/-/--Qdrrj6CzcJ.
To post to this group, send email to cascadi...@googlegroups.com.
To unsubscribe from this group, send email to cascading-use...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/cascading-user?hl=en.


Pranav

Aug 2, 2012, 8:47:21 AM
to cascadi...@googlegroups.com
The class org.apache.hadoop.hbase.mapred.TableRecordReaderImpl is the one that actually reads from HBase. I extended it so that, instead of querying by column name, it returns data from all columns in a column family. Then I changed the HBase tap's scheme (https://github.com/Cascading/maple/blob/master/src/jvm/com/twitter/maple/hbase/HBaseScheme.java) so that it creates one Cascading field per column family rather than one per column, and inserted all the column data into that field, joined with separators (so I can split it later).
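A minimal sketch of the "join with separators, split later" idea, in plain Java with no Cascading or HBase dependency. It assumes each value is kept next to its qualifier (handy here, since the timestamp lives in the column name) and uses hypothetical separators ';' and '=' with backslash escaping, so values that happen to contain a separator still survive the round trip. The class and method names are illustrative only, not part of Cascading/maple.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PackedColumns {
    // Hypothetical separators; any characters work if they are escaped consistently.
    static final char PAIR_SEP = ';';
    static final char KV_SEP = '=';
    static final char ESC = '\\';

    // Prefixes every separator or escape character with ESC.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == PAIR_SEP || c == KV_SEP || c == ESC) sb.append(ESC);
            sb.append(c);
        }
        return sb.toString();
    }

    // Packs one row's qualifier -> value pairs into a single string for one field.
    static String pack(Map<String, String> columns) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (Map.Entry<String, String> e : columns.entrySet()) {
            if (!first) sb.append(PAIR_SEP);
            first = false;
            sb.append(escape(e.getKey())).append(KV_SEP).append(escape(e.getValue()));
        }
        return sb.toString();
    }

    // Reverses pack(): splits on unescaped separators and drops the escapes.
    static Map<String, String> unpack(String packed) {
        Map<String, String> out = new LinkedHashMap<>();
        if (packed.isEmpty()) return out;
        StringBuilder cur = new StringBuilder();
        String key = null;
        for (int i = 0; i < packed.length(); i++) {
            char c = packed.charAt(i);
            if (c == ESC && i + 1 < packed.length()) {
                cur.append(packed.charAt(++i));   // escaped char is taken literally
            } else if (c == KV_SEP && key == null) {
                key = cur.toString();             // end of qualifier
                cur.setLength(0);
            } else if (c == PAIR_SEP) {
                out.put(key, cur.toString());     // end of one column
                key = null;
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        out.put(key, cur.toString());             // last column
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> cols = new LinkedHashMap<>();
        cols.put("event_1342800000", "a;b");
        cols.put("event_1342900000", "x=y");
        String packed = pack(cols);
        System.out.println(packed);                      // event_1342800000=a\;b;event_1342900000=x\=y
        System.out.println(unpack(packed).equals(cols)); // true
    }
}
```

Escaping is the design choice that matters: a plain String.join / String.split works only as long as no column value ever contains the separator.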

If this is something other people would like too, I could create a pull request or fork Cascading/maple.

Pranav. 

On Sunday, July 22, 2012 1:18:43 AM UTC+5:30, Pranav wrote: