Bulk Data to HBase and HBase Schema Design


lakshminarayan

Oct 9, 2013, 3:18:13 AM
to chenn...@googlegroups.com
Hi,

I have two questions.

QUESTION 1: Schema design for HBase

Assume I have three tables in my RDBMS, as below, with all the primary and foreign key relationships in place:

(1) Orders (OrderID, OrderName, CustomerId, ProductId)
(2) Customers (CustomerId, CustomerName)
(3) Products (ProductId, ProductName)

I have a SQL query with joins that generates a report for me.
My requirement is to produce the same report on my Hadoop cluster with HBase.

I would like to know how to design my HBase schema for this.
Are there any basic rules of thumb for designing an HBase schema?

QUESTION 2: Porting bulk data into HBase

I guess there are multiple ways to insert bulk data into HBase. Leaving aside the Sqoop tool,
what are the basic ways to port bulk data into HBase?
(1) Flat CSV file (local file system) to HBase
(2) RDBMS table data to HBase schema
(3) HDFS data to HBase (I guess an MR program can do this, right?)

/Lakshmi

Sivakumar Rajasundaram

Oct 9, 2013, 3:52:57 AM
to chenn...@googlegroups.com
Hi,

Why did you choose HBase in the first place?

For your second question, refer to the article below.



Regards,
Sivakumar
9500145827


lakshminarayan

Oct 9, 2013, 4:59:26 AM
to chenn...@googlegroups.com

HBase? It is just part of my learning. Once I got familiar with some of the shell commands, I thought of trying a bulk data insert.
I googled and found that some people use MR jobs for it. I just want to check how it is done in the real world. The RDBMS table design is just a convenient way for me to ask about and understand my problem.
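
To make the MR-job idea concrete, here is a minimal Python sketch of what the map step of such a job conceptually does: it turns each CSV line into HBase-style cells of the form (row key, family:qualifier, value). This is not HBase client code. The column family name "d", the choice of OrderID as the row key, and the sample lines are all assumptions for illustration.

```python
import csv
import io

# Hypothetical column family and CSV layout, chosen only for illustration.
FAMILY = "d"
COLUMNS = ["OrderID", "OrderName", "CustomerId", "ProductId"]


def csv_to_cells(text, key_column="OrderID"):
    """Yield (row_key, 'family:qualifier', value) for every non-key field."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=COLUMNS)
    for record in reader:
        row_key = record[key_column]
        for name in COLUMNS:
            if name != key_column:
                yield (row_key, f"{FAMILY}:{name}", record[name])


# Made-up sample data standing in for a flat CSV file.
sample = "1001,Laptop order,C42,P7\n1002,Phone order,C42,P9\n"
cells = list(csv_to_cells(sample))
for cell in cells:
    print(cell)
```

In a real job the cells would be written to HBase instead of printed; the point here is only the mapping from a flat line to row key plus column cells.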

/Lakshmi

Mahesh Sundaramurthy

Oct 9, 2013, 5:26:02 AM
to chenn...@googlegroups.com
Hi Lakshmi,
First of all, migrating from an RDBMS to NoSQL is a paradigm shift in how we think.
Not trying to force RDBMS concepts into an HBase design is a big plus.

Please see if these help.

Some pointers to keep in mind:
1. Take care in designing the row key. It must be designed so that data is distributed across regions and does not clog a single region.
2. Denormalize your data as much as possible.
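
As a rough illustration of both pointers, the Python sketch below applies them to the Orders/Customers/Products example from the original question. It copies the customer and product names into each order row (denormalization) and prefixes the row key with a hash-derived bucket so that sequential order IDs spread across the key space. The bucket count, the "d" column family, the key format, and the sample records are assumptions, not something prescribed by HBase.

```python
import hashlib

NUM_BUCKETS = 8  # assumed number of salt buckets, for illustration only


def salted_key(natural_key, buckets=NUM_BUCKETS):
    """Prefix the key with a stable bucket derived from its hash."""
    digest = hashlib.md5(natural_key.encode("utf-8")).digest()
    bucket = digest[0] % buckets
    return f"{bucket:02d}|{natural_key}"


def denormalize(order, customers, products):
    """Build one wide row per order, copying in the joined names."""
    return {
        "d:OrderName": order["OrderName"],
        "d:CustomerId": order["CustomerId"],
        "d:CustomerName": customers[order["CustomerId"]],
        "d:ProductId": order["ProductId"],
        "d:ProductName": products[order["ProductId"]],
    }


# Made-up sample records standing in for the three RDBMS tables.
customers = {"C42": "Asha"}
products = {"P7": "Laptop"}
order = {"OrderID": "1001", "OrderName": "Laptop order",
         "CustomerId": "C42", "ProductId": "P7"}

row_key = salted_key(order["OrderID"])
row = denormalize(order, customers, products)
print(row_key, row)
```

With a row like this, the report can be read from one table without joins. The trade-off is that a salted key makes a range scan over order IDs harder, since the scan has to be repeated once per bucket.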

Thanks
Mahesh



lakshminarayan

Oct 10, 2013, 4:56:27 AM
to chenn...@googlegroups.com
Thanks, Mahesh. I found a very similar question via Google (thanks, stackoverflow.com).

The question goes like this:

I have a weather database with these tables: province, city, station, instantHarvestInfo, and dailyHarvestInfo. The relations between the tables are parent-child:
(province, city): R(1,m)
(city, station): R(1,m)
(station, instantHarvestInfo): R(1,m)
(station, dailyHarvestInfo): R(1,m)
I want to put all of them in one big table in HBase and create a column family for each one, but I don't know how to define my row key. I think I need a nested row key, so that at each step I can take the part of the row key that relates to a column family and get the information for that column family. But how can I define it? Please help me.

And the answer is:

I guess you are going to save a huge amount of instantHarvestInfo and dailyHarvestInfo for each station.

Since there is a parent-child relationship in your data model, I think you could design the schema as:

-------------------------------------------------------------------------
Row-Key:                  Province + city + station + timestamp
--------+---------------------+------------------------------------------
Family  | Qualifier           |          Value
--------+---------------------+------------------------------------------
        | instantHarvestInfo  |        "value of instantInfo"
   F    +---------------------+------------------------------------------
        | dailyHarvestInfo    |        "value of dailyInfo"
--------+---------------------+------------------------------------------

Note that there is only one family, because we should always keep the number of families as small as possible.

http://stackoverflow.com/questions/18268106/nested-rowkey-in-hbase-tables
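
To see why a composite key like this works, here is a small Python sketch that builds "Province + city + station + timestamp" keys and imitates a prefix scan over them. HBase keeps rows sorted lexicographically by row key, so all readings for one station sit next to each other. The "|" separator, the zero-padded 10-digit timestamp, and the sample values are my assumptions, not part of the quoted answer.

```python
def make_row_key(province, city, station, timestamp, sep="|"):
    """Join the parent-child path and a zero-padded timestamp into one key."""
    # Zero-padding keeps lexicographic order equal to numeric order.
    return sep.join([province, city, station, f"{timestamp:010d}"])


def prefix_scan(sorted_keys, prefix):
    """Imitate a prefix scan: return the keys that start with the prefix."""
    return [key for key in sorted_keys if key.startswith(prefix)]


# Made-up readings; sorted() stands in for HBase's row-key ordering.
keys = sorted([
    make_row_key("ON", "Toronto", "S2", 1381300000),
    make_row_key("ON", "Ottawa", "S1", 1381300000),
    make_row_key("ON", "Toronto", "S1", 1381300600),
    make_row_key("ON", "Toronto", "S1", 1381300000),
])

toronto_s1 = prefix_scan(keys, "ON|Toronto|S1|")
print(toronto_s1)
```

Scanning with the shorter prefix "ON|Toronto|" would return every station in that city, which is the "nested" behaviour the question asks for.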

As Mahesh said, the two key things in HBase schema design are:

(1) Denormalize the data
(2) Decide the row key

/Lakshmi.
