I created a HBase dataset with the schema below and an external Hive table to be able to query this dataset using Hive like this:
~~~~~~~~~~
Schema
{
"type": "record",
"name": "somename",
"partitions": [
{ "type": "identity", "source": "uid" }
],
"fields": [
{ "name": "uid", "type": "string",
"mapping": { "type": "column", "value": "cf:uid" } },
{ "name": "col1", "type": "string",
"mapping": { "type": "column", "value": "cf:col1" } },
{ "name": "col2", "type": "string",
"mapping": { "type": "column", "value": "cf:col2" } }
]
}
Hive table
CREATE EXTERNAL TABLE hive_ext_table(uid string, col1 string, col2 string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, cf:col1, cf:col2")
~~~~~~~~~~~~
If I query the Hive table using an existing uid (ie, row key), I get no rows returned unless I use the LIKE operator, for example:
SELECT * FROM hive_ext_table WHERE uid LIKE '%12345%'
I do notice that in the HBase dataset the row key has hex chars appended to the row key. Is this the reason why I can't query or I need to create the Hive table with different mappings or columns?
Thanks!