Can HBase Indexer handle a huge number of HBase columns for indexing?


Arjun K

Feb 9, 2016, 8:04:40 AM
to HBase Indexer Users
Hi All,

We have done a POC on indexing HBase data in Solr using hbase-indexer. hbase-indexer does this amazingly fast for 1.3M rows and fits our use case really well.
Looking ahead, though, we estimate that we will start with about 10,000 columns and may reach close to 200,000 columns within two years.
We would like to know whether hbase-indexer can handle such a volume.
Any advice on this would be a great help.

Thanks
Arjun

Gabriel Reid

Feb 11, 2016, 10:41:43 AM
to Arjun K, HBase Indexer Users
Hi Arjun,

Solr (and/or HBase) will typically be the limiting factor in indexing,
not hbase-indexer (although this depends a little bit on your indexer
configuration).

You say that you're going to have 10000 columns, are you referring to
the number of columns for a single row? In that case, I would expect
that Solr will have some major issues in indexing documents with that
many fields.

In any case, I haven't yet encountered a situation where the
hbase-indexer was the limiting factor in indexing throughput. Seeing
as the main thing that it is doing is reading from HBase (which
involves waiting) and writing to Solr (which also involves waiting),
HBase and Solr have always been the limiting factor.

- Gabriel

Arjun K

Feb 11, 2016, 11:47:02 PM
to HBase Indexer Users, nagarju...@gmail.com
Hi Gabriel,

Many thanks for the response.

Yes, we will have 10,000 columns in a single HBase row to start with.
But this number will be lower in Solr, as we are using wildcards while indexing, something like the following:

<field name="CEB_ss" value="info:CEB_*"/>
<field name="CEO_ss" value="info:CEO_*"/>

and the indexed data in Solr will look something like this:

"CEB_ss": [ "9aS",
"UVe" ],

So the number of columns (fields) in Solr will be around 450-500 at most; to start with we will have about 100-150, and this will vary per row in HBase and Solr.
What is the ideal number of columns that Solr can handle? Any input on this would be a great help.
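
To make the reduction concrete, here is a minimal Python sketch of how wildcard mappings like the two above fold many HBase columns into a few multi-valued Solr fields. This is only an illustration, not hbase-indexer code: the function `collapse_columns`, the sample qualifiers (`info:CEB_1`, etc.) and the value `x1` are hypothetical, and the matching is done with Python's `fnmatch` rather than the indexer's own logic.

```python
from collections import defaultdict
from fnmatch import fnmatchcase


def collapse_columns(row, mappings):
    """Group cell values under a Solr field name when the HBase
    column ("family:qualifier") matches that field's wildcard pattern."""
    doc = defaultdict(list)
    for column, value in row.items():
        for field_name, pattern in mappings.items():
            if fnmatchcase(column, pattern):
                doc[field_name].append(value)
    return dict(doc)


# Field name -> wildcard expression, mirroring the <field> elements above.
mappings = {
    "CEB_ss": "info:CEB_*",
    "CEO_ss": "info:CEO_*",
}

# Hypothetical HBase row: three columns in the "info" family.
row = {
    "info:CEB_1": "9aS",
    "info:CEB_2": "UVe",
    "info:CEO_1": "x1",
}

doc = collapse_columns(row, mappings)
print(doc)
print(len(row), "HBase columns ->", len(doc), "Solr fields")
```

However many `info:CEB_*` columns a row has, they all end up in the single `CEB_ss` field, so the Solr field count depends on the number of wildcard mappings rather than on the number of HBase columns.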

Many Thanks
Nagarjuna 