missing documents


Robert Fiser

Feb 22, 2016, 8:01:56 AM
to HBase Indexer Users
Hi,
we're missing a few documents in the index (roughly 1 of every 50,000). We're using MorphlineToSolrMapper with a logDebug command placed after extractHBaseCells, and from its output we're sure that the missing documents never reach the morphline.
I would appreciate any hint.

Robert

Gabriel Reid

Feb 22, 2016, 9:40:06 AM
to Robert Fiser, HBase Indexer Users
Hi Robert,

Are you missing exactly 1 out of every 50k documents? Or is it just
something close to that number?

Are you able to identify the specific records that aren't making it
into the index?

These kinds of things are typically due to something like two row keys
being mapped to the same Solr document id, although this isn't always
easy to find. If you don't have specific information about the actual
records causing this problem, I would try one (or both) of the
following:
1. enable DEBUG logging for hbase-indexer and see if there are record
updates that are being unexpectedly not sent to Solr (or something
related)
2. follow the metrics that are published via JMX while you are
ingesting data into HBase -- these metrics provide information on
counts of records that make it through various stages in the indexing
pipeline, so should help determine where a record is stopping if it
isn't being indexed
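
If you can export the row keys from HBase and the document ids from Solr, a small offline check along these lines might help narrow things down. This is only a rough sketch: the function names and the `to_doc_id` mapping are hypothetical placeholders, not part of hbase-indexer, and you would need to substitute whatever rule your indexer configuration really uses to derive the Solr id from the row key.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def find_id_collisions(row_keys: Iterable[str],
                       to_doc_id: Callable[[str], str]) -> Dict[str, List[str]]:
    """Return Solr ids that more than one distinct row key maps to."""
    by_id = defaultdict(set)
    for key in row_keys:
        by_id[to_doc_id(key)].add(key)
    # Keep only ids reached by at least two different row keys.
    return {doc_id: sorted(keys) for doc_id, keys in by_id.items() if len(keys) > 1}


def missing_from_index(row_keys: Iterable[str],
                       indexed_ids: Iterable[str],
                       to_doc_id: Callable[[str], str]) -> List[str]:
    """Return row keys whose mapped Solr id is absent from the index."""
    indexed = set(indexed_ids)
    return sorted(k for k in set(row_keys) if to_doc_id(k) not in indexed)


if __name__ == "__main__":
    # Hypothetical mapping: the Solr id is the part of the row key before "|".
    def to_doc_id(row_key: str) -> str:
        return row_key.split("|")[0]

    # Made-up sample data standing in for exports from HBase and Solr.
    hbase_keys = ["post-1|a", "post-1|b", "post-2|a"]
    solr_ids = ["post-1"]

    print(find_id_collisions(hbase_keys, to_doc_id))
    print(missing_from_index(hbase_keys, solr_ids, to_doc_id))
```

Row keys reported by the first function would overwrite each other in Solr, which looks like a missing document; row keys reported only by the second function were never indexed at all and point to a different cause.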

- Gabriel

Robert Fiser

Feb 22, 2016, 10:01:15 AM
to HBase Indexer Users, robert...@socialbakers.com
Hi Gabriel,
50k is just a guess. We're missing a few documents, sometimes fewer and sometimes more.
I don't think it's a Solr problem or even an hbase-indexer problem. I think it might be a problem on the HBase side, maybe in the SEP module, because I'm pretty sure that the missing documents never reach the morphline. We're indexing Facebook posts and logging all posts of one particular profile. After checking the HBase table, the Solr index and the hbase-indexer logs, we found documents that are present in the HBase table but missing from both the index and the hbase-indexer log.

{
  if {
    conditions : [ { equals { "d:profile_id" : ["XXX"] } } ]
    then : [ { logDebug { format : "INCR_AGG, FB_POSTS: Debug: {}", args : ["@{}"] } } ]
  }
}

I'm going to check JMX metrics.
Thanks

Robert