How to get rid of a bad document from the pipeline


Pankil

Nov 18, 2015, 7:57:27 PM
to HBase Indexer Users
Hi Gabriel,

I am facing an issue where I added a document to HBase that doesn't map correctly to the Solr schema.

For example:

Caused by: java.lang.RuntimeException: org.apache.solr.client.solrj.impl.CloudSolrServer$RouteException: ERROR: [doc=c800b895-21ec-4f2a-a0b0-92edd22909d8] multiple values encountered for non multiValued field property: [fb-1, fb-1]
    at com.ngdata.hbaseindexer.indexer.IndexingEventListener.processEvents(IndexingEventListener.java:102)
    at com.ngdata.sep.impl.SepEventExecutor$1.run(SepEventExecutor.java:97)
    ... 5 more
Caused by: org.apache.solr.client.solrj.impl.CloudSolrServer$RouteException: ERROR: [doc=c800b895-21ec-4f2a-a0b0-92edd22909d8] multiple values encountered for non multiValued field property: [fb-1, fb-1]
    at org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:360)
    at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:533)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
    at com.ngdata.hbaseindexer.indexer.DirectSolrInputDocumentWriter.add(DirectSolrInputDocumentWriter.java:112)
    at com.ngdata.hbaseindexer.indexer.Indexer.indexRowData(Indexer.java:156)
    at com.ngdata.hbaseindexer.indexer.IndexingEventListener.processEvents(IndexingEventListener.java:99)
    ... 6 more



Now it is blocking the entire pipeline. Is that expected behavior? And what would be the best way to deal with such an issue? Something like a failure queue?


Thanks,

Pankil

Gabriel Reid

Nov 19, 2015, 2:54:26 AM
to Pankil, HBase Indexer Users
Hi Pankil,

This exception is being interpreted as an error in communicating with
Solr (due to the error code that Solr is returning). In this case, the
document will be retried indefinitely until the Solr issue is
resolved. Here, that means fixing your Solr schema so that the field
in question is marked as multi-valued.

The general strategy within hbase-indexer is that if Solr responds
with an HTTP 400 status, indicating a bad request or something wrong
with the Solr document being written, the document is simply dropped
by the indexer, as there is no reason to retry it. If Solr responds
with a different HTTP error status (e.g. 500), the document is
retried indefinitely until Solr stops running into errors on the
server side.
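[Editor's note] The drop-vs-retry decision described above can be sketched as follows. This is an illustrative outline of the policy, not the actual hbase-indexer source; the class and method names are invented for the example.

```java
// Hedged sketch of the drop-vs-retry policy described above; the class
// and method names are illustrative, not the real hbase-indexer code.
public class SolrErrorPolicy {

    public enum Action { DROP_DOCUMENT, RETRY_INDEFINITELY }

    public static Action classify(int httpStatus) {
        // 4xx: the request (i.e. the document itself) is bad, so retrying
        // the same document can never succeed; drop it and move on.
        if (httpStatus >= 400 && httpStatus < 500) {
            return Action.DROP_DOCUMENT;
        }
        // Anything else (e.g. 500): treat it as a Solr-side problem and
        // keep retrying until Solr stops returning errors.
        return Action.RETRY_INDEFINITELY;
    }

    public static void main(String[] args) {
        System.out.println(classify(400)); // DROP_DOCUMENT
        System.out.println(classify(500)); // RETRY_INDEFINITELY
    }
}
```

The key design point is that the policy keys off the HTTP status alone, which is why accurate status codes from Solr matter so much later in this thread.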

- Gabriel
> --
> You received this message because you are subscribed to the Google Groups
> "HBase Indexer Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to hbase-indexer-u...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Gabriel Reid

Nov 20, 2015, 7:34:54 AM
to Pankil Doshi, hbase-ind...@googlegroups.com
Re-adding hbase-indexer group.

To answer your question, the logic is not that one bad document can
block the whole queue. Instead, the intention is that if there is a
problem with Solr itself (and it is returning a 500 status code), the
queue is blocked, the reasoning being that there is no point moving
on to indexing other documents while something is wrong with Solr.

On the other hand, if there is something wrong with a specific
document (and not with Solr itself), that specific document will be
dropped. This functionality is dependent on Solr returning accurate
status codes (i.e. 50x if there is a server-specific issue, and 40x if
there is an issue with the request being sent to Solr).

This code is in the add method of DirectSolrInputDocumentWriter [1].

- Gabriel

1. https://github.com/NGDATA/hbase-indexer/blob/master/hbase-indexer-engine/src/main/java/com/ngdata/hbaseindexer/indexer/DirectSolrInputDocumentWriter.java#L101

On Thu, Nov 19, 2015 at 6:28 PM, Pankil Doshi <forp...@gmail.com> wrote:
> Hi Gabriel,
>
> So will one poison pill block the whole queue? For example, if I get one
> 500 response back, will it keep retrying the same document forever until
> the problem is resolved, or will it move forward with other incoming
> requests while retrying that one?
>
> Also, can you point me to the classes handling this logic? I would like
> to see if I can extend them to push such messages to separate queues.
>
> Pankil

Pankil

Mar 10, 2016, 8:06:29 PM
to HBase Indexer Users, forp...@gmail.com
Hi Gabriel,

There are cases where Solr returns the wrong error code.
For example:

ERROR: [doc=5c606f8b-1cd1-4dfb-8ae1-92875d2582c1/5!instagram/3!061c3232-40a9-369a-9255-c9b2e6836c72] Error adding field 'metadata_testsecond_l'='1457572311L' msg=For input string: "1457572311L"
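[Editor's note] A plausible cause of this particular failure (a guess from the error text, not confirmed in the thread): the stored value carries a Java-style `L` suffix, which is valid in Java source literals but is rejected by `Long.parseLong`, the method Solr ultimately uses to parse a long field value.

```java
// Demonstrates why a value like "1457572311L" fails to parse as a long:
// Long.parseLong does not accept the trailing 'L' suffix, even though
// that suffix is legal in Java source code literals.
public class LongSuffixDemo {

    public static boolean parsesAsLong(String value) {
        try {
            Long.parseLong(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesAsLong("1457572311"));  // true
        System.out.println(parsesAsLong("1457572311L")); // false
    }
}
```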

The error code is 500, not 40x, so by design it blocks the entire queue.


Do you have a suggestion for dropping this particular item and moving forward?


Pankil

Gabriel Reid

Mar 11, 2016, 3:20:21 AM
to Pankil, hbase-ind...@googlegroups.com
Hi Pankil,

Can you look into the underlying problem as to why Solr is returning a
500 code for this document?

If there is really nothing wrong with Solr, then it sounds like this
is a bug in Solr. I suppose it could be handled by altering the
handling of responses with a 50x status code in hbase-indexer to do
further inspection of the response.

Unfortunately, I don't know of a way to just skip this one document in
the pipeline.

One option might be to remove the indexer and then re-create it; this
will drop everything that is currently in the replication pipeline
and start fresh with new documents after the indexer has been
re-created.

- Gabriel

Pankil

Mar 17, 2016, 4:18:17 PM
to HBase Indexer Users, forp...@gmail.com
Thanks Gabriel.

Arjun K

Mar 21, 2016, 1:08:00 AM
to HBase Indexer Users, forp...@gmail.com
Hi Pankil,

We faced the same issue in our environment. We hit it when trying to
index a column that has multiple values into a single-valued Solr
field.

The workaround/fix is to update the HBase Indexer XML file (the file
where you specify the columns to be indexed); you don't have to drop
the indexer.

For example, if the column xyz has multiple values, you can map it to
a multi-valued field as follows:

<field name="xyz_ss" value="info:xyz"/>
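[Editor's note] For the `_ss` suffix to be picked up as a multi-valued string field, the Solr schema must contain a matching dynamic field. In Solr's stock example schema it is conventionally defined as below; verify against your own schema, since this definition is not quoted from the thread itself.

```xml
<!-- Conventional definition from Solr's example schema: any field name
     ending in _ss becomes a stored, indexed, multi-valued string field. -->
<dynamicField name="*_ss" type="string" indexed="true" stored="true" multiValued="true"/>
```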

Hope this helps
Arjun