Listening to indexing events

Olivier Armand

Jan 7, 2015, 6:33:33 AM
to hbase-ind...@googlegroups.com
Hi,

We need to be able to listen to specific changes that the Lily indexer makes to the Solr collections, so that we can notify consumer systems.
We expected to be able to attach a Java listener to specific indexes and have it receive information about each event (event type, attribute values), but we can't find such a hook. Is this use case currently supported?

Best regards,

-- Olivier

Gabriel Reid

Jan 7, 2015, 11:05:14 AM
to hbase-ind...@googlegroups.com, Olivier Armand
Hi Olivier,

No, this specific use case isn't currently supported within hbase-indexer.

What is supported is listening to the underlying change events in
HBase, by using the same underlying library as hbase-indexer uses.
This is the hbase-sep sub-module within the hbase-indexer project
(https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep). If
this is something that might help you with what you're trying to do,
try checking out the hbase-sep-demo sub-module first to see how it
works.

- Gabriel
> --
> You received this message because you are subscribed to the Google Groups
> "HBase Indexer Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to hbase-indexer-u...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Olivier Armand

Jan 9, 2015, 8:04:38 AM
to hbase-ind...@googlegroups.com, olivier...@gmail.com
Gabriel,

Thank you, SEP could probably do the job.
We are also considering adding a custom command to the indexer's Morphline for our event listener, since one of our concerns is to trigger it not too long before the indexing in Solr. Would you also recommend this alternative?

Best regards,

-- Olivier

Gabriel Reid

Jan 9, 2015, 8:17:45 AM
to hbase-ind...@googlegroups.com, Olivier Armand
Hi Olivier,

Adding a custom command in the indexer's morphline will probably be a
better fit than building your own SEP listener if you want to run your
trigger before indexing happens. The reason for this is that two
different SEP listeners will not necessarily remain in sync -- if one
listener can process incoming messages faster than another (e.g. your
own implementation might be quicker than the full round trip to Solr),
then it will keep processing things at its own speed. In this way,
your custom process could be processing messages before or after they
are indexed in Solr.

If you add a custom command to your indexing morphline, you will at
least have the guarantee that your custom code will always be run
before the related record is indexed in Solr.

- Gabriel

Olivier Armand

Jan 9, 2015, 8:44:13 AM
to hbase-ind...@googlegroups.com, olivier...@gmail.com
I would actually prefer the listener to always be triggered *after* the indexing, but as I understand it, that isn't possible with either SEP or Morphline, since hbase-indexer performs the indexing after the Morphline commands.

-- Olivier

Gabriel Reid

Jan 9, 2015, 9:28:05 AM
to hbase-ind...@googlegroups.com, Olivier Armand
I think that it's also possible to have an explicit "loadSolr"
command in your morphline, and then you could add your custom logic
just after that in your morphline. I guess (although I'm not sure)
that this would also require ensuring that the data doesn't flow any
further out of your morphline after your custom logic has been run,
otherwise the same document will be indexed twice in Solr.

- Gabriel

Olivier Armand

Jan 9, 2015, 9:41:14 AM
to hbase-ind...@googlegroups.com, olivier...@gmail.com
I can't find a way to disable hbase-indexer's automatic indexing after the custom logic.

I read in Cloudera Search's documentation the following notice:
<<< Note: To function properly, the morphline must not contain a loadSolr command. The enclosing Lily HBase Indexer must load documents into Solr, instead the morphline itself. >>>


-- Olivier

Gabriel Reid

Jan 9, 2015, 9:51:51 AM
to hbase-ind...@googlegroups.com, Olivier Armand
Re-adding hbase-indexer-user list.

On Fri, Jan 9, 2015 at 3:50 PM, Gabriel Reid <gabrie...@gmail.com> wrote:
> Yes, unfortunately I'm not at all sure if it's possible to prevent the
> document from going through to the implicit indexing that happens at
> the end of the morphline. It's also possible that there will be other
> unwanted side-effects that occur by attempting to do this, as implied
> by the Cloudera Search documentation.
>
> If it's an option to have your custom logic run before indexing, I
> would suggest just going with your custom command in your morphline
> and sticking with the defaults.
>
> If you do want to find a way of running your own explicit loadSolr
> command and disabling the default, I would suggest checking on the
> CDK-dev list [1] to see if anyone there can give you some advice on
> this.
>
> - Gabriel
>
> 1. https://groups.google.com/a/cloudera.org/forum/#!forum/cdk-dev

Olivier Armand

Jan 9, 2015, 9:55:50 AM
to hbase-ind...@googlegroups.com, olivier...@gmail.com
Alright, thank you for your help.

-- Olivier

Wolfgang Hoschek

Jan 9, 2015, 1:47:10 PM
to Olivier Armand, hbase-ind...@googlegroups.com
Nothing prevents you from running loadSolr or any other custom command as part of your morphline, followed by a dropRecord command to ensure that the morphline never passes any document back to the hbase-indexer framework (otherwise the framework would send the document to Solr again, which is probably not what you want). It's ok to run in this manner.

Just keep in mind that the hbase-indexer framework applies some features when it receives output documents from the morphline, such as the config params "table", "unique-key-formatter", "unique-key-field", "row-field", "column-family-field" and "table-name-field"; see https://github.com/NGDATA/hbase-indexer/wiki/Indexer-configuration#table
Because dropRecord essentially pipes records into /dev/null, you couldn't take advantage of these features. Maybe you don't really need them anyway and that's just fine for your app; I don't know.
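
To make the ordering concrete, here is a toy Java model of the loadSolr, custom command, dropRecord sequence described above. It is only a sketch: the class `DropRecordSketch`, its `Step` interface and all method names are made up for illustration and are not the Kite Morphlines or hbase-indexer API, and the model ignores batching and commits on the Solr side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Toy model only: hypothetical types, NOT the Kite Morphlines or hbase-indexer API.
final class DropRecordSketch {
    // A step either passes the document on or swallows it (Optional.empty()).
    interface Step {
        Optional<String> apply(String doc);
    }

    // Documents that reached the simulated Solr collection.
    final List<String> solr = new ArrayList<>();
    // Notifications sent to the simulated consumer systems.
    final List<String> notified = new ArrayList<>();

    // Stands in for an explicit loadSolr command inside the morphline.
    Step loadSolr() {
        return doc -> { solr.add(doc); return Optional.of(doc); };
    }

    // Stands in for the custom notification command; records whether the doc was already loaded.
    Step notifyConsumers() {
        return doc -> {
            notified.add(doc + " indexed=" + solr.contains(doc));
            return Optional.of(doc);
        };
    }

    // Stands in for dropRecord: nothing flows further.
    Step dropRecord() {
        return doc -> Optional.empty();
    }

    // Runs the steps in order; whatever leaves the morphline is indexed by the framework.
    void run(String doc, List<Step> steps) {
        Optional<String> current = Optional.of(doc);
        for (Step step : steps) {
            if (!current.isPresent()) {
                return; // record was dropped earlier in the chain
            }
            current = step.apply(current.get());
        }
        current.ifPresent(solr::add); // implicit indexing by the framework
    }

    public static void main(String[] args) {
        DropRecordSketch withDrop = new DropRecordSketch();
        withDrop.run("row1", Arrays.asList(
                withDrop.loadSolr(), withDrop.notifyConsumers(), withDrop.dropRecord()));
        System.out.println(withDrop.solr + " " + withDrop.notified);

        DropRecordSketch withoutDrop = new DropRecordSketch();
        withoutDrop.run("row1", Arrays.asList(
                withoutDrop.loadSolr(), withoutDrop.notifyConsumers()));
        System.out.println(withoutDrop.solr);
    }
}
```

With dropRecord at the end, the simulated collection holds one copy of the document and the notification fires after the load step; without it, the framework's implicit indexing adds a second copy.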

Wolfgang.

Olivier Armand

Jan 12, 2015, 3:57:40 AM
to hbase-ind...@googlegroups.com, olivier...@gmail.com
It looks like what we are looking for, many thanks Wolfgang.

-- Olivier

Mathieu Delporte

Jan 13, 2015, 6:06:26 AM
to hbase-ind...@googlegroups.com, olivier...@gmail.com
Hello,

I'm working with Olivier on this solution and, if you allow it, we have one more question for you.
We are experiencing some trouble accessing the HBase row_key inside the morphline. I guess this is related to the fact that we do not use hbase-indexer to push the record into Solr (and maybe because the row_key is not passed to the morphline?).

Is there a workaround (or a solution we do not know of) for accessing the row_key inside the morphline?

regards,
Mathieu

Wolfgang Hoschek

Jan 13, 2015, 7:28:04 AM
to Mathieu Delporte, hbase-ind...@googlegroups.com, olivier...@gmail.com
You can fetch it by pasting this code snippet into a “java” morphline command:

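    // The HBase Result is carried in the record's attachment body.
    org.apache.hadoop.hbase.client.Result result = (org.apache.hadoop.hbase.client.Result) record.getFirstValue("_attachment_body");
    byte[] rowKey = result.getRow();
    record.put("myRowKey", rowKey);
    // Pass the record on to the next command in the morphline.
    return child.process(record);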

Possible gotcha: Make sure that this java command appears *before* the extractHBaseCells morphline command in the morphline config file.
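
Note that the snippet stores the row key as a raw byte[]. If a downstream consumer needs printable text, one option is to decode the key before putting it on the record. The following is a minimal sketch, assuming the keys were written as UTF-8 strings (an assumption; the thread does not say how the keys are encoded). `RowKeyText` and its methods are hypothetical helpers, not part of HBase or Kite.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HBase or Kite: turns a raw row key into printable text.
final class RowKeyText {
    // Assumes the row key was written as UTF-8 text; binary keys need another encoding (see toHex).
    static String asUtf8(byte[] rowKey) {
        return new String(rowKey, StandardCharsets.UTF_8);
    }

    // Lossless fallback for arbitrary binary keys: two lowercase hex digits per byte.
    static String toHex(byte[] rowKey) {
        StringBuilder sb = new StringBuilder(rowKey.length * 2);
        for (byte b : rowKey) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] textKey = "user-42".getBytes(StandardCharsets.UTF_8);
        System.out.println(asUtf8(textKey));
        System.out.println(toHex(new byte[] {0x00, (byte) 0xff, 0x10}));
    }
}
```

Inside the java command, the decoded value could then be stored with record.put("myRowKey", ...) in place of the byte[].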

Mathieu Delporte

Jan 13, 2015, 7:53:59 AM
to hbase-ind...@googlegroups.com, mathieu....@gmail.com, olivier...@gmail.com
Thx, this is working like a charm.

Mathieu Delporte

Jan 20, 2015, 10:17:33 AM
to hbase-ind...@googlegroups.com, mathieu....@gmail.com, olivier...@gmail.com
Hi,

We put the whole solution together and faced an issue.

The morphline:
- 1: retrieves the row key using the custom java command
- 2: extracts the HBase cells
- 3: uses the loadSolr command
- 4: eventually uses the dropRecord command

The issue arises when dropRecord is added at the end of the morphline to prevent Lily from loading the record into Solr. In that case, the record loaded by the morphline is not visible in Solr.
Our guess is that the transaction opened by Lily is not committed in that particular case.

Is there a way to force a commit? Or to tell Lily to commit the record without loading it into Solr a second time?

Best regards.
Mathieu DELPORTE

Wolfgang Hoschek

Jan 20, 2015, 12:47:47 PM
to Mathieu Delporte, hbase-ind...@googlegroups.com, olivier...@gmail.com
Unfortunately hbase-indexer doesn't have a lifecycle API for the ResultToSolrMapper class, e.g. with methods such as commit(). Therefore the batch of docs accumulated by the loadSolr command might never be sent to Solr. A workaround might be to set the batchSize param of the loadSolr command to 1 to force sending the data to Solr immediately. This affects throughput, of course.
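
The effect of batchSize can be illustrated with a toy model of a batching loader that has no commit hook. This is a sketch under stated assumptions: `BatchFlushSketch` is a hypothetical class, not the real loadSolr implementation, and the batch size of 100 is an arbitrary value chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a batching loader. Hypothetical: NOT the real loadSolr implementation.
final class BatchFlushSketch {
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>(); // docs waiting for a full batch
    private final List<String> sent = new ArrayList<>();   // docs that reached the server

    BatchFlushSketch(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
    }

    // Buffers the document; the batch is sent only once it is full.
    void load(String doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            sent.addAll(buffer);
            buffer.clear();
        }
    }

    int sentCount() {
        return sent.size();
    }

    int pendingCount() {
        return buffer.size();
    }

    // Loads `docs` documents and returns how many were actually sent.
    // There is deliberately no commit() or flush() call at the end.
    static int sentAfter(int batchSize, int docs) {
        BatchFlushSketch loader = new BatchFlushSketch(batchSize);
        for (int i = 0; i < docs; i++) {
            loader.load("doc" + i);
        }
        return loader.sentCount();
    }

    public static void main(String[] args) {
        System.out.println(sentAfter(100, 3)); // partial batch stays in the buffer
        System.out.println(sentAfter(1, 3));   // every document is sent immediately
    }
}
```

In this model, three documents with a batch size of 100 never leave the buffer, while a batch size of 1 sends each one as it arrives, at the cost of one send per document.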

Wolfgang.

Mathieu Delporte

Jan 21, 2015, 4:59:53 AM
to hbase-ind...@googlegroups.com, mathieu....@gmail.com, olivier...@gmail.com
Ok thanks,

It seems we are twisting the product a little too much to meet our need (notifying after indexing).
I guess we will adopt the fail-soft approach (notifying just before indexing), which seems more in line with what the product can achieve fairly easily.

regards.