When is the COMMIT_TRANSACTION notification sent?


Robert Fiser

Feb 2, 2015, 6:38:54 AM
to cdk...@cloudera.org
I successfully wrote to Solr with the loadSolr command.
Now I'm trying to write to Elasticsearch with the loadElasticsearch command (https://github.com/sematext/kite-morphlines-elasticsearch).
This command waits for a COMMIT_TRANSACTION notification before it writes to Elasticsearch, but that notification never arrives.
I think the loadSolr command works because of its default batch size and auto-commit mechanism.

My question is: what am I supposed to do so that commits happen correctly through the COMMIT_TRANSACTION notification?

Robert

Wolfgang Hoschek

Feb 2, 2015, 11:57:39 AM
to Robert Fiser, cdk...@cloudera.org
It's up to the client that pipes data into a morphline to send a COMMIT_TRANSACTION notification (or not). For example, the Flume MorphlineSolrSink sends a COMMIT_TRANSACTION to the morphline after each batch of Flume events. Other clients may choose to behave differently. Which client are you using?
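To illustrate the client-driven pattern described here, the following is a minimal, self-contained sketch. It does not use the Kite Morphlines API: the `Stage` interface, the `Notification` enum, `BufferingSink`, and `sendBatch` are all hypothetical stand-ins, chosen only to show a client that sends a commit notification after each batch and a sink that writes nothing until that notification arrives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CommitAfterBatchSketch {

    // Hypothetical notification types; not the Kite Morphlines classes.
    enum Notification { BEGIN_TRANSACTION, COMMIT_TRANSACTION }

    // Hypothetical stand-in for a pipeline stage that receives records and notifications.
    interface Stage {
        void process(String record);
        void onNotification(Notification notification);
    }

    // A sink that only buffers in process() and writes when a commit notification arrives.
    static final class BufferingSink implements Stage {
        private final List<String> pending = new ArrayList<>();
        final List<List<String>> written = new ArrayList<>();

        @Override
        public void process(String record) {
            pending.add(record);
        }

        @Override
        public void onNotification(Notification notification) {
            if (notification == Notification.COMMIT_TRANSACTION && !pending.isEmpty()) {
                written.add(new ArrayList<>(pending)); // one bulk write per commit
                pending.clear();
            }
        }
    }

    // The client decides when to commit: here, once after every batch.
    static void sendBatch(Stage stage, List<String> batch) {
        stage.onNotification(Notification.BEGIN_TRANSACTION);
        for (String record : batch) {
            stage.process(record);
        }
        stage.onNotification(Notification.COMMIT_TRANSACTION);
    }

    public static void main(String[] args) {
        BufferingSink sink = new BufferingSink();
        sendBatch(sink, Arrays.asList("a", "b"));
        sendBatch(sink, Arrays.asList("c"));
        System.out.println(sink.written); // prints [[a, b], [c]]
    }
}
```

If the client never calls the commit step, a sink written this way keeps its records buffered indefinitely, which matches the symptom reported at the top of this thread.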

Wolfgang.

--
You received this message because you are subscribed to the Google Groups "CDK Development" group.

Robert Fiser

Feb 2, 2015, 3:02:37 PM
to cdk...@cloudera.org, robert...@socialbakers.com
I'm using hbase-indexer, but I didn't find any usage of COMMIT_TRANSACTION except in the demo and tests.

Robert

Wolfgang Hoschek

Feb 2, 2015, 3:09:04 PM
to Robert Fiser, cdk...@cloudera.org

Wolfgang Hoschek

Feb 2, 2015, 3:11:20 PM
to robert...@socialbakers.com, cdk...@cloudera.org
Oops, I pasted the wrong link. Here is the right link: https://groups.google.com/d/msg/hbase-indexer-user/eitg7QquxRI/LI61MOi-zigJ

Wolfgang.

Robert Fiser

Feb 2, 2015, 3:38:08 PM
to cdk...@cloudera.org, robert...@socialbakers.com
If I understand correctly, that means I have to implement the auto-commit feature with batchSize in the loadElasticsearch command, because "hbase-indexer doesn't have a lifecycle API for the ResultToSolrMapper class, e.g. with methods such as commit()"?

Robert

Wolfgang Hoschek

Feb 2, 2015, 3:59:28 PM
to Robert Fiser, cdk...@cloudera.org
A work-around would be to essentially use a batchSize of 1 (which has throughput implications, of course).

Wolfgang.

Robert Fiser

Feb 2, 2015, 4:06:10 PM
to cdk...@cloudera.org, robert...@socialbakers.com
But batchSize is a parameter of the loadSolr command. I'm currently dealing with the loadElasticsearch command, which has no batchSize parameter. Also, batchSize=1 will probably degrade performance.

My morphline contains just the extractHbaseCells command and the loadElasticsearch command (https://github.com/sematext/kite-morphlines-elasticsearch).

Robert

Wolfgang Hoschek

Feb 2, 2015, 5:37:03 PM
to Robert Fiser, cdk...@cloudera.org
In this case you'd need to modify the loadES command to send all outstanding data on each process() call. Yes, there are performance implications.
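As a rough sketch of what that modification could look like, the code below buffers documents and flushes them once a configurable batch size is reached. It is not the actual loadElasticsearch implementation: `AutoFlushSink`, `BulkWriter`, and the `batchSize` argument are hypothetical names, and the bulk write to Elasticsearch is replaced by a callback. With `batchSize` set to 1, every process() call sends all outstanding data, which is the behavior suggested above.

```java
import java.util.ArrayList;
import java.util.List;

public class AutoFlushSinkSketch {

    // Hypothetical stand-in for the bulk write to Elasticsearch.
    interface BulkWriter {
        void write(List<String> docs);
    }

    // Buffers documents and flushes without waiting for a commit notification.
    static final class AutoFlushSink {
        private final int batchSize;
        private final BulkWriter writer;
        private final List<String> pending = new ArrayList<>();

        AutoFlushSink(int batchSize, BulkWriter writer) {
            if (batchSize < 1) {
                throw new IllegalArgumentException("batchSize must be >= 1");
            }
            this.batchSize = batchSize;
            this.writer = writer;
        }

        // Called once per record; flushes as soon as the buffer reaches batchSize.
        void process(String doc) {
            pending.add(doc);
            if (pending.size() >= batchSize) {
                flush();
            }
        }

        // Sends all outstanding documents in a single bulk write.
        void flush() {
            if (pending.isEmpty()) {
                return;
            }
            writer.write(new ArrayList<>(pending));
            pending.clear();
        }
    }

    public static void main(String[] args) {
        List<List<String>> calls = new ArrayList<>();

        // batchSize = 1: one write per process() call.
        AutoFlushSink perRecord = new AutoFlushSink(1, calls::add);
        perRecord.process("a");
        perRecord.process("b");
        System.out.println(calls); // prints [[a], [b]]
    }
}
```

With a batch size larger than 1, any records left in a partially filled buffer stay unwritten until something else calls flush(), so a client that never sends a commit would still need a timer or a shutdown hook to drain the remainder. That trade-off is the throughput-versus-latency concern raised in this thread.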

Wolfgang