Does not replicate Impala inserts/deletes


Don Krapohl

Oct 14, 2016, 10:18:29 AM
to reair
We've had great success using ReAir to replicate Hive statements. However, our Impala inserts do not trigger the audit log, so those inserts are never replicated. I've looked through the internals of your source, the Hive source, and the Impala source (just set up as an Apache incubator project!), and I can't find any Thrift hooks that Impala triggers. Any guidance?

Paul Yang

Oct 14, 2016, 5:32:22 PM
to Don Krapohl, reair
Hey Don,

Just to confirm, you're using the Hive metastore Thrift server to manage your metadata? If so, there's a way to set up audit logging for the Thrift metastore server and have ReAir read changes from that instead. This way, you can capture changes that aren't made through the Hive CLI. It's not documented yet, but that's the setup that we have deployed internally at Airbnb. If you want to try it out, take a look at the class `MetastoreAuditLogListener` in the repo and add this to the hive-site.xml for the Thrift server:

  <property>
    <name>hive.metastore.event.listeners</name>
    <value>com.airbnb.reair.hive.hooks.MetastoreAuditLogListener</value>
  </property>

You'll have to make sure that the JAR is in the classpath and that the other configuration variables are set correctly.
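
As a rough sketch of what the combined configuration might look like, the fragment below pairs the listener property above with the three audit-log database properties that are named later in this thread. Every value shown is a placeholder and an assumption (including the use of a MySQL JDBC URL); none of them comes from the thread, and other ReAir settings may also be required.

```xml
<!-- Sketch only: intended for the hive-site.xml read by the Hive metastore Thrift server. -->
<property>
  <name>hive.metastore.event.listeners</name>
  <value>com.airbnb.reair.hive.hooks.MetastoreAuditLogListener</value>
</property>

<!-- Audit-log database connection. All values below are hypothetical placeholders. -->
<property>
  <name>airbnb.reair.metastore.audit_log.jdbc_url</name>
  <!-- Assumption: a MySQL audit-log database; substitute your own JDBC URL. -->
  <value>jdbc:mysql://AUDIT_DB_HOST:3306/AUDIT_DB_NAME</value>
</property>
<property>
  <name>airbnb.reair.metastore.audit_log.db.username</name>
  <value>AUDIT_DB_USER</value>
</property>
<property>
  <name>airbnb.reair.metastore.audit_log.db.password</name>
  <value>AUDIT_DB_PASSWORD</value>
</property>
```

Because the listener runs inside the metastore Thrift server, these properties only take effect in the hive-site.xml that this server process loads, and the server typically needs a restart to pick them up.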

There is one issue with this approach, though: there's a bug in some versions of Hive (at least 0.13) where the listener callback for the exchange partition call is not implemented properly. Consequently, exchange partition calls (https://cwiki.apache.org/confluence/display/Hive/Exchange+Partition) won't be replicated.

Cheers,
Paul

Don Krapohl

Oct 27, 2016, 12:00:07 PM
to reair
We have this configured and are executing queries through Hue and impala-shell. Neither is replicating, and we are getting no debug messages about Impala inserts. Hive replication works fine. None of our service logs show that we're missing JARs. We are setting these values on the source Hive, but no audit triggers are firing:
airbnb.reair.metastore.audit_log.db.password
airbnb.reair.metastore.audit_log.db.username
airbnb.reair.metastore.audit_log.jdbc_url

When you say to add these to hive-site.xml, which one do you mean? We use Cloudera's distribution, so the candidates are Llama (which we don't use), the Impala daemon safety valve for hive-site, the Hive metastore hive-site, and the HiveServer2 hive-site. We can't seem to get audit triggers to fire no matter which one we use.

Paul Yang

Oct 27, 2016, 2:14:37 PM
to Don Krapohl, reair
Are you running the Hive metastore Thrift server, and have you configured your clients to connect to it?


Don Krapohl

Nov 1, 2016, 10:53:10 AM
to reair
Paul,

We are running our Hive metastore Thrift server. Our APIs and impala-shell connect to Impala's port 21050, but how that interacts with the Hive metastore Thrift server is not clear.

Don Krapohl

Nov 3, 2016, 11:55:35 AM
to reair
We've decided to just use batch replication, since our maximum 2-hour replication load would be somewhat less than 60 TB, which we can sustain. I'd still like to see if we can get incremental replication of Impala-inserted data going. We'll fork the repo in the next couple of months and see if we can contribute.