InputFormat to TAP memcached under couchbase


Corey Nolet

Mar 10, 2014, 11:52:57 PM
to couc...@googlegroups.com
I recently tried the Sqoop connector for Couchbase 2 and it doesn't appear to be working as expected. I have written my own InputFormat here:


I haven't had a chance to test it yet, but I wanted to know whether moxi would make it hard to get the locality that I'm expecting from each of the memcached instances. When I connect to a memcached instance (backing Couchbase) on port 11211, will each of those memcached instances give me ALL of the keys in Couchbase? Or will each one give me only the keys that it holds itself?


Thanks!

Aliaksey Kandratsenka

Mar 11, 2014, 12:10:37 AM
to couc...@googlegroups.com
It looks like you're expecting moxi to support TAP, but moxi does not support TAP.

Corey Nolet

Mar 11, 2014, 2:02:44 PM
to couc...@googlegroups.com
This kind of answers my question. So if I hit port 11211 and do a TAP on the underlying memcached instance, I will get ONLY the keys that exist on that memcached instance, correct? Will I get duplicate keys on different nodes because of the replicas?

Thanks!

Corey Nolet

Mar 12, 2014, 4:03:07 PM
to couc...@googlegroups.com
Would it be possible for someone to provide me with a working example of how to use the TapClient in Couchbase/memcached with a Couchbase Server installation?

I've been banging my head against the wall for days on this. I need to be able to dump my Couchbase keys/values into HDFS every hour so I can map/reduce over them. I'm using CDH3u4, and the Sqoop connector freezes when it begins its map/reduce job. Unfortunately, I do not have the luxury of upgrading to the CDH4 version of Sqoop, but I've seen people complaining of the same problems with that version.

What I've tried is using the TapClient with both the Couchbase libraries and the spymemcached libraries in Java. Even with exponential backoff, I can't seem to get the TapClient to return a message that I can pull a key and a value from (it appears I get null from getNextMessage() even with a generous timeout of 5 minutes).
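For readers following along, the retry pattern described above (poll, treat null as "nothing yet", wait longer each time) can be sketched roughly as below. This is only an illustration: `BackoffPoller` and `pollWithBackoff` are hypothetical names, and the `Supplier` stands in for whatever call actually fetches the next TAP message, so no client library is needed to run it.

```java
import java.util.function.Supplier;

final class BackoffPoller {
    /**
     * Polls `source` until it yields a non-null value or `maxAttempts` polls
     * have been made. The wait starts at initialDelayMs and doubles after each
     * empty poll, capped at maxDelayMs. Returns null if nothing arrived.
     */
    static <T> T pollWithBackoff(Supplier<T> source, int maxAttempts,
                                 long initialDelayMs, long maxDelayMs) {
        long delay = initialDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T value = source.get();
            if (value != null) {
                return value; // got a message, stop polling
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and give up quietly.
                    Thread.currentThread().interrupt();
                    return null;
                }
                delay = Math.min(delay * 2, maxDelayMs);
            }
        }
        return null; // every poll came back empty
    }
}
```

With a real client, the supplier would wrap the message fetch, for example `() -> tapClient.getNextMessage()`, where `tapClient` is assumed to be an already connected client. Backoff only changes how long the caller waits between polls, so if every poll returns null the cause is more likely the stream or the client itself than the timing.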

What can I do to get this to work? I've been using Couchbase behind Twitter Storm to help with caching for CEP. I've also been using it, together with ElasticSearch, as a real-time query engine over the underlying CEP cache for my customer. If I can't dump the data out to HDFS directly, then I may need to look at other options. I am trying to stay away from views because I want to hit memory directly. I'd also like to preserve data locality if possible (connect directly to memcached, or tell Couchbase exactly which node(s) I'd like to retrieve keys from).
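To make the locality goal above concrete, here is a minimal sketch of how a key can be mapped to a vBucket and from there to a node. It assumes the hashing scheme used by the Couchbase Java client as I understand it (CRC32 of the key, bits 16 to 30, masked by a power-of-two vBucket count, typically 1024). `VBucketSketch`, `vbucketFor`, `activeNode`, and the node map layout are hypothetical; a real vBucket map comes from the cluster configuration.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class VBucketSketch {
    /** Maps a key to a vBucket id; numVBuckets must be a power of two. */
    static int vbucketFor(String key, int numVBuckets) {
        if (numVBuckets <= 0 || Integer.bitCount(numVBuckets) != 1) {
            throw new IllegalArgumentException("numVBuckets must be a power of two");
        }
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        // Keep bits 16..30 of the checksum, then mask down to the vBucket range.
        int digest = (int) ((crc.getValue() >> 16) & 0x7fff);
        return digest & (numVBuckets - 1);
    }

    /**
     * Looks up the active node for a key in a hypothetical map where
     * vbucketMap[vb][0] is the index of the active node and the remaining
     * entries are the replica nodes for that vBucket.
     */
    static int activeNode(String key, int[][] vbucketMap) {
        return vbucketMap[vbucketFor(key, vbucketMap.length)][0];
    }
}
```

Under this assumed layout, an InputFormat could build one split per node and treat only the vBuckets whose first entry points at that node as local to it, which is also where the distinction between active copies and replicas would show up.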

What are my options here?


I'm wondering if BigCouch would allow me to do this effectively.

Thanks much!

Corey Nolet

Mar 12, 2014, 9:48:51 PM
to couc...@googlegroups.com
I *think* I may have isolated this issue to a client version, though it doesn't make sense to me why the Sqoop plugin isn't working. I'm going to try upgrading my client libs to the newest version.

Corey Nolet

Apr 30, 2014, 9:00:25 PM
to couc...@googlegroups.com
I wanted to post back here that I have solved the problem I was having with the input format in the Sqoop plugin. It was using the membase client to perform the TAP. Changing this to the Couchbase client made it work. I figured it would be useful to have this here in case other users run into the same issue. In the meantime, I did put the updated version of the input format here:


I've done some work with Couchbase + Elasticsearch + TinkerPop's Gremlin. I'm grabbing snapshots of graphs from Couchbase every hour with the InputFormat, and it appears to be working well. It would definitely be faster if I were able to apply filters at the TAP level, though I know that's a complicated thing to ask for.