Error running with Bigtable Hbase client


akshar dave

Oct 27, 2017, 4:31:31 PM
to Google Cloud Bigtable Discuss
Hi

While using bigtable-hbase-1.x-hadoop-1.0.0-pre3.jar, a call to the HBase client fails while performing flushCommits against Bigtable.

Here is grpc session info:

2017-10-27 17:37:51,462 INFO  grpc.BigtableSession - Bigtable options: BigtableOptions{dataHost=bigtable.googleapis.com, tableAdminHost=bigtableadmin.googleapis.com, instanceAdminHost=bigtableadmin.googleapis.com, projectId=xxxxxx-dev, instanceId=big-table-nutch-test, userAgent=hbase-1.2.0-cdh5.13.0, credentialType=DefaultCredentials, port=443, dataChannelCount=20, retryOptions=RetryOptions{retriesEnabled=true, allowRetriesWithoutTimestamp=false, statusToRetryOn=[INTERNAL, DEADLINE_EXCEEDED, ABORTED, UNAUTHENTICATED, UNAVAILABLE], initialBackoffMillis=5, maxElapsedBackoffMillis=60000, backoffMultiplier=2.0, streamingBufferSize=60, readPartialRowTimeoutMillis=60000, maxScanTimeoutRetries=3}, bulkOptions=BulkOptions{asyncMutatorCount=2, useBulkApi=true, bulkMaxKeyCount=25, bulkMaxRequestSize=1048576, autoflushMs=0, maxInflightRpcs=1000, maxMemory=93218406, enableBulkMutationThrottling=false, bulkMutationRpcTargetMs=100}, callOptionsConfig=CallOptionsConfig{useTimeout=false, shortRpcTimeoutMs=60000, longRpcTimeoutMs=600000}, usePlaintextNegotiation=false}.
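(As an aside, the retryOptions in that dump imply an exponential backoff schedule. Here is a minimal stdlib sketch, not the client's actual code, of how initialBackoffMillis=5 and backoffMultiplier=2.0 grow retry delays within the 60s maxElapsedBackoffMillis budget:)

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the exponential backoff implied by the RetryOptions above (not the client's code). */
public class BackoffSketch {
    static List<Long> schedule(long initialMillis, double multiplier, long maxElapsedMillis) {
        List<Long> delays = new ArrayList<>();
        long elapsed = 0;
        double next = initialMillis;
        // Keep doubling the delay until the total elapsed backoff would exceed the budget.
        while (elapsed + (long) next <= maxElapsedMillis) {
            delays.add((long) next);
            elapsed += (long) next;
            next *= multiplier;
        }
        return delays;
    }

    public static void main(String[] args) {
        // initialBackoffMillis=5, backoffMultiplier=2.0, maxElapsedBackoffMillis=60000
        List<Long> delays = schedule(5, 2.0, 60_000);
        System.out.println(delays); // 5, 10, 20, ... doubling until the 60s budget is spent
    }
}
```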

Getting following error:

2017-10-27 17:37:51,660 ERROR store.HBaseStore - Failed 1 action: UnsupportedOperationException: 1 time, servers with issues: bigtable.googleapis.com
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: UnsupportedOperationException: 1 time, servers with issues: bigtable.googleapis.com
at com.google.cloud.bigtable.hbase.BigtableBufferedMutator.handleExceptions(BigtableBufferedMutator.java:271)
at com.google.cloud.bigtable.hbase.BigtableBufferedMutator.mutate(BigtableBufferedMutator.java:198)
at org.apache.gora.hbase.store.HBaseTableConnection.flushCommits(HBaseTableConnection.java:115)
at org.apache.gora.hbase.store.HBaseTableConnection.close(HBaseTableConnection.java:127)
at org.apache.gora.hbase.store.HBaseStore.close(HBaseStore.java:819)
at org.apache.gora.mapreduce.GoraRecordWriter.close(GoraRecordWriter.java:56)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Thanks
Akshar

Solomon Duskis

Oct 28, 2017, 8:52:16 PM
to akshar dave, Google Cloud Bigtable Discuss
That exception by itself is not enough to give you an answer. The UnsupportedOperationException is a bit strange; I'm not sure where it's coming from. Here's a guide on getting more information from a RetriesExhaustedWithDetailsException, which may or may not help you, since neither Gora nor BigtableBufferedMutator is under your control.

This seems like a client-side issue, likely some strange interaction between our library and Gora. Unfortunately, I don't have much experience with Gora.


Solomon Duskis | Google Cloud Bigtable Tech Lead | sdu...@google.com | 914-462-0531


--
You received this message because you are subscribed to the Google Groups "Google Cloud Bigtable Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-cloud-bigtable-discuss+unsub...@googlegroups.com.
To post to this group, send email to google-cloud-bigtable-discuss@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/google-cloud-bigtable-discuss/d8958ff5-3cc7-43e5-8ac5-c2dfa41bedba%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Akshar Dave

Oct 30, 2017, 4:04:21 PM
to Solomon Duskis, Google Cloud Bigtable Discuss
I pinged the Gora community to see if they can point me toward additional troubleshooting steps to get further details.

Solomon Duskis

Oct 30, 2017, 6:18:23 PM
to Akshar Dave, Google Cloud Bigtable Discuss
We're going to take a stab at reproducing this in the near future.

-Solomon


Solomon Duskis | Google Cloud Bigtable Tech Lead | sdu...@google.com | 914-462-0531

Akshar Dave

Oct 30, 2017, 9:01:21 PM
to Solomon Duskis, Google Cloud Bigtable Discuss
Glad to hear it 👍 Thanks! Looking forward to it.

Akshar

Solomon Duskis

Nov 28, 2017, 10:01:11 AM
to Akshar Dave, Google Cloud Bigtable Discuss
It took us a while, but we found the problem. There's a subtle difference between Bigtable and HBase regarding deletes, and these lines in Gora are the problem. HBase relies on a delete by family plus a timestamp to work effectively, but Cloud Bigtable does not support a family delete with a specific timestamp, due to the way Bigtable works.

FWIW, the new Cloud Bigtable client (1.0.0) gives a better error message than the one you first posted, which is how we found this problem.  Fixing this subtle issue would require further discussions with the Apache Gora developers.
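To make the difference concrete: in the HBase API, a Delete constructed with a timestamp pins its family deletes to that timestamp, and that is exactly the variant Bigtable rejects. Here is a tiny stdlib model of the distinction (hypothetical names, not the real client code):

```java
/** Hypothetical model of the delete variants discussed above (not the actual client API). */
public class DeleteCheck {
    // Mirrors HConstants.LATEST_TIMESTAMP, the default when no timestamp is given.
    static final long LATEST_TIMESTAMP = Long.MAX_VALUE;

    /**
     * Returns true if the mutation would be supported by Cloud Bigtable:
     * row-level deletes and whole-family deletes are fine, but a family
     * delete pinned to a specific timestamp is not.
     */
    static boolean bigtableSupports(boolean familyDelete, long timestamp) {
        if (!familyDelete) {
            return true; // plain row deletes are supported
        }
        return timestamp == LATEST_TIMESTAMP; // family delete only without a specific ts
    }

    public static void main(String[] args) {
        // new Delete(keyRaw)            -> family deletes at LATEST_TIMESTAMP: supported
        System.out.println(bigtableSupports(true, LATEST_TIMESTAMP)); // true
        // new Delete(keyRaw, someTs)    -> family delete at a fixed timestamp: rejected
        System.out.println(bigtableSupports(true, 1509132000000L));   // false
    }
}
```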


Solomon Duskis | Google Cloud Bigtable Tech Lead | sdu...@google.com | 914-462-0531


Solomon Duskis

Jan 9, 2018, 1:19:50 PM
to Akshar Dave, Google Cloud Bigtable Discuss
The stack trace relates to the fact that the HBase Hadoop integration doesn't properly close connections. That doesn't actually affect anything.

I don't have a clear sense of what the problem is; we didn't see it on our side. I'll see if we can recreate the issue.


Solomon Duskis | Google Cloud Bigtable Tech Lead | sdu...@google.com | 914-462-0531

On Wed, Jan 3, 2018 at 6:05 PM, Akshar Dave <aksha...@gmail.com> wrote:
Finally got to trying out the recommended change ...

The inject, generate, fetch, and parse phases have all worked. The change does get rid of the failure, and I can also see the crawled HTML in the Bigtable table.

I only did about 6 URLs, but for some reason the updatedb phase is still running after 15 minutes and is only at 28%.

The jobs do always output this error a bunch of times at the start, but it doesn't seem to affect the job:

Jan 03, 2018 9:49:56 PM com.google.bigtable.repackaged.io.grpc.internal.ManagedChannelImpl$ManagedChannelReference cleanQueue
SEVERE: *~*~*~ Channel com.google.bigtable.repackaged.io.grpc.internal.ManagedChannelImpl-35 for target bigtable.googleapis.com:443 was not shutdown properly!!! ~*~*~*
    Make sure to call shutdown()/shutdownNow() and awaitTermination().
java.lang.RuntimeException: ManagedChannel allocation site
    at com.google.bigtable.repackaged.io.grpc.internal.ManagedChannelImpl$ManagedChannelReference.<init>(ManagedChannelImpl.java:991)
    at com.google.bigtable.repackaged.io.grpc.internal.ManagedChannelImpl.<init>(ManagedChannelImpl.java:421)
    at com.google.bigtable.repackaged.io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:329)
    at com.google.bigtable.repackaged.com.google.cloud.bigtable.grpc.BigtableSession.createNettyChannel(BigtableSession.java:483)
    at com.google.bigtable.repackaged.com.google.cloud.bigtable.grpc.BigtableSession$3.create(BigtableSession.java:398)
    at com.google.bigtable.repackaged.com.google.cloud.bigtable.grpc.io.ChannelPool.<init>(ChannelPool.java:246)
    at com.google.bigtable.repackaged.com.google.cloud.bigtable.grpc.BigtableSession.createChannelPool(BigtableSession.java:401)
    at com.google.bigtable.repackaged.com.google.cloud.bigtable.grpc.BigtableSession.createManagedPool(BigtableSession.java:413)
    at com.google.bigtable.repackaged.com.google.cloud.bigtable.grpc.BigtableSession.getDataChannelPool(BigtableSession.java:276)
    at com.google.bigtable.repackaged.com.google.cloud.bigtable.grpc.BigtableSession.<init>(BigtableSession.java:236)
    at org.apache.hadoop.hbase.client.AbstractBigtableConnection.<init>(AbstractBigtableConnection.java:143)
    at com.google.cloud.bigtable.hbase1_x.BigtableConnection.<init>(BigtableConnection.java:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:218)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
    at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:131)
    at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:97)
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:156)
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:130)
    at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:78)
    at org.apache.nutch.storage.StorageUtils.initMapperJob(StorageUtils.java:133)
    at org.apache.nutch.storage.StorageUtils.initMapperJob(StorageUtils.java:122)
    at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:224)
    at org.apache.nutch.crawl.GeneratorJob.generate(GeneratorJob.java:260)
    at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:326)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.crawl.GeneratorJob.main(GeneratorJob.java:334)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
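The warning above is asking for an explicit lifecycle shutdown. gRPC's ManagedChannel exposes the same shutdown()/shutdownNow()/awaitTermination() contract as the stdlib ExecutorService, so the pattern the log message wants can be sketched with pure JDK code:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** The shutdown pattern the warning asks for, shown on a stdlib ExecutorService
 *  (gRPC's ManagedChannel follows the same shutdown()/awaitTermination() contract). */
public class ShutdownSketch {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> System.out.println("work"));

        pool.shutdown();                                    // stop accepting new work
        if (!pool.awaitTermination(10, TimeUnit.SECONDS)) { // wait for in-flight work
            pool.shutdownNow();                             // force-cancel if it hangs
        }
        System.out.println(pool.isTerminated());
    }
}
```

In the Nutch/Gora setup this would mean making sure the Connection (and through it the BigtableSession) is closed before the JVM exits, rather than relying on finalization.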

Looks like this is likely why the job is running so long ...

Task attempt attempt_1507853061749_0037_r_000002_0 is done from TaskUmbilicalProtocol's point of view. However, it stays in finishing state for too long
Container killed by the ApplicationMaster. Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

It's in the logs for all the completed tasks of the updatedb job.

It's saying that all of the tasks are getting killed by the ApplicationMaster because they aren't cleaning themselves up. Could it be that there is zero work for these tasks to do since I only used 6 URLs, so they aren't properly reporting a finished state when they have nothing to do?

Thanks for all the help so far. 

Akshar

On Thu, Nov 30, 2017 at 10:39 AM, Akshar Dave <aksha...@gmail.com> wrote:
Great, thanks! I will look into building Gora locally and try again. Is there a more recent release after pre3?

On Thu, Nov 30, 2017 at 9:10 AM Solomon Duskis <sdu...@google.com> wrote:
Yes.  Changing one line will get you moving forward.


Delete delete = new Delete(keyRaw, timeStamp - PUTS_AND_DELETES_DELETE_TS_OFFSET);

should be 

Delete delete = new Delete(keyRaw);

Then you can build Gora/Nutch locally with snapshots. I would also suggest upgrading the bigtable-hbase dependency to release 1.0.0.
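If you're on Maven, the upgrade would look something like the following (coordinates assumed from the jar name earlier in this thread; double-check the exact artifactId for your setup):

```xml
<dependency>
  <groupId>com.google.cloud.bigtable</groupId>
  <artifactId>bigtable-hbase-1.x-hadoop</artifactId>
  <version>1.0.0</version>
</dependency>
```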

-Solomon

On Thu, Nov 30, 2017, 11:57 AM Akshar Dave <aksha...@gmail.com> wrote:
Apologies for the DM ... Thanks a lot for figuring out the root cause. Is there any way to use Bigtable in the meantime, since we are not interested in the delete workflow and are fine with the data staying in the backend until this is resolved? Would you try commenting out those lines to get it working?