Zookeeper disconnect

pixalsoft

Nov 3, 2013, 12:56:32 PM
to lily-d...@googlegroups.com
Hi,

We are using Lily 2.0. For the second time in the last two weeks our live site went down. I have provided lily-client and server logs below.

I believe the primary reason is the ZooKeeper disconnect. In a separate thread, Bruno mentioned that Lily should be able to survive a ZooKeeper disconnect. Do we need to move to the latest Lily version?
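
From what I understand of the ZooKeeper client API, a disconnect (KeeperState.Disconnected) is transient and the client library reconnects on its own, while a session expiry (KeeperState.Expired) deletes the ephemeral nodes and forces the application to rebuild its ZooKeeper handle. Here is a minimal sketch of how a client typically tells the two apart; this is not Lily's actual code, and the connect string and timeout are made-up values:

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class DisconnectWatcher implements Watcher {
        @Override
        public void process(WatchedEvent event) {
            switch (event.getState()) {
                case SyncConnected:
                    // (re)connected within the session timeout: same session,
                    // watches and ephemeral nodes survive
                    System.out.println("ZooKeeper connected");
                    break;
                case Disconnected:
                    // transient: the client library keeps trying to reconnect by itself
                    System.out.println("ZooKeeper disconnected");
                    break;
                case Expired:
                    // session gone: ephemeral nodes (e.g. leader-election nodes) are
                    // deleted and a new ZooKeeper handle has to be created
                    System.out.println("ZooKeeper session expired");
                    break;
                default:
                    break;
            }
        }

        public static void main(String[] args) throws Exception {
            // example connect string and session timeout only
            ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 30000, new DisconnectWatcher());
            Thread.sleep(Long.MAX_VALUE);
        }
    }

Judging from the server logs below, Lily does seem to handle the Disconnected case: it gives up its leader positions and re-acquires them a few seconds later.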

For ZooKeeper, I have also increased syncLimit to 10 with a tickTime of 2000, but that doesn't seem to help.
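
For reference, this is what I set in zoo.cfg; the comments about session timeouts are my reading of the ZooKeeper docs, not something we configured explicitly:

    # base time unit, in milliseconds
    tickTime=2000
    # how many ticks a follower may lag behind the leader (default is 5)
    syncLimit=10
    # note: client session timeouts are negotiated between 2*tickTime and
    # 20*tickTime unless minSessionTimeout/maxSessionTimeout are set, so
    # raising syncLimit on its own doesn't lengthen client sessions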

thanks
Prashant


The trouble starts like this. Lily-client logs:
2013-11-03 16:19:19,267  ZooKeeper disconnected at Sun Nov 03 16:45:11 UTC 2013
ZooKeeper connected at Sun Nov 03 16:45:14 UTC 2013
ZooKeeper connected at Sun Nov 03 16:45:22 UTC 2013
470745027 [TypeManager cache refresher] ERROR org.lilyproject.client.RemoteSchemaCache  - Error refreshing type manager cache. Cache is possibly out of date!
2013-11-03 16:49:14,597  java.lang.reflect.UndeclaredThrowableException
    at $Proxy9.getTypesWithoutCache(Unknown Source)
    at org.lilyproject.repository.impl.AbstractSchemaCache.refreshAll(AbstractSchemaCache.java:338)
    at org.lilyproject.repository.impl.AbstractSchemaCache.access$800(AbstractSchemaCache.java:63)
    at org.lilyproject.repository.impl.AbstractSchemaCache$CacheRefresher.run(AbstractSchemaCache.java:596)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=10, exceptions:
Sun Nov 03 16:45:42 UTC 2013, org.apache.hadoop.hbase.client.ScannerCallable@1e30a83e, java.net.SocketTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending


Lily-server logs:
[WARN   ] <2013-11-03 16:45:16,678> (org.lilyproject.util.zookeeper.StateWatchingZooKeeper): Disconnected from ZooKeeper
[INFO   ] <2013-11-03 16:45:16,685> (org.lilyproject.util.zookeeper.LeaderElection): No longer leader for the position of RowLog Processor mq
[INFO   ] <2013-11-03 16:45:16,685> (org.lilyproject.util.zookeeper.LeaderElection): No longer leader for the position of RowLog Processor wal
[INFO   ] <2013-11-03 16:45:16,685> (org.lilyproject.util.zookeeper.LeaderElection): No longer leader for the position of Blob Incubator Monitor
[INFO   ] <2013-11-03 16:45:16,685> (org.lilyproject.util.zookeeper.LeaderElection): No longer leader for the position of Indexer Master
[INFO   ] <2013-11-03 16:45:16,687> (org.lilyproject.rowlog.impl.RowLogProcessorElection): Shutting down row log processor for wal
[INFO   ] <2013-11-03 16:45:16,689> (org.lilyproject.rowlog.impl.RowLogProcessorElection): Shutting down row log processor for mq
[INFO   ] <2013-11-03 16:45:16,689> (org.lilyproject.indexer.master.IndexerMaster): Shutting down as indexer master.
[INFO   ] <2013-11-03 16:45:16,736> (org.lilyproject.rowlog.impl.RowLogProcessorElection): Shutdown of row log processor sucessful for wal
[WARN   ] <2013-11-03 16:45:16,736> (org.lilyproject.util.zookeeper.StateWatchingZooKeeper): Connected to ZooKeeper
[INFO   ] <2013-11-03 16:45:16,751> (org.lilyproject.util.zookeeper.LeaderElection): Elected as leader for the position of RowLog Processor mq
[INFO   ] <2013-11-03 16:45:16,765> (org.lilyproject.rowlog.impl.RowLogProcessorElection): Shutdown of row log processor sucessful for mq
[INFO   ] <2013-11-03 16:45:16,765> (org.lilyproject.rowlog.impl.RowLogProcessorElection): Starting row log processor for mq
[INFO   ] <2013-11-03 16:45:16,775> (org.lilyproject.util.zookeeper.LeaderElection): Elected as leader for the position of RowLog Processor wal
[INFO   ] <2013-11-03 16:45:16,775> (org.lilyproject.rowlog.impl.RowLogProcessorElection): Starting row log processor for wal
[INFO   ] <2013-11-03 16:45:16,821> (org.lilyproject.util.zookeeper.LeaderElection): Elected as leader for the position of Blob Incubator Monitor
[INFO   ] <2013-11-03 16:45:16,823> (org.lilyproject.util.zookeeper.LeaderElection): Elected as leader for the position of Indexer Master
[INFO   ] <2013-11-03 16:45:18,773> (org.lilyproject.indexer.master.IndexerMaster): Shutdown as indexer master successful.
[INFO   ] <2013-11-03 16:45:18,773> (org.lilyproject.indexer.master.IndexerMaster): Starting up as indexer master.
[INFO   ] <2013-11-03 16:45:18,773> (org.lilyproject.indexer.master.IndexerMaster): Startup as indexer master successful.
[INFO   ] <2013-11-03 16:45:27,665> (org.lilyproject.rowlog.impl.RowLogProcessorImpl): Maximum global queue scan threads set to 1
[INFO   ] <2013-11-03 16:45:27,666> (org.lilyproject.rowlog.impl.RowLogProcessorImpl): Maximum global queue scan threads set to 1
[INFO   ] <2013-11-03 16:45:27,667> (org.lilyproject.rowlog.impl.RowLogProcessorElection): Startup of row log processor successful for mq
[INFO   ] <2013-11-03 16:45:27,668> (org.lilyproject.rowlog.impl.RowLogProcessorImpl): RowLog scan batch size (on each shard/split): 1000
[INFO   ] <2013-11-03 16:45:27,668> (org.lilyproject.rowlog.impl.RowLogProcessorImpl): RowLog messages work queue size: 1000
[INFO   ] <2013-11-03 16:45:27,697> (org.lilyproject.rowlog.impl.RowLogProcessorImpl): RowLog scan batch size (on each shard/split): 1000
[INFO   ] <2013-11-03 16:45:27,697> (org.lilyproject.rowlog.impl.RowLogProcessorImpl): RowLog messages work queue size: 1000
[INFO   ] <2013-11-03 16:45:27,698> (org.lilyproject.rowlog.impl.RowLogProcessorElection): Startup of row log processor successful for wal
[WARN   ] <2013-11-03 16:50:39,121> (org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation): Encountered problems when prefetch META table:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=10, exceptions:
Sun Nov 03 16:47:24 UTC 2013, org.apache.hadoop.hbase.client.HTable$4@79871bba, java.net.SocketTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=slave1.truegether.com/10.185.6.143:60020]
Sun Nov 03 16:47:45 UTC 2013, org.apache.hadoop.hbase.client.HTable$4@79871bba, java.net.SocketTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=slave1.truegether.com/10.185.6.143:60020]
Sun Nov 03 16:48:07 UTC 2013, org.apache.hadoop.hbase.client.HTable$4@79871bba, java.net.SocketTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=slave1.truegether.com/10.185.6.143:60020]
Sun Nov 03 16:48:14 UTC 2013, org.apache.hadoop.hbase.client.HTable$4@79871bba, java.net.NoRouteToHostException: No route to host
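
The HBase-side errors in both logs are the stock client retry settings running out (10 attempts, 20-second connect timeout) while slave1 was unreachable. In case it helps to see the knobs involved, this is a rough sketch of how I understand the HBase client settings, not our actual code; the property names are from the HBase docs of that era, and where Lily's client picks up its HBase configuration is something I'd still need to check:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class HBaseClientTimeouts {
        public static Configuration tunedConf() {
            Configuration conf = HBaseConfiguration.create();
            // "Failed after attempts=10" matches the default retry count
            conf.setInt("hbase.client.retries.number", 20);
            // base pause between retries, in milliseconds (retries back off from this)
            conf.setLong("hbase.client.pause", 2000);
            // the "20000 millis timeout ... ready for connect" looks like the
            // HBase IPC client's default socket connect timeout
            conf.setInt("ipc.socket.timeout", 60000);
            return conf;
        }
    }

(The java.net.NoRouteToHostException at the end is against slave1, in case that's relevant.)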