Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
Master died after failed assignment of regionserver
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  4 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Matt Davies  
View profile  
 More options Jun 27 2011, 12:54 pm
From: Matt Davies <matt.dav...@tynt.com>
Date: Mon, 27 Jun 2011 10:54:28 -0600
Local: Mon, Jun 27 2011 12:54 pm
Subject: Master died after failed assignment of regionserver

Hey all,

Interesting problem I'm running into.  It appears that our master has died
after a failed region assignment.  Any ideas as to why?

We are running version 0.90.1 in this cluster.
H3M1 = master
H3S* = regionservers

out on H3S3 the regionserver process is running (but didn't respond per the
log)

2011-06-27 10:18:22,517 INFO
org.apache.hadoop.hbase.master.handler.DisableTableHandler: Attemping to
disable table ContentGuidance
2011-06-27 10:18:22,522 INFO
org.apache.hadoop.hbase.master.handler.DisableTableHandler: Offlining 1
regions.
2011-06-27 10:18:23,527 INFO
org.apache.hadoop.hbase.master.handler.DisableTableHandler: Disabled table
is done=true
2011-06-27 10:18:24,536 INFO
org.apache.hadoop.hbase.master.handler.TableEventHandler: Handling table
operation C_M_DELETE_TABLE on table ContentGuidance
2011-06-27 10:18:24,584 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
Deleted region
ContentGuidance,,1298676132327.9d63b6173c7307c27aa9728f738f0ef3. from META
2011-06-27 10:19:17,296 WARN org.apache.hadoop.hbase.zookeeper.ZKTable:
Moving table ContentGuidance state to enabled but was already enabled
2011-06-27 10:19:17,313 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
HLog configuration: blocksize=64 MB, rollsize=60.8 MB, enabled=true,
flushlogentries=1, optionallogflushinternal=1000ms
2011-06-27 10:19:17,349 INFO
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs
-- HDFS-200
2011-06-27 10:19:17,350 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
New hlog
/hbase/ContentGuidance/f38bbc9bc672b5ea415b6dcc7e1da43a/.logs/hlog.13091915 57313
2011-06-27 10:19:17,350 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
Using getNumCurrentReplicas--HDFS-826
2011-06-27 10:19:17,433 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Onlined ContentGuidance,,1309191557292.f38bbc9bc672b5ea415b6dcc7e1da43a.;
next sequenceid=1
2011-06-27 10:19:17,438 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
Added region
ContentGuidance,,1309191557292.f38bbc9bc672b5ea415b6dcc7e1da43a. to META
2011-06-27 10:19:17,439 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed ContentGuidance,,1309191557292.f38bbc9bc672b5ea415b6dcc7e1da43a.
2011-06-27 10:19:17,439 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
IPC Server handler 87 on 60000.logSyncer exiting
2011-06-27 10:19:17,470 INFO
org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning 1 region(s)
round-robin across 4 server(s)
2011-06-27 10:19:17,472 INFO
org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning done
2011-06-27 10:19:17,480 INFO
org.apache.hadoop.hbase.master.AssignmentManager: H3S4,60020,1308946657703
unassigned znodes=1 of total=1
2011-06-27 10:19:21,530 INFO
org.apache.hadoop.hbase.master.handler.DisableTableHandler: Attemping to
disable table ContentGuidance
2011-06-27 10:19:21,536 INFO
org.apache.hadoop.hbase.master.handler.DisableTableHandler: Offlining 1
regions.
2011-06-27 10:19:22,543 INFO
org.apache.hadoop.hbase.master.handler.DisableTableHandler: Disabled table
is done=true
2011-06-27 10:19:23,544 INFO
org.apache.hadoop.hbase.master.handler.TableEventHandler: Handling table
operation C_M_DELETE_TABLE on table ContentGuidance
2011-06-27 10:19:23,551 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
Deleted region
ContentGuidance,,1309191557292.f38bbc9bc672b5ea415b6dcc7e1da43a. from META
2011-06-27 10:20:17,479 FATAL org.apache.hadoop.hbase.master.HMaster: Failed
assignment of regions to serverName=H3S3,60020,1308946657608,
load=(requests=0, regions=7, usedHeap=976, maxHeap=1011)
java.net.SocketTimeoutException: Call to H3S3/10..x.x.x3:60020 failed on
socket timeout exception: java.net.SocketTimeoutException: 60000 millis
timeout while waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/10..x.x.x:34904
remote=H3S3/10.x.x.x:60020]
at
org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:802)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:775)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
at $Proxy7.openRegions(Unknown Source)
at
org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.j ava:566)
at
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.j ava:776)
at
org.apache.hadoop.hbase.master.AssignmentManager$SingleServerBulkAssigner.r un(AssignmentManager.java:1310)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.j ava:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java: 908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.SocketTimeoutException: 60000 millis timeout while
waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/10..x.x.x:34904
remote=H3S3/10..x.x.x:60020]
at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164 )
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.FilterInputStream.read(FilterInputStream.java:116)
at
org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBa seClient.java:299)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.DataInputStream.readInt(DataInputStream.java:370)
at
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseCli ent.java:539)
at
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:477 )
2011-06-27 10:20:17,482 INFO org.apache.hadoop.hbase.master.HMaster:
Aborting
2011-06-27 10:20:18,390 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
server on 60000
2011-06-27 10:20:18,390 INFO org.apache.hadoop.hbase.master.CatalogJanitor:
H3M1:60000-CatalogJanitor exiting
2011-06-27 10:20:18,390 INFO org.apache.hadoop.hbase.master.HMaster$1:
H3M1:60000-BalancerChore exiting
2011-06-27 10:20:18,392 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC
Server listener on 60000


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jean-Daniel Cryans  
View profile  
 More options Jun 27 2011, 1:21 pm
From: Jean-Daniel Cryans <jdcry...@apache.org>
Date: Mon, 27 Jun 2011 10:21:19 -0700
Local: Mon, Jun 27 2011 1:21 pm
Subject: Re: Master died after failed assignment of regionserver
Yeah the BulkAssigner uses a AssignmentManager.assign method that
won't retry if it gets an exception because originally it was only
used during startup (eg if you can't assign the regions properly at
startup then kill yourself).

In this case it happened to you when creating a table (it also uses
the BulkAssigner now). To me it's a bug, it should just be assigned
somewhere else. Mind filing a jira?

As to how you got a 60 secs timeout, you should take a look at the RS
log (and next time it happens also do a jstack).

Thx,

J-D


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jean-Daniel Cryans  
View profile  
 More options Jun 28 2011, 5:14 pm
From: Jean-Daniel Cryans <jdcry...@apache.org>
Date: Tue, 28 Jun 2011 14:14:56 -0700
Local: Tues, Jun 28 2011 5:14 pm
Subject: Re: Master died after failed assignment of regionserver
Looks like I'll be fixing it in the context of HBASE-3984.

J-D

On Mon, Jun 27, 2011 at 10:21 AM, Jean-Daniel Cryans


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Matt Davies  
View profile  
 More options Jun 29 2011, 2:33 pm
From: Matt Davies <m...@mattdavies.net>
Date: Wed, 29 Jun 2011 12:33:45 -0600
Local: Wed, Jun 29 2011 2:33 pm
Subject: Re: Master died after failed assignment of regionserver

Thanks JD.  Sorry - I got swamped and was going to file a Jira, but sounds
like you don't need it now.

On Tue, Jun 28, 2011 at 3:14 PM, Jean-Daniel Cryans <jdcry...@apache.org>wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »