Executing a 'create table' statement always hangs; it looks like a previously unsolved problem.


David

Oct 23, 2013, 5:21:56 AM10/23/13
to hyperta...@googlegroups.com
I have been having trouble executing 'create table' statements since yesterday; the statement always hangs until it times out. It looks like the earlier thread 'https://groups.google.com/forum/?fromgroups#!topic/hypertable-user/QiAHnA-5k9I', but the two are not the same cluster, so I am opening this thread to describe the problem.
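
For reference, the statement is being run from the HQL shell roughly like this (a sketch with a placeholder table name and schema, not our real ones):

/opt/hypertable/current/bin/ht shell    # path assumes a default install
hypertable> create table foo ( cf1 );   # hangs here until the request times out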

The Master log is full of errors like the following:
1382498975 INFO Hypertable.Master : (/root/src/hypertable/src/cc/Hypertable/Master/OperationCreateTable.cc:100) Entering CreateTable-18188(/dinglicom/foo, location=) state=WRITE_METADATA
1382498975 ERROR Hypertable.Master : operator() (/root/src/hypertable/src/cc/Hypertable/Master/OperationProcessor.cc:333): Hypertable::Exception:  - DFS BROKER i/o error
        at void Hypertable::TableMutator::wait_for_flush_completion(Hypertable::TableMutatorAsync*) (/root/src/hypertable/src/cc/Hypertable/Lib/TableMutator.cc:210)
1382498975 ERROR Hypertable.Master : flush (/root/src/hypertable/src/cc/Hypertable/Lib/TableMutator.cc:183): Hypertable::Exception:  - DFS BROKER i/o error
        at void Hypertable::TableMutator::wait_for_flush_completion(Hypertable::TableMutatorAsync*) (/root/src/hypertable/src/cc/Hypertable/Lib/TableMutator.cc:210)
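
Since both exceptions bottom out in the DFS layer, the DfsBroker log seems like the next place to look (a sketch; the log path and file name assume a default install running the hadoop broker):

tail -n 200 /opt/hypertable/current/log/DfsBroker.hadoop.log
grep -iE "exception|refused|timed out" /opt/hypertable/current/log/DfsBroker.hadoop.log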

The RangeServer log is likewise full of errors like the following:
1382474180 ERROR Hypertable.RangeServer : (/root/src/hypertable/src/cc/Hypertable/Lib/CommitLog.cc:478) Problem writing commit log: /hypertable/servers/rs1/log/metadata/0: Error appending 64 bytes to DFS fd 4
1382474180 ERROR Hypertable.RangeServer : (/root/src/hypertable/src/cc/Hypertable/RangeServer/RangeServer.cc:2683) Problem writing 30 bytes to commit log (/hypertable/servers/rs1/log/metadata) - DFS BROKER i/o error
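
A quick sanity check is whether the commit-log file named in the error is still reachable through a plain HDFS client, bypassing the DfsBroker (a sketch, using the path from the error above):

hadoop fs -ls /hypertable/servers/rs1/log/metadata
hadoop fs -stat %b /hypertable/servers/rs1/log/metadata/0   # prints the file size in bytes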

The writing application also reports the following errors, which we had never encountered before:
1382498906 ERROR cdrimport_test : (/root/src/hypertable/src/cc/Hyperspace/ClientKeepaliveHandler.cc:173) Master session (97) error - HYPERSPACE expired session
1382498908 ERROR cdrimport_test : set_cells (/root/src/hypertable/src/cc/Hypertable/Lib/TableMutator.cc:116): Hypertable::Exception: Problem getting attribute 'Location' of hyperspace file 'UNKNOWN' - COMM broken connection
        at void Hyperspace::Session::attr_get(uint64_t, const std::string&, Hypertable::DynamicBuffer&, Hypertable::Timer*) (/root/src/hypertable/src/cc/Hyperspace/Session.cc:546)
1382498908 ERROR cdrimport_test : Commit (/home/jack/mnt2/MCCloud_V2.0/foundation/../common/incl/DLHypertableClient.h:328): Hypertable::Exception: Problem getting attribute 'Location' of hyperspace file 'UNKNOWN' - COMM broken connection
        at void Hyperspace::Session::attr_get(uint64_t, const std::string&, Hypertable::DynamicBuffer&, Hypertable::Timer*) (/root/src/hypertable/src/cc/Hyperspace/Session.cc:546)
2013-10-23 11:28:29 commitData success
2013-10-23 11:28:29 file size:67108860 use time:32
2013-10-23 11:28:29 total_filesize:[63M] size/s:[1M/s] 
2013-10-23 11:28:29 HandleFile: //data02/cloudilbak/121/1352072004-1352329313-bssap-1-zh_cloud121.cdr
1382498909 INFO cdrimport_test : (/root/src/hypertable/src/cc/AsyncComm/IOHandlerData.cc:242) socket read(57, len=38) failure : Connection timed out
1382498909 INFO cdrimport_test : (/root/src/hypertable/src/cc/AsyncComm/ConnectionManager.cc:389) Received event Event: type=DISCONNECT from=172.16.23.164:38060
1382498909 INFO cdrimport_test : (/root/src/hypertable/src/cc/AsyncComm/ConnectionManager.cc:429) Event: type=DISCONNECT from=172.16.23.164:38060; Problem connecting to Root RangeServer, will retry in 3000 milliseconds...
2013-10-23 11:28:40 commitData
1382498933 INFO cdrimport_test : (/root/src/hypertable/src/cc/AsyncComm/IOHandlerData.cc:242) socket read(56, len=38) failure : Connection timed out
1382498933 INFO cdrimport_test : (/root/src/hypertable/src/cc/AsyncComm/ConnectionManager.cc:389) Received event Event: type=DISCONNECT from=172.16.23.164:38060
1382498933 INFO cdrimport_test : (/root/src/hypertable/src/cc/AsyncComm/ConnectionManager.cc:429) Event: type=DISCONNECT from=172.16.23.164:38060; Problem connecting to Root RangeServer, will retry in 3000 milliseconds...
1382498936 ERROR cdrimport_test : set_cells (/root/src/hypertable/src/cc/Hypertable/Lib/TableMutator.cc:116): Hypertable::Exception: Problem getting attribute 'Location' of hyperspace file 'UNKNOWN' - HYPERSPACE invalid handle
        at void Hyperspace::Session::attr_get(uint64_t, const std::string&, Hypertable::DynamicBuffer&, Hypertable::Timer*) (/root/src/hypertable/src/cc/Hyperspace/Session.cc:546)
1382498936 ERROR cdrimport_test : Commit (/home/jack/mnt2/MCCloud_V2.0/foundation/../common/incl/DLHypertableClient.h:328): Hypertable::Exception: Problem getting attribute 'Location' of hyperspace file 'UNKNOWN' - HYPERSPACE invalid handle
        at void Hyperspace::Session::attr_get(uint64_t, const std::string&, Hypertable::DynamicBuffer&, Hypertable::Timer*) (/root/src/hypertable/src/cc/Hyperspace/Session.cc:546)

I thought some blocks might be corrupted, but 'hadoop fsck /hypertable' reports that all blocks are healthy. In the end, having no other option, I restarted the cluster; all the errors disappeared and everything looks okay.
Although it works now, I would like to know the real cause. The cluster is running CDH4.3.0 + Hypertable 0.9.7.12. The attachment contains the logs from these two days.
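
Note that fsck mainly validates block metadata from the NameNode's point of view, so it can report healthy blocks even while datanodes refuse new connections. Two further checks that might have been worth running (a sketch; command forms as in CDH4):

hadoop fsck /hypertable -files -blocks -locations   # per-file block placement
hadoop dfsadmin -report                             # datanode liveness and capacity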

log.zip

Doug Judd

Oct 25, 2013, 5:00:14 PM10/25/13
to hypertable-user
I see lots of exceptions like these in the DfsBroker logs:

java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:207)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:528)
        at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1253)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:891)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:881)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:982)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:779)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:448)

It appears that there was some network problem or problem with HDFS that prevented the DfsBroker from connecting to the HDFS servers.
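
One way to confirm this is to probe a datanode's data-transfer port directly from the DfsBroker host (a sketch; substitute a real datanode hostname, and 50010 assumes the stock CDH4 port):

# Exits 0 if the port accepts connections, non-zero on refusal or timeout.
nc -z -w 3 <datanode-host> 50010 && echo open || echo closed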

- Doug



--
Doug Judd
CEO, Hypertable Inc.

David

Oct 25, 2013, 9:22:14 PM10/25/13
to hyperta...@googlegroups.com, do...@hypertable.com
I also suspect that the connection from the DfsBroker to the HDFS servers was being blocked, but strangely everything was fine after restarting Hypertable.
I had never seen errors like these before; what do they indicate?

Doug Judd

Oct 25, 2013, 9:57:27 PM10/25/13
to hypertable-user
Take a look at the following Stack Overflow question:


Is it possible that someone was tweaking the firewall rules, or that HDFS was restarted while Hypertable was running?
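
A couple of quick checks for both hypotheses (a sketch; the service name assumes CDH4's packaging):

iptables -L -n                        # any REJECT/DROP rules on the datanode hosts?
service hadoop-hdfs-datanode status   # is the datanode service up right now?
ps -o lstart= -C java                 # process start times vs. the incident window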

- Doug



David

Oct 28, 2013, 10:48:08 PM10/28/13
to hyperta...@googlegroups.com, do...@hypertable.com
Hi Doug,
The firewall is always disabled on this cluster, and HDFS was not restarted either.
Is there any other possible cause for this problem?

Shinobi_Jack

Nov 21, 2013, 10:40:39 PM11/21/13
to hyperta...@googlegroups.com, do...@hypertable.com
Sometimes, when we drop a table, there is no response for a long time, so we stop the command with Ctrl+C.
Then we re-enter ./ht shell and run 'get listing', which shows the table has been dropped. But when we re-create the same table, the error David described appears.
Why? Is this an incorrect operation on our part, or is there some other cause? We would appreciate more advice; thanks for your help.
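
In other words, the sequence looks roughly like this (a sketch with a placeholder table name and schema):

./ht shell
hypertable> drop table foo;   # no response for a long time; stopped with Ctrl+C
# ... re-enter the shell ...
./ht shell
hypertable> get listing;                # foo is no longer listed
hypertable> create table foo ( cf1 );   # now fails with the errors David described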
On Tuesday, October 29, 2013 at 10:48:08 AM UTC+8, David wrote: