Cluster creation fails at 50%


Nicolas Punzo

May 21, 2013, 12:20:57 PM
to serenge...@googlegroups.com
Hello,

I have properly redeployed the Serengeti vApp with a static IP address and configured the network, the resource pool, and the datastore, but when I create the cluster, I get this message:

FAILED 50%

node group: master,  instance number: 1
roles:[hadoop_namenode, hadoop_jobtracker]
  NAME                    IP           STATUS    TASK
  ------------------------------------------------------------------------------------
  HadoopCluster-master-0  192.168.1.2  VM Ready  Starting service hadoop-0.20-namenode

node group: worker,  instance number: 3
roles:[hadoop_datanode, hadoop_tasktracker]
  NAME                    IP           STATUS    TASK
  --------------------------------------------------------------------
  HadoopCluster-worker-1  192.168.1.4  VM Ready  Formatting data disks
  HadoopCluster-worker-2  192.168.1.5  VM Ready  Formatting data disks
  HadoopCluster-worker-0  192.168.1.3  VM Ready  Formatting data disks

node group: client,  instance number: 1
roles:[hadoop_client, pig, hive, hive_server]
  NAME                    IP           STATUS    TASK
  --------------------------------------------------------------------
  HadoopCluster-client-0  192.168.1.6  VM Ready  Formatting data disks

cluster HadoopCluster create failed: you can get task failure details from serengeti server log at: /opt/serengeti/logs/task/1


The /opt/serengeti/logs/task/1/stderr.log file contains the following error:

ERROR: Failed to authenticate to http://10.192.135.250:4000 as serengeti with key /opt/serengeti/.chef/serengeti.pem

If anyone has already faced this problem, I'd be happy to know how they solved it :)

Best regards,

Nico

Nicolas Punzo

May 23, 2013, 7:55:54 AM
to serenge...@googlegroups.com
Hello,

I have dug into my problem and found that when it occurs, the Chef server web UI displays "HttpServerException" and the couchdb server has crashed.
Typing "service couchdb status" gives "couchdb dead but subsys locked" and restarting couchdb solves the problem.

However, couchdb crashes so often that I can't deploy any cluster. No log file (such as /var/log/couchdb/couch.log) shows the cause of the problem, because it seems to crash suddenly. Have you already seen and solved this problem?

Best regards,

Nico



Hui Hu

May 23, 2013, 10:41:49 PM
to serenge...@googlegroups.com
Hi Nico,

We're working on fixing the couchdb crash issue. It's tracked in this bug https://issuetracker.springsource.com/browse/SERENGETI-1304.

The current workaround is:

Log in to the Serengeti server as user serengeti and execute the following commands:
sudo sed -i -e "s|COUCHDB_STDERR_FILE=/dev/null|COUCHDB_STDERR_FILE=/var/log/couchdb/stderr.log|" /etc/sysconfig/couchdb
sudo sed -i -e "s|COUCHDB_STDOUT_FILE=/dev/null|COUCHDB_STDOUT_FILE=/var/log/couchdb/stdout.log|" /etc/sysconfig/couchdb
sudo sed -i -e "s|COUCHDB_RESPAWN_TIMEOUT=0|COUCHDB_RESPAWN_TIMEOUT=2|" /etc/sysconfig/couchdb
sudo service couchdb restart

Then use Serengeti to create the cluster. This workaround lets couchdb restart 2 seconds after it crashes.
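
To see what the three substitutions above would change before running them with sudo, they can be previewed on a scratch file. This is only a minimal sketch: the function name `preview_workaround` is hypothetical, and the sample file contents are an assumption about the defaults, inferred from the patterns in the sed commands. The real /etc/sysconfig/couchdb is not read or modified here.

```shell
#!/bin/sh
# preview_workaround FILE
# Prints FILE with the three workaround substitutions applied.
# FILE itself is left unchanged (no -i), so this is a dry run.
preview_workaround() {
    sed \
        -e "s|COUCHDB_STDERR_FILE=/dev/null|COUCHDB_STDERR_FILE=/var/log/couchdb/stderr.log|" \
        -e "s|COUCHDB_STDOUT_FILE=/dev/null|COUCHDB_STDOUT_FILE=/var/log/couchdb/stdout.log|" \
        -e "s|COUCHDB_RESPAWN_TIMEOUT=0|COUCHDB_RESPAWN_TIMEOUT=2|" \
        "$1"
}

# Scratch file with ASSUMED default values (not copied from a real server).
scratch=$(mktemp)
printf '%s\n' \
    'COUCHDB_STDOUT_FILE=/dev/null' \
    'COUCHDB_STDERR_FILE=/dev/null' \
    'COUCHDB_RESPAWN_TIMEOUT=0' > "$scratch"

# Show the result of the substitutions, then clean up.
preview_workaround "$scratch"
rm -f "$scratch"
```

Running `preview_workaround /etc/sysconfig/couchdb` on the Serengeti server would print the edited configuration without writing it, which makes it easy to confirm that the patterns actually match the installed file.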

-Jesse Hu
Project Serengeti, VMware

Thanks & Best Regards,
Hui Hu,  Beijing,  China



Nicolas Punzo

May 23, 2013, 11:50:20 PM
to serenge...@googlegroups.com

Hahaha, yesterday I fixed it in a quite similar way, by running "watch --interval=5 service couchdb start". Glad to see I'm not the only one to use quick and dirty solutions :D
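
A slightly more careful variant of the same idea checks the status first and only starts the service when the check fails. This is a rough sketch, not something from the Serengeti project: the helper name `ensure_running` is hypothetical, and it assumes that "service couchdb status" exits non-zero when couchdb is dead, as init scripts normally do.

```shell
#!/bin/sh
# ensure_running STATUS_CMD START_CMD
# Runs STATUS_CMD; if it succeeds, does nothing.
# If it fails, logs a line to stderr and runs START_CMD.
ensure_running() {
    if sh -c "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$(date): service not running, starting it" >&2
    sh -c "$2"
}

# Example polling loop for the Serengeti server (run as root), checking
# every 5 seconds like the watch command above. Left commented out so
# that sourcing this file does not start an endless loop.
#
# while true; do
#     ensure_running "service couchdb status" "service couchdb start"
#     sleep 5
# done
```

Compared with calling "service couchdb start" unconditionally, this leaves a timestamped line each time couchdb had to be restarted, which gives a rough idea of how often it crashes.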

Best regards,

Nicolas

Hui Hu

May 24, 2013, 1:43:09 AM
to serenge...@googlegroups.com
That's cool. Please let us know whether the couchdb crash issue is gone.

Thanks & Best Regards,
Hui Hu,  Beijing,  China

