Cluster creation fails at 50%

Nicolas Punzo

May 21, 2013, 12:20:57 PM

to serenge...@googlegroups.com

Hello,

I have properly redeployed the Serengeti vApp with a static IP address and configured the network, the resource pool, and the datastore, but when I create the cluster, I get this message:

FAILED 50%

node group: master, instance number: 1
roles:[hadoop_namenode, hadoop_jobtracker]
NAME                    IP           STATUS    TASK
------------------------------------------------------------------------------------
HadoopCluster-master-0  192.168.1.2  VM Ready  Starting service hadoop-0.20-namenode

node group: worker, instance number: 3
roles:[hadoop_datanode, hadoop_tasktracker]
NAME                    IP           STATUS    TASK
--------------------------------------------------------------------
HadoopCluster-worker-1  192.168.1.4  VM Ready  Formatting data disks
HadoopCluster-worker-2  192.168.1.5  VM Ready  Formatting data disks
HadoopCluster-worker-0  192.168.1.3  VM Ready  Formatting data disks

node group: client, instance number: 1
roles:[hadoop_client, pig, hive, hive_server]
NAME                    IP           STATUS    TASK
--------------------------------------------------------------------
HadoopCluster-client-0  192.168.1.6  VM Ready  Formatting data disks

cluster HadoopCluster create failed: you can get task failure details from serengeti server log at: /opt/serengeti/logs/task/1

The /opt/serengeti/logs/task/1/stderr.log file contains the following error:

ERROR: Failed to authenticate to http://10.192.135.250:4000 as serengeti with key /opt/serengeti/.chef/serengeti.pem

So, if anyone has already run into this problem, I'd be happy to know how they solved it :)

Best regards,

Nico

Nicolas Punzo

May 23, 2013, 7:55:54 AM

to serenge...@googlegroups.com

Hello,

I have dug into the problem and found that when it occurs, the Chef server web UI displays "HttpServerException" and the CouchDB server has crashed.

Running "service couchdb status" reports "couchdb dead but subsys locked", and restarting CouchDB solves the problem.
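For context, "dead but subsys locked" is what Red Hat-style init scripts print when the process is gone but its lock file under /var/lock/subsys is still present; the LSB init-script conventions assign exit code 2 to that state (0 = running, 1 = dead with pid file, 3 = not running, 4 = unknown). A minimal sketch that reports what the status exit code means, assuming the couchdb init script follows those conventions; the function names are hypothetical:

```shell
#!/bin/bash
# Sketch: translate the exit code of an LSB-style "service <name> status"
# call into its meaning. Assumption: the couchdb init script follows the
# LSB conventions (the Red Hat status() helper returns 2 for
# "dead but subsys locked").
lsb_status_meaning() {
  case "$1" in
    0) echo "running" ;;
    1) echo "dead, but pid file exists" ;;
    2) echo "dead, but lock file (subsys) exists" ;;
    3) echo "not running" ;;
    4) echo "status unknown" ;;
    *) echo "unrecognized status code $1" ;;
  esac
}

# Run the status action for a service and print what its exit code means.
# Returns the original exit code so callers can still branch on it.
check_service() {
  local rc=0
  service "$1" status >/dev/null 2>&1 || rc=$?
  echo "$1: $(lsb_status_meaning "$rc") (exit code $rc)"
  return "$rc"
}
```

Usage, as root on the Serengeti server after sourcing the file: `check_service couchdb`.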

However, CouchDB crashes so often that I can't deploy any cluster. I can't find the cause in any log file (such as /var/log/couchdb/couch.log), because it seems to crash suddenly. Have you already seen and solved this problem?

Best regards,

Nico

2013/5/21 Nicolas Punzo <nicola...@gmail.com>

Hui Hu

May 23, 2013, 10:41:49 PM

to serenge...@googlegroups.com

Hi Nico,

We're working on fixing the CouchDB crash issue. It's tracked in bug https://issuetracker.springsource.com/browse/SERENGETI-1304.

The current workaround is:

Log in to the Serengeti server as user serengeti and run the following commands:
sudo sed -i -e "s|COUCHDB_STDERR_FILE=/dev/null|COUCHDB_STDERR_FILE=/var/log/couchdb/stderr.log|" /etc/sysconfig/couchdb
sudo sed -i -e "s|COUCHDB_STDOUT_FILE=/dev/null|COUCHDB_STDOUT_FILE=/var/log/couchdb/stdout.log|" /etc/sysconfig/couchdb
sudo sed -i -e "s|COUCHDB_RESPAWN_TIMEOUT=0|COUCHDB_RESPAWN_TIMEOUT=2|" /etc/sysconfig/couchdb

sudo service couchdb restart

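Then use Serengeti to create the cluster. This workaround makes CouchDB restart 2 seconds after it crashes. The first two commands also send CouchDB's stdout and stderr to files under /var/log/couchdb/ instead of /dev/null, so the output from the next crash should be kept.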
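If the sysconfig file on your server does not contain exactly those default values, the sed commands above silently change nothing. A minimal sketch that applies the same three substitutions and then checks that each setting is really present; the function name is hypothetical, and the file path is a parameter so it can be tried on a copy first:

```shell
#!/bin/bash
# Sketch: apply the three /etc/sysconfig/couchdb edits from the workaround
# and verify they took effect. Safe to run twice (already-edited lines no
# longer match the sed patterns). Needs root for the real file.
apply_couchdb_workaround() {
  local conf="${1:-/etc/sysconfig/couchdb}"
  sed -i \
    -e "s|COUCHDB_STDERR_FILE=/dev/null|COUCHDB_STDERR_FILE=/var/log/couchdb/stderr.log|" \
    -e "s|COUCHDB_STDOUT_FILE=/dev/null|COUCHDB_STDOUT_FILE=/var/log/couchdb/stdout.log|" \
    -e "s|COUCHDB_RESPAWN_TIMEOUT=0|COUCHDB_RESPAWN_TIMEOUT=2|" \
    "$conf" || return 1
  # Each expected setting must appear as a whole line; report the first miss.
  local expected
  for expected in \
      COUCHDB_STDERR_FILE=/var/log/couchdb/stderr.log \
      COUCHDB_STDOUT_FILE=/var/log/couchdb/stdout.log \
      COUCHDB_RESPAWN_TIMEOUT=2; do
    if ! grep -qxF "$expected" "$conf"; then
      echo "missing: $expected in $conf" >&2
      return 1
    fi
  done
}
```

After sourcing the file, run `apply_couchdb_workaround` as root and then `sudo service couchdb restart` as in the original steps.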

-Jesse Hu
Project Serengeti, VMware

Thanks & Best Regards,

Hui Hu, Beijing, China

2013/5/23 Nicolas Punzo <nicola...@gmail.com>

--

---
You received this message because you are subscribed to the Google Groups "serengeti-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to serengeti-use...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Nicolas Punzo

May 23, 2013, 11:50:20 PM

to serenge...@googlegroups.com

Hahaha, yesterday I fixed it in a quite similar way, running "watch --interval=5 service couchdb start". Glad to see I'm not the only one using quick-and-dirty solutions :D
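A slightly less dirty variant of the watch loop is sketched below: instead of calling "service couchdb start" every 5 seconds unconditionally, it starts CouchDB only when the status check fails and logs a timestamp each time, which also shows how often the crashes happen. It assumes "service couchdb status" exits non-zero when CouchDB is down; the function and variable names are hypothetical:

```shell
#!/bin/bash
# Sketch of a watchdog replacing "watch --interval=5 service couchdb start".
# Assumption: "service couchdb status" exits non-zero when CouchDB is down
# (e.g. exit code 2 for "dead but subsys locked").

COUCHDB_CHECK_INTERVAL="${COUCHDB_CHECK_INTERVAL:-5}"  # seconds between checks

# One check: start CouchDB only if the status action reports it is down.
ensure_couchdb_running() {
  if ! service couchdb status >/dev/null 2>&1; then
    echo "$(date '+%F %T') couchdb is down, starting it" >&2
    service couchdb start
  fi
}

# Repeat the check forever; stop it with Ctrl-C.
watch_couchdb() {
  while true; do
    ensure_couchdb_running
    sleep "$COUCHDB_CHECK_INTERVAL"
  done
}
```

Usage, in a separate terminal while the cluster is being created: `sudo bash -c '. ./couchdb-watchdog.sh; watch_couchdb'`.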

Best regards,

Nicolas

Hui Hu

May 24, 2013, 1:43:09 AM

to serenge...@googlegroups.com

That's cool. Please let us know whether the couchdb crash issue is gone.

Thanks & Best Regards,

Hui Hu, Beijing, China

2013/5/24 Nicolas Punzo <nicola...@gmail.com>