Re: Editing Hadoop Template

Emma Lin

May 14, 2013, 9:23:21 PM
to serenge...@googlegroups.com, Jun Wang (c)
Thomas,
Using a customized Linux OS is a feature of the upcoming milestone, which will be released at the end of June.
Before that, if you're using a CentOS 5.x-compatible OS, you can install your own Hadoop template VM by following the installation guide from source code.

+cc Jun, who is working on the customized OS feature. If you want to change the code to get this feature earlier, Jun can provide more help.

thanks
Emma

From: "Thomas Skoff" <thomas....@gmail.com>
To: serenge...@googlegroups.com
Sent: Tuesday, May 14, 2013 11:38:17 PM
Subject: Editing Hadoop Template

Hello,
 
I have another question. Now that we have been able to successfully deploy Hadoop using Serengeti with default settings, we need to test the ability to create Hadoop clusters using a specific version of Fedora. I've been looking through the source code in the Serengeti Pantry, but cannot find which file(s) need to be modified in order to use a different Hadoop template. Any assistance would be greatly appreciated.
 
Thank you,
Thomas


jun wang

May 15, 2013, 10:27:32 PM
to serenge...@googlegroups.com
Besides Jesse's comment, if you want to create a Fedora template, you can: create a VM template with Fedora and the corresponding VMware Tools installed (the VM disk configuration can follow the CentOS 5.x template), then run the commands in the readme section "Detailed Install Instructions for serengeti node template": https://github.com/vmware-serengeti/doc/blob/master/installation_guide_from_source_code_M2.md

Thanks,
Jun


Date: Wed, 15 May 2013 04:10:40 -0700
From: huh...@gmail.com
To: serenge...@googlegroups.com
Subject: Re: Editing Hadoop Template

Currently Serengeti only supports CentOS 5.6+ as the VM template. We're working on supporting CentOS 6 (it will be released soon). If you want to use Fedora as the template, you need to modify the serengeti-pantry code. So first you need to learn Chef, then create the cluster using a Fedora VM template to see what cookbook errors occur, and modify the cookbooks accordingly. I guess there won't be too many technical obstacles to supporting Fedora.
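
For a sense of what modifying the cookbooks involves, here is a minimal, hypothetical sketch of the usual pattern (the recipe body and package name are illustrative, not the actual pantry source):

case node[:platform]
when 'centos', 'redhat'
  # existing path: packages come from Serengeti's internal yum repo
  package 'hadoop'
when 'fedora'
  # a Fedora port adds a branch like this; package names may differ on Fedora
  package 'hadoop'
else
  Chef::Log.warn("unsupported platform: #{node[:platform]}")
end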

If you have any questions, please post them here or ping me.

-Jesse Hu
VMware Project Serengeti

On Tuesday, May 14, 2013 at 11:38:17 PM UTC+8, Thomas Skoff wrote:

jun wang

May 18, 2013, 4:14:51 AM
to serenge...@googlegroups.com
A bootstrap error means these nodes can already get an IP, correct? If so, please have a look at /opt/serengeti/logs/ironfan.log, and as Jesse mentioned, you may need to modify the cookbooks. If you are not sure, please attach the log file here.

Thanks,
Jun

On 2013-5-18, at 5:57 AM, "Dido Dontchev" <dido.d...@gmail.com> wrote:

Jun,

We followed the instructions below to the letter on Fedora 18.

Verified the Chef client is installed using the chef-client -v command.

We receive a bootstrapping failure, with no error in stderr.log.

Where else can we look to identify a possible cause?

Thanks in advance.

DMD

Dido Dontchev

May 19, 2013, 12:36:45 AM
to serenge...@googlegroups.com
Jun,

Thank you for your response. There is no ironfan.log in the directory you specified on our Serengeti management server.

I have attached a view of the /opt/serengeti/logs folder.

We deployed it straight from the OVF file on the Serengeti project website. We will need to modify the cookbooks, but we don't know exactly what we need to modify, since there is no indication of an error other than the bootstrap failure.

Another odd problem: after we attempt to create the cluster, all nodes get cloned and powered up, but after the bootstrap failure we can't log on to the new clones any more using the credentials from the template. The passwords are being changed (I suspect password-crypt.sh is being executed somehow...), but unlike with the CentOS template, the newly scrambled password is not shown at the login screen.

Thanks again!

DMD


[serengeti@localhost logs]$ pwd
/opt/serengeti/logs
[serengeti@localhost logs]$ ls -al
total 6324
drwxr-xr-x  3 serengeti serengeti    4096 May 19 04:04 .
drwxr-xr-x 18 serengeti serengeti    4096 May 17 21:50 ..
-rw-r--r--  1 serengeti serengeti     301 May 17 21:50 serengeti-firstboot.err
-rw-------  1 serengeti serengeti    5722 May 17 21:50 serengeti-firstboot.log
-rw-r--r--  1 serengeti serengeti     466 May 18 19:05 serengeti-subsequentboot.err
-rw-------  1 serengeti serengeti    1798 May 18 19:05 serengeti-subsequentboot.log
-rw-r--r--  1 serengeti serengeti 6420275 May 19 03:25 serengeti.log
drwxr-xr-x 15 serengeti serengeti    4096 May 19 03:24 task
-rw-r--r--  1 serengeti serengeti     682 May 19 00:19 vhm.log
-rw-r--r--  1 serengeti serengeti     424 May 19 00:19 vhm.xml
-rw-r--r--  1 serengeti serengeti       0 May 19 00:19 vhm.xml.lck
[serengeti@localhost logs]$

Dido Dontchev

May 22, 2013, 11:55:52 AM
to serenge...@googlegroups.com
Jun,

Here is more info regarding our problem with using Serengeti with Fedora:

Errors and warnings from stdout.log file (attempting Fedora install)

CONFIG WARN: unknown config item: vc_user
CONFIG WARN: unknown config item: vc_pwd
(the two warnings above repeat many times; some repetitions are run together in the raw log)
[Mon, 20 May 2013 18:15:51 +0000] WARN: no matched sharestores in host:10.10.1.3
[Mon, 20 May 2013 18:15:51 +0000] WARN: no matched localstores in host:10.10.1.3
[Mon, 20 May 2013 18:15:51 +0000] WARN: no matched localstores in host:10.10.1.2
[Mon, 20 May 2013 18:15:52 +0000] WARN: no matched sharestores in host:10.10.1.1
[Mon, 20 May 2013 18:15:52 +0000] WARN: no matched localstores in host:10.10.1.1
10.10.1.32 [Mon, 20 May 2013 19:11:41 +0000] WARN: Installation of Java packages are only supported on Debian/Ubuntu at this time. Please install it manually.
10.10.1.32                 echo '[ERROR] downloading tarball failed'
10.10.1.32 [Mon, 20 May 2013 19:12:44 +0000] WARN: Problem parsing line 'Could not get metalink https://mirrors.fedoraproject.org/metalink?repo=fedora-18&arch=x86_64 error was' from yum-dump.py! Please check your yum configuration.
10.10.1.32 [Mon, 20 May 2013 19:12:44 +0000] WARN: Problem parsing line '14: curl#6 - "Couldn't resolve host"' from yum-dump.py! Please check your yum configuration.
10.10.1.32 [Mon, 20 May 2013 19:12:44 +0000] WARN: Problem parsing line 'Could not get metalink https://mirrors.fedoraproject.org/metalink?repo=updates-released-f18&arch=x86_64 error was' from yum-dump.py! Please check your yum configuration.
10.10.1.32 [Mon, 20 May 2013 19:12:44 +0000] WARN: Problem parsing line '14: curl#6 - "Couldn't resolve host"' from yum-dump.py! Please check your yum configuration.
10.10.1.32 [Mon, 20 May 2013 19:13:11 +0000] ERROR: package[hmonitor-vsphere-namenode-daemon] (hadoop_cluster::namenode line 438) has had an error
10.10.1.32 [Mon, 20 May 2013 19:13:11 +0000] ERROR: package[hmonitor-vsphere-namenode-daemon] (/var/chef/cache/cookbooks/hadoop_cluster/libraries/hadoop_cluster.rb:438:in `hadoop_ha_package') had an error:
10.10.1.32 [Mon, 20 May 2013 19:13:11 +0000] ERROR: Running exception handlers
10.10.1.32 [Mon, 20 May 2013 19:13:11 +0000] ERROR: Exception handlers complete
10.10.1.34 [Mon, 20 May 2013 19:41:42 +0000] ERROR: search(:node, 'cluster_name:myFedora AND provides_service:hadoop-0.20-namenode') failed, return empty.
10.10.1.34 [Mon, 20 May 2013 19:41:42 +0000] ERROR: Running exception handlers
10.10.1.34 [Mon, 20 May 2013 19:41:42 +0000] ERROR: Exception handlers complete
10.10.1.33 [Mon, 20 May 2013 19:41:42 +0000] ERROR: search(:node, 'cluster_name:myFedora AND provides_service:hadoop-0.20-namenode') failed, return empty.
10.10.1.33 [Mon, 20 May 2013 19:41:42 +0000] ERROR: Running exception handlers
10.10.1.33 [Mon, 20 May 2013 19:41:42 +0000] ERROR: Exception handlers complete
10.10.1.31 [Mon, 20 May 2013 19:41:42 +0000] ERROR: search(:node, 'cluster_name:myFedora AND provides_service:hadoop-0.20-namenode') failed, return empty.
10.10.1.31 [Mon, 20 May 2013 19:41:42 +0000] ERROR: Running exception handlers
10.10.1.31 [Mon, 20 May 2013 19:41:42 +0000] ERROR: Exception handlers complete
10.10.1.35 [Mon, 20 May 2013 19:41:46 +0000] ERROR: search(:node, 'cluster_name:myFedora AND provides_service:hadoop-0.20-namenode') failed, return empty.
10.10.1.35 [Mon, 20 May 2013 19:41:46 +0000] ERROR: Running exception handlers
10.10.1.35 [Mon, 20 May 2013 19:41:46 +0000] ERROR: Exception handlers complete

Errors and warnings from stdout.log file (attempting default CentOS 5.6 install):


CONFIG WARN: unknown config item: vc_user
CONFIG WARN: unknown config item: vc_pwd
(the two warnings above repeat many times; some repetitions are run together in the raw log)
[Mon, 20 May 2013 19:46:23 +0000] WARN: no matched sharestores in host:10.10.1.3
[Mon, 20 May 2013 19:46:23 +0000] WARN: no matched localstores in host:10.10.1.3
[Mon, 20 May 2013 19:46:23 +0000] WARN: no matched localstores in host:10.10.1.2
[Mon, 20 May 2013 19:46:23 +0000] WARN: no matched sharestores in host:10.10.1.1
[Mon, 20 May 2013 19:46:23 +0000] WARN: no matched localstores in host:10.10.1.1
10.10.1.36 [Mon, 20 May 2013 20:06:09 +0000] WARN: Installation of Java packages are only supported on Debian/Ubuntu at this time. Please install it manually.
10.10.1.36                 echo '[ERROR] downloading tarball failed'
10.10.1.37 [Mon, 20 May 2013 20:08:10 +0000] WARN: Installation of Java packages are only supported on Debian/Ubuntu at this time. Please install it manually.
10.10.1.40 [Mon, 20 May 2013 20:08:10 +0000] WARN: Installation of Java packages are only supported on Debian/Ubuntu at this time. Please install it manually.
10.10.1.37                 echo '[ERROR] downloading tarball failed'
10.10.1.40                 echo '[ERROR] downloading tarball failed'
10.10.1.38 [Mon, 20 May 2013 20:08:12 +0000] WARN: Installation of Java packages are only supported on Debian/Ubuntu at this time. Please install it manually.
10.10.1.38                 echo '[ERROR] downloading tarball failed'
10.10.1.39 [Mon, 20 May 2013 20:08:14 +0000] WARN: Installation of Java packages are only supported on Debian/Ubuntu at this time. Please install it manually.
10.10.1.39                 echo '[ERROR] downloading tarball failed'
10.10.1.38 [Mon, 20 May 2013 20:13:59 +0000] WARN: Installation of Sun Java packages are only supported on Debian/Ubuntu at this time. Please install it manually.
10.10.1.38       echo "WARNING: Can't find sql file to create Hive metastore tables. Will let Hive create them automatcially."

We are going line by line trying to figure out which recipe needs modifying. Right now we are having a hard time matching the errors with the corresponding cookbooks.

Thanks again for your assistance!

DMD

Jesse Hu

May 27, 2013, 2:41:09 AM
to serenge...@googlegroups.com
Hi Dido,

The root cause of the failure on your Fedora VM is:


10.10.1.32 [Mon, 20 May 2013 19:13:11 +0000] ERROR: package[hmonitor-vsphere-namenode-daemon] (hadoop_cluster::namenode line 438) has had an error

Could you ssh to 10.10.1.32, run 'sudo yum install hmonitor-vsphere-namenode-daemon', and send us the output?

The password will be changed to a random value. You can log in to the Serengeti Server as user serengeti and run 'ssh 10.10.1.32' to log in to the VM.

On Wednesday, May 22, 2013 at 11:55:52 PM UTC+8, Dido Dontchev wrote:

Dido Dontchev

May 28, 2013, 3:43:40 PM
to serenge...@googlegroups.com
Jun,

Thank you for your response.

Here is the output:


[hadoop@localhost ~]$ sudo yum install hmonitor-vsphere-namenode-daemon
Loaded plugins: langpacks, presto, refresh-packagekit
14: curl#6 - "Couldn't resolve host"
http://yum.jablonskis.org/fedora/18/x86_64/repodata/repomd.xml: [Errno 14] curl#6 - "Couldn't resolve host"
Trying other mirror.
14: curl#6 - "Couldn't resolve host"
Nothing to do


It is trying to reach out to the Internet.


Thanks!

DMD

Jesse Hu

May 28, 2013, 10:37:43 PM
to serenge...@googlegroups.com
I see. Since you're using Fedora, by default the yum repo files are in /etc/yum.repos.d/ (this is the dir for Red Hat; maybe Fedora uses another dir). When Chef installs any rpm package (e.g. package[hmonitor-vsphere-namenode-daemon]), it tries to find the rpm in all the yum repos specified in /etc/yum.repos.d/*.repo. I guess your VM can't access the Internet, so it raised the Internet timeout error.

There are 2 solutions:

A) Remove all the files in /etc/yum.repos.d/ in your Fedora VM template, or mkdir /etc/yum.repos.d/backup and move all *.repo files into this backup dir; then create a new cluster in Serengeti. We have added all the necessary RPMs (for CentOS 5.6+) to the Serengeti internal yum server, so there is no need to give Internet access to the VM. Hopefully the rpms also work for Fedora; if not, please go to solution B.

B) Give Internet access to the VM. You can configure a VLAN in vSphere to give direct Internet access, or go via an http_proxy server.
This is the guide for configuring an http_proxy server in Serengeti:
On the Serengeti Server, add the following content to /opt/serengeti/conf/serengeti.properties:

# set http proxy server
serengeti.http_proxy = http://<proxy_server:port>

# set the IPs of the Serengeti Server and your yum repository servers for 'serengeti.no_proxy';
# the wildcard for matching multiple IPs doesn't work
serengeti.no_proxy = 10.x.y.z, 192.168.x.y, etc.

Thanks
-Jesse Hu

On Wednesday, May 29, 2013 at 3:43:40 AM UTC+8, Dido Dontchev wrote:

Dido Dontchev

May 29, 2013, 12:10:13 PM
to serenge...@googlegroups.com
Jun,

Thank you for your help again ;)


You pointed us in the right direction. We found out that the cookbook recipe add_repo performs a check based on platform type. It only lists centos there, and therefore the yum.repos.d directory never gets modified on the Fedora clones.

We are testing it now.

Thanks again!


DMD

Dido Dontchev

May 30, 2013, 5:24:51 AM
to serenge...@googlegroups.com
Jun,

The add_repo.rb is not creating the necessary serengeti-base.repo in /etc/yum.repos.d, even after we added 'fedora' & 'Fedora' to the 'when' line.

DMD

Hu Hui

May 30, 2013, 10:01:00 AM
to Dido Dontchev, serenge...@googlegroups.com
Please use Chef::Log.info to print out the value of node.platform in the recipe add_repo. I guess it should be fedora.
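
For example, a one-line addition near the top of add_repo would show what Chef sees (the log line is a debugging aid, not part of the shipped recipe):

Chef::Log.info("node.platform = #{node[:platform]}")  # expect 'fedora' on the Fedora clones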

Jesse

Dido Dontchev <dido.d...@gmail.com> wrote:

Dido Dontchev

May 31, 2013, 12:23:50 AM
to serenge...@googlegroups.com, Dido Dontchev
Jun, we got over the first hurdle regarding creating the /usr/lib/hadoop symlinks and /etc/conf (somewhat manually).

But now we are getting a different kind of error:

From stdout.log:


10.10.1.32 [Fri, 31 May 2013 00:04:48 +0000] ERROR: execute[format namenode] (hadoop_cluster::bootstrap_format_namenode line 26) has had an error
10.10.1.32 [Fri, 31 May 2013 00:04:48 +0000] ERROR: execute[format namenode] (/var/chef/cache/cookbooks/hadoop_cluster/recipes/bootstrap_format_namenode.rb:26:in `from_file') had an error:
10.10.1.32 execute[format namenode] (hadoop_cluster::bootstrap_format_namenode line 26) had an error: Chef::Exceptions::ShellCommandFailed: Expected process to exit with [0], but received '127'
10.10.1.32 ---- Begin output of
10.10.1.32     yes 'Y' | hadoop namenode -format
10.10.1.32
10.10.1.32     exit_status=$?
10.10.1.32
10.10.1.32     if [ $exit_status -eq 0 ]; then touch /mnt/hadoop/.namenode_formatted.log ; fi
10.10.1.32     exit $exit_status
10.10.1.32    ----
10.10.1.32 STDOUT:
10.10.1.32 STDERR: sh: hadoop: command not found
10.10.1.32 ---- End output of
10.10.1.32     yes 'Y' | hadoop namenode -format
10.10.1.32
10.10.1.32     exit_status=$?
10.10.1.32     if [ $exit_status -eq 0 ]; then touch /mnt/hadoop/.namenode_formatted.log ; fi
10.10.1.32
10.10.1.32     exit $exit_status
10.10.1.32    ----
10.10.1.32 Ran
10.10.1.32     yes 'Y' | hadoop namenode -format
10.10.1.32
10.10.1.32     exit_status=$?
10.10.1.32
10.10.1.32     if [ $exit_status -eq 0 ]; then touch /mnt/hadoop/.namenode_formatted.log ; fi
10.10.1.32     exit $exit_status
10.10.1.32    returned 127
10.10.1.32 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/shell_out.rb:206:in `invalid!'
10.10.1.32 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/shell_out.rb:192:in `error!'
10.10.1.32 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/mixin/shell_out.rb:36:in `shell_out!'
10.10.1.32 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/provider/execute.rb:58:in `action_run'
10.10.1.32 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/resource.rb:440:in `run_action'
10.10.1.32 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/runner.rb:45:in `run_action'
10.10.1.32 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/runner.rb:81:in `block (2 levels) in converge'
10.10.1.32
10.10.1.32 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/runner.rb:81:in `each'
10.10.1.32 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/runner.rb:81:in `block in converge'
10.10.1.32
10.10.1.32 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/resource_collection.rb:94:in `block in execute_each_resource'
10.10.1.32 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/resource_collection/stepable_iterator.rb:116:in `call'
10.10.1.32 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/resource_collection/stepable_iterator.rb:116:in `call_iterator_block'
10.10.1.32 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/resource_collection/stepable_iterator.rb:85:in `step'
10.10.1.32 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/resource_collection/stepable_iterator.rb:104:in `iterate'
10.10.1.32
10.10.1.32 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/resource_collection/stepable_iterator.rb:55:in `each_with_index'
10.10.1.32 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/resource_collection.rb:92:in `execute_each_resource'
10.10.1.32 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/runner.rb:76:in `converge'
10.10.1.32 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/client.rb:312:in `converge'
10.10.1.32 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/client.rb:160:in `run'
10.10.1.32 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/application/client.rb:239:in `block in run_application'
10.10.1.32
10.10.1.32 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/application/client.rb:229:in `loop'
10.10.1.32 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/application/client.rb:229:in `run_application'
10.10.1.32 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/application.rb:67:in `run'
10.10.1.32 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/bin/chef-client:26:in `<top (required)>'
10.10.1.32 /usr/bin/chef-client:19:in `load'
10.10.1.32 /usr/bin/chef-client:19:in `<main>'
10.10.1.32 [Fri, 31 May 2013 00:04:48 +0000] ERROR: Running exception handlers
10.10.1.32 [Fri, 31 May 2013 00:04:48 +0000] FATAL: Saving node information to /var/chef/cache/failed-run-data.json
10.10.1.32 [Fri, 31 May 2013 00:04:48 +0000] ERROR: Exception handlers complete
10.10.1.32 [Fri, 31 May 2013 00:04:48 +0000] FATAL: Stacktrace dumped to /var/chef/cache/chef-stacktrace.out
10.10.1.32 [Fri, 31 May 2013 00:04:48 +0000] FATAL: Chef::Exceptions::ShellCommandFailed: execute[format namenode] (hadoop_cluster::bootstrap_format_namenode line 26) had an error: Chef::Exceptions::ShellCommandFailed: Expected process to exit with [0], but received '127'
10.10.1.32 ---- Begin output of
10.10.1.32     yes 'Y' | hadoop namenode -format
10.10.1.32



From chef-stacktrace.out:

[hadoop@localhost cache]$ vi chef-stacktrace.out
/usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/resource_collection/stepable_iterator.rb:85:in `step'
/usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/resource_collection/stepable_iterator.rb:104:in `iterate'
/usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/resource_collection/stepable_iterator.rb:55:in `each_with_index'
/usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/resource_collection.rb:92:in `execute_each_resource'
/usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/runner.rb:76:in `converge'
/usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/client.rb:312:in `converge'
/usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/client.rb:160:in `run'
/usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/application/client.rb:239:in `block in run_application'
/usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/application/client.rb:229:in `loop'
/usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/application/client.rb:229:in `run_application'
/usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/application.rb:67:in `run'
/usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/bin/chef-client:26:in `<top (required)>'
/usr/bin/chef-client:19:in `load'
/usr/bin/chef-client:19:in `<main>'

Still trying to understand the timeline and hierarchy of the process. We have been debugging and we are making progress. We appreciate all of your help.


DMD

Jesse Hu

May 31, 2013, 3:40:13 AM
to serenge...@googlegroups.com, Dido Dontchev
STDERR: sh: hadoop: command not found

This error means the hadoop command is not in $PATH. Which hadoop distro are you deploying? And could you log in to that VM to check which directory the hadoop command is installed in? Usually it will be in /usr/bin/hadoop.

First, please ensure the hadoop package is installed: rpm -qa | grep hadoop (if using yum to deploy hadoop) or ls /usr/lib/hadoop/ (if using tar to deploy hadoop).
Second, use 'rpm -q --fileprovide hadoop' to find out where the hadoop command is.
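
If the command turns out to live outside $PATH (e.g. a tarball unpacked under /usr/lib/hadoop), a cookbook can expose it with a Chef link resource; a minimal sketch, with both paths being assumptions about your layout:

# illustrative only: put a tarball-installed hadoop binary on $PATH
link '/usr/bin/hadoop' do
  to '/usr/lib/hadoop/bin/hadoop'  # assumed tarball location; adjust to yours
end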

-Jesse Hu

On Friday, May 31, 2013 at 12:23:50 PM UTC+8, Dido Dontchev wrote:

Dido Dontchev

May 31, 2013, 1:57:04 PM
to serenge...@googlegroups.com, Dido Dontchev
Jun,

So, you are correct again: hadoop is not installed. We identified the recipe that doesn't run correctly. namenode.rb references

hadoop_package node[:hadoop][:packages][:namenode][:name]


hadoop_ha_package "namenode" if hortonworks_hmonitor_enabled


from libraries/hadoop_cluster.rb


Yesterday we downloaded the tarball manually and installed it, but it seems that not everything runs smoothly there...

So the issue is definitely in namenode.rb, hadoop_cluster.rb, and add_repo.rb.

We modified add_repo to include 'fedora' in the 'when' line,
added markers here and there, like Chef::Log.info("worthy message"), in an attempt to identify where the breakdown occurs,
and uploaded all that with knife cookbook upload -a.


Can you explain what exactly the following code means?

when 'centos', 'redhat'
  prefix = node[:platform] == 'centos' ? 'CentOS' : 'rhel'


Thanks!

DMD

Dido Dontchev

Jun 1, 2013, 1:18:53 PM
to serenge...@googlegroups.com
Jun,


So finally we got the namenode and the worker nodes to bootstrap correctly. The only bootstrap failure we get is for the client.

The error appears on line 49 of postgresql/server_redhat.rb:

"package postgresql"

Return code 1 instead of 0.


I ran yum install postgresql from within the cloned client node and it failed due to the missing dependencies libssl.so.6 and libcrypt.so.6. Fedora uses much newer versions of these lib files. I tried to download openssl098e and install it, but it is too old a version and doesn't exist in the repository. Any ideas?

Thanks again!


DMD

Hui Hu

Jun 3, 2013, 5:44:07 AM
to serenge...@googlegroups.com
The following code in the recipe add_repo removes the standard CentOS yum repos (/etc/yum.repos.d/CentOS*.repo) and adds Serengeti's internal yum repo under /etc/yum.repos.d/. Serengeti's internal yum repo contains the postgresql and other related packages.

case node[:platform]

when 'centos', 'redhat'
  prefix = node[:platform] == 'centos' ? 'CentOS' : 'rhel'
  if !node[:enable_standard_os_yum_repos] or !is_connected_to_internet
    directory '/etc/yum.repos.d/backup' do
      mode '0755'
    end
    file = "/etc/yum.repos.d/#{prefix}*.repo"
    execute 'disable all standard yum repos' do
      only_if "ls #{file}"
      command "mv -f #{file} /etc/yum.repos.d/backup/; rm -rf /etc/yum.repos.d/*.repo"
    end
  else
    file = "/etc/yum.repos.d/backup/#{prefix}*.repo"
    execute 'enable all standard yum repos' do
      only_if "ls #{file}"
      command "mv -f #{file} /etc/yum.repos.d/"
    end
  end

  yum_repos = package_repos
  yum_repos.each do |yum_repo|
    Chef::Log.info("Add yum repo #{yum_repo}")
    file = "/etc/yum.repos.d/#{::File.basename(yum_repo)}"
    remote_file file do
      source yum_repo
      mode '0644'
    end
  end
end
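
For Fedora, the natural change is in the first two lines of that case statement; a hedged sketch of what the edit would need to look like (the 'fedora' repo-file prefix is an assumption about stock Fedora naming, usually fedora*.repo):

case node[:platform]
when 'centos', 'redhat', 'fedora'
  # map each platform to the prefix of its stock repo files
  prefix = { 'centos' => 'CentOS', 'fedora' => 'fedora' }.fetch(node[:platform], 'rhel')
  # ... rest of the recipe unchanged ...
end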

According to your info ("I ran yum install postgresql from within the cloned client node and it failed due to the missing dependencies libssl.so.6 and libcrypt.so.6. Fedora uses much newer versions of these lib files."), it seems the postgresql package in Serengeti's yum repo cannot be used on Fedora.

So you need to turn on the standard Fedora yum repos like this:

1) Open /opt/serengeti/.chef/knife.rb and change the line 'knife[:enable_standard_os_yum_repos] = false' to 'knife[:enable_standard_os_yum_repos] = true'.
2) Ensure the VMs that Serengeti created can connect to the Internet, so they can download the postgresql package from the Internet.
If the VMs need an http_proxy server to connect to the Internet, please log in to the Serengeti Server, add the following content to /opt/serengeti/conf/serengeti.properties, then restart tomcat with 'sudo service tomcat restart':


# set http proxy server
serengeti.http_proxy = http://<proxy_server:port>

# set the IPs of the Serengeti Server and your yum repository servers for 'serengeti.no_proxy';
# the wildcard for matching multiple IPs doesn't work
serengeti.no_proxy = 10.x.y.z, 192.168.x.y, etc.

3) continue the cluster creation by running 'cluster create --name ... --resume'




Thanks & Best Regards,
Hui Hu,  Beijing,  China


2013/6/2 Dido Dontchev <dido.d...@gmail.com>

Dido Dontchev

Jun 6, 2013, 11:16:15 PM
to serenge...@googlegroups.com
Jesse,

So the current issue is identified below.

Client bootstrap failure, from stdout.log:

10.10.1.54 [Thu, 06 Jun 2013 09:29:11 +0000] INFO: execute[postgresql initdb] ran successfully
10.10.1.54 [Thu, 06 Jun 2013 09:29:11 +0000] INFO: Processing template[/var/lib/pgsql/data/postgresql.conf] action create (postgresql::server_redhat line 67)
10.10.1.54 [Thu, 06 Jun 2013 09:29:11 +0000] INFO: template[/var/lib/pgsql/data/postgresql.conf] owner changed to 26
10.10.1.54 [Thu, 06 Jun 2013 09:29:11 +0000] INFO: template[/var/lib/pgsql/data/postgresql.conf] group changed to 26
10.10.1.54 [Thu, 06 Jun 2013 09:29:11 +0000] INFO: template[/var/lib/pgsql/data/postgresql.conf] updated content
10.10.1.54 [Thu, 06 Jun 2013 09:29:11 +0000] INFO: Processing service[postgresql] action enable (postgresql::server_redhat line 74)
10.10.1.54 [Thu, 06 Jun 2013 09:29:12 +0000] INFO: service[postgresql] enabled
10.10.1.54 [Thu, 06 Jun 2013 09:29:12 +0000] INFO: Processing service[postgresql] action start (postgresql::server_redhat line 74)
10.10.1.54 [Thu, 06 Jun 2013 09:29:12 +0000] ERROR: service[postgresql] (postgresql::server_redhat line 74) has had an error
10.10.1.54 [Thu, 06 Jun 2013 09:29:12 +0000] ERROR: service[postgresql] (/var/chef/cache/cookbooks/postgresql/recipes/server_redhat.rb:74:in `from_file') had an error:
10.10.1.54 service[postgresql] (postgresql::server_redhat line 74) had an error: Chef::Exceptions::Exec: /sbin/service postgresql start returned 1, expected 0
10.10.1.54 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/mixin/command.rb:127:in `handle_command_failures'
10.10.1.54 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/mixin/command.rb:74:in `run_command'
10.10.1.54 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/provider/service/init.rb:37:in `start_service'
10.10.1.54 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/provider/service.rb:57:in `action_start'
10.10.1.54 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/resource.rb:440:in `run_action'
10.10.1.54 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/runner.rb:45:in `run_action'
10.10.1.54 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/runner.rb:81:in `block (2 levels) in converge'
10.10.1.54 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/runner.rb:81:in `each'
10.10.1.54 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/runner.rb:81:in `block in converge'
10.10.1.54 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/resource_collection.rb:94:in `block in execute_each_resource'
10.10.1.54 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/resource_collection/stepable_iterator.rb:116:in `call'
10.10.1.54 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/resource_collection/stepable_iterator.rb:116:in `call_iterator_block'
10.10.1.54 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/resource_collection/stepable_iterator.rb:85:in `step'
10.10.1.54 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/resource_collection/stepable_iterator.rb:104:in `iterate'
10.10.1.54 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/resource_collection/stepable_iterator.rb:55:in `each_with_index'
10.10.1.54 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/resource_collection.rb:92:in `execute_each_resource'
10.10.1.54 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/runner.rb:76:in `converge'
10.10.1.54 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/client.rb:312:in `converge'
10.10.1.54 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/client.rb:160:in `run'
10.10.1.54 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/application/client.rb:239:in `block in run_application'
10.10.1.54 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/application/client.rb:229:in `loop'
10.10.1.54 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/application/client.rb:229:in `run_application'
10.10.1.54 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/lib/chef/application.rb:67:in `run'
10.10.1.54 /usr/lib/ruby/gems/1.9.1/gems/chef-0.10.8/bin/chef-client:26:in `<top (required)>'
10.10.1.54 /usr/bin/chef-client:19:in `load'
10.10.1.54 /usr/bin/chef-client:19:in `<main>'
10.10.1.54 [Thu, 06 Jun 2013 09:29:12 +0000] ERROR: Running exception handlers
10.10.1.54 [Thu, 06 Jun 2013 09:29:13 +0000] FATAL: Saving node information to /var/chef/cache/failed-run-data.json
10.10.1.54 [Thu, 06 Jun 2013 09:29:13 +0000] ERROR: Exception handlers complete
10.10.1.54 [Thu, 06 Jun 2013 09:29:13 +0000] FATAL: Stacktrace dumped to /var/chef/cache/chef-stacktrace.out
10.10.1.54 [Thu, 06 Jun 2013 09:29:13 +0000] FATAL: Chef::Exceptions::Exec: service[postgresql] (postgresql::server_redhat line 74) had an error: Chef::Exceptions::Exec: /sbin/service postgresql start returned 1, expected 0
Bootstrapping node myFedora-client-0 (10.10.1.54) completed with exit status 1
Bootstrapping cluster myFedora completed with exit status [0, 0, 0, 0, 1]


The recipe fails to start postgresql. After the cloning process I was able to manually start the postgresql service on the client node, so we know that postgresql is installed successfully on the client node.

Your input is greatly appreciated!

DMD

Hui Hu

Jun 8, 2013, 3:40:42 AM
to serenge...@googlegroups.com
That's good news that you are about to make Fedora work in Serengeti.

To debug this issue, please log in to that client VM as user serengeti, run 'sudo /sbin/service postgresql start', and check the output and the postgresql log to find out why it can't start.
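
For reference, the resource that raises this error around server_redhat.rb line 74 is a plain Chef service resource; an approximate sketch of its shape (an assumption, not the exact Serengeti source):

service "postgresql" do
  action [:enable, :start]  # Chef shells out to `/sbin/service postgresql start`;
                            # a non-zero exit raises Chef::Exceptions::Exec
end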

If you don't need the hive server (which requires postgresql), you can remove the hive_server role from the cluster spec file; then I think you can create a cluster successfully. The hive_server role is included in the default hadoop cluster spec.

Thanks & Best Regards,
Hui Hu,  Beijing,  China


2013/6/7 Dido Dontchev <dido.d...@gmail.com>

Thomas Skoff

Jul 10, 2013, 2:39:46 PM
to serenge...@googlegroups.com
We have still been unable to get the Hadoop client running Fedora through the bootstrapping process. We are trying Fedora 13 (based on requirements given to us) and get the following error while bootstrapping the client:
 

Chef::Exceptions::ResourceNotFound: Cannot find a resource matching service[postgresql] (did you define it first?)

Attached are our stdout log and the failed-run data log, which provide some insight. This looks like a Chef error, but we have not been able to find a resolution in the Chef documentation.
 
Any information or assistance would be greatly appreciated.
 
Regards,
Thomas Skoff
 
Failed Run Data.rtf
stdout.rtf

Jesse Hu

Jul 10, 2013, 11:25:18 PM
to serenge...@googlegroups.com
Hi Thomas,

I checked your logs. The problem should be at this line, https://github.com/vmware-serengeti/serengeti-pantry/blob/m4.ga/cookbooks/postgresql/recipes/server.rb#L39 : if the case below doesn't match your platform, neither server recipe is included, service[postgresql] is never defined, and the later notifies line fails with ResourceNotFound.

case node.platform
when "redhat", "centos", "fedora", "suse", "scientific", "amazon"
  include_recipe "postgresql::server_redhat" # this line is not executed. could you add a log here to check the value of 'node.platform'?
when "debian", "ubuntu"
  include_recipe "postgresql::server_debian"
end

template "#{node[:postgresql][:dir]}/pg_hba.conf" do
  source "pg_hba.conf.erb"
  owner "postgres"
  group "postgres"
  mode "0600"
  notifies :reload, resources(:service => "postgresql"), :immediately
end

Have you already gotten the postgresql cookbook to work on Fedora? If not, you can remove the 'hive_server' role from the client node group in your cluster spec file and create a new cluster. The 'hive_server' role is what triggers the installation of the postgresql server.

-Jesse

On Thursday, July 11, 2013 at 2:39:46 AM UTC+8, Thomas Skoff wrote:

Dido Dontchev

Jul 13, 2013, 5:37:20 PM
to serenge...@googlegroups.com
Jesse Hu,

Thank you for all your help. We were finally able to create Hadoop clusters via Serengeti using both Fedora 18 and Fedora 13 as the base OS for the template. It appears to be running stably, so thanks again!


NPS

Jesse Hu

Jul 16, 2013, 2:28:10 AM
to serenge...@googlegroups.com
Hi Dido,

We are very happy and willing to help out. That's really exciting news. This is the first OS template (besides CentOS 5.6+ and 6.2+) verified and reported by the Serengeti open source community on which Serengeti runs well. We believe Serengeti can run on more OS templates (in addition to CentOS, Red Hat, and Fedora).

We are also open to accepting your GitHub pull request enabling Fedora template support in Serengeti.

Please don't hesitate to post questions here or send us email for more support.

-Jesse Hu @Serengeti

Dido Dontchev

Jul 24, 2013, 11:09:57 AM
to serenge...@googlegroups.com
Jesse,

How can we add to or modify an already created cluster with Serengeti? Is it possible to add nodes to an existing cluster?

Deyan

jun wang

Jul 24, 2013, 8:51:58 PM
to serenge...@googlegroups.com
Hi Deyan,

You can use the CLI command "cluster resize --name clustername --nodeGroup groupname --instanceNum your_expected_total_number_in_this_group" to add more worker nodes.

Thanks.


Date: Wed, 24 Jul 2013 08:09:57 -0700
From: dido.d...@gmail.com

To: serenge...@googlegroups.com
Subject: Re: Editing Hadoop Template

Dido Dontchev

Jul 25, 2013, 12:50:00 AM
to serenge...@googlegroups.com
Jun,

I tried that option and it didn't go through... It provisioned the additional node, but failed the bootstrap.

Deyan

Jesse Hu

Jul 26, 2013, 6:54:03 AM
to serenge...@googlegroups.com
Could you ssh to one of the failed nodes and paste the content of /var/chef/cache/chef-stacktrace.out? It will show why it failed.

And you can also run 'sudo chef-client' on the node to see whether it can succeed.

-Jesse

Jithender Reddy

Oct 3, 2013, 2:11:04 AM
to serenge...@googlegroups.com
Hello,

Will you provide training for Hadoop Serengeti?

Regards,
Jithender Reddy
Hadoop Engineer


On Tuesday, May 14, 2013 9:08:17 PM UTC+5:30, Thomas Skoff wrote:

Jesse Hu

Oct 6, 2013, 7:31:34 AM
to serenge...@googlegroups.com
Hi Jithender,

You can also follow this guide to use Fedora: https://groups.google.com/forum/?fromgroups=#!topic/serengeti-user/UQ__Y5ARy8U

Jesse