Add a instance for Ganeti error.


harryxiyou

Jun 22, 2013, 4:19:15 AM
to Ganeti Users list, Lance Albertson, Michele Tartara, Guido Trotter, Constantinos Venetsanopoulos
Hi all,

I have written a wiki page about how to install Ganeti on a physical PC:
https://code.google.com/p/gsocxy/wiki/InstallGaneti

Everything worked at first. However, when I try to add an instance on
node1.example.com, it fails. The output is as follows.


root@node1:~# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 6.0.7 (squeeze)
Release: 6.0.7
Codename: squeeze

root@node1:~# gnt-cluster version
Software version: 2.1.6
Internode protocol: 30
Configuration format: 2010000
OS api version: 15
Export interface: 0

root@node1:~# gnt-node list
Node              DTotal  DFree MTotal MNode MFree Pinst Sinst
node1.example.com 222.5G 222.5G      ?     ?     ?     0     0

root@node1:~# gnt-instance add -t plain -n node1.example.com -s 10G -o debootstrap+default inst1.example.com
Failure: prerequisites not met for this operation:
error type: environment_error, error details:
Can't compute free memory on node node1.example.com, result was 'None'

root@node1:~# gnt-instance add -t drbd -n node1.example.com -s 10G -o debootstrap+default inst1.example.com
Failure: prerequisites not met for this operation:
error type: wrong_input, error details:
The networked disk templates need a mirror node

root@node1:~# gnt-node info
Node name: node1.example.com
primary ip: 192.168.1.100
secondary ip: 192.168.1.100
master candidate: True
drained: False
offline: False
primary for no instances
secondary for no instances

root@node1:~# gnt-cluster info
Cluster name: cluster1.example.com
Cluster UUID: 3d170fa1-a627-474a-a01a-da3406851aeb
Creation time: 2013-06-22 04:06:48
Modification time: 2013-06-22 04:06:48
Master node: node1.example.com
Architecture (this node): 64bit (x86_64)
Tags: (none)
Default hypervisor: xen-pvm
Enabled hypervisors: xen-pvm
Hypervisor parameters:
- xen-pvm:
bootloader_args:
bootloader_path:
initrd_path:
kernel_args: ro
kernel_path: /boot/vmlinuz-2.6-xenU
migration_port: 8002
root_path: /dev/sda1
use_bootloader: False
OS specific hypervisor parameters:
Cluster parameters:
- candidate pool size: 10
- master netdev: xen-br0
- lvm volume group: xenvg
- file storage path: /srv/ganeti/file-storage
- maintenance of node health: False
- uid pool:
Default instance parameters:
- default:
auto_balance: True
memory: 128
vcpus: 1
Default nic parameters:
- default:
link: xen-br0
mode: bridged


Would anyone please give me some suggestions? Thanks very much ;-)

--
Thanks
Weiwei Jia (Harry Wei)

harryxiyou

Jun 22, 2013, 4:44:32 AM
to Ganeti Users list, Lance Albertson, Michele Tartara, Guido Trotter, Constantinos Venetsanopoulos
I have found some clues in /var/log/ganeti/commands.log:

[...]
2013-06-22 04:06:43,666: gnt-cluster init pid=1924 INFO run with
arguments '-g xenvg --master-netdev xen-br0 cluster1.example.com'
2013-06-22 04:06:46,916: gnt-cluster init pid=1924 ERROR RPC error in
version from node node1.example.com: Connection failed (111:
Connection refused)


This tells me something may be wrong with node1.example.com, right?
But how do I solve it? I think it may be what causes the '?' under
'MTotal', 'MNode' and 'MFree', right?
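A small sketch for spotting this condition from a script: it parses `gnt-node list` output of the shape shown in this thread and reports the nodes whose memory columns are '?'. The function name `nodes_missing_memory` is made up for illustration and is not part of Ganeti, and the parsing assumes whitespace-separated columns with the header row shown here.

```python
from typing import List


def nodes_missing_memory(listing: str) -> List[str]:
    """Return node names whose MTotal, MNode or MFree column is '?'."""
    # Split every non-empty line into whitespace-separated fields.
    lines = [ln.split() for ln in listing.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    # Locate the memory columns by name instead of hard-coding positions.
    mem_cols = [header.index(col) for col in ("MTotal", "MNode", "MFree")]
    return [row[0] for row in rows if any(row[i] == "?" for i in mem_cols)]


# Output copied from this thread (before the Xen boot setup was fixed).
sample = """\
Node DTotal DFree MTotal MNode MFree Pinst Sinst
node1.example.com 222.5G 222.5G ? ? ? 0 0
"""
print(nodes_missing_memory(sample))
```

An empty list means every listed node reported its memory figures.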


2013-06-22 04:06:56,842: gnt-node list pid=2039 INFO run with no arguments
2013-06-22 04:08:42,617: gnt-instance add pid=2046 INFO run with
arguments '-t plain -n node1.example.com -s 1G -o debootstrap+default
inst1.example.com'
2013-06-22 04:08:46,051: gnt-instance add pid=2046 ERROR Error during
command processing
Traceback (most recent call last):
File "/usr/lib/pymodules/python2.6/ganeti/cli.py", line 1676, in GenericMain
result = func(options, args)
File "/usr/sbin/gnt-instance", line 347, in AddInstance
return GenericInstanceCreate(constants.INSTANCE_CREATE, opts, args)
File "/usr/lib/pymodules/python2.6/ganeti/cli.py", line 1820, in
GenericInstanceCreate
SubmitOrSend(op, opts)
File "/usr/lib/pymodules/python2.6/ganeti/cli.py", line 1505, in SubmitOrSend
return SubmitOpCode(op, cl=cl, feedback_fn=feedback_fn, opts=opts)
File "/usr/lib/pymodules/python2.6/ganeti/cli.py", line 1482, in SubmitOpCode
op_results = PollJob(job_id, cl=cl, feedback_fn=feedback_fn)
File "/usr/lib/pymodules/python2.6/ganeti/cli.py", line 1464, in PollJob
return GenericPollJob(job_id, _LuxiJobPollCb(cl), reporter)
File "/usr/lib/pymodules/python2.6/ganeti/cli.py", line 1299, in
GenericPollJob
errors.MaybeRaise(msg)
File "/usr/lib/pymodules/python2.6/ganeti/errors.py", line 396, in MaybeRaise
raise err_class, tuple(result[1])
OpPrereqError: ("Can't compute free memory on node node1.example.com,
result was 'None'", 'environment_error')
2013-06-22 04:14:43,494: gnt-node list pid=2091 INFO run with no arguments
2013-06-22 04:15:15,998: gnt-instance add pid=2115 INFO run with
arguments '-t plain -n node1.example.com -s 10G -o debootstrap+default
inst1.example.com'
2013-06-22 04:15:19,431: gnt-instance add pid=2115 ERROR Error during
command processing
Traceback (most recent call last):
File "/usr/lib/pymodules/python2.6/ganeti/cli.py", line 1676, in GenericMain
result = func(options, args)
File "/usr/sbin/gnt-instance", line 347, in AddInstance
return GenericInstanceCreate(constants.INSTANCE_CREATE, opts, args)
File "/usr/lib/pymodules/python2.6/ganeti/cli.py", line 1820, in
GenericInstanceCreate
SubmitOrSend(op, opts)
File "/usr/lib/pymodules/python2.6/ganeti/cli.py", line 1505, in SubmitOrSend
return SubmitOpCode(op, cl=cl, feedback_fn=feedback_fn, opts=opts)
File "/usr/lib/pymodules/python2.6/ganeti/cli.py", line 1482, in SubmitOpCode
op_results = PollJob(job_id, cl=cl, feedback_fn=feedback_fn)
File "/usr/lib/pymodules/python2.6/ganeti/cli.py", line 1464, in PollJob
return GenericPollJob(job_id, _LuxiJobPollCb(cl), reporter)
File "/usr/lib/pymodules/python2.6/ganeti/cli.py", line 1299, in
GenericPollJob
errors.MaybeRaise(msg)
File "/usr/lib/pymodules/python2.6/ganeti/errors.py", line 396, in MaybeRaise
raise err_class, tuple(result[1])
OpPrereqError: ("Can't compute free memory on node node1.example.com,
result was 'None'", 'environment_error')
2013-06-22 04:15:28,322: gnt-instance add pid=2129 INFO run with
arguments '-t drbd -n node1.example.com -s 10G -o debootstrap+default
inst1.example.com'
2013-06-22 04:15:31,354: gnt-instance add pid=2129 ERROR Error during
command processing
Traceback (most recent call last):
File "/usr/lib/pymodules/python2.6/ganeti/cli.py", line 1676, in GenericMain
result = func(options, args)
File "/usr/sbin/gnt-instance", line 347, in AddInstance
return GenericInstanceCreate(constants.INSTANCE_CREATE, opts, args)
File "/usr/lib/pymodules/python2.6/ganeti/cli.py", line 1820, in
GenericInstanceCreate
SubmitOrSend(op, opts)
File "/usr/lib/pymodules/python2.6/ganeti/cli.py", line 1505, in SubmitOrSend
return SubmitOpCode(op, cl=cl, feedback_fn=feedback_fn, opts=opts)
File "/usr/lib/pymodules/python2.6/ganeti/cli.py", line 1482, in SubmitOpCode
op_results = PollJob(job_id, cl=cl, feedback_fn=feedback_fn)
File "/usr/lib/pymodules/python2.6/ganeti/cli.py", line 1464, in PollJob
return GenericPollJob(job_id, _LuxiJobPollCb(cl), reporter)
File "/usr/lib/pymodules/python2.6/ganeti/cli.py", line 1299, in
GenericPollJob
errors.MaybeRaise(msg)
File "/usr/lib/pymodules/python2.6/ganeti/errors.py", line 396, in MaybeRaise
raise err_class, tuple(result[1])
OpPrereqError: ('The networked disk templates need a mirror node',
'wrong_input')
2013-06-22 04:17:18,654: gnt-node info pid=2134 INFO run with no arguments
2013-06-22 04:17:37,454: gnt-cluster info pid=2135 INFO run with no arguments
2013-06-22 04:18:25,926: gnt-cluster version pid=2145 INFO run with no arguments
2013-06-22 04:20:05,073: gnt-node list pid=2166 INFO run with no arguments
2013-06-22 04:30:34,470: gnt-node volumes pid=2209 INFO run with no arguments
2013-06-22 04:31:01,434: gnt-os list pid=2212 INFO run with no arguments

harryxiyou

Jun 22, 2013, 5:34:58 AM
to Sebastian Gebhard, Ganeti Users list, Lance Albertson, Michele Tartara, Guido Trotter, Constantinos Venetsanopoulos
On Sat, Jun 22, 2013 at 4:46 PM, Sebastian Gebhard <se...@fs.ei.tum.de> wrote:
> Hi Weiwei,

Hi Sebastian,

>
> I can't really speak for the DRBD part of the errors you are getting,
> but I might have a clue on the memory issue.
>

Yeah, thanks very much ;-)

[...]
> From your wiki article I think you missed some options while configuring
> XEN (happened to me as well):
> You should set
> (enable-dom0-ballooning no)
> in your /etc/xen/xend-config.sxp

Yeah, it should be set. I have added it.

>
> Second, I can't see where you switched your grub to booting the xen
> hypervisor? If Squeeze is using Grub2, it would just be as easy as
> mv /etc/grub.d/20_linux_xen /etc/grub.d/08_linux_xen
> which will change the list of kernels your grub will boot and make the
> xen hypervisor the default kernel.

Yeah, it should be added, too ;-)

>
> Thirdly, do not forget to set the memory for the hypervisor using the
> kernel cmdline. This is done in
> /etc/default/grub by adding
> GRUB_CMDLINE_XEN="dom0_mem=2048M"
> Note that this is also specific to Grub2 and might be different to older
> Grub versions.
>

Yeah, as above ;-)

I am not clear about the difference between the 'Linux 2.6.32-5-xen-amd64'
and 'Linux 2.6.32-5-xen-amd64 and XEN 4.0-amd64' entries in the grub
list. And when it boots the default 'Linux 2.6.32-5-xen-amd64 and
XEN 4.0-amd64' entry, it does not start up; it just reboots again and
again :-/

Any comments?

> Afterwards, you have to run update-grub again and it should work.
>
> I hope I could help you with this!
>

Thanks very much. I think the key point for me is that when I init the
cluster with `gnt-cluster init cluster1`, the logs are as follows.

[...]
2013-06-22 05:18:16,862: gnt-cluster init pid=1549 INFO run with
arguments 'cluster1'
2013-06-22 05:18:20,162: gnt-cluster init pid=1549 ERROR RPC error in
version from node node1.example.com: Connection failed (111:
Connection refused)
[...]

Have you ever run into this? Thanks ;-)

Sebastian Gebhard

Jun 22, 2013, 4:46:53 AM
to gan...@googlegroups.com, harryxiyou, Lance Albertson, Michele Tartara, Guido Trotter, Constantinos Venetsanopoulos
Hi Weiwei,

I can't really speak for the DRBD part of the errors you are getting,
but I might have a clue on the memory issue.


On 22.06.13 10:19, harryxiyou wrote:
> Hi all,
>
> I have written a wiki page about how to install Ganeti on a physical PC:
> https://code.google.com/p/gsocxy/wiki/InstallGaneti
>
> Everything worked at first. However, when I try to add an instance on
> node1.example.com, it fails. The output is as follows.
> root@node1:~# gnt-node list
> Node              DTotal  DFree MTotal MNode MFree Pinst Sinst
> node1.example.com 222.5G 222.5G      ?     ?     ?     0     0
>
> root@node1:~# gnt-instance add -t plain -n node1.example.com -s 10G -o debootstrap+default inst1.example.com
> Failure: prerequisites not met for this operation:
> error type: environment_error, error details:
> Can't compute free memory on node node1.example.com, result was 'None'
>

From your wiki article I think you missed some options while configuring
XEN (happened to me as well):
You should set
(enable-dom0-ballooning no)
in your /etc/xen/xend-config.sxp

Second, I can't see where you switched your grub to booting the xen
hypervisor? If Squeeze is using Grub2, it would just be as easy as
mv /etc/grub.d/20_linux_xen /etc/grub.d/08_linux_xen
which will change the list of kernels your grub will boot and make the
xen hypervisor the default kernel.

Thirdly, do not forget to set the memory for the hypervisor using the
kernel cmdline. This is done in
/etc/default/grub by adding
GRUB_CMDLINE_XEN="dom0_mem=2048M"
Note that this is also specific to Grub2 and might be different to older
Grub versions.

Afterwards, you have to run update-grub again and it should work.
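The GRUB-related points above can be checked with a short script. This is only a sketch: `dom0_mem_setting`, `running_as_dom0` and `read_or_empty` are hypothetical helper names, and the dom0 check assumes that `/proc/xen/capabilities` contains `control_d` when the machine is booted as Xen dom0, which may not hold on every setup.

```python
import re
from pathlib import Path
from typing import Optional


def dom0_mem_setting(grub_default_text: str) -> Optional[str]:
    """Return the dom0_mem value from GRUB_CMDLINE_XEN, or None if unset."""
    for line in grub_default_text.splitlines():
        line = line.strip()
        # Skip commented-out lines and unrelated variables.
        if not line.startswith("GRUB_CMDLINE_XEN="):
            continue
        match = re.search(r"dom0_mem=([^\s\"']+)", line)
        if match:
            return match.group(1)
    return None


def read_or_empty(path: str) -> str:
    """Read a text file, returning '' if it is missing or unreadable."""
    try:
        return Path(path).read_text()
    except OSError:
        return ""


def running_as_dom0(caps_path: str = "/proc/xen/capabilities") -> bool:
    """True if the capabilities file contains 'control_d' (assumed dom0 marker)."""
    return "control_d" in read_or_empty(caps_path)


if __name__ == "__main__":
    print("dom0_mem in /etc/default/grub:",
          dom0_mem_setting(read_or_empty("/etc/default/grub")))
    print("booted as Xen dom0:", running_as_dom0())
```

If the first line prints `None`, the `GRUB_CMDLINE_XEN` line is probably missing; if the second prints `False` after a reboot, GRUB is likely still booting the plain kernel entry.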

I hope I could help you with this!

Cheers,
Sebastian

harryxiyou

Jun 23, 2013, 3:12:54 AM
to Sebastian Gebhard, Ganeti Users list, Lance Albertson, Michele Tartara, Guido Trotter, Constantinos Venetsanopoulos
The reboot loop happened because I set dom0_mem to 2048M, which is
out of range for this machine. When I set 1024M instead, it boots.

>
>> Afterwards, you have to run update-grub again and it should work.
>>
>> I hope I could help you with this!
>>
>
> Thanks very much. I think the key point for me is that when I init the
> cluster with `gnt-cluster init cluster1`, the logs are as follows.
>
> [...]
> 2013-06-22 05:18:16,862: gnt-cluster init pid=1549 INFO run with
> arguments 'cluster1'
> 2013-06-22 05:18:20,162: gnt-cluster init pid=1549 ERROR RPC error in
> version from node node1.example.com: Connection failed (111:
> Connection refused)
> [...]
>
> Have you ever run into this? Thanks ;-)
>

The problem is partly solved. The latest state is as follows.

root@node1:~# gnt-cluster init cluster1
===============================
The logs:

2013-06-23 03:02:57,492: gnt-cluster init pid=3406 INFO run with
arguments 'cluster1'
2013-06-23 03:03:01,014: gnt-cluster init pid=3406 ERROR RPC error in
version from node node1.example.com: Connection failed (111:
Connection refused)

There is still something wrong here :-/



root@node1:~# gnt-node list
Node              DTotal  DFree MTotal MNode MFree Pinst Sinst
node1.example.com 222.5G 222.5G   2.0G 1019M  992M     0     0
=============================================================
The logs: 2013-06-23 03:03:31,272: gnt-node list pid=3522 INFO run
with no arguments



root@node1:~# gnt-instance add -t plain -n node1.example.com -s 10G -o debootstrap+default inst1.example.com
Sun Jun 23 03:08:48 2013 * creating instance disks...
Sun Jun 23 03:08:48 2013 adding instance inst1.example.com to cluster config
Sun Jun 23 03:08:49 2013 - INFO: Waiting for instance
inst1.example.com to sync disks.
Sun Jun 23 03:08:49 2013 - INFO: Instance inst1.example.com's disks
are in sync.
Sun Jun 23 03:08:49 2013 * running the instance OS create scripts...
Failure: command execution error:
Could not add os for instance inst1.example.com on node
node1.example.com: OS create script failed (exited with exit code 1),
last lines in the log file:

sfdisk: ERROR: sector 0 does not have an msdos signature
/dev/xenvg/435c5457-dc14-4330-a309-c4a8c99a5cf2.disk0: unrecognized
partition table type
Old situation:
No partitions found
New situation:
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

Device                                                  Boot Start  End #cyls  #blocks Id System
/dev/xenvg/435c5457-dc14-4330-a309-c4a8c99a5cf2.disk0p1    *    0+ 1304 1305- 10482412 83 Linux
/dev/xenvg/435c5457-dc14-4330-a309-c4a8c99a5cf2.disk0p2          0    -     0        0  0 Empty
/dev/xenvg/435c5457-dc14-4330-a309-c4a8c99a5cf2.disk0p3          0    -     0        0  0 Empty
/dev/xenvg/435c5457-dc14-4330-a309-c4a8c99a5cf2.disk0p4          0    -     0        0  0 Empty
Successfully wrote the new partition table

Re-reading the partition table ...
BLKRRPART: Invalid argument

I: Retrieving Release
E: Failed getting release file
http://ftp.us.debian.org/debian/dists/lenny/Release

===================================================================
The logs:

2013-06-23 03:08:44,237: gnt-instance add pid=3584 INFO run with
arguments '-t plain -n node1.example.com -s 10G -o debootstrap+default
inst1.example.com'
2013-06-23 03:09:33,688: gnt-instance add pid=3584 ERROR Error during
command processing
Traceback (most recent call last):
File "/usr/lib/pymodules/python2.6/ganeti/cli.py", line 1676, in GenericMain
result = func(options, args)
File "/usr/sbin/gnt-instance", line 347, in AddInstance
return GenericInstanceCreate(constants.INSTANCE_CREATE, opts, args)
File "/usr/lib/pymodules/python2.6/ganeti/cli.py", line 1820, in
GenericInstanceCreate
SubmitOrSend(op, opts)
File "/usr/lib/pymodules/python2.6/ganeti/cli.py", line 1505, in SubmitOrSend
return SubmitOpCode(op, cl=cl, feedback_fn=feedback_fn, opts=opts)
File "/usr/lib/pymodules/python2.6/ganeti/cli.py", line 1482, in SubmitOpCode
op_results = PollJob(job_id, cl=cl, feedback_fn=feedback_fn)
File "/usr/lib/pymodules/python2.6/ganeti/cli.py", line 1464, in PollJob
return GenericPollJob(job_id, _LuxiJobPollCb(cl), reporter)
File "/usr/lib/pymodules/python2.6/ganeti/cli.py", line 1299, in
GenericPollJob
errors.MaybeRaise(msg)
File "/usr/lib/pymodules/python2.6/ganeti/errors.py", line 396, in MaybeRaise
raise err_class, tuple(result[1])
OpExecError: Could not add os for instance inst1.example.com on node
node1.example.com: OS create script failed (exited with exit code 1),
last lines in the log file:

sfdisk: ERROR: sector 0 does not have an msdos signature
/dev/xenvg/435c5457-dc14-4330-a309-c4a8c99a5cf2.disk0: unrecognized
partition table type
Old situation:
No partitions found
New situation:
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

Device                                                  Boot Start  End #cyls  #blocks Id System
/dev/xenvg/435c5457-dc14-4330-a309-c4a8c99a5cf2.disk0p1    *    0+ 1304 1305- 10482412 83 Linux
/dev/xenvg/435c5457-dc14-4330-a309-c4a8c99a5cf2.disk0p2          0    -     0        0  0 Empty
/dev/xenvg/435c5457-dc14-4330-a309-c4a8c99a5cf2.disk0p3          0    -     0        0  0 Empty
/dev/xenvg/435c5457-dc14-4330-a309-c4a8c99a5cf2.disk0p4          0    -     0        0  0 Empty
Successfully wrote the new partition table

Re-reading the partition table ...
BLKRRPART: Invalid argument

I: Retrieving Release
E: Failed getting release file
http://ftp.us.debian.org/debian/dists/lenny/Release



Any comments? Thanks very much ;-)

Michele Tartara

Jun 24, 2013, 3:36:31 AM
to harryxiyou, Ganeti Users list, Lance Albertson, Guido Trotter, Constantinos Venetsanopoulos
As I replied in the other email already, you should use the valid address (a FQDN, not just the IP) of the node you are trying to connect to. And I strongly suspect node1.example.com is not valid ;-)

Best,
Michele

harryxiyou

Jun 24, 2013, 11:59:20 AM
to Michele Tartara, Ganeti Users list, Lance Albertson, Guido Trotter, Constantinos Venetsanopoulos
On Mon, Jun 24, 2013 at 3:36 PM, Michele Tartara <mtar...@google.com> wrote:
[...]
> As I replied in the other email already, you should use the valid address (a
> FQDN, not just the IP) of the node you are trying to connect to.

My /etc/hosts is as follows:

# cat /etc/hosts
127.0.0.1 localhost
192.168.1.102 cluster1.example.com cluster1
192.168.1.105 inst1.example.com inst1
192.168.1.100 node1.example.com node1

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

And I installed Ganeti following the steps I wrote up here:
https://code.google.com/p/gsocxy/wiki/InstallGaneti

> And I strongly suspect node1.example.com is not valid ;-)

Yeah, I think so. But I am not sure how to fix it properly ;-)

Would you please have a look at my Ganeti installation steps here
https://code.google.com/p/gsocxy/wiki/InstallGaneti
and give me some suggestions on how to fix it?

Thanks very much ;-)

Michele Tartara

Jun 24, 2013, 12:46:00 PM
to harryxiyou, Ganeti Users list, Lance Albertson, Guido Trotter, Constantinos Venetsanopoulos
First of all, a general suggestion: NEVER actually use example.com in a real system. It's reserved by RFC 2606 for use in documentation, and it just means that it should be substituted by some other value.
So, when you define your configuration, you can use any random (preferably non-existing elsewhere) name, but it's better if you don't use example.com.

Then, your IP assignment seems to be going to give you headaches when you start adding more nodes, because pairing numbers and names is going to be quite difficult.
I'd configure the actual name resolution as follows (note that I used "harrycluster.com" as the name, given that it's not existing on the public internet --so, no conflicts-- and you can just configure your /etc/hosts files to make it known to all your machines):

192.168.1.1 cluster1.harrycluster.com cluster1
192.168.1.101 node1.harrycluster.com node1
192.168.1.102 node2.harrycluster.com node2
192.168.1.103 node3.harrycluster.com node3
192.168.1.201 instance1.harrycluster.com instance1
192.168.1.202 instance2.harrycluster.com instance2
192.168.1.203 instance3.harrycluster.com instance3
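A layout like this can be sanity-checked before running `gnt-cluster init`. The sketch below parses hosts-file text and reports which expected names have no entry; `parse_hosts` and `missing_names` are hypothetical helper names, and the sample text is just a subset of the table above, not a real file.

```python
from typing import Dict, Iterable, List


def parse_hosts(text: str) -> Dict[str, str]:
    """Map every hostname and alias in hosts-file text to its IP address."""
    table: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, ip)  # keep the first entry for a name
    return table


def missing_names(table: Dict[str, str], names: Iterable[str]) -> List[str]:
    """Return the names that have no entry in the parsed table."""
    return [name for name in names if name not in table]


hosts_text = """\
192.168.1.1 cluster1.harrycluster.com cluster1
192.168.1.101 node1.harrycluster.com node1
192.168.1.201 instance1.harrycluster.com instance1
"""
table = parse_hosts(hosts_text)
print(table["node1.harrycluster.com"])
print(missing_names(table, ["cluster1.harrycluster.com", "node2.harrycluster.com"]))
```

On a real node one would pass the contents of `/etc/hosts` instead of the sample string, and compare the node's entry against the address actually configured on the bridge interface.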

Then, another small suggestion. Right after creating the cluster (gnt-cluster init) try running:
gnt-cluster verify
to check that everything is in order.

Finally, try using a different OS creation script when creating the instance. I'd suggest you to try with:
-o busybox
that is the simplest one available and see if that one works.

Best,
Michele

harryxiyou

Jun 24, 2013, 11:58:43 PM
to Michele Tartara, Ganeti Users list, Lance Albertson, Guido Trotter, Constantinos Venetsanopoulos
On Tue, Jun 25, 2013 at 12:46 AM, Michele Tartara <mtar...@google.com> wrote:
[...]
> First of all, a general suggestion: NEVER actually use example.com in a real
> system. It's reserved by RFC 2606 for use in documentation, and it just
> means that it should be substituted by some other value.
> So, when you define your configuration, you can use any random (preferably
> non-existing elsewhere) name, but it's better if you don't use example.com.

Yeah, i will not use example.com, thanks ;-)

>
> Then, your IP assignment seems to be going to give you headaches when you
> start adding more nodes, because pairing numbers and names is going to be
> quite difficult.

Yeah, i think so.

> I'd configure the actual name resolution as follows (note that I used
> "harrycluster.com" as the name, given that it's not existing on the public
> internet --so, no conflicts-- and you can just configure your /etc/hosts
> files to make it known to all your machines):
>
> 192.168.1.1 cluster1.harrycluster.com cluster1
> 192.168.1.101 node1.harrycluster.com node1
> 192.168.1.102 node2.harrycluster.com node2
> 192.168.1.103 node3.harrycluster.com node3
> 192.168.1.201 instance1.harrycluster.com instance1
> 192.168.1.202 instance2.harrycluster.com instance2
> 192.168.1.203 instance3.harrycluster.com instance3
>

Yeah, harrycluster.com is very good ;-)

> Then, another small suggestion. Right after creating the cluster
> (gnt-cluster init) try running:
> gnt-cluster verify
> to check that everything is in order.
>

Yeah, this is essential for us, thanks.

> Finally, try using a different OS creation script when creating the
> instance. I'd suggest you to try with:
> -o busybox
> that is the simplest one available and see if that one works.
>

Okay, let me have a try according to above suggestions ;-)

Thanks very much and best wishes to you ;-)

harryxiyou

Jun 25, 2013, 9:20:26 PM
to Michele Tartara, Ganeti Users list, Lance Albertson, Guido Trotter, Constantinos Venetsanopoulos
On Tue, Jun 25, 2013 at 12:46 AM, Michele Tartara <mtar...@google.com> wrote:
[...]
I have corrected my /etc/hosts and hostname following the config above.
Then I initialized the Ganeti cluster again, but it still reports the same
error, as follows.

root@node1:~# gnt-cluster init cluster1

Logs:
2013-06-25 21:14:16,946: gnt-cluster init pid=2333 INFO run with
arguments 'cluster1'
2013-06-25 21:14:22,386: gnt-cluster init pid=2333 ERROR RPC error in
version from node node1.harrycluster.com: Connection failed (111:
Connection refused)

root@node1:~# gnt-cluster verify
Tue Jun 25 21:18:22 2013 * Verifying global settings
Tue Jun 25 21:18:22 2013 * Gathering data (1 nodes)
Tue Jun 25 21:18:23 2013 * Verifying node status
Tue Jun 25 21:18:23 2013 * Verifying instance status
Tue Jun 25 21:18:23 2013 * Verifying orphan volumes
Tue Jun 25 21:18:23 2013 * Verifying orphan instances
Tue Jun 25 21:18:23 2013 * Verifying N+1 Memory redundancy
Tue Jun 25 21:18:23 2013 * Other Notes
Tue Jun 25 21:18:23 2013 * Hooks Results

Logs:
2013-06-25 21:18:22,541: gnt-cluster verify pid=2551 INFO run with no arguments

root@node1:~# gnt-node list
Node                   DTotal  DFree MTotal MNode MFree Pinst Sinst
node1.harrycluster.com 222.5G 222.5G   2.0G  1.4G  592M     0     0

Logs:
2013-06-25 21:18:46,063: gnt-node list pid=2597 INFO run with no arguments

Michele Tartara

Jun 26, 2013, 3:56:45 AM
to harryxiyou, Ganeti Users list, Lance Albertson, Guido Trotter, Constantinos Venetsanopoulos
Looks like it is not able to connect to node1 to ask for its version number.
What's the IP address of the machine you are running the "gnt-cluster init" command on?

It should be the address of node1.
 

> root@node1:~# gnt-cluster verify
> Tue Jun 25 21:18:22 2013 * Verifying global settings
> Tue Jun 25 21:18:22 2013 * Gathering data (1 nodes)
> Tue Jun 25 21:18:23 2013 * Verifying node status
> Tue Jun 25 21:18:23 2013 * Verifying instance status
> Tue Jun 25 21:18:23 2013 * Verifying orphan volumes
> Tue Jun 25 21:18:23 2013 * Verifying orphan instances
> Tue Jun 25 21:18:23 2013 * Verifying N+1 Memory redundancy
> Tue Jun 25 21:18:23 2013 * Other Notes
> Tue Jun 25 21:18:23 2013 * Hooks Results
>
> Logs:
> 2013-06-25 21:18:22,541: gnt-cluster verify pid=2551 INFO run with no arguments
>
> root@node1:~# gnt-node list
> Node                   DTotal  DFree MTotal MNode MFree Pinst Sinst
> node1.harrycluster.com 222.5G 222.5G   2.0G  1.4G  592M     0     0
>
> Logs:
> 2013-06-25 21:18:46,063: gnt-node list pid=2597 INFO run with no arguments
>
>
>
> --
> Thanks
> Weiwei Jia (Harry Wei)

Cheers,
Michele

harryxiyou

Jun 26, 2013, 4:25:41 AM
to Ganeti Users list, Guido Trotter, Lance Albertson, Constantinos Venetsanopoulos, Michele Tartara
Sorry, I forgot to reply all.

---------- Forwarded message ----------
From: harryxiyou <harry...@gmail.com>
Date: Wed, Jun 26, 2013 at 4:24 PM
Subject: Re: Add a instance for Ganeti error.
To: Michele Tartara <mtar...@google.com>


On Wed, Jun 26, 2013 at 3:56 PM, Michele Tartara <mtar...@google.com> wrote:
[...]

> Looks like it is not able to connect to node1 to ask for its version number.

Yeah, I think so. But I do not know what the problem is :-/

> What's the IP address of the machine you are running the "gnt-cluster init"
> command on?
>
> It should be the address of node1.
>

Yeah, of course, it is node1. My operations are as follows.

root@node1:~# ifconfig
eth0 Link encap:Ethernet HWaddr 00:24:1d:b5:31:f0
inet6 addr: fe80::224:1dff:feb5:31f0/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:158180 errors:0 dropped:0 overruns:0 frame:0
TX packets:3376 errors:0 dropped:0 overruns:0 carrier:1
collisions:0 txqueuelen:1000
RX bytes:21467127 (20.4 MiB) TX bytes:444798 (434.3 KiB)
Interrupt:238

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:2331 errors:0 dropped:0 overruns:0 frame:0
TX packets:2331 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:578470 (564.9 KiB) TX bytes:578470 (564.9 KiB)

xen-br0 Link encap:Ethernet HWaddr 00:24:1d:b5:31:f0
inet addr:192.168.1.106 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::224:1dff:feb5:31f0/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2204 errors:0 dropped:0 overruns:0 frame:0
TX packets:498 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:242620 (236.9 KiB) TX bytes:72565 (70.8 KiB)

root@node1:~# /etc/init.d/ganeti restart
Restarting Ganeti cluster:ganeti-confd...done.
ganeti-rapi...done.
ganeti-masterd...done.
ganeti-noded...done.
Missing configuration file /var/lib/ganeti/server.pem
Incomplete configuration, will not run. ... (warning).

root@node1:~# gnt-cluster init cluster1

root@node1:~# cat /etc/hosts
127.0.0.1 localhost
192.168.1.1 cluster1.harrycluster.com cluster1
192.168.1.105 inst1.harrycluster.com inst1

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
192.168.1.106 node1.harrycluster.com node1

root@node1:~# gnt-cluster verify
Wed Jun 26 04:21:55 2013 * Verifying global settings
Wed Jun 26 04:21:55 2013 * Gathering data (1 nodes)
Wed Jun 26 04:21:57 2013 * Verifying node status
Wed Jun 26 04:21:57 2013 * Verifying instance status
Wed Jun 26 04:21:57 2013 * Verifying orphan volumes
Wed Jun 26 04:21:57 2013 * Verifying orphan instances
Wed Jun 26 04:21:57 2013 * Verifying N+1 Memory redundancy
Wed Jun 26 04:21:57 2013 * Other Notes
Wed Jun 26 04:21:57 2013 * Hooks Results



The logs of Ganeti are as follows:

2013-06-26 04:20:34,863: gnt-cluster init pid=6036 INFO run with
arguments 'cluster1'
2013-06-26 04:20:40,297: gnt-cluster init pid=6036 ERROR RPC error in
version from node node1.harrycluster.com: Connection failed (111:
Connection refused)
2013-06-26 04:21:55,966: gnt-cluster verify pid=6170 INFO run with no arguments


I wonder why it cannot connect to node1?

Thanks very much ;-)

Michele Tartara

Jun 26, 2013, 4:31:33 AM
to harryxiyou, Ganeti Users list, Guido Trotter, Lance Albertson, Constantinos Venetsanopoulos
Are the master daemon and the node daemon running on the node after "gnt-cluster init"? Are they listening on the proper network ports as indicated by the documentation? Do you have any firewall that might block the connections?

Best,
Michele

harryxiyou

Jun 26, 2013, 5:45:44 AM
to Michele Tartara, Ganeti Users list, Guido Trotter, Lance Albertson, Constantinos Venetsanopoulos
On Wed, Jun 26, 2013 at 4:31 PM, Michele Tartara <mtar...@google.com> wrote:
[...]
> Are the master daemon and the node daemon running on the node after
> "gnt-cluster init"?

The daemons are as follows:

root@node1:~# ps aux | grep ganeti
root 6060 0.0 1.1 51460 16672 ? SL 04:20 0:00
/usr/bin/python /usr/sbin/ganeti-noded
root 6090 0.0 0.9 535904 14532 ? Sl 04:20 0:01
/usr/bin/python /usr/sbin/ganeti-masterd
root 6150 0.0 0.7 50636 10876 ? S 04:20 0:00
/usr/bin/python /usr/sbin/ganeti-rapi
root 6272 0.0 0.6 44936 8980 ? S 04:25 0:00
/usr/bin/python /usr/sbin/ganeti-confd
root 6771 0.0 0.0 7548 848 pts/1 S+ 05:42 0:00 grep ganeti

>Are they listening on the proper network ports as
> indicated by the documentation?

I have not found these network ports in our documentation.


> Do you have any firewall that might block
> the connections?
>

How do I check it?

Michele Tartara

Jun 26, 2013, 6:49:22 AM
to harryxiyou, Ganeti Users list, Guido Trotter, Lance Albertson, Constantinos Venetsanopoulos
On Wed, Jun 26, 2013 at 11:45 AM, harryxiyou <harry...@gmail.com> wrote:
> On Wed, Jun 26, 2013 at 4:31 PM, Michele Tartara <mtar...@google.com> wrote:
> [...]
>> Are the master daemon and the node daemon running on the node after
>> "gnt-cluster init"?
>
> The daemons are as follows:
>
> root@node1:~# ps aux | grep ganeti
> root      6060  0.0  1.1  51460 16672 ?        SL   04:20   0:00
> /usr/bin/python /usr/sbin/ganeti-noded
> root      6090  0.0  0.9 535904 14532 ?        Sl   04:20   0:01
> /usr/bin/python /usr/sbin/ganeti-masterd
> root      6150  0.0  0.7  50636 10876 ?        S    04:20   0:00
> /usr/bin/python /usr/sbin/ganeti-rapi
> root      6272  0.0  0.6  44936  8980 ?        S    04:25   0:00
> /usr/bin/python /usr/sbin/ganeti-confd
> root      6771  0.0  0.0   7548   848 pts/1    S+   05:42   0:00 grep ganeti


Ok, this seems to be correct.
 
>> Are they listening on the proper network ports as
>> indicated by the documentation?
>
> I have not found these network ports in our documentation.

For example:
"The ganeti-noded daemon listens to port 1811 TCP, on all interfaces, by default."

Executing
netstat -lnp
should tell you what process is listening on what port.
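As a complement to `netstat`, here is a minimal sketch that tries a plain TCP connection to the node daemon port and reports what happened. `probe_tcp` is a hypothetical helper name; the port number is the default quoted above. It only shows whether something accepts connections on that port and does not speak the node daemon's own protocol.

```python
import socket

NODED_PORT = 1811  # default ganeti-noded port, as quoted from the documentation


def probe_tcp(host: str, port: int = NODED_PORT, timeout: float = 3.0) -> str:
    """Try a TCP connect and classify the result as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # On Linux this is errno 111, the code shown in the Ganeti log.
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # Name resolution failures and unreachable networks end up here.
        return "error: %s" % exc
```

For example, `probe_tcp("node1.harrycluster.com")` should return `"open"` when the node daemon is reachable under the name the master uses; `"refused"` points at a daemon that is not listening on the address that name resolves to.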

>> Do you have any firewall that might block
>> the connections?
>>
>
> How do I check it?


Well... you should know if you configured a firewall on your machines.
Most likely, you can find out with:
iptables -L

Cheers,
Michele

harryxiyou

Jun 26, 2013, 10:51:36 AM6/26/13
to Michele Tartara, Ganeti Users list, Guido Trotter, Lance Albertson, Constantinos Venetsanopoulos
root@node1:~# netstat -lnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:50037           0.0.0.0:*               LISTEN      840/rpc.statd
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      8945/sshd
tcp        0      0 0.0.0.0:5080            0.0.0.0:*               LISTEN      12265/python
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      1359/exim4
tcp        0      0 0.0.0.0:8002            0.0.0.0:*               LISTEN      1371/python2.5
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      828/portmap
tcp        0      0 0.0.0.0:1811            0.0.0.0:*               LISTEN      12176/python
tcp6       0      0 :::22                   :::*                    LISTEN      8945/sshd
tcp6       0      0 ::1:25                  :::*                    LISTEN      1359/exim4
udp        0      0 0.0.0.0:40853           0.0.0.0:*                           840/rpc.statd
udp        0      0 0.0.0.0:1814            0.0.0.0:*                           12363/python
udp        0      0 0.0.0.0:111             0.0.0.0:*                           828/portmap
udp        0      0 0.0.0.0:1016            0.0.0.0:*                           840/rpc.statd
Active UNIX domain sockets (only servers)
Proto RefCnt Flags       Type       State         I-Node   PID/Program name    Path
unix  2      [ ACC ]     STREAM     LISTENING     4017     1099/dbus-daemon    /var/run/dbus/system_bus_socket
unix  2      [ ACC ]     STREAM     LISTENING     3915     1015/xenstored      /var/run/xenstored/socket
unix  2      [ ACC ]     STREAM     LISTENING     3916     1015/xenstored      /var/run/xenstored/socket_ro
unix  2      [ ACC ]     STREAM     LISTENING     4370     1371/python2.5      /var/run/xend/xen-api.sock
unix  2      [ ACC ]     STREAM     LISTENING     4373     1371/python2.5      /var/run/xend/xmlrpc.sock
unix  2      [ ACC ]     STREAM     LISTENING     26754    12206/python        /var/run/ganeti/socket/ganeti-master


root@node1:~# netstat -lnp | grep ganeti
unix  2      [ ACC ]     STREAM     LISTENING     26754    12206/python        /var/run/ganeti/socket/ganeti-master
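
Since the daemons show up only as "python" in the listing, grepping for "ganeti" misses them; matching on port numbers is more telling. Here is a rough sketch that extracts the listening ports from `netstat -lnp` text and compares them with a port table. The table is an assumption based on the usual Ganeti 2.x defaults (noded 1811/tcp, rapi 5080/tcp, confd 1814/udp), not something read from this cluster, and `EXPECTED`, `listening_ports` and `missing_daemons` are names invented for the example.

```python
# Assumed Ganeti default ports; adjust if the cluster uses other values.
EXPECTED = {
    ("tcp", 1811): "ganeti-noded",
    ("tcp", 5080): "ganeti-rapi",
    ("udp", 1814): "ganeti-confd",
}


def listening_ports(netstat_text):
    """Return the set of (proto, port) pairs found in `netstat -lnp` output."""
    found = set()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Only IPv4 "tcp"/"udp" rows are considered; the local address
        # is the fourth column.
        if len(fields) < 4 or fields[0] not in ("tcp", "udp"):
            continue
        port = fields[3].rsplit(":", 1)[-1]
        if port.isdigit():
            found.add((fields[0], int(port)))
    return found


def missing_daemons(netstat_text):
    """Names of expected daemons whose port does not appear as listening."""
    found = listening_ports(netstat_text)
    return sorted(name for key, name in EXPECTED.items() if key not in found)


sample = """\
tcp 0 0 0.0.0.0:5080 0.0.0.0:* LISTEN 12265/python
tcp 0 0 0.0.0.0:1811 0.0.0.0:* LISTEN 12176/python
udp 0 0 0.0.0.0:1814 0.0.0.0:* 12363/python
"""
print(missing_daemons(sample))  # prints []
```

ganeti-masterd is left out of the table on purpose: in the listing above it appears only on a UNIX socket.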


>
>> > Do you have any firewall that might block
>> > the connections?
>> >
>>
>> How can I check that?
>>
>
> Well... you should know if you configured a firewall on your machines.
> Most likely, you can find out with:
> iptables -L
>

root@node1:~# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination

Chain FORWARD (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination
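
For anyone who wants to check this kind of output mechanically, here is a small sketch that reads `iptables -L` text and reports each built-in chain's policy and rule count. `summarize_iptables` and `firewall_is_open` are hypothetical helpers; they only understand chain headers of the "Chain NAME (policy X)" form shown above and ignore user-defined chains.

```python
import re

# Matches headers such as "Chain INPUT (policy ACCEPT)".
CHAIN_RE = re.compile(r"^Chain (\S+) \(policy (\w+)")


def summarize_iptables(text):
    """Map chain name -> (policy, number of rules) from `iptables -L` output."""
    summary = {}
    current = None
    for line in text.splitlines():
        if line.startswith("Chain "):
            match = CHAIN_RE.match(line)
            # User-defined chains have no policy; skip their rules.
            current = match.group(1) if match else None
            if current is not None:
                summary[current] = (match.group(2), 0)
        elif current is not None and line.strip() and not line.startswith("target"):
            policy, count = summary[current]
            summary[current] = (policy, count + 1)
    return summary


def firewall_is_open(text):
    """True when every parsed chain has policy ACCEPT and no rules."""
    summary = summarize_iptables(text)
    return bool(summary) and all(v == ("ACCEPT", 0) for v in summary.values())
```

Fed the three empty ACCEPT chains above, `firewall_is_open` returns True, which matches reading the output by eye: there are no filter rules on this node.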


Now I suspect that something is wrong with my SSH or SSL
configuration. I found a similar topic here:
https://groups.google.com/forum/#!topic/ganeti/KYrLoKK6jfI
and Guido explained how to configure them here:
https://code.google.com/p/ganeti/wiki/ClusterKeysReplacement

I have not configured an SSH key. I only ran 'mkdir ~root/.ssh'
before initializing the cluster.

Michele Tartara

Jun 26, 2013, 11:08:05 AM6/26/13
to harryxiyou, Ganeti Users list, Guido Trotter, Lance Albertson, Constantinos Venetsanopoulos
According to the output you sent, the node daemon appears to be up, and no firewall should be interfering with what you want to do.

> Now I suspect that something is wrong with my SSH or SSL
> configuration. I found a similar topic here:
> https://groups.google.com/forum/#!topic/ganeti/KYrLoKK6jfI
> and Guido explained how to configure them here:
> https://code.google.com/p/ganeti/wiki/ClusterKeysReplacement
>
> I have not configured an SSH key. I only ran 'mkdir ~root/.ssh'
> before initializing the cluster.

This can definitely be a valid explanation. You need a working SSH in order for Ganeti to be able to work.

Cheers,
Michele

harryxiyou

Jun 26, 2013, 8:18:20 PM6/26/13
to Michele Tartara, Ganeti Users list, Guido Trotter, Lance Albertson, Constantinos Venetsanopoulos
On Wed, Jun 26, 2013 at 11:08 PM, Michele Tartara <mtar...@google.com> wrote:
[...]
> This can definitely be a valid explanation. You need a working SSH in order
> for Ganeti to be able to work.
>

Okay, let me configure SSH following
https://code.google.com/p/ganeti/wiki/ClusterKeysReplacement

harryxiyou

Jun 26, 2013, 8:56:11 PM6/26/13
to Michele Tartara, Ganeti Users list, Guido Trotter, Lance Albertson, Constantinos Venetsanopoulos
On Thu, Jun 27, 2013 at 8:18 AM, harryxiyou <harry...@gmail.com> wrote:
> On Wed, Jun 26, 2013 at 11:08 PM, Michele Tartara <mtar...@google.com> wrote:
> [...]
>> This can definitely be a valid explanation. You need a working SSH in order
>> for Ganeti to be able to work.
>>

I have configured SSH and SSL as follows:

root@node1:~# rm /etc/ssh/ssh_host_*

root@node1:~# dpkg-reconfigure openssh-server
Creating SSH2 RSA key; this may take some time ...
Creating SSH2 DSA key; this may take some time ...
Restarting OpenBSD Secure Shell server: sshd.

root@node1:~# mkdir ~/.ssh

root@node1:~# cd /root/.ssh/

root@node1:~/.ssh# ssh-keygen -t rsa -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
88:b7:a2:b8:09:0d:ce:a1:b0:c3:d1:3b:04:48:75:1d ro...@node1.harrycluster.com
The key's randomart image is:
+--[ RSA 2048]----+
| ... ..E. |
|o . . |
|.. |
| o . . |
|oo o. o S |
|*++ .. . |
|=+.o. . |
|.+ ... |
|+.. |
+-----------------+

root@node1:~/.ssh# ssh-copy-id -i /root/.ssh/id_rsa node1
The authenticity of host 'node1 (192.168.1.106)' can't be established.
RSA key fingerprint is da:d4:64:a5:3c:52:7e:68:b7:6c:e2:65:94:59:49:0a.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node1,192.168.1.106' (RSA) to the list of
known hosts.
root@node1's password:
Now try logging into the machine, with "ssh 'node1'", and check in:

.ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.


Init Ganeti cluster
==============
root@node1:~/.ssh# /etc/init.d/ganeti restart
Restarting Ganeti cluster:ganeti-confd...done.
ganeti-rapi...done.
ganeti-masterd...done.
ganeti-noded...done.
Missing configuration file /var/lib/ganeti/server.pem
Incomplete configuration, will not run. ... (warning).

root@node1:~/.ssh# gnt-cluster init cluster1

root@node1:~/.ssh# gnt-cluster verify
Wed Jun 26 20:53:40 2013 * Verifying global settings
Wed Jun 26 20:53:40 2013 * Gathering data (1 nodes)
Wed Jun 26 20:53:41 2013 * Verifying node status
Wed Jun 26 20:53:41 2013 * Verifying instance status
Wed Jun 26 20:53:41 2013 * Verifying orphan volumes
Wed Jun 26 20:53:41 2013 * Verifying orphan instances
Wed Jun 26 20:53:41 2013 * Verifying N+1 Memory redundancy
Wed Jun 26 20:53:41 2013 * Other Notes
Wed Jun 26 20:53:41 2013 * Hooks Results


Logs of Ganeti
===========

2013-06-26 20:53:11,465: gnt-cluster init pid=16702 INFO run with
arguments 'cluster1'
2013-06-26 20:53:16,967: gnt-cluster init pid=16702 ERROR RPC error in
version from node node1.harrycluster.com: Connection failed (111:
Connection refused)
2013-06-26 20:53:40,606: gnt-cluster verify pid=16819 INFO run with no arguments
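
The "111" in the ERROR line is an errno value. As a quick sketch (errno numbers are platform-specific, so this is meant for the Linux node itself), the snippet below pulls the code out of a Ganeti-style log line and looks up its symbolic name. The regular expression is fitted to the lines quoted above, and `rpc_errno` is a hypothetical name, not part of Ganeti.

```python
import errno
import os
import re

# Fitted to messages of the form "Connection failed (111: ...".
ERRNO_RE = re.compile(r"Connection failed \((\d+):")


def rpc_errno(log_line):
    """Return (code, symbolic name, description), or None if no errno found."""
    match = ERRNO_RE.search(log_line)
    if not match:
        return None
    code = int(match.group(1))
    return code, errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


if __name__ == "__main__":
    line = ("2013-06-26 20:53:16,967: gnt-cluster init pid=16702 ERROR RPC error in "
            "version from node node1.harrycluster.com: Connection failed (111:")
    print(rpc_errno(line))
```

On Linux, errno 111 is ECONNREFUSED, which usually means the TCP connection reached the host but nothing was accepting connections on that port at that moment. That is a different kind of failure from an SSH authentication problem.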


The error is still there. I wonder whether my SSH and SSL
configuration is incorrect?