Fwd: NYSERNET-IG: 1 nodes failed to boot


Ezra Kissel

May 20, 2015, 9:20:07 PM
to protoge...@googlegroups.com
I've been getting this error repeatedly from the Nysernet AM. What's
odd is that sometimes I can ssh to the VM and it has attributes
(hostname, files in /tmp) of another one of my VMs in the same slice.
On this most recent attempt I can't log in at all; pcvm2-4 prompts me
for a password.

The AM doesn't report any error so I'm at a loss as to what's going on.

thanks,
- ezra


-------- Forwarded Message --------
Subject: NYSERNET-IG: 1 nodes failed to boot
Resent-Date: Wed, 20 May 2015 20:13:02 -0400
Resent-From: ezki...@indiana.edu
Date: Thu, 21 May 2015 00:12:58 +0000
From: testb...@ops.instageni.nysernet.org
<testb...@ops.instageni.nysernet.org>
To: Kissel, Ezra D <ezki...@indiana.edu>
CC: testb...@ops.instageni.nysernet.org
<testb...@ops.instageni.nysernet.org>

Nodes:
[Node: pcvm2-4]
in urn:publicid:IDN+ch.geni.net:idms+slice+idms-g22-stitch failed.








Ezra Kissel

May 20, 2015, 9:21:50 PM
to protoge...@googlegroups.com
Sorry, meant to say attributes of a VM in *another* active slice at the
same AM, not the same slice.

Leigh Stoller

May 21, 2015, 9:26:15 AM
to Ezra Kissel, protoge...@googlegroups.com
> Sorry, meant to say attributes of a VM in *another* active slice 
> at the same AM, not the same slice.

Hi, can you let us know when this situation has happened so that
I can look at it while it is still live.

Thanks
Leigh


Ezra Kissel

May 21, 2015, 9:30:42 AM
to protoge...@googlegroups.com
On 5/21/2015 9:26 AM, Leigh Stoller wrote:
>> Sorry, meant to say attributes of a VM in *another* active slice
>> at the same AM, not the same slice.
>
> Hi, can you let us know when this situation has happened so that
> I can look at it while it is still live.
>

The sliver is still active.

sliver_id="urn:publicid:IDN+instageni.nysernet.org+sliver+29694"

pcvm2-4.instageni.nysernet.org

Leigh Stoller

May 21, 2015, 10:11:28 AM
to Ezra Kissel, protoge...@googlegroups.com
Can you tear that slice down, please, and hold? Also, I don’t expect any
of the ION links to work, correct?

Leigh


Ezra Kissel

May 21, 2015, 10:34:59 AM
to Leigh Stoller, protoge...@googlegroups.com
Tearing down now. There shouldn't be any ION links. There were in the
previous slivers in that slice. I have been using the same slice name
for a long time so perhaps some sliver state has been sticking around?

Leigh Stoller

May 21, 2015, 10:48:52 AM
to Ezra Kissel, protoge...@googlegroups.com
> Tearing down now. There shouldn't be any ION links. There were 
> in the previous slivers in that slice. I have been using the same slice
> name for a long time so perhaps some sliver state has been sticking
> around?

Okay, go ahead and set it up again, and if it is wrong, please
tell me specifically what is wrong. Thanks!

Leigh 


Ezra Kissel

May 21, 2015, 11:19:29 AM
to protoge...@googlegroups.com
On 5/21/2015 10:48 AM, Leigh Stoller wrote:
>> Tearing down now. There shouldn't be any ION links. There were
>> in the previous slivers in that slice. I have been using the same slice
>> name for a long time so perhaps some sliver state has been sticking
>> around?
>
> Okay, go ahead and set it up again, and if it is wrong, please
> tell me specifically what is wrong. Thanks!
>

Looking good now. Node came up with the right properties and
configuration.

thanks,
- ezra

Leigh Stoller

May 21, 2015, 11:28:02 AM
to Ezra Kissel, protoge...@googlegroups.com
> Looking good now. Node came up with the right properties and 
> configuration.

Okay, I don’t really know what went wrong, but if you see it again,
leave it in place and send us a description of what is incorrect.

Thanks!
Leigh


Ezra Kissel

Jun 1, 2015, 10:39:57 AM
to Leigh Stoller, protoge...@googlegroups.com
On 5/21/2015 11:27 AM, Leigh Stoller wrote:
>> Looking good now. Node came up with the right properties and
>> configuration.
>
> Okay, I don’t really know what went wrong, but if you see it again,
> leave it in place and send us a description of what is incorrect.
>

Something similar is happening again. I created two nodes at Nysernet
using an existing slice name. One node came up, the other node prompts
me for a password and the startup scripts don't appear to have executed.
Sliverstatus shows both VMs as "ready."

{
  "geni_client_id": "ibp-101-2",
  "pg_status": "ready",
  "geni_urn": "urn:publicid:IDN+instageni.nysernet.org+sliver+30024",
  "geni_error": "",
  "geni_status": "ready"
}

pcvm3-1.instageni.nysernet.org

Leaving it as-is for now.
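
For anyone hitting the same symptom (the AM reports a sliver as "ready" but the node asks for a password), here is a minimal Python sketch of a cross-check. It is not part of any AM or GENI tooling. It assumes the sliverstatus entries are already loaded as dictionaries with the fields shown above and that key-based ssh login is what should work. The names `find_suspect_nodes` and `ssh_key_login_works` and the `hostnames` mapping are made up for illustration.

```python
import subprocess
from typing import Callable, Dict, Iterable, List


def ssh_key_login_works(host: str, timeout: int = 10) -> bool:
    """Hypothetical probe: try a non-interactive ssh login and run `true`.

    BatchMode=yes stops ssh from prompting for a password, so a node that
    falls back to password authentication is reported as a failure.
    Note: an unknown host key also makes this fail in batch mode.
    """
    cmd = ["ssh", "-o", "BatchMode=yes",
           "-o", f"ConnectTimeout={timeout}", host, "true"]
    try:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL,
                                timeout=timeout + 5)
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def find_suspect_nodes(
    statuses: Iterable[Dict[str, str]],
    hostnames: Dict[str, str],
    can_log_in: Callable[[str], bool] = ssh_key_login_works,
) -> List[str]:
    """Return client IDs that the AM calls "ready" but that fail the probe."""
    suspects = []
    for entry in statuses:
        if entry.get("geni_status") != "ready":
            continue  # still changing or failed: a different kind of problem
        client_id = entry["geni_client_id"]
        host = hostnames.get(client_id)
        if host is not None and not can_log_in(host):
            suspects.append(client_id)
    return suspects
```

With the entry above, `hostnames` would map "ibp-101-2" to "pcvm3-1.instageni.nysernet.org". Passing a different `can_log_in` callable makes it possible to try the logic without touching the network.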

thanks,
- ezra

Leigh Stoller

Jun 1, 2015, 12:16:49 PM
to Ezra Kissel, protoge...@googlegroups.com
> Something similar is happening again. I created two nodes at Nysernet
> using an existing slice name. One node came up, the other node prompts
> me for a password and the startup scripts don't appear to have executed.
> Sliverstatus shows both VMs as "ready."

Hi, please extend it for a few days, I might not have time to look
at it for a day or two.

Leigh





Ezra Kissel

Jun 1, 2015, 3:20:48 PM
to Leigh Stoller, protoge...@googlegroups.com
On 6/1/2015 12:17 PM, Leigh Stoller wrote:
>> Something similar is happening again. I created two nodes at Nysernet
>> using an existing slice name. One node came up, the other node prompts
>> me for a password and the startup scripts don't appear to have executed.
>> Sliverstatus shows both VMs as "ready."
>
> Hi, please extend it for a few days, I might not have time to look
> at it for a day or two.
>

Done. So there's the login/startup failure I mentioned (the previous
sliver), but also the more frequent node boot failure messages I've
been getting from a number of AMs. Most frequently from nysernet, but
just today from utahddc and gpo as well. An example of the boot failure
case:

{
  "geni_client_id": "ig-nyser",
  "pg_status": "changing",
  "geni_urn": "urn:publicid:IDN+instageni.nysernet.org+sliver+30080",
  "geni_error": "",
  "geni_status": "changing"
}

pcvm2-5.instageni.nysernet.org

That was also in a new slice "idms-g23-stitch", which I thought might
help instead of re-using the same old slice names. Is there any way for
me to determine *why* these are failing?

thanks,
- ezra

Leigh Stoller

Jun 1, 2015, 8:39:23 PM
to Ezra Kissel, protoge...@googlegroups.com
> Done. So there's the login/startup failure I mentioned (the previous
> sliver), but also the more frequent node boot failure messages I've
> been getting from a number of AMs. Most frequently from nysernet, but
> just today from utahddc and gpo as well. An example of the boot failure
> case:

Well, I worked on this most of today, and made no progress. Current VMs
with public IPs are not working, and the problem is in the XEN dom0 on both
pc2 and pc3; some iptables rules are being ignored, even after I shut them
down, made sure everything was flushed, and then restarted them.

To make matters worse, I cannot log in on the console of your image, since
there is no getty running on the serial console. I was hoping I might discover
something in the image, but can't do that without a shell.

Stumped.

Leigh

Nicholas Bastin

Jun 1, 2015, 8:44:57 PM
to protoge...@googlegroups.com, Ezra Kissel
On Mon, Jun 1, 2015 at 5:40 PM, Leigh Stoller <lbst...@gmail.com> wrote:
> To make matters worse, I cannot log in on the console of your image, since
> there is no getty running on the serial console. I was hoping I might discover
> something in the image, but can't do that without a shell.

If you're up for a bit of work, you can add a VNC console to get the local framebuffer, and you can add a user for login by monkeying around with the filesystem on Dom0 (you can mount the LVM volume loopback and then manipulate it directly).

Of course in that case you could also configure getty on ttyS0, so that's another option.

--
Nick 

Leigh Stoller

Jun 1, 2015, 9:38:05 PM
to protoge...@googlegroups.com, Ezra Kissel
> If you're up for a bit of work, you can add a VNC console to get the
> local framebuffer, and you can add a user for login by monkeying around
> with the filesystem on Dom0 (you can mount the LVM volume loopback and
> then manipulate it directly).

I just rebooted the physical node, problem solved. So that narrows
it down to something on the XEN host, but no idea what. This particular
node had had only 115 VMs since last reboot. The other problem node
has had about 1200. That disparity is irksome.

Leigh





Ezra Kissel

Jun 2, 2015, 10:37:40 AM
to protoge...@googlegroups.com
On 6/1/2015 9:38 PM, Leigh Stoller wrote:
>> If you're up for a bit of work, you can add a VNC console to get the
>> local framebuffer, and you can add a user for login by monkeying around
>> with the filesystem on Dom0 (you can mount the LVM volume loopback and
>> then manipulate it directly).
>
> I just rebooted the physical node, problem solved. So that narrows
> it down to something on the XEN host, but no idea what. This particular
> node had had only 115 VMs since last reboot. The other problem node
> has had about 1200. That disparity is irksome.
>

FYI - I recreated my stitching rspec and didn't get a node boot failure
at Nysernet for once. However, a node at utahddc failed this time.

{
  "geni_client_id": "ig-utah",
  "pg_status": "changing",
  "geni_urn": "urn:publicid:IDN+utahddc.geniracks.net+sliver+47652",
  "geni_error": "",
  "geni_status": "changing"
}

pcvm7-2.utahddc.geniracks.net

Leigh Stoller

Jun 2, 2015, 10:40:28 AM
to protoge...@googlegroups.com, Ezra Kissel
> I just rebooted the physical node, problem solved. So that narrows
> it down to something on the XEN host, but no idea what. This particular
> node had had only 115 VMs since last reboot. The other problem node
> has had about 1200. That disparity is irksome.

And pc3 has been rebooted too. The iptables rules were scrogged in
some way; I could not delete rules that needed to be deleted. Maybe
another XEN problem, hard to say.
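
One way to pin down rules that refuse to go away is to capture the rule set with `iptables-save` before and after the delete or flush attempt and compare the two dumps. The sketch below is only an illustration of that idea in Python, not what was actually run on pc2 or pc3. The function names are hypothetical, and it works purely on saved text, so it needs no root access.

```python
from typing import List, Set, Tuple

Rule = Tuple[str, str]  # (table name, rule line)


def rules_in(dump: str) -> Set[Rule]:
    """Collect '-A CHAIN ...' rule lines from iptables-save style text.

    Lines starting with '*' name the table that the following rules
    belong to, so each rule is keyed by (table, rule line).
    """
    rules: Set[Rule] = set()
    table = ""
    for raw in dump.splitlines():
        line = raw.strip()
        if line.startswith("*"):
            table = line[1:]
        elif line.startswith("-A "):
            rules.add((table, line))
    return rules


def surviving_rules(before: str, after: str) -> List[Rule]:
    """Rules present both before and after the delete or flush attempt."""
    return sorted(rules_in(before) & rules_in(after))
```

Any rule returned by `surviving_rules` that was supposed to be gone is a candidate for the "could not delete" case described above.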

Leigh




