Metadata service unreachable for certain instances

Rob Vermaas

Jul 28, 2014, 11:13:29 AM
to gce-dis...@googlegroups.com
Hi,

We are experiencing a weird issue with the metadata service.

About 1 in 5 instances cannot be reached because the instance cannot retrieve its SSH public key from the metadata service. The failure rate differs by zone: in us-central1-b about 1 in 5 instances fail, while in europe-west1-b it is about 1 in 40.

While debugging using an image that also includes a hardcoded SSH key, we see the following:

 * we can ping the link-local IP where the metadata service is supposed to be running
 * DNS queries against that IP work
 * port 80 refuses connections, so any call to the metadata service fails
 * rebooting doesn't solve the issue

Has anybody experienced any such issue? I feel like I'm at a dead-end.



Cheers,
Rob

==========

Outputs of commands used to determine the above facts:

$ ping 169.254.169.254
PING 169.254.169.254 (169.254.169.254) 56(84) bytes of data.
64 bytes from 169.254.169.254: icmp_seq=1 ttl=255 time=0.562 ms
64 bytes from 169.254.169.254: icmp_seq=2 ttl=255 time=0.492 ms

$ dig @169.254.169.254 www.google.com
; <<>> DiG 9.9.5-W1 <<>> @169.254.169.254 www.google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 21168
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;www.google.com.                        IN      A

;; ANSWER SECTION:
www.google.com.         299     IN      A       74.125.70.106
www.google.com.         299     IN      A       74.125.70.147
www.google.com.         299     IN      A       74.125.70.99
www.google.com.         299     IN      A       74.125.70.105
www.google.com.         299     IN      A       74.125.70.104
www.google.com.         299     IN      A       74.125.70.103

;; Query time: 4 msec
;; SERVER: 169.254.169.254#53(169.254.169.254)
;; WHEN: Mon Jul 28 17:08:22 CEST 2014
;; MSG SIZE  rcvd: 139

$ nmap 169.254.169.254

Starting Nmap 6.40 ( http://nmap.org ) at 2014-07-28 17:10 CEST
Nmap scan report for metadata.google.internal (169.254.169.254)
Host is up (0.00013s latency).
Not shown: 999 closed ports
PORT   STATE SERVICE
53/tcp open  domain

Nmap done: 1 IP address (1 host up) scanned in 2.43 seconds

$ curl -v http://metadata
* Rebuilt URL to: http://metadata/
* Hostname was NOT found in DNS cache
*   Trying 169.254.169.254...
* connect to 169.254.169.254 port 80 failed: Connection refused
* Failed to connect to metadata port 80: Connection refused
* Closing connection 0
curl: (7) Failed to connect to metadata port 80: Connection refused
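
For anyone who wants to repeat the port checks without running a full port scan, here is a minimal sketch that tries a single TCP connect per port and reports whether it was accepted, refused, or timed out. It assumes Python 3 is available on the image; the `probe` helper is a hypothetical name made up for this example, not part of any GCE tooling, and the ports 53 and 80 are simply the two that appear in the outputs above.

```python
import errno
import socket


def probe(host, port, timeout=2.0):
    """Attempt one TCP connect and classify the outcome (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: nothing is listening on this port.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. network unreachable.
        return "error: " + errno.errorcode.get(exc.errno, str(exc))
    finally:
        sock.close()


if __name__ == "__main__":
    # 169.254.169.254 is the link-local address from the outputs above.
    for port in (53, 80):
        print(port, probe("169.254.169.254", port))
```

On an affected instance this should presumably report port 53 as open and port 80 as refused, matching the nmap and curl results, but it only touches the two ports it is asked about.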

David Newgas

Jul 28, 2014, 6:19:31 PM
to Rob Vermaas, gce-dis...@googlegroups.com
Hi Rob,

It seems that port scanning the metadata server makes it somewhat unhappy. As you have observed, rebooting does not fix this. I recommend not port scanning the metadata server in the future.

The metadata service may recover with time, but if you don't want to wait you can try re-creating your instance (terminate it, preserving the disk, then re-create it with the old disk). I strongly recommend making a snapshot as a backup before doing this.

Yours,
David



David Newgas

Jul 28, 2014, 6:30:17 PM
to Rob Vermaas, gce-dis...@googlegroups.com
Hi Rob,

Re-reading your email, it sounds like you were already experiencing the issue before running nmap, in which case my comment is probably not very helpful.

Could you let me know your project ID so that I can look into your case in more depth? (Reply in a private email if you are concerned about disclosing your project on a public list).

Yours,
David

David Newgas

Jul 28, 2014, 6:45:28 PM
to Rob Vermaas, gce-dis...@googlegroups.com
In fact, please send the details to gc-...@google.com, and we will follow up from there.

Thanks,
David