gcloud compute ssh was unable to confirm ID of the instance


Georgi Sotirov

Jan 25, 2022, 4:40:05 AM
to gce-discussion
Hello,

I'm getting a strange error when trying to connect from one VM to another (in different subnets of the default VPC) with the gcloud compute ssh command executed from a bash script.

ERROR: (gcloud.compute.ssh) Established connection with host 1.2.3.4 but was unable to confirm ID of the instance.
This may be due to network connectivity issues. Please check your network settings, and the status of the service you are trying to reach.

The command line is:

gcloud --project=my-project compute ssh --internal-ip '--ssh-flag=-o ConnectTimeout=5' '--ssh-flag=-o Port=22' --zone=europe-west1-c my-vm -- 'sudo sh -c "/usr/local/bin/test"'

If I run this command line in a terminal it works fine, but from the script I get the error above. What could be the reason for this error? I wasn't able to find any information on the matter. I also tried the command generated with --dry-run, and it works fine as well.


Regards,
--
Georgi

Pedro Moreno

Jan 25, 2022, 6:04:31 AM
to gce-discussion


This issue is related to the verification of the instance ID of the server. It can be caused by:

1. DNS resolution of metadata.google.internal fails, because /etc/resolv.conf or Windows DNS server settings have been modified.

2. An HTTP/HTTPS proxy configuration exists on the VM, and the HTTP test connection ends up going somewhere other than the metadata server.
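
Both causes can be checked by hand on the affected VM. The following is a minimal sketch, not something gcloud itself runs: the function names are made up for illustration, and it assumes that metadata.google.internal is expected to resolve to the well-known metadata address 169.254.169.254.

```shell
# Hypothetical helper: list proxy variables visible to the current shell.
# Succeeds (exit 0) only if at least one http(s)_proxy variable is set.
proxy_vars_set() {
  env | grep -i -E '^(http|https)_proxy='
}

# Hypothetical helper: succeeds only if the metadata hostname resolves to
# 169.254.169.254 (assumed here to be the expected metadata server address).
metadata_dns_ok() {
  resolved=$(getent hosts metadata.google.internal | awk '{print $1; exit}')
  [ "$resolved" = "169.254.169.254" ]
}
```

For example, `metadata_dns_ok || echo "metadata.google.internal does not resolve as expected"` and `proxy_vars_set && echo "a proxy is configured"` give a quick first reading before changing any settings.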

A workaround may be to disable this check (the verify_internal_ip property):

```
$ gcloud config set ssh/verify_internal_ip False
```

Then retry the SSH connection and investigate which root cause (DNS or proxy) was preventing the verification. Once you have fixed it, revert the `ssh/verify_internal_ip` setting to `True`.
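
If the workaround is needed inside a script, it may help to make sure the setting is always restored afterwards. This is only a sketch under that assumption: `with_verification_disabled` is a hypothetical wrapper name, and it relies solely on the `gcloud config set` command shown above.

```shell
# Hypothetical wrapper: run the given command with ssh/verify_internal_ip
# disabled, then set the property back to True whatever the command returned.
with_verification_disabled() {
  gcloud config set ssh/verify_internal_ip False || return 1
  "$@"
  status=$?
  gcloud config set ssh/verify_internal_ip True
  # Report the exit status of the wrapped command, not of the restore step.
  return "$status"
}
```

Usage would look like `with_verification_disabled gcloud compute ssh --internal-ip --zone=europe-west1-c my-vm`, with the remaining arguments as in the original command line.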

Georgi Sotirov

Jan 25, 2022, 7:57:12 AM
to gce-discussion
Thank you for your answer.
  1. Yes, it was failing, because (for reasons outside this topic) we use a modified /etc/resolv.conf with Google's and Cloudflare's public DNS IPs. I restored the file with 169.254.169.254 so that metadata.google.internal could be resolved properly, but the problem persisted.
  2. There is no HTTP/HTTPS proxy configured on the VM trying to connect, and the metadata server is accessible. After setting verify_internal_ip to false, the connection worked and printed the following warning:

    WARNING: Skipping internal IP verification connection and connecting to [1.2.3.4] in the current subnet. This may be the wrong host if the instance is in a different subnet!

    Why is this? The VM from which I'm trying to connect is in the default subnet of the default VPC, and the VM to which I'm trying to connect is in a custom subnet of the default VPC (e.g. subnet1). Subnetwork IP address ranges cannot overlap, as far as I'm aware.
I think it is also worth mentioning that I have no problem connecting over SSH from the same VM to another VM in a different custom subnet of the default VPC (e.g. subnet2) with the command line provided before.
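
To confirm whether a given VM's resolver file really points at 169.254.169.254 again, a check along these lines could be run on it. It is a sketch only: `uses_metadata_nameserver` is a hypothetical name, and it inspects nothing but the `nameserver` lines of the file.

```shell
# Hypothetical helper: succeeds if the given resolv.conf-style file
# (default: /etc/resolv.conf) lists 169.254.169.254 as a nameserver.
uses_metadata_nameserver() {
  grep -Eq '^[[:space:]]*nameserver[[:space:]]+169\.254\.169\.254([[:space:]]|$)' \
    "${1:-/etc/resolv.conf}"
}
```

Running `uses_metadata_nameserver || echo "169.254.169.254 is not configured as a nameserver"` on both the source and the target VM would show whether either of them still uses only the public DNS servers.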


Regards,
--
Georgi

Georgi Sotirov

Jan 25, 2022, 8:59:37 AM
to gce-discussion
In fact, how does gcloud verify internal IPs (or the ID of the instance)? Knowing this could help me debug the problem further and understand why there is a problem with VMs in one subnet and not with VMs in another, when both subnets are reachable and an SSH connection is possible with the plain ssh command.

osaeed

Feb 16, 2022, 3:58:59 PM
to gce-discussion

Hello Georgi,

As the issue persists after you applied the previous workaround suggestion, this indicates the DNS resolution of metadata.google.internal still fails, as you are not using any proxy in your environment. You can try the following steps:

  • Try to create a new VM on the same subnet where the source VM resides and try to establish an SSH connection to the target VM. 

  • Also try to create a new VM on the same subnet where the targeted VM resides and try to connect from the source VM, and also try to connect from the new VM residing on the source VM subnet.

These steps will help narrow down whether the issue is related to /etc/resolv.conf not being restored successfully.
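
On each VM involved in these tests, it can also be useful to confirm that the metadata server returns the instance ID you expect. The sketch below assumes that the failing verification boils down to such a comparison, which this thread does not confirm. The function names are hypothetical; the URL and the `Metadata-Flavor: Google` header are the standard way to query the Compute Engine metadata server.

```shell
# Hypothetical helper: ask the metadata server for the ID of the VM this runs on.
fetch_instance_id() {
  curl -s --max-time 5 -H 'Metadata-Flavor: Google' \
    'http://metadata.google.internal/computeMetadata/v1/instance/id'
}

# Hypothetical helper: succeeds if the metadata server reports the expected ID,
# passed as the first argument.
instance_id_matches() {
  [ "$(fetch_instance_id)" = "$1" ]
}
```

The expected ID can be looked up from another machine with `gcloud compute instances describe my-vm --zone=europe-west1-c --format='value(id)'` and passed as the argument to `instance_id_matches`.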

Also, as this is a technical question, I would recommend that you post it on Stack Overflow[1] or Server Fault[2], as they are more appropriate for this kind of technical question[3].

Also you can open a case with GCP support[4], as this might require more in depth investigation to help identify and resolve the issue. 


[1]https://stackoverflow.com/

[2]https://serverfault.com/

[3]https://cloud.google.com/support/docs/community#technical_questions_stack_exchange_stack_overflow_and_server_fault

[4]https://cloud.google.com/support/docs/manage-cases

Georgi Sotirov

Feb 28, 2022, 3:50:22 AM
to gce-discussion
As I explained before, the problem was only with the gcloud command, not with ssh (the command generated by --dry-run works just fine), which is why I wrote here. Anyway, I have already implemented the suggested workaround by setting verify_internal_ip to false, and in future we'll most probably use gcloud --dry-run only to generate the ssh command and run that instead.
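
A rough sketch of that last idea, assuming the command printed by --dry-run is a single line that is safe to evaluate as-is (worth checking by eye first, since `eval` re-parses any quoting). `run_via_dry_run` is a hypothetical helper name.

```shell
# Hypothetical helper: let gcloud print the equivalent ssh command, then run it.
# All arguments are passed through to "gcloud compute ssh" after --dry-run.
run_via_dry_run() {
  ssh_cmd=$(gcloud compute ssh --dry-run "$@") || return 1
  # Refuse to continue if gcloud printed nothing.
  [ -n "$ssh_cmd" ] || return 1
  eval "$ssh_cmd"
}
```

It would be called with the same arguments as before, for example `run_via_dry_run --internal-ip --zone=europe-west1-c my-vm -- 'sudo sh -c "/usr/local/bin/test"'`.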