"gcloud compute ssh" to other VM fails randomly in startup scripts

Yan Li

Jun 29, 2019, 10:24:14 AM
to gce-discussion
Hi!

I have a GCE VM instance whose startup script uses "gcloud compute ssh" to connect to other VM instances. I have granted its service account access to those instances.

The service account always seems to be enabled automatically, so I don't run "gcloud auth login" in my startup script. Most of the ssh connections succeed, but sometimes one fails with this error message:

ERROR: (gcloud.compute.ssh) You do not currently have an active account selected.
Please run:
  $ gcloud auth login
to obtain new credentials, or if you have already logged in with a
different account:
  $ gcloud config set account ACCOUNT
to select an already authenticated account to use.

This error can occur after several successful ssh connections to the same VM instance. So it seems the service account can somehow be disabled in the middle of my startup script?

Not sure if this matters, but my startup script is long-running (several hours) and continuously uses ssh to control other VM instances.

Thanks! Any suggestions are welcome!

--
Yan

James (Google Cloud Platform Support)

Jul 3, 2019, 4:54:11 PM
to gce-discussion

Hello Yan,


This is probably due to the frequency of the logins being performed or to stale SSH keys, but more context on what your scripts are doing would help me understand the issue better. I also have a few questions:


Which OS are you running? I ask because we have seen this type of issue with Ubuntu-based images that ship with features such as SSHGuard. In those cases we recommend removing SSHGuard through a startup script. A sample script is below:


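#!/bin/bash

# Remove the sshguard package and any dependencies that are no longer needed.
# -y answers prompts automatically, since a startup script runs non-interactively.
sudo apt-get remove -y --auto-remove sshguard
# Purge any remaining sshguard configuration files.
sudo apt-get purge -y --auto-remove sshguard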


Also, is there any chance you are rotating SSH keys frequently? That would explain the abrupt failures you have observed: the SSH keys might have been rotated while your script was running.


You can check this by navigating to Compute Engine > Metadata in the GCP console, clicking ‘SSH Keys’, and checking for expiry dates on the keys. If they are short-lived keys, we recommend switching to persistent keys that do not expire.
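
The same check can be sketched from a shell on the VM. This is only a rough illustration, not an official tool: the helper name list_expiring_keys is hypothetical, and it assumes the metadata entries follow the USERNAME:KEY layout, with an optional JSON suffix containing an "expireOn" field on keys that expire.

```shell
#!/bin/bash

# Hypothetical helper: read ssh-keys metadata entries on stdin and print
# "USERNAME EXPIRY" for every entry that carries an "expireOn" field.
# Assumption: entries look like
#   USERNAME:ssh-rsa KEY google-ssh {"userName":"...","expireOn":"TIMESTAMP"}
# Entries without "expireOn" are treated as persistent and are skipped.
list_expiring_keys() {
  sed -n 's/^\([^:]*\):.*"expireOn":"\([^"]*\)".*/\1 \2/p'
}

# Example usage from inside a VM, reading project-wide keys from the
# metadata server:
#   curl -s -H "Metadata-Flavor: Google" \
#     "http://metadata.google.internal/computeMetadata/v1/project/attributes/ssh-keys" \
#     | list_expiring_keys
```

No output means that none of the listed keys declares an expiry date.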


We also have a script you can use to debug SSH issues [1]. Follow the steps on that page to get some logging around what might be causing the issue.
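
Separately from that diagnostic script, a long-running startup script can collect its own logging around each failure. The wrapper below is a minimal sketch under stated assumptions: retry_with_log and RETRY_DELAY_SECONDS are hypothetical names, and retrying is a workaround for intermittent failures, not a fix for the underlying cause. On each failure it records the exit status and, if gcloud is installed, the account that gcloud currently reports as active.

```shell
#!/bin/bash

# Hypothetical helper: run a command up to MAX_ATTEMPTS times.
# Usage: retry_with_log MAX_ATTEMPTS COMMAND [ARGS...]
# RETRY_DELAY_SECONDS (assumed name, default 5) scales the pause between tries.
retry_with_log() {
  local max_attempts="$1"
  shift
  local delay="${RETRY_DELAY_SECONDS:-5}"
  local attempt=1
  local status

  while true; do
    "$@"
    status=$?
    if [ "$status" -eq 0 ]; then
      return 0
    fi

    # Log the failure with a UTC timestamp so it can be matched against other logs.
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) attempt ${attempt}/${max_attempts} failed (exit ${status}): $*" >&2

    # Record which account gcloud considers active at the moment of failure.
    if command -v gcloud >/dev/null 2>&1; then
      echo "active account: $(gcloud auth list --filter=status:ACTIVE --format='value(account)' 2>&1)" >&2
    fi

    if [ "$attempt" -ge "$max_attempts" ]; then
      return "$status"
    fi

    # Wait a little longer after each failed attempt.
    sleep "$((attempt * delay))"
    attempt=$((attempt + 1))
  done
}

# Example usage (INSTANCE and ZONE are placeholders):
#   retry_with_log 3 gcloud compute ssh INSTANCE --zone ZONE --command 'uptime'
```

If the logged active account is empty at the moment of a failure, that points at credentials rather than at SSH keys or SSHGuard.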


Hope that helps, Yan.


[1] https://github.com/GoogleCloudPlatform/compute-ssh-diagnostic-sh

