gcloud/gcutil service account authentication for use with salt-cloud inside GCE instance?


Janne Enberg

Aug 28, 2014, 10:33:58 AM
to gce-dis...@googlegroups.com
I'm trying to set up a Salt Stack master with salt-cloud on GCE, to be able to manage the infrastructure completely with salt-cloud.

I was trying to follow the instructions as per e.g. https://github.com/GoogleCloudPlatform/compute-video-demo-salt and it all seems fine, except for the fairly simplified section:
"Create a Compute Engine SSH key and upload it to the metadata server. The easist way to do this is to use the gcutil command-line utility and try to SSH from the machine back into itself."

Why do I say it is simplified? Because that thing doesn't seem to be written with "best practices" in mind, and the tools don't seem to work as advertised.

The first time I try to set up the Google Cloud SDK tools on the GCE instance and run ```gcloud auth login``` as instructed, it tells me that if I want to authenticate on a GCE instance, I should use a service account instead of my personal account, for security (the "best practices" I mentioned) or whatnot, which makes sense to me.

So I find out how, create a service account, get its email address and P12 file, put that file on the server, and run, per the instructions (replacing the <> tokens with real values):
gcloud auth activate-service-account --project <my-project-id> '<service...@developer.gserviceaccount.com>' --key-file </path/to/p12-file>

The output for this is just:
"Activated service account credentials for <service...@developer.gserviceaccount.com>"

Now I'm apparently supposed to be authenticated, so I try SSH-ing back to the machine itself, as instructed:
# gcutil ssh salt
You are not currently logged in. To authenticate, run
 $ gcloud auth login

Running ```gcloud auth login``` again seems to try to get me to authenticate using my personal account, which is exactly what I was told not to do.

```gcloud auth list``` seems to confirm that the service account I'm trying to get to work is active.

There also seem to be other issues with this; running e.g. ```gcloud compute instances list``` gives me an error like this:
# gcloud compute instances list
Traceback (most recent call last):
  File "/opt/google-cloud-sdk/./lib/googlecloudsdk/gcloud/gcloud.py", line 150, in <module>
    main()
  File "/opt/google-cloud-sdk/./lib/googlecloudsdk/gcloud/gcloud.py", line 146, in main
    _cli.Execute()
  File "/opt/google-cloud-sdk/./lib/googlecloudsdk/calliope/cli.py", line 431, in Execute
    post_run_hooks=self.__post_run_hooks, kwargs=kwargs)
  File "/opt/google-cloud-sdk/./lib/googlecloudsdk/calliope/frontend.py", line 274, in _Execute
    pre_run_hooks=pre_run_hooks, post_run_hooks=post_run_hooks)
  File "/opt/google-cloud-sdk/./lib/googlecloudsdk/calliope/backend.py", line 885, in Run
    output_formatter(result)
  File "/opt/google-cloud-sdk/./lib/googlecloudsdk/calliope/backend.py", line 870, in OutputFormatter
    command_instance.Display(args, obj)
  File "/opt/google-cloud-sdk/./lib/googlecloudsdk/compute/lib/base_classes.py", line 267, in Display
    PrintTable(resources, self._resource_spec.table_cols)
  File "/opt/google-cloud-sdk/./lib/googlecloudsdk/compute/lib/base_classes.py", line 45, in PrintTable
    for resource in resources:
  File "/opt/google-cloud-sdk/./lib/googlecloudsdk/compute/lib/base_classes.py", line 252, in Run
    for item in items:
  File "/opt/google-cloud-sdk/./lib/googlecloudsdk/compute/lib/lister.py", line 105, in ProcessResults
    for resource in resources:
  File "/opt/google-cloud-sdk/./lib/googlecloudsdk/compute/lib/lister.py", line 89, in _ConvertProtobufsToDicts
    for resource in resources:
  File "/opt/google-cloud-sdk/./lib/googlecloudsdk/compute/lib/base_classes.py", line 179, in FilterResults
    for item in items:
  File "/opt/google-cloud-sdk/./lib/googlecloudsdk/compute/lib/lister.py", line 44, in List
    batch_url=batch_url)
  File "/opt/google-cloud-sdk/./lib/googlecloudsdk/compute/lib/batch_helper.py", line 30, in MakeRequests
    responses = batch.Execute(http)
  File "/opt/google-cloud-sdk/./lib/googlecloudapis/apitools/base/py/batch.py", line 180, in Execute
    batch_http_request.Execute(http)
  File "/opt/google-cloud-sdk/./lib/googlecloudapis/apitools/base/py/batch.py", line 426, in Execute
    self._Execute(http)
  File "/opt/google-cloud-sdk/./lib/googlecloudapis/apitools/base/py/batch.py", line 389, in _Execute
    response = http_wrapper.MakeRequest(http, request)
  File "/opt/google-cloud-sdk/./lib/googlecloudapis/apitools/base/py/http_wrapper.py", line 140, in MakeRequest
    redirections=redirections, connection_type=connection_type)
  File "/opt/google-cloud-sdk/./lib/oauth2client/util.py", line 132, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/opt/google-cloud-sdk/./lib/oauth2client/client.py", line 475, in new_request
    self._refresh(request_orig)
  File "/opt/google-cloud-sdk/./lib/oauth2client/client.py", line 663, in _refresh
    self._do_refresh_request(http_request)
  File "/opt/google-cloud-sdk/./lib/oauth2client/client.py", line 677, in _do_refresh_request
    body = self._generate_refresh_request_body()
  File "/opt/google-cloud-sdk/./lib/oauth2client/client.py", line 861, in _generate_refresh_request_body
    assertion = self._generate_assertion()
  File "/opt/google-cloud-sdk/./lib/oauth2client/client.py", line 977, in _generate_assertion
    private_key, self.private_key_password), payload)
  File "/opt/google-cloud-sdk/./lib/oauth2client/crypt.py", line 294, in make_signed_jwt
    signature = signer.sign(signing_input)
  File "/opt/google-cloud-sdk/./lib/oauth2client/crypt.py", line 112, in sign
    return crypto.sign(self._key, message, 'sha256')
AttributeError: 'module' object has no attribute 'sign'


Also if I try and run salt-cloud, I get an interesting error:
# salt-cloud -P -m /etc/salt/cloud.maps.d/staging.map
[INFO    ] salt-cloud starting
[INFO    ] Applying map from '/etc/salt/cloud.maps.d/staging.map'.
The following virtual machines are set to be created:
  gw-1
  salt
  web-1

Proceed? [N/y] y
... proceeding
[INFO    ] Calculating dependencies for gw-1
[INFO    ] Calculating dependencies for salt
[INFO    ] Calculating dependencies for web-1
[INFO    ] Since parallel deployment is in use, ssh console output is disabled. All ssh output will be logged though
[INFO    ] Cloud pool size: 3
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.6/threading.py", line 484, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib64/python2.6/multiprocessing/pool.py", line 259, in _handle_results
    task = get()
TypeError: ('__init__() takes exactly 2 arguments (1 given)', <class 'libcloud.common.google.GoogleAuthError'>, ())



It really looks like an authentication error, as with gcutil ssh, but it shouldn't surface as a traceback like that. After that error salt-cloud is completely stuck, and I need to Ctrl+Z and `killall -9 salt-cloud` to get rid of it.


Now what am I supposed to do here? gcloud itself tells me not to use my personal account on the GCE instance (though at some point it stopped saying that, for some reason), yet activating a service account does not seem to work at all (and there's really no good documentation on how to do this, so I've had to "wing it").
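In case it helps, this is roughly the salt-cloud provider configuration I'm trying to drive with that service account (a sketch with placeholder values; the key names follow the salt-cloud GCE documentation of this era, so treat them as assumptions):

```yaml
# /etc/salt/cloud.providers.d/gce.conf -- all values are placeholders
gce-config:
  provider: gce
  project: my-project-id
  service_account_email_address: service-account@developer.gserviceaccount.com
  service_account_private_key: /etc/salt/gce-key.pem
```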




Eric Johnson

Aug 28, 2014, 12:10:52 PM
to gce-dis...@googlegroups.com
Hi Janne,

Thank you for reporting this and taking the time to write up the details of your (poor) experience.  I think there are two issues here, and I'll work on getting the guide cleaned up.  The main issue is that the docs are outdated with respect to changes in 'gcloud' and 'gcutil', which I think I can address by updating the procedure.  Your last traceback will take a bit more digging, but I hope to find and fix the issue as I dig into this further.

Could I ask you to reply with the output of 'salt --versions-report' and 'pip freeze' so I can make sure I'm able to reproduce the errors?

I'll report back shortly with some suggestions to get you unblocked.

Kind regards,
-erjohnso



Janne Enberg

Aug 28, 2014, 2:47:47 PM
to gce-dis...@googlegroups.com


On Thursday, August 28, 2014 7:10:52 PM UTC+3, Eric Johnson wrote:
Hi Janne,

Thank you for reporting this and taking the time to write up the details of your (poor) experience.  I think there are two issues here and I'll work on getting the guide cleaned up.  The main issue is that the docs are outdated with respect to changes that have been made with 'gcloud' and 'gcutil' which I think I can get addressed by updating the procedure.  Your last traceback will take a bit more digging but I hope to find/fix the issue as I dig into this further.

Could I ask you to reply with the output of 'salt --versions-report' and 'pip freeze' so I can make sure I'm able to reproduce the errors?

I'll report back shortly with some suggestions to get you unblocked.

Kind regards,
-erjohnso



Hi,

Sure, here are the outputs:

# salt --versions-report
           Salt: 2014.1.7
         Python: 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)
         Jinja2: 2.2.1
       M2Crypto: 0.20.2
 msgpack-python: 0.1.13
   msgpack-pure: Not Installed
       pycrypto: 2.0.1
         PyYAML: 3.10
          PyZMQ: 2.2.0.1
            ZMQ: 3.2.4

 # pip freeze
Babel==0.9.4
Jinja2==2.2.1
M2Crypto==0.20.2
PyYAML==3.10
SocksiPy-branch==1.01
apache-libcloud==0.15.1
boto==2.30.0
cffi==0.8.6
crcmod==1.7
cryptography==0.5.4
distribute==0.6.10
ethtool==0.6
gcs-oauth2-boto-plugin==1.7
google-api-python-client==1.2
gsutil==4.5
httplib2==0.9
iniparse==0.3.1
iwlib==1.0
mercurial==1.4
msgpack-python==0.1.13
pyOpenSSL==0.14
pycparser==2.10
pycrypto==2.0.1
pycurl==7.19.0
pygpgme==0.1
python-gflags==2.0
pyzmq==2.2.0.1
retry-decorator==1.0.0
salt==2014.1.7
six==1.7.3
urlgrabber==3.9.1
yum-metadata-parser==1.1.2



- Janne

Eric Johnson

Aug 28, 2014, 7:15:31 PM
to gce-dis...@googlegroups.com
Hi Janne,

Ok, I think I've cleaned up the process and have a branch at https://github.com/GoogleCloudPlatform/compute-video-demo-salt/tree/gcloud-update.  If you follow these steps and create a salt master running inside Compute Engine, with a Service Account, a PKCS12 private key converted to PEM format, and the 'compute' scope enabled for the master, you should be able to walk through the demo end-to-end.

But! I did trip over a recent regression in libcloud, called out at https://github.com/saltstack/salt/issues/14985.  The fix for this in libcloud is at https://github.com/apache/libcloud/pull/349.  So, for now, the stable version of libcloud will not work with salt-cloud until this fix is merged and a new libcloud is released.

Give a yell if you still find problems.  If it looks good to you, I'll merge in the change (and likely add a temporary warning about the libcloud bug).

Kind regards
-erjohnso

Janne Enberg

Aug 29, 2014, 10:12:10 AM
to gce-dis...@googlegroups.com
I still find some issues setting up salt-cloud based on the guide... I managed to get it working perfectly once, after hours of battling on one VM and trying out everything. I thought I had recorded the "relevant steps", but when I tried to reproduce the result on a fresh VM, I couldn't get it to work again.

I've spent the last ~8 hours on this alone, you could say I'm a bit annoyed at all this atm.. ;)


Issues with Salt setup

Firstly, you are suggesting installing an old version of salt and then "patching" it. Why not just a) update the instructions to a newer version (e.g. 2014.1.10 or 2014.7), or b) install the latest stable?

I.e.
a) curl -L http://bootstrap.saltstack.org | bash -s -- -M -N git v2014.1.10
or
b) curl -L http://bootstrap.saltstack.org | bash -s -- -M -N stable

For me, the reason for not choosing option b is actually a rather sad one. If you install stable via packages, it installs the python-crypto 2.0.1 package from the CentOS repos, and that version just is not compatible with salt-cloud. I get errors like:
[ERROR   ] Failed to get the output of 'gce.avail_locations()': 'module' object has no attribute 'HAVE_DECL_MPZ_POWM_SEC'
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/salt/cloud/__init__.py", line 453, in location_list
    data[alias][driver] = self.clouds[fun]()
  File "/usr/lib/python2.6/site-packages/salt/cloud/libcloudfuncs.py", line 136, in avail_locations
    conn = get_conn()   # pylint: disable=E0602
  File "/usr/lib/python2.6/site-packages/salt/cloud/clouds/gce.py", line 168, in get_conn
    driver = get_driver(Provider.GCE)
  File "/usr/lib/python2.6/site-packages/libcloud/compute/providers.py", line 171, in get_driver
    return _get_provider_driver(DRIVERS, provider)
  File "/usr/lib/python2.6/site-packages/libcloud/utils/misc.py", line 44, in get_driver
    _mod = __import__(mod_name, globals(), locals(), [driver_name])
  File "/usr/lib/python2.6/site-packages/libcloud/compute/drivers/gce.py", line 24, in <module>
    from libcloud.common.google import GoogleResponse
  File "/usr/lib/python2.6/site-packages/libcloud/common/google.py", line 87, in <module>
    from Crypto.PublicKey import RSA
  File "/usr/lib64/python2.6/site-packages/Crypto/PublicKey/RSA.py", line 75, in <module>
    from Crypto.Util.number import getRandomRange, bytes_to_long, long_to_bytes
  File "/usr/lib64/python2.6/site-packages/Crypto/Util/number.py", line 56, in <module>
    if _fastmath is not None and not _fastmath.HAVE_DECL_MPZ_POWM_SEC:
AttributeError: 'module' object has no attribute 'HAVE_DECL_MPZ_POWM_SEC'

There seem to be similar bugs reported e.g. at:

Apparently (based on the Google Groups discussion), the opinion among the Salt devs is that since CentOS ships that version, it's CentOS's fault if it doesn't work.

However, I can get around that issue rather easily by installing Salt from git and running "pip install pycrypto" instead of "yum install python-crypto", so the fix would be rather easy to integrate into the salt installer.

Either way, what I ended up doing to install Salt was:
curl -L http://bootstrap.saltstack.org | bash -s -- -D -M -N git v2014.1.10

Also, when I tried v2014.7, it complained that it couldn't start the salt-api and salt-master services, but the salt-master service was actually running fine and I found no real issues with it after those errors during install.


Google Cloud SDK Setup

The guide skips any mention of installing the Cloud SDK on the salt master; it just assumes it is installed. I also *assume* I need to run gcloud auth activate-service-account here to get things working right.

After setting it up I can run "gcloud compute instances list", but "gcloud compute ssh salt --zone europe-west1-a" just gives me permission denied a bunch of times and then tells me to try again later because "Your SSH key has not propagated to your instance yet". No matter how long I wait, it never works.

Also, it doesn't seem to matter whether the gcloud compute ssh command works, since I could get salt-cloud working just fine without ssh working, so maybe it's better to use "gcloud compute instances list" as the test command too?


Other things

Then eventually it looks like I could get things working if I make sure I install apache-libcloud with the patches you mentioned, using:

In addition to all this mess, GitHub is being an ass and giving me "400 Bad Request" errors ~10-30% of the time when doing curl requests to bootstrap.saltstack.org which can be really frustrating ;)


My current work in progress "recipe"

So this is the chain of commands I run on a fresh GCE machine with the CentOS 6 image and compute read+write permissions. It should get everything working, but still doesn't quite get me there... I still end up with errors that look like libcloud compatibility issues.

I thought on the last machine these were fixed by installing libcloud from the fork's patched branch, but apparently that wasn't the only part of the puzzle and I lost some pieces while putting this together.

Either way, here are the steps I've got written down atm to get where I am with the CentOS 6 setup:


# Enable EPEL repos

# Install all kinds of dependencies, not 100% sure of these anymore but these should at least be enough
yum -y install gcc python-devel python-pip python-setuptools openssl-devel libffi-devel

# Install salt, v2014.1.10 / v2014.7 both seem to work equally well/badly
curl -L http://bootstrap.saltstack.org | bash -s -- -M -N git v2014.1.10

# The above command still unfortunately installs python-crypto from CentOS repos that we'll need to replace
yum -y remove python-crypto
pip install pycrypto

# Install some more salt-cloud dependencies
pip install -U pyOpenSSL

# Install Google Cloud SDK tools under /opt and activate environment
curl https://sdk.cloud.google.com | CLOUDSDK_CORE_DISABLE_PROMPTS=1 PREFIX=/opt bash
cat > /etc/profile.d/google-cloud-sdk.sh <<'EOF'
# The next line updates PATH for the Google Cloud SDK.
source /opt/google-cloud-sdk/path.bash.inc

# The next line enables bash completion for gcloud.
source /opt/google-cloud-sdk/completion.bash.inc

# Make sure the Cloud SDK will look for pyOpenSSL in system packages
export CLOUDSDK_PYTHON_SITEPACKAGES=1
EOF
source /etc/profile.d/google-cloud-sdk.sh

# Test that Google Cloud SDK access works
gcloud compute instances list


Now if I set up my /etc/salt configuration files (cloud, cloud.profiles, the .map, service account pem certificate ..) I should be able to just run salt-cloud -P -m /etc/salt/my.map .. right?
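For reference, my profile and map files look roughly like this (a sketch; the profile name, image, and size here are hypothetical, only the machine names match my map):

```yaml
# /etc/salt/cloud.profiles -- profile name, image and size are placeholders
gce-centos:
  provider: gce-config
  image: centos-6
  size: n1-standard-1
  location: europe-west1-b

# /etc/salt/my.map
gce-centos:
  - gw-1
  - salt
  - web-1
```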

What I can do is run salt-cloud --list-locations gce without problems, and if I run salt-cloud -P -m /etc/salt/my.map, it recognizes which of the machines are already running, so authentication etc. should be ok, but then I get this error:



The following exception was thrown by libcloud when trying to run the initial deployment:
{u'domain': u'global', u'message': u"The resource 'projects/my-test-project/zones/europe-west1-b/disks/gw-1' was not found", u'reason': u'notFound'}
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/salt/cloud/clouds/gce.py", line 1889, in create
    node_data = conn.create_node(**kwargs)  # pylint: disable=W0142
  File "/etc/salt/src/apache-libcloud/libcloud/compute/drivers/gce.py", line 1212, in create_node
    ex_disk_type=ex_disk_type)
  File "/etc/salt/src/apache-libcloud/libcloud/compute/drivers/gce.py", line 1492, in create_volume
    return self.ex_get_volume(name, location)
  File "/etc/salt/src/apache-libcloud/libcloud/compute/drivers/gce.py", line 2476, in ex_get_volume
    response = self.connection.request(request, method='GET').object
  File "/etc/salt/src/apache-libcloud/libcloud/common/google.py", line 593, in request
    *args, **kwargs)
  File "/etc/salt/src/apache-libcloud/libcloud/common/base.py", line 695, in request
    response = responseCls(**kwargs)
  File "/etc/salt/src/apache-libcloud/libcloud/common/base.py", line 118, in __init__
    self.object = self.parse_body()
  File "/etc/salt/src/apache-libcloud/libcloud/common/google.py", line 222, in parse_body
    raise ResourceNotFoundError(message, self.status, code)
ResourceNotFoundError: {u'domain': u'global', u'message': u"The resource 'projects/my-test-project/zones/europe-west1-b/disks/gw-1' was not found", u'reason': u'notFound'}
[ERROR   ] Caught Exception, terminating workers
TRACE: 'bool' object has no attribute 'pop'
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/salt/cloud/__init__.py", line 58, in _call
    ret = func(*args, **kw)
  File "/usr/lib/python2.6/site-packages/salt/cloud/__init__.py", line 2100, in create_multiprocessing
    output.pop('deploy_kwargs', None)
AttributeError: 'bool' object has no attribute 'pop'



Error: There was a query error: Exception caught
Caught Exception, terminating workers
TRACE: 'bool' object has no attribute 'pop'
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/salt/cloud/__init__.py", line 58, in _call
    ret = func(*args, **kw)
  File "/usr/lib/python2.6/site-packages/salt/cloud/__init__.py", line 2100, in create_multiprocessing
    output.pop('deploy_kwargs', None)
AttributeError: 'bool' object has no attribute 'pop'


This very much looks to me like an issue with libcloud?

Latest pip freeze:
Babel==0.9.4
Jinja2==2.2.1
M2Crypto==0.20.2
PyYAML==3.10
backports.ssl-match-hostname==3.4.0.2
cffi==0.8.6
chardet==2.0.1
cryptography==0.5.4
distribute==0.6.10
ethtool==0.6
iniparse==0.3.1
iwlib==1.0
msgpack-python==0.1.13
ordereddict==1.1
pyOpenSSL==0.14
pycparser==2.10
pycrypto==2.6.1
pycurl==7.19.0
pygpgme==0.1
pyzmq==2.2.0.1
requests==1.1.0
salt==2014.1.10
six==1.7.3
urlgrabber==3.9.1
urllib3==1.5
yum-metadata-parser==1.1.2



I feel I'm getting closer to cracking it, but my patience is wearing thin for the day .. any ideas are greatly appreciated.



- Janne

 

Eric Johnson

Aug 29, 2014, 2:31:47 PM
to Janne Enberg, gce-dis...@googlegroups.com
I can definitely sympathize with your experience since I've had a few long stretches of time working through these things myself. I find it gets a bit easier over time but there's a dynamic element to the setup that makes it a bit of a challenge each time. :-(
 


Issues with Salt setup

Firstly, you are suggesting installing an old version of salt and then "patching" it. Why not just a) update the instructions to a newer version (e.g. 2014.1.10 or 2014.7), or b) install the latest stable?

With respect to our mutual experiences mentioned above, the reason I wrote the instructions to lock in a specific version is that you'll have a higher chance of success in replicating the setup and getting the end-to-end demo to work.  All of these tools (salt, libcloud, and Google's) are evolving rapidly, and that makes it especially hard to get everything working when using 'develop'.  Once 2014.7 has been released, it will include the most recent GCE salt-cloud features, and I'll update the guide to remove the separate manual step of copying them into the 2014.1 install.
Sorry that the instructions aren't as directly applicable to CentOS.  I intend the demo walkthroughs to be a reference starting point, not an all-inclusive set of instructions for running Salt with GCE, so YMMV obviously.

I haven't tried this, but I noticed that the salt-bootstrap script has a -P arg for using pip install.  That seems to be a fallback option rather than the preferred one, though, so it might not help in this case.  There's also a -p arg for specifying packages that might help.  I'd suggest opening a GitHub issue for salt-bootstrap and CentOS if you haven't already.  Given the easy fix you've found, I'm sure they'd be willing to merge it in.
 


Google Cloud SDK Setup

The guide skips any mention of installing the Cloud SDK on the salt master; it just assumes it is installed. I also *assume* I need to run gcloud auth activate-service-account here to get things working right.

If you're using an "official" GCE image, gcloud should already be installed.  I just tried with a fresh CentOS 6 instance (created with a Service Account and the 'compute' and 'storage' scopes): I could log into it via SSH and run the updated steps in the salt guide:

# After logging into the new instance, SSH back into itself and generate a new compute engine key (no passphrase)
$ gcloud compute ssh $(hostname -s) --zone europe-west1-b
[snip]
$ exit # log out, back into the first SSH session
$ gcloud compute instances list
[snip]

So, I didn't need to run 'gcloud auth', but you do have to make sure you've created the instance with at least "compute read".
 

After setting it up I can run "gcloud compute instances list", but "gcloud compute ssh salt --zone europe-west1-a" just gives me permission denied a bunch of times and then tells me to try again later because "Your SSH key has not propagated to your instance yet". No matter how long I wait, it never works.

Also, it doesn't seem to matter whether the gcloud compute ssh command works, since I could get salt-cloud working just fine without ssh working, so maybe it's better to use "gcloud compute instances list" as the test command too?

The reason you want to "SSH into yourself" with gcloud compute on your salt master is that it generates the SSH keys and uploads your public key to the GCE metadata service.  This way, when you create minions from your master and bootstrap them, your salt master's SSH access to the new instances should Just Work.
I ran into errors similar to the ones you saw, but I was able to clear them up by installing the patched version of libcloud.  That fix has now been merged into libcloud 'trunk', FYI.

I should have time this evening to try all of this out on CentOS using your instructions.  I'll let you know how it goes and if I can come up with any other useful info.

 




 

--
© 2014 Google Inc. 1600 Amphitheatre Parkway, Mountain View, CA 94043
 

Eric Johnson

Sep 2, 2014, 12:56:59 PM
to Janne Enberg, gce-dis...@googlegroups.com
Hi Janne,

Sorry for the delay in trying to narrow down your last issue.

I was able to get a Salt Master up and running on GCE using CentOS.  I basically followed your steps verbatim and was able to reproduce the error you posted in your last message.  The only thing I did to "fix" it was to uninstall apache-libcloud (at the version you specified, https://github.com/raphtheb/libcloud.git@9385b5f373c3ef66cd615aaaa8ff8f8112151c59) and then re-install from 'trunk' (since that metadata regression fix had been merged).


After doing that, I was able to run 'salt-cloud -P -y -m /etc/salt/demo.map' and it created the 4 Debian minions and bootstrapped them.  Note that I did *not* try CentOS minions/bootstrapping.

Hope that helps!
-erjohnso

Janne Enberg

Sep 5, 2014, 5:47:55 AM
to gce-dis...@googlegroups.com



Hi,

So I think I've figured out what all the errors about the disks meant, and what the core issue has been all this time.

It seems that all the salt tools are littered with horribad error messages; every time I've hit the error listed above, they say stuff like:
{u'domain': u'global', u'message': u"The resource 'projects/my-test-project/zones/europe-west1-b/disks/gw-1' was not found", u'reason': u'notFound'}
...
AttributeError: 'bool' object has no attribute 'pop'

The issue has been that I had already successfully created a VM with that name once, and its boot disk was left behind, blocking the creation of a new disk with the same name, which totally confuses salt-cloud. I've hit that error quite a few times, meaning I've actually gotten a working build quite a few times, but the error messages were so bad that it was impossible to tell what the issue was.

The cause seems to be that I had deleted a VM I'd managed to create successfully without deleting its boot disk: on machines created by salt-cloud, the "delete boot disk" checkbox is unchecked by default in the GCE webui, even with the delete_boot_pd: True option set. I guess this option only affects salt-cloud itself and doesn't actually set the "delete boot disk when instance is deleted" option on the VM.

When I create a VM manually from the webui with the "Delete boot disk when instance is deleted" option, I don't need to worry about the checkbox when deleting the instance, so I never even thought to look for such an option. Only once, when I was again deleting some of the VMs I had managed to create, did I notice it and realize what the issue was.
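
For reference, the profile bit in question looks roughly like this (a sketch, not my actual config; the profile name, provider name, image, size, and zone here are placeholders):

```yaml
# /etc/salt/cloud.profiles.d/gce.conf -- sketch; names are placeholders
web-profile:
  provider: gce-config
  image: centos-6
  size: g1-small
  location: europe-west1-b
  delete_boot_pd: True   # the option that apparently only salt-cloud itself honors
```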



Another thing that quickly started causing me issues was the amount of memory on my f1-micro instance, since salt-master apparently takes several hundred megs of RAM to run. To make sure things work, I switched to g1-small; maybe some swap on the f1-micro would work too, but I'm not looking into that at the moment.



After these realizations, I've easily been able to get to a state where I can create new instances, but my CentOS minions just don't work out of the box. I assume this is because sshd on the CentOS image is (correctly) set up with "PermitRootLogin no", as I get this kind of message in the salt-cloud output:
[INFO    ] Creating GCE instance web-1 in europe-west1-b
[INFO    ] Creating GCE instance vpn-1 in europe-west1-b
[INFO    ] Rendering deploy script: /usr/lib/python2.6/site-packages/salt/cloud/deploy/bootstrap-salt.sh
[INFO    ] Rendering deploy script: /usr/lib/python2.6/site-packages/salt/cloud/deploy/bootstrap-salt.sh
[ERROR   ] Authentication failed: status code 255
[ERROR   ] Failed to start Salt on Cloud VM vpn-1
[INFO    ] Created Cloud VM 'vpn-1'
[ERROR   ] Authentication failed: status code 255
[ERROR   ] Failed to start Salt on Cloud VM web-1
[INFO    ] Created Cloud VM 'web-1'

I tried to find some workarounds: I set up a new SSH key for Salt, configured it with ssh_key_file in the main config and ssh_username in the profile, and then added the public key to the project metadata's SSH keys, but for some reason those project-wide SSH keys fail to propagate to new GCE instances, at least when they're created via salt-cloud.
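
For what it's worth, the project-wide keys live under the legacy sshKeys project metadata key, one user:public-key pair per line; the user name and key material below are placeholders:

```shell
# The legacy GCE project metadata key "sshKeys" holds one
# "<username>:<public key>" entry per line.
SSH_USER=salt                            # placeholder: ssh_username from my profile
PUBKEY='ssh-rsa AAAAexamplekey salt'     # placeholder: contents of the .pub file
entry="$SSH_USER:$PUBKEY"
echo "$entry"   # this line is what goes into the sshKeys metadata value
```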

I also tried to set up startup-script metadata with a script that creates a salt user and an ~/.ssh/authorized_keys file for it, but the gcloud tool is broken and crashes if I try to do anything with metadata.
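
The startup script itself was along these lines (a sketch; the user name and key are placeholders, and the script would run as root at boot):

```shell
# Write the startup script to a local file so it can be attached as metadata;
# the user name and key material are placeholders.
cat > startup.sh <<'EOF'
#!/bin/bash
useradd -m salt
mkdir -p /home/salt/.ssh
echo 'ssh-rsa AAAAexamplekey salt' >> /home/salt/.ssh/authorized_keys
chown -R salt:salt /home/salt/.ssh
chmod 700 /home/salt/.ssh && chmod 600 /home/salt/.ssh/authorized_keys
EOF
head -n 2 startup.sh
```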

# gcloud compute project-info add-metadata --metadata-from-file startup-script=startup.sh
Traceback (most recent call last):
  File "/opt/google-cloud-sdk/./lib/googlecloudsdk/gcloud/gcloud.py", line 150, in <module>
    main()
  File "/opt/google-cloud-sdk/./lib/googlecloudsdk/gcloud/gcloud.py", line 146, in main
    _cli.Execute()
  File "/opt/google-cloud-sdk/./lib/googlecloudsdk/calliope/cli.py", line 431, in Execute
    post_run_hooks=self.__post_run_hooks, kwargs=kwargs)
  File "/opt/google-cloud-sdk/./lib/googlecloudsdk/calliope/frontend.py", line 274, in _Execute
    pre_run_hooks=pre_run_hooks, post_run_hooks=post_run_hooks)
  File "/opt/google-cloud-sdk/./lib/googlecloudsdk/calliope/backend.py", line 885, in Run
    output_formatter(result)
  File "/opt/google-cloud-sdk/./lib/googlecloudsdk/calliope/backend.py", line 870, in OutputFormatter
    command_instance.Display(args, obj)
  File "/opt/google-cloud-sdk/./lib/googlecloudsdk/compute/lib/base_classes.py", line 918, in Display
    list(resources)
  File "/opt/google-cloud-sdk/./lib/googlecloudsdk/compute/lib/base_classes.py", line 881, in Run
    new_object = self.Modify(args, objects[0])
  File "/opt/google-cloud-sdk/./lib/googlecloudsdk/compute/lib/base_classes.py", line 929, in Modify
    new_object = copy.deepcopy(existing)
  File "/usr/lib64/python2.6/copy.py", line 189, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/usr/lib64/python2.6/copy.py", line 338, in _reconstruct
    state = deepcopy(state, memo)
  File "/usr/lib64/python2.6/copy.py", line 162, in deepcopy
    y = copier(x, memo)
  File "/usr/lib64/python2.6/copy.py", line 255, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib64/python2.6/copy.py", line 162, in deepcopy
    y = copier(x, memo)
  File "/usr/lib64/python2.6/copy.py", line 255, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib64/python2.6/copy.py", line 189, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/usr/lib64/python2.6/copy.py", line 338, in _reconstruct
    state = deepcopy(state, memo)
  File "/usr/lib64/python2.6/copy.py", line 162, in deepcopy
    y = copier(x, memo)
  File "/usr/lib64/python2.6/copy.py", line 255, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib64/python2.6/copy.py", line 162, in deepcopy
    y = copier(x, memo)
  File "/usr/lib64/python2.6/copy.py", line 255, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib64/python2.6/copy.py", line 189, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/usr/lib64/python2.6/copy.py", line 329, in _reconstruct
    y.append(item)
  File "/opt/google-cloud-sdk/./lib/protorpc/messages.py", line 1087, in append
    self.__field.validate_element(value)
AttributeError: 'FieldList' object has no attribute '_FieldList__field'

If I try to create the metadata via the webui, the field is a single-line text input rather than a textarea, so pasting the script makes everything end up on one line. I changed the input to a textarea via my Chrome dev tools, but submitting multiline text that way seems to freak out the webui completely and it just clears the value. I confirmed this via curl -H 'Metadata-Flavor: Google' metadata/computeMetadata/v1/project/attributes/startup-script, so it's not just the webui displaying it wrong.

Then I passed the bash script through a quick tr '\n' ';', pasted the result into the project's startup-script metadata input, and told salt-cloud to create the instances.
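
For the record, this tr trick only works because my script is a plain sequence of simple commands: a ';' is not valid right after tokens like then or do, so anything with if/for blocks would break. A quick illustration with a two-line placeholder script:

```shell
# tr replaces every newline with ';', which is only safe between
# complete simple commands (placeholder script below).
script='useradd salt
mkdir -p /home/salt/.ssh'
flattened=$(printf '%s' "$script" | tr '\n' ';')
echo "$flattened"   # → useradd salt;mkdir -p /home/salt/.ssh
```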

I get the same errors:
[ERROR   ] Authentication failed: status code 255
[ERROR   ] Failed to start Salt on Cloud VM vpn-1
[INFO    ] Created Cloud VM 'vpn-1'
[ERROR   ] Authentication failed: status code 255
[ERROR   ] Failed to start Salt on Cloud VM web-1
[INFO    ] Created Cloud VM 'web-1'

But after the failure I can SSH in via ssh -i /etc/salt/ssh_key.pem salt@vpn-1 (the same SSH key as defined in my main config, and the same SSH username as in my profiles).

I'm running out of ideas here again. I've also been trying to ask around on #salt and #gcloud on Freenode, but that hasn't been very fruitful so far.

Also, Google Groups has now started messing with me, sending me "We're writing to let you know that the group you tried to contact (gce-discussion) may not exist, or you may not have permission to post messages to the group. A few more details on why you weren't able to post:" emails when I try to reply via email... gah


- Janne


 

Eric Johnson

unread,
Sep 5, 2014, 11:00:01 AM9/5/14
to Janne Enberg, Janne Enberg, gce-dis...@googlegroups.com
On Fri, Sep 5, 2014 at 2:43 AM, Janne Enberg <janne....@lietu.net> wrote:
Hi,

So I think I've figured out what all the errors about the disks meant, and what the core issue has been all this time.

It seems that the Salt tooling is littered with horrid error messages. Every time I've hit the error listed above, it says things like:
{u'domain': u'global', u'message': u"The resource 'projects/my-test-project/zones/europe-west1-b/disks/gw-1' was not found", u'reason': u'notFound'}
...
AttributeError: 'bool' object has no attribute 'pop'

The issue has been that I had already successfully created a VM with that name once, and its boot disk was left behind, blocking the creation of a new disk with the same name, which totally confuses salt-cloud. I've hit that error quite a few times, meaning I've actually gotten a working build quite a few times, but the error messages were so bad that it was impossible to tell what the issue was.

The cause seems to be that I had deleted a VM I'd managed to create successfully without deleting its boot disk: on machines created by salt-cloud, the "delete boot disk" checkbox is unchecked by default in the GCE webui, even with the delete_boot_pd: True option set. I guess this option only affects salt-cloud itself and doesn't actually set the "delete boot disk when instance is deleted" option on the VM.

When I create a VM manually from the webui with the "Delete boot disk when instance is deleted" option, I don't need to worry about the checkbox when deleting the instance, so I never even thought to look for such an option. Only once, when I was again deleting some of the VMs I had managed to create, did I notice it and realize what the issue was.


Hi Janne,

Ah, yes, I should've thought to head off some of these issues with re-using minion names. You'll likely also need to purge the minion keys on your master if you create/delete minions with the same names (see 'salt-key').

For the disks, back when salt-cloud first got support for GCE, there was no GCE-level option to set an 'autoDelete' flag on a disk. So the workaround at the time was to bake that functionality into salt-cloud, using the "delete_boot_pd" attribute plus GCE metadata to record the user preference. IIRC, the default is to retain the boot disk, but if you set "delete_boot_pd" to True, salt-cloud stuffs the name of your 'profile' into GCE metadata [code reference], and when you later use salt-cloud to destroy the instance, it checks whether you'd also like the disk destroyed.

salt-cloud has not been updated to use GCE's disk 'autoDelete', but libcloud has made some progress toward supporting it. So I imagine you've tripped over some issues with mixing and matching GCE's disk autoDelete and salt-cloud's own "auto delete". For now, I'd suggest just using salt-cloud for create/destroy and setting delete_boot_pd for your minions.
 

Another thing that quickly started causing me issues was the amount of memory on my f1-micro instance, since salt-master apparently takes several hundred megs of RAM to run. To make sure things work, I switched to g1-small; maybe some swap on the f1-micro would work too, but I'm not looking into that at the moment.



After these realizations, I've easily been able to get to a state where I can create new instances, but my CentOS minions just don't work out of the box. I assume this is because sshd on the CentOS image is (correctly) set up with "PermitRootLogin no", as I get this kind of message in the salt-cloud output:
[INFO    ] Creating GCE instance web-1 in europe-west1-b
[INFO    ] Creating GCE instance vpn-1 in europe-west1-b
[INFO    ] Rendering deploy script: /usr/lib/python2.6/site-packages/salt/cloud/deploy/bootstrap-salt.sh
[INFO    ] Rendering deploy script: /usr/lib/python2.6/site-packages/salt/cloud/deploy/bootstrap-salt.sh
[ERROR   ] Authentication failed: status code 255
[ERROR   ] Failed to start Salt on Cloud VM vpn-1
[INFO    ] Created Cloud VM 'vpn-1'
[ERROR   ] Authentication failed: status code 255
[ERROR   ] Failed to start Salt on Cloud VM web-1
[INFO    ] Created Cloud VM 'web-1'

I tried to find some workarounds: I set up a new SSH key for Salt, configured it with ssh_key_file in the main config and ssh_username in the profile, and then added the public key to the project metadata's SSH keys, but for some reason those project-wide SSH keys fail to propagate to new GCE instances, at least when they're created via salt-cloud.

I have the most success with this when I'm logged in as root on my salt master and then use 'gcloud compute ssh salt --zone ZONE' to SSH back into the salt master (see item #5 under the Software section of the walkthrough).  This will generate the SSH key pair and auto-upload the public key to the GCE metadata service.  The keys will then be propagated to all of your minions (assuming they're running the Google daemons) and placed in the minion's /root/.ssh/authorized_keys file.  Then your new minions can be bootstrapped via SSH.
 

I also tried to set up startup-script metadata with a script that creates a salt user and an ~/.ssh/authorized_keys file for it, but the gcloud tool is broken and crashes if I try to do anything with metadata.

# gcloud compute project-info add-metadata --metadata-from-file startup-script=startup.sh
Traceback (most recent call last):
...
AttributeError: 'FieldList' object has no attribute '_FieldList__field'

This looks like a bug with the Cloud SDK utility and I'll pass that along internally. 

    If I try to create the metadata via the webui, the field is a
    single-line text input rather than a textarea, so pasting the script
    makes everything end up on one line. I changed the input to a
    textarea via my Chrome dev tools, but submitting multiline text that
    way seems to freak out the webui completely and it just clears the
    value. I confirmed this via curl -H 'Metadata-Flavor: Google'
    metadata/computeMetadata/v1/project/attributes/startup-script, so
    it's not just the webui displaying it wrong.

I'll pass this feedback along to the web UI team.
 

Then I passed the bash script through a quick tr '\n' ';', pasted the result into the project's startup-script metadata input, and told salt-cloud to create the instances.

I get the same errors:
[ERROR   ] Authentication failed: status code 255
[ERROR   ] Failed to start Salt on Cloud VM vpn-1
[INFO    ] Created Cloud VM 'vpn-1'
[ERROR   ] Authentication failed: status code 255
[ERROR   ] Failed to start Salt on Cloud VM web-1
[INFO    ] Created Cloud VM 'web-1'

But after the failure I can SSH in via ssh -i /etc/salt/ssh_key.pem salt@vpn-1 (the same SSH key as defined in my main config, and the same SSH username as in my profiles).

I'm running out of ideas here again. I've also been trying to ask around on #salt and #gcloud on Freenode, but that hasn't been very fruitful so far.

Feel free to reach out to me on IRC anytime and we can work through things in real time: erjohnso on freenode/oftc. I'm mostly AFK except when I'm at work, Mon-Fri Pacific time.




- Janne


