Trouble running composer update (from 1.5 to 1.6.1)


shawn...@faire.com

Apr 30, 2019, 6:43:31 PM
to cloud-composer-discuss
I've been trying to use the new update workflow (both the web UI and the gcloud CLI) and I can't get it to work.

I'm currently on composer-1.5.0-airflow-1.9.0, Python 2.

The script that used to exist at https://github.com/GoogleCloudPlatform/python-docs-samples/tree/master/composer/tools has been taken down (I don't have a copy), so it's a case of "don't use the old way, but the new way is still beta".

update command:
`gcloud beta composer environments update datascience --location us-east1 --image-version composer-1.6.1-airflow-1.9.0`



After a rather long time (a couple of hours), I finally get a failure. Note that the Composer operation is only marked as failed about 30 minutes after the last failed composer-agent pod calls Pub/Sub.

```
$ gcloud beta composer operations describe --location us-east1 c3a38dff-a85d-4351-aace-d44df05f77ad
done: true
error:
  code: 3
  message: |
    Http error status code: 400
    Http error message: BAD REQUEST
    Additional errors:
    {"ResourceType":"c19bd7407b84f035b-tp/us-east1-datascience-e3db07eb-gae-typer:appengine.apps.services.versions.create","ResourceErrorCode":"400","ResourceErrorMessage":"Docker image gcr.io/c19bd7407b84f035b-tp/c3a38dff-a85d-4351-aace-d44df05f77ad was either not found, or is not in Docker V2 format.  Please visit https://cloud.google.com/container-registry/docs/ui "}
metadata:
  '@type': type.googleapis.com/google.cloud.orchestration.airflow.service.v1beta1.OperationMetadata
  createTime: '2019-04-30T21:23:20.434Z'
  endTime: '2019-04-30T22:23:27.340Z'
  operationType: UPDATE
  resource: projects/faire-data/locations/us-east1/environments/datascience
  resourceUuid: ff067471-5729-4569-8e75-d915cbc0cf46
  state: FAILED
name: projects/faire-data/locations/us-east1/operations/c3a38dff-a85d-4351-aace-d44df05f77ad
```
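Since the operation takes a long time to reach its final state, it can be easier to query just the fields of interest than to read the full description each time. The following is a minimal sketch, not an official workflow: the function names `operation_state` and `operation_error` are made up for illustration, and the `--format` projections assume the field names shown in the output above (`metadata.state` and `error.message`).

```shell
# Print only the state of a Composer operation (FAILED in the output above).
# $1 = operation ID, $2 = location
operation_state() {
  gcloud beta composer operations describe "$1" \
    --location "$2" \
    --format="value(metadata.state)"
}

# Print only the error message, if the operation recorded one.
# $1 = operation ID, $2 = location
operation_error() {
  gcloud beta composer operations describe "$1" \
    --location "$2" \
    --format="value(error.message)"
}
```

For the operation in this thread, the call would be `operation_state c3a38dff-a85d-4351-aace-d44df05f77ad us-east1`.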




I think the error in question is misleading and not actually what's going on. When looking at the composer-agent pods I see:

NAMESPACE     NAME                                                             READY   STATUS      RESTARTS   AGE
default       composer-agent-c3a38dff-a85d-4351-aace-d44df05f77ad-4fxhk        0/1     Error       0          1h
default       composer-agent-c3a38dff-a85d-4351-aace-d44df05f77ad-4v2nl        0/1     Error       0          1h
default       composer-agent-c3a38dff-a85d-4351-aace-d44df05f77ad-6kb7p        0/1     Error       0          1h
default       composer-agent-c3a38dff-a85d-4351-aace-d44df05f77ad-7xrfk        0/1     Error       0          1h
default       composer-agent-c3a38dff-a85d-4351-aace-d44df05f77ad-8zsmx        0/1     Error       0          1h
default       composer-agent-c3a38dff-a85d-4351-aace-d44df05f77ad-qmkq6        0/1     Error       0          1h
default       composer-agent-c3a38dff-a85d-4351-aace-d44df05f77ad-qt6cd        0/1     Error       0          1h
default       composer-agent-c3a38dff-a85d-4351-aace-d44df05f77ad-xd8lg        0/1     Error       0          1h
default       composer-agent-c3a38dff-a85d-4351-aace-d44df05f77ad-6vvtq        0/1     Error       0          55m
default       composer-agent-c3a38dff-a85d-4351-aace-d44df05f77ad-vzsp7        0/1     Error       0          49m
default       composer-agent-c3a38dff-a85d-4351-aace-d44df05f77ad-rc8mn        0/1     Completed   0          43m
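For anyone wanting to reproduce this check, here is a rough sketch of how the failed agent pods could be picked out of a listing like the one above. The helper name `failed_agent_pods` is hypothetical, and the filter assumes the six-column layout that `kubectl get pods --all-namespaces` prints (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Read a `kubectl get pods --all-namespaces` listing on stdin and print the
# names of composer-agent pods whose STATUS column is "Error".
failed_agent_pods() {
  awk '$2 ~ /^composer-agent-/ && $4 == "Error" { print $2 }'
}

# Example usage (the "data" context is the one used elsewhere in this thread;
# the pods are assumed to be in the context's default namespace):
#   kubectl --context data get pods --all-namespaces | failed_agent_pods |
#     while read -r pod; do
#       kubectl --context data logs "$pod" | tail -n 20
#     done
```

Filtering on the printed STATUS column keeps the sketch independent of any particular kubectl selector syntax, at the cost of relying on the default table layout.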




All of the pods that exit with Error are attempts to roll forward to 1.6.1, and each one fails while running the MySQL dump (`gcloud sql export sql`):

```
ERROR:root:Caught exception: Failed to run command ['gcloud', 'sql', 'export', 'sql', 'us-east1-datascience-e3db07eb-sql', 'gs://us-east1-datascience-e3db07eb-bucket/data/default.backup.gz', '--database=default', '--project', 'c19bd7407b84f035b-tp'], details: Exporting Cloud SQL instance...
...failed.
ERROR: (gcloud.sql.export.sql) [ERROR_RDBMS] mysqldump: Got error: 1049: Unknown database 'default' when selecting the database


Traceback (most recent call last):
  File "/home/airflow/agent/image_version_updater/image_version_updater.py", line 539, in run
    outcome = self._RollForward()
  File "/home/airflow/agent/image_version_updater/image_version_updater.py", line 150, in _RollForward
    if self._HasTimeoutElapsed() or not f():
  File "/home/airflow/agent/image_version_updater/image_version_updater.py", line 180, in _CloneDatabase
    backup_location)
  File "/home/airflow/agent/lib/sql_interface.py", line 73, in export_to_gcs
    _run_shell_command(export_command.split(' '))
  File "/home/airflow/agent/lib/sql_interface.py", line 20, in _run_shell_command
    raise Exception(error_message)
Exception: Failed to run command ['gcloud', 'sql', 'export', 'sql', 'us-east1-datascience-e3db07eb-sql', 'gs://us-east1-datascience-e3db07eb-bucket/data/default.backup.gz', '--database=default', '--project', 'c19bd7407b84f035b-tp'], details: Exporting Cloud SQL instance...
...failed.
ERROR: (gcloud.sql.export.sql) [ERROR_RDBMS] mysqldump: Got error: 1049: Unknown database 'default' when selecting the database




WARNING:root:Failed to update image version. To retry
```




The final pod that "completes" is the roll-back, but it also has a bunch of errors in its logs:

```
$ kubectl --context data logs composer-agent-c3a38dff-a85d-4351-aace-d44df05f77ad-rc8mn
/home/airflow/agent/lib/k8s_utils.py:65: SyntaxWarning: name 'k8s_beta' is used prior to global declaration
  global k8s_beta
/home/airflow/agent/lib/k8s_utils.py:111: SyntaxWarning: name 'k8s_beta' is used prior to global declaration
  global k8s_beta
/home/airflow/agent/lib/k8s_utils.py:167: SyntaxWarning: name 'k8s_v1' is used prior to global declaration
  global k8s_v1
/home/airflow/agent/lib/k8s_utils.py:210: SyntaxWarning: name 'k8s_v1' is used prior to global declaration
  global k8s_v1
INFO:root:Starting Composer Agent.
INFO:root:Log sink is created to export webserver logs.
INFO:root:GKE cluster in zone: us-east1-b
Fetching cluster endpoint and auth data.
kubeconfig entry generated for us-east1-datascience-e3db07eb-gke.
WARNING:googleapiclient.discovery_cache:file_cache is unavailable when using oauth2client >= 4.0.0
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery_cache/__init__.py", line 41, in autodetect
    from . import file_cache
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery_cache/file_cache.py", line 41, in <module>
    'file_cache is unavailable when using oauth2client >= 4.0.0')
ImportError: file_cache is unavailable when using oauth2client >= 4.0.0
INFO:googleapiclient.discovery:URL being requested: GET https://www.googleapis.com/discovery/v1/apis/pubsub/v1/rest
INFO:googleapiclient.discovery:URL being requested: POST https://pubsub.googleapis.com/v1/projects/faire-data/subscriptions/us-east1-datascience-e3db07eb-composer-agent-sub-c3a38dff-a85d-4351-aace-d44df05f77ad:pull?alt=json
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:root:Responding to image version update request.
WARNING:googleapiclient.discovery_cache:file_cache is unavailable when using oauth2client >= 4.0.0
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery_cache/__init__.py", line 41, in autodetect
    from . import file_cache
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery_cache/file_cache.py", line 41, in <module>
    'file_cache is unavailable when using oauth2client >= 4.0.0')
ImportError: file_cache is unavailable when using oauth2client >= 4.0.0
INFO:googleapiclient.discovery:URL being requested: GET https://www.googleapis.com/discovery/v1/apis/pubsub/v1/rest
INFO:root:Set roll forward deadline to 2019-04-30 21:49:23+00:00
Copying file://c3a38dff-a85d-4351-aace-d44df05f77ad [Content-Type=application/octet-stream]...
/ [1 files][  151.0 B/  151.0 B]
Operation completed over 1 objects/151.0 B.
INFO:googleapiclient.discovery:URL being requested: POST https://pubsub.googleapis.com/v1/projects/faire-data/topics/us-east1-datascience-e3db07eb-composer-agent-to-backend-topic-c3a38dff-a85d-4351-aace-d44df05f77ad:publish?alt=json
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
WARNING:root:Roll forward timed out!
WARNING:root:Roll forward timed out!
INFO:root:Starting roll back.
Copying file://c3a38dff-a85d-4351-aace-d44df05f77ad [Content-Type=application/octet-stream]...
/ [1 files][  177.0 B/  177.0 B]
Operation completed over 1 objects/177.0 B.
INFO:googleapiclient.discovery:URL being requested: POST https://pubsub.googleapis.com/v1/projects/faire-data/topics/us-east1-datascience-e3db07eb-composer-agent-to-backend-topic-c3a38dff-a85d-4351-aace-d44df05f77ad:publish?alt=json
INFO:root:Scaling up worker and scheduler deployments in namespace [default]
INFO:root:Restarting deployment: airflow-scheduler in namespace default
INFO:root:Restarting deployment: airflow-worker in namespace default
INFO:root:Deployment(s) restarted: ['airflow-scheduler', 'airflow-worker'] in namespace default
Copying file://c3a38dff-a85d-4351-aace-d44df05f77ad [Content-Type=application/octet-stream]...
/ [1 files][  178.0 B/  178.0 B]
Operation completed over 1 objects/178.0 B.
INFO:googleapiclient.discovery:URL being requested: POST https://pubsub.googleapis.com/v1/projects/faire-data/topics/us-east1-datascience-e3db07eb-composer-agent-to-backend-topic-c3a38dff-a85d-4351-aace-d44df05f77ad:publish?alt=json
INFO:root:Deleting new database: composer-1-6-1-airflow-1-9-0-c3a38dff
INFO:root:Deleting new image: c3a38dff-a85d-4351-aace-d44df05f77ad
ERROR: (gcloud.container.images.delete) [c3a38dff-a85d-4351-aace-d44df05f77ad:latest] digest must be of the form "sha256:<digest>".
ERROR:root:Critical error.
ERROR:root:Failed to update image version.
ERROR:root:
Copying file://c3a38dff-a85d-4351-aace-d44df05f77ad-failed [Content-Type=application/octet-stream]...
/ [1 files][  244.0 B/  244.0 B]
Operation completed over 1 objects/244.0 B.
INFO:googleapiclient.discovery:URL being requested: POST https://pubsub.googleapis.com/v1/projects/faire-data/topics/us-east1-datascience-e3db07eb-composer-agent-to-backend-topic-c3a38dff-a85d-4351-aace-d44df05f77ad:publish?alt=json
```



Feng Lu

May 1, 2019, 3:37:16 PM
to shawn...@faire.com, cloud-composer-discuss
Hi Shawn, 

For the time being, you need to assign the project Editor role to your Composer service account; please see the doc here.
This behavior should be fixed in the next Composer release.
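
As a rough sketch of what that workaround might look like on the command line: the two function names below are hypothetical, and the sketch assumes that "project/editor" refers to the basic `roles/editor` role and that the environment's service account is exposed under `config.nodeConfig.serviceAccount` in the environment description. Check the linked doc before applying it, since Editor is a broad role.

```shell
# Print the service account that the environment's nodes run as.
# $1 = environment name, $2 = location
composer_service_account() {
  gcloud composer environments describe "$1" \
    --location "$2" \
    --format="value(config.nodeConfig.serviceAccount)"
}

# Grant the Editor role on a project to a service account.
# $1 = project ID, $2 = service account email
grant_editor() {
  gcloud projects add-iam-policy-binding "$1" \
    --member="serviceAccount:$2" \
    --role="roles/editor"
}
```

For the environment in this thread, that would be `sa=$(composer_service_account datascience us-east1)` followed by `grant_editor faire-data "$sa"`, where `faire-data` is the project ID from the operation's resource path.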

Feng 
