Dataflow jobs incorrectly marked as failing

Vikram Tiwari

May 18, 2018, 5:00:11 PM
to cloud-composer-discuss
Hi there!

NOTE: This might be an Airflow issue rather than a Composer issue.

I have a very small job that is scheduled from Airflow and runs on Dataflow. Dataflow finishes the job properly and reports `JOB_STATE_DONE` at the end, but it seems one of the hooks breaks after that and marks my Airflow task as failed. Any solution to this?
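
For context, the task is configured roughly like the sketch below. This is a reconstruction from the `Running command:` line in the log rather than the actual DAG file, so treat the names, paths, and option keys as approximations:

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.dataflow_operator import DataFlowPythonOperator

dag = DAG(
    dag_id='etl_dataflow',
    start_date=datetime(2018, 5, 18),
    schedule_interval=None,
)

# NOTE: there is no 'project' key in these options. The traceback at the
# bottom of the log dies with KeyError: 'project' when the hook reads
# variables['project'], so that may be where things go wrong.
init = DataFlowPythonOperator(
    task_id='init',
    py_file='/home/airflow/gcs/dags/dataflow/vikram/etl.py',
    dataflow_default_options={
        'runner': 'DataflowRunner',
        'requirements_file': '/home/airflow/gcs/dags/dataflow/vikram/requirements.txt',
        'setup_file': '/home/airflow/gcs/dags/dataflow/vikram/setup.py',
        'extra_package': '/home/airflow/gcs/dags/dataflow/vikram/dist/Common-1.0.0.tar.gz',
    },
    options={
        'start_date': '2018-01-01',
        'end_date': '2018-01-02',
    },
    dag=dag,
)
```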

Full task log:

```
*** Reading remote log from gs://us-central1-devbeta-c264d68e-bucket/logs/etl_dataflow/init/2018-05-18T19:10:30.216724/1.log.
[2018-05-18 19:10:40,231] {cli.py:374} INFO - Running on host airflow-worker-75bd47cf9c-fqgzh
[2018-05-18 19:10:40,304] {models.py:1196} INFO - Dependencies all met for <TaskInstance: etl_dataflow.init 2018-05-18 19:10:30.216724 [queued]>
[2018-05-18 19:10:40,310] {models.py:1196} INFO - Dependencies all met for <TaskInstance: etl_dataflow.init 2018-05-18 19:10:30.216724 [queued]>
[2018-05-18 19:10:40,310] {models.py:1406} INFO - 
--------------------------------------------------------------------------------
Starting attempt 1 of 1
--------------------------------------------------------------------------------

[2018-05-18 19:10:40,347] {models.py:1427} INFO - Executing <Task(DataFlowPythonOperator): init> on 2018-05-18 19:10:30.216724
[2018-05-18 19:10:40,348] {base_task_runner.py:115} INFO - Running: ['bash', '-c', u'airflow run etl_dataflow init 2018-05-18T19:10:30.216724 --job_id 2164 --raw -sd DAGS_FOLDER/dataflow/vikram/dag.py']
[2018-05-18 19:10:42,884] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:10:42,883] {configuration.py:206} WARNING - section/key [celery/celery_ssl_active] not found in config
[2018-05-18 19:10:42,885] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:10:42,885] {default_celery.py:41} WARNING - Celery Executor will run without SSL
[2018-05-18 19:10:42,888] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:10:42,888] {__init__.py:45} INFO - Using executor CeleryExecutor
[2018-05-18 19:10:43,000] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:10:43,000] {app.py:42} ERROR - Failed to update environment from json.
[2018-05-18 19:10:43,003] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:10:43,003] {configuration.py:389} INFO - Reading the config from /etc/airflow/airflow.cfg
[2018-05-18 19:10:43,030] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:10:43,029] {models.py:189} INFO - Filling up the DagBag from /home/airflow/gcs/dags/dataflow/vikram/dag.py
[2018-05-18 19:10:43,423] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:10:43,422] {gcp_dataflow_hook.py:97} INFO - Running command: python /home/airflow/gcs/dags/dataflow/vikram/etl.py --runner=DataflowRunner --requirements_file=/home/airflow/gcs/dags/dataflow/vikram/requirements.txt --end_date=2018-01-02 --runner=DataflowRunner --labels=airflow-version=v1-9-0-composer --extra_package=/home/airflow/gcs/dags/dataflow/vikram/dist/Common-1.0.0.tar.gz --job_name=init-29683c6e --setup_file=/home/airflow/gcs/dags/dataflow/vikram/setup.py --start_date=2018-01-01
[2018-05-18 19:10:44,677] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:10:44,677] {gcp_dataflow_hook.py:124} INFO - Start waiting for DataFlow process to complete.
[2018-05-18 19:27:30,438] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,437] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Basic pipeline options ready
[2018-05-18 19:27:30,540] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,437] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Google cloud pipeline options are ready
[2018-05-18 19:27:30,542] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,437] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Runner is set
[2018-05-18 19:27:30,542] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,437] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Pipeline structured
[2018-05-18 19:27:30,542] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,437] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Starting GCS upload to gs://us-central1-devbeta-c264d68e-bucket/dataflow_runner/staging/etl-test-20180518191047464214.1526670647.665535/pipeline.pb...
[2018-05-18 19:27:30,543] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,441] {gcp_dataflow_hook.py:109} WARNING - INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
[2018-05-18 19:27:30,543] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,441] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Completed GCS upload to gs://us-central1-devbeta-c264d68e-bucket/dataflow_runner/staging/etl-test-20180518191047464214.1526670647.665535/pipeline.pb
[2018-05-18 19:27:30,544] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,441] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Starting GCS upload to gs://us-central1-devbeta-c264d68e-bucket/dataflow_runner/staging/etl-test-20180518191047464214.1526670647.665535/requirements.txt...
[2018-05-18 19:27:30,544] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,441] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Completed GCS upload to gs://us-central1-devbeta-c264d68e-bucket/dataflow_runner/staging/etl-test-20180518191047464214.1526670647.665535/requirements.txt
[2018-05-18 19:27:30,544] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,441] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Executing command: ['/usr/local/bin/python', '-m', 'pip', 'download', '--dest', '/tmp/dataflow-requirements-cache', '-r', '/home/airflow/gcs/dags/dataflow/vikram/requirements.txt', '--no-binary', ':all:']
[2018-05-18 19:27:30,548] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,441] {gcp_dataflow_hook.py:109} WARNING - You are using pip version 9.0.1, however version 10.0.1 is available.
[2018-05-18 19:27:30,548] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,441] {gcp_dataflow_hook.py:109} WARNING - You should consider upgrading via the 'pip install --upgrade pip' command.
[2018-05-18 19:27:30,548] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,441] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Starting GCS upload to gs://us-central1-devbeta-c264d68e-bucket/dataflow_runner/staging/etl-test-20180518191047464214.1526670647.665535/pytz-2018.4.tar.gz...
[2018-05-18 19:27:30,549] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,441] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Completed GCS upload to gs://us-central1-devbeta-c264d68e-bucket/dataflow_runner/staging/etl-test-20180518191047464214.1526670647.665535/pytz-2018.4.tar.gz
[2018-05-18 19:27:30,549] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,441] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Starting GCS upload to gs://us-central1-devbeta-c264d68e-bucket/dataflow_runner/staging/etl-test-20180518191047464214.1526670647.665535/six-1.11.0.tar.gz...
[2018-05-18 19:27:30,550] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,441] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Completed GCS upload to gs://us-central1-devbeta-c264d68e-bucket/dataflow_runner/staging/etl-test-20180518191047464214.1526670647.665535/six-1.11.0.tar.gz
[2018-05-18 19:27:30,550] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,441] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Starting GCS upload to gs://us-central1-devbeta-c264d68e-bucket/dataflow_runner/staging/etl-test-20180518191047464214.1526670647.665535/pandas-0.22.0.tar.gz...
[2018-05-18 19:27:30,551] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,441] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Completed GCS upload to gs://us-central1-devbeta-c264d68e-bucket/dataflow_runner/staging/etl-test-20180518191047464214.1526670647.665535/pandas-0.22.0.tar.gz
[2018-05-18 19:27:30,551] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,441] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Starting GCS upload to gs://us-central1-devbeta-c264d68e-bucket/dataflow_runner/staging/etl-test-20180518191047464214.1526670647.665535/python-dateutil-2.7.3.tar.gz...
[2018-05-18 19:27:30,551] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,441] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Completed GCS upload to gs://us-central1-devbeta-c264d68e-bucket/dataflow_runner/staging/etl-test-20180518191047464214.1526670647.665535/python-dateutil-2.7.3.tar.gz
[2018-05-18 19:27:30,553] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,441] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Starting GCS upload to gs://us-central1-devbeta-c264d68e-bucket/dataflow_runner/staging/etl-test-20180518191047464214.1526670647.665535/numpy-1.14.2.zip...
[2018-05-18 19:27:30,553] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,441] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Completed GCS upload to gs://us-central1-devbeta-c264d68e-bucket/dataflow_runner/staging/etl-test-20180518191047464214.1526670647.665535/numpy-1.14.2.zip
[2018-05-18 19:27:30,554] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,442] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Executing command: ['/usr/local/bin/python', 'setup.py', 'sdist', '--dist-dir', '/tmp/tmpATLVq7']
[2018-05-18 19:27:30,555] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,442] {gcp_dataflow_hook.py:109} WARNING - warning: sdist: standard file not found: should have one of README, README.rst, README.txt, README.md
[2018-05-18 19:27:30,555] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,442] {gcp_dataflow_hook.py:109} WARNING - 
[2018-05-18 19:27:30,555] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,442] {gcp_dataflow_hook.py:109} WARNING - warning: check: missing required meta-data: url
[2018-05-18 19:27:30,557] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,443] {gcp_dataflow_hook.py:109} WARNING - 
[2018-05-18 19:27:30,557] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,443] {gcp_dataflow_hook.py:109} WARNING - 'Common.egg-info/PKG-INFO' not a regular file -- skipping
[2018-05-18 19:27:30,557] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,443] {gcp_dataflow_hook.py:109} WARNING - 'Common.egg-info/SOURCES.txt' not a regular file -- skipping
[2018-05-18 19:27:30,558] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,445] {gcp_dataflow_hook.py:109} WARNING - 'Common.egg-info/dependency_links.txt' not a regular file -- skipping
[2018-05-18 19:27:30,558] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,445] {gcp_dataflow_hook.py:109} WARNING - 'Common.egg-info/requires.txt' not a regular file -- skipping
[2018-05-18 19:27:30,559] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,445] {gcp_dataflow_hook.py:109} WARNING - 'Common.egg-info/SOURCES.txt' not a regular file -- skipping
[2018-05-18 19:27:30,560] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,446] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Starting GCS upload to gs://us-central1-devbeta-c264d68e-bucket/dataflow_runner/staging/etl-test-20180518191047464214.1526670647.665535/workflow.tar.gz...
[2018-05-18 19:27:30,560] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,446] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Completed GCS upload to gs://us-central1-devbeta-c264d68e-bucket/dataflow_runner/staging/etl-test-20180518191047464214.1526670647.665535/workflow.tar.gz
[2018-05-18 19:27:30,562] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,446] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Starting GCS upload to gs://us-central1-devbeta-c264d68e-bucket/dataflow_runner/staging/etl-test-20180518191047464214.1526670647.665535/Common-1.0.0.tar.gz...
[2018-05-18 19:27:30,562] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,446] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Completed GCS upload to gs://us-central1-devbeta-c264d68e-bucket/dataflow_runner/staging/etl-test-20180518191047464214.1526670647.665535/Common-1.0.0.tar.gz
[2018-05-18 19:27:30,563] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,446] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Starting GCS upload to gs://us-central1-devbeta-c264d68e-bucket/dataflow_runner/staging/etl-test-20180518191047464214.1526670647.665535/extra_packages.txt...
[2018-05-18 19:27:30,564] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,446] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Completed GCS upload to gs://us-central1-devbeta-c264d68e-bucket/dataflow_runner/staging/etl-test-20180518191047464214.1526670647.665535/extra_packages.txt
[2018-05-18 19:27:30,564] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,446] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Starting GCS upload to gs://us-central1-devbeta-c264d68e-bucket/dataflow_runner/staging/etl-test-20180518191047464214.1526670647.665535/pickled_main_session...
[2018-05-18 19:27:30,566] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,446] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Completed GCS upload to gs://us-central1-devbeta-c264d68e-bucket/dataflow_runner/staging/etl-test-20180518191047464214.1526670647.665535/pickled_main_session
[2018-05-18 19:27:30,567] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,446] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Staging the SDK tarball from PyPI to gs://us-central1-devbeta-c264d68e-bucket/dataflow_runner/staging/etl-test-20180518191047464214.1526670647.665535/dataflow_python_sdk.tar
[2018-05-18 19:27:30,567] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,447] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Executing command: ['/usr/local/bin/python', '-m', 'pip', 'download', '--dest', '/tmp/tmpATLVq7', 'google-cloud-dataflow==2.4.0', '--no-binary', ':all:', '--no-deps']
[2018-05-18 19:27:30,568] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,448] {gcp_dataflow_hook.py:109} WARNING - You are using pip version 9.0.1, however version 10.0.1 is available.
[2018-05-18 19:27:30,568] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,448] {gcp_dataflow_hook.py:109} WARNING - You should consider upgrading via the 'pip install --upgrade pip' command.
[2018-05-18 19:27:30,568] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,448] {gcp_dataflow_hook.py:109} WARNING - INFO:root:file copy from /tmp/tmpATLVq7/google-cloud-dataflow-2.4.0.tar.gz to gs://us-central1-devbeta-c264d68e-bucket/dataflow_runner/staging/etl-test-20180518191047464214.1526670647.665535/dataflow_python_sdk.tar.
[2018-05-18 19:27:30,569] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,448] {gcp_dataflow_hook.py:109} WARNING - /usr/local/lib/python2.7/site-packages/apache_beam/io/gcp/gcsio.py:166: DeprecationWarning: object() takes no parameters
[2018-05-18 19:27:30,569] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,448] {gcp_dataflow_hook.py:109} WARNING -   super(GcsIO, cls).__new__(cls, storage_client))
[2018-05-18 19:27:30,569] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,448] {gcp_dataflow_hook.py:109} WARNING - INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
[2018-05-18 19:27:30,570] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,448] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Create job: <Job
[2018-05-18 19:27:30,570] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,448] {gcp_dataflow_hook.py:109} WARNING -  createTime: u'2018-05-18T19:11:09.365991Z'
[2018-05-18 19:27:30,571] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,448] {gcp_dataflow_hook.py:109} WARNING -  currentStateTime: u'1970-01-01T00:00:00Z'
[2018-05-18 19:27:30,571] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,450] {gcp_dataflow_hook.py:109} WARNING -  id: u'2018-05-18_12_11_08-16189764197094343612'
[2018-05-18 19:27:30,571] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,450] {gcp_dataflow_hook.py:109} WARNING -  location: u'us-central1'
[2018-05-18 19:27:30,572] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,450] {gcp_dataflow_hook.py:109} WARNING -  name: u'etl-test-20180518191047464214'
[2018-05-18 19:27:30,572] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,450] {gcp_dataflow_hook.py:109} WARNING -  projectId: u'ivikramtiwari'
[2018-05-18 19:27:30,572] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,450] {gcp_dataflow_hook.py:109} WARNING -  stageStates: []
[2018-05-18 19:27:30,572] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,450] {gcp_dataflow_hook.py:109} WARNING -  steps: []
[2018-05-18 19:27:30,573] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,450] {gcp_dataflow_hook.py:109} WARNING -  tempFiles: []
[2018-05-18 19:27:30,573] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,450] {gcp_dataflow_hook.py:109} WARNING -  type: TypeValueValuesEnum(JOB_TYPE_BATCH, 1)>
[2018-05-18 19:27:30,573] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,450] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Created job with id: [2018-05-18_12_11_08-16189764197094343612]
[2018-05-18 19:27:30,574] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,450] {gcp_dataflow_hook.py:109} WARNING - INFO:root:To access the Dataflow monitoring console, please navigate to https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-05-18_12_11_08-16189764197094343612?project=ivikramtiwari
[2018-05-18 19:27:30,574] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,450] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Pipeline running
[2018-05-18 19:27:30,574] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,450] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Job 2018-05-18_12_11_08-16189764197094343612 is in state JOB_STATE_PENDING
[2018-05-18 19:27:30,575] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,450] {gcp_dataflow_hook.py:109} WARNING - INFO:root:2018-05-18T19:11:08.503Z: JOB_MESSAGE_DETAILED: Autoscaling is enabled for job 2018-05-18_12_11_08-16189764197094343612. The number of workers will be between 1 and 1000.
[2018-05-18 19:27:30,575] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,450] {gcp_dataflow_hook.py:109} WARNING - INFO:root:2018-05-18T19:11:08.541Z: JOB_MESSAGE_DETAILED: Autoscaling was automatically enabled for job 2018-05-18_12_11_08-16189764197094343612.
[2018-05-18 19:27:30,575] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,450] {gcp_dataflow_hook.py:109} WARNING - INFO:root:2018-05-18T19:11:11.227Z: JOB_MESSAGE_DETAILED: Checking required Cloud APIs are enabled.
[2018-05-18 19:27:30,576] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,450] {gcp_dataflow_hook.py:109} WARNING - INFO:root:2018-05-18T19:11:11.410Z: JOB_MESSAGE_DETAILED: Checking permissions granted to controller Service Account.
[2018-05-18 19:27:30,576] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,450] {gcp_dataflow_hook.py:109} WARNING - INFO:root:2018-05-18T19:11:12.288Z: JOB_MESSAGE_BASIC: Worker configuration: n1-standard-1 in us-central1-b.
[2018-05-18 19:27:30,577] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,450] {gcp_dataflow_hook.py:109} WARNING - INFO:root:2018-05-18T19:11:12.917Z: JOB_MESSAGE_DETAILED: Expanding CoGroupByKey operations into optimizable parts.
[2018-05-18 19:27:30,577] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,450] {gcp_dataflow_hook.py:109} WARNING - INFO:root:2018-05-18T19:11:12.969Z: JOB_MESSAGE_DETAILED: Expanding GroupByKey operations into optimizable parts.
[2018-05-18 19:27:30,577] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,450] {gcp_dataflow_hook.py:109} WARNING - INFO:root:2018-05-18T19:11:13.010Z: JOB_MESSAGE_DETAILED: Lifting ValueCombiningMappingFns into MergeBucketsMappingFns
[2018-05-18 19:27:30,578] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,451] {gcp_dataflow_hook.py:109} WARNING - INFO:root:2018-05-18T19:11:13.055Z: JOB_MESSAGE_DEBUG: Annotating graph with Autotuner information.
[2018-05-18 19:27:30,578] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,452] {gcp_dataflow_hook.py:109} WARNING - INFO:root:2018-05-18T19:11:13.109Z: JOB_MESSAGE_DETAILED: Fusing adjacent ParDo, Read, Write, and Flatten operations
[2018-05-18 19:27:30,578] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,452] {gcp_dataflow_hook.py:109} WARNING - INFO:root:2018-05-18T19:11:13.153Z: JOB_MESSAGE_DETAILED: Fusing consumer Run ETL into Begin pipeline/Read
[2018-05-18 19:27:30,579] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,452] {gcp_dataflow_hook.py:109} WARNING - INFO:root:2018-05-18T19:11:13.199Z: JOB_MESSAGE_DEBUG: Workflow config is missing a default resource spec.
[2018-05-18 19:27:30,579] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,452] {gcp_dataflow_hook.py:109} WARNING - INFO:root:2018-05-18T19:11:13.243Z: JOB_MESSAGE_DEBUG: Adding StepResource setup and teardown to workflow graph.
[2018-05-18 19:27:30,580] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,452] {gcp_dataflow_hook.py:109} WARNING - INFO:root:2018-05-18T19:11:13.289Z: JOB_MESSAGE_DEBUG: Adding workflow start and stop steps.
[2018-05-18 19:27:30,580] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,452] {gcp_dataflow_hook.py:109} WARNING - INFO:root:2018-05-18T19:11:13.336Z: JOB_MESSAGE_DEBUG: Assigning stage ids.
[2018-05-18 19:27:30,580] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,452] {gcp_dataflow_hook.py:109} WARNING - INFO:root:2018-05-18T19:11:13.515Z: JOB_MESSAGE_DEBUG: Executing wait step start3
[2018-05-18 19:27:30,581] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,452] {gcp_dataflow_hook.py:109} WARNING - INFO:root:2018-05-18T19:11:13.593Z: JOB_MESSAGE_BASIC: Executing operation Begin pipeline/Read+Run ETL
[2018-05-18 19:27:30,581] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,452] {gcp_dataflow_hook.py:109} WARNING - INFO:root:2018-05-18T19:11:13.651Z: JOB_MESSAGE_DEBUG: Starting worker pool setup.
[2018-05-18 19:27:30,582] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,452] {gcp_dataflow_hook.py:109} WARNING - INFO:root:2018-05-18T19:11:13.697Z: JOB_MESSAGE_BASIC: Starting 1 workers in us-central1-b...
[2018-05-18 19:27:30,582] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,452] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Job 2018-05-18_12_11_08-16189764197094343612 is in state JOB_STATE_RUNNING
[2018-05-18 19:27:30,582] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,452] {gcp_dataflow_hook.py:109} WARNING - INFO:root:2018-05-18T19:11:20.962Z: JOB_MESSAGE_DETAILED: Autoscaling: Raised the number of workers to 0 based on the rate of progress in the currently running step(s).
[2018-05-18 19:27:30,583] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,452] {gcp_dataflow_hook.py:109} WARNING - INFO:root:2018-05-18T19:11:31.506Z: JOB_MESSAGE_DETAILED: Autoscaling: Raised the number of workers to 1 based on the rate of progress in the currently running step(s).
[2018-05-18 19:27:30,583] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,454] {gcp_dataflow_hook.py:109} WARNING - INFO:root:2018-05-18T19:11:51.645Z: JOB_MESSAGE_DETAILED: Workers have started successfully.
[2018-05-18 19:27:30,583] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,454] {gcp_dataflow_hook.py:109} WARNING - INFO:root:2018-05-18T19:25:43.233Z: JOB_MESSAGE_DEBUG: Executing success step success1
[2018-05-18 19:27:30,584] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,456] {gcp_dataflow_hook.py:109} WARNING - INFO:root:2018-05-18T19:25:43.371Z: JOB_MESSAGE_DETAILED: Cleaning up.
[2018-05-18 19:27:30,584] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,457] {gcp_dataflow_hook.py:109} WARNING - INFO:root:2018-05-18T19:25:43.450Z: JOB_MESSAGE_DEBUG: Starting worker pool teardown.
[2018-05-18 19:27:30,584] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,457] {gcp_dataflow_hook.py:109} WARNING - INFO:root:2018-05-18T19:25:43.498Z: JOB_MESSAGE_BASIC: Stopping worker pool...
[2018-05-18 19:27:30,585] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,457] {gcp_dataflow_hook.py:109} WARNING - INFO:root:2018-05-18T19:27:21.182Z: JOB_MESSAGE_DETAILED: Autoscaling: Resized worker pool from 1 to 0.
[2018-05-18 19:27:30,585] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,457] {gcp_dataflow_hook.py:109} WARNING - INFO:root:2018-05-18T19:27:21.227Z: JOB_MESSAGE_DETAILED: Autoscaling: Would further reduce the number of workers but reached the minimum number allowed for the job.
[2018-05-18 19:27:30,585] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,457] {gcp_dataflow_hook.py:109} WARNING - INFO:root:2018-05-18T19:27:21.291Z: JOB_MESSAGE_DEBUG: Tearing down pending resources...
[2018-05-18 19:27:30,586] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,457] {gcp_dataflow_hook.py:109} WARNING - INFO:root:Job 2018-05-18_12_11_08-16189764197094343612 is in state JOB_STATE_DONE
[2018-05-18 19:27:30,586] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,457] {gcp_api_base_hook.py:72} INFO - Getting connection using `gcloud auth` user, since no key file is defined for hook.
[2018-05-18 19:27:30,586] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,463] {__init__.py:44} WARNING - file_cache is unavailable when using oauth2client >= 4.0.0
[2018-05-18 19:27:30,587] {base_task_runner.py:98} INFO - Subtask: Traceback (most recent call last):
[2018-05-18 19:27:30,587] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/googleapiclient/discovery_cache/__init__.py", line 41, in autodetect
[2018-05-18 19:27:30,587] {base_task_runner.py:98} INFO - Subtask:     from . import file_cache
[2018-05-18 19:27:30,588] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/googleapiclient/discovery_cache/file_cache.py", line 41, in <module>
[2018-05-18 19:27:30,590] {base_task_runner.py:98} INFO - Subtask:     'file_cache is unavailable when using oauth2client >= 4.0.0')
[2018-05-18 19:27:30,591] {base_task_runner.py:98} INFO - Subtask: ImportError: file_cache is unavailable when using oauth2client >= 4.0.0
[2018-05-18 19:27:30,591] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,464] {discovery.py:267} INFO - URL being requested: GET https://www.googleapis.com/discovery/v1/apis/dataflow/v1b3/rest
[2018-05-18 19:27:30,591] {base_task_runner.py:98} INFO - Subtask: [2018-05-18 19:27:30,465] {transport.py:157} INFO - Attempting refresh to obtain initial access_token
[2018-05-18 19:27:30,735] {base_task_runner.py:98} INFO - Subtask: Traceback (most recent call last):
[2018-05-18 19:27:30,736] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/bin/airflow", line 27, in <module>
[2018-05-18 19:27:30,736] {base_task_runner.py:98} INFO - Subtask:     args.func(args)
[2018-05-18 19:27:30,737] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/airflow/bin/cli.py", line 392, in run
[2018-05-18 19:27:30,737] {base_task_runner.py:98} INFO - Subtask:     pool=args.pool,
[2018-05-18 19:27:30,737] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/airflow/utils/db.py", line 50, in wrapper
[2018-05-18 19:27:30,738] {base_task_runner.py:98} INFO - Subtask:     result = func(*args, **kwargs)
[2018-05-18 19:27:30,738] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1492, in _run_raw_task
[2018-05-18 19:27:30,738] {base_task_runner.py:98} INFO - Subtask:     result = task_copy.execute(context=context)
[2018-05-18 19:27:30,739] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/airflow/contrib/operators/dataflow_operator.py", line 313, in execute
[2018-05-18 19:27:30,739] {base_task_runner.py:98} INFO - Subtask:     self.py_file, self.py_options)
[2018-05-18 19:27:30,739] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/airflow/contrib/hooks/gcp_dataflow_hook.py", line 188, in start_python_dataflow
[2018-05-18 19:27:30,740] {base_task_runner.py:98} INFO - Subtask:     label_formatter)
[2018-05-18 19:27:30,740] {base_task_runner.py:98} INFO - Subtask:   File "/usr/local/lib/python2.7/site-packages/airflow/contrib/hooks/gcp_dataflow_hook.py", line 159, in _start_dataflow
[2018-05-18 19:27:30,740] {base_task_runner.py:98} INFO - Subtask:     _DataflowJob(self.get_conn(), variables['project'],
[2018-05-18 19:27:30,741] {base_task_runner.py:98} INFO - Subtask: KeyError: 'project'
```

Balázs Bartl

Jun 12, 2018, 10:35:42 AM
to cloud-composer-discuss
I had the same issue. My problem was that I had defined my own job name in the pipeline options, which overwrote the one provided by Composer and broke the state reporting.
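
In the Beam Python SDK that looks something like the sketch below (an assumption that the name was set through `GoogleCloudOptions`; as far as I can tell, the Airflow hook later polls Dataflow for a job with the name it generated, so a pipeline-side override means it never finds one):

```python
from apache_beam.options.pipeline_options import GoogleCloudOptions, PipelineOptions

# PipelineOptions() parses sys.argv, including the --job_name=init-29683c6e
# style value that the Airflow hook appends to the command line.
options = PipelineOptions()
gcloud_options = options.view_as(GoogleCloudOptions)

# This was the bug in my pipeline: hard-coding a job name here replaces
# the hook-generated one, so the hook cannot match the running job.
# gcloud_options.job_name = 'etl-test-with-my-own-suffix'  # don't do this

# Fix: don't set job_name at all and let the operator's value through.
```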

Crystal Qian

Jun 16, 2018, 4:49:35 PM
to balazs...@doctusoft.com, cloud-compo...@googlegroups.com
Hi Vikram,

Did Balázs' response help? Are you still having this issue?

Crystal
