Run scheduled Dataflow job from App Engine web application (Flask) - ImportError: No module named dataflow_pipeline

Leonid SpiralSolutions

Sep 3, 2017, 12:13:21 PM
to Google App Engine

The application deploys successfully and can be scheduled/launched via cron.
It creates the Dataflow job successfully, but then the job fails.

The pipeline reads data from BigQuery, filters rows, and writes to BigQuery.
The filter step (beam.Filter(....)) is the one that fails.

Error details from the Google error report:
ImportError: No module named dataflow_pipeline
at _import_module (/usr/local/lib/python2.7/dist-packages/dill/dill.py:767)
at load_reduce (/usr/lib/python2.7/pickle.py:1133)
at load (/usr/lib/python2.7/pickle.py:858)
at load (/usr/local/lib/python2.7/dist-packages/dill/dill.py:266)
at loads (/usr/local/lib/python2.7/dist-packages/dill/dill.py:277)
at loads (/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py:225)
at dataflow_worker.operations.DoOperation.start (dataflow_worker/operations.c:9775) (operations.py:289)
at dataflow_worker.operations.DoOperation.start (dataflow_worker/operations.c:10574) (operations.py:284)
at dataflow_worker.operations.DoOperation.start (dataflow_worker/operations.c:10680) (operations.py:283)
at execute (/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py:166)
at do_work (/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py:581)

When I run the Dataflow pipeline manually from the client, it works fine!
Any suggestions or comments would be really helpful!
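A minimal sketch of why only the beam.Filter step fails: the traceback shows the worker unpickling the step's function with dill, and a function defined at module level in an importable module is pickled as a reference (module name plus function name), not as code. The worker therefore has to import dataflow_pipeline to rebuild the filter predicate. The BigQuery read and write steps use transforms that ship with Beam itself, which is consistent with them not failing. The example uses the standard-library pickle as a stand-in for dill, and an in-memory module with a hypothetical predicate keep_row in place of the real code.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the user's dataflow_pipeline module, created in
# memory so the example needs no files on disk.
module = types.ModuleType("dataflow_pipeline")
exec(
    "def keep_row(row):\n"
    "    return row['value'] > 0\n",
    module.__dict__,
)
sys.modules["dataflow_pipeline"] = module

# On the "client", the module is importable, so pickling the function only
# records a reference to dataflow_pipeline.keep_row, not its code.
payload = pickle.dumps(module.keep_row)

# Simulate a Dataflow worker on which the module was never installed.
del sys.modules["dataflow_pipeline"]

error = None
try:
    pickle.loads(payload)
except ImportError as exc:  # Python 3 raises ModuleNotFoundError, a subclass
    error = exc
    print(type(exc).__name__, exc)
```

The worker in the original traceback ran Python 2.7, where the same situation surfaces as a plain ImportError with the message shown above.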

Yannick (Cloud Platform Support)

Sep 4, 2017, 1:47:52 PM
to Google App Engine
Hello, if I understand correctly, you have an App Engine application with code to create and run a Dataflow pipeline. When you use this application to run the pipeline manually, it works with no issues, but a pipeline created by the exact same code and with the same pipeline parameters fails when its creation is triggered through cron?

If so, that definitely shouldn't be happening. The most likely scenario is an issue somewhere in your code, and the best place to diagnose that would be on Stack Overflow, using one of the tags monitored by our community technical support team.

If you have clear indications that the issue is actually linked to a bug in Dataflow, then you should report it on our Public Issue Tracker.

Leonid SpiralSolutions

Sep 10, 2017, 4:56:20 AM
to Google App Engine
Found the problem: I forgot to set the setup_file parameter in the pipeline options for my package module, so the module was never installed on the Dataflow workers.
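For readers hitting the same error, a minimal sketch of the fix, assuming the pipeline code lives in a local package with a setup.py next to it (for example one that calls setuptools.setup(name=..., version=..., packages=setuptools.find_packages())). The helper name build_pipeline_args and the project and bucket values are hypothetical; --runner, --project, --temp_location and --setup_file are standard Beam pipeline options. With --setup_file, the Dataflow runner builds a source distribution from that setup.py and installs it on each worker, so the module referenced by the pickled beam.Filter predicate can be imported there.

```python
import os


def build_pipeline_args(project, temp_location, setup_file="setup.py"):
    """Return Dataflow flags that ship the local package to the workers.

    Hypothetical helper. Assumes setup.py packages the module that defines
    the beam.Filter predicate (e.g. dataflow_pipeline).
    """
    return [
        "--runner=DataflowRunner",
        "--project=" + project,
        "--temp_location=" + temp_location,
        # Absolute path so the option does not depend on the working
        # directory of the App Engine request that launches the job.
        "--setup_file=" + os.path.abspath(setup_file),
    ]


# Hypothetical project and bucket names.
args = build_pipeline_args("my-project", "gs://my-bucket/tmp")
print(args)
```

With Apache Beam installed, the list can be passed directly to PipelineOptions(args) from apache_beam.options.pipeline_options; the same value can also be set in code through options.view_as(SetupOptions).setup_file.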