Cloud Dataflow Python 3 job not resolving dependencies


Santiago Del Valle

Sep 3, 2019, 9:58:46 AM
to Google App Engine

I have a simple Apache Beam project using Python 3 that transforms some data and writes it to BigQuery. It uses a package called textstat. Everything works when I run it locally, but when I run it on Dataflow I get the following error:


NameError: name 'textstat' is not defined [while running 'generatedPtransform-441']



This is my current setup.py file:


import setuptools

REQUIRED_PACKAGES = ['textstat==0.5.6']
PACKAGE_NAME = 'my_package'
PACKAGE_VERSION = '0.0.1'

setuptools.setup(
    name=PACKAGE_NAME,
    version=PACKAGE_VERSION,
    description='Example project',
    install_requires=REQUIRED_PACKAGES,
    packages=setuptools.find_packages(),
)



and these are my pipeline args:


pipeline_args = [
    '--project={}'.format('etl-example'),
    '--runner={}'.format('Dataflow'),
    '--temp_location=gs://dataflowtemporal/',
    '--setup_file=./setup.py',
]



and I run it like this:


pipeline_options = PipelineOptions(pipeline_args)
pipeline_options.view_as(StandardOptions).streaming = True
pipeline = beam.Pipeline(options=pipeline_options)
# The actual pipeline it is running
pipeline.run()



I also tried running this in the terminal before launching the job:


python setup.py sdist --formats=gztar



but I get the same result: textstat is not found. Another thing I tried was dropping setup.py and passing only this argument:


--requirements_file=./requirements.txt



But again, textstat is not found.

At this point I don't know what else to try.
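
One thing that may be worth checking: the error is a NameError rather than an ImportError, which on Dataflow often means the package was installed on the workers but the module-level import from the main script was not carried over to them. Two commonly suggested workarounds are passing --save_main_session so the main module's globals are pickled, or importing the package inside the function that uses it. Below is a minimal sketch of both, not a confirmed fix. The helper names build_pipeline_args and readability_score are hypothetical, and the use of textstat.flesch_reading_ease is an assumption, since the actual transform is not shown above.

```python
def build_pipeline_args(project, temp_location,
                        runner="Dataflow", setup_file="./setup.py"):
    """Hypothetical helper: same args as above, plus --save_main_session."""
    return [
        "--project={}".format(project),
        "--runner={}".format(runner),
        "--temp_location={}".format(temp_location),
        "--setup_file={}".format(setup_file),
        # Ask Beam to pickle the __main__ session so that names defined at
        # module level (including imports) are restored on the workers.
        "--save_main_session",
    ]


def readability_score(text):
    """Hypothetical transform body; the real one is not shown in this post."""
    # Importing inside the function means the name is looked up on the
    # worker at call time instead of relying on the launcher's globals.
    import textstat
    return textstat.flesch_reading_ease(text)


pipeline_args = build_pipeline_args("etl-example", "gs://dataflowtemporal/")
```

The resulting list can be passed to PipelineOptions exactly as before, and readability_score could be used with something like beam.Map. The function-local import is the more targeted change, since it does not depend on everything in the main session being picklable.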

PS: I deleted my previous post because the code formatting was wonky and unreadable. Sorry for the spam.
