SQLAlchemy (v1.3.22) cannot find the Teradata engine inside an AWS Glue job script in the Amazon environment

Anhelina Rudkovska

Feb 1, 2021, 11:50:13 AM
to sqlalchemy

Everything works as expected on my local machine.

The code fails where the SQLAlchemy DB engine is initialized. We use the latest release (17.0.0.8) of the https://pypi.org/project/teradatasqlalchemy/ library to provide the DB engine for SQLAlchemy. SQLAlchemy reports that it cannot load the plugin teradatasql. I attached a screenshot with the error and the piece of code used to establish the connection.
It seems that the pkg_resources library, which is called inside SQLAlchemy, can't resolve teradatasql from the .zip inside the Amazon environment. The site-packages are shipped to Amazon as site-packages.zip, placed on AWS S3.
Direct imports of teradatasql or pkg_resources work fine (as does teradatasqlalchemy, which is also located in the site-packages .zip on S3). The site-packages in the archive look the same as the site-packages directory on the local machine (i.e. where Python is located, in a virtual env, or inside the filesystem of a Docker container).

To develop the script and run the ETL job locally, we use a container (as described here: https://aws.amazon.com/blogs/big-data/developing-aws-glue-etl-jobs-locally-using-a-container/)
created from our own image, which inherits from amazon/aws-glue-libs and adds the installation of the script's Python libraries from requirements.txt in the Dockerfile.
 
I also notified Amazon Support.
Screenshot 2021-02-01 182015.png

Simon King

Feb 2, 2021, 12:04:00 PM
to sqlal...@googlegroups.com
SQLAlchemy uses setuptools entry points to load database drivers.
Here's the definition for the teradata dialect:

https://github.com/Teradata/sqlalchemy-teradata/blob/master/setup.py#L25

For that to work, you would normally have a directory called something
like "sqlalchemy_teradata-0.1.0.dist-info" (or possibly .egg-info) in
your site-packages. The directory would contain an entry_points.txt
file that points to the dialect class.

Does your site-packages.zip contain that dist info directory with the
entry_points file inside?
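
One way to answer that question without unpacking the archive by hand is sketched below. It is only an illustration using the standard library: the function name `dialect_entry_points` is hypothetical, and it only inspects the archive contents, so it does not prove that pkg_resources can resolve the entry point at runtime.

```python
import configparser
import zipfile


def dialect_entry_points(zip_path):
    """Map each *.dist-info / *.egg-info directory in the archive to the
    entries it declares under the [sqlalchemy.dialects] group."""
    found = {}
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            parts = member.split("/")
            # Only consider <name>.dist-info/entry_points.txt (or .egg-info).
            if len(parts) < 2 or parts[-1] != "entry_points.txt":
                continue
            if not parts[-2].endswith((".dist-info", ".egg-info")):
                continue
            parser = configparser.ConfigParser(
                delimiters=("=",), interpolation=None, strict=False
            )
            parser.optionxform = str  # keep entry point names case-sensitive
            parser.read_string(archive.read(member).decode("utf-8"))
            if parser.has_section("sqlalchemy.dialects"):
                found[parts[-2]] = dict(parser.items("sqlalchemy.dialects"))
    return found
```

Calling `dialect_entry_points("site-packages.zip")` returns an empty dict when no distribution in the archive registers a SQLAlchemy dialect.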

Simon

Anhelina Rudkovska

Feb 4, 2021, 5:42:59 AM
to sqlalchemy
Thanks a lot for your answer and for the explanations, Simon!
Following your suggestion, I checked site-packages.zip yesterday, and yes, the directory teradatasqlalchemy-17.0.0.0.dist-info contains entry_points.txt. I also compared the full contents of site-packages.zip with the same libraries' directories in the Python site-packages folder, and everything is the same. I also tried site-packages.zip files created on different systems: Windows (through Git Bash), macOS, and a Linux-based Docker container with a mounted volume to save the .zip built in the container. In every case the archive was created from the command line.
Some screenshots of the .zip file contents and the corresponding .dist-info contents are attached.

Yesterday I also found a way to resolve the problem described in this topic, using these steps:
> from sqlalchemy.dialects import registry
> registry.register("teradatasql", "teradatasqlalchemy.dialect", "TeradataDialect")

The engine was finally created successfully inside the AWS Glue job's environment, but then an OSError involving the teradatasql.so file occurred (a screenshot is attached). There is no such problem in the local environment.

I believe this is not related to SQLAlchemy itself, but to the Spark context and the Amazon environment. The same topic is discussed at https://stackoverflow.com/questions/61931878/running-teradatasql-driver-for-python-code-using-spark, but everything in my site-packages.zip is identical to the ordinary site-packages in the Python directory.

I'm 99% sure this is not a question for the SQLAlchemy team, but I mention it to summarize the state of the issue. I also hope that someone happens to know something about it. Sorry for taking up your time, and thank you for your effort!

BR, Anhelina
On Tuesday, February 2, 2021 at 19:04:00 UTC+2, Simon King wrote:
Screenshot 2021-02-04 at 11.12.42.png
Screenshot 2021-02-04 at 11.12.15.png
Screenshot 2021-02-04 at 11.55.10.png

Simon King

Feb 4, 2021, 6:10:43 AM
to sqlal...@googlegroups.com
Ah, OK, so the real problem is that the teradata package is trying to
load a .so file from site-packages.zip and failing. This presumably
happens when the module is imported, and Python is catching the
underlying exception and raising an ImportError instead.

It sounds like the teradata package is not expecting to be loaded from
a zip file at all, so you might have to find a different way of
packaging it for the Amazon environment. As you suspected, it's not an
SQLAlchemy question any more.
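
As a rough illustration of "a different way of packaging", one commonly used workaround is to unpack the archive to a real directory at job start, because shared libraries generally have to exist as files on disk before they can be loaded. The sketch below uses only the standard library. The helper name `extract_to_sys_path` is hypothetical, and whether a Glue job allows writing to a temporary directory and loading native code from it is an assumption that has not been verified here.

```python
import sys
import tempfile
import zipfile


def extract_to_sys_path(zip_path, target_dir=None):
    """Unpack zip_path into a real directory and put it first on sys.path.

    Call this before importing anything that ships native libraries, since
    modules already imported from the zip stay cached in sys.modules.
    """
    target_dir = target_dir or tempfile.mkdtemp(prefix="site-packages-")
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    if target_dir not in sys.path:
        sys.path.insert(0, target_dir)  # take precedence over the zip entry
    return target_dir
```

After calling the helper with the local path of the downloaded site-packages.zip, `import teradatasql` would resolve to the extracted files rather than to the archive.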

Good luck!

Simon