PostgresHook and SqlAlchemy

1,895 views
Skip to first unread message

Randy How

unread,
Feb 18, 2016, 12:15:15 PM2/18/16
to Airflow
Greetings,

I have an existing Postgres/PostGIS database and data access layer that uses SqlAlchemy in an application. There's also dependencies on GeoAlchemy in the DAL. The database connection for DAL utilizes the methods described in the SqlAlchemy documentation (i.e. Engine and Connection). 

I want to expose the works flows in Airflow. I have setup a DAG and Task with a DB connection string in the Admin, and the connection_id is acquired and passed around. 

I see that both SqlAlchemy and the Airflow PostgresHook use psycopg2 under the hood. I wondering if I can acquire the db connection in Airflow and utilize it in SqlAlchemy.

Before beating my head on this I thought I'd reach out for any insight or direction.

Thanks, Randy

Maxime Beauchemin

unread,
Feb 21, 2016, 10:29:31 AM2/21/16
to Airflow
I think for that purpose it'd be nice to have a generic SqlAlchemyHook and SqlAlchemyOperator. In theory you could write ETL with it that would be portable. It's nice for people who'd rather use the API instead of raw SQL.

Max

George Leslie-Waksman

unread,
Mar 16, 2016, 2:10:38 PM3/16/16
to Airflow
We've been doing pretty much exactly what you're talking about. We just wrote a small shim between airflow connections and sqlalchemy engines, which we then use inside PythonOperator tasks and our own custom tasks.

It ends up being, roughly:

def get_engine(conn_id):
    connection = hooks.BaseHook.get_connection(conn_id)
    connection_uri = '{c.conn_type}://{c.login}:{c.password}@{c.host}:{c.port}/{c.schema}'.format(c=connection)
    return sqlalchemy.create_engine(connection_uri)

I had to remove a couple extra pieces, so I might have mistyped something, but that should be a pretty good starting point.
Reply all
Reply to author
Forward
0 new messages