Hi!
OK. So if I understand you correctly, you want to reserve query parameters solely for DBAPI driver connection parameters, and would therefore not accept a PR that implements something that changes that.
There are other reasons, though, why I was looking into this. In particular, what I am describing is already partly done by PyAthena: it uses at least two query parameters to tell where data is stored.
One (`s3_staging_prefix`) tells where query results are stored and fits nicely among the connection parameters.
The second (`s3_prefix`) tells where data should be stored when a table is created, and does not fit so well.
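For illustration, such query parameters travel in the database URL and get split out when the dialect builds the DBAPI connect arguments. A minimal stdlib sketch of that splitting (the URL scheme here mirrors PyAthena's, but the helper itself is hypothetical):

```python
from urllib.parse import parse_qs, urlsplit

def split_connect_args(url):
    """Hypothetical helper: separate a database URL from its query
    parameters, the way a dialect would when assembling DBAPI connect
    args. The s3_staging_prefix / s3_prefix names follow the discussion
    above and are assumptions about PyAthena's parameters.
    """
    parts = urlsplit(url)
    # parse_qs returns lists; keep the first value of each parameter.
    query = {k: v[0] for k, v in parse_qs(parts.query).items()}
    return parts._replace(query="").geturl(), query

base, params = split_connect_args(
    "awsathena+rest://athena.us-east-1.amazonaws.com/default"
    "?s3_staging_prefix=s3://bucket/staging/&s3_prefix=s3://bucket/data/"
)
# params now holds both storage locations, stripped from the base URL.
```

The point of the sketch: both parameters arrive through the same channel (the URL query string), even though only the first one is really a connection-level setting.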
In particular, DDL statement compilation blows up in your face. A statement like:
```python
Table('name', MetaData(), Column('c', Integer)).create(bind=engine)
```
fails with:
```
  File "~/pyathena/sqlalchemy_athena.py", line 313, in post_create_table
    raw_connection = table.bind.raw_connection()
AttributeError: 'NoneType' object has no attribute 'raw_connection'
```
I guess the storage location of a table does fit among the table dialect kwargs:
```python
Table('<name>', MetaData(), ..., awsathena_location='s3://...')
```
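With the location in the table's dialect kwargs, the DDL compiler could read it from the table itself instead of reaching for a connection. A minimal sketch of that idea (the `awsathena_location` kwarg name and this helper are assumptions, not PyAthena's actual API):

```python
def location_clause(dialect_kwargs):
    """Hypothetical helper: build the LOCATION clause of a CREATE TABLE
    statement from a per-table dialect kwarg rather than a connection
    parameter, so compilation no longer needs table.bind at all.
    """
    location = dialect_kwargs.get("awsathena_location")  # assumed kwarg name
    if location is None:
        raise ValueError(
            "awsathena_location must be set on the Table to compile its DDL"
        )
    return "LOCATION '%s'" % location

clause = location_clause({"awsathena_location": "s3://bucket/prefix/"})
```

The design point is that the failure mode changes: a missing location becomes an explicit error at compile time, instead of an `AttributeError` on an unbound table.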
Initially I thought it could be useful, e.g. when building ETL pipelines that move data around, to bind a table to its actual storage location as late as possible (so as to reuse a Table object).
But generally other bits of the table definition need to change too, such as the schema name. So there is no real benefit, and one has to create several Table objects anyway.
And using the connection there is just an unfortunate hack; that is an issue that should be addressed in PyAthena.
Thanks for your input; it helps in choosing the best fix for this.
Regards,
Nicolas