When building a query, option to only join once

16 views
Skip to first unread message

Michael Elsdörfer

unread,
Nov 26, 2021, 11:50:53 PM11/26/21
to sqlalchemy
I keep running into this issue where I have a complex set of filter parameters, some of which need a join, and it would be nice to have an ergonomic way to say: join this table, but not if there already is a join.

I found this old post on the subject: https://groups.google.com/g/sqlalchemy/c/ooFYsED4CI8

I was just curious if there have been further developments since then; in particular, using new-style querying in 1.4, I guess there isn't a way to add a custom, de-duplicating  `join()` method onto the `select()` object?

Michael

Mike Bayer

unread,
Nov 27, 2021, 10:48:39 AM11/27/21
to noreply-spamdigest via sqlalchemy
well as the email noted, it referred to issue https://github.com/sqlalchemy/sqlalchemy/issues/3225 , which was an ambitious proposal for a richly featured inspection API on the Query object.   which would mean you'd still need to write your own logic that pokes around in the query, decides what FROM clauses are needed, and then adds them.   it then also specified that it would build in nicely with a way to automate the transformations, using the now-deprecated before_compile() hook.

that issue didn't move because of a sense that Query, before_compile() etc. didn't really feel "right" enough to start building new towers on top of it, and sure enough three or four years later I finally came around that it was time to "get rid" of Query (it's not going away, it's just legacy) as it always bothered me that select() and Query() seemed 95% the same but then were completely different at the same time, and the introduction of caching (another idea that was dreamed about for many years) killed off "before_compile()" as a viable hook.

So today, the story with this use case is that we have a very good hook for altering ORM queries which is the do_orm_execute() event: https://docs.sqlalchemy.org/en/14/orm/events.html?highlight=do_orm_execute#sqlalchemy.orm.SessionEvents.do_orm_execute

and then to suit what was actually the much more requested feature for "query inspection", to be able to add WHERE criteria, instead of doing an "inspection" system we made a new ORM option https://docs.sqlalchemy.org/en/14/orm/query.html?highlight=with_loader_criteria#sqlalchemy.orm.with_loader_criteria that is much more "declarative" than a system to poke around the query and do things manually.

so all of that above is because of the link to that google group message, which is ultimately about WHERE criteria triggering a series of JOINs..

I still come up with the same gut response which is, "track them in a set".    the join() / join_from() methods itself accept a combinations of left side, right side, and ON clause, in different ways.   To support some mode "only add if not already present" means that all of that criteria has to be matched up and that there would be a whole new series of error modes, like what if I said select(Foo).join(Bar, Foo.id == Bar.id)  and then .join_from(Foo, Bar, Foo.other_id == Bar.id), we throw an error?  do we have to compare the "Foo.id == Bar.id" to "Foo.other_id == Bar.id" and determine "nope, those aren't the same, this is ambiguous", as well as see "Foo.id == Bar.id" and know that this is the same as "Bar.id == Foo.id"?  Now we need a more elaborate operator heuristic / hashing system else people will complain about edge cases like this.   This feels like we are bolting on a userland case into the API that would have just enough complexity and weird edge cases that we'd keep getting a lot of bug reports and potential "oops, we did the API wrong" types of situations which would draw our focus away from what I'm starting to call "big, mainstream" problems like getting the whole library fully type annotated, improving performance, and simplifying the ORM.   Put another way, the use case still makes me nervous that there's something short-sighted about my thinking on this that isn't going to hold up well, the same hunch I had with #3225 that something wasn't going to hold up (which proved to be correct - weeks / months of effort were saved that I didn't do #3225 and would have had to reverse / revise all of that for 1.4 / 2.0.).

I know it's hard to defend "gut feelings" in a programming forum but for this use case, "track them in a set" seems very simple and keeps the library from having to take on a problem that's very simple in the "we just need this" kind of sense but might be very thorny in the "has to work in all possible scenarios forever" sense.      I would add that we do support having example scripts on the wiki and in the examples/ folder of the distribution / documentation, this seems like it could be a good wiki example to start with under "ORM Querying" (https://github.com/sqlalchemy/sqlalchemy/wiki/UsageRecipes) .

I'm playing around with some code here for this kind of thing and it's looking 95% the same as what I had in my response at https://groups.google.com/g/sqlalchemy/c/ooFYsED4CI8 .    "dedupe joins" is not a simple problem in the general case because joins have an open ended ON clause.  An end-user recipe that assumes a single ON clause OTOH is straightforward.
--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper
 
 
To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.

Reply all
Reply to author
Forward
0 new messages