PROPOSAL: whack query.select(), selectfirst(), selectone(), select_by(), selectfirst_by(), selectone_by(), get_by(), "auto-join" feature

68 views
Skip to first unread message

Michael Bayer

unread,
Jun 3, 2007, 11:35:10 AM6/3/07
to sqlal...@googlegroups.com

This is being discussed in a different thread and it was suggested I
put this up in its own brightly lit thread so that people can notice it.

now that the Query object in 0.3 has merged an enhanced set of
capabilities from the SelectResults extension, namely that it behaves
in a "generative" fashion whereby criteria and other modifiers are
applied via filter(), filter_by(), join(), order_by(), etc., its time
to create "one and preferably only one obvious way to do it" for
Query. the current documentation at "Basic Data Mapping" spends a
lot of effort explaining the dichotomy between "generative" and "non-
generative" methods, being able to say query.select() which returns
results immediately, or query.filter().list() which allows
generation. I spent a lot of time getting that doc to make some
semblance of sense but it was pretty clear that the API had become
muddied, with a better idea moving in and the older idea seeming kind
of redundant.

the details of the proposal would be:

- the methods select(), selectfirst(), selectone(), select_by(),
selectfirst_by(), selectone_by() and get_by() would be deprecated.
this means they will remain present on the Query object but the
documentation would be reorganized to talk only about filter(),
filter_by(), list(), scalar(), and a new method called one() which is
like scalar() but ensures that only one row was returned. By
deprecating the methods but retaining their exact behavior, the
majority of upgrade paths for those still using these methods is
painless. the methods would remain present until version 0.5.

- the ability to place ClauseElements at the start of filter_by()
would be removed.

- filter_by() would gain a new positional argument, which is
optional, which is either a string or list of strings indicating
property names, indicating a join path. when this argument is
present, a self-contained join among those mapped properties will be
generated, and the keyword criterion of the filter_by() would be
expressed against the endpoint of that join. by "self-contained", i
mean that the tables used in the join will be *aliased* anonymously,
so that different filter_by()s which specify join paths that overlap
can be used together. the existing behavior of being able to say
query.join(['x', 'y', 'z']).filter_by(**kwargs) remains as well (but
is different in that it doesnt create aliases).

- for execution of a completely literal text statement, as well as a
fully constructed select() object, a method query.from_statement()
will be added, and the older select_text() method (that i dont think
anyone knows about) will be deprecated like the others. executions
would look like query.from_statement("select * from table").scalar()
as well as query.from_statement(sometable.select()).list()

- the behavior of join() changes slightly, such that each call to join
() will reset the "joinpoint" of the resulting query back to the
beginning before creating the join. this means that join() will
always build its joins from the original queried class. join(None)
resets the joinpoint back to the beginning for subsequent filter()/
filter_by() calls.

- assignmapper would also keep all of its current methods with
regards to selecting/filtering. it seems like tools like Elixir are
going to move away from assignmapper anyway which is a good thing.

- the behavior of select_by(), get_by(), join(), filter_by(), and
others right now is such that when you name an attribute which is not
immediately present on the queried class, a search is performed
through all the mapped properties of the class, into its subclasses,
until it finds a property of that name, upon which it constructs a
join from the initial table to the target table, adding in a
comparison criterion for the key selected. this feature would remain
in the methods select_by() and the other deprecated _by() methods,
but would be removed from filter_by() as well as join().

- for those who have built MapperExtensions into select() and
select_by(), we'll put some MapperExtension hooks into list() which
is the execution point for queries.

- SelectResults remains deprecated until 0.5 when its also removed.

and now for the reasons:

- the select() methods are now redundant in light of the new
generative methods. all of the various flags which you can use with
select() are present generatively, i.e. order_by(), limit(),
with_lockmode(), filter(), select_from(), etc. By favoring the
generative methods we get a more flexible interface and only one way
to do it.

- by removing the word "select" from Query's API entirely, the
timeless confusion of "sql.select()? or query.select()?" goes away.
the identifier "select" at last loses its ambiguity. this is a big win.

- the "multi-dispatch" behavior of select() is also little known and
confusing. Did you know that select() can take not just a "where"
criterion, but also a fully constructed select() object ? neither
did anyone else. now we'll have from_statement() to express "ive
constructed a full statement and dont want any statement compilation
to occur".

- join() will reset the join point before joining because then you
can build multiple joins starting from the root of the query chain.
so query.join('a').join('b') builds two joins against the root table,
query.join(['a', 'b']) builds a join from root to 'a' to 'b'. the
meaning of "joinpoint" is that any subsequent filter_by() call will
construct its criterion against the endpoint of the join most
recently constructed. the new approach doesnt remove any
capabilities but adds the capability for multiple joins off the
root. a join of join(None) brings the joinpoint back to the beginning.

- the ability to place ClauseElements inside of select_by() stems
from the need to query based on keyword criterion *as well as* table
based criterion at the same time. but now that we have generative
behavior, you can just use individual filter() and filter_by()
methods to combine those. The "multiple-use" nature of various
methods, which is also very confusing (and hard to document), is not
necessary with the generative approach. So it will be removed from
filter_by(), where Im hoping people havent been putting
ClauseElements inside of too much. (the deprecated methods of course
keep it for backwards compatibility).

- the new "positional" argument of filter_by() is based on Gaetan's
idea, as well as some light that was shed on the IRC channel. it
looks like this:

session.query(User).filter_by(['orders', 'items', 'keywords'],
keyword_name='foo').filter_by(['orders', 'items'],
item_price=49.50).list()

above, we have created two joins from the "users" table, one of which
joins from 'orders' to 'items' to 'keywords', and the other from
'orders' to 'items'. the two sets of joins are constructed using
anonymous aliases, so that you get all users who have purchased an
item that has the 'foo' keyword, and have also purchased an item that
costs 49.50. the two paths across 'orders' and 'items' are isolated
from each other.

if aliases were not used above, the redundant 'orders' and 'items'
tables would converge into one FROM clause each in the final
statement since thats the behavior of select(), but then the
semantics of the query would then be "all users who purchased items
that cost 49.50 and also have the 'foo' keyword".

the above capability is currently available by constructing your own
Alias object and building join criterion manually. But it was
pointed out on IRC that the Django ORM constructs filtering queries
using aliases as above. for a lot of cases the approach saves a lot
of lines of code spent creating manual aliases.

- "auto-join": In the docs, the example of select_by() searching
through all mapped properties looking for a match is given in the
many-to-many page with something like session.query(Article).select_by
(keyword_name='foo'), where a join is constructed from "articles" to
"article_keywords", to "keywords" and then adding criterion
"keywords.keyword_name='foo'". It was known since day one, and
eventually pointed out by a few users in case it wasnt known (which
it was), that this feature does not have "deterministic" behavior.
if multiple attribute names of "keyword_name" exist, its going to
pick the first one that it finds, and all subsequent occurences are
forever ignored. if mapped properties happen to be organized
differently, then you get a different result - or, if you are
querying from a different point in the "join" chain, the sudden
presence of the same attribute name elsewhere suddenly changes the
result. the reason this feature was created, and why its stuck
around, is that before we added generative behavior, there was really
no way to join among properties without spelling out the join
explicitly using tables. the anemic methods join_via() and join_to
() were added to assist in constructing these joins but I knew from
the beginning i didnt like those. but with generative behavior, we
got join(), and now we even get the enhanced filter_by() idea. so
the "automatic join" idea isnt needed as there are now explicit ways
to construct a join as desired. again, hoping people arent relying
on this with filter_by() or join() already (the deprecated methods of
course keep it for backwards compatibility).

Cons:

upgrade path: the only pain in this upgrade path is if you are using
filter_by() with either ClauseElements or with "auto-join", using join
() with "auto-join", or using multiple join()'s strung together to
build a single join from the root. filter_by() and join() are fairly
new methods so i dont think their adoption is so deep as of yet, and
also the features to be removed aren't so intuitive with them anyway
so im betting they arent used much. any current use of select(),
select_by(), etc. would all continue to work identically.

more typing: it is not as succinct to say query.filter_by(id=10).one
(), instead of query.get_by(id=10). but removing seven redundant
methods which are not as flexible, making the purpose of each method
much more focused, and taking out five or six paragraphs out of the
docs thereby leaving room for deeper exposition of whats there, seems
to be worth it.


so......whattaya say ?


Michael Bayer

unread,
Jun 3, 2007, 11:53:22 AM6/3/07
to sqlal...@googlegroups.com
p.s. - the proposal is for version 0.4 of SQLAlchemy. 0.3 series
doesnt change.


Eric Ongerth

unread,
Jun 3, 2007, 1:45:47 PM6/3/07
to sqlalchemy
Well, that answers my next N anticipated questions on this list plus
essentially every unasked-yet one in my backlog. These consistency/
predictability-of-syntax/redundancy points cut to the core of every
issue I've had and/or every time I've had to hit the docs for more
than a brief reminder.

+1 oh yeah

On Jun 3, 8:35 am, Michael Bayer <mike...@zzzcomputing.com> wrote:
> {super duper SQLA proclamation}

Gaetan de Menten

unread,
Jun 3, 2007, 4:54:38 PM6/3/07
to sqlal...@googlegroups.com
On 6/3/07, Michael Bayer <mik...@zzzcomputing.com> wrote:

> - the methods select(), selectfirst(), selectone(), select_by(),
> selectfirst_by(), selectone_by() and get_by() would be deprecated.
> this means they will remain present on the Query object but the
> documentation would be reorganized to talk only about filter(),
> filter_by(), list(), scalar(), and a new method called one() which is
> like scalar() but ensures that only one row was returned.

I'm unsure about this but wouldn't "one" be redundant with "scalar" and [0]?
Couldn't we have only one of either "one" or "scalar" (and have the
check there) and otherwise let people use [0].

> so......whattaya say ?

Except from that little tweak, everything sounds great (you knew my
opinion already but I couldn't resist saying it again)!

--
Gaëtan de Menten
http://openhex.org

Mike Orr

unread,
Jun 3, 2007, 5:29:00 PM6/3/07
to sqlal...@googlegroups.com
On 6/3/07, Michael Bayer <mik...@zzzcomputing.com> wrote:
> - the methods select(), selectfirst(), selectone(), select_by(),
> selectfirst_by(), selectone_by() and get_by() would be deprecated.
> this means they will remain present on the Query object but the
> documentation would be reorganized to talk only about filter(),
> filter_by(), list(), scalar(), and a new method called one() which is
> like scalar() but ensures that only one row was returned.

+1

> - by removing the word "select" from Query's API entirely, the
> timeless confusion of "sql.select()? or query.select()?" goes away.
> the identifier "select" at last loses its ambiguity. this is a big win.

For this reason. SQLAlchemy has too many ways to do the same thing,
and too many ways to access the same object.

I would suggest renaming .list() to .all(). It seems funny having a
method with the same name and same behavior as list(query) -- I can
never decide which to use.

There is the concern about building dozens of intermediate query
objects that you immediately throw away, but that would be a good
target for optimization. For instance, I'm not sure if it clones a
query by rebuilding the criteria from scratch, or if it just copies
one list of immutable (sharable) objects. If you can guarantee that
the existing query won't be used anymore you can just reassign (share)
the criteria as single unit, but I guess you can't guarantee that.

> - assignmapper would also keep all of its current methods with
> regards to selecting/filtering. it seems like tools like Elixir are
> going to move away from assignmapper anyway which is a good thing.

It would still be worth a separate proposal to reform assignmapper;
i.e., delete the query methods .foo() that duplicate
MyClass.query().foo(). A lot of non-Exilir people use assignmapper,
and it's frustrating that .select() exists but .filter() doesn't, so
either add the missing methods or delete the redundant ones. On the
other hand, this can be handled in the documentation by emphasizing
.query() and deprecating the query methods.

--
Mike Orr <slugg...@gmail.com>

Michael Bayer

unread,
Jun 3, 2007, 10:33:05 PM6/3/07
to sqlal...@googlegroups.com

On Jun 3, 2007, at 4:54 PM, Gaetan de Menten wrote:

>
> On 6/3/07, Michael Bayer <mik...@zzzcomputing.com> wrote:
>
>> - the methods select(), selectfirst(), selectone(), select_by(),
>> selectfirst_by(), selectone_by() and get_by() would be deprecated.
>> this means they will remain present on the Query object but the
>> documentation would be reorganized to talk only about filter(),
>> filter_by(), list(), scalar(), and a new method called one() which is
>> like scalar() but ensures that only one row was returned.
>
> I'm unsure about this but wouldn't "one" be redundant with "scalar"
> and [0]?
> Couldn't we have only one of either "one" or "scalar" (and have the
> check there) and otherwise let people use [0].
>

one() raises an exception when theres not exactly one result, scalar
() does not. currently we have selectfirst() and selectone(), people
seem to like the dichotomy. [0] specifically adds "LIMIT 1 OFFSET 0"
to the query and i dont think scalar() or one() would do that (LIMIT
is specifically a problem with DB's like Oracle that dont directly
support it...more complex oracle queries cant handle it).

Michael Bayer

unread,
Jun 3, 2007, 10:36:04 PM6/3/07
to sqlal...@googlegroups.com

On Jun 3, 2007, at 5:29 PM, Mike Orr wrote:

> I would suggest renaming .list() to .all(). It seems funny having a
> method with the same name and same behavior as list(query) -- I can
> never decide which to use.

list() has been around awhile and is also derived from hibernate's
similar interface, not sure if this is a good time to change that.

>
> There is the concern about building dozens of intermediate query
> objects that you immediately throw away, but that would be a good
> target for optimization. For instance, I'm not sure if it clones a
> query by rebuilding the criteria from scratch, or if it just copies
> one list of immutable (sharable) objects. If you can guarantee that
> the existing query won't be used anymore you can just reassign (share)
> the criteria as single unit, but I guess you can't guarantee that.

im giong to look into optimizing the cloning. as ive said,
hibernate's criteria object behaves "generatively" but doesnt
actually copy the object; several folks here seem to want the
"generativeness". ive been considering sneaking in a flag/method
that would turn off the "generativeness" but id have to make sure you
dont notice it :).

> It would still be worth a separate proposal to reform assignmapper;
> i.e., delete the query methods .foo() that duplicate
> MyClass.query().foo(). A lot of non-Exilir people use assignmapper,
> and it's frustrating that .select() exists but .filter() doesn't, so
> either add the missing methods or delete the redundant ones.

filter() and filter_by() were added in 0.3.8.

Mike Orr

unread,
Jun 4, 2007, 12:01:33 AM6/4/07
to sqlal...@googlegroups.com
On 6/3/07, Michael Bayer <mik...@zzzcomputing.com> wrote:
> im giong to look into optimizing the cloning. as ive said,
> hibernate's criteria object behaves "generatively" but doesnt
> actually copy the object; several folks here seem to want the
> "generativeness". ive been considering sneaking in a flag/method
> that would turn off the "generativeness" but id have to make sure you
> dont notice it :).

Does that mean returning the results immediately, or modifying the
query in place and returning it?

Maybe modifying the query in place and returning it isn't such a bad
idea after all, considering that this is kind of a special case,
having to call so many methods to build up a query. Most other OO
systems don't require so many method calls to build up a meaningful
object, but most other OO systems are not SQL queries either. And if
it provides a way to get away from "q = q.filter(...)" in favor of
"q.filter(...)", that would be an advantage. It gets tiring assigning
the same variable to itself again and again when assignment isn't
really the nature of what's going on.

> > It would still be worth a separate proposal to reform assignmapper;
> > i.e., delete the query methods .foo() that duplicate
> > MyClass.query().foo(). A lot of non-Exilir people use assignmapper,
> > and it's frustrating that .select() exists but .filter() doesn't, so
> > either add the missing methods or delete the redundant ones.
>
> filter() and filter_by() were added in 0.3.8.

Hooray. I've been following the trunk and reading the CHANGELOG but I
didn't notice that feature.

--
Mike Orr <slugg...@gmail.com>

Neil Blakey-Milner

unread,
Jun 4, 2007, 3:27:54 AM6/4/07
to sqlal...@googlegroups.com
On 6/3/07, Michael Bayer <mik...@zzzcomputing.com> wrote:
> - filter_by() would gain a new positional argument, which is
> optional, which is either a string or list of strings indicating
> property names, indicating a join path. when this argument is
> present, a self-contained join among those mapped properties will be
> generated, and the keyword criterion of the filter_by() would be
> expressed against the endpoint of that join. by "self-contained", i
> mean that the tables used in the join will be *aliased* anonymously,
> so that different filter_by()s which specify join paths that overlap
> can be used together. the existing behavior of being able to say
> query.join(['x', 'y', 'z']).filter_by(**kwargs) remains as well (but
> is different in that it doesnt create aliases).

...

> - the new "positional" argument of filter_by() is based on Gaetan's
> idea, as well as some light that was shed on the IRC channel. it
> looks like this:
>
> session.query(User).filter_by(['orders', 'items', 'keywords'],
> keyword_name='foo').filter_by(['orders', 'items'],
> item_price=49.50).list()
>
> above, we have created two joins from the "users" table, one of which
> joins from 'orders' to 'items' to 'keywords', and the other from
> 'orders' to 'items'. the two sets of joins are constructed using
> anonymous aliases, so that you get all users who have purchased an
> item that has the 'foo' keyword, and have also purchased an item that
> costs 49.50. the two paths across 'orders' and 'items' are isolated
> from each other.
>
> if aliases were not used above, the redundant 'orders' and 'items'
> tables would converge into one FROM clause each in the final
> statement since thats the behavior of select(), but then the
> semantics of the query would then be "all users who purchased items
> that cost 49.50 and also have the 'foo' keyword".
>
> the above capability is currently available by constructing your own
> Alias object and building join criterion manually. But it was
> pointed out on IRC that the Django ORM constructs filtering queries
> using aliases as above. for a lot of cases the approach saves a lot
> of lines of code spent creating manual aliases.

Hi there,

I like where things are going, but I think I should clarify how the
Django ORM does things to my knowledge.

The filter method in the Django ORM only supports keyword args a la
filter_by. Every query is based off a mapped object (like Post).
Each filter can access other tables' columns via the relations, using
a construct like:

.filter(jobtitle__content__txt__icontains = keyword,
domain__str__content__txt__icontains = keyword)

Both jobtitle and domain__str (the domain relation and then the "str"
relation) are of a mapped object Strids type, which maps to the strids
table. This has a relation content to a table with contains the
actual text to use for the current locale.

(__icontains is there to perform a LIKE '%keyword%', since there's no
ClauseElement equivalent. I'm not suggesting supporting that.)

The joined tables are aliased as the path from the base object. In
this case, the "jobtitle" join is:

INNER JOIN "strids" AS "positionprofiles__jobtitle" ON
"positionprofiles"."jobtitleid" = "positionprofiles__jobtitle"."id"

The "domain__str__content" join is:

INNER JOIN "dimensions" AS "positionprofiles__domain" ON
"positionprofiles"."domainid" = "positionprofiles__domain"."id" INNER
JOIN "strids" AS "positionprofiles__domain__str" ON
"positionprofiles__domain"."strid" =
"positionprofiles__domain__str"."id" INNER JOIN "content" AS
"positionprofiles__domain__str__content" ON
"positionprofiles__domain__str"."id" =
"positionprofiles__domain__str__content"."strid"

ie, it joins on domain, then str, then content, giving each one an
alias based on the path from the mapped object.

I'm a little worried that using anonymous aliases will make some
queries impossible. For example, if you want to list all purchases of
product type B made by a manufacturer with a particular discount
agreement, since one would require a filter_by(['product', 'store'],
...) and the other will require a filter_by(['product', 'type'], ...).
In the Django ORM, since the path to 'product' is the same, it's the
same join.

Not being able to rejoin on the same path may make certain kinds of
query building impossible. Then again, using anonymous aliases for
each kind can solve different types of problems that can't be done
without them.

I think the principle of least astonishment would be to not have a new
join for the exact same path. But, given enough documentation, one
can avoid astonishment.

I hope this made the situation clearer, and not more confusing.

Neil
--
Neil Blakey-Milner
http://nxsy.org/
n...@nxsy.org

svilen

unread,
Jun 4, 2007, 6:07:22 AM6/4/07
to sqlal...@googlegroups.com
all seems ok.
and ORM will become looking less SQLish (as .select being the most
SQLish word IMO)

Paul Kippes

unread,
Jun 4, 2007, 9:44:56 AM6/4/07
to sqlal...@googlegroups.com
These sound like a good progression of the library.

Would there be a way to turn on warnings if one were to use a
deprecated interface? Or if the programmer wants to be even more
strict, raise an exception? This wouldn't be the default, but rather
an available option.

Michael Bayer

unread,
Jun 4, 2007, 9:56:42 AM6/4/07
to sqlal...@googlegroups.com

On Jun 3, 2007, at 10:33 PM, Michael Bayer wrote:

> one() raises an exception when theres not exactly one result, scalar
> () does not. currently we have selectfirst() and selectone(), people
> seem to like the dichotomy. [0] specifically adds "LIMIT 1 OFFSET 0"
> to the query and i dont think scalar() or one() would do that (LIMIT
> is specifically a problem with DB's like Oracle that dont directly
> support it...more complex oracle queries cant handle it).
>

scratch that partially, selectfirst() and selectone() *do* add the
limit. however in the case of selectone(), it adds a limit of
*two*. if a second row is present-> exception.


Rick Morrison

unread,
Jun 4, 2007, 9:56:53 AM6/4/07
to sqlal...@googlegroups.com
The use of scalar() here seems out of place with both the common CS usage of the word (e.g. scalar == single-valued), and the use of scalar() in the SQL layer. Single row results in the ORM are rows, not a single datatype. It's another potential point of confusion, like the ORM .select() is/was.

I would say drop scalar() in the ORM namespace, and for single-row results, use

.first() -- returns first row
.one() -- returns first row, raise exception if more than one result

Michael Bayer

unread,
Jun 4, 2007, 10:30:42 AM6/4/07
to sqlal...@googlegroups.com

On Jun 4, 2007, at 9:56 AM, Rick Morrison wrote:

> The use of scalar() here seems out of place with both the common CS
> usage of the word (e.g. scalar == single-valued), and the use of
> scalar() in the SQL layer. Single row results in the ORM are rows,
> not a single datatype. It's another potential point of confusion,
> like the ORM .select() is/was.
>
> I would say drop scalar() in the ORM namespace, and for single-row
> results, use
>
> .first() -- returns first row
> .one() -- returns first row, raise exception if more than one result
>
>

then we sort of have to rename list() to be all().

its all fine with me if everyone out there doesnt mind changing their
code by the time 0.5 comes out....

Rick Morrison

unread,
Jun 4, 2007, 10:47:36 AM6/4/07
to sqlal...@googlegroups.com
Well, I don't really see the connection between .scalar() and .list(), other than they are kind of antonyms, but FWIW I always thought that .list() was an odd duck anyway, seeing as how the arguably more Pythonic list(query) works fine.

Rick Morrison

unread,
Jun 4, 2007, 11:03:05 AM6/4/07
to sqlal...@googlegroups.com
Well, I don't really see the connection between .scalar() and .list(), other than they are kind of antonyms, but FWIW I always thought that .list() was an odd duck anyway, seeing as how the arguably more Pythonic list(query) works fine.

(expanding on this a bit)

More generally, it's the iterative nature of a query object that makes a lot of the .first() and so on more semantic sugar than anything else anyway. I'll routinely write things like:

    for row in query:
        process(row)
or
    results = [process(r) for r in query]

instead of:
    results = query.list()
    for row in results:
        process(row)

I would even be OK with losing all the .select() and .first() and etc. completely and deal with queries using a library of function STL-like first-order functions like

   first(query)
   one(query)
   etc.

so color me radical, I guess.

Rick

 

Michael Bayer

unread,
Jun 4, 2007, 11:06:41 AM6/4/07
to sqlal...@googlegroups.com

On Jun 4, 2007, at 10:47 AM, Rick Morrison wrote:

> Well, I don't really see the connection between .scalar() and .list
> (), other than they are kind of antonyms, but FWIW I always thought
> that .list() was an odd duck anyway, seeing as how the arguably
> more Pythonic list(query) works fine.

I really like having a function on the end to give me all the
results. having to go all the way back to the beginning of the line
and put list() around the whole thing is not always convenient.

Michael Bayer

unread,
Jun 4, 2007, 11:12:54 AM6/4/07
to sqlal...@googlegroups.com

On Jun 4, 2007, at 3:27 AM, Neil Blakey-Milner wrote:

> The joined tables are aliased as the path from the base object. In
> this case, the "jobtitle" join is:
>
> INNER JOIN "strids" AS "positionprofiles__jobtitle" ON
> "positionprofiles"."jobtitleid" = "positionprofiles__jobtitle"."id"
>
> The "domain__str__content" join is:
>
> INNER JOIN "dimensions" AS "positionprofiles__domain" ON
> "positionprofiles"."domainid" = "positionprofiles__domain"."id" INNER
> JOIN "strids" AS "positionprofiles__domain__str" ON
> "positionprofiles__domain"."strid" =
> "positionprofiles__domain__str"."id" INNER JOIN "content" AS
> "positionprofiles__domain__str__content" ON
> "positionprofiles__domain__str"."id" =
> "positionprofiles__domain__str__content"."strid"
>
> ie, it joins on domain, then str, then content, giving each one an
> alias based on the path from the mapped object.

those two joins are exclusive to each other (i.e. "strids" has a
different alias in the second join than the first). this is the
equivalent to what im proposing except i didnt have a deterministic
(well, human-readable) naming convention in mind for tha alias names
themselves. that naming scheme django is using cant really work
anyway since it will very quickly go over the character limit of
databases like oracle and firebird.

>
> I'm a little worried that using anonymous aliases will make some
> queries impossible. For example, if you want to list all purchases of
> product type B made by a manufacturer with a particular discount
> agreement, since one would require a filter_by(['product', 'store'],
> ...) and the other will require a filter_by(['product', 'type'], ...).
> In the Django ORM, since the path to 'product' is the same, it's the
> same join.

i think you need to illustrate a clearer example since I dont
understand the meaning of "positionprofiles" or "strids" above in
relation to the names used in filter() (and what happened to Post?).
from what I can see, the alias names are based on where the join is
*going*, not where it came from so its not clear to me how this
example would be accomplished.

anyway, the filter on product/store and product/type example you just
mention presents a third scenario, which I wouldnt consider to fall
under the filter_by([attributes]) functionality, it would fall under
the regular join() call which creates joins *without* aliasing. but,
its not supported by the current attribute-joining functionality,
because it asks to generate two joins from a common joinpoint that is
*not* the root. the two ideas with regards to join() are the current
way (join() moves the joinpoint which then never recedes) and the
newer way (join() starts from the beginning each time).

I just tried out an example of "rejoining from the middle" with the
0.4 code, with a mapping of User->Order->Keywords, Items

create_session().query(User).join(['orders', 'items']).filter_by
(id=5).join(['orders','keywords']).filter_by(id=7).list()

and I got this:

SELECT users.id AS users_id \nFROM users JOIN orders ON users.id =
orders.user_id JOIN items ON orders.id = items.order_id JOIN orders
ON users.id = orders.user_id JOIN keywords ON orders.id =
keywords.order_id \nWHERE ((items.id = ?) AND (keywords.id = ?))
ORDER BY users.oid

which is wrong, it joined on 'orders' twice (throws SQL error). but
it seems like if join() were made a little smarter to see that
'orders' was already in there, which is pretty much required here
since throwing an error is the least desireable option, youd get:

SELECT users.id AS users_id \nFROM users JOIN orders ON users.id =
orders.user_id JOIN items ON orders.id = items.order_id JOIN keywords
ON orders.id = keywords.order_id \nWHERE ((items.id = ?) AND
(keywords.id = ?)) ORDER BY users.oid

where you can see the JOIN keywords is joining against the previous
'orders' table. I think that would work here and would be the
approach for that scenario.

svilen

unread,
Jun 4, 2007, 11:18:21 AM6/4/07
to sqlal...@googlegroups.com
On Monday 04 June 2007 18:03:05 Rick Morrison wrote:
> > Well, I don't really see the connection between .scalar() and
> > .list(), other than they are kind of antonyms, but FWIW I always
> > thought that .list() was an odd duck anyway, seeing as how the
> > arguably more Pythonic list(query) works fine.
>
> (expanding on this a bit)
>
> More generally, it's the iterative nature of a query object that
> makes a lot of the .first() and so on more semantic sugar than
> anything else anyway.
exactly. "All()" is much better than "list()", in the means of list
being just the current way/implementation of using ordered sequences.
i guess it could return another container as well (like relation()).

even if all these are gone:


for row in query:
process(row)

break

will give u .first()

and

for row in query:
process(row)

break
else: raise someError

will give u .one()

svil

Neil Blakey-Milner

unread,
Jun 4, 2007, 12:09:10 PM6/4/07
to sqlal...@googlegroups.com
On 6/4/07, Michael Bayer <mik...@zzzcomputing.com> wrote:

Great! That's basically what I was trying to say. Sorry that I can't
convey it in a more useful form.

What do you think of me putting up the full pathological-case code
from Django ORM and SQLAlchemy on the SQLAlchemy wiki, and maybe see
if there are improvements to be found for that as well (since it uses
ILIKE and not equality and a bunch of AND and OR statements, the new
filter_by improvements can't be used for it). Just need to whittle it
down to the bare minimum in terms of tables and code first.

A bunch of real-life queries and the best code to achieve them would
be invaluable on the wiki. Then, when changes to the API are
discussed, the examples can serve as a baseline for the types of
things that are being made easier or harder.

Michael Bayer

unread,
Jun 4, 2007, 12:36:17 PM6/4/07
to sqlal...@googlegroups.com

On Jun 4, 2007, at 12:09 PM, Neil Blakey-Milner wrote:

>
> What do you think of me putting up the full pathological-case code
> from Django ORM and SQLAlchemy on the SQLAlchemy wiki, and maybe see
> if there are improvements to be found for that as well (since it uses
> ILIKE and not equality and a bunch of AND and OR statements, the new
> filter_by improvements can't be used for it).

hmmmm...i dont like ILIKE (which isnt even present i every database)
by default. i think its best special operators just remain as
clauseelements, i.e.

filter(Class.c.attr.like('foo'))

on the subject of case sensitivity, there is an aging but ambitious
ticket to address it in a generic way, #487.

> A bunch of real-life queries and the best code to achieve them would
> be invaluable on the wiki. Then, when changes to the API are
> discussed, the examples can serve as a baseline for the types of
> things that are being made easier or harder.

yes that would be awesome. let me know what i can do to help.


Mike Orr

unread,
Jun 4, 2007, 3:11:15 PM6/4/07
to sqlal...@googlegroups.com
On 6/4/07, Rick Morrison <rickmo...@gmail.com> wrote:
> The use of scalar() here seems out of place with both the common CS usage of
> the word (e.g. scalar == single-valued), and the use of scalar() in the SQL
> layer. Single row results in the ORM are rows, not
> a single datatype. It's another potential point of
> confusion, like the ORM .select() is/was.
>
> I would say drop scalar() in the ORM namespace, and for
> single-row results, use
>
> .first() -- returns first row
> .one() -- returns first row, raise exception if more than one result

What if there are zero rows? Return None or raise an exception? I
find the former useful enough, but I imagine some people prefer the
latter.

--
Mike Orr <slugg...@gmail.com>

Rick Morrison

unread,
Jun 4, 2007, 3:33:08 PM6/4/07
to sqlal...@googlegroups.com
That's one of the reasons I prefer to treat Query() like an iterator and roll my own first-order functions like first() & etc. -- see my other message regarding this.

Michael Bayer

unread,
Jun 4, 2007, 3:36:47 PM6/4/07
to sqlalchemy

On Jun 4, 3:11 pm, "Mike Orr" <sluggos...@gmail.com> wrote:
>
> > .first() -- returns first row
> > .one() -- returns first row, raise exception if more than one result
>
> What if there are zero rows? Return None or raise an exception? I
> find the former useful enough, but I imagine some people prefer the
> latter.

one() raises an exception if not exactly one result.

Gaetan de Menten

unread,
Jun 5, 2007, 3:41:01 AM6/5/07
to sqlal...@googlegroups.com
On 6/4/07, Michael Bayer <mik...@zzzcomputing.com> wrote:
>
>

I know, that's why I suggested to scrap the one without the check
(named first() now). If people like it, that's ok with me, it's just
that when we'll deprecate the non-generative aggregate methods (which
is something I think should be done), the code for the "first" method
will boil down to "return self[0]"...

Jonathan Ellis

unread,
Jun 12, 2007, 11:37:46 AM6/12/07
to sqlal...@googlegroups.com
Cleaning out my inbox...

FWIW I'm +1 on removing the old-style methods, +1 on .first instead of
.scalar, +1 on adding .one, and +0 on renaming .list to .all.

Did you make a decision for 0.4 Mike?

-J

Michael Bayer

unread,
Jun 12, 2007, 12:33:10 PM6/12/07
to sqlalchemy
we have, in fact, made a tip of the hat to SAT analogy questions
("selecting everything is to all() as selecting just the first row is
to: a. scalar() b. first() c. list()[0]").

everyone on this thread go check out

http://svn.sqlalchemy.org/sqlalchemy/branches/rel_0_4

and give it a whirl.

code to look at:

examples:
test/orm/query.py

all of the old deprecated select etc. methods expressed in terms of
the new functionality, at the bottom of:

sqlalchemy/orm/query.py

the next big 0.4 step is to add the generative functionality to
select() + refactor select() to be more malleable. then i think the
interface will be nailed down and we can start thinking about moving
over to it. theres currently 29 open tickets on the 0.4 milestone,
but i think a lot of those are going to get bumped to point releases
underneath 0.4 or later. id like to try case-insensitive operations,
maybe two-phase commit hooks, and we have some collection enhancements
coming in. other than those im pretty antsy to get 0.4 to be
standardized because i really want to rip down those 0.3 datamapping
docs from the site (i.e. as the default "documentation"), start
getting all the tutorials updated, etc.

Gaetan de Menten

unread,
Jun 12, 2007, 12:45:05 PM6/12/07
to sqlal...@googlegroups.com
On 6/12/07, Michael Bayer <zzz...@gmail.com> wrote:
>
> we have, in fact, made a tip of the hat to SAT analogy questions
> ("selecting everything is to all() as selecting just the first row is
> to: a. scalar() b. first() c. list()[0]").

I've already said it earlier but since you didn't comment on that,
maybe you didn't see my remark: shouldn't the non-generative aggregate
methods be deprecated? I feek it would be much more coherent this way.

Jonathan Ellis

unread,
Jun 12, 2007, 1:51:52 PM6/12/07
to sqlal...@googlegroups.com
On 6/12/07, Michael Bayer <zzz...@gmail.com> wrote:
> the next big 0.4 step is to add the generative functionality to
> select() + refactor select() to be more malleable. then i think the
> interface will be nailed down and we can start thinking about moving
> over to it. theres currently 29 open tickets on the 0.4 milestone,
> but i think a lot of those are going to get bumped to point releases
> underneath 0.4 or later. id like to try case-insensitive operations,
> maybe two-phase commit hooks, and we have some collection enhancements
> coming in. other than those im pretty antsy to get 0.4 to be
> standardized because i really want to rip down those 0.3 datamapping
> docs from the site (i.e. as the default "documentation"), start
> getting all the tutorials updated, etc.

You could just make 0.4 a "py3k" kind of thing and just rip out the
cruft, then add features later.

(At least that was how py3k was originally described, it seems to be
suffering from feature creep atm. :)

Michael Bayer

unread,
Jun 12, 2007, 2:26:34 PM6/12/07
to sqlalchemy

On Jun 12, 12:45 pm, "Gaetan de Menten" <gdemen...@gmail.com> wrote:
>
> I've already said it earlier but since you didn't comment on that,
> maybe you didn't see my remark: shouldn't the non-generative aggregate
> methods be deprecated? I feek it would be much more coherent this way.> --

yes id be fine with that.

Reply all
Reply to author
Forward
0 new messages