ticket 2392, use column keys in ResultProxy and as default column labels

32 views
Skip to first unread message

Michael Bayer

unread,
Feb 4, 2012, 5:08:38 PM2/4/12
to sqlalche...@googlegroups.com
I added a patch up today to #2392 and the plan is to put this in 0.8.

Behavioral change: the "key" you set on a Column is now available as the name in a result row, takes precedence over the "name", and is automatically labeled on the column if it differs from the name. This is to make the behavior of "key" consistent with how it behaves on table.c, as well as how an ORM result tuple behaves when you say query(Class.a, Class.b). In particular it maintains compatibility between ORM tuples and resultproxy rows when alternate column keys are in use, which is the biggest win here.

To make this work completely, there's some significant behavioral changes.

Given a table:

keyed = Table("keyed", metadata,
Column("a", Integer, key="p"),
Column("b", Integer, key="q")
)

Currently, select([keyed]) gives you this:

SELECT keyed.a, keyed.b FROM keyed

with the patch, you get this:

SELECT keyed.a AS p, keyed.b AS q

this so that "p", "q" are the "primary" labels for those columns in the result. Exposing the "key" within the SQL statement is something we've not done before, and I discuss some reservations to this below.

Both "key" and "name" are still available in a regular result set. This because we'd otherwise have a major backwards incompatibility:

# these are all accessible
row.a
row.p
row.b
row.q

The "primary" label changing from "a, b" to "p, q" affects what conditions cause a name collision - and this is backwards incompatible.

Above, if we are given another table "keyed2":

keyed2 = Table("keyed2", metadata,
Column("b", Integer),
Column("q", Integer)
)

then the statement:

select([keyed, keyed2])

Currently, you can call upon row.q without issue, but you'll get an ambiguous name error if you call "b" from the result set:

row.b -> ambiguous name, both keyed.b and keyed2.b exist
row.q -> returns keyed2.q

With the patch, the behavior is entirely reversed. row.b now refers to keyed2.b, row.q is now ambiguous:

row.b -> returns keyed2.b
row.q -> ambiguous name, both keyed.q and keyed2.q exist

That's a pretty major switch ! But I think it makes more sense. Do people agree ?

The labeling change, very significantly, impacts subquery name targeting. Given a SELECT construct:

s = select([keyed])

Both with and without the patch, the name in the namespace is "q", not "a":

s.c.a -> AttributeError
s.c.p -> OK

but the "name" of the column is now "q", and not "a":

>>> # in 0.7
>>> s.c.p.name
'a'

>>> # with the patch
>>> s.c.p.name
'p'

Note that this means s.c.p has key="p" and name="p", whereas keyed.c.p has key="p" and name="a". This because those names represent the "exported" names from the subquery:

>>> print s.select()
SELECT p, q FROM (SELECT keyed.a AS p, keyed.b AS q FROM keyed)

Above, we can see the subquery exports the names "p", and "q" only. So it's natural that the columns on the outside are fully named "p" and "q", the "a" and "b" names are gone.

So that above is another somewhat major change, though I don't know that real world code would normally be affected by it.

But a SELECT with apply_labels(), this change would definitely break on existing code. A select construct as:

s = select([keyed]).apply_labels()

Without the patch, we get to "a" via the name "keyed_a":

s.c.keyed_a

With the patch, now it's "keyed_p" - "keyed_a" is gone:

s.c.keyed_p

That's a really big change, totally not backwards compatible. Then again, I don't think the usage of a select() with apply_labels() being used in a subquery context like that is super common, and I think again that the "key" is consistently used between a select() without apply_labels() and with is more consistent- that is, s.c.p/s.c.keyed_p, versus s.c.p/s.c.keyed_a.

Just using the "key" as a label itself is a big deal, as a SELECT that normally doesn't label as anything now is rendering labels. It's also exposing the "key" name, which to date has never been exposed at the SQL level, to the database as a label. In the vast majority of cases I think this is fine, unless they are on a platform with broken unicode support and they're doing PEP-3131 style non-ascii identifiers for the "key". Thoughts on this particular contingency would be appreciated. I'd consider flattening out the identifiers but reading pep3131 I see NFKC is the normalization they're using for comparisons. We already target columns based on lowercasing the name so perhaps further flattening by normalizing out non-ASCII characters altogether wouldn't raise any issues...though again would be backwards incompatible for someone using unicode identifiers with differentiations at the non-ASCII level right now (seems like the narrowest of all edge cases to me though...).

The promotion of "column.key" to be used as the label seems necessary to me, as we use these names to access a column unambiguously in a result set (see the example below). If we remained using column.name and tried to "guess" how to get at the .key, then we can't get clean behavior for name collisions as illustrated below. Unless, we totally switched to using positions to target columns in a result set, which is ticket #918. #918 would be a very major change to how statements are handled, would have a whole new set of unhandled edge cases and affect everyone, so I'd rather not get into it. It always seems like a great idea for 30 seconds until I start thinking about it again.

As it stands, applications that don't make use of "key" are not at all affected by this change. ORM-only applications that do use "key" are also not affected. Even ORM queries that query across multiple tables with name conflicts and keys aren't affected since ORM queries always qualify against table names. It's applications that use select() constructs directly, in conjunction with alternate "key" names, where conflicts might be present, or where apply_labels() is used and the resulting table-qualified names are accessed directly from select.c.

Thoughts ?

Also I'm considering if 0.8 might not be called 1.0, depending on how things go - or perhaps it would make a quick jump from an initial 0.8 series to a 1.0.


Waldemar Osuch

unread,
Feb 5, 2012, 3:16:36 PM2/5/12
to sqlalche...@googlegroups.com
+1 on changing the key <-> label behaviour.

As mainly ORM user I was surprised to discover that "key" is not directly accessible in result proxy.
One less thing to explain to my co-workers :-)

Waldemar

Michael Bayer

unread,
Feb 5, 2012, 5:06:20 PM2/5/12
to sqlalche...@googlegroups.com
I've split the ticket into #2392 and #2397.      #2392 is the part that adds .key to the list of possible names you can use in the result set, without making them precedent, and is resolved for 0.7.  It does so in such a way that the "key" will never collide with the primary name for the column, so is backwards compatible.     It's very likely that with this change, nobody will ever notice the issue again since real name collisions are very rare.   Most of the tests that apply to the change are in place with @fails marked for those which require the precedence change.

#2397 is for 0.8, if at all, and specifies the switch in precedence between .key and .name, including that the "key" is used in the column labeling and for the <tablename>_<colname> attribute off of a select().    Fixes I made today in #2396 should provide that we can drop in a label mangler in a way that is transparent, so in theory we could convert unicode-character strings to be escaped or hashed if we want to keep labels in SQL as ASCII.


--
You received this message because you are subscribed to the Google Groups "sqlalchemy-devel" group.
To view this discussion on the web visit https://groups.google.com/d/msg/sqlalchemy-devel/-/jaLlaTay2iAJ.
To post to this group, send email to sqlalche...@googlegroups.com.
To unsubscribe from this group, send email to sqlalchemy-dev...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/sqlalchemy-devel?hl=en.

Reply all
Reply to author
Forward
0 new messages