use "USING ( join_column [, ...] )" instead of "ON join_condition"

44 views
Skip to first unread message

Manlio Perillo

unread,
Nov 7, 2012, 6:18:05 PM11/7/12
to sqlal...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi.

When selecting against joined tables, SQLAlchemy always use the
"ON join_condition" form.

The problem with this is that when the joined columns have the same
name, they are included 2 times in the result set.


Example (PostgreSQL):

CREATE TABLE foo (
id TEXT PRIMARY KEY,
x INTEGER
);

CREATE TABLE bar (
a INTEGER PRIMARY KEY,
id TEXT REFERENCES (foo.id) NOT NULL
);

INSERT INTO foo VALUES ('ID', 777);
INSERT INTO bar VALUES (0, 'ID');

SELECT * FROM foo JOIN bar ON foo.id = bar.id;
id | x | a | id
- ----+-----+---+----
ID | 777 | 0 | ID


If we use "USING ( join_column [, ...] )", instead, only one of each
pair of equivalent columns will be included in the join output, not both.


SELECT * FROM foo JOIN bar USING (id);
id | x | a
- ----+-----+---
ID | 777 | 0


Is it possible to change SQLAlchemy to use the second form when the
joined columns have the same name?

In alternative, this can be requested explicitly, as example:

class Join(FromClause):
__visit_name__ = 'join'

def __init__(self, left, right, onclause=None, using=None,
isouter=False):
...

When specified ``using`' take precedence over `onclause`.



Thanks Manlio Perillo
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAlCa7CwACgkQscQJ24LbaUQeyQCeIqMJyemQWfe+OKoMU0wV7Z+y
+0oAn3dVqCIA9QeEtysbBHMTMDp4CpS2
=ey+X
-----END PGP SIGNATURE-----

Michael Bayer

unread,
Nov 7, 2012, 6:32:31 PM11/7/12
to sqlal...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


On Nov 7, 2012, at 6:18 PM, Manlio Perillo wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi.
>
> When selecting against joined tables, SQLAlchemy always use the
> "ON join_condition" form.
>
> The problem with this is that when the joined columns have the same
> name, they are included 2 times in the result set.

Only if you select all of those columns. There's no need to resort to side effects of unusual SQL syntaxes here, if you're already using the expression language to generate joins, then you should be specifying those columns you care about selecting when you call select(). Instead of passing "*" to select(), pass the Table object you care about. The names of the columns should be the least important thing here - the expression language deals in Table and Column objects first and foremost.

> In alternative, this can be requested explicitly, as example:
>
> class Join(FromClause):
> __visit_name__ = 'join'
>
> def __init__(self, left, right, onclause=None, using=None,
> isouter=False):
> ...
>
> When specified ``using`' take precedence over `onclause`.

you can always build a subclass of Join and use @compiles to get at whatever you want, but I think the approach is a hacky way at getting at some other behavior through loose associations.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (Darwin)
Comment: GPGTools - http://gpgtools.org

iQEcBAEBAgAGBQJQmu+PAAoJEDMCOcHE2v7hkRMH/3bl3ML9azuF2dh7bN15YAFa
03yRaIRrNehDXlSOv/IFFAHhdkoS6CBMNO0c3ovpv+L3G8WoplN+6vfgzKAaZUo3
zIcpPYOOMYlDEp1A60E5HGx8n1W3NUq5X5ku7/nFMPqVJe8p+wgnimJE2A5WLjPy
MQq46Wf39SrFO8aivso7jZk4GjPLneFiM0CgFsobvIgB6NxYPshB9WH8qkPEY6DX
5soMqTKTYcFSLIsv4flnu2hTbOvjI39hDE42i/DrBuue2vt7w5TbUds2o4YCjhDG
TevUaGOtWWsQhsAVAr131UxbiHXChAIEYK5SomsLei+1GN5YMCP6pd0Sc7tqTBg=
=VHJY
-----END PGP SIGNATURE-----

Manlio Perillo

unread,
Nov 8, 2012, 6:58:28 AM11/8/12
to sqlal...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Il 08/11/2012 00:32, Michael Bayer ha scritto:
>
> On Nov 7, 2012, at 6:18 PM, Manlio Perillo wrote:
>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>
>> Hi.
>
>> When selecting against joined tables, SQLAlchemy always use the
>> "ON join_condition" form.
>
>> The problem with this is that when the joined columns have the same
>> name, they are included 2 times in the result set.
>
> Only if you select all of those columns.
> There's no need to resort to side effects of unusual SQL syntaxes
> here, if you're already using the expression language to generate
> joins, then you should be specifying those columns you care
> about selecting when you call select().

I care about selecting all columns, but without having duplicates.
And this is currently not possible, unless I use `use_labels` or manual
labels.

> Instead of passing "*" to select(), pass the Table object you care
> about. The names of the columns should be the least important
> thing here - the expression language deals in Table and Column
> objects first and foremost.
>

Maybe I'm missing something:
http://paste.org/56806

when executing the output is (SQLAlchemy 0.7.7):
2012-11-08 12:49:29,274 INFO sqlalchemy.engine.base.Engine SELECT
foo.id, foo.x, bar.a, bar.id
FROM foo JOIN bar ON foo.id = bar.id
2012-11-08 12:49:29,280 INFO sqlalchemy.engine.base.Engine ()
(u'ID', 777, 0, u'ID')
[u'id', u'x', u'a', u'id']
Traceback (most recent call last):
File "ambiguous.py", line 41, in <module>
print r[0]['ID']
sqlalchemy.exc.InvalidRequestError: Ambiguous column name 'ID' in result
set! try 'use_labels' option on select statement.

How can I avoid this exception?
use_labels is not an option, for me, and manually settings labels in
order to avoid ambiguos columns is a bit of nuisance (I'm actually doing
this).



Thanks Manlio
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAlCbnmQACgkQscQJ24LbaUQcWACfYezfKV3NXDnCFO4VXVBCXvwn
Tb4AmwdLa/0XtHxaSWDzl/YamInjKOHF
=mLiB
-----END PGP SIGNATURE-----

Michael Bayer

unread,
Nov 8, 2012, 9:59:46 AM11/8/12
to sqlal...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


On Nov 8, 2012, at 6:58 AM, Manlio Perillo wrote:

>
> Maybe I'm missing something:
> http://paste.org/56806

use the reduce_columns() convenience method:

query = foo.join(bar).select().reduce_columns()
r = conn.execute(query).fetchall()

this is a replacement for the older version of this, "fold_equivalents":

query = foo.join(bar).select(fold_equivalents=True)

reduce_columns() is a much improved version: http://docs.sqlalchemy.org/en/latest/core/expression_api.html#sqlalchemy.sql.expression.Select.reduce_columns

also, you need to not use UPPERCASE names when targeting result columns unless the real column name is not also uppercase - in 0.8 (which you'll need to get reduce_columns()), "case insensitivity" is now off by default:

print r[0]
print r[0].keys()
print r[0]['id']


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (Darwin)
Comment: GPGTools - http://gpgtools.org

iQEcBAEBAgAGBQJQm8jjAAoJEDMCOcHE2v7hC9kIAKcG8DeuOcCiMIZ53k+Ep/2i
b81Rqs+DrN01LkewGJR18vks7/+prgJxQ1EDqah26BOBp39BEJrexb90GCSU+Xni
vZdE1F1AOKjn/vBxz2uKuCOFhfaKRBi/mQlob4jE9M7CPb1rhHaTg3psKLtAjgPh
AWJU1vZOnBRkk6rPqkvh17nIg74iUPjY0EBBck1kwWsxxR1K3/XrFxGMQp13dHoz
sOJJF1NrxlXWJU7kzJ5dWUSHJTbrci3K4wPwEwWqJs/kcFq5E5tvkANsEe+KXS3j
RmDZD2qUnf8BcRjgL16x0zCLi5LU+YDpn6jvzeFivfo7js9Q7fDomIjD+STh61g=
=rV9z
-----END PGP SIGNATURE-----
Reply all
Reply to author
Forward
0 new messages