Quoting in tables clause

Russell Keith-Magee

unread,

Nov 29, 2005, 10:00:32 PM11/29/05

to Django users

Hi all,

When you use a "tables" kwarg in a database query method (get_list,
etc), you provide a list of strings which correspond to the database
names of the tables you want to have joined. You can then specify a
"where" clause to narrow the join.

When the provided table name is used in the generated SQL query, it is
quoted. This preserves case, etc, in the table name, and all is good
with the world..

However, this quoting behaviour does prevent some other tricks. If the
table name was unquoted, you could nominate a subselect query as a join
point, rather than just naming a table. To give an artificial example
using the Django tutorial database:

polls.get_list(
tables=[
"(SELECT * FROM polls_choices
WHERE polls_choices.votes > 10) AS popular_choices"
]
where=[
"polls_polls.id=popular_choices.poll_id"
]
)

As Django stands, the quoting process puts the entire subselect inside
quotes, which turns into a syntax error in SQL. If the quoting was
removed from 'tables' strings, this would be a valid SQL query
operating over an inner join with a subselect.

I can see why the quoting is beneficial, as it removes leakage of SQL
quoting syntax from Python code space. However, you already have to
nominate SQL names ('polls_choices', not just 'choices') in order to
use the tables and where clause, so there is already partial leakage.

1) Is the quoting behaviour a feature or a bug? (I suspect feature, but
I thought I'd check)

2) Is there room for a smarter quoting scheme? - e.g., don't quote a
table name if the first character is a '('

3) Is there room to add a distinct 'subselect' kwarg to queries? This
kwarg would be identical to 'tables', but wouldn't quote its contents

4) Is there some other way to construct a subselect join that I am
missing (including, ideally, a left outer join)?

Many thanks,
Russ Magee

hugo

unread,

Nov 30, 2005, 5:44:31 AM11/30/05

to Django users

>1) Is the quoting behaviour a feature or a bug? (I suspect feature, but
>I thought I'd check)

Yep, it is. The reason are different handlings of reserved words. Your
column names might conflict with stuff defined by the SQL database and
by quoting the names this conflict doesn't arise.

>2) Is there room for a smarter quoting scheme? - e.g., don't quote a
>table name if the first character is a '('

I think that's the way to go. We already have some quoting-prevention
for other situations like 'select' kwargs (where you put a select in as
a field). So the 'table' kwarg should just be handled the same way. The
'select' kwarg is just run through quote_only_if_word (a l ocal
function) - the function would have to be propagated up one scope and
could be used in those places where quote_word would break stuff
because it's argument isn't allways a field name. This is all happening
in django/core/meta/__init__.py

>3) Is there room to add a distinct 'subselect' kwarg to queries? This
>kwarg would be identical to 'tables', but wouldn't quote its contents

I don't think that would be needed, as 'table' and 'select' both
provide subselects in the right places.

>4) Is there some other way to construct a subselect join that I am
>missing (including, ideally, a left outer join)?

It might - depending on your problem - work with the 'select' kwarg.

bye, Georg

Russell Keith-Magee

unread,

Nov 30, 2005, 7:11:08 AM11/30/05

to Django users

Ah - thanks. I had a poke around in django/core/meta/__init__.py, but
managed to miss the definition of quote_only_if_word. Your reply was
just enough to point me in the right direction.

I've put together a patch that uses smart quoting for the tables
clause. It's submitted as ticket #967.

However, in looking at the code for function_get_sql_clause() and
working up the patch, something occurred to me. This quoting fix could
go a lot further.

The purpose of the function_get_sql_clause() is to turn a kwargs
dictionary into an SQL query. At the moment, the assumption is that all
the kwargs keys (select, tables, etc) are tied to lists or dictionaries
that contain strings. If the string is a table/column name, it is
quoted; otherwise, it has a space in it, it gets used verbatim where a
single table/column name would be valid - so if your string is a block
of SQL, you can get better results. The purpose of quote_only_if_word
is to protect these potentially useful SQL statements.

It occurred to me that you should be able to modify
quote_only_if_word() to use function_get_sql_clause() recursively. If
you provide a string to quote_only_if_word(), then insert it (either
quoted or verbatim). However, if you find a tuple, then use its values
as the values for opts and kwargs in a recursive call to
function_get_sql_clause(). The recursive call returns a string, which
can get inserted as required. This would allow you to build almost
arbitrarily complex nested SQL queries using Django DB syntax.

For example, the example for the 'select' keyword in the database model
documentation would no longer need the inner SQL clause - it would
become something like :

polls.get_list(
select={
'choice_count': {
db_table='choices',
db_columns=['COUNT(*)']
kwargs={
where=['poll_id=polls.id']
}
}
}
)

I'm not completely happy with the syntax for
db_table/db_columns/kwargs. I'm open to suggestions.