Differentiate ANSIIdentifierPreparer.format_column select clause from where clause??

3 views
Skip to first unread message

Troy

unread,
Feb 16, 2007, 3:18:11 AM2/16/07
to sqlalchemy
Is there a way in the format_column method of ANSIIdentifierPreparer
to determine if the column is part of the select clause, where clause,
order clause, etc?

What I'm attempting to do is override the default postgres dialect to
format string columns as 'lower(colname)' only when the column is not
being used as part of the select clause.

Also, if interested, I can submit a patch to url.py that imports the
dialect from the full module path if the dialect is not found. This
would allow someone to override a dialect, or create a proprietary new
one and not have to worry about the dialect residing in alchemy's
databases module, i.e. :

pg_db = create_engine('myapp.database.postgres://scott:tiger@localhost:
5432/mydatabase')

Thanks,

Troy

Michael Bayer

unread,
Feb 16, 2007, 4:43:11 PM2/16/07
to sqlal...@googlegroups.com

On Feb 16, 2007, at 3:18 AM, Troy wrote:

>
> Is there a way in the format_column method of ANSIIdentifierPreparer
> to determine if the column is part of the select clause, where clause,
> order clause, etc?
>
> What I'm attempting to do is override the default postgres dialect to
> format string columns as 'lower(colname)' only when the column is not
> being used as part of the select clause.

i suppose this is not terribly hard since it would just mean some
extra argument passed along. of course the question is *why* would
you need this. if its some kind of case-insensitive resultproxy
issue, id rather do something on that end.

> Also, if interested, I can submit a patch to url.py that imports the
> dialect from the full module path if the dialect is not found. This
> would allow someone to override a dialect, or create a proprietary new
> one and not have to worry about the dialect residing in alchemy's
> databases module, i.e. :
>
> pg_db = create_engine('myapp.database.postgres://
> scott:tiger@localhost:
> 5432/mydatabase')

so the rule would be, look for an SA dialect first, else try to
import module ? would we want to do this in a more indirect way,
i.e. "register_dialect("postgres", mydialect)" so that one can
install custom dialects across the board for a given type ?


Troy

unread,
Feb 18, 2007, 6:37:10 PM2/18/07
to sqlalchemy
> i suppose this is not terribly hard since it would just mean some
> extra argument passed along. of course the question is *why* would
> you need this. if its some kind of case-insensitive resultproxy
> issue, id rather do something on that end.

It's not about case in the resultproxy, it's about case-insensitive
for server-side compare, such as in the where clause and when ordering
results.

> so the rule would be, look for an SA dialect first, else try to
> import module ? would we want to do this in a more indirect way,
> i.e. "register_dialect("postgres", mydialect)" so that one can
> install custom dialects across the board for a given type ?

I like the convention, but it could cause some havoc in same-process/
multi-app environments such as CherryPy and a twisted/python version
of Mongrel (the http server gaining popularity as the defacto Rails
deployment method) I'm getting ready to release.

Thanks,

Troy

P.S. This is my second response, I'm using the google groups web UI
and it appears that my replies are not being posted (I checked a few
other threads that I have replied to here, and at the Mako group).
Not sure why, it says from me to "sqlalchemy", but I apologize if your
getting this as an e-mail instead of a post to the group.

Michael Bayer

unread,
Feb 18, 2007, 7:08:39 PM2/18/07
to sqlal...@googlegroups.com

On Feb 18, 2007, at 6:37 PM, Troy wrote:

>
>> i suppose this is not terribly hard since it would just mean some
>> extra argument passed along. of course the question is *why* would
>> you need this. if its some kind of case-insensitive resultproxy
>> issue, id rather do something on that end.
>
> It's not about case in the resultproxy, it's about case-insensitive
> for server-side compare, such as in the where clause and when ordering
> results.
>

comparisons....like literal text injected into the SQL? why not use
bind parameters ? there are cases where literal text should not be
quoted, i.e. if you hardcoded "where x = 'SomeString'"..but its
possible that the quoting is too eager right now due to some recent
issues. in any case need an example what you mean.

>> so the rule would be, look for an SA dialect first, else try to
>> import module ? would we want to do this in a more indirect way,
>> i.e. "register_dialect("postgres", mydialect)" so that one can
>> install custom dialects across the board for a given type ?
>
> I like the convention, but it could cause some havoc in same-process/
> multi-app environments such as CherryPy and a twisted/python version
> of Mongrel (the http server gaining popularity as the defacto Rails
> deployment method) I'm getting ready to release.
>

ok

> P.S. This is my second response, I'm using the google groups web UI
> and it appears that my replies are not being posted (I checked a few
> other threads that I have replied to here, and at the Mako group).
> Not sure why, it says from me to "sqlalchemy", but I apologize if your
> getting this as an e-mail instead of a post to the group.
>

google has a spam filter but theres no messages in it.

Troy

unread,
Feb 19, 2007, 4:52:53 AM2/19/07
to sqlalchemy
> > It's not about case in the resultproxy, it's about case-insensitive
> > for server-side compare, such as in the where clause and when ordering
> > results.
>
> comparisons....like literal text injected into the SQL? why not use
> bind parameters ? there are cases where literal text should not be
> quoted, i.e. if you hardcoded "where x = 'SomeString'"..but its
> possible that the quoting is too eager right now due to some recent
> issues. in any case need an example what you mean.

It's not just the literal text or the bind param. It is how the
server compares character based data. In MySQL, MS-SQL, Sybase --
case does not matter. In Postgres, Oracle and DB2 it does. DB2 and
Oracle (since version 10 I think) have some server-side settings to
help, but Postgres does not.

Assuming I have a table named "people":

fname lname
========= ============
Troy Kruthoff
albert einstein

and the query: "select fname from people order by fname asc"

in MySQL:
-> albert
-> Troy

in Postgres:
-> Troy
-> albert

and the query: "select fname from people where fname='troy'"

in MySQL:
-> Troy

in Postgres:
-> [no records found/returned because "Troy"!="troy"]

So, we need to tell postgres: "select fname from people order by
lower(fname) asc" and "select fname from people where
lower(fname)='troy'"

Notice I am not needing to lower (or upper) the column as part of the
select list, because I want the data to return to the app as it exists
in the db server.

Does this help? Maybe I can buy you a beer at PyCon and we can talk
it through?

Thanks,

Troy

Michael Bayer

unread,
Feb 19, 2007, 12:55:46 PM2/19/07
to sqlal...@googlegroups.com

On Feb 19, 2007, at 4:52 AM, Troy wrote:
> It's not just the literal text or the bind param. It is how the
> server compares character based data. In MySQL, MS-SQL, Sybase --
> case does not matter. In Postgres, Oracle and DB2 it does. DB2 and
> Oracle (since version 10 I think) have some server-side settings to
> help, but Postgres does not.
>

OK totally different issue. MySQL does have a case-sensitivity
setting using COLLATE. not sure what MS-SQL has and we dont yet have
sybase support. we've had people report the whole lower() issue in
the past and im not sure that should be "automatic" within SA. You
can just explicitly say lower() across the board for case-insensitive
comparisons. if SA implemented some operator for this (which could
be doable, such as table.c.compare_insensitive(foo)), it would still
have to use lower() for every dialect since in particular for MySQL
we have no idea what the COLLATE rules are on the given table.


Troy

unread,
Feb 19, 2007, 2:32:02 PM2/19/07
to sqlalchemy
> OK totally different issue. MySQL does have a case-sensitivity
> setting using COLLATE. not sure what MS-SQL has and we dont yet have
> sybase support. we've had people report the whole lower() issue in
> the past and im not sure that should be "automatic" within SA. You
> can just explicitly say lower() across the board for case-insensitive
> comparisons. if SA implemented some operator for this (which could
> be doable, such as table.c.compare_insensitive(foo)), it would still
> have to use lower() for every dialect since in particular for MySQL
> we have no idea what the COLLATE rules are on the given table.

Even though MySQL allows the setting of COLLATE, it does not support
functional indexes, so if your code explicitly calls lower you
technically now have code that will work for both MySQL and Postgres,
but MySQL is going to take a big performance hit and perform table
scans regardless of any created indexes. Realistically, I can't see
anyone setting MySQL collate rules to case-sensitive without the
support of functional indexes, which MySQL does not have.

If Postgres allowed case-insensitive collation all would be good. But
it does not, and I've never seen anyone make MySQL case-sensitive and
deal with the headaches (nor should they, I think case-sensitivity in
a database server is a throw-back), for this reason I think it should
somehow be approachable from the dialect to deal with how different db
servers handle collation support.

In simple terms, if the Postgres dialect supported
compare_insensitive=True|False|[upper|lower]? then apps written in
sqlalchemy easily support Postgres and MySQL with the same code in out-
of-the-box configurations. But even if it didn't, if there was a way
to override the default postgres dialect I'd be a happy camper.
Infact, that is exactly what I have done. I added a dialect called
"lowergres", but I'm stuck because I can not seem to find the
appropriate hook to alter the column to a func.lower when the column
is not part of the selected column list. format_column looks to me
like the right place to do it without converting a column to a
function, but the column object in that function has no context as to
where it lies in the sql statement.

I'm curious as to others experiences with writing an app that supports
both Postgres and MySQL with sqlalchemy, because if someone else is
doing this then I must be missing something, or maybe not. At first,
our unit tests all passed, then when we added real world data with
mixed case, tests started to fail on everything doing sorts and
where's on character data.

How about a Pepsi (at PyCon)?

Troy

Michael Bayer

unread,
Feb 19, 2007, 3:17:38 PM2/19/07
to sqlalchemy

On Feb 19, 2:32 pm, "Troy" <tkruth...@gmail.com> wrote:
> Even though MySQL allows the setting of COLLATE, it does not support
> functional indexes, so if your code explicitly calls lower you
> technically now have code that will work for both MySQL and Postgres,
> but MySQL is going to take a big performance hit and perform table
> scans regardless of any created indexes. Realistically, I can't see
> anyone setting MySQL collate rules to case-sensitive without the
> support of functional indexes, which MySQL does not have.

right...so some explicitness is required, somewhere (since SA cant
just put lower() across the board).

>
> In simple terms, if the Postgres dialect supported
> compare_insensitive=True|False|[upper|lower]?

on a Column-by-Column or even per-operation context is probably more
useful.

> sqlalchemy easily support Postgres and MySQL with the same code in out-
> of-the-box configurations. But even if it didn't, if there was a way
> to override the default postgres dialect I'd be a happy camper.
> Infact, that is exactly what I have done. I added a dialect called
> "lowergres", but I'm stuck because I can not seem to find the
> appropriate hook to alter the column to a func.lower when the column
> is not part of the selected column list.
format_column looks to me
> like the right place to do it without converting a column to a
> function, but the column object in that function has no context as to
> where it lies in the sql statement.

for this approach, id advise setting "state" on the compiler when you
are processing the column clause of a SELECT (or dealing with insert/
update column lists too), vs. when you are processing WHERE, ORDER BY,
etc. that flag can be used to determine where youre traversing. the
compiler knows the context since its the one generating the statement.

a more accurate way to do is to wrap the Column itself, as it goes
into a comparison operation or ORDER BY, in a new construct such as
"CaseInsensitive"...heres some pseudo-ish code based on the __eq__()
method you see in sqlalchemy/sql.py _CompareMixin (assume "self" is a
Column):

def __eq__(self, other):
return _BooleanExpression(CaseInsensitive(self), other, '==')

Better yet its configurable on Column:

def __eq__(self, other):
if self.case_insensitive_compare:
return _BooleanExpression(CaseInsensitive(self), other, '==')
else:
return _BooleanExpression(self, other, '==')

CaseInsensitive looks a lot like _Function and is just:

class CaseInsensitive(_CalculatedClause):
def __init__(self, target):
self.target = target
def accept_visitor(self, visitor):
self.target.accept_visitor(visitor)
visitor.visit_case_insensitive(self)

ansicompiler provides the string representation of the underlying
target with no modification:

def visit_case_insensitive(self, object):
self.strings[object] = self.strings[object.target]

postgres compiler with case_insensitive provides it as:

def visit_case_insensitive(self, object):
self.strings[object] = "lower(%s)" % self.strings[object.target]

other dialects can have whatever flags to turn on/off the "lower()"
wrapping as well.

what we're really talking about here is a func.lower() that has
special meaning to the compiler, i.e. that it should be conditionally
applied based on dialect flags. i think the flag on Column to have
the wrapper applied might be pretty slick.

>
> I'm curious as to others experiences with writing an app that supports
> both Postgres and MySQL with sqlalchemy, because if someone else is
> doing this then I must be missing something, or maybe not. At first,
> our unit tests all passed, then when we added real world data with
> mixed case, tests started to fail on everything doing sorts and
> where's on character data.

yeah i dont think anyone has gotten too far with this issue, also ive
no idea what the giant universe of Hibernate users do either (i think
there just arent db-platform-neutral J2EE apps).

>
> How about a Pepsi (at PyCon)?

sure thing !

Troy

unread,
Feb 21, 2007, 10:25:44 AM2/21/07
to sqlalchemy
Mike,

Based on your excellent feedback and trying a couple different
approaches I came up with the following. I have a diff file, but I'm
new to contributing to open source so I'm not sure the best method to
get this to you (so I'll copy paste below). The overview:

* added visit_case_insensitive and overrode in PGCompiler to use
"lower"
* sub-classed ClauseList with _OrderByClause and _GroupByClause.

This was done to add some context as to where in the statement the
clauses belong, because we do not want to mess with the case of the
results being returned, only when the character data is compared at
the server. I need to do some more work to peg HAVING, but this works
well otherwise.

* added _CaseInsensitiveCompare like you mentioned.
* modified _compare_self instead of __eq__, and added case_compare
param to the String type

I think this is better then added case_compare at the column level,
because this only affects character data. Dates, Int's, etc do not
have the same collation issues and the issues with Date that do exist
are already addressed.

Let me know if you need me to run in a different direction with this,
or what I need to do differently to get this considered to be included
in sqlalchemy.

Troy


====== Diff File ======

Index: sqlalchemy/sql.py
===================================================================
--- sqlalchemy/sql.py (revision 2346)
+++ sqlalchemy/sql.py (working copy)
@@ -351,6 +351,7 @@
def visit_clauselist(self, list):pass
def visit_calculatedclause(self, calcclause):pass
def visit_function(self, func):pass
+ def visit_case_insensitive(self, func):pass
def visit_cast(self, cast):pass
def visit_label(self, label):pass
def visit_typeclause(self, typeclause):pass
@@ -947,6 +948,17 @@
else:
return False

+class _OrderByClause(ClauseList):
+ def append(self, clause):
+ if _is_literal(clause):
+ clause = _TextClause(str(clause))
+ elif isinstance(clause,ColumnElement) and not
getattr(clause.type,'compare_case',True):
+ clause = _CaseInsensitiveCompare(clause)
+ self.clauses.append(clause)
+
+class _GroupByClause(_OrderByClause):
+ pass
+
class _CompoundClause(ClauseList):
"""represents a list of clauses joined by an operator, such as
AND or OR.
extends ClauseList to add the operator as well as a from_objects
accessor to
@@ -1037,6 +1049,16 @@
c.accept_visitor(visitor)
visitor.visit_function(self)

+class _CaseInsensitiveCompare(ColumnElement):
+ def __init__(self, clause):
+ self.clause = clause
+ self.type = clause.type
+ def accept_visitor(self, visitor):
+ self.clause.accept_visitor(visitor)
+ visitor.visit_case_insensitive(self)
+ def _get_from_objects(self):
+ return self.clause._get_from_objects()
+
class _Cast(ColumnElement):
def __init__(self, clause, totype, **kwargs):
if not hasattr(clause, 'label'):
@@ -1364,18 +1386,18 @@
return True
def order_by(self, *clauses):
if len(clauses) == 1 and clauses[0] is None:
- self.order_by_clause = ClauseList()
+ self.order_by_clause = _OrderByClause()
elif getattr(self, 'order_by_clause', None):
- self.order_by_clause =
ClauseList(*(list(self.order_by_clause.clauses) + list(clauses)))
+ self.order_by_clause =
_OrderByClause(*(list(self.order_by_clause.clauses) + list(clauses)))
else:
- self.order_by_clause = ClauseList(*clauses)
+ self.order_by_clause = _OrderByClause(*clauses)
def group_by(self, *clauses):
if len(clauses) == 1 and clauses[0] is None:
- self.group_by_clause = ClauseList()
+ self.group_by_clause = _GroupByClause()
elif getattr(self, 'group_by_clause', None):
- self.group_by_clause = ClauseList(*(list(clauses)
+list(self.group_by_clause.clauses)))
+ self.group_by_clause = _GroupByClause(*(list(clauses)
+list(self.group_by_clause.clauses)))
else:
- self.group_by_clause = ClauseList(*clauses)
+ self.group_by_clause = _GroupByClause(*clauses)
def select(self, whereclauses = None, **params):
return select([self], whereclauses, **params)
def _get_from_objects(self):
Index: sqlalchemy/databases/postgres.py
===================================================================
--- sqlalchemy/databases/postgres.py (revision 2346)
+++ sqlalchemy/databases/postgres.py (working copy)
@@ -463,7 +463,7 @@

table.append_constraint(ForeignKeyConstraint(constrained_columns,
refspec, conname))

class PGCompiler(ansisql.ANSICompiler):
-
+
def visit_insert_column(self, column, parameters):
# all column primary key inserts must be explicitly present
if column.primary_key:
@@ -502,7 +502,11 @@
if isinstance(binary.type, sqltypes.String) and
binary.operator == '+':
return '||'
else:
- return ansisql.ANSICompiler.binary_operator_string(self,
binary)
+ return ansisql.ANSICompiler.binary_operator_string(self,
binary)
+
+ def visit_case_insensitive(self, object):
+ self.strings[object] = "lower(%s)" %
self.strings[object.clause]
+

class PGSchemaGenerator(ansisql.ANSISchemaGenerator):

Index: sqlalchemy/schema.py
===================================================================
--- sqlalchemy/schema.py (revision 2346)
+++ sqlalchemy/schema.py (working copy)
@@ -497,6 +497,12 @@
for constraint in self.constraints:
constraint.accept_schema_visitor(visitor,
traverse=True)
visitor.visit_column(self)
+
+ def _compare_self(self):
+ if isinstance(self.type,types.String) and not
self.type.compare_case:
+ return sql._CaseInsensitiveCompare(self)
+ else:
+ return self


class ForeignKey(SchemaItem):
Index: sqlalchemy/types.py
===================================================================
--- sqlalchemy/types.py (revision 2346)
+++ sqlalchemy/types.py (working copy)
@@ -169,11 +169,18 @@
return super(String, cls).__new__(cls, *args, **kwargs)
else:
return super(String, TEXT).__new__(TEXT, *args, **kwargs)
- def __init__(self, length = None):
+ def __init__(self, length = None, compare_case = False):
+ """Initializes new string instance
+
+ compare_case - determines if case affects comparisons,
sorting, etc
+ """
self.length = length
+ self.compare_case = compare_case
def adapt(self, impltype):
return impltype(length=self.length)
def convert_bind_param(self, value, dialect):
+ if not self.compare_case and hasattr(value,'lower'):
+ value = value.lower()
if not dialect.convert_unicode or value is None or not
isinstance(value, unicode):
return value
else:
@@ -200,7 +207,7 @@
return value.decode(dialect.encoding)
else:
return value
-
+
class Integer(TypeEngine):
"""integer datatype"""
def get_dbapi_type(self, dbapi):
Index: sqlalchemy/ansisql.py
===================================================================
--- sqlalchemy/ansisql.py (revision 2346)
+++ sqlalchemy/ansisql.py (working copy)
@@ -225,6 +225,10 @@
else:
self.strings[column] =
self.preparer.format_column_with_table(column)

+ def visit_case_insensitive(self, object):
+ #override this in databases that default to case insesitive
collation
+ self.strings[object] =
self.strings[object.clause]
+
def visit_fromclause(self, fromclause):
self.froms[fromclause] = fromclause.name

@@ -257,7 +261,7 @@
self.strings[compound] = "(" + s + ")"
else:
self.strings[compound] = s
-
+
def visit_clauselist(self, list):
if list.parens:
self.strings[list] = "(" + string.join([s for s in
[self.get_str(c) for c in list.clauses] if s is not None], ', ') + ")"

Michael Bayer

unread,
Feb 21, 2007, 8:24:27 PM2/21/07
to sqlal...@googlegroups.com
I put up ticket #487 for this. theres a whole batch of sql-related
tickets im going to try to line up for a marathon session at some
future date in the next few weeks.
Reply all
Reply to author
Forward
0 new messages