After some thought on what I need in a Python ORM (multiple primary keys, complex joins, case statements etc.), and after having built these libraries for other un-named languages, I decided to start at the bottom. What seems to plague many ORM systems is the syntactic confusion and string-manipulation required to build the SQL Statements. If you want to do a Left Outer Join, support nested functions, and a nested conditional clause, you'd be building more of a string library than an ORM. Let's not even mention syntactical differences between databases, data types, and such.
My solution is sqlstring. A single-purpose library: to create SQL statement objects. These objects (such as sqlstring.Select), represent complex SQL Statements, but as Python objects. The benefit is that you can, at run-time, "build" the statement pythonically, without getting bogged down in String Manipulation. The theory is that once in use, things that were complex (string magic) become simpler, and allow the program to worry about higher-level issues.
An Example:
>>> import sqlstring >>> model = sqlstring.TableFactory() >>> print model.person
SELECT person.* FROM [person] person
>>> person_smith = model.person \
** (model.person.first_name == "Smith")
>>> print person_smith
SELECT person.* FROM [person] person WHERE person.first_name = 'Smith'
< < model.address ** (\ model.address.party_id == model.person.party_id) SELECT party_id party_id, first_name first_name, address.* FROM [person] person LEFT OUTER JOIN [address] address ON address.party_id = person.party_id WHERE person.first_name = 'Smith'
Things can get much more interesting than this, including nested sub-selects (anywhere), conditional groups (and/or groups nested using & and |) and even CASE statements.
Some of this stuff has been around for a while (using "magic" objects to build where clauses, etc.). But I'm trying to take it all the way--to a legit Select statement.
While still in the early stages, it does work with a great many sql statements, as seen in the test suite. Currently supported are CASE statements, Nested conditional clauses, nested queries and most join types. At this point, I'm interested in getting feedback from the community on several fronts:
1. The Operator Overload model. I've chosen to overload Python's operators to give a short-hand syntax to most of the things you'd want to do with a select statement. The rest are accessable via methods. Currently ** is the "where" operator, // is the "in" operator, % the "like" operator and ^ aliases columns. Other overloads are as you'd expect- + / - * == all result in Expression Objects that dish out the right SQL string. The question is, is the "leap" in syntax to confusing? Is there a cleaner way to do this? (Functions for example) 2. How to best add further sql function support? Adding magic callable objects to columns came to mind, but this has it's own set of issues. I'm leaning towards a magic object in the sqlstring module. For example:
sqlstring.F.substring(0, 4, person.first_name)
would result in: substring(0, 4, person.first_name). the F object could be put in the local scope for short-hand. 3. I'm undecided on how best to handle database specific overwrites. I want this to be as easy as possible. I'm thinking about subclassing Expressions with a naming scheme on the Sub-Class (such as CaseExpression_oracle). Then the __init__ factory could dish out the right version of the object based on the the requestor. This brings up lots of questions, such as how to support multiple types of databases at the same time.
Eventually the library should include all of the basic SQL Statements, including UPDATE, INSERT and CREATE statements. This is mostly for completeness, though. SELECT statements tend to be the most complex.
On Oct 19, 2005, at 9:18 PM, gru...@gmail.com wrote:
<snip>
> My solution is sqlstring. A single-purpose library: to create SQL > statement objects. These objects (such as sqlstring.Select), represent > complex SQL Statements, but as Python objects.
<snip>
First of all, I like this idea. I've been thinking about doing something similar but am stuck with SQLObject for the moment. The ability to construct complex expressions in pieces and then mix and match them would be killer.
I think some operator overloading, especially the obvious cases like ==, is cleaner than using only functions because it lets you order things normally. But some of the operator choices are non-intuitive. Personally, I would make something like 'alias' a function or class, rather than overloading XOR. Not sure about ** for where.
Using // for 'in' looks really weird, too. It's too bad you can't overload Python's 'in' operator. (Can you? It seems to be hard-coded to iterate through an iterable and look for the value, rather than calling a private method like some other builtins do.)
Also, you have to be really careful of operator precedence.
Have you ever used Pyparsing? Its use of a combination of classes and operator overloading works pretty well.
For SQL function calls, don't you also want to support stored procedures? In that case, you don't want pre-set magic functions so much as a function that takes a string parameter for a function name and then a list of function parameters.
> < < model.address ** (\ > model.address.party_id == model.person.party_id) > SELECT > party_id party_id, > first_name first_name, > address.* > FROM > [person] person > LEFT OUTER JOIN [address] address ON > address.party_id = person.party_id > WHERE person.first_name = 'Smith'
See, this is where I'm not sure about operator precedence. If << binds tighter than **, it works, because the << operator would make a 'join' object and then the ** could be interpreted as an 'on' clause. But if ** binds tighter you get an extra 'where' clause on the address table, and then a join...
> I think some operator overloading, especially the obvious cases like > ==, is cleaner than using only functions because it lets you order > things normally. But some of the operator choices are non-intuitive. > Personally, I would make something like 'alias' a function or class, > rather than overloading XOR. Not sure about ** for where.
My strategy is to do both. Have a "where" method that could be called instead of ** (ie. person.where(person.last_name=="smith"), but also allow for the ** syntax. After using it for a while, I'm finding the ** and ^ for alias very clear. Alias function brings up the issue of name-space (since table.alias could be a column if not a special method.) I'm assuming people don't use where, select, or other SQL key words as column names, and if they do, they have a table["column"] syntax to fall back on. But I digress. A method makes sense, though.
> Using // for 'in' looks really weird, too. It's too bad you can't > overload Python's 'in' operator. (Can you? It seems to be hard-coded > to iterate through an iterable and look for the value, rather than > calling a private method like some other builtins do.)
// was a bit of a stretch. I'd initially thought it for the "where" clause, becuase it's lower precedence than ** (I think), and really late at night // kind of looks like a W. I decided against it because it looks to close to a comment in some other languages.
Python "in" clause doesn't seem exploitable in any way--probably a good thing. I did add a "in_" method (name is arguable), which does the same thing, also a not_in.
> Have you ever used Pyparsing? Its use of a combination of classes and > operator overloading works pretty well.
I took a look at it. Seems like a happy balance there--with the overloading. <a href="http://www.aminus.org/blogs/index.php/fumanchu/2005/08/11/where_dejav...">Dejavu</a> is another cool solution--using Lambda expressions. But it goes into scary bytecode stuff (though it's well contained).
> For SQL function calls, don't you also want to support stored > procedures? In that case, you don't want pre-set magic functions so > much as a function that takes a string parameter for a function name > and then a list of function parameters.
This is what I had in mind (echo.py in the distribution). The only issue becomes knowing when to quote the parameters (is a string always a string?). I 've ended up quoting everything, and forcing the user to supply table.column notation if they don't want it quoted.
> See, this is where I'm not sure about operator precedence. If << > binds tighter than **, it works, because the << operator would make a > 'join' object and then the ** could be interpreted as an 'on' clause. > But if ** binds tighter you get an extra 'where' clause on the > address table, and then a join...
There's a little dark magic going on with the precedence here. The ** is higher precedence, so it happens first, and is applied to the address table object, which is then joined into person (with the << operator), and pulling in the existing where on address. This sounds bad, but it allows for where clauses to trickle up through the python expression, which is how I get a + b >> c to behave as expected.
>>Using // for 'in' looks really weird, too. It's too bad you can't >>overload Python's 'in' operator. (Can you? It seems to be hard-coded >>to iterate through an iterable and look for the value, rather than >>calling a private method like some other builtins do.)
[snip]
> Python "in" clause doesn't seem exploitable in any way
gru...@gmail.com wrote: > These objects (such as sqlstring.Select), represent > complex SQL Statements, but as Python objects. The benefit is that you > can, at run-time, "build" the statement pythonically, without > getting bogged down in String Manipulation. The theory is that once in > use, things that were complex (string magic) become simpler, and allow > the program to worry about higher-level issues. > ... > Some of this stuff has been around for a while (using "magic" objects > to build where clauses, etc.). But I'm trying to take it all the > way--to a legit Select statement.
> While still in the early stages, it does work with a great many sql > statements, as seen in the test suite. Currently supported are CASE > statements, Nested conditional clauses, nested queries and most join > types. At this point, I'm interested in getting feedback from the > community on several fronts:
> 1. The Operator Overload model. I've chosen to overload Python's > operators to give a short-hand syntax to most of the things you'd > want to do with a select statement. The rest are accessable via > methods. Currently ** is the "where" operator, // is the "in" > operator, % the "like" operator and ^ aliases columns. Other > overloads are as you'd expect- + / - * == all result in Expression > Objects that dish out the right SQL string. The question is, is the > "leap" in syntax to confusing? Is there a cleaner way to do this? > (Functions for example)
The big operator question will be: how will "and" and "or" be implemented? This is always a sticking point because of Python's short-circuiting behaviors regarding them (the resultant bytecode will include a JUMP).
An alternative is to stuff the representation into a string, which can then be parsed however one likes.
For Dejavu (http://projects.amor.org/dejavu), I didn't do either one--instead I used lambdas to express the where clause, so that:
f = logic.Expression(lambda x: ('Rick' in x.Name) or (x.Birthdate == datetime.date(1970, 1, 1))) units = sandbox.recall(Person, f)
might produce, in the bowels of the ORM:
"SELECT * FROM [Person] WHERE [Person].[Name] Like '%Rick%' or [Person].[Birthdate] = #1/1/1970#"
Note that the tablename is provided in a separate step. The translation is based on the codewalk.py and logic.py modules, which are in the public domain if you want to use any part of them. See http://projects.amor.org/dejavu/svn/trunk/
> 2. How to best add further sql function support? Adding magic > callable objects to columns came to mind, but this has it's own set > of issues. I'm leaning towards a magic object in the sqlstring > module. For example:
> sqlstring.F.substring(0, 4, person.first_name)
> would result in: substring(0, 4, person.first_name). the F object > could be put in the local scope for short-hand.
This is a hard problem, since your sqlstring module doesn't control the result sets, and so can't provide fallback mechanisms if a given database does not support a given function (or operator, or minute detail of how a function or operator works; for example, LIKE is case-insensitive in MS SQL Server but case-sensitive in PostgreSQL). If you're going to use subclasses to handle "database-specific overwrites" (below), then you'll probably want to stick such functions in that base class (and override them in subclasses), as well.
> 3. I'm undecided on how best to handle database specific > overwrites. I want this to be as easy as possible. I'm thinking about > subclassing Expressions with a naming scheme on the Sub-Class (such as > CaseExpression_oracle). Then the __init__ factory could dish out the > right version of the object based on the requestor. This brings up > lots of questions, such as how to support multiple types of databases > at the same time.
See the Adapter and SQLDecompiler classes in http://projects.amor.org/dejavu/svn/trunk/storage/db.py (and the store*.py modules) for some examples of using subclassing to produce database-specific syntax. There, it's one Adapter class per supported DB-type; you might consider keeping the Expression objects themselves free from SQL, and transform the Expressions to SQL in a separate class, which you could then subclass.
Just a couple of thoughts from someone who's done the string-manipulation dance once before. ;) I must admit I've always punted when it came time to produce complex joins or CASE statements--Dejavu simply doesn't provide that level of expressivity, preferring instead to hide it behind the object layer.
Robert Brewer System Architect Amor Ministries fuman...@amor.org
Jason Stitt wrote: > On Oct 19, 2005, at 9:18 PM, gru...@gmail.com wrote:
> <snip>
>>My solution is sqlstring. A single-purpose library: to create SQL >>statement objects. These objects (such as sqlstring.Select), represent >>complex SQL Statements, but as Python objects.
> <snip>
> First of all, I like this idea. I've been thinking about doing > something similar but am stuck with SQLObject for the moment. The > ability to construct complex expressions in pieces and then mix and > match them would be killer.
> I think some operator overloading, especially the obvious cases like > ==, is cleaner than using only functions because it lets you order > things normally. But some of the operator choices are non-intuitive. > Personally, I would make something like 'alias' a function or class, > rather than overloading XOR. Not sure about ** for where.
> Using // for 'in' looks really weird, too. It's too bad you can't > overload Python's 'in' operator. (Can you? It seems to be hard-coded > to iterate through an iterable and look for the value, rather than > calling a private method like some other builtins do.)
>>> class inplus(object): ... def __contains__(self, thing): ... print "Do I have a", thing, "?" ... return True ... >>> x = inplus() >>> "Steev" in x Do I have a Steev ? True >>> [...]
> Jason Stitt wrote: >> Using // for 'in' looks really weird, too. It's too bad you can't >> overload Python's 'in' operator. (Can you? It seems to be hard-coded >> to iterate through an iterable and look for the value, rather than >> calling a private method like some other builtins do.)
>>>> class inplus(object):
> ... def __contains__(self, thing): > ... print "Do I have a", thing, "?" > ... return True > ...
I stand corrected. <excuse>Python.org was intermittently down yesterday</excuse> so I was trying to play around with the interactive interpreter and missed it.
>> Using // for 'in' looks really weird, too. It's too bad you can't >> overload Python's 'in' operator. (Can you? It seems to be hard-coded >> to iterate through an iterable and look for the value, rather than >> calling a private method like some other builtins do.)
> // was a bit of a stretch. I'd initially thought it for the "where" > clause, becuase it's lower precedence than ** (I think), and really > late at night // kind of looks like a W. I decided against it because > it looks to close to a comment in some other languages.
> Python "in" clause doesn't seem exploitable in any way--probably a > good > thing. I did add a "in_" method (name is arguable), which does the > same thing, also a not_in.
What about modifying the overloaded == to produce 'in' if the right- hand side is a list? Then you can more easily generate statements dynamically:
As long as there is no case where you might actually want to test if a column value equals a list, it should work. Is there? Some DBs support an Array type, but in general that might be better handled with an Array class, anyway.
- Jason, fingers crossed that all this black magic doesn't affect one's chances in the afterlife ;)
> The big operator question will be: how will "and" and "or" be > implemented? This is always a sticking point because of Python's > short-circuiting behaviors regarding them (the resultant bytecode will > include a JUMP).
I'm using the Boolean | and & operators for logical groups, eg (a | b | (b & c)). This might seem ugly to pureists, but solves all of the short-circuit issues. It does require the user to use excessive parentheses, becuase | evaluates before ==. Another option is to use functions-- AND(EQ(a, 1), OR(IN(B,(1,2,3)))) -- But I find this hard to read. But mixing the two is sometimes clean: EQ(a,1) & LT(b,2). But having too many ways of doing things doesn't seem very pythonic.
> An alternative is to stuff the representation into a string, which can > then be parsed however one likes.
> For Dejavu (http://projects.amor.org/dejavu), I didn't do either > one--instead I used lambdas to express the where clause, so that:
> f = logic.Expression(lambda x: ('Rick' in x.Name) or > (x.Birthdate == datetime.date(1970, 1, 1))) > units = sandbox.recall(Person, f)
> might produce, in the bowels of the ORM:
> "SELECT * FROM [Person] WHERE [Person].[Name] Like '%Rick%' or > [Person].[Birthdate] = #1/1/1970#"
> Note that the tablename is provided in a separate step. The translation > is based on the codewalk.py and logic.py modules, which are in the > public domain if you want to use any part of them. See > http://projects.amor.org/dejavu/svn/trunk/
This is a very elegant solution, so much so that I almost didn't go down the path of sqlstring. Having support for lambda expressions is still an option, though I wanted to try object operator overloading/methods first--too see if I could avoid the Bytecode issue.
> > 2. How to best add further sql function support? Adding magic > > callable objects to columns came to mind, but this has it's own set > > of issues. I'm leaning towards a magic object in the sqlstring > > module. For example:
> > would result in: substring(0, 4, person.first_name). the F object > > could be put in the local scope for short-hand.
> This is a hard problem, since your sqlstring module doesn't control the > result sets, and so can't provide fallback mechanisms if a given > database does not support a given function (or operator, or minute > detail of how a function or operator works; for example, LIKE is > case-insensitive in MS SQL Server but case-sensitive in PostgreSQL). If > you're going to use subclasses to handle "database-specific overwrites" > (below), then you'll probably want to stick such functions in that base > class (and override them in subclasses), as well.
Good point. These things should be able to be "intercepted" in the database specific modules, so the library has a documented way functions should be used (ANSI if applicable), but database specific overwrites allow us to deal with issues or hacks (to emulate a function) in databases.
> See the Adapter and SQLDecompiler classes in > http://projects.amor.org/dejavu/svn/trunk/storage/db.py (and the > store*.py modules) for some examples of using subclassing to produce > database-specific syntax. There, it's one Adapter class per supported > DB-type; you might consider keeping the Expression objects themselves > free from SQL, and transform the Expressions to SQL in a separate > class, which you could then subclass.
Thanks. Your approach here had already inspired me, I'll take a look at it again. Pulling the SQL out of the Expression objects is double sided, but might be a way to cleanly support db syntax nuances. I'll keep you posted.
> >> Using // for 'in' looks really weird, too. It's too bad you can't > >> overload Python's 'in' operator. (Can you? It seems to be hard-coded > >> to iterate through an iterable and look for the value, rather than > >> calling a private method like some other builtins do.)
> > // was a bit of a stretch. I'd initially thought it for the "where" > > clause, becuase it's lower precedence than ** (I think), and really > > late at night // kind of looks like a W. I decided against it because > > it looks to close to a comment in some other languages.
> > Python "in" clause doesn't seem exploitable in any way--probably a > > good > > thing. I did add a "in_" method (name is arguable), which does the > > same thing, also a not_in.
> What about modifying the overloaded == to produce 'in' if the right- > hand side is a list? Then you can more easily generate statements > dynamically:
> As long as there is no case where you might actually want to test if > a column value equals a list, it should work. Is there? Some DBs > support an Array type, but in general that might be better handled > with an Array class, anyway.
This is a great idea, and should be the default behaviour for lists. It does present a problem if the right hand expression is a SELECT object, though. Both of these are valid syntax:
id = (select max(id) from table) id in (select id from table)
Also, SQLite allows for column in table_name syntax. I've never seen that before, but I wanted to support that, there'd be no way of knowing in vs. ==.
On this line of thought, what about the += operator? That might be more intuative than //. I could even use -= for not in.
> My solution is sqlstring. A single-purpose library: to create SQL > statement objects. These objects (such as sqlstring.Select), represent > complex SQL Statements, but as Python objects. The benefit is that you > can, at run-time, "build" the statement pythonically, without > getting bogged down in String Manipulation. The theory is that once in > use, things that were complex (string magic) become simpler, and allow > the program to worry about higher-level issues.
With the same starting point - I don't like writing SQL strings inside Python code either - I have tested a different approach : use the Python list comprehension / generator expression syntax for the select requests
s = query(r.name for r in planes if r.speed > 500) for item in s: print s
query is a class whose instances are created with the generator expression as argument. The matching SQL request is built in the __init__ method, here :
SELECT r.name FROM planes AS r WHERE r.speed > 500
On two tables :
s=query(r.name for r in planes for c in countries if r.country == c.country and c.continent == 'Europe')
is translated into :
SELECT r.name FROM countries AS c ,plane AS r WHERE (r.country = c.country AND c.continent = 'Europe')
For the moment the implementation is not very elegant, especially for getting the source code of the generator expression (it would be nice if they had an attribute for that !), and I'm not sure if it could work for all the forms of the SELECT syntax. But it should cover at least the most usual kinds of requests, with a Pythonic syntax
On Thu, 20 Oct 2005, gru...@gmail.com wrote: > On this line of thought, what about the += operator? That might be more > intuative than //. I could even use -= for not in.
You're going to have to explain to me how using an assignment operator for something other than assignment is intuitive!
-1 on this one from me, i'm afraid.
Using 'in' would be good. It does require some truly puke-inducing contortions, though; since 'in' calls __contains__ on the right-hand operand, and that's likely to be a list, or some other type that's not under your control, you have to cross your fingers and hope that whatever it is implements __contains__ with equality tests with the probe object on the left-hand side and the candidates on the right (as lists do, at least in 2.4.1). then, you just have to make your table names do the right thing when compared to strings.
It's a shame (sort of) that you can't define entirely new operators in python. What we need is a __operate__(self, op, arg) special method, so you could do:
On Thu, 20 Oct 2005, Pierre Quentel wrote: > gru...@gmail.com a écrit :
>> My solution is sqlstring. A single-purpose library: to create SQL >> statement objects.
> With the same starting point - I don't like writing SQL strings inside Python > code either - I have tested a different approach : use the Python list > comprehension / generator expression syntax for the select requests
> For instance :
> s = query(r.name for r in planes if r.speed > 500) > for item in s: > print s
> query is a class whose instances are created with the generator > expression as argument. The matching SQL request is built in the > __init__ method, here :
> SELECT r.name FROM planes AS r WHERE r.speed > 500
Tom Anderson wrote: > On Thu, 20 Oct 2005, gru...@gmail.com wrote:
> > On this line of thought, what about the += operator? That might be more > > intuative than //. I could even use -= for not in.
> You're going to have to explain to me how using an assignment operator for > something other than assignment is intuitive!
> -1 on this one from me, i'm afraid.
Point. I do think it looks strange, because we're used to seeing += in code. But the concept is more along the lines of the == and != comparison operators. Python does expose other nice things, such as &= and %=, which (since people aren't used to seeing them used much), might be better candidates. Does %= seem more agreeable? (I'm already using % for a like statement).
select * from person where person.type_id = 'customer' and person.id in (select person_id from phone)
> Using 'in' would be good. It does require some truly puke-inducing > contortions, though; since 'in' calls __contains__ on the right-hand > operand, and that's likely to be a list, or some other type that's not > under your control, you have to cross your fingers and hope that whatever > it is implements __contains__ with equality tests with the probe object on > the left-hand side and the candidates on the right (as lists do, at least > in 2.4.1). then, you just have to make your table names do the right thing > when compared to strings.
__contains__, while allowing side-effects on the object in question (even if though it's on the right), only returns true/false (not a custom object) afaik, so it breaks in a complex expression -- (a == b) & (c in d), won't work. You could modify D, but you can't pass that value to the whole Condition Expression.
Nevermind. This doesn't work because all of the X= operators in question are assignment operators, and therefore generate a Syntax Error if in a nested expression. I think I've settled on just doing a table.column.IN(blah) syntax. This should be obvious to anyone reading the code, and doesn't require mangling of the name (since it's capitalized). Then we'd have similar functions for other non intuitive things, such as LIKE, EXISTS (on the table) and even a WHERE:
>>>> import sqlstring >>>> model = sqlstring.TableFactory() >>>> print model.person >SELECT >person.* >FROM >[person] person
The [bracket] syntax is unique to Microsoft. Everyone else, including Microsoft SQL Server, uses "double quotes" to protect special characters in identifiers. -- - Tim Roberts, t...@probo.com Providenza & Boekelheide, Inc.