I think QuerySet.update works to inflexible. In SQL you can define
expressions containing the current value or the value of an other
column in the update clause, for Example to incrementing a value in the
result set. This is not possible by the ORM at the moment.
I developed a possible solution, by introducing combinable Expression
objects. Look at http://code.djangoproject.com/ticket/7210.
This patch allows you to do following:
from django.db.models.sql.expressions import *
qset = model.objects.all()
# Equivalent to qset.update(foo=42)
qset.update(foo=LiteralExpr(42))
# Increment column 'foo' by one.
qset.update(foo=CurrentExpr() + LiteralExpr(1))
# Swap the value of the column 'foo' and 'bar'.
qset.update(foo=ColumnExpr('bar'), bar=ColumnExpr('foo'))
Does somebody thinks there are reasons to don't merge this patch? Or
ideas ti even improve this patch? Thanks.
Regards
Sebastian Noack
The basic idea that you are proposing has been made before - for
example, several of the discussions about adding aggregations to the
ORM have included a capability similar to the one you propose.
However, this is the first time to my knowledge that code has been
offered.
> I developed a possible solution, by introducing combinable Expression
> objects. Look at http://code.djangoproject.com/ticket/7210.
>
> This patch allows you to do following:
>
> from django.db.models.sql.expressions import *
>
> qset = model.objects.all()
>
> # Equivalent to qset.update(foo=42)
> qset.update(foo=LiteralExpr(42))
> # Increment column 'foo' by one.
> qset.update(foo=CurrentExpr() + LiteralExpr(1))
> # Swap the value of the column 'foo' and 'bar'.
> qset.update(foo=ColumnExpr('bar'), bar=ColumnExpr('foo'))
>
> Does somebody thinks there are reasons to don't merge this patch? Or
> ideas ti even improve this patch? Thanks.
A few quick comments:
* I don't see why LiteralExpr is required. Surely a literal should
continue to work as is; if a literal is involved in an expression with
a ColumnExpr(), the definition of ColumnExpr should be smart enough to
merge the literal into the resultant expression.
* foo=CurrentExpr() is really just the same as foo=ColumnExpr('foo'),
isn't it? Why include two ways of providing the reference?
* ColumnExpr is a bit cumbersome for something that is really exposing
the primitives of the underlying SQL. Given that LiteralExpr and
CurrentExpr aren't really required, you only need one object for
expressions - the ability to reference a field. I would suggest
adopting the syntax from the earlier ORM aggregation discussions: F()
(for 'Field'). This follow the lead of Q() to describe Query objects.
* The examples you give are driven by use cases in the update()
clause. This is certainly a useful application of the idea, but it
would be good to ensure that the same syntax can be used elsewhere -
most notably, in filter() and exclude() clauses.
On top of these criticisms, there is at least one very good reason
that this patch won't get merged as-is - it doesn't contain any
documentation or test cases. What you are proposing is a fairly big
feature addition - documentation and tests would be mandatory before
the patch was accepted. I can understand why you might have left off
documentation until your idea was accepted, but I (and the other core
developers) are rarely convinced that code works unless there is a
comprehensive test suite to back it up.
Yours,
Russ Magee %-)
I know the aggregation stuff and used it as example when I wrote this
code. As you can see I provide also a base class with an as_sql method
which is overridden in derived classes. And I adopted the way the qn
function is passed.
> * I don't see why LiteralExpr is required. Surely a literal should
> continue to work as is; if a literal is involved in an expression with
> a ColumnExpr(), the definition of ColumnExpr should be smart enough to
> merge the literal into the resultant expression.
You can still do model.objects.update(foo=42) with my patch, because
of 42 is casted to a LiteralExpr under the hood. I could even make it
possible to do model.objects.update(foo=CurrentExpr() + 42). But there
is no way to enable model.objects.update(foo=42 + CurrentExpr()),
because of in this case int.__add__ instead of Expression.__add__ is
called, so I decided to introduce LiteralExpr. But I agree that it
would be cool, if you could use a literal as it is.
> * foo=CurrentExpr() is really just the same as foo=ColumnExpr('foo'),
> isn't it? Why include two ways of providing the reference?
Following statements are indeed equivalent:
model.objects.update(foo=CurrentExpr() + LiteralExpr(1))
model.objects.update(foo=ColumnExpr('foo') + LiteralExpr(1))
But first is less code, because of you don't have to specify the column
and I think second does violate the DRY slightly. But the really
interesting thing is that you can do following:
increment_expr = CurrentExpr() + LiteralExpr(1)
model_1.objects.update(foo=increment_expr)
model_1.objects.update(bar=increment_expr)
model_2.objects.update(foo=increment_expr)
There is actually no need for CurrentExpr, it is just there because of
DRY, but I already thought about using ColumnExpr() (a ColumnExpr
instance without a given column) instead of CurrentExpr().
> * ColumnExpr is a bit cumbersome for something that is really exposing
> the primitives of the underlying SQL. Given that LiteralExpr and
> CurrentExpr aren't really required, you only need one object for
> expressions - the ability to reference a field. I would suggest
> adopting the syntax from the earlier ORM aggregation discussions: F()
> (for 'Field'). This follow the lead of Q() to describe Query objects.
I also don't like the name *Expr. I already considered to use something
similar to Q(), but I think this is to confusing as long we have
multiple Expression objects. If we will decide to drop LiteralExpr and
CurrentExpr, I would like to use F(), as you proposed instead of.
> * The examples you give are driven by use cases in the update()
> clause. This is certainly a useful application of the idea, but it
> would be good to ensure that the same syntax can be used elsewhere -
> most notably, in filter() and exclude() clauses.
I think this is a good idea. The only reason why I only supporting
update() at the moment is, why I need this feature for a modified
pre-ordered tree traversal, app I am currently working on. And if you
know mptt, you know that you have to increment or decrement
values ordinary over the half table, when you change anything, which
really sucks if you can not do this with a single UPDATE statement. ;)
> On top of these criticisms, there is at least one very good reason
> that this patch won't get merged as-is - it doesn't contain any
> documentation or test cases. What you are proposing is a fairly big
> feature addition - documentation and tests would be mandatory before
> the patch was accepted. I can understand why you might have left off
> documentation until your idea was accepted, but I (and the other core
> developers) are rarely convinced that code works unless there is a
> comprehensive test suite to back it up.
I know this. But there is no sense do this at the moment. As you
already noted there are still a few points to discuss and probably
things to change on this code. So I will write documentation and tests
when this is finished, because of otherwise I have to do this work
twice.
Regards
Sebastian Noack
I figured that was probably the reason. It's a bit of an edge case,
but I can see where it will be required, especially with division
(since you can't reverse the operands).
As a matter of style, I would suggest calling the clause "Literal"
rather than "LiteralExpr" - a constant isn't really an expression,
except in the context of Church numerals. L() is another option, but
is perhaps a little _too_ brief for a relativey infrequent edge case.
>> * foo=CurrentExpr() is really just the same as foo=ColumnExpr('foo'),
>> isn't it? Why include two ways of providing the reference?
>
> Following statements are indeed equivalent:
>
> model.objects.update(foo=CurrentExpr() + LiteralExpr(1))
> model.objects.update(foo=ColumnExpr('foo') + LiteralExpr(1))
>
> But first is less code, because of you don't have to specify the column
> and I think second does violate the DRY slightly.
Less code is not the metric you should use. Clarity is the more
important feature. Sure, clarity and code length often coincide, but
there comes a point where you can say too little, and end up being
ambiguous or confusing.
Also - it's only a violation of DRY if you are specifying the same
thing twice. In this case, you have to specify two different things -
a source and a target. The fact that they both happen to be 'foo' is
an interesting fact about the query, not a duplication of effort.
> But the really
> interesting thing is that you can do following:
>
> increment_expr = CurrentExpr() + LiteralExpr(1)
> model_1.objects.update(foo=increment_expr)
> model_1.objects.update(bar=increment_expr)
> model_2.objects.update(foo=increment_expr)
I'm not sure I'm convinced that this is a good idea. Explicit is
better than implicit.
> I also don't like the name *Expr. I already considered to use something
> similar to Q(), but I think this is to confusing as long we have
> multiple Expression objects. If we will decide to drop LiteralExpr and
> CurrentExpr, I would like to use F(), as you proposed instead of.
I have no problem dropping CurrentExpr(). This leaves "Literal()" and
"F()", which seems like a manageable API to me.
>> On top of these criticisms, there is at least one very good reason
>> that this patch won't get merged as-is - it doesn't contain any
>> documentation or test cases. What you are proposing is a fairly big
>> feature addition - documentation and tests would be mandatory before
>> the patch was accepted. I can understand why you might have left off
>> documentation until your idea was accepted, but I (and the other core
>> developers) are rarely convinced that code works unless there is a
>> comprehensive test suite to back it up.
>
> I know this. But there is no sense do this at the moment. As you
> already noted there are still a few points to discuss and probably
> things to change on this code. So I will write documentation and tests
> when this is finished, because of otherwise I have to do this work
> twice.
This is true for documentation, but not for test cases. How do you
know that your code works? How do I know that your code works? How do
I know how comprehensive you have been in your personal testing?
Tests should be the first thing you write, not the last. If nothing
else, writing the tests can help to clarify the ideas that you have
and help to identify the difficulties you may experience during
development.
Yours,
Russ Magee %-)
You can handle this case by defining Expression.__radd__ (see http://docs.python.org/ref/numeric-types.html)
--
Sean
Btw, I still have to implement the Expression stuff into
Query.add_filter to support filter() and exclude(), as proposed by
Russel and I have to write test cases. I can't do this now, but I
think I will do it in the next days.
But it would be cool if somebody can already look on my latest patch,
if it is ok. Thanks.
Regards
Sebastian Noack
What I have done since the last version of this patch:
(1) Added F object support to filter and exclude.
(2) Implemented __radd__, __rsub__, __rmul__, etc.
(3) Renamed L (formerly LiteralExpr) to Literal.
(4) Got rid of get_placeholder calls.
(5) Literal.as_sql is using get_db_prep_lookup now.
(6) Removed logical connection types (AND, OR).
(7) Added bitwise connection types.
(1) This was the hardest part, but it works now, basically.
(2) Now you can do 1 + F('foo') and therewith the user does not have to
know anything about the Literal object.
(3) With the previous commit I renamed LiterExpr to L ollow the lead of
Q and F, but because of now it only used under the hood I named it to
Literal.
(4) UpdateQuery and DeleteQuery, were looking for a get_placeholder
method in the corresponding field. But there is no field in trunk
defining such a method. Furthermore isn't this the task of
get_db_prep_lookup anyway?
(5) get_db_prep_lookup is called when calling Literal.as_sql by now. So
it called whenever you deal with Literal objects. Furthermore this
solved a FIXME column in UpdateQuery.add_update_fields.
(6) I decided to remove the logical connetion types from
the ExpressionNode, because of F('foo') & 42 would do something
completely different as 24 & 42.
(7) I added the bitwise operators &, |, <<, >> and ~, for completeness.
I still have to write tests and documentation, but I would like to hear
from Russel or the other maintainers, what they think about the code.
Regards
Sebastian Noack
What do I think? The code doesn't look fundamentally bad, but I'm not
going to take a serious look until there are some test cases.
Consider this from my point of view. I need to validate that your code
works as advertised. As your patch currently stands, I need to write
my own test cases, including coming up with some models, some test
data, and thinking about the edge cases that need to be checked. This
takes time - time that could be spent looking at someone else's code,
merging someone else's patch. Given that my time is limited, and there
are multiple demands on my time, it makes sense for me to concentrate
on those patches that make my life easy, and enable me to process as
many tickets as my limited time allows.
Like I said last time, test cases are how you prove you are serious.
It's how you prove to me that you've done more than a trivial analysis
of the problem. It's how you prove to me that you've considered the
edge cases. And for your own sanity, it's how you prove to yourself
than when you make one of those "minor, this couldn't possibly affect
anything" changes, that it actually doesn't change anything.
Yours,
Russ Magee %-)
I know that tests are essential and if you look at the ticket, you will
see that I have added a new version of my patch including doc tests. ;)
I know what you think about patches adding new functionality but don't
providing test cases. And I completely agree. But you should see my
previous patches on this ticket just as proof of concept to help as to
discuss the functionality, its implementation and API.
During writing the tests i came up with a few bugs and fixes:
(1) I figured out that <<, >> (left and right shift) and ~ (inversion)
is (at least in MySQL) not supported, when selecting from tables. So I
decided to drop support for this corner case.
(2) I ensured that when using update(), get_db_prep_save() is called.
get_db_prep_lookup() is called anyway, but for example when using
DateTimeField and MySQL, get_db_prep_save() (as opposed to
get_db_prep_lookup) truncates microseconds because of missing support
in MySql of microseconds, so this must be done also on update.
(3) I added a sanity check to update(), preventing updating sliced
query sets, because of calling update() on a sliced query set will
update all objects. Do not think about implemeting this feature.
Although some DBMS might supports UPDATE with LIMIT and OFFSET, many
don't or support only LIMIT (as MySQL for example).
Note that (2) and (3) are bugs not introduced, but fixed by my patch.
Another reason for merging it. ;)
Regards
Sebastian Noack
> I just updated to your latest patch and there appears to be 2 test
> failures: http://dpaste.com/52137/ , the first one appears to be
> caused by the fact that different database engines do the rounding
> differently, my suggestion would be to change to something that
> divides evenly for the integer value.
Can you tell me which db engine you was using when the test failed?
> The second test failure looks to be caused by something being changed
> in the ORM, unfortunately, I'm not sure what caused it.
I am pretty sure that this is not related to my patch. I get the same
failure, and as far as I know I got it even before I wrote the patch.
Thanks for testing.
Regards
Sebastian Noack
I am sorry. You was right, but I have fixed it.
> > > I just updated to your latest patch and there appears to be 2 test
> > > failures:http://dpaste.com/52137/, the first one appears to be
> > > caused by the fact that different database engines do the rounding
> > > differently, my suggestion would be to change to something that
> > > divides evenly for the integer value.
I changed the tests, that the results aren't something like *.5 anymore,
which might become rounded up- or downwards depending on the db engine.
The result of the division test is 2.8 now. This should actually work.
Furthermore I am using the operator module instead of eval, now. Thanks
for this advice. I just forgot that I can use it for this test.
Regards
Sebastian Noack
This will work, because of a foreign key is just an integer in the
database, which are covered by the tests I have written. But I don't
think that a separate test case is required for this use case.
> - What about date fields? (increase the 'next_contact_date' by a
> week)
The tests ensure that the corresponding SQL is created as it works for
integers and floats. The only reason why it could fail is, when the
used DBMS does not support the operator for the given type (in this
case timestamp). And if the DBMS don't support the used operator for
the corresponding column type, I think it is ok, that a database error
is raised. I would like to not implement DBMS dependent switches to
handle column types as varchar. As far as I know timestamp will work,
but if not it is only because of missing support of the underlying
DBMS.
> - I can see lots of use of F() in update() clauses, but what about
> filter()? (give me all the companies where the number of employees is
> greater than the number of chairs owned. Don't worry about doing this
> with aggregates - just keep n_chairs and n_employees as attributes)
I don't found such a use case, when I wrote the tests, but I would like
to add such a test case. Thanks for the example.
> - What about F() using joins? This may be outside the scope of what
> is easily possible, and I'd be happy to defer this till later, but it
> would be interesting if you could populate a field based upon a joined
> column.
This isn't possible at the moment. But I also think that it is outside
of the current scope. I would like to implement it later (when the
patch with the given features is merged), too.
> > (1) I figured out that <<, >> (left and right shift) and ~
> > (inversion) is (at least in MySQL) not supported, when selecting
> > from tables. So I decided to drop support for this corner case.
>
> This doesn't particularly concern me. I'm having a hard time thinking
> of an occasion where I would need those operators.
Me too. I just added it because of completeness. But I dropped it as
soon I noted that it is badly supported by MySQL (and probably other
DBMS, too).
> > Note that (2) and (3) are bugs not introduced, but fixed by my
> > patch. Another reason for merging it. ;)
>
> These second two points are bug fixes that are independent of this
> ticket. The large feature described by #7210 might take a little while
> to get into trunk; obvious bug fixes should be applied as soon as
> possible. It helps us to keep the commits atomic, so we can tell
> exactly what changes (and why) with each commit. Please open tickets
> for each of these problems (if there aren't already tickets), and
> split the parts of the #7210 patch that fix those issues (and the
> relevant tests) into separate patches.
I think you are right about (3). But (2) is not a fatal bug, but rather
a corner case and handled deep into the new code.
Regards
Sebastian