GSOC proposal: improve F expresions and DateField lookups

5 views
Skip to first unread message

Max Veytsman

unread,
Apr 1, 2009, 12:31:44 AM4/1/09
to Django developers
Hello everyone,

My name is Maxim Veytsman and I am a third year computer science
student at the University of Toronto.

There was a project idea in the wiki called "Improve Query Expressions
(F() syntax."

I have a list of syntactical improvements to handling of F expressions
especially with DateTimeFields, and I'm hoping on some feedback before
I submit my application.

Since I'm talking about syntactical improvements I think it's best to
have a demonstrative example for each one. Suppose I have the
following models:

class Item(models.Model):
subject = models.CharField(max_length=200)
text = models.TextField()
datetime_posted = models.DateTimeField(auto_now_add=True)

class Author:
name = models.CharField(max_length=40)
birthday = models.DateField()


Improvements to Handling of Dates
==================================

1) Syntax for comparisons using fields derived from DateField

For instance to select all of the items that were written before the
10th of the month I would do

Item.objects.filter(datetime_posted__day__lt=10)

2) Allow F expressions to access fields derived from DateField.

I.E. for all Items written by an author during his/her birthmonth:
Item.objects.filter(datetime_posted__month=F
('author__birthdate__month')

3) Add the __date lookup to DateTimeFields
I.E. for all Items written on the author's birthday
Item.objects.filter(date_poststed__date=F(author__birthday))

This is to compare a DateTimeField with a DateField
This is covered by the following ticket:
http://code.djangoproject.com/ticket/9596http://code.djangoproject.com/ticket/9596
which provides a patch, although it may not work for all backends. I
will make sure it does, and provide more rigorous testing.

4) Allow for annotation by fields derived from F expressions
This will probably be the meat of the project.

I.E. to group Items by date:
Item.objects.annotate(date=F('datetime__date')).values('date').values
(num_items=Count('id'))

This is covered in ticket #10302(http://code.djangoproject.com/ticket/
10302)

Allowing F expressions to be used in annotate will allow a lot of
power in queries, not just related to date manipulation.

5) This already has a patch but I think I should mention it. Ticket
10514(http://code.djangoproject.com/ticket/10154) allows for
timedeltas to be added or subtracted from F expressions.

I.E.
For all Items posted within 5 days after the author's birthday
Item.objects.filter(datetime_posted__date__gte=F
('author__birthday'),datetime_posted__date__lte=F('author__birthday')
+datetime.timedelta(days=5))

If anyone has any comments about how the solution there can be
improved, please let me know as this fits into what I am doing.

Improvements to Handling of Strings
===================================

I also want to allow for string F expressions.
I'm going to stop using the example for these as they are simpler.

If the field foo contains a string
1) F('foo')+"bar" should concatenate the two strings

2) F('foo')*N should repeat N times

3) I would like some feedback on this one: would allowing for splicing
be a good idea?

I can see how an expression like __startswith=F('foo')[0:3] could be
useful to have. Is implementing this a good idea?

That's pretty much what I want to do over the summer. I hope that in
sum this is substantial enough for a GSOC project. I would also love
for any other recommendations, and for ways to improve the timedelta
functionality.

Thank you for your time,
Maxim Veytsman

Russell Keith-Magee

unread,
Apr 1, 2009, 10:29:03 AM4/1/09
to django-d...@googlegroups.com
On Wed, Apr 1, 2009 at 12:31 PM, Max Veytsman <maxim.v...@gmail.com> wrote:

Hi Maxim,

> There was a project idea in the wiki called "Improve Query Expressions
> (F() syntax."
>

> Improvements to Handling of Dates
> ==================================
>
> 1) Syntax for comparisons using fields derived from DateField
>
> For instance to select all of the items that were written before the
> 10th of the month I would do
>
> Item.objects.filter(datetime_posted__day__lt=10)

I'm not sure this is the best approach. It isn't consistent with any
other filter operator in Django - for all other operators, the last
part (__lt in this case) is the operator, and all other parts are
considered to be fields that need to be joined.

My preferred approach here would be to allow for annotations that are
non-aggregates. This way, you would annotate the 'day' onto the item
instances, then filter on that annotation:

Item.objects.annotate(posted_day=Day('datetime_posted')).filter(posted_day__lt=10)

This has the added advantage that it allows for user-specified
annotations functions.

> 2) Allow F expressions to access fields derived from DateField.
>
> I.E. for all Items written by an author during his/her birthmonth:
> Item.objects.filter(datetime_posted__month=F
> ('author__birthdate__month')

Again, this should be handled in a way similar to (1). Coming up with
a clean syntax for the RHS of this expression could be interesting,
however.

> 3) Add the __date lookup to DateTimeFields
> I.E. for all Items written on the author's birthday
> Item.objects.filter(date_poststed__date=F(author__birthday))
>
> This is to compare a DateTimeField with a DateField
> This is covered by the following ticket:
> http://code.djangoproject.com/ticket/9596http://code.djangoproject.com/ticket/9596
> which provides a patch, although it may not work for all backends.  I
> will make sure it does, and provide more rigorous testing.

+1. This isn't strictly related to F() expressions - it's just another
filter clause - but it is one worth having.

> 4) Allow for annotation by fields derived from F expressions
> This will probably be the meat of the project.

> I.E. to group Items by date:
> Item.objects.annotate(date=F('datetime__date')).values('date').values
> (num_items=Count('id'))
>
> This is covered in ticket #10302(http://code.djangoproject.com/ticket/
> 10302)

This isn't what #10302 is describing. #10302 is describing the ability
to force a GROUP BY on a computed field. Again, once you solve the
annotation of computed fields problem, this becomes relatively
trivial.

> 5) This already has a patch but I think I should mention it.  Ticket
> 10514(http://code.djangoproject.com/ticket/10154) allows for
> timedeltas to be added or subtracted from F expressions.

There is a patch, but I'm not sure I'm 100% happy with it. I'm
slightly uncomfortable introducing a completely separate expression
node for dates. The implication is that every new data type will
require its own Node subclass - so String handling, for example, will
require a new node type; any geometry handling will require its own
node type, and so on. I'd rather see a generic approach extend the
base Node definition to carry the extra logic required to support
non-numeric expressions.

> Improvements to Handling of Strings
> ===================================
>
> I also want to allow for string F expressions.
> I'm going to stop using the example for these as they are simpler.
>
> If the field foo contains a string
> 1) F('foo')+"bar" should concatenate the two strings
>
> 2) F('foo')*N should repeat N times
>
> 3) I would like some feedback on this one: would allowing for splicing
> be a good idea?

Meh. I can't say this is a big one for me, but turning Python slicing
syntax into the underlying SQL would be a neat trick.

My bigger concern is F('foo') + F('bar'). This is possibly implied by
(1), but it's worth saying explicitly. However, once you've solved the
'generic expression node' problem, this shouldn't be too hard.

> That's pretty much what I want to do over the summer. I hope that in
> sum this is substantial enough for a GSOC project.  I would also love
> for any other recommendations, and for ways to improve the timedelta
> functionality.

The ideas you are presenting here (once you modify the xx__day to
Day('xx') issues) aren't controversial, and they would be good
additions to Django. From a purely idea point of view, this is all
win.

What is missing from this proposal is any indication that you have
given any thought to these problems beyond "Hey, that ticket looks
cool". Have you made estimates of how long each of these tasks will
take? How did you arrive at those estimates? What makes you think that
your estimates are correct? What complexities do you expect to find?
How do you plan to mitigate problems when they arise?

Remember - we know nothing about you. The ideas you have proposed are
mostly from Django's own Trac instance and wiki, so we haven't learned
a lot about you from the ideas you propose (other than your taste in
problems). Why should we trust you with a GSoC slot?

Yours,
Russ Magee %-)

Nicolas Lara

unread,
Apr 1, 2009, 11:02:05 AM4/1/09
to django-d...@googlegroups.com
On Wed, Apr 1, 2009 at 2:29 PM, Russell Keith-Magee <freakb...@gmail.com> wrote:

On Wed, Apr 1, 2009 at 12:31 PM, Max Veytsman <maxim.v...@gmail.com> wrote:

Hi Maxim,

> There was a project idea in the wiki called "Improve Query Expressions
> (F() syntax."
>
> Improvements to Handling of Dates
> ==================================
>
> 1) Syntax for comparisons using fields derived from DateField
>
> For instance to select all of the items that were written before the
> 10th of the month I would do
>
> Item.objects.filter(datetime_posted__day__lt=10)

I'm not sure this is the best approach. It isn't consistent with any
other filter operator in Django - for all other operators, the last
part (__lt in this case) is the operator, and all other parts are
considered to be fields that need to be joined.

My preferred approach here would be to allow for annotations that are
non-aggregates. This way, you would annotate the 'day' onto the item
instances, then filter on that annotation:

Item.objects.annotate(posted_day=Day('datetime_posted')).filter(posted_day__lt=10)

This has the added advantage that it allows for user-specified
annotations functions.

The Day object syntax would result in a subquery. An interesting case here would be to create a generic subquering annotation function that could be extended or even a generic way of subquering. I'm not sure if this goes completely out of scope. AFAIK, it has been discussed before but without concluding about a syntax for generic subquering.
 



--
Nicolas Lara
Linux user #380134
http://nicolas-lara.blogspot.com/
Public key id: 0x152e7713 at http://keyserver.noreply.org/

Russell Keith-Magee

unread,
Apr 1, 2009, 11:25:21 AM4/1/09
to django-d...@googlegroups.com

Why does it result in a subquery? This is really no different to F() -
it's just a column reference, but with an SQL function wrapped around
it that can extract the date value from the database column.

The complication is with the implementation of annotate(). Under this
approach, you can no longer assume that a clause in an annotate is
automatically an aggregation, and should be treated as such. However,
this should be a fairly minor implementation detail.

Yours,
Russ Magee %-)

Reply all
Reply to author
Forward
0 new messages