Building a library of SQL functions into Django

516 views
Skip to first unread message

Josh Smeaton

unread,
Jun 17, 2014, 9:52:11 PM6/17/14
to django-d...@googlegroups.com
Over the last 6 months we've been working on a fairly large refactor to expressions. As a brief catch up, expressions are currently F() expressions. I've expanded their scope to include Aggregates. As a by-product of this we can now use any kind of expressions in the .annotate() function, not just aggregates. Tim Martin has done some work based on this patch that extends support to .order_by() also. See the bottom of this post for links to the various discussions and patches. I highly recommend reading them - and for 3rd party backend maintainers - trying them out.

Although this functionality hasn't been merged into core yet, Tim and I would like to start putting together a bunch of  SQL functions to be included in django. There exists a type called Func in the new implementation that allows us to create very simple classes that implement SQL functions:

class Lower(Func):
    function = 'LOWER'

The API documentation goes a long way in explaining all the various ways that custom expressions and functions can be built, so I'll avoid that for now. What we are looking for is some interest in the implementation and what we should try to include. I'm not sure, but this might be the perfect opportunity to create a DEP.

1. What SQL structures and functions should we try to support for initial release? We need to compose a list of everything we should implement and commit to. Some examples:
LOWER/UPPER/COALESCE/CASE,WHEN or IF/SUBSTRING etc. I believe we should be able to implement analytical/window functions too, but that might be best left to a future release or even a third party library.

2. Where should these functions live? Currently everything exists in the django.db.models.expressions namespace (aggregates are still in models.aggregates). I'm thinking along the lines of django.db.models.functions?

3. How should we support the built in backends where differences in implementation arise? As a bad example, what if Oracle requires uppercase `LOWER()` and Postgres requires lowercase `lower()`. Expressions have the concept of `as_{vendor}`, which allows 3rd party backends to provide their own implementation of an expression. Should we use this extension point internally, or should we continue to put any differences we find directly in the backend? I think it'll be easier to support the as_vendor structure, and will automatically support 3rd party backends rather than forcing them to implement lower_or_uppercase_function_feature per backend.

4. Should these functions be part of core, or should they be built as third party or even contrib apps? I think backend specific functions could find a home in contrib (like the postgres kickstarter) or as third party libraries. But we should implement the basics/most common.

5. Are we jumping the gun and should we wait until the patches land before even discussing the above?

I'll leave it there for now. Hopefully we can get some good feedback and go from there.

- Josh

Anssi Kääriäinen

unread,
Jun 18, 2014, 3:09:53 AM6/18/14
to django-d...@googlegroups.com
On Wednesday, June 18, 2014 4:52:11 AM UTC+3, Josh Smeaton wrote:
Over the last 6 months we've been working on a fairly large refactor to expressions. As a brief catch up, expressions are currently F() expressions. I've expanded their scope to include Aggregates. As a by-product of this we can now use any kind of expressions in the .annotate() function, not just aggregates. Tim Martin has done some work based on this patch that extends support to .order_by() also. See the bottom of this post for links to the various discussions and patches. I highly recommend reading them - and for 3rd party backend maintainers - trying them out.

Not just 3rd party backend maintainers, but also for 3rd party library writers. The new way could break your code if you depend on current implementation of ExpressionNode, aggregates or SQLEvaluator.

I think finding a small subset of commonly used functions and including them in core or contrib is a good idea - these should be building blocks every Django application can depend on. The main problem is how to limit that subset.

5. Are we jumping the gun and should we wait until the patches land before even discussing the above?

Maybe a bit - maybe it is better to wait and see what happens to the expressions refactor idea. It is interesting to see that the expressions refactor patch has led to multiple discussions about how to use it for other new features. So, expressions refactor is definitely doing something right.

 - Anssi

Curtis Maloney

unread,
Jun 18, 2014, 8:48:07 AM6/18/14
to django-d...@googlegroups.com
Would it be possible to have some sort of registration pattern, allowing people to import per-backend and per-project libs to override and extend the various available functions?

I realise this is nothing more than a sanctioned form of monkey patching, but it would at least provide a common pattern for people to do so, and hopefully make the whole process more manageable and predictable.

--
Curtis



--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/daf867fe-bedf-4134-bc92-728e2fa2c17f%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Josh Smeaton

unread,
Jun 18, 2014, 9:19:51 AM6/18/14
to django-d...@googlegroups.com
At the moment the API sanctions monkey patching by providing the as_vendor() method (from the tests):

def lower_case_function_override(self, compiler, connection):
                sql, params = compiler.compile(self.expressions[0])
                substitutions = dict(function=self.function.lower(), expressions=sql)
                substitutions.update(self.extra)
                return self.template % substitutions, params
            setattr(Sum, 'as_' + connection.vendor, lower_case_function_override)

Though this isn't as nice as the registration pattern used by the lookups/transforms that landed for 1.7. Libraries can register their overrides using AppConfigs, and users can import backend specific functions from libraries. Is this the kind of thing you were thinking of? 

Josh

Aymeric Augustin

unread,
Jun 18, 2014, 9:31:39 AM6/18/14
to django-d...@googlegroups.com
2014-06-18 15:19 GMT+02:00 Josh Smeaton <josh.s...@gmail.com>:
At the moment the API sanctions monkey patching by providing the as_vendor() method

Yes, I'm not happy with the as_<vendor>() API either, because it creates a large asymmetry between the core and non-core backends.

-- 
Aymeric.

Josh Smeaton

unread,
Jun 18, 2014, 9:57:33 AM6/18/14
to django-d...@googlegroups.com
Consider library authors:

class WindowFunction(Func):
    def as_sql(self, compiler, connection):  # standard implementation
        return compiler.compile(self.expressions)
    def as_sqlite(self, compiler, connection):  # unsupported first party backend
        raise NotSupportedException("WindowFunction not supported with SQLite")
    def as_mssql(self, compiler, connection):  # third party backend
        self.function = "SlightlyDifferentFunctionName"
        return self.as_sql(compiler, connection)

I think the as_{vendor} makes all backends equal - both core and non-core backends. We all use the same API. We can anticipate some of the backends that will use a particular function, but make it possible and easy for other backends to jump onboard. This is in contrast to the existing process of providing implementations directly inside the backend (DatabaseOperations). But this is a part that I'm struggling with. Should we push Function implementations down into DatabaseOperations, or continue with the as_vendor? I really like the as_vendor implementation for Expressions since there will be a large number of Expressions which will work across all backends with the default as_sql(), but there is an easy extension point. I see libraries providing a larger set of expressions than django does.

I don't like the setattr(Expression.. ) syntax for providing implementations though. I'd probably prefer a .register(vendor, function).

Josh

Marc Tamlyn

unread,
Jun 18, 2014, 10:08:45 AM6/18/14
to django-d...@googlegroups.com

Worth noting that Lookups now have exactly the same as_vendor API. The registration API simply allows you to overload the lookup table with a subclass rather than just monkeypatching.

This is harder here are the classes are used directly. We could provide an expressions registry I guess...

Marc

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.

Carl Meyer

unread,
Jun 18, 2014, 1:19:08 PM6/18/14
to django-d...@googlegroups.com
On 06/18/2014 08:08 AM, Marc Tamlyn wrote:
> Worth noting that Lookups now have exactly the same as_vendor API. The
> registration API simply allows you to overload the lookup table with a
> subclass rather than just monkeypatching.
>
> This is harder here are the classes are used directly. We could provide
> an expressions registry I guess...

Am I missing something? It seems to me that the fact that Functions are
imported and used directly makes everything easier, and reduces the need
for monkeypatching compared to Lookups. If you need a Function in your
project (whether provided by Django or by a third-party library), and
the Function doesn't natively support the database backend you're using,
you can simply subclass it, add the appropriate as_vendor method, and
use your subclass.

I guess the case that isn't handled as nicely is where a third-party app
is itself doing the query and using the Function. In my experience, this
is much less common - I very rarely use third-party apps that define
models or perform queries themselves. Perhaps it's more common for
others. I suppose pushing implementation down to DatabaseOperations
would make this case a little easier to handle (at the cost of making
the above case harder - a custom database backend is a bigger deal than
subclassing a Function). Personally I think in this situation the
third-party app should make the querying behavior
extendable/overridable, or if all else fails you just monkeypatch the
Function it's using with the necessary as_vendor.

I may be missing something, but I don't see any need for / value in a
registration API for functions. Global registries should be a reluctant
last resort in API design.

Carl

Marc Tamlyn

unread,
Jun 18, 2014, 1:40:42 PM6/18/14
to django-d...@googlegroups.com
I agree with your assessment. The problem I suppose is that 99% of django users use a third-party app with models and queries - it's called the admin. Whether this is in practice an issue in this case I don't know - I don't think there's much aggregation goes on there - but it's nonetheless an issue.

Also, at the moment 3rd party backends have the promise of "change your backend setting to this and everything will work" - not to also change a load of imports to use their aggregation functions. Obviously if you're using backend specific things then ok, but anything Django provides should have an appropriate backend hook.

Personally, I happy with the current API design now we have AppConfigs as a way for third party backends to officially monkeypatch Django - though I can see the arguments that this in itself seems like poor API design.

Marc


--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.

Carl Meyer

unread,
Jun 18, 2014, 2:00:13 PM6/18/14
to django-d...@googlegroups.com
On 06/18/2014 11:40 AM, Marc Tamlyn wrote:
> I agree with your assessment. The problem I suppose is that 99% of
> django users use a third-party app with models and queries - it's called
> the admin. Whether this is in practice an issue in this case I don't
> know - I don't think there's much aggregation goes on there - but it's
> nonetheless an issue.

Point taken re the admin - but I doubt the core admin will be changed to
use Functions (in a non-overridable way). Maybe I'm wrong.

> Also, at the moment 3rd party backends have the promise of "change your
> backend setting to this and everything will work" - not to also change a
> load of imports to use their aggregation functions. Obviously if you're
> using backend specific things then ok, but anything Django provides
> should have an appropriate backend hook.

Yeah, I think this is the strongest argument.

The problem is that we want to support both third-party Functions and
third-party database backends, and sometimes getting those to work
together will require integration work. The question is whether that
integration happens via a custom database backend (more work, but always
under the control of the integrator) or a customized Function (simpler
and less invasive, but requires monkeypatching if the Function is used
in third-party code).

If we choose the former, we're making third-party Functions a
second-class citizen. If we choose the latter, we're making third-party
database backends the second-class citizen.

I can see good arguments for either of those choices.

Maybe a form of registry is a reasonable answer. I wouldn't see it as a
registry of Functions; in the common case Functions shouldn't require
any registration. But it could be a registry of "Function compiler
overrides" - if your project uses a third-party Function that doesn't
output the correct SQL for your database backend, you can call a method
on your backend that registers a different SQL translation for that
particular Function. It's still a bit monkeypatchy, but it seems as good
as the other available options.

Carl

> Personally, I happy with the current API design now we have AppConfigs
> as a way for third party backends to officially monkeypatch Django -
> though I can see the arguments that this in itself seems like poor API
> design.
>
> Marc
>
>
> On 18 June 2014 18:18, Carl Meyer <ca...@oddbird.net
> <mailto:django-developers%2Bunsu...@googlegroups.com>.
> To post to this group, send email to
> django-d...@googlegroups.com
> <mailto:django-d...@googlegroups.com>.
> --
> You received this message because you are subscribed to the Google
> Groups "Django developers" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to django-develop...@googlegroups.com
> <mailto:django-develop...@googlegroups.com>.
> To post to this group, send email to django-d...@googlegroups.com
> <mailto:django-d...@googlegroups.com>.
> Visit this group at http://groups.google.com/group/django-developers.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/CAMwjO1H%3DFir4Gpgm%2Btw01dsNN4w6%2BXMU0wdxYjcixUmjbAwJXQ%40mail.gmail.com
> <https://groups.google.com/d/msgid/django-developers/CAMwjO1H%3DFir4Gpgm%2Btw01dsNN4w6%2BXMU0wdxYjcixUmjbAwJXQ%40mail.gmail.com?utm_medium=email&utm_source=footer>.

Aymeric Augustin

unread,
Jun 18, 2014, 4:59:33 PM6/18/14
to django-d...@googlegroups.com
2014-06-18 19:18 GMT+02:00 Carl Meyer <ca...@oddbird.net>:
If you need a Function in your
project (whether provided by Django or by a third-party library), and
the Function doesn't natively support the database backend you're using,
you can simply subclass it, add the appropriate as_vendor method, and
use your subclass.

I was thinking about third-party database backends. How are they expected
to ship customised implementations of Functions included in Django?

--
Aymeric.

Carl Meyer

unread,
Jun 18, 2014, 5:23:20 PM6/18/14
to django-d...@googlegroups.com
On 06/18/2014 02:59 PM, Aymeric Augustin wrote:
> 2014-06-18 19:18 GMT+02:00 Carl Meyer <ca...@oddbird.net
> <mailto:ca...@oddbird.net>>:
Yes, I wasn't thinking about this case in my first mail, and I agree
that this is a serious problem for the pure as_vendor() approach; it's
not acceptable IMO to expect third-party database backends to
monkeypatch their as_vendor() methods onto the built-in Function classes.

I think database backends should have a hook to override the SQL
implementation of any given Function. I don't think this needs to imply
pushing the default implementation of all Functions down into the
database backend (that just makes life unnecessarily difficult for
third-party Functions).

It may be necessary to support both as_vendor() methods and a hook in
the database backend (with the database backend taking priority). That
allows third-party database backends to Just Work with all built-in
Functions, and third-party Functions to Just Work with all built-in
database backends, with no monkeypatching needed for either. Then the
only case in which some form of monkeypatching (perhaps supported by a
public API to register additional function-implementation overrides at
ready()) would be needed would be a third-party Function used with a
third-party database backend.

Carl

Chris Wilson

unread,
Jun 18, 2014, 5:56:09 PM6/18/14
to django-d...@googlegroups.com
Hi Carl,

On Wed, 18 Jun 2014, Carl Meyer wrote:
> On 06/18/2014 02:59 PM, Aymeric Augustin wrote:
>> 2014-06-18 19:18 GMT+02:00 Carl Meyer <ca...@oddbird.net
>> <mailto:ca...@oddbird.net>>:
>>
>>> If you need a Function in your project (whether provided by Django or
>>> by a third-party library), and the Function doesn't natively support
>>> the database backend you're using, you can simply subclass it, add the
>>> appropriate as_vendor method, and use your subclass.
>>
>> I was thinking about third-party database backends. How are they
>> expected to ship customised implementations of Functions included in
>> Django?
>
> Yes, I wasn't thinking about this case in my first mail, and I agree
> that this is a serious problem for the pure as_vendor() approach; it's
> not acceptable IMO to expect third-party database backends to
> monkeypatch their as_vendor() methods onto the built-in Function classes.
>
> I think database backends should have a hook to override the SQL
> implementation of any given Function. I don't think this needs to imply
> pushing the default implementation of all Functions down into the
> database backend (that just makes life unnecessarily difficult for
> third-party Functions).

Why does it even need a hook or monkey-patching? Why not pass the
expression tree to the SQL backend (as the most natural format for
programmatically parsing), for it to render out in backend-specific SQL as
it sees fit? Perhaps relying in inherited base methods in most cases, and
overriding a few methods where that backend needs to render a particular
feature differently?

Cheers, Chris.
--
Aptivate | http://www.aptivate.org | Phone: +44 1223 967 838
Citylife House, Sturton Street, Cambridge, CB1 2QF, UK

Aptivate is a not-for-profit company registered in England and Wales
with company number 04980791.

Carl Meyer

unread,
Jun 18, 2014, 6:01:55 PM6/18/14
to django-d...@googlegroups.com
Hi Chris,
Yes - that's going all the way to the other end of the spectrum, making
life easier for third-party database backends at the expense of
third-party Functions. How, in your plan, is a third-party Function
(which won't be recognized by any database backend's SQL emitter)
supposed to provide "just works" support for all built-in database
backends, presuming it needs different SQL for different backends?

Carl

Chris Wilson

unread,
Jun 18, 2014, 6:42:18 PM6/18/14
to django-d...@googlegroups.com
Good point, thanks :)

If neither the backend knows about the function, nor the function about
the backend, then a third-party module would have to register support for
both (e.g. "I know how to convert a Case() Function into HSQLDB SQL").
Perhaps we could have a registry where third-party modules could register
such support.

Lacking that, perhaps it's just a bug? E.g. "Our Case() Function works on
Postgres, MySQL and MSSQL, but not on Oracle because we haven't got
around to implementing the patch in the Oracle backend to support it,
patches welcome." A backend could easily detect a Function that it doesn't
understand in the expression tree, give it a chance to render itself, and
if that fails then it could raise a warning or an error?

Carl Meyer

unread,
Jun 18, 2014, 7:00:17 PM6/18/14
to django-d...@googlegroups.com
On 06/18/2014 04:41 PM, Chris Wilson wrote:
>>>> I think database backends should have a hook to override the SQL
>>>> implementation of any given Function. I don't think this needs to
>>>> imply pushing the default implementation of all Functions down into
>>>> the database backend (that just makes life unnecessarily difficult
>>>> for third-party Functions).
>>>
>>> Why does it even need a hook or monkey-patching? Why not pass the
>>> expression tree to the SQL backend (as the most natural format for
>>> programmatically parsing), for it to render out in backend-specific
>>> SQL as it sees fit? Perhaps relying in inherited base methods in most
>>> cases, and overriding a few methods where that backend needs to
>>> render a particular feature differently?
>>
>> Yes - that's going all the way to the other end of the spectrum,
>> making life easier for third-party database backends at the expense of
>> third-party Functions. How, in your plan, is a third-party Function
>> (which won't be recognized by any database backend's SQL emitter)
>> supposed to provide "just works" support for all built-in database
>> backends, presuming it needs different SQL for different backends?
>
> Good point, thanks :)
>
> If neither the backend knows about the function, nor the function about
> the backend, then a third-party module would have to register support
> for both (e.g. "I know how to convert a Case() Function into HSQLDB
> SQL"). Perhaps we could have a registry where third-party modules could
> register such support.

Yes; this is what I meant by "perhaps supported by a
public API to register additional function-implementation overrides at
ready()" in my mail that you originally replied to.

Carl

Josh Smeaton

unread,
Jun 18, 2014, 7:56:10 PM6/18/14
to django-d...@googlegroups.com
Hi All,

I think there may be some confusion here since I didn't actually explain myself properly to begin with. Let's run through an example:

class Lower(Func):
    function = 'LOWER'

This function will work for all core backends, but it will also work for all third party backends. Third party backends don't have to implement `as_vendor()`, it's just there incase the default doesn't work for them. MSSQL supports the function name "LOWER", so this function will automatically work for django-mssql.

Now let's say that a mongodb backend wants to use the Lower function, but the function name is different. What mongo could do, is provide their own implementation:

def mongo_lower(self, compiler, connection):
    self.function = '$toLower'
    return self.as_sql(compiler, connection)
 
setattr(Lower, 'as_mongo', mongo_lower)
 # or perhaps a friendlier registration method:
Lower.register('mongo', mongo_lower)  # register would internally just call the setattr() above 

I think this process is pretty good. Everyone gets to use the default implementation for free (function signatures and names are generally pretty stable across backends), but the option exists to easily extend or modify the generated SQL where needed. I think this equally supports 3rd party backends and function authors.

Do you think this is appropriate? Have I missed something fundamental above where this process isn't fair to backends or library authors?

Josh

Anssi Kääriäinen

unread,
Jun 19, 2014, 3:24:42 AM6/19/14
to django-d...@googlegroups.com
On Thu, 2014-06-19 at 02:56 +0300, Josh Smeaton wrote:

>
> Lower.register('mongo', mongo_lower) # register would
> internally just call the setattr() above
>
>
> I think this process is pretty good. Everyone gets to use the default
> implementation for free (function signatures and names are generally
> pretty stable across backends), but the option exists to easily extend
> or modify the generated SQL where needed. I think this equally
> supports 3rd party backends and function authors.
>
>
> Do you think this is appropriate? Have I missed something fundamental
> above where this process isn't fair to backends or library authors?

This is pretty much how I intended this to work. There are two downsides
with this approach:
1) We don't have .register() method. If we go with this plan we might
still be able to add .register() to 1.7 (it is a trivial classmethod).
2) The as_vendor() method will need to do a bit more, and know a bit
more about the expression than is strictly needed. The backend approach
would be:
def as_sql(self, compiler, connection):
lhs_sql, lhs_params = compiler.compile(self.lhs)
rhs_sql, rhs_params = compiler.compile(self.rhs)
return connection.less_than_sql(lhs_sql, rhs_sql, lhs_params,
rhs_params)
# In mongo backend
def less_than_sql(self, lhs_sql, rhs_sql, lhs_params, rhs_params):
return "%s < %s" % (lhs_sql, rhs_sql), (lhs_params, rhs_params)

The as_vendor approach gives:
def as_mongo(self, compiler, connection):
lhs_sql, lhs_params = compiler.compile(self.lhs)
rhs_sql, rhs_params = compiler.compile(self.rhs)
return "%s < %s" % (lhs_sql, rhs_sql), (lhs_params, rhs_params)

That is, the as_{vendor} methods need to repeat the compile() calls, and
also know how to compile the internal attributes of the LessThan class.

I'm not sure how to design the backend approach in a way that is both
fast, and also allows for default implementation, and custom
implementation if a backend needs it.

As said before the lookups and transforms already use as_{vendor}. I
hope it will work just fine, but it is hard to be sure about this. If we
add the .register() API to lookups and transforms to 1.7, then we can
change how the registration works internally, and also deprecate the
current way if there is need to do that.

- Anssi


Shai Berger

unread,
Jun 19, 2014, 7:04:59 AM6/19/14
to django-d...@googlegroups.com
Hi,

A minor style point:

On Thursday 19 June 2014 02:56:10 Josh Smeaton wrote:
>
> class Lower(Func):
> function = 'LOWER'
>
>[...]
>
> def mongo_lower(self, compiler, connection):
> self.function = '$toLower'
> return self.as_sql(compiler, connection)
>
> setattr(Lower, 'as_mongo', mongo_lower)
>
I would spell this as

Lower.as_mongo = mongo_lower

Is there a reason I'm missing not to do so?

Thanks,
Shai.

Josh Smeaton

unread,
Jun 19, 2014, 7:15:27 AM6/19/14
to django-d...@googlegroups.com
Nope that's completely fine. I copied that example from the tests earlier on which interpolates connection.vendor into a string. If you know which backend you're modifying (and you should) then Func.as_vendor = .. Would be preferred.
> --
> You received this message because you are subscribed to a topic in the Google Groups "Django developers" group.
> To unsubscribe from this topic, visit https://groups.google.com/d/topic/django-developers/HggiPzwkono/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to django-develop...@googlegroups.com.
> To post to this group, send email to django-d...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/201406191404.32091.shai%40platonix.com.

Anssi Kääriäinen

unread,
Jun 27, 2014, 7:04:28 AM6/27/14
to django-d...@googlegroups.com
On Thursday, June 19, 2014 12:23:20 AM UTC+3, Carl Meyer wrote:
On 06/18/2014 02:59 PM, Aymeric Augustin wrote:
> 2014-06-18 19:18 GMT+02:00 Carl Meyer <ca...@oddbird.net
> <mailto:ca...@oddbird.net>>:
>
>     If you need a Function in your
>     project (whether provided by Django or by a third-party library), and
>     the Function doesn't natively support the database backend you're using,
>     you can simply subclass it, add the appropriate as_vendor method, and
>     use your subclass.
>
>
> I was thinking about third-party database backends. How are they expected
> to ship customised implementations of Functions included in Django?

Yes, I wasn't thinking about this case in my first mail, and I agree
that this is a serious problem for the pure as_vendor() approach; it's
not acceptable IMO to expect third-party database backends to
monkeypatch their as_vendor() methods onto the built-in Function classes.

I think database backends should have a hook to override the SQL
implementation of any given Function. I don't think this needs to imply
pushing the default implementation of all Functions down into the
database backend (that just makes life unnecessarily difficult for
third-party Functions).

This is possible to do by supplying a custom SQLCompiler class for the backend, and overriding its .compile() method.

Personally I don't see usage of as_vendor() as that problematic. Supplying as_vendor in first DatabaseWrapper.__init__() should work. Yes, we are dynamically altering the node's class. But we are just using the __dict__ of the node to store backend specific implementation instead of using a dictionary somewhere in the backend for the implementation. Using the __dict__ has the advantage that inheritance works automatically.

As multiple core developers have said they don't like as_vendor() for 3rd party backend support it might be better to offer official hook for altering the generated SQL. That is, change the SQLCompiler.compile() method from

    def compile(self, node):
        vendor_impl = getattr(
            node, 'as_' + self.connection.vendor, None)
        if vendor_impl:
            return vendor_impl(self, self.connection)
        else:
            return node.as_sql(self, self.connection)

to

    def compile(self, node):
        ret = self.connection.compile(node)
        if ret is not None:
            return ret
        vendor_impl = getattr(
            node, 'as_' + self.connection.vendor, None)
        if vendor_impl:
            return vendor_impl(self, self.connection)
        else:
            return node.as_sql(self, self.connection)

The main reason for offering a hook is that overriding SQLCompiler is a bit problematic. There exists various subclasses of SQLCompiler, and the backend needs to support those, too.

 - Anssi

Josh Smeaton

unread,
Jun 27, 2014, 7:27:08 AM6/27/14
to django-d...@googlegroups.com
I agree that 3rd party backends should have a hook to provide their own customisations. I still think that the as_vendor approach is useful provided that the backend has a chance to register their customisations (where required - they still automatically use the default as_sql() method if there is no as_vendor method) at an appropriate time. I guess I don't really understand the particular issues with registering custom as_vendor implementations.

Yes, I wasn't thinking about this case in my first mail, and I agree 
that this is a serious problem for the pure as_vendor() approach; it's 
not acceptable IMO to expect third-party database backends to 
monkeypatch their as_vendor() methods onto the built-in Function classes. 

Why is this not acceptable? I think I'm missing something.

- Josh

Michael Manfre

unread,
Jun 27, 2014, 9:07:04 AM6/27/14
to django-d...@googlegroups.com
On Fri, Jun 27, 2014 at 7:04 AM, Anssi Kääriäinen <anssi.ka...@thl.fi> wrote:
This is possible to do by supplying a custom SQLCompiler class for the backend, and overriding its .compile() method.

Personally I don't see usage of as_vendor() as that problematic. Supplying as_vendor in first DatabaseWrapper.__init__() should work. Yes, we are dynamically altering the node's class. But we are just using the __dict__ of the node to store backend specific implementation instead of using a dictionary somewhere in the backend for the implementation. Using the __dict__ has the advantage that inheritance works automatically.

As multiple core developers have said they don't like as_vendor() for 3rd party backend support it might be better to offer official hook for altering the generated SQL. That is, change the SQLCompiler.compile() method from

    def compile(self, node):
        vendor_impl = getattr(
            node, 'as_' + self.connection.vendor, None)
        if vendor_impl:
            return vendor_impl(self, self.connection)
        else:
            return node.as_sql(self, self.connection)

to

    def compile(self, node):
        ret = self.connection.compile(node)

If this is going to be moved off SQLCompiler, I think DatabaseOperations is a better place for it because we already have DatabaseOperatoins.combine_expression. The compile method will need access to the compiler to match the signature of as_sql (and it is necessary in some cases to inspect the query).
 
The main reason for offering a hook is that overriding SQLCompiler is a bit problematic. There exists various subclasses of SQLCompiler, and the backend needs to support those, too.

All of the subclasses are going to inherit SQLCompiler.compile and do the same exact logic. I don't understand what problem is caused by overriding compile that is solved by a function call to compile on some other class.

Regards,
Michael Manfre

Muskan arora

unread,
Jun 27, 2014, 9:53:03 AM6/27/14
to django-d...@googlegroups.com
I am working on a Django project in which I need to calculate sum in
models.py/admin.py.Is it possible? If yes,then how?

Muskan
Blog: http://muskana912.wordpress.com/
I've learned more by not following bad examples than by following good
examples. - Paulo Coelho ! :)




> You received this message because you are subscribed to the Google Groups
> "Django developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-develop...@googlegroups.com.
> To post to this group, send email to django-d...@googlegroups.com.
> Visit this group at http://groups.google.com/group/django-developers.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/daf867fe-bedf-4134-bc92-728e2fa2c17f%40googlegroups.com.

Anssi Kääriäinen

unread,
Jun 30, 2014, 3:04:53 AM6/30/14
to django-d...@googlegroups.com
On Friday, June 27, 2014 4:07:04 PM UTC+3, Michael Manfre wrote:
If this is going to be moved off SQLCompiler, I think DatabaseOperations is a better place for it because we already have DatabaseOperatoins.combine_expression. The compile method will need access to the compiler to match the signature of as_sql (and it is necessary in some cases to inspect the query).

+1 to both.

 
The main reason for offering a hook is that overriding SQLCompiler is a bit problematic. There exists various subclasses of SQLCompiler, and the backend needs to support those, too.

All of the subclasses are going to inherit SQLCompiler.compile and do the same exact logic. I don't understand what problem is caused by overriding compile that is solved by a function call to compile on some other class.

There are various subclasses of compiler in the standard ORM. A custom backend needs to provide all of them. The problem is best explained by looking at mysql/compiler.py: https://github.com/django/django/blob/master/django/db/backends/mysql/compiler.py.

 - Anssi

Josh Smeaton

unread,
Jun 30, 2014, 3:49:00 AM6/30/14
to django-d...@googlegroups.com
If this is going to be moved off SQLCompiler, I think DatabaseOperations is a better place for it because we already have DatabaseOperatoins.combine_expression. The compile method will need access to the compiler to match the signature of as_sql (and it is necessary in some cases to inspect the query).

So what kind of hook are we looking at here? Supplying a different `.compile()` method that gets the node passed in? Won't that lead to all kinds of isinstance checks to see if the compiler wants to change the SQL of a particular node? Can you maybe give me a brief example of what is wanted here if I've misunderstood above?

> There are various subclasses of compiler in the standard ORM. A custom backend needs to provide all of them. The problem is best explained by looking at mysql/compiler.py: https://github.com/django/django/blob/master/django/db/backends/mysql/compiler.py.

Yes, but all of the other compilers inherit the default SQLCompiler, and will inherit the .compile method since they all subclass SQLCompiler. I think that's what Michael was getting at.

- Josh

Anssi Kääriäinen

unread,
Jun 30, 2014, 4:28:10 AM6/30/14
to django-d...@googlegroups.com
On Mon, 2014-06-30 at 10:48 +0300, Josh Smeaton wrote:

> So what kind of hook are we looking at here? Supplying a different
> `.compile()` method that gets the node passed in? Won't that lead to
> all kinds of isinstance checks to see if the compiler wants to change
> the SQL of a particular node? Can you maybe give me a brief example of
> what is wanted here if I've misunderstood above?

To me it seems there are two options - do isinstance() checks, or
provide a dictionary of class -> method. If you go with isinstance()
checks that could realistically lead to dozens of isinstance checks
chained in a long if-elif list.

The dictionary approach seems easier to support:

def compile(self, node, compiler):
if node.__class__ in self.node_implementations:
return self.node_implementation[node.__class__](
node, compiler, self.connection).

Still, I don't see *any* advantage of doing this compared to just
providing the same implementation method in the node itself with the
as_vendor syntax. I don't know of any problem that is caused by using
as_vendor(), but avoided using backend specific .compile() hook.
Registering the as_vendor() methods has been the biggest complaint, but
that can be avoided by doing the registration on first __init__ of the
backend - there is no way SQL is going to be generated for given backend
without it being initialized first.

-Anssi


Josh Smeaton

unread,
Jun 30, 2014, 5:56:50 AM6/30/14
to django-d...@googlegroups.com
>Still, I don't see *any* advantage of doing this compared to just 
providing the same implementation method in the node itself with the 
as_vendor syntax

Me neither. I think I confused the subject when I brought up putting differences into the backend - but I was trying to highlight the difference because it's not usually how django does things. If someone has an actual concern with the `as_vendor()` can you please mention it, and why. Otherwise I think we can proceed as-is with the as_vendor. Do you agree Anssi?

Josh

Anssi Kääriäinen

unread,
Jun 30, 2014, 6:54:54 AM6/30/14
to django-d...@googlegroups.com
On Mon, 2014-06-30 at 12:56 +0300, Josh Smeaton wrote:
> >Still, I don't see *any* advantage of doing this compared to just
> providing the same implementation method in the node itself with the
> as_vendor syntax


>
>
> Me neither. I think I confused the subject when I brought up putting
> differences into the backend - but I was trying to highlight the
> difference because it's not usually how django does things. If someone
> has an actual concern with the `as_vendor()` can you please mention
> it, and why. Otherwise I think we can proceed as-is with the
> as_vendor. Do you agree Anssi?

We can continue with as_vendor() no matter what the resolution is. We
already have the exact same problem for Lookups and Transforms, so we
aren't adding new problems by continuing with as_vendor().

Michael feels backend.ops.compile() is useful for django-mssql. Also,
multiple core developers think the as_vendor() syntax is bad because it
is monkey patching, or places 3rd party backends in unfair position. I
don't agree, but I don't see other core developers supporting my
position.

- Anssi
> --
> You received this message because you are subscribed to the Google
> Groups "Django developers" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to django-develop...@googlegroups.com.
> To post to this group, send email to
> django-d...@googlegroups.com.
> Visit this group at http://groups.google.com/group/django-developers.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/92b100f7-3daa-4e7b-ade6-7e8fa24ed046%40googlegroups.com.

Marc Tamlyn

unread,
Jun 30, 2014, 8:08:38 AM6/30/14
to django-d...@googlegroups.com
FWIW I agree with Anssi

Marc


Loic Bistuer

unread,
Jun 30, 2014, 9:16:17 AM6/30/14
to django-d...@googlegroups.com
+1, as_vendor() is IMO rather clean and expressive as a end user API.

3rd party backends are indeed a concern but we discussed this briefly on IRC today and I believe we have an acceptable solution for them, Josh is planning to write up about it.

-- 
Loic

Josh Smeaton

unread,
Jun 30, 2014, 8:11:22 PM6/30/14
to django-d...@googlegroups.com
This is how we think we can ask 3rd party backends to provide implementations for functions they need to modify:


class DatabaseWrapper(BaseDatabaseWrapper):

    def __init__(self, *args, **kwargs):
        self.patch_functions()

    def patch_functions(self):
        """ 
        all other functions work correctly, so we just
        need to patch Length
        """
        from django.db.models.functions import Length
        def new_length(self, compiler, connection):
            self.function = 'LEN' # rather than 'LENGTH'
            return self.as_sql(compiler, connection)
        Length.as_mssql = new_length

I think this is rather elegant - patch what you need to, and leave everything else. It uses the same API we'd ask 3rd party libraries to use so there is no disconnect between libraries and backends. Anssi pointed out the DatabaseWrapper.__init__ method as the place of registration, since it will be run before any functions are created. Perhaps we could create a new method on DatabaseWrapper called `provide_function_implementations` that 3rd party backends can override to effectively replace a custom `patch_functions`, but it'd really only be a "blessed" method used for documenting the agreed extension point.

Carl, Michael, and Aymeric, I'm interested in hearing your thoughts on this. Does this work for you?

Regards,

Shai Berger

unread,
Jul 1, 2014, 10:49:55 AM7/1/14
to django-d...@googlegroups.com
On Monday 30 June 2014 17:11:22 Josh Smeaton wrote:
> This is how we think we can ask 3rd party backends to provide
> implementations for functions they need to modify:
>
>
> class DatabaseWrapper(BaseDatabaseWrapper):
>
> def __init__(self, *args, **kwargs):
> self.patch_functions()
>
> def patch_functions(self):
> """
> all other functions work correctly, so we just
> need to patch Length
> """
> from django.db.models.functions import Length
> def new_length(self, compiler, connection):
> self.function = 'LEN' # rather than 'LENGTH'
> return self.as_sql(compiler, connection)
> Length.as_mssql = new_length
>
>

I think we can reach very similar results, with a much nicer API, by letting
the backends just override the function classes.

In more detail: Add to DatabaseWrapper an attributes, say "functions" or
"overrides" (if the same mechanism is used for lookups and transforms), such
that -- if the attribute has an attribute named like the function class, it is
used instead of the class, and its as_sql is called.

Thus, for the above example, in the backend somewhere, we'd have

class Overrides(object):

class Length(models.Length):
function = 'LEN'

# ... any other overrides similarly defined

In the DatabaseWrapper, we'll just have

class DatabaseWrapper(BaseDatabaseWrapper):

def __init__(self, *args, **kwargs):
...
self.overrides = Overrides()

Much like we combine other parts into the wrapper.

This allows, IMO, more elegant coding of the backend.

I specified that if we take the class from the backend then we use as_sql(),
rather than as_vendor(), because

a) As in the exmaple above, we then don't even always have to define
as_vendor();

b) It allows us to still support as_vendor() in the function classes
themselves; this is an elegant way for the Function author to support different
backends (including 3rd-party ones).

I've hinted before that I'm developing ideas about "DbConfig classes", similar
to AppConfig classes. Such overrides would certainly be part of those configs,
and would allow an integrator to easily add support of a 3rd- (or 1st-) party
function in a 3rd- (or 1st-) party backend.

Just to summarize, I propose that the function used will be

1) If hasattr(connection.overrides, "Func"): connection.overrides.Func.as_sql

2) else, if hasattr(models.Func, "as_vendor"): models.Func.as_vendor

3) else models.Func.as_sql

Implementation-wise, I think this should be done by switching the class of the
Function object if necessary "under its feet"; that's a slightly fishy, monkey-
patchy operation, but it would be written once in core code, instead of in
every backend.

Rebuttals welcome,

Shai.

Anssi Kääriäinen

unread,
Jul 1, 2014, 11:26:36 AM7/1/14
to django-d...@googlegroups.com
On Tue, 2014-07-01 at 17:49 +0300, Shai Berger wrote:

> I think we can reach very similar results, with a much nicer API, by letting
> the backends just override the function classes.

The backend specific class must be instantiated at query compile time.
How is the data contained in the original class copied to the overriding
class? One possibility is to copy the original instance's __dict__ to
the overriding class.

Also, "if the attribute has an attribute named like the function class"
isn't good, clashes between plain class names is expected. The overrides
could be a dictionary of original class -> overriding class. So,
something like this:

class BackendLower(models.Lower):
def __init__(self, from_node):
# The hacky way
self.__dict__ = from_node.__dict__

def as_sql(self, qn, connection):
....
overrides = {models.Lower: BackendLower}

And compiler.compile() would look something like this:

if node.__class__ in connection.overrides:
return connection.overrides[node.__class__](node).as_sql(...)
else:
return node.as_sql(...)

Using the above way there is no need to alter classes dynamically. On
the other hand, copying data from the original class to the implementing
class can be risky (circular dependencies for example), and it is a bit
more expensive. Still, inheritance will not work like with as_vendor(),
but I am not sure how common subclasses without custom as_sql() are.

- Anssi

Michael Manfre

unread,
Jul 1, 2014, 3:11:20 PM7/1/14
to django-d...@googlegroups.com
Monkey patching is not really an API hook. It's what non-core code is forced to do when the core is not flexible enough to support what it needs. If this is the chosen route, which is the way it appears to be heading, then at least try to hide the fact that it's a monkey patch and mimic SQLAlchemy [1] with a decorator, or some type of register method.

I'm on the fence about whether I'd even use a monkey patch mechanism, instead of defining my own compile() for each of the SQLCompilers; DatabaseOperations.compile_node(self, compiler, node) would be better than overriding each compile. To me, it seems much more elegant to duck type the nodes as they're being compiled, instead of a giant import and monkey patch setup function.

Regards,
Michael Manfre



--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.

Aymeric Augustin

unread,
Jul 1, 2014, 3:16:02 PM7/1/14
to django-d...@googlegroups.com
On 1 juil. 2014, at 21:10, Michael Manfre <mma...@gmail.com> wrote:

> Monkey patching is not really an API hook. It's what non-core code is forced to do when the core is not flexible enough to support what it needs. If this is the chosen route, which is the way it appears to be heading, then at least try to hide the fact that it's a monkey patch and mimic SQLAlchemy [1] with a decorator, or some type of register method.

For what it’s worth, I heartily agree with that. Unfortunately, I don’t have the time to make an alternative proposal. I shan't be offended if we keep that API — or rather, the lack thereof.

--
Aymeric.




Josh Smeaton

unread,
Jul 1, 2014, 7:45:02 PM7/1/14
to django-d...@googlegroups.com
Thanks for the input here - I didn't think there'd be such an aversion to directly monkey patching the function objects, but there clearly is from backend maintainers and core developers. So time to move past that and come up with something that's acceptable.

The solution Anssi presented above in reply to Shai seems like overkill. The performance cost of copying attributes over alone worries me.

> Still, inheritance will not work like with as_vendor(), but I am not sure how common subclasses without custom as_sql() are

Nearly every Function implementation will inherit from the Func type without providing an `as_sql`, the it will be very common if I've understood correctly.

Would changing the compile method to first try a compile_node on operations suffice?

def compile(self, node):
        sql, params = self.connection.ops.compile_node(self, node)
        if sql is not None:
              return sql, params
        vendor_impl = getattr(
            node, 'as_' + self.connection.vendor, None)
        if vendor_impl:
            return vendor_impl(self, self.connection)
        else:
            return node.as_sql(self, self.connection)

This would also work for anything adhering to the Query Expression API. The issue I have with this is the overhead of calling ops.compile_node() for every single node when I *think* only a small percentage of nodes will require a backend implementation anyway. 


- Josh

Anssi Kääriäinen

unread,
Jul 2, 2014, 1:50:33 AM7/2/14
to django-d...@googlegroups.com



> > Still, inheritance will not work like with as_vendor(), but I am not
> sure how common subclasses without custom as_sql() are
>
>
> Nearly every Function implementation will inherit from the Func type
> without providing an `as_sql`, the it will be very common if I've
> understood correctly.

This is actually a good point against as_vendor() - Funcs do not need
custom as_sql(), they need custom class attribute (different SQL
function name) for some backends. Of course one can achieve that by
using as_vendor(), too.
>
> Would changing the compile method to first try a compile_node on
> operations suffice?
>
>
> def compile(self, node):
> sql, params = self.connection.ops.compile_node(self,
> node)
> if sql is not None:
> return sql, params
> vendor_impl = getattr(
> node, 'as_' + self.connection.vendor, None)
> if vendor_impl:
> return vendor_impl(self, self.connection)
> else:
> return node.as_sql(self, self.connection)
>
>
> This would also work for anything adhering to the Query Expression
> API. The issue I have with this is the overhead of calling
> ops.compile_node() for every single node when I *think* only a small
> percentage of nodes will require a backend implementation anyway.

The overhead is likely small, but it is hard to know without
benchmarking. The WhereNode instances go through compiler.compile(), so
generating SQL for qs.filter(*lots_of_args) should tell pretty well how
much overhead there is.

- Anssi


Anssi Kääriäinen

unread,
Jul 2, 2014, 4:22:15 AM7/2/14
to django-d...@googlegroups.com
On Wed, 2014-07-02 at 08:56 +0300, Anssi Kääriäinen wrote:
> The overhead is likely small, but it is hard to know without
> benchmarking. The WhereNode instances go through compiler.compile(), so
> generating SQL for qs.filter(*lots_of_args) should tell pretty well how
> much overhead there is.

The worst-case overhead of adding ops.compile_node(node, compiler) seems
to be around 5%. The test case is:

filter_clause = models.Q()
for i in range(0, 10000):
filter_clause |= models.Q(id__exact=i)
qs = AModel.objects.filter(filter_clause)
for i in range(0, 100):
str(qs.query)

The slowdown seems to be almost the same even if you use just 10 Q()
objects instead of the non-realistic 10000 Q() objects used above. The
reason is that when doing just str(qs.query) calls compiler.compile()
takes more than 60% of execution time even with 10 Q() objects. Main
portion of that execution time is spent in lookups.py as_sql(). Here is
top portion of the profile for 10 Q() objects, without
ops.compile_node() addition:

1 0.007 0.007 3.166 3.166 tester.py:1(<module>)
2000 0.011 0.000 2.622 0.001 query.py:188(__str__)
2000 0.011 0.000 2.611 0.001 query.py:199(sql_with_params)
2000 0.075 0.000 2.552 0.001 compiler.py:85(as_sql)
48000/4000 0.187 0.000 2.146 0.001 compiler.py:74(compile)
8000/4000 0.167 0.000 2.122 0.001 where.py:85(as_sql)
20000 0.148 0.000 1.723 0.000 lookups.py:148(as_sql)
20000 0.267 0.000 0.942 0.000 lookups.py:138(process_lhs)
41 0.006 0.000 0.608 0.015 __init__.py:1(<module>)
20000 0.127 0.000 0.582 0.000 lookups.py:96(process_rhs)
1 0.000 0.000 0.533 0.533 __init__.py:11(setup)
20000 0.081 0.000 0.360 0.000 lookups.py:87(get_db_prep_lookup)
2 0.001 0.001 0.331 0.165 log.py:1(<module>)
20000 0.047 0.000 0.322 0.000 lookups.py:92(process_lhs)
3 0.001 0.000 0.303 0.101 debug.py:1(<module>)
1 0.000 0.000 0.278 0.278 urlresolvers.py:8(<module>)
20000 0.149 0.000 0.257 0.000 __init__.py:655(get_db_prep_lookup)

The refactored lookup implementation is performing somewhat worse than
in 1.6, where compiler.py as_sql() takes only 1.996 seconds (2.552 for
1.7). We are just doing more in 1.7 than in 1.6, as the lookup's lhs is
query expression that needs to also go through compiler.compile().

There are some optimization points in 1.7 that could be used to remove
some of the overhead (process_rhs and get_db_prep_lookup coding
specifically). With some effort it is possible to get back to 1.6 speeds
if that is seen as important.

- Anssi


Josh Smeaton

unread,
Oct 24, 2014, 6:39:26 AM10/24/14
to django-d...@googlegroups.com
This discussion got a little off track, and I'd like to bring it back to the original purpose. Which SQL functions should we implement when the aggregate/annotation refactor lands? I'll take the output of this discussion and create a ticket in trac once we've at least got a start on a minimal subset. I'll start with a set of functions that should be common enough to warrant inclusion. Please add to this list (or suggest removals from the list).

- Coalesce
- ToLower/Lower
- ToUpper/Upper
- Length
- Case-When / IIF 
- Substring

I think this set is fairly ubiquitous, and should have good support from all the built in backends that Django supports. More esoteric functions or structures (like analytic functions) can always be implemented in libraries or possibly a contrib app if the community deems it worthwhile.

Thanks, 

Josh Smeaton

unread,
Nov 2, 2014, 6:21:36 PM11/2/14
to django-d...@googlegroups.com
I've opened the ticket https://code.djangoproject.com/ticket/23753 to track which functions should be implemented.
Reply all
Reply to author
Forward
0 new messages