non-relational DB


Waldemar Kornewald

Oct 22, 2009, 7:35:48 AM
to django-developers
Hi everyone,
this rather long mail contains a status report and instructions for
contributors and implementation notes for Django core developers. If
you only want to know the status you can stop after the first section.
If you want to contribute I hope this provides a good starting point
into our port.

---------------------------------------
Status report

We've got pretty far with our App Engine port. For example, the
sessions db and cached_db backends both work unmodified on App Engine.
You can also order results and use basic filter()s as supported by the
low-level App Engine API (gt, gte, lt, lte, exact, pk__in). You can
also use QuerySet.order_by(), .delete(), .count(), Model.save(),
.delete().

This is our second porting attempt (it's not in the old repository).
Our first attempt had too many conflicts with the multi-db branch
(esp. the one on github). This time we just hacked everything
together. We didn't concentrate on cleaning up the current backend
API. We've also disabled SQL support.

The next step is to move all the hacks into a nice backend API (at the
same time making sure that it won't conflict with multi-db) and
re-enable SQL support. That's where we need help. Also, if you want to
work on SimpleDB support this is the right time to join. The App
Engine backend itself can be handled by Thomas Wanschik and me -
contributions in this area are not absolutely necessary, so please
concentrate on the cleanup if you want to help.

Now to the details (for those who want to contribute).

---------------------------------------
Introducing QueryGlue

The old Django code was distributed across three layers:
* django.db.models.queryset.QuerySet
* django.db.models.sql.query.Query (from now on just sql.Query)
* backend

When a new QuerySet is instantiated (e.g. by calling
Model.objects.all()) it asks the backend for its Query class and then
creates an instance of that class. By default, this class is
sql.Query. Only the Oracle backend has its own Query which subclasses
sql.Query.

Normally, sql.Query builds the query on-the-fly. Whenever you call
QuerySet.filter(<filters>) the filters get put into a Q(<filters>)
and passed to sql.Query.add_q(Q(...)). This function iterates over all
filter rules in the Q object and calls sql.Query.add_filter() for each
individual filter.
This in turn directly modifies sql.Query.where which is a tree
structure that represents the WHERE clause. It already contains
information about the JOIN type for each filter (INNER, OUTER), the
fields that get referenced by the filter, the column and table
aliases, and so on. It already does a lot of what we need for
non-relational backends, but it's too SQL-specific.
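As a rough illustration of the call chain described above, here is a
heavily simplified sketch. It is not Django's actual code: SketchQuery
and filter_ are made-up names, the Q class is a bare stand-in, and the
real sql.Query.where is a tree that also records joins and aliases.
Lookups that span relations (e.g. bla__attr) are ignored here.

```python
class Q:
    """Minimal stand-in for Django's Q object: just holds keyword filters."""
    def __init__(self, **filters):
        self.children = sorted(filters.items())


class SketchQuery:
    """Hypothetical, heavily simplified stand-in for sql.Query."""
    def __init__(self):
        self.where = []  # the real sql.Query.where is a tree, not a list

    def add_q(self, q):
        # Walk every filter rule in the Q object ...
        for lookup, value in q.children:
            self.add_filter(lookup, value)

    def add_filter(self, lookup, value):
        # ... and record it directly in the WHERE structure.
        field, _, lookup_type = lookup.partition("__")
        self.where.append((field, lookup_type or "exact", value))


def filter_(query, **filters):
    """What QuerySet.filter() does in spirit: wrap in Q, hand to add_q()."""
    query.add_q(Q(**filters))
    return query


query = filter_(SketchQuery(), age__gt=18, name="Bob")
print(query.where)
```

The point of the sketch is only that every filter() call mutates the
backend-level structure immediately, which is what makes a later
switch to another backend awkward.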

The current behavior is also a problem for multi-db because it makes
too many assumptions about the storage format of the filter rules. The
user could call QuerySet.using(other_connection) anytime, so QuerySet
shouldn't really work with the low-level sql.Query class before it
actually executes the query.

We've solved this problem by introducing a backend-independent query
representation between QuerySet and the low-level Query (sql.Query,
appengine.Query, etc.). This representation is called QueryGlue. You
can find it in django.db.models.queryglue. It
provides almost exactly the same "public" API as sql.Query (so it can
easily be integrated with QuerySet). Each filter() call gets
translated into a tree structure that is inspired by sql.Query.where,
but it doesn't contain any information about the kind of JOIN.
Instead, it stores high-level important information like whether we're
filtering on a primary key, which columns and tables are involved in a
JOIN, etc.
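To make the idea of a backend-independent filter tree more concrete,
here is a minimal sketch. The class and attribute names (Filter,
FilterNode, is_pk, ...) are assumptions chosen for illustration and are
not taken from queryglue.py.

```python
from dataclasses import dataclass, field
from typing import Any, List, Union


@dataclass
class Filter:
    """One leaf: high-level facts only, no JOIN type, no table aliases."""
    path: List[str]          # e.g. ["author", "name"]
    lookup_type: str         # e.g. "exact", "gt"
    value: Any
    is_pk: bool = False      # True when filtering on a primary key


@dataclass
class FilterNode:
    """Inner node combining its children with AND or OR."""
    connector: str = "AND"
    negated: bool = False
    children: List[Union["FilterNode", Filter]] = field(default_factory=list)

    def add(self, child):
        self.children.append(child)
        return self

    def leaves(self):
        # Depth-first walk yielding every Filter leaf.
        for child in self.children:
            if isinstance(child, FilterNode):
                yield from child.leaves()
            else:
                yield child


tree = FilterNode()
tree.add(Filter(["pk"], "exact", 7, is_pk=True))
tree.add(FilterNode("OR").add(Filter(["age"], "gt", 18))
                         .add(Filter(["author", "name"], "exact", "Bob")))
print([leaf.path for leaf in tree.leaves()])
```

A backend can walk such a tree and decide for itself how (or whether)
to turn a multi-element path into a join.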

---------------------------------------
The low-level Query class

Once the query needs to be executed (e.g., by calling .count() or by
iterating over the query) the QueryGlue instance creates a new
low-level Query instance which gets the QueryGlue as its only
parameter. Currently, the low-level Query class is hard-coded to
GAEQuery/BaseQuery in django.db.models.nonrelational.query.

Then QueryGlue calls the respective execution function (results_iter(),
count(), etc.) on the instantiated low-level Query, whose constructor
received only the QueryGlue instance. Our GAEQuery can now iterate over
all filters in QueryGlue.filters and convert them to an App Engine
Query object.
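The deferred hand-off can be sketched roughly as follows. This is an
illustration under assumptions, not the code from our branch:
QueryGlueSketch and InMemoryQuery are made-up names, and the toy
low-level query filters a hard-coded list instead of talking to a
datastore.

```python
class QueryGlueSketch:
    """Hypothetical stand-in for QueryGlue: collects filters, defers execution."""
    def __init__(self, query_class):
        self.query_class = query_class
        self.filters = []

    def add_filter(self, field, lookup_type, value):
        self.filters.append((field, lookup_type, value))

    def count(self):
        # The low-level query is built only now, with the glue as sole argument.
        return self.query_class(self).count()

    def results_iter(self):
        return self.query_class(self).results_iter()


class InMemoryQuery:
    """Toy low-level Query; a real backend would query its datastore here."""
    ROWS = [{"age": 17}, {"age": 30}, {"age": 42}]
    OPS = {"exact": lambda a, b: a == b,
           "gt": lambda a, b: a > b,
           "lt": lambda a, b: a < b}

    def __init__(self, glue):
        self.glue = glue

    def results_iter(self):
        # Convert the glue's filters into whatever the backend understands.
        for row in self.ROWS:
            if all(self.OPS[op](row[f], v) for f, op, v in self.glue.filters):
                yield row

    def count(self):
        return sum(1 for _ in self.results_iter())


glue = QueryGlueSketch(InMemoryQuery)
glue.add_filter("age", "gt", 18)
print(glue.count(), list(glue.results_iter()))
```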

---------------------------------------
Subqueries

Instead of working with subquery classes we've added delete_bulk(),
insert(), etc. directly to QueryGlue and the low-level Query class. If
sql.Query really needs the current design those functions can still be
routed to the respective subquery instance, but on App Engine it's
easier to handle those operations in a separate function.

---------------------------------------
The cleanup

We made a few not-so-clean changes to Django itself. I've attached a
diff, so contributors can easily find all the changes we made to Django
(they're also commented with TODO and GAE):

............................
* disabled multi-table inheritance;
this could be emulated as described on the Django wiki
http://code.djangoproject.com/wiki/NonSqlBackends

See
django/db/models/base.py: line 147

............................
* disabled deletion of related objects in Model.delete() and QuerySet.delete()

See
django/db/models/query.py: lines 1036, 1065

............................
* replaced sql.subqueries.*Query usage with simple functions on a
single Query class (insert_or_update() instead of InsertQuery and
UpdateQuery)

See
django/db/models/query.py: lines 1058, 1088

............................
* commented out distinction between insert and update in
Model.save_base() because there's no such concept in App Engine (and
SimpleDB, AFAIK)

See
django/db/models/base.py: lines 470, 475

............................
The long-term goal is of course to clean this up and move most of
these changes into the backend API.
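To illustrate why the insert/update distinction disappears on a
put-style datastore, here is a minimal sketch. KeyValueStore and its
key allocation are assumptions made up for this example. Only the
name insert_or_update() is taken from the change list above.

```python
class KeyValueStore:
    """Toy datastore with put-style writes (an assumption for illustration)."""
    def __init__(self):
        self.entities = {}
        self.next_id = 1

    def insert_or_update(self, pk, values):
        # No separate INSERT/UPDATE: a missing key gets allocated, an
        # existing key is simply overwritten.
        if pk is None:
            pk, self.next_id = self.next_id, self.next_id + 1
        self.entities[pk] = dict(values)
        return pk


store = KeyValueStore()
pk = store.insert_or_update(None, {"name": "first"})   # acts as an insert
store.insert_or_update(pk, {"name": "second"})         # acts as an update
print(pk, store.entities)
```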

---------------------------------------
Common non-relational features

The plan is to add support for simple joins and select_related to all
non-relational backends by
either subclassing the backend's Query class on-the-fly with a
JoinQuery or by supporting something like query pre-processors which
can be added above the low-level Query class. We haven't thought about
the details, yet, but I hope you get the idea.

---------------------------------------
SQL layer details:

The ugly detail is that sql.subqueries contains specialized query
classes like InsertQuery, DeleteQuery, etc. which subclass the
backend's Query class. This means that currently, the module loading
process jumps around:
* sql/__init__.py imports sql.query and then sql.subqueries
* sql.query creates the base Query class
* after that, sql.query allows the backend to override the Query class
* sql.subqueries creates subclasses which derive from Query

In multi-db in SVN this is uglier because the subquery classes don't
have just one single sql.Query base class from which to derive,
anymore. There can be multiple backends, each with their own sql.Query
class, so the subqueries have to be maintained by the backend (with
some multi-inheritance magic and manual caching of the custom
subclasses).

In multi-db on github this is much cleaner: The backends can't
override sql.Query, anymore. Instead, there's an SQLCompiler class
which can be overridden by the backend to take care of
backend-specific details. sql.Query stores a slightly more abstract
representation of the query. This multi-db branch moves a lot of code
around. That's why we should try to keep as much code as possible
where it is (at least until the branch gets merged into trunk).

---------------------------------------
The source

The test project and our unit tests are here:
http://bitbucket.org/wkornewald/django-testapp/

The modified Django source and the backend is here:
http://bitbucket.org/wkornewald/django-nonrel-hacked/

We've patched the trunk branch. Unfortunately, the branches are
unnamed (I converted the git mirror because the hg mirror's branches
on bitbucket are broken). You should be able to find the right branch
with "hg heads" and "hg up -C" to it. Normally our branch should be at
tip, anyway, so you don't need to do anything.

When merging you need to find the trunk branch with "hg heads" and "hg
merge <revnum>" with the trunk head. If this becomes a huge problem
we'll switch to the django-trunk mirror, but I wanted to keep the
option to switch to Alex' multidb branch if that's better, so I chose
this sub-optimal Django mirroring solution.

---------------------------------------
Task management

Our tasks are managed in a Google Spreadsheet:
https://spreadsheets.google.com/ccc?key=0AnLqunL-SCJJdE1fM0NzY1JQTXJuZGdEa0huODVfRHc&hl=en

The task list isn't complete, yet. We're working on that.

Bye,
Waldemar Kornewald

django.diff

Waldemar Kornewald

Oct 22, 2009, 7:46:38 AM
to Django developers
Hi again,
now a little question:

Some fields do type conversions. For example, TimeField converts
datetime objects into time objects.
App Engine doesn't support time, but only datetime, so should we do
such conversions at the backend level or should we expect the field to
handle it (esp. if it already has such conversion code)?

What's the status of the email backends ticket? There hasn't been any
reply to Andi Albrecht's latest patch and comment.
http://code.djangoproject.com/ticket/10355
This is essential for supporting all kinds of cloud platforms.

Bye,
Waldemar Kornewald

Russell Keith-Magee

Oct 22, 2009, 8:07:59 AM
to django-d...@googlegroups.com
On Thu, Oct 22, 2009 at 7:46 PM, Waldemar Kornewald
<wkorn...@gmail.com> wrote:
>
> Hi again,
> now a little question:
>
> Some fields do type conversions. For example, TimeField converts
> datetime objects into time objects.
> App Engine doesn't support time, but only datetime, so should we do
> such conversions at the backend level or should we expect the field to
> handle it (esp. if it already has such conversion code)?

I'm unsure what problem you're having here. The backend needs to
return a type that the TimeField can turn into a Python Time object.
TimeField is fairly liberal in what it will accept - DateTime objects,
Time objects, and strings that express a time will all be handled.

As long as your backend returns one of these acceptable types, you're done.
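A rough sketch of that kind of liberal conversion is shown below. It is
an illustration only, not TimeField's actual implementation: to_time is
a made-up helper and the single "%H:%M:%S" string format is an
assumption.

```python
import datetime


def to_time(value):
    """Coerce a backend value into datetime.time (simplified illustration)."""
    if value is None or isinstance(value, datetime.time):
        return value
    if isinstance(value, datetime.datetime):
        # A datetime-only backend such as App Engine can return this.
        return value.time()
    if isinstance(value, str):
        return datetime.datetime.strptime(value, "%H:%M:%S").time()
    raise TypeError("cannot convert %r to a time" % (value,))


print(to_time(datetime.datetime(2009, 10, 22, 7, 35, 48)))
print(to_time("07:35:48"))
```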

> What's the status of the email backends ticket? There hasn't been any
> reply to Andi Albrecht's latest patch and comment.
> http://code.djangoproject.com/ticket/10355
> This is essential for supporting all kinds of cloud platforms.

We're in the process of doing feature voting for v1.2. Personally, I'm
happy with the state of the patch, but there have been a couple of -1
votes for the patch, which means that some people still need to be
convinced that it's the right thing to do. Once voting is finished, we
may need to revisit this issue on django-dev.

Yours,
Russ Magee %-)

Waldemar Kornewald

Oct 22, 2009, 8:13:13 AM
to django-d...@googlegroups.com
On Thu, Oct 22, 2009 at 2:07 PM, Russell Keith-Magee
<freakb...@gmail.com> wrote:
>
> On Thu, Oct 22, 2009 at 7:46 PM, Waldemar Kornewald
> <wkorn...@gmail.com> wrote:
>>
>> Hi again,
>> now a little question:
>>
>> Some fields do type conversions. For example, TimeField converts
>> datetime objects into time objects.
>> App Engine doesn't support time, but only datetime, so should we do
>> such conversions at the backend level or should we expect the field to
>> handle it (esp. if it already has such conversion code)?
>
> I'm unsure what problem you're having here. The backend needs to
> return a type that the TimeField can turn into a Python Time object.
> TimeField is fairly liberal in what it will accept - DateTime objects,
> Time objects, and strings that express a time will all be handled.
>
> As long as your backend returns one of these acceptable types, you're done.

Great. I just wasn't sure whether this was an internal implementation
detail that we'd better not rely on in our backends.

Bye,
Waldemar Kornewald

Andi Albrecht

Oct 22, 2009, 8:26:22 AM
to django-d...@googlegroups.com
On Thu, Oct 22, 2009 at 2:07 PM, Russell Keith-Magee
<freakb...@gmail.com> wrote:
>
Just some short feedback: I'm still here and I've read most of the
comments given in the voting sheet. I'd be happy to address the concerns
once voting is finished - in a different thread, of course :)

>
> Yours,
> Russ Magee %-)
>
> >
>

Thomas Wanschik

Oct 22, 2009, 5:09:50 PM
to Django developers
http://code.google.com/p/live-android/
I think Waldemar wanted to say: 'Only the Oracle backend has its own
Query which subclasses sql.query.BaseQuery.' instead of 'sql.Query'.

We added a module called 'nonrelational' to django.db.models. There we
define query.py, which itself defines a BaseQuery class (the equivalent
of sql.query.BaseQuery). Just as Oracle subclasses
sql.query.BaseQuery, we could use the same mechanism and subclass
nonrelational.BaseQuery in order to get the GAEQuery. For now this
subclassing mechanism is invoked in the module sql.query (not sql.Query
from above) via connection.ops.query_class(BaseQuery) and uses
sql.query.BaseQuery as the BaseQuery. We need some way of telling
which class should be the BaseQuery. This could be done via the
settings, with something like settings.DATABASE_TYPE =
'nonrelational', and for SQL it would be settings.DATABASE_TYPE = 'sql'.
The right BaseQuery could then be chosen accordingly. Additionally,
this mechanism shouldn't be invoked in sql.query. But that's only a
proposal for making the port a little cleaner. I don't know whether
this would conflict with multi-db. What do you guys think
of it?
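A minimal sketch of such a loading mechanism might look like this. All
names here (SqlBaseQuery, NonrelBaseQuery, load_query_class,
gae_query_class) are hypothetical, and the DATABASE_TYPE value is
passed in as a plain string rather than read from a real settings
module.

```python
class SqlBaseQuery:
    kind = "sql"


class NonrelBaseQuery:
    kind = "nonrelational"


# Maps the proposed DATABASE_TYPE setting to the matching base class.
BASE_QUERIES = {"sql": SqlBaseQuery, "nonrelational": NonrelBaseQuery}


def gae_query_class(base):
    """Backend hook in the style of connection.ops.query_class(BaseQuery)."""
    class GAEQuery(base):
        backend = "appengine"
    return GAEQuery


def load_query_class(database_type, backend_hook):
    # Pick the base from the (proposed) DATABASE_TYPE setting, then let the
    # backend derive its own class from it.
    return backend_hook(BASE_QUERIES[database_type])


Query = load_query_class("nonrelational", gae_query_class)
print(Query.__name__, Query.kind)
```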
This should be done in a cleaner way, too. Instead of using QueryGlue
for QuerySet.query, QuerySet.query should be an instance of the actual
Query class (see above for a proposed loading mechanism for the Query
class), and the actual QueryGlue instance should be passed to
QuerySet.query somehow. QuerySet's methods would update only the
QueryGlue instance. This mechanism could be used for the existing SQL
backends, too. It doesn't have to be QueryGlue itself, but it should at
least be some high-level information tree; much of QueryGlue's code
could be reused for this tree, but it would have to be extended. Of
course, that would mean changes to the existing backends, but it would
give us a flexible way to write backends: each backend traverses the
tree and forms the actual query for its database as soon as the query
is executed.

> ---------------------------------------
> subqueries
>
> Instead of working with subquery classes we've added delete_bulk(),
> insert(), etc. directly to QueryGlue and the low-level Query class. If
> sql.Query really needs the current design those functions can still be
> routed to the respective subquery instance, but on App Engine it's
> easier to handle those operations in a separate function.
>
> ---------------------------------------
> The cleanup
>
> We made a few not-so-clean changes to Django itself. I've attached a
> diff, so contributors can easily find all the changes we did to Django
> (they're also commented with TODO and GAE):
>
> ............................
> * disabled multi-table inheritance;
> this could be emulated as described on the Django wiki
> http://code.djangoproject.com/wiki/NonSqlBackends
>
> See
> django/db/models/base.py: line 147
>
> ............................
> * disabled deletion of related objects in Model.delete() and QuerySet.delete()
>
> See
> django/db/models/query.py: lines 1036, 1065
>
> ............................
> * replaced sql.subqueries.*Query usage with simple functions on a
> single Query class (insert_or_update() instead of InsertQuery and
> UpdateQuery)
>
> See
> django/db/models/query.py: lines 1058, 1088
>
> ............................
> * commented out distinction between insert and update in
> Model.save_base() because there's no such concept in App Engine (and
> SimpleDB, AFAIK)
>
> See
> django/db/models/base.py: lines 470, 475
>
> ............................
> The long-term goal is of course to clean this up and move most of
> these changes into the backend API.
>

Looking at the diff you can see that we really made only small changes
to Django, and in the majority of cases we simply commented some Django
code out (like the deletion of related objects). So moving these parts
into the backend shouldn't be hard and would enable users to write
non-relational database backends for Django in a clean way without
manipulating much existing Django code (existing parts would only have
to be moved into the SQL database backend).
> Our tasks are managed in a Google Spreadsheet:
> https://spreadsheets.google.com/ccc?key=0AnLqunL-SCJJdE1fM0NzY1JQTXJu...
>
> The task list isn't complete, yet. We're working on that.
>
> Bye,
> Waldemar Kornewald
>
> django.diff

I hope this will give contributors an idea of where to start. I will
add the solution ideas to the spreadsheet as soon as I find some time.
It would be nice to hear some thoughts from django-developers
too :)
Bye
Thomas Wanschik

Russell Keith-Magee

Oct 22, 2009, 6:52:12 PM
to django-d...@googlegroups.com
On Fri, Oct 23, 2009 at 5:09 AM, Thomas Wanschik
<twan...@googlemail.com> wrote:
>
>> When a new QuerySet is instantiated (e.g. by calling
>> Model.objects.all()) it asks the backend for its Query class and then
>> creates an instance of that class. By default, this class is
>> sql.Query. Only the Oracle backend has its own Query which subclasses
>> sql.Query.
>
> I think Waldemar wanted to say: 'Only the Oracle backend has its own
> Query which subclasses sql.query.BaseQuery.' instead of 'sql.Query'.

I should point out that this is one of the specific problems Alex and
I are trying to address in the multi-db refactor. When we've finished,
returning the right query class should be as simple as implementing an
API on the backend.

Yours,
Russ Magee %-)

Thomas Wanschik

Oct 23, 2009, 1:00:16 PM
to Django developers
I just want to remind contributors to fill in the cells "Assigned to"
and "Status" in the task spreadsheet while working on a specific task
in order to prevent problems.
Here is the link:
https://spreadsheets.google.com/ccc?key=0AnLqunL-SCJJdE1fM0NzY1JQTXJuZGdEa0huODVfRHc&hl=en

Bye,
Thomas Wanschik

Thomas Wanschik

Oct 25, 2009, 12:42:14 PM
to Django developers


On 22 Okt., 23:52, Russell Keith-Magee <freakboy3...@gmail.com> wrote:
> On Fri, Oct 23, 2009 at 5:09 AM, Thomas Wanschik
>
Thanks for your answer, Russell. But I have one question left. Should
we make the effort and clean up the App Engine backend the way the
Oracle backend is done (using a query_class), or should we wait for the
multi-db refactor and then clean up our code according to multi-db?
Will it be easier to merge the backend into Django then?

Bye,
Thomas Wanschik

> Yours,
> Russ Magee %-)

Russell Keith-Magee

Oct 25, 2009, 7:46:27 PM
to django-d...@googlegroups.com

The current query_class will need to change slightly to support
multi-db, so anything you implement against that interface will
require some rework later on. That said, the fundamental approach
(i.e., the backend tells you what class to use for queries) will still
be there - it will just be used in a slightly different way.

If you want to write (and test) code now, my suggestion would be to
try making your code as clean as possible against the current
interface, with the expectation that there will be some rework once
multi-db lands. The corollary to this is that if you find yourself
needing to make weird and widespread engineering decisions in order to
support the query_class approach, you should stop and wait for
multi-db to land.

Yours
Russ Magee %-)

Waldemar Kornewald

Oct 26, 2009, 4:52:08 AM
to django-d...@googlegroups.com
On Mon, Oct 26, 2009 at 1:46 AM, Russell Keith-Magee
<freakb...@gmail.com> wrote:
> The current query_class will need to change slightly to support
> multi-db, so anything you implement against that interface will
> require some rework later on. That said, the fundamental approach
> (i.e., the backend tells you what class to use for queries) will still
> be there - it will just be used in a slightly different way.

In the SVN multi-db branch there is a modified query_class() API.
OTOH, on github it got replaced with SQLCompiler. Are the
query_class() changes already committed somewhere?

Why do you still need query_class() if you already have SQLCompiler?
If this is just about making non-SQL backends work then you'll need
some kind of backend-independent query representation, so
QuerySet.using() can be supported. That's exactly what we've already
done with QueryGlue, so maybe you should better reuse what we've
started and finish that together with us, so we all don't waste time
on refactoring everything twice?

Bye,
Waldemar Kornewald

Russell Keith-Magee

Oct 26, 2009, 7:12:55 AM
to django-d...@googlegroups.com
On Mon, Oct 26, 2009 at 4:52 PM, Waldemar Kornewald
<wkorn...@gmail.com> wrote:
>
> On Mon, Oct 26, 2009 at 1:46 AM, Russell Keith-Magee
> <freakb...@gmail.com> wrote:
>> The current query_class will need to change slightly to support
>> multi-db, so anything you implement against that interface will
>> require some rework later on. That said, the fundamental approach
>> (i.e., the backend tells you what class to use for queries) will still
>> be there - it will just be used in a slightly different way.
>
> In the SVN multi-db branch there is a modified query_class() API.
> OTOH, on github it got replaced with SQLCompiler. Are the
> query_class() changes already committed somewhere?

No, they haven't been developed yet. Alex and I did the initial design
work at the DjangoCon sprints, but we haven't actually implemented
anything yet.

> Why do you still need query_class() if you already have SQLCompiler?
> If this is just about making non-SQL backends work then you'll need
> some kind of backend-independent query representation, so
> QuerySet.using() can be supported. That's exactly what we've already
> done with QueryGlue, so maybe you should better reuse what we've
> started and finish that together with us, so we all don't waste time
> on refactoring everything twice?

There are two different agents at work here.

We need to split sql.Query from QueryCompiler to support the fact that
the same SQL-like query needs to be rendered in different ways by
different backends. This can be as simple as the character used for
quoting, or as complex as wrapper clauses needed to handle LIMIT and
OFFSET.
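The split can be pictured with a small sketch like the one below. It is
an assumption-laden illustration, not the multi-db branch's
SQLCompiler: the class names are made up, and the SQL emitted by the
"wrapping" dialect is only meant to show the shape of the problem, not
to match any real database.

```python
class Compiler:
    """Renders one abstract query; subclasses adjust dialect details."""
    quote_char = '"'

    def quote(self, name):
        return f"{self.quote_char}{name}{self.quote_char}"

    def as_sql(self, table, limit=None, offset=0):
        sql = f"SELECT * FROM {self.quote(table)}"
        if limit is not None:
            sql += f" LIMIT {limit}"
        if offset:
            sql += f" OFFSET {offset}"
        return sql


class BacktickCompiler(Compiler):
    # Simple case: only the quoting character differs.
    quote_char = "`"


class RowNumberCompiler(Compiler):
    """Hypothetical dialect that pages by wrapping the query."""
    def as_sql(self, table, limit=None, offset=0):
        inner = (f"SELECT t.*, ROW_NUMBER() OVER () AS rn "
                 f"FROM {self.quote(table)} t")
        sql = f"SELECT * FROM ({inner}) WHERE rn > {offset}"
        if limit is not None:
            sql += f" AND rn <= {offset + limit}"
        return sql


for compiler in (Compiler(), BacktickCompiler(), RowNumberCompiler()):
    print(compiler.as_sql("book", limit=10, offset=5))
```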

There is a separate issue of determining if sql.Query is the right
internal structure to use for representing a query.

To date, sql.Query is the right structure for all Django's supported
backends. It might even be the right structure for a non-SQL backend
that provides a SQL-like query layer (AppEngine possibly falls into
this category, as might a SimpleDB backend). However, a CouchDB,
Cassandra or MongoDB backend probably won't get much traction using an
internal query structure that talks about Joins and Where clauses.

So - the intention is to repurpose query_class() slightly. Once
refactored, query_class() will be required to return a class that
implements the Query interface. sql.Query is the only example at
present, but other backends can provide other internal
representations. The call to query_class() will be made in QuerySet -
not as part of the sql.Query construction. In this way, query_class()
becomes the "get me the actual implementation" method on the backend.

We're *not* trying to build a completely generic internal query
representation. I'm not convinced that such an animal is even possible
in the general case - again, JOIN means something to relational
databases, but doesn't mean much to non-SQL databases. If AppEngine is
able to leverage some of the sql.Query internals, that's great - but I
don't expect that this will be the default situation.

Yours,
Russ Magee %-)

Waldemar Kornewald

Oct 26, 2009, 8:46:37 AM
to django-d...@googlegroups.com
On Mon, Oct 26, 2009 at 1:12 PM, Russell Keith-Magee
<freakb...@gmail.com> wrote:
> To date, sql.Query is the right structure for all Django's supported
> backends. It might even be the right structure for a non-SQL backend
> that provides a SQL-like query layer (AppEngine possibly falls into
> this category, as might a SimpleDB backend). However, a CouchDB,
> Cassandra or MongoDB backend probably won't get much traction using an
> internal query structure that talks about Joins and Where clauses.

App Engine's datastore API is more similar to MongoDB than SQL. Even
on SimpleDB I don't think that the Where tree is a good idea because
it's way too SQL-specific.

> So - the intention is to repurpose query_class() slightly. Once
> refactored, query_class() will be required to return a class that
> implements the Query interface. sql.Query is the only example at
> present, but other backends can provide other internal
> representations. The call to query_class() will be made in QuerySet -
> not as part of the sql.Query construction. In this way, query_class()
> becomes the "get me the actual implementation" method on the backend.

Why do you want to implement this in multi-db if it's only useful for
non-SQL support? Shouldn't you better keep multi-db as-is and add the
query_class() feature to our branch? That would save us lots of
conflicts because we won't have to implement our code twice (once for the
old query_class and once for your version) and we'll probably have to
change your query_class, anyway.

> We're *not* trying to build a completely generic internal query
> representation. I'm not convinced that such an animal is even possible
> in the general case - again, JOIN means something to relational
> databases, but doesn't mean much to non-SQL databases. If AppEngine is
> able to leverage some of the sql.Query internals, that's great - but I
> don't expect that this will be the default situation.

Does this mean you'll remove QuerySet.using()? Otherwise you'd have to
transform an sql.Query to an appengine.Query.

If the generic query representation is not much more detailed than Q
objects then I don't see a big problem, anyway (our QueryGlue can be
easily transformed into sql.Query or any other query type exactly for
that reason). The point why we need QueryGlue is that the queries will
have to be manipulated and interpreted in order to emulate certain
features (e.g., joins) and it's much easier to do this on the final
query tree than on its intermediate states.

Bye,
Waldemar Kornewald

Russell Keith-Magee

Oct 26, 2009, 9:05:56 AM
to django-d...@googlegroups.com
On Mon, Oct 26, 2009 at 8:46 PM, Waldemar Kornewald
<wkorn...@gmail.com> wrote:
>
> On Mon, Oct 26, 2009 at 1:12 PM, Russell Keith-Magee
> <freakb...@gmail.com> wrote:
>> To date, sql.Query is the right structure for all Django's supported
>> backends. It might even be the right structure for a non-SQL backend
>> that provides a SQL-like query layer (AppEngine possibly falls into
>> this category, as might a SimpleDB backend). However, a CouchDB,
>> Cassandra or MongoDB backend probably won't get much traction using an
>> internal query structure that talks about Joins and Where clauses.
>
> App Engine's datastore API is more similar to MongoDB than SQL. Even
> on SimpleDB I don't think that the Where tree is a good idea because
> it's way too SQL-specific.

Exactly my point. There is no such thing as a "generic" internal
query. The closest we can hope for is a common interface for objects
that can have Qs, filters, et al. added to them. sql.Query interprets
those Q's and filters as joins. Other backends will require other
interpretations.

>> So - the intention is to repurpose query_class() slightly. Once
>> refactored, query_class() will be required to return a class that
>> implements the Query interface. sql.Query is the only example at
>> present, but other backends can provide other internal
>> representations. The call to query_class() will be made in QuerySet -
>> not as part of the sql.Query construction. In this way, query_class()
>> becomes the "get me the actual implementation" method on the backend.
>
> Why do you want to implement this in multi-db if it's only useful for
> non-SQL support? Shouldn't you better keep multi-db as-is and add the
> query_class() feature to our branch? That would save us lots of
> conflicts because we won't have to implement our code twice (once for the
> old query_class and once for your version) and we'll probably have to
> change your query_class, anyway.

Because the way query_class() is currently used causes other problems.
Providing an entry point for multi-db is a bonus.

>> We're *not* trying to build a completely generic internal query
>> representation. I'm not convinced that such an animal is even possible
>> in the general case - again, JOIN means something to relational
>> databases, but doesn't mean much to non-SQL databases. If AppEngine is
>> able to leverage some of the sql.Query internals, that's great - but I
>> don't expect that this will be the default situation.
>
> Does this mean you'll remove QuerySet.using()? Otherwise you'd have to
> transform an sql.Query to an appengine.Query.

QuerySet.using() will continue to exist. However, I expect there will
be some restrictions on when you can call it. Retasking across backend
types will be one of those restrictions.

> If the generic query representation is not much more detailed than Q
> objects then I don't see a big problem, anyway (our QueryGlue can be
> easily transformed into sql.Query or any other query type exactly for
> that reason). The point why we need QueryGlue is that the queries will
> have to be manipulated and interpreted in order to emulate certain
> features (e.g., joins) and it's much easier to do this on the final
> query tree than on its intermediate states.

I need to take a closer look at QueryGlue to be able to offer any
deeper critique of this. I'll put this on my todo list.

Yours,
Russ Magee %-)

Waldemar Kornewald

Oct 26, 2009, 10:46:20 AM
to django-d...@googlegroups.com
On Mon, Oct 26, 2009 at 3:05 PM, Russell Keith-Magee

<freakb...@gmail.com> wrote:
>
> On Mon, Oct 26, 2009 at 8:46 PM, Waldemar Kornewald
> <wkorn...@gmail.com> wrote:
>>
>> On Mon, Oct 26, 2009 at 1:12 PM, Russell Keith-Magee
>> <freakb...@gmail.com> wrote:
>>> To date, sql.Query is the right structure for all Django's supported
>>> backends. It might even be the right structure for a non-SQL backend
>>> that provides a SQL-like query layer (AppEngine possibly falls into
>>> this category, as might a SimpleDB backend). However, a CouchDB,
>>> Cassandra or MongoDB backend probably won't get much traction using an
>>> internal query structure that talks about Joins and Where clauses.
>>
>> App Engine's datastore API is more similar to MongoDB than SQL. Even
>> on SimpleDB I don't think that the Where tree is a good idea because
>> it's way too SQL-specific.
>
> Exactly my point. There is no such thing as a "generic" internal
> query. The closest we can hope for is a common interface for objects
> that can have Qs, filters, et al. added to them. sql.Query interprets
> those Q's and filters as joins. Other backends will require other
> interpretations.
> [...]

> I need to take a closer look at QueryGlue to be able to offer any
> deeper critique of this. I'll put this on my todo list.

Yes, that'll help in our discussions and I hope it'll make clearer why
query_class() should rather be implemented in our branch instead of
multi-db (which already works the way it is - without query_class()).

Here's the link:
http://bitbucket.org/wkornewald/django-nonrel-hacked/src/tip/django/db/models/queryglue.py
What QueryGlue does is something like this (though, it's simplified):
queryset.filter(bla__attr=3)
=> gets translated to =>
queryglue.filters_tree.add(( ['bla', 'attr'], 'exact', 3 ))

As you can see, there isn't anything backend-specific in the
filters_tree. It's actually not even that much different from what
sql.Query.add_filter() already does - just without adding information
about joins and other SQL-specific stuff.
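The translation step could be sketched like this. It is a simplified
illustration with made-up names (parse_lookup, LOOKUP_TYPES), not the
code in queryglue.py, and it stores the filters in a flat list rather
than a real tree.

```python
# Lookup types assumed for this sketch; the real set is larger.
LOOKUP_TYPES = {"exact", "gt", "gte", "lt", "lte", "in"}


def parse_lookup(lookup, value):
    """Split 'bla__attr' into (['bla', 'attr'], 'exact', value)."""
    parts = lookup.split("__")
    if len(parts) > 1 and parts[-1] in LOOKUP_TYPES:
        lookup_type = parts.pop()
    else:
        lookup_type = "exact"  # default when no lookup type is given
    return (parts, lookup_type, value)


filters_tree = []


def filter_(**kwargs):
    # Nothing backend-specific is stored: just path, lookup type and value.
    for lookup, value in kwargs.items():
        filters_tree.append(parse_lookup(lookup, value))


filter_(bla__attr=3)
filter_(pk__in=[1, 2])
print(filters_tree)
```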

Now, an SQL backend can just iterate over filters_tree and call
sql.Query.add_filter() for each child in the tree - this would be the
easiest way to make sql.Query work again in our code. OTOH, the
non-relational backends could inspect the tree and possibly execute
multiple queries - one for each table involved in the query - and then
join the result set in memory (depending on the query and your data
this can be inefficient - or efficient).
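A toy version of the "multiple queries plus in-memory join" idea is
sketched below. The tables, the fetch() helper and
books_by_author_name() are all made up for illustration. A real
backend would issue one datastore query per fetch() call.

```python
AUTHORS = [{"id": 1, "name": "Bob"}, {"id": 2, "name": "Eve"}]
BOOKS = [{"id": 10, "author_id": 1, "title": "A"},
         {"id": 11, "author_id": 2, "title": "B"},
         {"id": 12, "author_id": 1, "title": "C"}]


def fetch(table, field, op, value):
    """Stand-in for one single-table datastore query."""
    tests = {"exact": lambda a: a == value, "in": lambda a: a in value}
    return [row for row in table if tests[op](row[field])]


def books_by_author_name(name):
    # Query 1: resolve the related side of the filter (author__name=name).
    author_ids = {a["id"] for a in fetch(AUTHORS, "name", "exact", name)}
    # Query 2: fetch the main table, joining on the collected keys.
    return fetch(BOOKS, "author_id", "in", author_ids)


print([book["title"] for book in books_by_author_name("Bob")])
```

Whether this is cheap or expensive depends on how many keys the first
query returns, which is the efficiency caveat mentioned above.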

Bye,
Waldemar Kornewald

Waldemar Kornewald

Oct 29, 2009, 2:44:49 PM
to django-d...@googlegroups.com
Hi,
Russell and Alex, did you already look at QueryGlue? We really need to
discuss which branch the new query_class() should be in.

Bye,
Waldemar Kornewald

Alex Gaynor

Oct 29, 2009, 3:51:14 PM
to django-d...@googlegroups.com
I haven't had a chance to look at it, and I probably won't until at
least a few of the items on my plate are dealt with. That being said
I am extremely leery about investing time in something with names like
"QueryGlue" as to me they imply a lack of organization in the code,
and that may or may not be true, but giving things names that are at
least somewhat explanatory to outside users really helps.

Alex

--
"I disapprove of what you say, but I will defend to the death your
right to say it." -- Voltaire
"The people's good is the highest law." -- Cicero
"Code can always be simpler than you think, but never as simple as you
want" -- Me

Waldemar Kornewald

Oct 30, 2009, 3:56:32 AM
to django-d...@googlegroups.com
On Thu, Oct 29, 2009 at 9:51 PM, Alex Gaynor <alex....@gmail.com> wrote:
> I haven't had a chance to look at it, and I probably won't until at
> least a few of the items on my plate are dealt with.  That being said
> I am extremely leery about investing time in something with names like
> "QueryGlue" as to me they imply a lack of organization in the code,
> and that may or may not be true, but giving things names that are at
> least somewhat explanatory to outside users really helps.

I've renamed it to QueryData. With that huge roadblock out of our way,
I hope you're much more likely to help. ;)

Bye,
Waldemar Kornewald

Waldemar Kornewald

Nov 13, 2009, 4:32:01 AM
to django-d...@googlegroups.com
Hey,
a little status update:

We've switched our work to Alex' github multi-db branch because we
depend on that to make a clean non-relational backend API. Otherwise
we'd have to rewrite too much code once multi-db gets merged into
trunk. The new branch is at:
http://bitbucket.org/wkornewald/django-nonrel-multidb/

Our django-testapp project finally contains unit tests for all
supported DB features (Model.save(), QuerySet.get(), .count(),
.filter(), .exclude(), etc.).

Now we can implement a query_class() backend API and begin to move the
hacked-in App Engine code out of Django.

Bye,
Waldemar Kornewald