I'm not convinced this is completely true of the current
codebase, but it's certainly 95% true. The ORM was designed to be SQL
agnostic, but there are a couple of places where, in the name of
'getting the job done', some shortcuts were taken that mean that
Query assumes the presence of a SQL backend or SQL-like data
structures. The as_sql() call on Query is the one obvious example of
this, but there are some more subtle places where there is some
conceptual leakage.
However, Alex's comments are otherwise accurate. I'm aware of a few
projects where people are playing with integrating various non-SQL
data stores with Django (including AppEngine, MongoDB, Cassandra,
Couch and others). They are at varying levels of maturity, and have
varying levels of feature completeness.
My goal for supporting these backends at a project level is _not_ to
add them to the Django core - at least, not immediately, and certainly
not in the v1.2 timeframe. Instead, these projects _should_ be able to
be built and developed external to the Django core. In time, it may be
appropriate to add them to the core - but that is a long term goal,
not a short term goal.
If we were to phrase this as a v1.2 feature goal, it would not be "Add
AppEngine Support". Rather, it would be "make the Query backend
compatible with non-SQL query languages". This is likely to involve a
number of small but subtle changes to the ORM, but we can address
these on a case-by-case basis as the AppEngine (or any other non-SQL)
backend encounters difficulties.
Yours,
Russ Magee %-)
Hmm... I would expect a backend to completely replace the Query class,
not work around something like as_sql(). One part of making non-db
backends supportable is to introduce a way to say "instead of using
django.db.models.sql.*, use this other namespace for the Query class and
everything it uses". That's why all the SQL stuff (modulo a few bits
that haven't been done yet) is in the one namespace and the public API
is entirely outside that namespace.
A non-db backend ends up implementing at the QuerySet -> Query
internal-API layer and that's where most of the tweaking will occur, I
would imagine. The object structure for non-relational and
other-storage-system backends is quite a bit different to the object
structure required to create a relational-algebra-based tree (which is
what Query does).
[...]
> My goal for supporting these backends at a project level is _not_ to
> add them to the Django core - at least, not immediately, and certainly
> not in the v1.2 timeframe. Instead, these projects _should_ be able to
> be built and develop external to the Django core. In time, it may be
> appropriate to add them to the core - but that is a long term goal,
> not a short term goal.
Definitely agreed. I think we know enough at this point to be able to
provide fairly generic support for this kind of thing. We've been
heading in that direction for a couple of years now, with that intent.
Regards,
Malcolm
Sorry - I meant QuerySet, not Query (i.e., the as_sql() call on
QuerySet is an example of leakage). Chalk that one up to a flu-addled
brain. I agree that a non-sql backend should be implementing
appengine.Query, which _should_ be a drop-in for sql.Query - although,
as noted, there might be a couple of places where some tweaks are
needed.
Yours,
Russ Magee %-)
This one is slightly topical. Alex has a github branch that refactors
the m2m code to get the SQL out of the related field model [1]. In
order to do this, it introduces a dummy model for m2m fields. This is
needed for Alex's multi-db work, and it was my intention to target the
patch for Django's trunk in the near-ish future. If there is overlap
into the requirements for AppEngine/non-sql support, I'd be interested
to hear any feedback.
[1] http://github.com/alex/django/tree/m2m-refactor
Russ %-)
That's true, in a sense, if you define "support AppEngine" (or other
non-relational backends) as "let you define models and query 'em."
However... what people usually *mean* when they ask "does Django run
on AppEngine?" is "does the *Django admin* run on AppEngine?"
I suspect that simply supporting non-relational backends won't really
make most real-world users happy, but will just sorta kick the can
down the road.
That's why I, for one, won't say that Django "supports" non-relational
backends until (most of) django.contrib runs on NOSQL_BACKEND
unmodified, and until it's easy to write reusable apps against an API
that'll run there.
So, for me, the first step is to separate the QuerySet API into "the
set of operations that we can expect *any* backend to support" and
then "the set of operations that realistically only a relational
backend will support." For example, ``get(pk=N)`` should be expected
to work anywhere, but ``filter(foo__bar__baz__startswith='spam')``
probably is only gonna work in SQL. Likewise, ``select_related()``
means something specific on a SQL backend, but probably would be a
noop against something like CouchDB.
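To make the split concrete, here is a minimal sketch of how a lookup could be sorted into the two sets. Everything in it is hypothetical: the helper name, the constants and the rule itself (a single field with an exact match counts as portable) are assumptions for illustration, not anything that exists in Django.

```python
# Hypothetical helper, not part of Django: decide whether a lookup string
# belongs to the "any backend" set or the "relational only" set.

LOOKUP_SEP = "__"

# Assumption: a minimal backend can only do exact matches.
PORTABLE_LOOKUP_TYPES = {"exact"}


def is_portable_lookup(lookup):
    """Return True if the lookup touches one field with a portable lookup type."""
    parts = lookup.split(LOOKUP_SEP)
    rest = parts[1:]
    if not rest:
        # Bare field name such as "pk": an implicit exact match.
        return True
    # Anything longer than "<field>__<lookup type>" is treated as spanning
    # relations, which this sketch assumes only a relational backend can do.
    return len(rest) == 1 and rest[0] in PORTABLE_LOOKUP_TYPES
```

Under these assumptions ``pk`` and ``pk__exact`` land in the first set, while ``foo__bar__baz__startswith`` lands in the second.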
Given the work I've done in this area to date I expect the first set
of operations to be sufficiently powerful that any app generic enough
to be a contrib candidate could be written against this scaled-down
QuerySet API. If that's true, it should in theory be possible to get
the admin (and auth, etc.) running on non-relational stores.
[My hope is to put a lot more work into this in the near future, but
given my unpredictable work schedule I can't promise anything.]
Anyway, this is mostly a brain dump... but the main takeaway I want
people to get is that simply supporting non-relational backends in a
first-class manner isn't enough; most average users won't give a shit
until the admin works, too.
Jacob
I'll have a look at that as soon as I get a chance. It's
build-your-own-diff country, so hopefully it's fairly self-contained.
Regards,
Malcolm
It's true in a broader sense than that, too.
> However... what people usually *mean* when they ask "does Django run
> on AppEngine?" is "does the *Django admin* run on AppEngine?"
Failure to phrase their question correctly isn't my problem. :-)
That being said, I think we're in mostly violent agreement here as to
the end goal. However, it's walk before running. The goal would be to
support unchanged public API and that influences the design.
However, you *will* have to set up the settings appropriately, so cruft
like the email backends that Walther has in his summary list isn't a
major core change to Django.
> I suspect that simply supporting non-relational backends won't really
> make most real-world users happy, but will just sorta kick the can
> down the road.
>
> That's why I, for one, won't say that Django "supports" non-relational
> backends until (most of) django.contrib runs on NOSQL_BACKEND
> unmodified, and until it's easy to write reusable apps against an API
> that'll run there.
>
> So, for me, the first step is to separate the QuerySet API into "the
> set of operations that we can expect *any* backend to support" and
> then "the set of operations that realistically only a relational
> backend will support." For example, ``get(pk=N)`` should be expected
> to work anywhere, but ``filter(foo__bar__baz__startswith='spam')``
> probably is only gonna work in SQL.
I'd disagree on that last point. It might work out to be multiple
operations and be non-atomic, but that's one of the prices you pay when
moving to a key/value-based storage system. There's no reason the
function call can't return the ultimately right results over a
non-changing dataset, though. Making that so is also something that
won't add a single line of cruft to the SQL backend, since it already
has the support for that. It all goes into the backend's sql/*
replacement module.
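As a rough illustration of "multiple operations", here is a sketch of a relation-spanning filter emulated over two plain dicts standing in for a key/value store. The data, the field names and the function are all made up for the example; a real backend would issue store requests instead of scanning dicts.

```python
# Two dicts standing in for key/value "tables" (hypothetical data).
authors = {
    1: {"name": "spam eggs"},
    2: {"name": "ham"},
}
books = {
    10: {"title": "A", "author_id": 1},
    11: {"title": "B", "author_id": 2},
    12: {"title": "C", "author_id": 1},
}


def books_with_author_name_prefix(prefix):
    """Emulate filter(author__name__startswith=prefix) in two passes."""
    # Operation 1: find the matching authors.
    author_pks = {
        pk for pk, author in authors.items()
        if author["name"].startswith(prefix)
    }
    # Operation 2: find the books pointing at those authors. The data could
    # change between the two passes, so the result is not atomic.
    return sorted(
        pk for pk, book in books.items()
        if book["author_id"] in author_pks
    )
```

Over an unchanging dataset the two passes give the same answer a join would, which is the point being made above.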
> Likewise, ``select_related()``
> means something specific on a SQL backend, but probably would be a
> noop against something like CouchDB.
Right. Which is definitely "works" in my book.
Regards,
Malcolm
[... various individual items snipped...]
>
> And these are just the first few issues we've run into when analyzing
> the source.
Most of those are the kind of incremental changes that are part of
making the backend stuff generic. For example, pushing some of the logic
of the insert/update stuff down into the Query layer is a solution to
the last point. A lot of the delete stuff is already part of the query
layer, so it's not much extra to add a layer on the extra stuff.
This is definitely the sort of stuff that is good to know, but initially
I think it's good to keep phrasing these as specific problems, rather
than targeting particular solutions. There is clearly some difference
between the way you want to go and the way we would like to move, and
it's primarily important to understand the problems we're solving,
rather than spinning around prematurely debating micro-solutions.
> It would be much easier to simply allow for overriding Model and
> QuerySet at a high level.
By "easier" you mean that you would only have to write *all* of the
public API again for each backend, instead of reusing as much common
stuff as possible?! That's the solution that maybe gets you GAE support
early on at the cost of a bunch of developer work for this particular
individual item, but doesn't pay-off for long-term reusability.
Doing the easiest thing possible here at the expense of pain down the
track is a false goal. Our job as framework maintainers is to do the
hard stuff so that everybody else says "that was much easier than I
thought it would be" when they come to using it. We've done it a number
of times in the past and I take no end of pleasure in repeatedly hearing
people surprised at discovering how tricky things are under the cover so
that their code just works, no matter whether they're using the fully
public API or some of the semi-stable internal bits like extending
QuerySets or writing custom Q-objects, or writing new database
backends. *That* is the brass ring. Making Query and friends replaceable
means that bug fixes and enhancements in the common code are seen by
everybody. Making Model and QuerySet replaceable means that there will
hardly ever be parity between the alternate storage backends as the
different maintainers add different sets of features.
[...]
> But if you (the Django developers) hope to
> support most of the contrib apps 1:1 without any special checks for
> non-relational backends I think this goal will be very difficult to
> achieve. I had to disable result sorting in the admin interface, for
> example, because that would require too many composite datastore
> indexes and there are almost endless combinations when you can filter
> and sort and search results in the admin interface. Who knows, maybe
> when Google adds full-text search this limitation will go away...
This is why I also disagreed slightly with Jacob's end-goal. Because
people's dreams are a little ahead of reality. Django cannot standardise
on lowest-common-denominator, even for contrib apps. So I think there
will be things that simply don't work in certain cases because the
backend isn't up to the task. But it won't be entire apps. It will be
small features in individual apps. Where we draw that fuzzy line is
obviously something to evaluate as we go along. For now, let's
definitely think big and get as much of the way there as we can.
Regards,
Malcolm
Yes. I don't see how what Malcolm said contradicts this. Malcolm was
referring to the fact that some of the model level operations - such
as delete - are already implemented as interfaces on Query. If you
need to move more model functions into the Query to satisfy AppEngine,
this is something we are open to discussing.
> What
> information do you need for refactoring the backend API?
A concrete proposal. "I think we should move save() to the Query
interface because.... here is a sample implementation explaining what
I mean."
As Malcolm said - it's very easy to get stuck in incredibly complex
debates of entirely hypothetical situations. The current
implementation is sufficient for SQL backends, and is relatively clean
in design. You are the expert on AppEngine. You are the person in the
position to tell us where you need flexibility that is not currently
afforded. Make a concrete proposal, and we'll discuss it.
Yours,
Russ Magee %-)
Begin at the beginning :-) Pick a small task - e.g., converting a
simple Django ORM query into SimpleDB's query API, and see where that
takes you. Rinse and repeat until you have a working API.
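A first step of that kind might look like the sketch below, which turns ORM-style keyword filters into a SimpleDB-style select expression. The function name and the set of handled lookups are hypothetical, and the quoting rules (backticks around names, single quotes doubled inside string literals, ``like`` with a trailing ``%`` for prefixes) reflect my reading of SimpleDB's select syntax, so check them against the SimpleDB documentation before relying on them.

```python
def to_simpledb_select(domain, **filters):
    """Build a SimpleDB-style select expression from ORM-like keyword filters."""

    def quote(value):
        # Assumption: string literals escape a single quote by doubling it.
        return "'" + str(value).replace("'", "''") + "'"

    conditions = []
    for lookup, value in sorted(filters.items()):
        field, _, kind = lookup.partition("__")
        if kind in ("", "exact"):
            conditions.append(f"`{field}` = {quote(value)}")
        elif kind == "startswith":
            # A literal % in the value is not escaped in this sketch.
            conditions.append(f"`{field}` like {quote(str(value) + '%')}")
        else:
            raise NotImplementedError(
                f"lookup type {kind!r} is not handled in this sketch"
            )

    expression = f"select * from `{domain}`"
    if conditions:
        expression += " where " + " and ".join(conditions)
    return expression
```

Each lookup that falls into the NotImplementedError branch is a candidate for the "see where that takes you" part.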
> Is the consensus that further refactoring or rethinking of things like
> QuerySet and Query are required to make this happen?
Not really a consensus - more a general feeling that there are some
SQL-specifics that still need to be purged. The Query/QuerySet
interface was designed with support for non-SQL backends as a design
goal, but without an actual non-SQL backend implementation to prove
the design. Finding and purging these SQL-specific components is the
work in progress.
> Is there an
> "official" interface between Django and DB backends?
It's official in the sense that it exists in the core codebase. However,
unlike most of Django's official interfaces, it is not extensively
documented. The two interfaces are sql.Query and db.backend. If you're
interested in this problem, you're going to need to go spelunking
through the code, trying to understand it from the SQL point of view,
and then adapting it for a non-SQL backend.
Yours,
Russ Magee %-)
Not yet. As mentioned earlier in the thread, a large chunk of the
process of making non-SQL support possible is allowing wholesale overriding of
the django.db.models.sql namespace. Everything in there is SQL specific
and I would hope that that is the layer that is entirely replaced by
alternate storage systems. So overriding subqueries feels like too low
level, because it has plenty of builtin SQL assumptions.
So anything that imports and uses things from django.db.models.sql needs
to instead use a common API to fetch the appropriate module and we need
to thin down the interface. Nothing outside of django.db.models.query
should be importing from that namespace, ideally (I believe that should
be true now, unless we've introduced bugs) and then we need to introduce
something like a (*shudder*) set of factory methods or otherwise split
up the features. That's certainly where the design changes are initially
going to happen in this, we already know that.
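For illustration only, a factory of that sort could be as small as the sketch below. The class names, the registry and ``get_query_class`` are hypothetical and do not exist in Django; the only point is that callers ask a common API for the Query implementation rather than importing it from a fixed namespace.

```python
class SQLQuery:
    """Stand-in for the relational implementation."""
    backend = "sql"


class KeyValueQuery:
    """Stand-in for a non-relational replacement."""
    backend = "keyvalue"


# Hypothetical registry mapping a backend name to its Query implementation.
_QUERY_CLASSES = {
    "sql": SQLQuery,
    "keyvalue": KeyValueQuery,
}


def get_query_class(backend_name):
    """Return the Query class registered for the given backend name."""
    try:
        return _QUERY_CLASSES[backend_name]
    except KeyError:
        raise ValueError(f"no Query class registered for {backend_name!r}") from None
```

Code that currently instantiates the SQL Query directly would call ``get_query_class(...)()`` instead, so the choice of implementation lives in one place.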
Regards,
Malcolm
You seem to be oscillating between extremes. It's not a large
refactoring. The large refactoring was queryset-refactor so that these
changes will be small and fairly self-contained. But, really, we could
debate "large", "small", "medium" until the cows come home. They are
relative. I expect it will be smaller than the queryset-refactor and
larger than a couple of evenings' work, but since the work hasn't been
done yet, that's only an intuitive guess.
My understanding of the problem -- whilst I don't have your AppEngine
knowledge, I have fairly good understanding of non-SQL storage systems
in general and excellent understanding of the ORM side of Django -- is
that trying to patch non-SQL support into django.db.models.sql won't be
flexible enough. There will be too many alternate paths because there
are a lot of relational-tree data structures and column-based storage
and join-based single-query features in that code. And it's already on
the performance critical path. So it's a candidate for replacement with
more appropriate object models for other storage backends.
>
> > So anything that imports and uses things from django.db.models.sql needs
> > to instead use a common API to fetch the appropriate module and we need
> > to thin down the interface. Nothing outside of django.db.models.query
> > should be importing from that namespace, ideally (I believe that should
> > be true now, unless we've introduced bugs) and then we need to introduce
> > something like a (*shudder*) set of factory methods or otherwise split
> > up the features. That's certainly where the design changes are initially
> > going to happen in this, we already know that.
>
> When do you plan to start with the changes?
It's already started. This thread is part of the design work and
validation of the ideas around the area.
And it's not "you", it's "us". Anybody can start trying out things and
publishing a repository or putting up some patches to review for ideas
or prototype implementations.
I'm hoping I personally will have some time to devote serious
development work in this area over the next few months, as it's
something I've been working incrementally on in Django for over three
years now.
> I'm trying to get some more App Engine developers involved, so we
> can get this done faster. Mitch, do you have anyone else on the
> SimpleDB side who could help, too?
Whilst enthusiasm is good, please realise that the goal isn't "AppEngine
support as fast as possible". It can't be. That's too short-sighted for
a feature change like this in Django. If we're going to do the work, we
have to really work out the changes to support as broad a range as
possible. We've made some pretty good progress in this thread already
and your collecting the broader problems on the new wiki page is going
to be useful.
I'd like to hear from any other people who've spent serious time (I think
you'll find you've spent more time looking at this than anybody except a
couple core developers, Waldemar, and you know it's a large-ish area of
code to think through) with other storage engines. I've been thinking
about this on and off over the last couple of days as I have time and
I've put some serious thought into this aspect a few times in the past
-- which is why I can comment on some of the things fairly quickly. I
know Jacob and Russell have also spent time here. So we collect the
ideas and start thinking about what works. Maybe try out a few things
and see.
Start writing some prototype code and see what happens. I'm certainly
one of the people happy to review and critique. Jacob, Russell and I
have given some indications in this thread of the direction we'd like to
go (in terms of the layer we suspect is the right place to be the point
for inserting vastly different storage engines), but now we have to see
what works.
Regards,
Malcolm
To the extent that I'm in a position to provide design guidance and
feedback from the perspective of the Django Core, put me on this list
too. Time permitting, I might be able to contribute some code, too.
Yours,
Russ Magee %-)
While I encourage you to try to find common ground between these
various data stores, don't get too hung up on trying to satisfy every
possible non-relational backend at the same time. A single working
implementation is better than a dozen partly working ones :-)
As a side note - is there anyone out of this group of AppEngine
aficionados that will be attending DjangoCon? It would be good if we
can use the chance for face time to sort out any big-picture issues.
Yours,
Russ Magee %-)
If we're gunna start talking about a more generic DB API then count me in!
For what it's worth, I really think that a basic API that only
supports a few simple operations (get, set, and delete, primarily) is
the way to go. There's not enough commonality between datastores at a
higher level than that. While you could "fake" some operations in
code, I think that will end up causing problems. Trying to "fake" an
order-by with a limit in a key-value store, for example, just seems
like a bad idea.
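A minimal sketch of what such a basic API might look like, with an in-memory implementation to show it working. The class and method names are my own choice for the example, not an existing interface in Django or any datastore client.

```python
from abc import ABC, abstractmethod


class KeyValueBackend(ABC):
    """Hypothetical lowest-common-denominator datastore interface."""

    @abstractmethod
    def get(self, key):
        """Return the value stored under key, or raise KeyError."""

    @abstractmethod
    def set(self, key, value):
        """Store value under key, replacing any existing value."""

    @abstractmethod
    def delete(self, key):
        """Remove key, or raise KeyError if it is absent."""


class InMemoryBackend(KeyValueBackend):
    """Dict-backed implementation, useful only for illustration and tests."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        del self._data[key]
```

Note what is deliberately missing: there is no ordering, limiting or filtering, which is exactly the set of operations that would otherwise have to be faked.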
Also, if you're trying to make SimpleDB work with Django you might
want to check out the python-simpledb client I wrote
(http://github.com/sixapart/python-simpledb/tree/). Conceptually it's
very similar to the Django ORM. We (Six Apart) also just released a
package called remoteobjects that gives REST APIs a Django-ORM like
interface. Might want to check that out too.
http://github.com/sixapart/remoteobjects/tree/master
Mike
Sure.
> Some higher-level features like JOINs can be useful *and* practical
> even on non-SQL DBs. Of course, emulated operations have to be
> accurate within certain limits. It's no use to have an emulated order-
> by on a subset of the real match (e.g., for sorting by two attributes
> in SimpleDB), but an emulated JOIN on small result sets or very simple
> queries can be very useful. Of course, it's up to the developer to
> make sure that he's within the limits.
It depends on what you mean by "emulated JOINs." I think an "emulated
related field" is useful. In other words, even for a key-value store
backend I think it'd be useful to define a field type that is a
"related field." When you access such a field, another DB operation
would be performed to fetch the related item. I'm afraid that
emulating the SQL join syntax in general for all backends would be
overpromising. One of the reasons some of these new datastores are so
useful is because they force you to rethink your data and store it in
a way that makes it efficient to retrieve. You lose this benefit if
you hide the efficient interface exposed by the datastore behind a
layer that does a bunch of inefficient stuff.
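An "emulated related field" in this sense can be sketched with an ordinary Python descriptor. The store, the field class and the model below are all hypothetical; the read counter is only there to make the extra datastore operation visible.

```python
class DictStore:
    """Toy key-value store that counts how many reads it serves."""

    def __init__(self, data):
        self.data = dict(data)
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.data[key]


class RelatedField:
    """Descriptor that fetches the related item each time it is accessed."""

    def __init__(self, store, key_attr):
        self.store = store
        self.key_attr = key_attr

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class rather than an instance.
            return self
        # One extra datastore operation per access; no join involved.
        return self.store.get(getattr(instance, self.key_attr))


authors = DictStore({1: {"name": "Alice"}})


class Book:
    author = RelatedField(authors, "author_id")

    def __init__(self, title, author_id):
        self.title = title
        self.author_id = author_id
```

Creating a ``Book`` costs no reads; touching ``book.author`` costs one, and the developer can see that cost in the code rather than having it hidden behind join syntax.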
Mike
There are two problems here:
1. You want Django to be able to talk to non-SQL backends
2. You want to be able to talk to two different backends in the same project.
Point 1 is the task of the non-SQL backend work.
Point 2 is the task of the multi-database support work that Alex
Gaynor has been working on for the GSoC.
There's no need to conflate the two problems. The multi-db interface
that Alex has been developing will allow one database to be Postgres
and a second to be MySQL; once there is support for non-SQL backends,
it will be trivial to make the second database CouchDB, or any other
supported backend.
Yours,
Russ Magee %-)
Why, what a lovely bikeshed.
Seriously, of all the problems we have to deal with, the _name_ of the
Query class isn't one of them.
> I looked into django/db/models/query.py, the insert_query function
> implementation is very specific for SQL.
Yes. That's why we're embarking on a project to make it - and all the
other SQL specific parts of the generic interface - non-SQL specific.
Yours,
Russ Magee %-)
I've already given Andi a review of his first draft; his second draft
is on my list of things to look at.
Yours,
Russ Magee %-)