NoSQL support

Waldemar Kornewald

Apr 27, 2011, 4:12:24 AM
to django-developers
Hi,
we (the Django-nonrel developers) would like to work on official NoSQL
support for Django. We'd like to focus only on databases similar to
App Engine, MongoDB, and Cassandra 0.7+ with secondary indexes. Our
goal is not to support all native query features of key-value stores
like Redis.

I've listed all problems and required changes here:
http://code.djangoproject.com/wiki/NoSqlSupport

What we'd like to do is take Django-nonrel and clean it up (which also
means removing some of the modifications we did), so it's ready for a
merge into trunk. Why not start from Alex' branch? Practically all
modifications in Alex' branch are already implemented in Django-nonrel
(plus a few other changes).

In order to pull this off we'll need support from a Django core
developer. Ideally this would be someone with practical experience in
NoSQL databases and the Django ORM. Does any core developer have time
to help with this project?

Thanks!

Bye,
Waldemar Kornewald

--
Django on App Engine, MongoDB, ...? Browser-side Python? It's open-source:
http://www.allbuttonspressed.com/

legutierr

Apr 27, 2011, 3:18:55 PM
to Django developers
+ 100 on this (oh, wait, do I not get that many votes? +10 then).

Waldemar and Thomas (and the rest of the people contributing to django-
nonrel) have worked very hard to advance Django and expand its use
into new spheres. It would be great to see their work recognized by
the core team, and to begin to see the relevant parts integrated into
trunk.

Obviously this is only going to get done if one of the core developers
has the time and desire to work with Waldemar and Thomas. As someone
who uses Django every day and has committed to the platform on a
commercial basis, and as an infrequent contributor, I very much hope
that someone on the core team decides to take them under their wing.

Regards,

Ed Gutierrez

Russell Keith-Magee

Apr 28, 2011, 2:59:40 AM
to django-d...@googlegroups.com

I don't think you'll get much argument from the core team that, in
principle, having the infrastructure in place to support NoSQL data
stores would be a good thing. Waldemar et al have clearly put a lot of
effort into their branch. However, the devil is in the details.

Fundamentally, there are two problems standing in the way of this project.

The first is resources. I can't speak for any other members of the
core team, but looking at my calendar for the next couple of months, I
can tell that I'm not going to have as much time to dedicate to Django
as I have over the last couple of years.

The second is knowing the size of the job that is being proposed. At
the moment, this is a completely unknown quantity. I haven't used the
django-nonrel branch, and I'm not aware of anyone that I know and
trust that has. Django-nonrel has been developed completely
independently of django-trunk, with its own mailing lists, its own
development team, and so on, so Django's core team hasn't had any
exposure to the design and development process that has led to the
code that is there.

To be completely frank, from my perspective, the code is an unknown
quantity at this point. It *might* be fine -- but it might not, and
could fall anywhere on a scale from "needs minor work" to "needs to be
rebuilt". I simply don't know, and any process that will lead to me
knowing requires me to spend a non-trivial amount of time reviewing
the code and its branch. This is one area where the wiki page could
help -- providing a 1000ft view of how the branch does what it does.
The current wiki content is a good start, but it needs a lot more
detail -- at the moment, it contains a lot of brief feature
descriptions, but not a lot of detail on how or why those features work
the way they do.

So how do we move forward? The assertion has been made that what is
needed next is attention from the core. I'd like to propose something
different.

The core team is already a bottleneck in the whole Django process. The
proposed body of work is of unknown size and scope, and will require a
non-trivial amount of time to establish scope. This has the potential
to consume the limited resources of the core and exacerbate the
bottleneck that already exists.

From my perspective, what is needed next isn't attention from the core
-- it's attention from the *community*.

Personally, the best way to convince me that something is ready for
core is when there is broad community support saying it is ready for
core. Show me an active discussion on django-dev, involving people
that are known to the Django community, arguing the merits of your
patch. Show me the discussion that validates why your approach is
better than the alternatives (in particular, better than the approach
that has been proposed by one core developer and reviewed by another).

Once there's community consensus that the approach is good, *then* the
code will be ready for serious review from the core. And because the
community has already vouched for the code, there is a much lower risk
involved.

In reality, this is exactly what we ask of *any* proposal for trunk,
but on a slightly larger scale. It isn't the core team's
responsibility to review every patch submitted to Trac -- if it were,
we simply wouldn't be able to keep up. So, if you propose a small
patch, we ask that you get someone independent to review it. I don't
think it's too much of a stretch of the imagination to suggest that if
you are proposing a big patch, you need to get more independent
review. And, for the record, I've asked Waldemar for exactly this in
the past [1].

So -- certainly, let's try and get this into trunk. But the first step
isn't to monopolize the attention of a core developer for an unknown
period of time. Django is a community, not just a core team. That
community needs to be involved in the process, especially when we're
talking about a change as big as introducing support for
non-relational stores.

[1] http://groups.google.com/group/django-developers/browse_thread/thread/9208f63b2fb14acc

Yours,
Russ Magee %-)

Markus Gattol

Apr 28, 2011, 5:16:40 AM
to Django developers
Speaking of bottlenecks ... imho Waldemar and Thomas should be core
devs, in addition to the proposed process of getting more/better
community review on Django-nonrel before a merge into trunk can happen.

Jonas H.

Apr 28, 2011, 5:36:11 AM
to django-d...@googlegroups.com
On 04/28/2011 08:59 AM, Russell Keith-Magee wrote:
> To be completely frank, from my perspective, the code is an unknown
> quantity at this point. It *might* be fine -- but it might not, and
> could fall anywhere on a scale from "needs minor work" to "needs to be
> rebuilt". I simply don't know, and any process that will lead to me
> knowing requires me to spend a non-trivial amount of time reviewing
> the code and its branch.

For anyone who's interested, here's the complete diff of Django-nonrel
against Django 1.3: http://paste.pocoo.org/show/379546/

I think all those changes could fit into ~10 concrete Trac tickets.

(That doesn't mean discussions won't consume a lot of time for everybody
who's involved -- I just wanted to give people an idea of the kind and
quantity of the code changes.)

Jonas

Eric Florenzano

Apr 28, 2011, 6:37:20 AM
to Django developers
On Apr 28, 2:36 am, "Jonas H." <jo...@lophus.org> wrote:
> For anyone who's interested, here's the complete diff of Django-nonrel
> against Django 1.3: http://paste.pocoo.org/show/379546/

Are you sure this diff is correct? From a quick look over that diff,
it seems there's a bunch of seemingly unrelated changes in there
having to do with password resetting, base64 url encoding, and file
uploading--none of which have to do with NoSQL.

Thanks,
Eric Florenzano

Waldemar Kornewald

Apr 28, 2011, 7:32:36 AM
to django-d...@googlegroups.com

The base64 url encoding and password resetting code is required for
MongoDB and other NoSQL DBs which have a string-based primary key. The
old code would only work with integers.
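
To illustrate the primary-key point, here is a minimal sketch (not the
actual Django-nonrel patch) of a URL token that works for string keys as
well as integers. The helper names `encode_pk` and `decode_pk` and the
sample key are made up for this example.

```python
import base64


def encode_pk(pk):
    """Encode any primary key (int or str) as a URL-safe token."""
    raw = str(pk).encode("utf-8")
    # Strip the '=' padding so the token is clean inside a URL path.
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def decode_pk(token):
    """Reverse encode_pk; the key always comes back as a string."""
    padding = "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(token + padding).decode("utf-8")


# A string key of the kind a document store might generate (hypothetical value).
string_pk = "4dbe9f2c1d41c80f3a000001"
assert decode_pk(encode_pk(string_pk)) == string_pk

# Integer keys still round-trip, but are returned as strings.
assert decode_pk(encode_pk(42)) == "42"
```

Because the decoded value is always a string, the caller would still have
to convert it back to whatever type the backend expects.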

The file upload code is required to support App Engine's Blobstore.
That one indeed isn't exactly related to NoSQL support, but it's
needed by our users. I've already submitted a separate patch for this
change:
http://code.djangoproject.com/ticket/13721

Note that I never proposed to merge Django-nonrel directly. The
cleanup that I mentioned in my last mail would involve getting rid of
unrelated stuff (though I hope you'd still commit those changes in the
same release because they're needed to run Django on App Engine). I'd
also like to change select_related() and add a backwards-compatible
mode to AutoField as described on the wiki. Also, I'm not sure if
Model._entity_exists is acceptable because it might not be
backwards-compatible (it already breaks a few unit tests). Maybe
someone has an idea how to solve it differently?

As suggested by Russell, I'll try to explain the reasoning behind
every proposed change on the wiki page in the next few days.

Bye,
Waldemar

Jacob Kaplan-Moss

Apr 28, 2011, 11:13:14 AM
to django-developers
On Thu, Apr 28, 2011 at 6:32 AM, Waldemar Kornewald
<wkorn...@gmail.com> wrote:
> As suggested by Russell, I'll try to explain the reasoning behind
> every proposed change on the wiki page in the next few days.

That would be a huge help. I'm trying to wrap my brain around the
megadiff Jonas posted, but I'm having trouble following what's going
on.

The big difference maker, for me, would be if you could separate out
the nonrel changes into a series of patches/commits that I could
review bit by bit. Seeing that sort of logical progression from low-
to high-level really helps me, personally. If you can take the time to
break stuff up in that manner I can certainly reciprocate and find the
time to review.

Jacob

Julien Phalip

Apr 29, 2011, 6:46:00 AM
to Django developers
On Apr 28, 4:59 pm, Russell Keith-Magee <russ...@keith-magee.com>
wrote:
> I haven't used the django-nonrel branch, and I'm not aware of anyone that
> I know and trust that has.

For what it's worth, I am currently using django-nonrel and its
companions djangoappengine, djangotoolbox, and django-dbindexer [1] in
a project on Google App Engine and so far I have been impressed by how
stable and simple it is to use and by how closely it tracks Django
trunk.

I can't vouch for the implementation details as I haven't reviewed
them very closely and I don't know enough about all the issues at
stake, but I can vouch for its simplicity of use and for the fact that
it does work in real projects. All in all, django-nonrel appears to me
as a very promising solution.

Cheers,

Julien


[1] http://www.allbuttonspressed.com/projects/djangoappengine#installation

Patryk Zawadzki

May 10, 2011, 10:25:37 AM
to django-d...@googlegroups.com
On Wed, Apr 27, 2011 at 10:12 AM, Waldemar Kornewald
<wkorn...@gmail.com> wrote:
> Hi,
> we (the Django-nonrel developers) would like to work on official NoSQL
> support for Django. We'd like to focus only on databases similar to
> App Engine, MongoDB, and Cassandra 0.7+ with secondary indexes. Our
> goal is not to support all native query features of key-value stores
> like Redis.

Did you guys consider providing a Document class that is entirely
separate from models.Model?

Technically speaking teaching the ORM non-relational tricks is of
course possible but in reality the philosophy is entirely different
and you need to plan for NoSQL from the very beginning. Traditional
models are flat and have a schema, NoSQL documents can have extra
fields and each of them can hold a fairly complicated structure,
possibly involving numerous other (python-enforced) schemas at
different points in the tree.

In the end you won't be able to move models or logic between
traditional RDBMS and NoSQL engines anyway. What we get instead is
either a whole bunch of NotImplementedErrors or a heap of hacks to
simulate traditional relations in a world that does not need them.

Of course as much of the ORM API as it makes sense should be supported
by the Document but I really feel these should be designed as separate
object types.

--
Patryk Zawadzki
I solve problems.

legutierr

May 10, 2011, 3:40:26 PM
to Django developers

> Did you guys consider providing a Document class that is entirely
> separate from models.Model?
>
> Technically speaking teaching the ORM non-relational tricks is of
> course possible but in reality the philosophy is entirely different
> and you need to plan for NoSQL from the very beginning. Traditional
> models are flat and have a schema, NoSQL documents can have extra
> fields and each of them can hold a fairly complicated structure,
> possibly involving numerous other (python-enforced) schemas at
> different points in the tree.

Maybe it is inevitable that this kind of debate will crop up in any
discussion of django-nonrel or NoSQL, but I very much hope that the
philosophical debate does not detract from this fact: that django-
nonrel has demonstrated in very real terms that the actual changes
needed for Django's ORM to interface with a diverse set of non-
relational systems, are, in the general scheme of things, relatively
minor. Because they are localized and relatively minor, if those
changes do not have a negative impact on the usability and stability
of the ORM, and if they do not introduce noticeable backwards
incompatibility, that small set of changes should, in my opinion, be
considered for acceptance into Django.

That being said...

The idea that relational and non-relational systems represent entirely
different philosophies is very much the conventional wisdom, but I
think that when such an argument is made people often ignore the fact
that object-oriented systems are just as dissimilar from relational
systems as are non-relational data stores, if not more so. In fact, I
think that most people would say that many so-called non-relational
systems map to object-oriented systems with less "impedance mismatch"
than relational systems do. The fact that django-nonrel and Alex
Gaynor's GSoC project last year each took similar paths to providing
non-relational functionality, arrived at independently, and that both
projects required rather minimal changes to the ORM, bears this out, I
think. It demonstrates that Django's ORM does not easily accommodate
non-relational systems by accident, but because the generalized
representation of persistent data that is an "ORM" is as good a match
for many so-called NoSQL systems as it is for RDBMSs.

> In the end you won't be able to move models or logic between
> traditional RDBMS and NoSQL engines anyway. What we get instead is
> either a whole bunch of NotImplementedErrors or a heap of hacks to
> simulate traditional relations in a world that does not need them.

It is also important to note that some "NoSQL" systems are more
appropriate than others to be represented in an object-mapping system
like Django's ORM. A redis backend, for instance, might look like a
bit of a hack, but a backend for App Engine or MongoDB would implement
a large and almost complete subset of the ORM functionality without
any major hacks. And what is wrong with ORM backends implementing
different subsets of the "full" set of ORM functionality? Already
that is the case with the supported databases. SQLite, for instance,
doesn't enforce most of the constraints that the other database
systems enforce. MySQL with MyISAM does not implement transactions. Etc.

Now, there has been much debate regarding hacks that are required in
order to implement certain relational functionality in the ORM. The
most obvious one is the question of how to handle query joins. It is
important to note that simulating query joins is *not* something that
django-nonrel does; query joins are simply not supported (just as they
are not supported between separate databases with multi-db enabled).
The modifications that django-nonrel makes are much more localized and
trivial than that; many are on the order of "changing the datatype of
a field from an integer to something more generic to accommodate non-
integer primary keys".
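
As a rough illustration of "different backends, different subsets",
consider the sketch below. It is not Django's or Django-nonrel's actual
API; every class and function name in it is hypothetical. The idea is
simply that a backend declares what it can do and rejects an unsupported
query outright instead of emulating it.

```python
class BackendFeatures:
    """Hypothetical capability flags (not Django's actual API)."""
    supports_joins = True
    supports_transactions = True


class NonRelationalFeatures(BackendFeatures):
    # An imaginary store that offers neither joins nor transactions.
    supports_joins = False
    supports_transactions = False


class UnsupportedQueryError(Exception):
    """Raised when a query needs something the backend does not offer."""


def check_query(features, needs_join=False, needs_transaction=False):
    """Fail early and loudly instead of simulating a missing feature."""
    if needs_join and not features.supports_joins:
        raise UnsupportedQueryError("this backend does not support joins")
    if needs_transaction and not features.supports_transactions:
        raise UnsupportedQueryError("this backend does not support transactions")
    return True


# A simple filter passes on both kinds of backend.
assert check_query(BackendFeatures())
assert check_query(NonRelationalFeatures())
```

With this shape, calling `check_query(NonRelationalFeatures(),
needs_join=True)` raises `UnsupportedQueryError` rather than attempting a
simulated join.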

> Of course as much of the ORM API as it makes sense should be supported
> by the Document but I really feel these should be designed as separate
> object types.

I think that my own personal experience might be relevant here. I use
RDBMSs in my day-to-day work, and I have worked with RDBMSs in a
professional capacity for over a decade. I have played around with
django-nonrel quite a bit, but only experimentally. I have no "skin
in the game" as it were, no professional or personal reason to want to
see this integration move forward, except in this regard: having made
a big professional commitment to Django, I want to see it become as
popular and ubiquitous as possible, and I want to have the opportunity
to use it in as many contexts as possible.

Making those changes to Django trunk that would allow non-relational
database adaptors to be written without patching Django would be great
for Django, as a *product*. How? It would turn the Django ORM into
one of the best (certainly one of the first, if not, at this point,
the only) common abstraction layers sitting atop both non-relational
and relational systems. SQL long ago came to serve that purpose for
RDBMSs, but no similar abstraction layer has emerged for non-
relational systems. And yet, somehow, Waldemar and Thomas have been
able to create something that serves that very purpose, by simply
making small adaptations to Django's existing ORM. I really hope that
the Django community recognizes how astoundingly great that is.

But on a more mundane level, consider this: using any object other
than django.db.models.Model as the base class will mean that any
application that uses a non-relational system will have to be written
specifically to use a non-relational system as a backend. That means
that none of the contrib apps, nor many third-party apps, could be
reused with a non-relational system; most importantly, the Admin
becomes very hard, if not impossible, to use. If you want to host your
Django app on Google AppEngine, you really need the interface to
Google's data store to be implemented as Waldemar and Thomas have done
it, as a backend to Django's ORM.

Regards,

Eduardo

Patryk Zawadzki

May 10, 2011, 5:29:19 PM
to django-d...@googlegroups.com
On Tue, May 10, 2011 at 9:40 PM, legutierr <legu...@gmail.com> wrote:
> Maybe it is inevitable that this kind of debate will crop up in any
> discussion of django-nonrel or NoSQL, but I very much hope that the
> philosophical debate does not detract from this fact: that django-
> nonrel has demonstrated in very real terms that the actual changes
> needed for Django's ORM to interface with a diverse set of non-
> relational systems, are, in the general scheme of things, relatively
> minor.  Because they are localized and relatively minor, if those
> changes do not have a negative impact on the usability and stability
> of the ORM, and if they do not introduce noticeable backwards
> incompatibility, that small set of changes should, in my opinion, be
> considered for acceptance into Django.

Please don't get me wrong. I have worked with RDBMS for more than a
decade but I also use django-nonrel with MongoDB on a daily basis. I
also think that the approach django-mongokit takes is much more
natural for NoSQL data than just reusing the ORM. The ORM has no way
to express complex structures and if such support is added, you will
always have to choose which subset to use. For relational tables you'd
get foreign keys and for non-relational you'd get structure semantics.
Then we have the ModelForms that would need to start producing
sub-formsets for certain structures. In the end you end up with one
swiss army knife instead of a fork and a knife. While possible, it's
not very convenient to dine using a swiss army knife.

legutierr

May 10, 2011, 7:53:22 PM
to Django developers
> Please don't get me wrong. I have worked with RDBMS for more than a
> decade but I also use django-nonrel with MongoDB on a daily basis. I
> also think that the approach django-mongokit takes is much more
> natural for NoSQL data than just reusing the ORM. The ORM has no way
> to express complex structures and if such support is added, you will
> always have to choose which subset to use. For relational tables you'd
> get foreign keys and for non-relational you'd get structure semantics.
> Then we have the ModelForms that would need to start producing
> sub-formsets for certain structures. In the end you end up with one
> swiss army knife instead of a fork and a knife. While possible, it's
> not very convenient to dine using a swiss army knife.
>
> --
> Patryk Zawadzki
> I solve problems.

You do make a good point. It's not just that NoSQL systems lack
features (transactions, schema, relations) that are common to SQL-
based systems; it is also the case that non-relational systems have
additional features that are missing from many relational databases.
Either the ORM will have to ignore those features, or it will
implement them in a way that risks being convoluted and imperfect.

But what is wrong with the ORM ignoring features that would risk
making it convoluted? Already there are dozens, maybe hundreds of
features of Oracle (including features that are NoSQL-like, like
object tables and hierarchical queries) that are not implemented in
the Django ORM, and that's OK. There are a number of specialized
fields in Postgres that are not officially supported, some that are
also NoSQL-like (array fields, for instance). Even aggregation wasn't
supported at all by the ORM until version 1.1. The ORM is already
making these kinds of choices in order to provide a standardized
interface that can be reused without modification; anyone who needs
these features can use them outside the ORM, or can extend the ORM.

There is a place in the world for swiss army knives. Among the big
selling points that Django has are its contrib apps and its pluggable
third-party apps. The admin contrib app is a headline feature of the
framework; I would call the admin a swiss army knife. There's nothing
wrong with being a swiss army knife, it seems to be part of the
framework's objectives.

Waldemar Kornewald

May 11, 2011, 3:20:20 AM
to django-d...@googlegroups.com
On Tue, May 10, 2011 at 11:29 PM, Patryk Zawadzki <pat...@pld-linux.org> wrote:
> On Tue, May 10, 2011 at 9:40 PM, legutierr <legu...@gmail.com> wrote:
>> Maybe it is inevitable that this kind of debate will crop up in any
>> discussion of django-nonrel or NoSQL, but I very much hope that the
>> philosophical debate does not detract from this fact: that django-
>> nonrel has demonstrated in very real terms that the actual changes
>> needed for Django's ORM to interface with a diverse set of non-
>> relational systems, are, in the general scheme of things, relatively
>> minor.  Because they are localized and relatively minor, if those
>> changes do not have a negative impact on the usability and stability
>> of the ORM, and if they do not introduce noticeable backwards
>> incompatibility, that small set of changes should, in my opinion, be
>> considered for acceptance into Django.
>
> Please don't get me wrong. I have worked with RDBMS for more than a
> decade but I also use django-nonrel with MongoDB on a daily basis. I
> also think that the approach django-mongokit takes is much more
> natural for NoSQL data than just reusing the ORM. The ORM has no way
> to express complex structures and if such support is added, you will
> always have to choose which subset to use.

Are EmbeddedModelField and DictField not enough to express complex structures?

The only current limitation is that Django-nonrel doesn't allow you to
run complex queries on those fields, but that can be added.
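
For readers unfamiliar with the idea, this is roughly the shape of data in
question. The sketch uses plain dataclasses as a stand-in, not the real
EmbeddedModelField or DictField API; all class and field names here are
hypothetical.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class Address:
    """Stands in for an embedded model with its own small schema."""
    city: str
    zip_code: str


@dataclass
class Customer:
    name: str
    address: Address                            # nested, schema-checked part
    extras: dict = field(default_factory=dict)  # free-form part, like a dict field


customer = Customer("Ann", Address("Berlin", "10115"), {"tags": ["vip"]})

# asdict() walks the nested dataclass and yields one nested document.
document = asdict(customer)
assert document == {
    "name": "Ann",
    "address": {"city": "Berlin", "zip_code": "10115"},
    "extras": {"tags": ["vip"]},
}
```

The nested `address` keeps a fixed schema while `extras` stays free-form,
which is the mix of structure being discussed.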

Bye,
Waldemar

Markus Gattol

May 11, 2011, 4:01:27 AM
to Django developers
> You do make a good point.  It's not just that NoSQL systems lack
> features (transactions, schema, relations) that are common to SQL-
> based systems

Actually, most graph databases (such as neo4j), which are all considered
NoSQL, have all the features you just mentioned (ACID, relations,
schemas, etc.). At some point down the line I'd like to maybe use
PostgreSQL, MongoDB and neo4j with the ORM. Maybe that's too much,
maybe not. Maybe it's a bad idea, maybe not. We shall see ...