[GSOC] NoSQL Support for the ORM

Alex Gaynor

Apr 6, 2010, 8:11:18 PM
to django-d...@googlegroups.com
Non-relational database support for the Django ORM
==================================================

Note: I am withdrawing my proposal on template compilation. Another student
has expressed some interest in working on it, and in any event I am now more
interested in working on this project.

About Me
~~~~~~~~

I'm a sophomore computer science student at Rensselaer Polytechnic Institute.
I'm a frequent contributor to Django (including last year's successful multiple
database GSoC project) and other related projects; I'm also a committer on both
`Unladen Swallow <http://code.google.com/p/unladen-swallow/>`_ and
`PyPy <http://codespeak.net/pypy/dist/pypy/doc/>`_.

Background
~~~~~~~~~~

As the person responsible for large swaths of multiple database support I am
intimately familiar with the architecture of the ORM, the code itself, and the
various concerns that need to be accounted for (pickleability, etc.).

Rationale
~~~~~~~~~

Non-relational databases tend to support some subset of the operations that are
supported on relational databases, so it should be possible to perform
those operations on all databases. Some people are of the opinion that we
shouldn't bother to support these databases because they can't perform all
operations. I'm of the opinion that the abstraction is already a little leaky,
so we may as well exploit this for a common API where possible, as well as giving
users of these databases the admin and model forms for free.

Method
~~~~~~

The ORM architecture currently has a ``QuerySet`` which is backend agnostic, a
``Query`` which is SQL specific, and a ``SQLCompiler`` which is backend
specific (i.e. Oracle vs. MySQL vs. generic). The plan is to change ``Query``
to be backend agnostic by delaying the creation of structures that are SQL
specific, specifically join/alias data. Instead of structures like
``self.where``, ``self.join_aliases``, or ``self.select`` all working in terms
of joins and table aliases the composition of a query would be stored in terms
of a tree containing the "raw" filters, as passed to the filter calls, with
things like ``Field.get_prep_value`` called appropriately. The ``SQLCompiler``
will be responsible for computing the joins for all of these data-structures.
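
To make the idea concrete, here is a minimal sketch of what such a tree of
"raw" filters could look like. The names ``Lookup``, ``Node`` and
``parse_filter`` are hypothetical and not part of Django; the sketch only
recognizes a handful of lookup types and stores values unchanged where the real
code would call ``Field.get_prep_value``.

```python
from dataclasses import dataclass, field
from typing import Any, List, Union

# Hypothetical subset of lookup types, just enough for the example.
KNOWN_LOOKUPS = {"exact", "gt", "gte", "lt", "lte", "in"}


@dataclass
class Lookup:
    """One raw filter, e.g. filter(pages__gt=100)."""
    path: str    # field path exactly as passed to filter(), e.g. "author__name"
    op: str      # lookup type, e.g. "exact" or "gt"
    value: Any   # stored as given; real code would prep it via the field


@dataclass
class Node:
    """A boolean combination of lookups and nested nodes."""
    connector: str = "AND"   # "AND" or "OR"
    negated: bool = False
    children: List[Union["Node", Lookup]] = field(default_factory=list)


def parse_filter(**kwargs):
    """Turn filter(**kwargs) into a Node without computing joins or aliases."""
    node = Node()
    for key, value in kwargs.items():
        path, _, op = key.rpartition("__")
        if not path or op not in KNOWN_LOOKUPS:
            # Either a plain field name or a relation path with no explicit
            # lookup type: treat the whole key as the path, defaulting to exact.
            path, op = key, "exact"
        node.children.append(Lookup(path, op, value))
    return node


tree = parse_filter(author__name="Alex", pages__gt=100)
```

Nothing in this structure mentions tables, joins or aliases; a ``SQLCompiler``
would derive those later, while a compiler for a non-relational backend could
walk the same tree directly.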

The major complications are operations where ordering matters, for example
``filter()`` and ``annotate()``. Because the order of these operations matters
it is imperative that the structures continue to maintain the ordered semantics
of these methods. Another example is that filters across a many valued
relationship have different semantics when they're in the same call to
``filter()`` as opposed to separate calls. In the current ``Query`` this is
represented by using different table aliases; however, because the new structure
doesn't deal in aliases yet, all values should be annotated with a table
"counter" indicating that, once joins are computed, two different values need to
be on the same join. This is a bit of a leaky abstraction, but that's life.
It should be noted that joins don't have to be explicitly marked as being
different, only the same (i.e. the ``SQLCompiler`` can choose to reuse,
reorder, or do anything else it likes to efficiently generate SQL).
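
A rough sketch of the "counter" idea, with made-up names (``RawQuery``,
``assign_aliases``) and a simplified one-level relation path: every
``filter()`` call stamps its lookups with the same counter value, and a
compiler later decides which lookups must share a join. This is an
illustration under those assumptions, not the actual design.

```python
import itertools


class RawQuery:
    """Hypothetical backend-agnostic query: records filters, computes no joins."""

    def __init__(self):
        self._calls = itertools.count(1)
        self.filters = []   # list of (group, relation, field_name, value)

    def filter(self, **kwargs):
        group = next(self._calls)   # one counter value per filter() call
        for key, value in kwargs.items():
            relation, _, field_name = key.partition("__")
            self.filters.append((group, relation, field_name, value))
        return self


def assign_aliases(query):
    """One simple policy a SQL compiler could use: an alias per (group, relation)."""
    aliases = {}
    for group, relation, _, _ in query.filters:
        aliases.setdefault((group, relation), "T%d" % (len(aliases) + 1))
    return aliases


same_call = RawQuery().filter(entry__headline="Lennon", entry__year=2008)
separate = RawQuery().filter(entry__headline="Lennon").filter(entry__year=2008)
```

Here ``same_call`` ends up with a single alias for ``entry``, while
``separate`` gets two. A real compiler would remain free to merge aliases that
were not marked as shared whenever doing so does not change the result.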

For operations that aren't supported by a backend (e.g. a JOIN on a
non-relational backend, or ``extra`` SQL on non-SQL backends) it is the
backend's responsibility to raise the appropriate exception, or to attempt to
emulate the operation in some way (e.g. some JOINs can be emulated with nested
IN queries).
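
As an illustration only, the following toy store can filter a single "table"
but cannot join. The exception class ``UnsupportedOperation``, the
``emulate_joins`` switch and the sample data are all invented for this sketch;
it shows both options, raising or emulating the JOIN with a nested IN lookup.

```python
class UnsupportedOperation(Exception):
    """Hypothetical exception a backend could raise for operations it lacks."""


AUTHORS = [{"id": 1, "name": "Alex"}, {"id": 2, "name": "Russ"}]
BOOKS = [
    {"id": 10, "author_id": 1, "title": "ORM internals"},
    {"id": 11, "author_id": 2, "title": "Testing"},
    {"id": 12, "author_id": 1, "title": "Multi-db"},
]


def scan(rows, field, op, value):
    """The only native operation of this toy store: single-table filters."""
    if op == "exact":
        return [r for r in rows if r[field] == value]
    if op == "in":
        return [r for r in rows if r[field] in value]
    raise UnsupportedOperation("lookup type %r is not supported" % op)


def books_by_author_name(name, emulate_joins=True):
    """Rough equivalent of Book.objects.filter(author__name=name), no JOIN."""
    if not emulate_joins:
        raise UnsupportedOperation("this backend cannot filter across relations")
    # First query: collect the matching author ids.
    author_ids = {a["id"] for a in scan(AUTHORS, "name", "exact", name)}
    # Second query: the "nested IN" that stands in for the JOIN.
    return scan(BOOKS, "author_id", "in", author_ids)
```

The emulation costs two round trips instead of one, which is why it makes
sense for the backend, rather than the caller, to decide whether it is worth it.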

Timeline
~~~~~~~~

This timeline is way coarser than I'd like; consider it a work in progress.

* 2 weeks - update all ``Query`` methods to store data in a backend agnostic
  manner.
* 4 weeks - update ``SQLCompiler`` to correctly generate SQL from the
  structures, specifically migrate the JOIN generation logic.
* 2 weeks - begin working on a backend for a non-relational database (probably
  MongoDB)
* 3 weeks - deal with bugs as they come up, these will mostly be related to the
  semantics of inserts and updates at a guess.

Deliverables
~~~~~~~~~~~~

* Refactored ORM ``Query`` and ``SQLCompiler`` classes.
* A working MongoDB backend (to live outside of the core) supporting:

  * Native lookups (MongoDB supports most "basic" lookup types)
  * Creation/update
  * Deletion

* Working forms (should fall out naturally)

Reality
~~~~~~~

Applications aren't magically going to start working on databases they
weren't designed to work with. Using a non-relational database requires a
fundamental change of mindset; the point of this is to be able to use the same
API where possible, and get access to things like the admin and forms.

A note on the admin
~~~~~~~~~~~~~~~~~~~

The admin's fundamental operations are list, create, and update. These
should fall out naturally for all backends that work. However, there
are some operations that can subtly require more advanced backend
operations. Specifically, ``list_filter`` and ``search_fields`` require
backends that support the queries they generate. In cases where a user tries to
use these features with a backend that doesn't support them the expected result
is for the backend to raise an exception. This isn't a great user interface, but
having the admin query the backend for this information would result in both
code bloat and a terrible dependency inversion (i.e. the backend should be
responsible for knowing what operations it can perform). Ultimately this is a
case where it is the developer's responsibility to know what they are doing.


Comments, criticism, Nobel prize nominations, and letter bombs welcome,
Alex

Waldemar Kornewald

Apr 7, 2010, 4:43:05 AM
to Django developers
Hey Alex,

On Apr 7, 2:11 am, Alex Gaynor <alex.gay...@gmail.com> wrote:
> Non-relational database support for the Django ORM
> ==================================================
>
> Note: I am withdrawing my proposal on template compilation. Another student
> has expressed some interest in working on it, and in any event I am now more
> interested in working on this project.

It's great that you want to work on this project. Since I want to see
this feature in Django, I'm offering mentoring help with the NoSQL
part. You know Django's ORM better than me, so I probably can't really
help you there, but I can help to make sure that your modifications
will work well on NoSQL DBs. Just in case this is necessary, I'll
apply as a GSoC mentor before it's too late (if I remember correctly,
in 2007 we could still allow new mentors even at this late stage)?

> Method
> ~~~~~~
>
> The ORM architecture currently has a ``QuerySet`` which is backend agnostic, a
> ``Query`` which is SQL specific, and a ``SQLCompiler`` which is backend
> specific (i.e. Oracle vs. MySQL vs. generic). The plan is to change ``Query``
> to be backend agnostic by delaying the creation of structures that are SQL
> specific, specifically join/alias data. Instead of structures like
> ``self.where``, ``self.join_aliases``, or ``self.select`` all working in terms
> of joins and table aliases the composition of a query would be stored in terms
> of a tree containing the "raw" filters, as passed to the filter calls, with
> things like ``Field.get_prep_value`` called appropriately. The ``SQLCompiler``
> will be responsible for computing the joins for all of these data-structures.

Could you please elaborate on the data structures? In the end, non-
relational backends shouldn't have to reproduce large parts of the
SQLQuery code just to emulate a JOIN. When we tried to do a similar
refactoring we quickly faced the problem that we needed something
similar to setup_joins() and other SQLQuery features. We'd also have
to create code for grouping filters into individual queries on tables.
The Query class should take care of as much of the common stuff as
possible, so nonrel backends can potentially emulate every single SQL
feature (e.g., via MapReduce or whatever) with the least effort.
Otherwise this refactoring would actually have more disadvantages than
our current SQLCompiler-based approach in Django-nonrel (as ridiculous
as that sounds).

However, it's important that all of the emulated features are handled
not by the backend, but by a reusable code layer which sits on top of
the nonrel backends. It would be wasteful to let every backend
developer write his own JOIN emulation and denormalization and
aggregate code, etc. The refactored ORM should at least still allow
for writing some kind of "proxy" backend that sits on top of the
actual nonrel backend and takes care of SQL features emulation. I'm
not sure if it's a good idea to integrate the emulation into Django
itself because then progress will be slowed down.

Ideally, we should provide a simplified API for nonrel backends,
similar to the one that we recently published for Django-nonrel, so a
backend could be written in two days instead of two weeks. We can port
our work over to the refactored ORM, so you don't have to deal
with this (except if it should be officially integrated into Django).

In addition to these changes you'll also need to take care of a few
other things:

Many NoSQL DBs provide a simple "upsert"-like behavior where on save()
they either create a new entity if none exists with that primary key
or update the existing entity if one exists. However, on save() Django
first checks if an entity exists. This would be inefficient and
unnecessary, so the backend should be able to turn that behavior off.
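
As a sketch of what "turning that behavior off" could mean, consider the toy
backend below. The ``supports_upsert`` flag, the ``DictBackend`` class and the
``save`` helper are hypothetical names for illustration, not existing Django or
Django-nonrel APIs.

```python
class DictBackend:
    """Toy key-value backend; `supports_upsert` is a hypothetical feature flag."""
    supports_upsert = True

    def __init__(self):
        self.rows = {}
        self.reads = 0   # counts existence checks, to show what gets skipped

    def exists(self, pk):
        self.reads += 1
        return pk in self.rows

    def insert(self, pk, data):
        self.rows[pk] = dict(data)

    def update(self, pk, data):
        self.rows[pk].update(data)

    def put(self, pk, data):
        # Create-or-overwrite in one operation; replaces the whole entity.
        self.rows[pk] = dict(data)


def save(backend, pk, data):
    """Save one object, skipping the existence check when the backend upserts."""
    if backend.supports_upsert:
        backend.put(pk, data)
    elif backend.exists(pk):
        backend.update(pk, data)
    else:
        backend.insert(pk, data)
```

With the flag set, two consecutive saves of the same primary key cause no read
at all; with it unset, each save pays for one existence check first.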

On delete() Django also deletes related objects. This can be a costly
operation, especially if you have a large number of entities. Also,
the queries that collect the related entities can conflict with
transaction support at least on App Engine and it might also be very
inefficient on HBase. IOW, it's not sufficient to let the user handle
the deletion for large datasets. So, non-relational (and maybe also
relational) DBs should be able to defer and split up the deletion
process into background tasks - which would also simplify the
developer's job because he doesn't have to take care of manually
writing background tasks for large datasets, so it's a good addition
in general.

I'm not sure how to handle multi-table inheritance. It could be done
with JOIN emulation, but this would be very inefficient.
Denormalization is IMHO not the answer to this problem, either. Should
Django simply fail to execute such a query on those backends or should
the user make sure that he doesn't use multi-table inheritance
unnecessarily in his code?

Bye,
Waldemar Kornewald

Russell Keith-Magee

Apr 7, 2010, 6:47:50 AM
to django-d...@googlegroups.com

I can see the intention here, and I can see how this approach could be
used to solve the problem. However, my initial concern is that normal
SQL users will end up carrying around a lot of extra overhead so that
they can support backends that they will never use.

Have you given any thought to how complex the datastructures inside
Query will need to be, and how complex and/or expensive the conversion
process will be?

Other issues that spring to mind:

* What about nonSQL datatypes? List/Set types are a common feature of
Non-SQL backends, and are The Right Way to solve a whole bunch of
problems. How do you propose to approach these datatypes? What (if
any) overlap exists between the use of set data types and m2m? Is
there any potential overlap between supporting List/Set types and
supporting Arrays in SQL?

* How does a non-SQL backend integrate with syncdb and other setup
tools? What about inspectdb?

* What about basic connection management? Is the existing Connection
API likely to be compatible, or will modifications be required?

* Why the choice of MongoDB specifically? Do you have particular
experience with MongoDB? Does MongoDB have features that make it a
good choice?

* Given that you're only proposing a single proof-of-concept backend,
have you given any thought to the needs of other backends? It's not
hard to envisage that Couch, Cassandra, GAE etc will all have slightly
different requirements and problems. Is there a common ground that
exists between all data store backends? If there isn't, how do you
know that what you are proposing will be sufficient to support them?

There's also the issue of specificity in your proposal; I'll take you
at your word that what you have proposed is a draft that requires
elaboration.

Yours,
Russ Magee %-)

Alex Gaynor

Apr 7, 2010, 11:12:10 AM
to django-d...@googlegroups.com

No. I am vehemently opposed to attempting to extensively emulate the
features of a relational database in a non-relational one. People
talk about the "object relational" impedance mismatch, much less the
"object-relational non-relational" one. I have no interest in
attempting to support any attempts at emulating features that just
don't exist on the databases they're being emulated on.

> In addition to these changes you'll also need to take care of a few
> other things:
>
> Many NoSQL DBs provide a simple "upsert"-like behavior where on save()
> they either create a new entity if none exists with that primary key
> or update the existing entity if one exists. However, on save() Django
> first checks if an entity exists. This would be inefficient and
> unnecessary, so the backend should be able to turn that behavior off.
>
> On delete() Django also deletes related objects. This can be a costly
> operation, especially if you have a large number of entities. Also,
> the queries that collect the related entities can conflict with
> transaction support at least on App Engine and it might also be very
> inefficient on HBase. IOW, it's not sufficient to let the user handle
> the deletion for large datasets. So, non-relational (and maybe also
> relational) DBs should be able to defer and split up the deletion
> process into background tasks - which would also simplify the
> developer's job because he doesn't have to take care of manually
> writing background tasks for large datasets, so it's a good addition
> in general.
>

There is separate work on another ticket to provide a way to declare
ON_DELETE behavior. Though this is a bit of a relational concept, it
seems to me that making these easy to customize provides a good way for
different backends to specify their behavior here.

> I'm not sure how to handle multi-table inheritance. It could be done
> with JOIN emulation, but this would be very inefficient.
> Denormalization is IMHO not the answer to this problem, either. Should
> Django simply fail to execute such a query on those backends or should
> the user make sure that he doesn't use multi-table inheritance
> unnecessarily in his code?
>

There's nothing about MTI that's inherently hard on a non-relational
database, besides not being able to "select_related" the parent.

> Bye,
> Waldemar Kornewald
>
> --
> You received this message because you are subscribed to the Google Groups "Django developers" group.
> To post to this group, send email to django-d...@googlegroups.com.
> To unsubscribe from this group, send email to django-develop...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
>
>

Alex

--
"I disapprove of what you say, but I will defend to the death your
right to say it." -- Voltaire
"The people's good is the highest law." -- Cicero
"Code can always be simpler than you think, but never as simple as you
want" -- Me

Alex Gaynor

Apr 7, 2010, 11:22:27 AM
to django-d...@googlegroups.com

I see no reason they need to be any more complex than the current
ones. You have a tree that represents filters (combined where and
having). This means that the SQLCompiler is responsible for splitting
these up, which I think will make fixing some other bugs easier (e.g.
disjunction with a filter on aggregates currently doesn't work).
There's already quite a lot of stuff that's computed later, such as
select_related's transformation into JOINs.
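
A very rough sketch of the splitting step, for a flat list of filters rather
than a full tree. The function name ``split_filters`` and the tuple layout are
made up for the example, and it assumes that non-aggregate columns are still
available after grouping when a condition is moved into HAVING.

```python
def split_filters(connector, lookups, aggregates):
    """Split a flat filter list into (where, having) parts.

    `lookups` is a list of (name, op, value) tuples; `aggregates` is the set
    of names that refer to annotations such as a count of related rows.
    """
    uses_aggregate = [name in aggregates for name, _, _ in lookups]
    if connector == "OR" and any(uses_aggregate):
        # A disjunction cannot be split across WHERE and HAVING, so the whole
        # condition is evaluated after grouping.
        return [], list(lookups)
    where = [l for l, agg in zip(lookups, uses_aggregate) if not agg]
    having = [l for l, agg in zip(lookups, uses_aggregate) if agg]
    return where, having


filters = [("title", "exact", "x"), ("num_entries", "gt", 3)]
and_split = split_filters("AND", filters, {"num_entries"})
or_split = split_filters("OR", filters, {"num_entries"})
```

Because the split happens at compile time, the query itself never needs to know
whether a given filter ends up in WHERE or HAVING.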

> Other issues that spring to mind:
>
>  * What about nonSQL datatypes? List/Set types are a common feature of
> Non-SQL backends, and are The Right Way to solve a whole bunch of
> problems. How do you propose to approach these datatypes? What (if
> any) overlap exists between the use of set data types and m2m? Is
> there any potential overlap between supporting List/Set types and
> supporting Arrays in SQL?
>

Is there overlap between List/Set and Arrays in SQL? Probably. In my
opinion there's no reason, once we have a good clean separation of
concerns in the architecture, that implementing a ListField would be
particularly hard. If we happened to include one in Django, all the
better (from the perspective of interoperability).

>  * How does a non-SQL backend integrate with syncdb and other setup
> tools? What about inspectdb?
>

Most, but not all, non-relational databases don't require table setup
the way relational DBs do. MongoDB doesn't require anything at all;
by contrast, Cassandra requires an XML configuration file. How to
handle these is a little touchy, but basically I think syncdb should
stay conceptually pure, generating "tables"; if extra config is needed,
backends should ship custom management commands.

As for inspectdb it only really makes sense on backends that have
structured "tables", so they could implement it, and other backends
could punt.

>  * What about basic connection management? Is the existing Connection
> API likely to be compatible, or will modifications be required?
>

No, it's not. Non-relational databases aren't bound by PEP 249 and
thus have wildly incompatible APIs. However, we cheated a bit with
multi-db: compilers are responsible for actually executing their
queries, so they can already deal with the inconsistencies here.

>  * Why the choice of MongoDB specifically? Do you have particular
> experience with MongoDB? Does MongoDB have features that make it a
> good choice?
>

MongoDB offers a wide range of filtering options, which from my
perspective means it presents a greater test of the flexibility of the
developed APIs. For this reason GAE would also be a good choice.
Something like Riak or Cassandra, which basically only have native
support for get(pk=3) would be a poor test of the flexibility of the
API.

>  * Given that you're only proposing a single proof-of-concept backend,
> have you given any thought to the needs of other backends? It's not
> hard to envisage that Couch, Cassandra, GAE etc will all have slightly
> different requirements and problems. Is there a common ground that
> exists between all data store backends? If there isn't, how do you
> know that what you are proposing will be sufficient to support them?
>

To a certain extent this is a matter of knowing the feature sets of the
databases and, hopefully, having a mentor who is knowledgeable about
them. The reality is that, under the GSoC time constraints, attempting to
write complete backends for multiple databases would probably be
impossible.

> There's also the issue of specificity in your proposal; I'll take you
> at your word that what you have proposed is a draft that requires
> elaboration.
>
> Yours,
> Russ Magee %-)
>

lasizoillo

Apr 7, 2010, 2:19:29 PM
to django-d...@googlegroups.com
2010/4/7 Alex Gaynor <alex....@gmail.com>:

>  * 2 weeks - begin working on a backend for a non-relational database (probably
>   MongoDB)

Pymodels [1] has backends for MongoDB and Tokyo Tyrant/Cabinet. Maybe
some things can be reused in the backend.

[1] http://bitbucket.org/neithere/pymodels/

Regards,

Javi

Alex Gaynor

Apr 7, 2010, 2:22:08 PM
to django-d...@googlegroups.com

I don't really see how; they use a completely different API.

Waldemar Kornewald

Apr 7, 2010, 4:43:28 PM
to django-developers
On Wed, Apr 7, 2010 at 5:12 PM, Alex Gaynor <alex....@gmail.com> wrote:
> No.  I am vehemently opposed to attempting to extensively emulate the
> features of a relational database in a non-relational one.  People
> talk about the "object relational" impedance mismatch, much less the
> "object-relational non-relational" one.  I have no interest in
> attempting to support any attempts at emulating features that just
> don't exist on the databases they're being emulated on.

This decision has to be based on the actual needs of NoSQL developers.
Did you actually work on non-trivial projects that needed
denormalization and in-memory JOINs and manually maintained counters?
I'm not making this up. The "dumb" key-value store API is not enough.
People are manually writing lots of code for features that could be
handled by an SQL emulation layer. Do we agree so far?

Then, the question boils down to: Is the ORM the right place to handle
those features?

We see more advantages in moving those features into the ORM instead
of some separate API:
No matter whether you do denormalization or an in-memory JOIN, you end
up emulating an SQL-like JOIN. When you're maintaining a counter you
again do a simple and very common operation supported by SQL:
counting. Django's ORM already provides that functionality. Django's
current reusable apps already use that functionality. Developers
already know Django's ORM and thus also that functionality. By moving
these features into the ORM
* existing Django apps will either work directly on NoSQL or at least
be much easier to port
* Django apps written for NoSQL will be portable across all NoSQL DBs
without any code changes and in the worst case require only minor
changes to switch to SQL
* the resulting code is shorter and easier to understand than with a
separate API which would only add another layer of indirection you'd
have to think about *every* (!) single time you work with models (and
if you have to think about this while writing model code you end up
with potentially a lot more bugs, as is actually the case in practice)
* developers won't have to use and learn a different models API (you'd
only need to learn an API for specifying "optimization" rules, but the
models would still be the same)

App Engine's indexes are not that different from what we propose. Like
many other NoSQL DBs, the datastore doesn't create indexes for all
possible queries. Sometimes you'll need a composite index to make
certain queries work. On Cassandra, CouchDB, Redis, and many other
"crippled" NoSQL DBs you solve this problem by maintaining even the
most trivial DB indexes with manually written indexing *code* (and I
mean *anything* that filters on fields other than the primary key). I
bet five years ago database developers would've called anyone nuts
who'd seriously suggest that nonsense, but somehow the NoSQL hype
makes developers forget about productivity. Anyway, on App Engine,
instead of writing code for those trivial indexes you add a simple
index definition to your index.yaml (actually, it's automatically
generated for you based on the queries you execute) and suddenly the
normal query API supports the respective filter rules transparently
(with exactly the same API; this is in strong contrast to Cassandra,
etc. which also make you manually write code for traversing those
manually implemented indexes! basically, they make you implement a
little specialized DB for every project and this is no joke, but the
sad truth). Now, our goal is to bring App Engine's indexing
definitions to the next level and allow specifying denormalization and
other "advanced" indexing rules which make more complicated queries
work transparently, again via the same API that everyone already
knows.

Instead of seeing this as object-relational non-relational mapping you
should see this as an object-relational mapping for a type of DB that
needs explicitly specified indexing rules for complex queries (which,
if you really think about it, exactly describes what working with
NoSQL DBs is like).

>> In addition to these changes you'll also need to take care of a few
>> other things:
>>
>> Many NoSQL DBs provide a simple "upsert"-like behavior where on save()
>> they either create a new entity if none exists with that primary key
>> or update the existing entity if one exists. However, on save() Django
>> first checks if an entity exists. This would be inefficient and
>> unnecessary, so the backend should be able to turn that behavior off.
>>
>> On delete() Django also deletes related objects. This can be a costly
>> operation, especially if you have a large number of entities. Also,
>> the queries that collect the related entities can conflict with
>> transaction support at least on App Engine and it might also be very
>> inefficient on HBase. IOW, it's not sufficient to let the user handle
>> the deletion for large datasets. So, non-relational (and maybe also
>> relational) DBs should be able to defer and split up the deletion
>> process into background tasks - which would also simplify the
>> developer's job because he doesn't have to take care of manually
>> writing background tasks for large datasets, so it's a good addition
>> in general.
>>
>
> There is separate work on another ticket to provide a way to declare
> ON_DELETE behavior, though this is a bit of a relational concept it
> seems to me making these easy to customize provides a good way for
> different backends to specify their behavior here.

Hmm, I'm not sure. The requirement is that this works transparently on
all DBs (without manually changing ForeignKeys). The proposed setting
ON_DELETE_HANDLED_BY_DB comes close, but it's still not the same
because we still need Django's code for collecting the related objects
(just at a later point and in groups of maybe 100 entities, so it can
be distributed across multiple background task runs).
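
Roughly, the deferred deletion could look like the sketch below. The queue is
just a ``deque`` standing in for whatever background task system the platform
offers, and ``schedule_deletion`` and ``run_next_task`` are hypothetical names;
the batch size of 100 is the figure mentioned above.

```python
from collections import deque


def chunked(pks, size=100):
    """Yield successive lists of at most `size` primary keys."""
    for start in range(0, len(pks), size):
        yield pks[start:start + size]


def schedule_deletion(pks, task_queue, size=100):
    """Queue one deletion task per batch instead of deleting everything inline."""
    for batch in chunked(list(pks), size):
        task_queue.append(("delete", batch))


def run_next_task(task_queue, storage):
    """What a background worker would do on each run: process a single batch."""
    action, batch = task_queue.popleft()
    assert action == "delete"
    for pk in batch:
        storage.pop(pk, None)   # tolerate entities that are already gone


storage = {pk: "row" for pk in range(250)}
queue = deque()
schedule_deletion(storage.keys(), queue)
```

Each worker run touches at most one batch, so no single request or transaction
has to collect and delete the whole set of related entities.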

>> I'm not sure how to handle multi-table inheritance. It could be done
>> with JOIN emulation, but this would be very inefficient.
>> Denormalization is IMHO not the answer to this problem, either. Should
>> Django simply fail to execute such a query on those backends or should
>> the user make sure that he doesn't use multi-table inheritance
>> unnecessarily in his code?
>>
>
> There's nothing about MTI that's inherently hard on a non-relational
> database, besides not being able to "select_related" the parent.

What if you filter on one field defined in the parent class and
another field defined on the child class? Emulating this query would
either be very inefficient and (for large datasets) possibly return no
results at all, or require denormalization, which I'd find funny in
the case of MTI because it brings us back to single-table inheritance,
but it might be the only solution that works efficiently on all NoSQL
DBs.

Bye,
Waldemar Kornewald

Waldemar Kornewald

Apr 7, 2010, 5:55:36 PM
to django-developers
On Wed, Apr 7, 2010 at 5:22 PM, Alex Gaynor <alex....@gmail.com> wrote:
>> Other issues that spring to mind:
>>
>>  * What about nonSQL datatypes? List/Set types are a common feature of
>> Non-SQL backends, and are The Right Way to solve a whole bunch of
>> problems. How do you propose to approach these datatypes? What (if
>> any) overlap exists between the use of set data types and m2m? Is
>> there any potential overlap between supporting List/Set types and
>> supporting Arrays in SQL?
>>
>
> Is there overlap between List/Set and Arrays in SQL?  Probably.  In my
> opinion there's no reason, once we have a good clean separation of
> concerns in the architecture that implementing a ListField would be
> particularly hard.  If we happened to include one in Django, all the
> better (from the perspective of interoperability).

Do all SQL DBs provide an array type? PostgreSQL has it and I think it
can exactly mimic NoSQL lists, but I couldn't find an equivalent in
SQLite and MySQL. Does this possibly stand in the way of integrating
an official ListField into Django or is it OK to have a field that
isn't supported on all DBs? Or can we fall back to storing the list
items in separate entities in that case?

>>  * How does a non-SQL backend integrate with syncdb and other setup
>> tools? What about inspectdb?
>>
>
> Most, but not all non-relational databases don't require table setup
> the way relational DBs do.  MongoDB doesn't require anything at all,
> by contrast Cassandra requires an XML configuration file.  How to
> handle these is a little touchy, but basically I think syncdb should
> stay conceptually pure, generating "tables", if extra config is needed
> backends should ship custom management commands.

Essentially, I agree, but I would add things like auto-generated
CouchDB views to the syncdb process (since syncdb on SQL already takes
care of creating indexes, too).

>>  * Why the choice of MongoDB specifically? Do you have particular
>> experience with MongoDB? Does MongoDB have features that make it a
>> good choice?
>>
>
> MongoDB offers a wide range of filtering options, which from my
> perspective means it presents a greater test of the flexibility of the
> developed APIs.  For this reason GAE would also be a good choice.
> Something like Riak or Cassandra, which basically only have native
> support for get(pk=3) would be a poor test of the flexibility of the
> API.

MongoDB really is a good choice. Out-of-the-box (without manual index
definitions) it provides more features than GAE and most other NoSQL
DBs. MongoDB and GAE should also have the simplest backends.

Why should the Cassandra/CouchDB/Riak/Redis/etc. backend only support
pk=... queries? There's no reason why the backend couldn't maintain
indexes for the other fields and transparently support filters on any
field. I mean, you don't really want developers to manually create and
query separate indexing models for mapping one field value to its
respective primary key in the primary model table. We can do much
better than that.
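
A toy illustration of a backend maintaining such indexes itself: the store
below can only fetch by primary key natively, but keeps one value-to-keys
mapping per indexed field up to date on every save. ``IndexedStore`` and its
methods are invented for this sketch and only handle exact-match filters on
fields that were declared as indexed.

```python
from collections import defaultdict


class IndexedStore:
    """Toy key-value store that only looks up by primary key natively."""

    def __init__(self, indexed_fields):
        self.rows = {}
        # One index per field: value -> set of primary keys.
        self.indexes = {f: defaultdict(set) for f in indexed_fields}

    def save(self, pk, data):
        old = self.rows.get(pk)
        if old is not None:
            # Remove stale index entries before writing the new version.
            for f, index in self.indexes.items():
                index[old.get(f)].discard(pk)
        self.rows[pk] = dict(data)
        for f, index in self.indexes.items():
            index[data.get(f)].add(pk)

    def filter(self, **kwargs):
        """Answer exact-match filters by intersecting index entries."""
        pks = None
        for f, value in kwargs.items():
            matches = self.indexes[f].get(value, set())
            pks = set(matches) if pks is None else pks & matches
        if pks is None:          # no filters given: return everything
            pks = set(self.rows)
        return [self.rows[pk] for pk in sorted(pks)]


store = IndexedStore(["name", "city"])
store.save(1, {"name": "a", "city": "x"})
store.save(2, {"name": "b", "city": "x"})
```

From the caller's side ``store.filter(city="x")`` looks like an ordinary
filter; the index bookkeeping stays inside the backend.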

>>  * Given that you're only proposing a single proof-of-concept backend,
>> have you given any thought to the needs of other backends? It's not
>> hard to envisage that Couch, Cassandra, GAE etc will all have slightly
>> different requirements and problems. Is there a common ground that
>> exists between all data store backends? If there isn't, how do you
>> know that what you are proposing will be sufficient to support them?
>>
>
> To a certain extent this is a matter of knowing the featuresets of the
> databases and, hopefully, having a mentor who is knowledgeable about
> them.  The reality is under the GSOC time constraints attempting to
> write complete backends for multiple databases would probably be
> impossible.

Well, you might be able to quickly adapt the MongoDB backend to GAE
(within GSoC time constraints) due to their similarity. Anyway, there
is common ground between the NoSQL DBs, but this highly depends on
what problem we agree to solve. If we only provide exactly the
features that each DB supports natively, they'll appear dissimilar
because they take very different approaches to indexing and if this
isn't abstracted and automated NoSQL support doesn't really make sense
with Django. OTOH, if the goal is to make an abstraction around their
indexes they can all look very similar from the perspective of
Django's ORM (of course they have different "features" like sharding
or eventual consistency or being in-memory DBs or supporting fast
writes or reads or having transactions or ..., but in the end only a few
of these features have any influence on Django's ORM at all).

Bye,
Waldemar Kornewald

bur...@gmail.com

Apr 8, 2010, 8:03:33 AM
to django-d...@googlegroups.com
Hi all,

On Thu, Apr 8, 2010 at 12:55 AM, Waldemar Kornewald
<wkorn...@gmail.com> wrote:
> On Wed, Apr 7, 2010 at 5:22 PM, Alex Gaynor <alex....@gmail.com> wrote:
>>> Other issues that spring to mind:

[...]


> Well, you might be able to quickly adapt the MongoDB backend to GAE
> (within GSoC time constraints) due to their similarity. Anyway, there
> is common ground between the NoSQL DBs, but this highly depends on
> what problem we agree to solve. If we only provide exactly the
> features that each DB supports natively, they'll appear dissimilar
> because they take very different approaches to indexing, and if this
> isn't abstracted and automated, NoSQL support doesn't really make sense
> with Django. OTOH, if the goal is to make an abstraction around their
> indexes they can all look very similar from the perspective of
> Django's ORM (of course they have different "features" like sharding
> or eventual consistency or being in-memory DBs or supporting fast
> writes or reads or having transactions or ..., but in the end only a few
> of these features have any influence on Django's ORM, at all).
>
> Bye,
> Waldemar Kornewald

Could we switch to one issue/feature per thread, please?

I think the overall approach has already been chosen, and everyone agreed
with it. Each detail now has to be discussed separately, with the overall
discussion continuing here.
I.e., I have a few words to say about the design of counters and indexes
(and my favorite NoSQL DB, Berkeley DB), but not about arrays/lists.

--
Best regards, Yuri V. Baburov, ICQ# 99934676, Skype: yuri.baburov,
MSN: bu...@live.com

Alex Gaynor

Apr 8, 2010, 12:14:05 PM
to django-d...@googlegroups.com
On Wed, Apr 7, 2010 at 4:43 PM, Waldemar Kornewald <wkorn...@gmail.com> wrote:
> On Wed, Apr 7, 2010 at 5:12 PM, Alex Gaynor <alex....@gmail.com> wrote:
>> No.  I am vehemently opposed to attempting to extensively emulate the
>> features of a relational database in a non-relational one.  People
>> talk about the "object relational" impedance mismatch, much less the
>> "object-relational non-relational" one.  I have no interest in
>> attempting to support any attempts at emulating features that just
>> don't exist on the databases they're being emulated on.
>
> This decision has to be based on the actual needs of NoSQL developers.
> Did you actually work on non-trivial projects that needed
> denormalization and in-memory JOINs and manually maintained counters?
> I'm not making this up. The "dumb" key-value store API is not enough.
> People are manually writing lots of code for features that could be
> handled by an SQL emulation layer. Do we agree until here?
>

No, we don't. People are designing their data in ways that fit their
datastore. If all people did was implement a relational model in
userland code on top of non-relational databases then they'd really be
missing the point.

> Then, the question boils down to: Is the ORM the right place to handle
> those features?
>
> We see more advantages in moving those features into the ORM instead
> of some separate API:
> No matter whether you do denormalization or an in-memory JOIN, you end
> up emulating an SQL-like JOIN. When you're maintaining a counter you
> again do a simple and very common operation supported by SQL:
> counting. Django's ORM already provides that functionality. Django's
> current reusable apps already use that functionality. Developers
> already know Django's ORM and thus also that functionality. By moving
> these features into the ORM
> * existing Django apps will either work directly on NoSQL or at least
> be much easier to port

Not a design concern. People expecting programs designed for totally
separate data models to work should expect to be disappointed. Unless
you're using the limited subset of features supported by all
databases, of course.

> * Django apps written for NoSQL will be portable across all NoSQL DBs
> without any code changes and in the worst case require only minor
> changes to switch to SQL
> * the resulting code is shorter and easier to understand than with a
> separate API which would only add another layer of indirection you'd
> have to think about *every* (!) single time you work with models (and
> if you have to think about this while writing model code you end up
> with potentially a lot more bugs, as is actually the case in practice)
> * developers won't have to use and learn a different models API (you'd
> only need to learn an API for specifying "optimization" rules, but the
> models would still be the same)
>

Uhh, the whole point of this is that there is only a single API.

Filters on base fields can be implemented fairly easily on databases
with IN queries. Otherwise I suppose it raises an exception.
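
For illustration, a minimal sketch of the IN-based approach, assuming a
toy backend that keeps one dict per "table". ToyBackend, filter_child and
NotSupportedError are hypothetical names made up for this sketch; they
are not part of Django or of any real backend:

```python
class NotSupportedError(Exception):
    """Raised when a backend cannot express the requested query."""


class ToyBackend:
    """Hypothetical stand-in for a non-relational backend."""

    def __init__(self, supports_in):
        self.supports_in = supports_in
        self.tables = {}  # table name -> {pk: {field: value}}

    def insert(self, table, pk, **fields):
        self.tables.setdefault(table, {})[pk] = fields

    def filter(self, table, pk_in=None, **equals):
        # Return the sorted pks matching all equality filters,
        # optionally restricted to the pks given in pk_in.
        if pk_in is not None and not self.supports_in:
            raise NotSupportedError("backend has no IN queries")
        rows = self.tables.get(table, {})
        return sorted(
            pk for pk, row in rows.items()
            if (pk_in is None or pk in pk_in)
            and all(row.get(k) == v for k, v in equals.items())
        )


def filter_child(backend, parent_table, child_table,
                 parent_filters, child_filters):
    # Step 1: resolve the filters on base-class fields to primary keys.
    parent_pks = backend.filter(parent_table, **parent_filters)
    # Step 2: restrict the child query to those pks with an IN filter.
    return backend.filter(child_table, pk_in=set(parent_pks),
                          **child_filters)


db = ToyBackend(supports_in=True)
db.insert("place", 1, name="Luigi's")
db.insert("place", 2, name="Bob's")
db.insert("restaurant", 1, serves_pizza=True)
db.insert("restaurant", 2, serves_pizza=True)
matches = filter_child(db, "place", "restaurant",
                       {"name": "Luigi's"}, {"serves_pizza": True})
```

Note that this costs two round trips, and the first one can return an
arbitrarily large set of keys, so it is only a sketch of the idea and
not a statement about how efficient it would be on a real datastore.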

Alex Gaynor

Apr 8, 2010, 12:18:08 PM
to django-d...@googlegroups.com
On Wed, Apr 7, 2010 at 5:55 PM, Waldemar Kornewald <wkorn...@gmail.com> wrote:
> On Wed, Apr 7, 2010 at 5:22 PM, Alex Gaynor <alex....@gmail.com> wrote:
>>> Other issues that spring to mind:
>>>
>>>  * What about nonSQL datatypes? List/Set types are a common feature of
>>> Non-SQL backends, and are The Right Way to solve a whole bunch of
>>> problems. How do you propose to approach these datatypes? What (if
>>> any) overlap exists between the use of set data types and m2m? Is
>>> there any potential overlap between supporting List/Set types and
>>> supporting Arrays in SQL?
>>>
>>
>> Is there overlap between List/Set and Arrays in SQL?  Probably.  In my
>> opinion there's no reason, once we have a good clean separation of
>> concerns in the architecture that implementing a ListField would be
>> particularly hard.  If we happened to include one in Django, all the
>> better (from the perspective of interoperability).
>
> Do all SQL DBs provide an array type? PostgreSQL has it and I think it
> can exactly mimic NoSQL lists, but I couldn't find an equivalent in
> sqlite and MySQL. Does this possibly stand in the way of integrating
> an official ListField into Django or is it OK to have a field that
> isn't supported on all DBs? Or can we fall back to storing the list
> items in separate entities in that case?
>

I'd be -1 on using a separate entity, if it's supported it is, if not
it's not. There's no reason it has to be included in Django in any
event (certainly none of the non-relational backends will be, at least
to start with).

Because that's all they support out of the box. You call it
maintaining an index, but it really means setting up a separate
"table" (in RDBMS parlance) and I think that's a level of emulation
that's far beyond what should be supported out of the box. In any
event I can't stop someone from writing a backend that does do that
level of abstraction.

Waldemar Kornewald

Apr 8, 2010, 1:08:02 PM
to django-d...@googlegroups.com
On Thu, Apr 8, 2010 at 6:14 PM, Alex Gaynor <alex....@gmail.com> wrote:
> On Wed, Apr 7, 2010 at 4:43 PM, Waldemar Kornewald <wkorn...@gmail.com> wrote:
>> On Wed, Apr 7, 2010 at 5:12 PM, Alex Gaynor <alex....@gmail.com> wrote:
>>> No.  I am vehemently opposed to attempting to extensively emulate the
>>> features of a relational database in a non-relational one.  People
>>> talk about the "object relational" impedance mismatch, much less the
>>> "object-relational non-relational" one.  I have no interest in
>>> attempting to support any attempts at emulating features that just
>>> don't exist on the databases they're being emulated on.
>>
>> This decision has to be based on the actual needs of NoSQL developers.
>> Did you actually work on non-trivial projects that needed
>> denormalization and in-memory JOINs and manually maintained counters?
>> I'm not making this up. The "dumb" key-value store API is not enough.
>> People are manually writing lots of code for features that could be
>> handled by an SQL emulation layer. Do we agree until here?
>>
>
> No, we don't.  People are designing their data in ways that fit their
> datastore. If all people did was implement a relational model in
> userland code on top of non-relational databases then they'd really be
> missing the point.

Then you're calling everyone a fool. :) What do you call a CouchDB or
Cassandra index mapping usernames to user pks? Its purpose is exactly
to do something that relational DBs provide out of the box. You can't
deny that people do in fact manually maintain such indexes.

So, you're suggesting that people write code like this:

# ----------
from django.db import models

class User(models.Model):
    username = models.CharField(max_length=200)
    email = models.CharField(max_length=200)
    ...

# Hand-maintained index "table": username -> primary key of User
class UsernameUser(models.Model):
    username = models.CharField(primary_key=True, max_length=200)
    user_id = models.IntegerField()

# Hand-maintained index "table": email -> primary key of User
class EmailUser(models.Model):
    email = models.CharField(primary_key=True, max_length=200)
    user_id = models.IntegerField()

def add_user(username, email):
    # Every write has to update the model and both index tables
    user = User.objects.create(username=username, email=email)
    UsernameUser.objects.create(username=username, user_id=user.id)
    EmailUser.objects.create(email=email, user_id=user.id)
    return user

def get_user_by_username(username):
    user_id = UsernameUser.objects.get(username=username).user_id
    return User.objects.get(id=user_id)

def get_user_by_email(email):
    user_id = EmailUser.objects.get(email=email).user_id
    return User.objects.get(id=user_id)

get_user_by_username('marcus')
get_user_by_email('mar...@marcus.com')
# ----------

What I'm proposing allows you to just write this:

# ----------
class User(models.Model):
    username = models.CharField(max_length=200)
    email = models.CharField(max_length=200)
    ...

User.objects.get(username='marcus')
User.objects.get(email='mar...@marcus.com')
# ----------

Are you seriously saying that people should use the first version of
the code when they work with a simplistic NoSQL DB (note, it's how
they work today with those DBs)?
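
To make the idea concrete without depending on any particular backend,
here is a rough sketch of what "the backend maintains the index for you"
could mean. It assumes a store that natively supports only lookups by
primary key; IndexedStore and its methods are hypothetical names for
this sketch, and the example address is made up:

```python
class IndexedStore:
    """Toy pk-only store plus automatically maintained
    field -> pk index "tables"."""

    def __init__(self, indexed_fields):
        self.rows = {}  # pk -> row dict
        self.indexes = {field: {} for field in indexed_fields}
        self._next_pk = 1

    def create(self, **fields):
        pk = self._next_pk
        self._next_pk += 1
        self.rows[pk] = dict(fields, id=pk)
        # The index rows are written here, not in application code.
        for field, index in self.indexes.items():
            index[fields[field]] = pk
        return self.rows[pk]

    def get(self, **lookup):
        # Exactly one field=value pair is accepted in this sketch.
        (field, value), = lookup.items()
        if field == "id":
            return self.rows[value]
        if field not in self.indexes:
            raise LookupError("no index on %r" % field)
        # Two pk lookups: index row first, then the real row.
        return self.rows[self.indexes[field][value]]


users = IndexedStore(indexed_fields=["username", "email"])
users.create(username="marcus", email="marcus@example.com")
by_name = users.get(username="marcus")
by_email = users.get(email="marcus@example.com")
```

The calling code looks like the second version above, while the work
done underneath is the same as in the first version.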

>> * Django apps written for NoSQL will be portable across all NoSQL DBs
>> without any code changes and in the worst case require only minor
>> changes to switch to SQL
>> * the resulting code is shorter and easier to understand than with a
>> separate API which would only add another layer of indirection you'd
>> have to think about *every* (!) single time you work with models (and
>> if you have to think about this while writing model code you end up
>> with potentially a lot more bugs, as is actually the case in practice)
>> * developers won't have to use and learn a different models API (you'd
>> only need to learn an API for specifying "optimization" rules, but the
>> models would still be the same)
>>
>
> Uhh, the whole point of this is that there is only a single API.

And what you're suggesting is an API whose semantics are different on
every single backend? How is that better? The indexing API would at
least look and behave the same on all backends, so it's a "learn once
and use anywhere" experience.

>> What if you filter on one field defined in the parent class and
>> another field defined on the child class? Emulating this query would
>> be either very inefficient and (for large datasets) possibly return no
>> results, at all, or require denormalization which I'd find funny in
>> the case of MTI because it brings us back to single-table inheritance,
>> but it might be the only solution that works efficiently on all NoSQL
>> DBs.
>>
>
> Filters on base fields can be implemented fairly easily on databases
> with IN queries.  Otherwise I suppose it raises an exception.

How would that be implemented with an IN filter (you have two
different tables)? What would the (pseudo-)code look like?

Bye,
Waldemar Kornewald

Javier Guerra Giraldez

Apr 8, 2010, 1:50:05 PM
to django-d...@googlegroups.com
On Thu, Apr 8, 2010 at 12:08 PM, Waldemar Kornewald
<wkorn...@gmail.com> wrote:
>> No, we don't.  People are designing their data in ways that fit their
>> datastore. If all people did was implement a relational model in
>> userland code on top of non-relational databases then they'd really be
>> missing the point.
>
> Then you're calling everyone a fool. :) What do you call a CouchDB or
> Cassandra index mapping usernames to user pks? Its purpose is exactly
> to do something that relational DBs provide out of the box. You can't
> deny that people do in fact manually maintain such indexes.


I think there are two very different goals; maybe opposite, maybe complementary:

A: use the _same_ ORM with NoSQL backends. then it's important to
provide (almost) every capability of the current ORM, even if they have
to be emulated when the backend doesn't provide it natively.

B: create a new ORM-like facility for NoSQL (let's call it ONoM). it
would be used mostly the same as the ORM; but with different
performance properties, and some capabilities missing, some others
added, and some available but with 'emulation warnings'. but in the
end, they should return queryset-like objects, that _must_ be usable
by existing code that take querysets.


IMHO, if the choice between these two isn't made clear and explicit at
the start, this kind of argument won't end.

--
Javier

flo...@gmail.com

Apr 8, 2010, 2:30:24 PM
to Django developers
On Apr 8, 10:50 am, Javier Guerra Giraldez <jav...@guerrag.com> wrote:

> A: use the _same_ ORM with NoSQL backends.  then it's important to
> provide (almost) every capability of the current ORM, even if they have
> to be emulated when the backend doesn't provide it natively.

To do this would mean to essentially implement a relational database
on top of a non-relational data store. Except it would be worse,
because instead of using the database's built-in logic, you'd be doing
all of these operations over the network. It would also be very, very
difficult to do this in a way where the abstraction wouldn't break
down when doing anything non-trivial.

It's just a bad idea.

It's for these same reasons that some database backends throw errors
for some of the aggregate operations.

FWIW, I think Alex's approach has merit--only support that subset of
features that the underlying database directly supports. If someone
wants to build features on top of that, they can do so, but it should
probably live externally to Django. At least until it becomes very
stable and widely-used.

Thanks,
Eric Florenzano

Waldemar Kornewald

Apr 8, 2010, 3:32:43 PM
to django-d...@googlegroups.com
On Thursday, April 8, 2010, flo...@gmail.com <flo...@gmail.com> wrote:
> On Apr 8, 10:50 am, Javier Guerra Giraldez <jav...@guerrag.com> wrote:
>
>> A: use the _same_ ORM with NoSQL backends.  then it's important to
>> provide (almost) every capability of the current ORM, even if they have
>> to be emulated when the backend doesn't provide it natively.
>
> To do this would mean to essentially implement a relational database
> on top of a non-relational data store.  Except it would be worse,
> because instead of using the database's built-in logic, you'd be doing
> all of these operations over the network.  It would also be very, very
> difficult to do this in a way where the abstraction wouldn't break
> down when doing anything non-trivial.

What I'm proposing is not a complete emulation of all features at all
cost, but simply an automation of the things that are possible and in
wide use on nonrel DBs. Moreover, you'd only use these features where
actually needed, so this would be a helper that replaces exactly the
code you'd otherwise write by hand - nothing more. Denormalization,
counters, etc. indeed go over the network, but people still do it
because there is no alternative (CouchDB being an exception, but there
we can auto-generate a view, so the index is created on the DB: same
game, different location).

I'm also not saying that this should be tightly integrated in Django.
It's good enough to provide a separate package that adds these
features. I'm just concerned that Alex's refactoring will make it more
difficult or even impossible to implement an emulation layer because
his goal is totally different.

Bye,
Waldemar Kornewald

bur...@gmail.com

Apr 8, 2010, 4:26:25 PM
to django-d...@googlegroups.com
Hi Waldemar, Alex,

why didn't you start different threads for the different issues? :\

Regarding getting .filter() to work, I suggest we use explicit
and implicit indexes, something like this:

class User(models.Model):
    # db_index=True should add the third line implicitly
    username = models.CharField(max_length=200, db_index=True)
    # db_index=True can help, but we have to go explicit here.
    email = models.CharField(max_length=200)
    # further ideas on lowercase support
    email = models.CharField(max_length=200, unique=True,
                             db_index='lowercase')
    #username_index = models.Index(['username'], 'pk')  # line 3
    email_index = models.Index(['email'], 'pk',
                               filter={'email': 'lowercase'})  # line 4
    ...

And the user will write:
users = User.objects.filter(email_index__startswith='me@')

Or:

class UsernameIndex(models.Index):
    model = User
    keys = ['category', 'email']
    values = ['id']

    def clean_email(self, value):
        return value.lower()

with the user writing:
users = UsernameIndex.objects.filter(email__startswith='me@', category='staff')

But I don't think this is related to GSoC at all, and it can be done
outside of Django.
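
As a rough, backend-free sketch of how such a normalized index could
answer a startswith query: keep the lowercased values in sorted order
and scan from the first key that is not smaller than the prefix.
PrefixIndex is a hypothetical name for this sketch and has nothing to do
with the models.Index idea above beyond illustrating the lookup; the
addresses are made up:

```python
import bisect


class PrefixIndex:
    """Sorted (normalized value, pk) pairs supporting prefix lookups."""

    def __init__(self, normalize=str.lower):
        self.normalize = normalize
        self._keys = []  # kept sorted by bisect.insort

    def add(self, value, pk):
        bisect.insort(self._keys, (self.normalize(value), pk))

    def startswith(self, prefix):
        prefix = self.normalize(prefix)
        # A 1-tuple sorts before every 2-tuple with the same first item,
        # so this finds the first entry whose key is >= prefix.
        start = bisect.bisect_left(self._keys, (prefix,))
        pks = []
        for key, pk in self._keys[start:]:
            if not key.startswith(prefix):
                break
            pks.append(pk)
        return pks


index = PrefixIndex()
index.add("Me@Example.com", 1)
index.add("other@example.com", 2)
index.add("me@work.example", 3)
found = index.startswith("ME@")
```

Because the normalization happens both when writing and when querying,
the caller never has to remember to lowercase anything.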

The same for Counters:

class User(models.Model):
    username = models.CharField(max_length=200, db_index=True)
    email = models.CharField(max_length=200)
    category = models.CharField(max_length=20,
                                choices=(('S', 'Staff'), ('U', 'User')))
    # count number of users by category
    counter = aggregates.Counter('category')
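
What such a counter has to do on every write can be sketched in plain
Python. This is only an illustration of the bookkeeping, under the
assumption that all writes go through one place; CountedStore is a
hypothetical name and not an implementation of the Counter field above:

```python
from collections import Counter


class CountedStore:
    """Toy store that keeps a per-value count for one field."""

    def __init__(self, counted_field):
        self.counted_field = counted_field
        self.rows = {}
        self.counts = Counter()

    def save(self, pk, **fields):
        old = self.rows.get(pk)
        if old is not None:
            # Updating a row: take it out of its old bucket first.
            self.counts[old[self.counted_field]] -= 1
        self.rows[pk] = fields
        self.counts[fields[self.counted_field]] += 1

    def delete(self, pk):
        row = self.rows.pop(pk)
        self.counts[row[self.counted_field]] -= 1

    def count(self, value):
        # Answered from the counter, without scanning the rows.
        return self.counts[value]


store = CountedStore("category")
store.save(1, username="a", category="S")
store.save(2, username="b", category="U")
store.save(3, username="c", category="U")
store.save(2, username="b", category="S")  # user 2 becomes staff
```

On a real distributed datastore the increments and decrements would
also need to be safe under concurrent writes, which this sketch ignores.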

I think we should go with separate Django proposal(s); let's not put
that on Alex's shoulders.

What Alex will need to provide is a way to assign a hook for
.filter() from QuerySet, which probably should be put into some kind
of NosqlManager (which will replace Manager at User.objects).

--
Best regards, Yuri V. Baburov, ICQ# 99934676, Skype: yuri.baburov,
MSN: bu...@live.com

flo...@gmail.com

Apr 8, 2010, 5:03:12 PM
to Django developers
On Apr 8, 12:32 pm, Waldemar Kornewald <wkornew...@gmail.com> wrote:

> What I'm proposing is not a complete emulation of all features at all
> cost, but simply an automation of the things that are possible and in
> wide use on nonrel DBs. Moreover, you'd only use these features where
> actually needed, so this would be a helper that replaces exactly the
> code you'd otherwise write by hand - nothing more. Denormalization,
> counters, etc. indeed go over the network, but people still do it
> because there is no alternative (CouchDB being an exception, but there
> we can auto-generate a view, so the index is created on the DB: same
> game, different location).

"Denormalization, counters, etc." is a completely orthogonal problem.
Solving those problems would help even those who are using relational
databases, in fact. But however useful it is, precisely because it's
orthogonal, it doesn't belong in this Summer of Code project.

I think what you're going to run into is that since CouchDB,
Cassandra, MongoDB, GAE, Redis, Riak, Voldemort, etc. are all so
vastly different, that attempting to do *any* emulation will result in
serious pain down the line.

It simply doesn't seem reasonable to claim that whatever refactoring
Alex does, will make "it more difficult or even impossible to
implement an emulation layer" because all he would be doing is
decoupling SQL from the query class. That can *only* make your goal
easier.

Thanks,
Eric Florenzano

Russell Keith-Magee

Apr 9, 2010, 3:06:07 AM
to django-d...@googlegroups.com
On Thu, Apr 8, 2010 at 5:55 AM, Waldemar Kornewald <wkorn...@gmail.com> wrote:
> On Wed, Apr 7, 2010 at 5:22 PM, Alex Gaynor <alex....@gmail.com> wrote:
>>> Other issues that spring to mind:
>>>
>>>  * What about nonSQL datatypes? List/Set types are a common feature of
>>> Non-SQL backends, and are The Right Way to solve a whole bunch of
>>> problems. How do you propose to approach these datatypes? What (if
>>> any) overlap exists between the use of set data types and m2m? Is
>>> there any potential overlap between supporting List/Set types and
>>> supporting Arrays in SQL?
>>>
>>
>> Is there overlap between List/Set and Arrays in SQL?  Probably.  In my
>> opinion there's no reason, once we have a good clean separation of
>> concerns in the architecture that implementing a ListField would be
>> particularly hard.  If we happened to include one in Django, all the
>> better (from the perspective of interoperability).
>
> Do all SQL DBs provide an array type? PostgreSQL has it and I think it
> can exactly mimic NoSQL lists, but I couldn't find an equivalent in
> sqlite and MySQL. Does this possibly stand in the way of integrating
> an official ListField into Django or is it OK to have a field that
> isn't supported on all DBs? Or can we fall back to storing the list
> items in separate entities in that case?

No - Array types aren't available everywhere. However, it would be
nice to be able to support them (even if not in core); if this GSoC
lays the groundwork to make this possible, then it's worth looking at.

I was more interested in the m2m issue - the 'natural' way to handle
m2m on some NoSQL isn't to have a separate relation, it's to maintain
a list/set of related references.

Yours,
Russ Magee %-)

Waldemar Kornewald

Apr 9, 2010, 4:42:36 AM
to django-d...@googlegroups.com
On Thu, Apr 8, 2010 at 11:03 PM, flo...@gmail.com <flo...@gmail.com> wrote:
> On Apr 8, 12:32 pm, Waldemar Kornewald <wkornew...@gmail.com> wrote:
>
>> What I'm proposing is not a complete emulation of all features at all
>> cost, but simply an automation of the things that are possible and in
>> wide use on nonrel DBs. Moreover, you'd only use these features where
>> actually needed, so this would be a helper that replaces exactly the
>> code you'd otherwise write by hand - nothing more. Denormalization,
>> counters, etc. indeed go over the network, but people still do it
>> because there is no alternative (CouchDB being an exception, but there
>> we can auto-generate a view, so the index is created on the DB: same
>> game, different location).
>
> "Denormalization, counters, etc." is a completely orthogonal problem.
> Solving those problems would help even those who are using relational
> databases, in fact.  But however useful it is, precisely because it's
> orthogonal, it doesn't belong in this Summer of Code project.

I guess we have a misunderstanding here. I never wanted to have it in
this GSoC project. It's clearly out of scope. I just want to make sure
that emulation will not be more difficult.

> I think what you're going to run into is that since CouchDB,
> Cassandra, MongoDB, GAE, Redis, Riak, Voldemort, etc. are all so
> vastly different, that attempting to do *any* emulation will result in
> serious pain down the line.

You are absolutely right that those DBs are vastly different (*at the
low level*) and that's why Django's ORM would suck as a crippled
low-level replacement for the native NoSQL APIs. I can't imagine how
you would map, for instance, Redis' list management features to the
ORM. Depending on your problem you won't get around using Redis'
native API to solve a problem efficiently, no matter how extensive the
emulation layer is. Unless I've misunderstood, Alex wants to build
just a low-level API replacement, but it doesn't make any sense
because the native NoSQL APIs are much better at this task than
Django's ORM will ever be.

But there's a second use-case: Working with object-like data (with
relations between objects). That's where people write indexing code by
hand (column indexes, denormalization indexes, counters, etc.) which
is very unproductive. This use-case is pretty common and it maps
pretty well to Django's ORM *at the high level* because the high-level
usage can look the same on all DBs:
* get by username: User.objects.filter(username=...)
* join via denormalization index: Profile.objects.filter(age=21,
user__username=...)
* keep counter for number of votes for each video: video.vote_set.count()
* etc.

An abstraction/emulation layer can save you a lot of work because you
won't have to maintain the required indexes by hand and those indexes
don't make your query code more complicated. Also, it makes your code
portable (except where you needed the native API for optimization).

This also means that there is a clear separation of purpose: the
native API for a few optimizations and Django's ORM for object-like
data. We already use that distinction when we combine raw SQL with the
ORM.

It's no problem to automatically maintain such indexes on NoSQL DBs.
As long as you're free to store whatever you want in the DB and run
background tasks you have full control over everything.
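
As a hedged illustration of the denormalization case (the
Profile.objects.filter(age=21, user__username=...) example above), here
is a plain-Python sketch in which each profile row carries a copy of the
username and the copy is refreshed whenever the user is saved.
DenormalizedProfiles and its methods are hypothetical names invented for
this sketch:

```python
class DenormalizedProfiles:
    """Profiles keep a copy of the owning user's username, so a filter
    on age and user__username needs no join."""

    def __init__(self):
        self.users = {}     # user pk -> {"username": ...}
        self.profiles = {}  # profile pk -> row incl. "user__username"

    def save_user(self, pk, username):
        self.users[pk] = {"username": username}
        # Propagate the new value to every denormalized copy.
        for profile in self.profiles.values():
            if profile["user_id"] == pk:
                profile["user__username"] = username

    def save_profile(self, pk, user_id, age):
        self.profiles[pk] = {
            "user_id": user_id,
            "age": age,
            "user__username": self.users[user_id]["username"],
        }

    def filter_profiles(self, **equals):
        # A single-"table" query: all filters hit the profile rows.
        return sorted(
            pk for pk, row in self.profiles.items()
            if all(row[k] == v for k, v in equals.items())
        )


db = DenormalizedProfiles()
db.save_user(1, "marcus")
db.save_user(2, "anna")
db.save_profile(10, user_id=1, age=21)
db.save_profile(11, user_id=2, age=21)
before = db.filter_profiles(age=21, user__username="marcus")
db.save_user(1, "mark")  # rename propagates to profile 10
after = db.filter_profiles(age=21, user__username="mark")
```

In a real system the propagation step would typically run as a
background task, which is why the query can briefly see stale copies.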

> It simply doesn't seem reasonable to claim that whatever refactoring
> Alex does, will make "it more difficult or even impossible to
> implement an emulation layer" because all he would be doing is
> decoupling SQL from the query class.  That can *only* make your goal
> easier.

When I said the ORM refactoring should not make it more difficult to
implement the emulation layer Alex said that he was "vehemently
opposed" to emulating SQL features extensively. I don't know what
"extensively" means, but if we have to make a design decision during
the refactoring and Alex says "I don't care about that feature for
NoSQL" we might end up with an ORM that actually makes it more
difficult. That's why I find it important that we agree on a common
goal for the refactoring.

Bye,
Waldemar Kornewald

Thomas Wanschik

Apr 9, 2010, 5:31:36 AM
to Django developers
On Apr 9, 9:06 am, Russell Keith-Magee <freakboy3...@gmail.com> wrote:
> No - Array types aren't available everywhere. However, it would be
> nice to be able to support them (even if not in core); if this GSoC
> lays the groundwork to make this possible, then it's worth looking at.

We already implemented a ListField in django-nonrel which is backend
independent. This can be used as a starting point.

> I was more interested in the m2m issue - the 'natural' way to handle
> m2m on some NoSQL isn't to have a separate relation, it's to maintain
> a list/set of related references.

You're right, in many situations you use lists for m2m, but in general
you run into problems with big lists, because you would have to fetch
big entities, and on App Engine you can't have more than 5000 entries in
a list. In that case you have to split the list over multiple entities.
Additionally, when making queries which need filters on both models of
the m2m relation, it is better to use an intermediary table with
denormalization (for example via ManyToManyField).
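
A small sketch of the "split the list over multiple entities" step,
using plain dicts as stand-ins for entities. The function names and the
entity layout are assumptions made for this example; only the 5000-entry
limit comes from the paragraph above:

```python
MAX_LIST_ENTRIES = 5000  # the App Engine list limit mentioned above


def split_into_entities(owner_pk, related_pks, limit=MAX_LIST_ENTRIES):
    """Spread one long list of related pks over several entities."""
    return [
        {"owner": owner_pk, "chunk": n, "related": related_pks[i:i + limit]}
        for n, i in enumerate(range(0, len(related_pks), limit))
    ]


def collect_related(entities):
    """Reassemble the full list from its chunk entities."""
    result = []
    for entity in sorted(entities, key=lambda e: e["chunk"]):
        result.extend(entity["related"])
    return result


pks = list(range(12000))
entities = split_into_entities(owner_pk=1, related_pks=pks)
sizes = [len(e["related"]) for e in entities]
```

Reading the whole relation back still means fetching every chunk, which
is the "big entities" cost described above, just spread over several
reads.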
