It would be great if one core developer could join the project and
actually work on the code. Here's the discussion group:
http://groups.google.com/group/django-non-relational
The difficulty is that we need to convert QuerySet's intermediate
representation (QueryData) to the sql.Query representation. We're not
sure if QueryData in its current implementation has the right format
and someone who knows sql.Query better than we do could be of great
help, especially with the conversion process. Our goal is to get
non-relational backend support into Django 1.3 (which is definitely
possible, so please don't vote -1 next time if this is your only
concern).
I understand if you're currently busy with finishing 1.2, but if
you're interested in helping, when will you have time?
Bye,
Waldemar Kornewald
--
http://twitter.com/wkornewald
http://bitbucket.org/wkornewald/
http://allbuttonspressed.blogspot.com/
On Jan 8, 1:10 pm, Waldemar Kornewald <wkornew...@gmail.com> wrote:
> Hi,
> our non-relational port has come to the point where we need to
> back-port the SQL layer to the query backend API (i.e., the new
> query_class()). We could use some help from Django developers who
> know the ORM internals really well. You can find a little introduction
> to the code here: http://bitbucket.org/wkornewald/django-nonrel-multidb/wiki/Home
>
> It would be great if one core developer could join the project and
> actually work on the code. Here's the discussion group:
> http://groups.google.com/group/django-non-relational
>
> The difficulty is that we need to convert QuerySet's intermediate
> representation (QueryData) to the sql.Query representation. We're not
> sure if QueryData in its current implementation has the right format
> and someone who knows sql.Query better than we do could be of great
> help, especially with the conversion process. Our goal is to get
> non-relational backend support into Django 1.3 (which is definitely
> possible, so please don't vote -1 next time if this is your only
> concern).
>
> I understand if you're currently busy with finishing 1.2, but if
> you're interested in helping, when will you have time?
>
Is nobody out there with a little bit of time/interest in helping?
It would be really nice and would speed up the development of
supporting non-relational databases for Django.
Bye,
Thomas Wanschik
> Bye,
> Waldemar Kornewald
>
> --
> http://twitter.com/wkornewald
> http://bitbucket.org/wkornewald/
> http://allbuttonspressed.blogspot.com/
Speaking for myself, I'm pretty busy trying to get features completed
before the 1.2 feature deadline. At the moment, anything that isn't on
the 1.2 roadmap is only getting cursory attention from me. I suspect
the same is true of most of the other core developers, and anyone else
that is closely involved in the Django development process.
However, if I might offer some advice: my experience has been that
large features like this aren't developed by large groups. They are
developed by one or two people working closely. It's only when the
functionality is mostly complete that other people offer help, mostly
in the form of testing.
Yours,
Russ Magee %-)
Do you think you can work with us in the 1.3 release cycle? In the
meantime we can try to back-port SQL as far as possible. We really
need to get this into 1.3 and some support from the Django core team
would be great.
> However, if I might offer some advice: my experience has been that
> large features like this aren't developed by large groups. They are
> developed by one or two people working closely. It's only when the
> functionality is mostly complete that other people offer help, mostly
> in the form of testing.
We are two developers who work closely together, but we don't feel
very comfortable hacking through the SQL layer without any help.
I haven't decided what I want to work on in the 1.3 release cycle. I'd
certainly like to see support for non-SQL backends, but there are many
other features I'd like to work on, too. I'm not going to make any
firm decisions until the 1.3 feature discussion process comes around.
At that time, I'll evaluate the proposals that are on the table.
So - to that end - the most productive thing you can do is get a solid
proposal together. That means a clear statement of the changes you
want to make to the Django core, and why those changes are required.
In the case of non-SQL backends, you will also need to demonstrate
that the changes you are proposing aren't GAE specific - that you are
proposing changes that are sufficient to encompass the general problem
of non-SQL backends.
And to be clear - a solid proposal isn't just "merge this branch". A
patch/branch is one way to prove that you have thought about the
problem in detail, but you also need to provide the discussion and
description necessary to explain the nature of and reasoning behind
the changes you want to make.
The proposal doesn't need to be complete on the first pass, either.
Even demonstrating that you have a solid grasp of the size and scope
of the problems that need to be solved may be sufficient.
You don't have to develop your proposal in a vacuum, either. If you
need feedback or design guidance, just ask. But please be considerate
of the fact that yours isn't the only proposal, and there are other
schedule pressures that exist.
> We really
> need to get this into 1.3 and some support from the Django core team
> would be great.
Well... you're not going to get anything into 1.3 without core team support :-)
I would also advise that you moderate your expectations. Saying that
you "really need to get this into 1.3" implies that you are making
plans based on the assumption that non-SQL backend support will be
merged into trunk and available in 1.3.
This is not a wise course of action. Nothing is final until it is
actually in trunk. There's no guarantee that non-SQL support will be
selected as a 1.3 feature. Even if non-SQL backends are picked as a
1.3 feature, that doesn't guarantee that 1.3 will include non-SQL
backend support - if the code isn't ready, the feature will be bumped.
Our schedule determines the feature set, not the other way around.
Yours,
Russ Magee %-)
OK, I think I've found a better way to get non-relational backend
support integrated with minimal changes to Django. The current patch
is very simple and it's not based on QueryData, but it already
supports the same feature set as our previous port. It's not yet
sufficient for emulating JOINs and maybe a few other "advanced"
features, but it looks like sql.Query and SQLCompiler would only need
minor modifications to be abstract enough for emulating advanced
features with nonrel DBs. Anyway, this can be added once we've
discussed whether the current solution is good enough.
The idea is to just use the SQLCompiler backend API and let the nonrel
backends interpret sql.Query.where and the other query attributes.
This required a few small changes to sql.Query and SQLCompiler.
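
As a rough illustration of what "letting the backend interpret sql.Query.where" could mean, the sketch below walks a simplified filter tree and flattens it into a list of filters for a key-value store. The names Leaf, Node, OPERATORS and to_filters are hypothetical stand-ins, not Django's actual where-node classes, and only AND-connected conditions are handled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Leaf:
    # One condition, e.g. age >= 18. Hypothetical; not Django's real node type.
    column: str
    lookup: str
    value: object


@dataclass
class Node:
    # A group of conditions joined by a connector ("AND" or "OR").
    connector: str = "AND"
    children: List[Union["Node", Leaf]] = field(default_factory=list)


# Lookups this toy backend claims to support, mapped to its own operators.
OPERATORS = {"exact": "=", "gt": ">", "gte": ">=", "lt": "<", "lte": "<="}


def to_filters(node: Union[Node, Leaf]) -> List[Tuple[str, str, object]]:
    """Flatten an AND-only tree into (column, operator, value) triples."""
    if isinstance(node, Leaf):
        if node.lookup not in OPERATORS:
            raise NotImplementedError("unsupported lookup: %s" % node.lookup)
        return [(node.column, OPERATORS[node.lookup], node.value)]
    if node.connector != "AND":
        # A backend without OR support has to reject the query.
        raise NotImplementedError("only AND is supported in this sketch")
    filters = []
    for child in node.children:
        filters.extend(to_filters(child))
    return filters


where = Node("AND", [Leaf("name", "exact", "bob"),
                     Node("AND", [Leaf("age", "gte", 18)])])
filters = to_filters(where)
```

A backend written this way would raise for anything it cannot express (OR, unsupported lookups, JOINs) instead of silently returning wrong results.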
The patch is this commit:
http://bitbucket.org/wkornewald/django-nonrel/changeset/71117e43ae33/
My reasoning behind the patch is the following:
1. Nonrel DBs don't distinguish between INSERT and UPDATE
On such DBs Model.save_base() shouldn't check if an entity already
exists. Instead, it should just always INSERT. For this I added
connection.features.distinguishes_insert_from_update.
2. Deleting related objects can be too costly on non-relational DBs
For this I added
connection.features.supports_deleting_related_objects. If it's False
no related objects will be deleted. Ideally, the backend should be
able to delegate this job to a background task. This is planned.
3. Multi-table inheritance needs JOINs
For this I added connection.features.supports_multi_table_inheritance.
If it's False, Django will refuse to save this model or run queries on
this model using that connection.
In the future, at least single inheritance could be emulated with a
ListField type. On App Engine multiple inheritance support would
require lots of custom datastore indexes and could lead to exploding
indexes. We can discuss this later.
4. sql.Query.has_results() used a trick that only works with SQL
I moved some of the code to SQLCompiler, so the backend can override it.
5. We need transactions which lock rows
I added a GAE-specific @commit_locked transaction decorator. This
should be moved into the backend layer, of course. I'd just like to
know if this is an option for 1.3 or not. On SQL this would be SELECT
... FOR UPDATE, but I don't know if all SQL DBs support it. Such a
decorator would be important to make get_or_create() and a few other
functions work 100% correctly, so it would also benefit the SQL layer.
I couldn't provide the respective SQL implementations, though.
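
The feature flags from points 1 to 3 could be consulted roughly as follows. This is a standalone sketch under stated assumptions: Features, NonrelFeatures, FakeConnection and save are hypothetical stand-ins rather than Django code, and only the three flag names come from the description above.

```python
class Features:
    # Flag names are taken from the proposal; defaults match SQL behaviour.
    distinguishes_insert_from_update = True
    supports_deleting_related_objects = True
    supports_multi_table_inheritance = True


class NonrelFeatures(Features):
    # A backend like the one described would switch all three off.
    distinguishes_insert_from_update = False
    supports_deleting_related_objects = False
    supports_multi_table_inheritance = False


class FakeConnection:
    """Hypothetical in-memory connection that records the operations it runs."""

    def __init__(self, features):
        self.features = features
        self.rows = {}
        self.log = []

    def exists(self, pk):
        self.log.append("exists")
        return pk in self.rows

    def insert(self, pk, values):
        self.log.append("insert")
        self.rows[pk] = dict(values)

    def update(self, pk, values):
        self.log.append("update")
        self.rows[pk].update(values)


def save(connection, pk, values, uses_multi_table_inheritance=False):
    """Save one record, consulting the backend's feature flags."""
    features = connection.features
    if uses_multi_table_inheritance and not features.supports_multi_table_inheritance:
        raise NotImplementedError("multi-table inheritance needs JOINs")
    if not features.distinguishes_insert_from_update:
        # No existence check: the store simply overwrites on insert.
        connection.insert(pk, values)
    elif connection.exists(pk):
        connection.update(pk, values)
    else:
        connection.insert(pk, values)


sql_conn = FakeConnection(Features())
save(sql_conn, 1, {"name": "a"})
save(sql_conn, 1, {"name": "b"})

nonrel_conn = FakeConnection(NonrelFeatures())
save(nonrel_conn, 1, {"name": "a"})
save(nonrel_conn, 1, {"name": "b"})
```

In this sketch the SQL-style connection performs an existence check before every write, while the nonrel connection goes straight to insert both times.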
Planned changes:
Delegate deletion of related objects to background task.
Maybe emulate multi-table inheritance with a ListField.
Nonrel DBs need special handling for primary key fields.
MongoDB and CouchDB store them in '_id' and App Engine uses a special
property which is not part of the field values dictionary. In order to
emulate JOINs we must store the column names of the primary keys used
in the sql.Query instance.
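
To make the primary key point concrete, here is a minimal sketch of the mapping a backend would need once it knows the pk column name. The helpers to_document and from_document are hypothetical; the only detail taken from the text above is that MongoDB and CouchDB keep the key under '_id'.

```python
def to_document(values, pk_column):
    """Move the primary key value to '_id' before storing the record."""
    document = {k: v for k, v in values.items() if k != pk_column}
    if pk_column in values:
        document["_id"] = values[pk_column]
    return document


def from_document(document, pk_column):
    """Inverse mapping: restore the primary key under its column name."""
    values = {k: v for k, v in document.items() if k != "_id"}
    if "_id" in document:
        values[pk_column] = document["_id"]
    return values


row = {"id": "abc123", "title": "Hello"}
doc = to_document(row, "id")
restored = from_document(doc, "id")
```

Both directions need the pk column name, which is why it has to be available wherever the query is interpreted.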
So, do you think this is a good path to take?
> 1. Nonrel DBs don't distinguish between INSERT and UPDATE
> On such DBs Model.save_base() shouldn't check if an entity already
> exists. Instead, it should just always INSERT. For this I added
> connection.features.distinguishes_insert_from_update.
That's not entirely true. Riak and CouchDB are two that come to mind
instantly as explicitly distinguishing between insert and update (in
Riak an update includes the vector clock information from a previous
read, and in CouchDB they can be mapped to different HTTP verbs).
> 2. Deleting related objects can be too costly on non-relational DBs
> For this I added
> connection.features.supports_deleting_related_objects. If it's False
> no related objects will be deleted. Ideally, the backend should be
> able to delegate this job to a background task. This is planned.
Again, this seems like a GAE-specific observation. In Redis, for
example, related objects can be stored in a list data structure, and
the delete operation can be passed any number of keys, so it could be
a very minimal set of operations to delete related objects (first read
the list of related object keys, then delete them in bulk). Cassandra
also has support coming for batch deletion, at which point efficient
related object deletion will be similarly trivial.
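
The two-step deletion described above (read the list of related keys, then delete in bulk) can be sketched with an in-memory stand-in. FakeKeyValueStore and delete_with_related are hypothetical and do not reproduce the API of any real Redis client.

```python
class FakeKeyValueStore:
    """Hypothetical in-memory stand-in for a key-value store with bulk delete."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def get(self, key, default=None):
        return self.data.get(key, default)

    def delete(self, *keys):
        # Accepts any number of keys and returns how many were removed.
        removed = 0
        for key in keys:
            if key in self.data:
                del self.data[key]
                removed += 1
        return removed


def delete_with_related(store, key, related_list_key):
    """Read the list of related keys, then delete everything in one call."""
    related_keys = store.get(related_list_key, [])
    return store.delete(key, related_list_key, *related_keys)


store = FakeKeyValueStore()
store.set("author:1", {"name": "Ann"})
store.set("author:1:books", ["book:1", "book:2"])
store.set("book:1", {"title": "A"})
store.set("book:2", {"title": "B"})
store.set("book:3", {"title": "C"})  # belongs to someone else

removed = delete_with_related(store, "author:1", "author:1:books")
```

The cost is one read plus one bulk delete, regardless of how many related objects the list names.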
> 5. We need transactions which lock rows
> I added a GAE-specific @commit_locked transaction decorator. This
> should be moved into the backend layer, of course. I'd just like to
> know if this is an option for 1.3 or not. On SQL this would be SELECT
> ... FOR UPDATE, but I don't know if all SQL DBs support it. Such a
> decorator would be important to make get_or_create() and a few other
> functions work 100% correctly, so it would also benefit the SQL layer.
> I couldn't provide the respective SQL implementations, though.
This is admittedly a GAE-specific thing.
I think these efforts are great--a lot of people want to get up and
running on GAE with Django, and it's not so easy right now. It just
worries me a bit that the description of the effort encompasses all of
non-relational databases when the implementation seems to primarily
reason around GAE. It makes sense to concretely pick one database and
start there, otherwise no work would ever be finished, but I think
that the better thing to do is to call it GAE support instead of
nonrel support.
Thanks,
Eric Florenzano
OK, if you read my mail literally I sound like "all" nonrel DBs are
the same. Of course they're not and you can find a counter-example for
almost everything. There's nothing to worry about, though. The
counter-examples simply need additional changes to get supported. It's
not like my changes would prevent other backends.
>> 1. Nonrel DBs don't distinguish between INSERT and UPDATE
>> On such DBs Model.save_base() shouldn't check if an entity already
>> exists. Instead, it should just always INSERT. For this I added
>> connection.features.distinguishes_insert_from_update.
>
> That's not entirely true. Riak and CouchDB are two that come to mind
> instantly as explicitly distinguishing between insert and update (in
> Riak an update includes the vector clock information from a previous
> read, and in CouchDB they can be mapped to different HTTP verbs).
Yes, CouchDB is versioned, so an UPDATE operation is useful. OTOH,
many other nonrel DBs (SimpleDB, MongoDB, GAE, ...) wouldn't
distinguish between INSERT and UPDATE on a save().
>> 2. Deleting related objects can be too costly on non-relational DBs
>> For this I added
>> connection.features.supports_deleting_related_objects. If it's False
>> no related objects will be deleted. Ideally, the backend should be
>> able to delegate this job to a background task. This is planned.
>
> Again, this seems like a GAE-specific observation. In Redis, for
> example, related objects can be stored in a list data structure, and
> the delete operation can be passed any number of keys, so it could be
> a very minimal set of operations to delete related objects (first read
> the list of related object keys, then delete them in bulk). Cassandra
> also has support coming for batch deletion, at which point efficient
> related object deletion will be similarly trivial.
GAE and many other nonrel DBs have batch deletes, but I don't think
that we should delete several thousands of entities that way in a
single request. This will take too long - unless you want to make your
users wait a few seconds for the result.
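
One way to keep such deletes out of a single request is to split them into batches that a background task can work through. The sketch below only shows the batching; delete_in_batches and its delete_batch callback are hypothetical, and how a real task queue would schedule the batches is left open.

```python
def delete_in_batches(keys, delete_batch, batch_size=100):
    """Delete keys in fixed-size batches, yielding after each batch.

    Yielding gives the caller (for example a background task) a point
    at which it can stop and resume later.
    """
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        delete_batch(batch)
        yield len(batch)


deleted = []
sizes = list(delete_in_batches(list(range(250)), deleted.extend, batch_size=100))
```

Here the 250 keys are handled as batches of 100, 100 and 50.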
> I think these efforts are great--a lot of people want to get up and
> running on GAE with Django, and it's not so easy right now. It just
> worries me a bit that the description of the effort encompasses all of
> non-relational databases when the implementation seems to primarily
> reason around GAE. It makes sense to concretely pick one database and
> start there, otherwise no work would ever be finished, but I think
> that the better thing to do is to call it GAE support instead of
> nonrel support.
We have people interested in adding MongoDB, CouchDB, and maybe
SimpleDB support. The current code should be abstract enough for
SimpleDB and probably also MongoDB (though, it would help to modify
AutoField to also support string values). Other DBs might need
additional changes, but that's what the "nonrel" project is for.
Everyone can join and work on Django changes needed for their DB.
No, but they don't allow other backends either. From my perspective,
the purpose of this effort is to make the modifications to core that
make *any* backend that stores data possible.
>> I think these efforts are great--a lot of people want to get up and
>> running on GAE with Django, and it's not so easy right now. It just
>> worries me a bit that the description of the effort encompasses all of
>> non-relational databases when the implementation seems to primarily
>> reason around GAE. It makes sense to concretely pick one database and
>> start there, otherwise no work would ever be finished, but I think
>> that the better thing to do is to call it GAE support instead of
>> nonrel support.
>
> We have people interested in adding MongoDB, CouchDB, and maybe
> SimpleDB support. The current code should be abstract enough for
> SimpleDB and probably also MongoDB (though, it would help to modify
> AutoField to also support string values). Other DBs might need
> additional changes, but that's what the "nonrel" project is for.
> Everyone can join and work on Django changes needed for their DB.
We need to be clear about your goals here.
Speaking for me personally, it's going to be very hard to get me to be
interested in this project unless you're trying to solve the whole
problem.
We have a backend interface that works well for SQL. I'm interested in
seeing the set of modifications that are required to break the
SQL-specific aspects of that interface.
I'm not at all interested in spending a bunch of time vetting
modifications that only solve part of the problem - especially when
those modifications may well prevent the elegant introduction of a
fully refactored backend interface.
I have no problems with the idea of tackling this problem in an
iterative fashion (i.e., get it working for GAE, then get it working
for GAE+MongoDB, and so on), but I'm not going to commit anything to
trunk until you've got enough backends (with enough breadth of
features) to demonstrate that the refactoring that you propose is
sufficient to solve the general problem.
Yours,
Russ Magee %-)
On 17 Jan., 05:36, Russell Keith-Magee <freakboy3...@gmail.com> wrote:
> On Sun, Jan 17, 2010 at 9:37 AM, Waldemar Kornewald
>
> <wkornew...@gmail.com> wrote:
> > On Sat, Jan 16, 2010 at 10:35 PM, flo...@gmail.com <flo...@gmail.com> wrote:
> >> I'm not really a developer on Django itself, but I am fairly
> >> interested in non-relational databases, and some of the things being
> >> said in this thread worry me a bit.
>
> > OK, if you read my mail literally I sound like "all" nonrel DBs are
> > the same. Of course they're not and you can find a counter-example for
> > almost everything. There's nothing to worry about, though. The
> > counter-examples simply need additional changes to get supported. It's
> > not like my changes would prevent other backends.
>
> No, but they don't allow other backends either. From my perspective,
> the purpose of this effort is to make the modifications to core that
> make *any* backend that stores data possible.
The current patch does not prevent any other backend, relational or
non-relational, from being implemented. As Waldemar wrote, so far we
have only added flags, so that each backend can decide for itself
whether a set of features is supported. Databases like Riak
and CouchDB which explicitly distinguish between insert and update can
do it with the current modification too. In this sense the current
patch should work for all non-relational databases and nothing is App
Engine specific.
Waldemar's main question was which proposal to follow (using
QueryData or the current compiler approach introduced by multi-db) and
nobody answered! It seems that multi-db support made a good step
towards supporting non-relational databases. Such databases can
use a compiler which can interpret sql.Query. We think that only minor
modifications are needed to be completely SQL independent and to be
abstract enough for non-relational databases. Following the QueryData
proposal would most probably mean copying 95% of sql.Query. So we think
that following the second approach is much better. But we need
feedback!
So we are asking for feedback in order to prevent ourselves from
following the wrong proposal.
Again this is not the final patch and other changes will surely be
added.
Bye,
Thomas Wanschik
> Yours,
> Russ Magee %-)
It is, absolutely. I think most (if not all) of the other key-value
stores need just two additions:
1. AutoField with string values
2. Extra backend-specific meta-data in Model
CouchDB and other versioned backends would store the internal revision
number and use that on UPDATE, for example.
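
A minimal sketch of how a stored revision number could be used on UPDATE follows. VersionedStore and ConflictError are hypothetical and only imitate the general idea of a versioned backend; they are not the CouchDB or Riak API.

```python
class ConflictError(Exception):
    """Raised when a write is based on an outdated revision."""


class VersionedStore:
    """Hypothetical in-memory store that tracks a revision number per document."""

    def __init__(self):
        self.docs = {}  # key -> (revision, values)

    def insert(self, key, values):
        if key in self.docs:
            raise ConflictError("document already exists")
        self.docs[key] = (1, dict(values))
        return 1

    def update(self, key, values, revision):
        current_revision, _ = self.docs[key]
        if revision != current_revision:
            # Someone else saved in between; the caller must re-read.
            raise ConflictError("stale revision")
        self.docs[key] = (current_revision + 1, dict(values))
        return current_revision + 1


store = VersionedStore()
rev = store.insert("post:1", {"title": "Draft"})
rev2 = store.update("post:1", {"title": "Final"}, revision=rev)
```

The model instance would have to carry the revision as backend-specific meta-data between the read and the following save, which is the second addition listed above.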
In my previous mail I already mentioned that probably all nonrel
backends will need to know the pk column names of all tables in order
to emulate JOINs and maybe a few other features.
Have I missed anything?
As Thomas said, I just wanted to know whether you'd suggest going on
with our SQLCompiler-based solution and I'd like to know what you
think about it in general. Or is it too early to say anything?
Yes :-)
Like I've said several times now, I'm not focussing on non-SQL
backends at the moment.
I can give you broad guidance as to the type of proposal that we are
likely to accept. My last email was an attempt at such advice (i.e,
your proposal shouldn't be GAE specific).
I can also give you specific advice on specific questions about
Django's architecture.
However, I'm *not* an expert on any particular non-SQL storage
framework. I'm not in a position to give any sort of meaningful advice
on how any particular non-SQL backend will be able to integrate with
any particular architectural proposal. People like Eric are in a much
better position to answer questions like those.
When the 1.3 development cycle starts, I will evaluate the proposals
on the table. And - as I've said before - a proposal isn't "merge this
branch". It also includes enough discussion and description to
convince me (and others) that the solution described by a proposal is
correct.
> As Thomas said, I just wanted to know whether you'd suggest going on
> with our SQLCompiler-based solution and I'd like to know what you
> think about it in general. Or is it too early to say anything?
Honestly, I have no idea. You're the ones that will need to evaluate
if a SQLCompiler-based solution is feasible. Investigate, work out
what complications and limitations exist (for GAE and for other
backends), and report back in the form of a concrete proposal.
If you want my gut reaction, I would suggest that interpreting a
SQL-focussed data structure doesn't strike me as a particularly
workable solution for backends that don't expose a pseudo-SQL
interface.
However, I may also be completely wrong. Your task is to convince me,
the core team, and the rest of the Django community that what you have
proposed will be sufficient to solve the problem in the general case.
I will say that I like the simplicity of the SQLCompiler approach -
the QueryData approach of adding yet another layer between Query and
final query language really doesn't appeal to me. However,
architectural simplicity ain't worth a hill of beans if it doesn't do
the job it needs to do.
Yours,
Russ Magee %-)