Extending Django in a production-oriented way. Is it possible?


Ivan Illarionov

Dec 26, 2007, 6:38:57 AM
to Django developers
... continued from Cache backend thread.

I have a lot of working code that I know will never get into Django,
and I can't publish it as an external project because it modifies the
core of Django dramatically. For example, I use the Firebird database.
I want to use native '?' placeholders without the overhead of
converting from Postgres-style '%s' placeholders. I want to use very
large varchars instead of blobs in 99% of the situations where Django
uses TextFields. I want to define stored procedures and triggers right
inside my model classes without coupling my models to custom SQL
files, I want model methods created automatically from those stored
procedures, and I want to use ORM filters based on those stored
procedures too. I also want database-level computed fields and many
other things, great and small.

The good news is that I already do everything I described. I want to
write production-oriented code, not Django-way-of-doing-things code.
Turning that code into an external database backend would make it
weak. So what can you suggest in my situation? A fork of Django? I
really need your advice.

Malcolm Tredinnick

Dec 26, 2007, 6:58:38 AM
to django-d...@googlegroups.com

On Wed, 2007-12-26 at 03:38 -0800, Ivan Illarionov wrote:
> ... continued from Cache backend thread.
>
> I have a lot of working code that I know will never get into Django
> and I just can't publish it as an external project/projects because
> it modifies the core of Django dramatically. For example, I use
> Firebird database. I want to use native '?' placeholders without
> overhead of converting them to Postgres '%s' placeholders. I want to
> use very large varchars instead of blobs in 99% of situations where
> Django uses TextFields.

That all cries out for just maintaining an external database backend.
You claim this will "make it weak", whatever that might mean, but there
isn't really any support for that. Database backends are imported like
any normal Python code. Once it's imported, it's treated like any other
code. If your backend really is a derivative of some other backend, then
you should be able to create the backend as a series of copies and
changes (at import time) to an existing backend -- for example, copying
django.db.backends.postgresql.creation.DATA_TYPES into your own
backend's namespace and then replacing the TextField entry.
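
A minimal sketch of the copy-and-change approach described above. The
dictionary is a hypothetical stand-in for
django.db.backends.postgresql.creation.DATA_TYPES so that the example
runs without Django; its entries and the replacement column type are
illustrative assumptions, not values taken from either backend.

```python
# Stand-in for django.db.backends.postgresql.creation.DATA_TYPES.
# Assumed, abbreviated entries; a real backend would import the module.
POSTGRESQL_DATA_TYPES = {
    'AutoField': 'serial',
    'CharField': 'varchar(%(max_length)s)',
    'IntegerField': 'integer',
    'TextField': 'text',
}

# Copy at import time so the original backend's mapping stays untouched.
DATA_TYPES = dict(POSTGRESQL_DATA_TYPES)

# Replace only the entries that differ. The varchar size is an assumed
# value chosen for illustration, not a recommendation.
DATA_TYPES['TextField'] = 'varchar(8000)'
```

Everything that is not overridden keeps following the parent backend,
which is what keeps a derived backend small.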

> I want to define stored procedures and
> triggers right inside my model classes without coupling my models with
> custom SQL files and I want to have model methods created
> automatically from that stored procedures and I want to use ORM
> filters based on that stored procedures too. I want database-level
> computed fields and many other great and small things.
>
> Good news are that I already do everything I wrote. I want to write
> production-oriented code, not Django-way-of-doing-things code. Making
> the external database backend from that code will make it weak. So
> what can you suggest in my situation? A fork of Django? I really need
> your advice.

If it really does need patching the core and you want to distribute it,
then you have to distribute a patch (or a forked version). There's not
really any practical alternative. Of course, some of the things you
mention can (or shortly will be able to) be done via subclassing
internals and things like that, so you might be able to get away with a
few self-contained apps, like the database backend. Other bits, if
they're truly edge cases and not worth including, will still need to be
patches. It's obviously impossible to give any kind of one-size-fits-all
solution here. Proposed changes have to be considered on a case-by-case
basis.

That's hardly unheard of in the Open Source world. To pick but one example,
the Linux kernel distributed by every Linux distribution is a patched
version of the version Linus Torvalds releases.

Regards,
Malcolm

--
I don't have a solution, but I admire your problem.
http://www.pointy-stick.com/blog/

Ivan Illarionov

Dec 26, 2007, 7:46:09 AM
to Django developers
I would be glad if it were possible to maintain only an external
Django backend. It's a lot easier than maintaining a whole forked
framework. But the backend solution is not really that easy:
1. Using very large varchars for TextFields needs a lot more than just
overriding creation.DATA_TYPES, since we need the optional max_length
attribute and advanced validation against Firebird limits. And in the
cases where we need really large TextFields, we need another Field
type that uses database blobs. Sticking to blobs, as in the first
versions of the Firebird backend, is a very limited solution. More info
here: http://www.volny.cz/iprenosil/interbase/ip_ib_strings.htm
2. '%s' placeholders are hardcoded into Django. Existing solutions
(like the one in the sqlite backend) introduce unneeded overhead in
each database query. There are many other small things hardcoded into
Django with Postgres in mind. Other backends have workarounds with
added overhead, which makes them weaker than the de facto default
postgres backend.
3. django.core.management.sql cannot be customized by backends, and it
suffers from all those `if connection.features` checks.

So only if Django introduces a new LargeTextField type, allows an
optional max_length for TextField, and allows backends to change
django.core.management.sql and customize field validation can I
publish my work as an external Django database backend.

Malcolm Tredinnick

Dec 26, 2007, 9:14:06 AM
to django-d...@googlegroups.com

On Wed, 2007-12-26 at 04:46 -0800, Ivan Illarionov wrote:
> I would be glad if it would be possible to maintain only external
> Django backend. It's a lot easier than maintaining the whole forked
> framework. But backend solution is not really that easy:

This sort of reply is exactly why this stuff can only be considered when
specifics are raised. To address your points in turn...

> 1. Using very large varchars for TextFields needs a lot more than just
> overriding creation.DATA_TYPES since we need the optional max_length
> attribute and advanced validation against Firebird limits. And in some
> cases when we need a really really large TextFields we need another
> Field type that will use database blobs. Sticking to blobs as in first
> versions of firebird backend is a very limited solution. More info
> here: http://www.volny.cz/iprenosil/interbase/ip_ib_strings.htm

You can already create custom field types, so you could create such a
field for your own use. The difficult/impossible part would be making
all existing text fields in Django (and other apps) work in this
fashion, since they rely on the fact that the length doesn't have to be
specified at creation time and introducing that sort of incompatibility
for every existing piece of code without an amazingly good reason would
be unfair. That really is just a constraint of the particular backend
(Firebird in this case), since other backends support essentially
unlimited sized text fields for the types of applications Django is
suited for. Yes, it means there might need to be a caveat like "the
standard TextField can only support up to XX characters in Firebird",
but that's life in the multi-database game.

If a particular server, like Firebird, benefits from having different
fields in different circumstances, you should create those fields and
encourage the Firebird users to use them in their applications. It has a
slight portability problem, but if somebody is trying to write something
to take advantage of every last optimisation at the database server
level, portability left the picture a while back.

> 2. '%s' placeholders are hardcoded into Django. Existing solutions
> (like in sqlite backend) introduce the unneeded overhead in each
> database query. There are many other small things hardcoded into
> Django with Postgres in mind. Other backends has workarounds with
> added overhead and it makes them weaker than de facto default postgres
> backend.

The format string marker is something that's already on my personal "low
priority TODO" list, at least -- examining the paramstyle variable in
the DB-API module in each case and using the right style to avoid some
of the round tripping. It's a little fiddly, but not impossible and
should be practical to implement in a backwards compatible fashion
(which would otherwise rule it out). This isn't a showstopper for
anything, though, so it's a lower priority than many other things, but I
might get interested enough to do it one afternoon.
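
A rough sketch of the idea above: read the paramstyle that the DB-API
module declares and rewrite '%s' markers to match. The function name
convert_placeholders is hypothetical, only three of the DB-API styles
are handled, and sqlite3 is used purely because it ships with Python
and declares the 'qmark' style. This version still converts per call;
avoiding the round trip altogether would mean choosing the marker when
the SQL is generated.

```python
import itertools
import re
import sqlite3

# Matches an escaped percent sign or a '%s' parameter marker.
_MARKER = re.compile(r'%%|%s')


def convert_placeholders(sql, paramstyle):
    """Rewrite '%s' markers into the style a DB-API module declares.

    Handles only the 'format', 'qmark' and 'numeric' styles.
    """
    if paramstyle == 'format':
        return sql  # the driver already expects '%s' (and '%%')
    if paramstyle not in ('qmark', 'numeric'):
        raise ValueError('unsupported paramstyle: %r' % paramstyle)
    position = itertools.count(1)

    def replace(match):
        if match.group(0) == '%%':
            return '%'  # unescape a literal percent sign
        if paramstyle == 'qmark':
            return '?'
        return ':%d' % next(position)

    return _MARKER.sub(replace, sql)


# sqlite3 declares paramstyle 'qmark', so '%s' becomes '?'.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE entry (title TEXT)')
conn.execute(
    convert_placeholders('INSERT INTO entry (title) VALUES (%s)',
                         sqlite3.paramstyle),
    ('hello',))
row = conn.execute(
    convert_placeholders('SELECT title FROM entry WHERE title = %s',
                         sqlite3.paramstyle),
    ('hello',)).fetchone()
```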

I think you'll find that most of the things you think of as
PostgreSQL-specific are more a happy consequence of choosing a database
that was very SQL compliant initially and using that as the template.
All the conditional variations for other databases are usually where
they have non-portable constructs being required, or are a bit
feature-deficient. That's unfortunate, but not really avoidable. In
fact, you'll find a lot of places where we could be much more efficient
if we just picked one database, but there are lots of benefits to
supporting multiple databases.

There may be some small items that truly are specific to that one
database, but I don't think the number is "many". Again, specifics would
help.

> 3. django.core.management.sql cannot be customized by backends and it
> suffers from all that `if connection.features` checks.

Well, that last part is an opinion, rather than indisputable fact. I
suspect we prefer to think of it as a benefit: making it easier to pull
out the backend-specific portions, precisely for the benefit of backend
writers. Ultimately, a lot of the connection.features checks will be
confined to django/db/models/sql/ and django/core/management/ (they're
hardly scattered all over the place at the moment). In queryset-refactor
I'm slowly moving most of the remaining places that do a check and then
change behaviour based on the check into a call to a backend function
that just returns the right result. Each feature is necessary (at least
one supported database is not like the others for that feature), so I'm
not sure what you suggest the alternative might be there.
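
To illustrate the direction described above, here is a sketch of
replacing a "check a feature flag, then branch" pattern with a call to
a backend function that returns the right result. The class and method
names (BaseOperations, FirebirdOperations, limit_sql, build_query) are
hypothetical and are not Django's API; the only database fact assumed
is Firebird's SELECT FIRST n syntax.

```python
class BaseOperations:
    """Hypothetical per-backend hook object with portable defaults."""

    def limit_sql(self, sql, limit):
        # Default for databases that accept a trailing LIMIT clause.
        return '%s LIMIT %d' % (sql, limit)


class FirebirdOperations(BaseOperations):
    """Hypothetical override for a backend with different syntax."""

    def limit_sql(self, sql, limit):
        prefix = 'SELECT '
        if not sql.startswith(prefix):
            raise ValueError('expected a SELECT statement')
        # Firebird places the row limit right after SELECT.
        return 'SELECT FIRST %d %s' % (limit, sql[len(prefix):])


def build_query(ops, table, limit):
    # The caller never inspects feature flags; it just asks the backend.
    return ops.limit_sql('SELECT * FROM %s' % table, limit)


portable = build_query(BaseOperations(), 'blog_entry', 10)
firebird = build_query(FirebirdOperations(), 'blog_entry', 10)
```

The calling code stays identical for every backend, and a new backend
only has to override the methods where it differs.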

As for the first part, we already have plans to include per-field
creation-time (for creating specialised fields) and lookup-time (for
looking up fields that need a special query format) SQL insertion, as
well as per-model post-table-creation SQL (for doing modifications to a
table immediately after it is created). Those changes are a logical
consequence of the Geo-Django work and will no doubt have benefits in
other places as well. That's been discussed on this list previously.

With this point and the last one, you're going to get further by raising
specific things that you might like to see changed so we can address
them one by one, rather than just hand waving and saying "it's all for
PostgreSQL" or "suffers from <large, general feature>".

This has wandered far afield from your original question, but I cannot
see what answer you are hoping to get to that query, so it seems better
to address your particular concerns. You seem to have started from an
assumption that none of your changes would be welcomed and Django's
design requires massive changes to integrate some features -- neither of
which is necessarily true -- cast a few aspersions about
Django not being "production oriented" code (also not true, as witnessed
by the large number of Django sites in production around the world,
including those built over legacy databases) and then wondered how to
distribute a patch to core (answer: distribute it as a patch). As you
can see from the above, some of the things you've raised are already in
the fairly advanced planning stage -- in the sense that at least one
person knows how to implement it -- and you might be pleasantly
surprised by what happens if you bring your other concerns to the table,
spaced over a suitable period of time so that we don't melt down
considering dozens of changes in a day or two.

Given all that, keep in mind that we aren't necessarily going to be a
tight-as-a-glove fit for every database on the planet. We can't possibly
provide a comprehensive way to access every single feature. Django
doesn't aim to be a 100% Python wrapper over each database. That would
be unrealistic (or spelt "SQLAlchemy", take your pick). Instead, we aim
to hit the sweet spot for all the backends and typical use-cases in the
website / web application space, whilst still providing mechanisms such
as access to the underlying database cursor, initial SQL files, etc, for
bridging a lot of the remaining gap. If a particular application needs
every little cycle it can get from the database, then using Django's ORM
as the intermediate layer may well be a poor choice, since it's not the
right shovel to hammer in your screws for that scenario.

Possibly this is already perfectly clear to you and is why you've gone
for the distribute-a-patch approach (in which case, again, what other
option would there be?). However, from the details you've given, I worry
that you might have jumped a little too fast to the conclusion that some
things aren't possible (e.g. a new type of text field) or never will be
(e.g. custom SQL at table creation time).

This has turned out much longer than I intended, but I guess what I'm
really saying is keep some perspective. Things aren't as bad as your
messages seem to be making out. Explain what parts are giving you
problems and we can possibly suggest ways to make it work with the
existing code or consider if some architectural changes might make that
beneficial for all.

Best wishes,
Malcolm

--
Tolkien is hobbit-forming.
http://www.pointy-stick.com/blog/

Ivan Illarionov

Dec 26, 2007, 10:10:33 AM
to Django developers
Thank you. Now I have a better understanding. I agree that I was too
pessimistic and maybe had some wrong assumptions about this stuff. I
didn't say that Django is not production-oriented; I only meant that
it's hard to extend Django with an alternative database while keeping
the same production quality level.

1. On TextFields and CharFields. The main problem is in the existing
django.contrib apps. They wouldn't work right out of the box. It's
really important to set at least the max_length for TextFields. With
Firebird < 2.0 we have the 252-byte index limit, and that means that
some CharFields that are indexed or are part of a unique_together
constraint need to be either shortened or use ASCII encoding (or both
in some cases). Setting the per-column encoding is another very useful
feature that may be ignored by other backends. All we need to do is add
a new keyword to the Field class __init__. The max_length attribute of
TextFields can be ignored by other backends too.

2. How can I extend the field validation? I need to validate that user-
defined model fits into index size and max row size limits depending
on Firebird version and database page size.

3. I really need to look at queryset-refactor and use this branch as
the base because it may solve the issues with custom table creation/
deletion

4. Another problem with Firebird is its strict foreign key
constraints. They don't allow forward references, even inside a
transaction. This can be emulated in Python by queuing queries inside
a single transaction, checking fk constraints and executing them in an
order that won't cause constraint violation errors. Do you plan to add
something like the Unit of Work pattern to Django? If you don't, I'll
have to write this custom code.

5. You don't quote the AutoField column name in Model.save. Why? This
causes errors with Firebird.
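
The ordering step described in point 4 could be sketched as a
topological sort over foreign key dependencies. This is only an
illustration under assumed inputs: order_inserts, the table names and
the dependency mapping are all hypothetical, and a real unit of work
would queue the statements themselves rather than table names.

```python
from collections import defaultdict, deque


def order_inserts(pending, depends_on):
    """Order table names so that referenced tables come first.

    pending    -- list of unique table names with queued inserts
    depends_on -- dict mapping a table to the tables it references
    """
    indegree = {table: 0 for table in pending}
    dependents = defaultdict(list)
    for table in pending:
        for target in depends_on.get(table, ()):
            # Ignore self-references and tables with nothing queued.
            if target in indegree and target != table:
                indegree[table] += 1
                dependents[target].append(table)

    ready = deque(table for table in pending if indegree[table] == 0)
    ordered = []
    while ready:
        table = ready.popleft()
        ordered.append(table)
        for child in dependents[table]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(ordered) != len(pending):
        raise ValueError('circular foreign key references')
    return ordered


order = order_inserts(
    ['comment', 'entry', 'author'],
    {'comment': ['entry', 'author'], 'entry': ['author']},
)
```

Circular references cannot be ordered this way, so the sketch raises
an error for them instead of guessing.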

Malcolm Tredinnick

Dec 26, 2007, 10:30:50 AM
to django-d...@googlegroups.com

On Wed, 2007-12-26 at 07:10 -0800, Ivan Illarionov wrote:
> Thank you. Now I have better understanding. I agree that I was too
> pessimistic and maybe had some wrong assumptions about this stuff. I
> didn't say that Django is not production-oriented I only mean that
> it's hard to extend Django with alternative database keeping the same
> production quality level.
>
> 1. On TextFields and CharFields. The main problem in the existing
> django.contrib apps.

And every other third-party Django application on the planet. I can't
see that it's feasible to make this kind of change (adding max_length)
to TextField now. That ties everybody else to Firebird's restriction,
which isn't fair.

This might be one of those cases where you just need to document the
restriction in your backend's documentation and say that TextFields for
firebird will automatically have a maximum length of X for whatever
value "X" is.

> 2. How can I extend the field validation? I need to validate that user-
> defined model fits into index size and max row size limits depending
> on Firebird version and database page size.

Could you clarify this a bit more, please? I don't understand what
you're asking. Why aren't the normal field length validations enough
(except for TextField)? Is it a matter of making sure somebody doesn't
try to create a model that has fields which are too large in the first
place? That could be done by hooking into the class_prepared signal and
doing a quick pass over the fields at that point -- which is when the
model is imported -- to check for inconsistencies.

Imposing arbitrary restrictions on lengths isn't something that's built
in. Maybe it's needed, maybe not... I have no real opinion on that at
the moment. Need to understand the problem better.
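
As a sketch of what such a "quick pass over the fields" might compute,
the function below flags indexed fields whose worst-case size exceeds
an index key limit. It is plain Python so that it runs without Django:
the (name, max_length, indexed) tuples are a hypothetical stand-in for
real model fields, and the limit and bytes-per-character values are
example parameters, since the real ones depend on the Firebird version,
page size and character set. In a backend, a check like this could be
called from a handler connected to the class_prepared signal.

```python
def oversized_indexed_fields(fields, max_index_bytes, bytes_per_char):
    """Return names of indexed fields that cannot fit in an index key.

    fields -- iterable of (name, max_length, indexed) tuples, a
              simplified stand-in for a model's field definitions
    """
    return [
        name
        for name, max_length, indexed in fields
        if indexed and max_length * bytes_per_char > max_index_bytes
    ]


# Example values only: a 252-byte key limit and 4 bytes per character.
model_fields = [
    ('slug', 50, True),     # 200 bytes, fits
    ('title', 200, True),   # 800 bytes, too large for the index
    ('body', 5000, False),  # not indexed, so not checked
]
problems = oversized_indexed_fields(model_fields, 252, 4)
```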

> 3. I really need to look at queryset-refactor and use this branch as
> the base because it may solve the issues with custom table creation/
> deletion

At the moment, that code doesn't exist on the branch. It's on the list,
but not yet implemented.

Malcolm

--
What if there were no hypothetical questions?
http://www.pointy-stick.com/blog/

Justin Bronn

Dec 26, 2007, 10:32:40 AM
to Django developers
> 2. '%s' placeholders are hardcoded into Django. Existing solutions
> (like in sqlite backend) introduce the unneeded overhead in each
> database query. There are many other small things hardcoded into
> Django with Postgres in mind. Other backends has workarounds with
> added overhead and it makes them weaker than de facto default postgres
> backend.

In the GeoDjango branch, we had the need to override the '%s'
placeholder for our Oracle and MySQL spatial backends. I accomplished
this by adding a `get_placeholder` routine to the base Field class,
which is then overloaded as needed. [Malcolm: this is one of the features
I'll specify in response to your inquiry].

See: http://code.djangoproject.com/browser/django/branches/gis/django/db/models/base.py
(lines 233 and 245).
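
A simplified sketch of the pattern, not the GeoDjango code itself: the
classes below are hypothetical stand-ins, the get_placeholder signature
is assumed, and the GeomFromText wrapper is only an example of SQL that
a spatial field might need around its parameter marker.

```python
class Field:
    """Hypothetical stand-in for a model field base class."""

    def __init__(self, column):
        self.column = column

    def get_placeholder(self, value):
        # Default: a plain parameter marker.
        return '%s'


class GeometryField(Field):
    """Hypothetical field that wraps its parameter in a SQL function."""

    def get_placeholder(self, value):
        return 'GeomFromText(%s)'


def insert_sql(table, fields, values):
    # Ask each field for its own placeholder instead of hardcoding '%s'.
    columns = ', '.join(field.column for field in fields)
    markers = ', '.join(
        field.get_placeholder(value)
        for field, value in zip(fields, values)
    )
    return 'INSERT INTO %s (%s) VALUES (%s)' % (table, columns, markers)


sql = insert_sql(
    'city',
    [Field('name'), GeometryField('location')],
    ['Paris', 'POINT(2.35 48.85)'],
)
```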

> 3. I really need to look at queryset-refactor and use this branch as
> the base because it may solve the issues with custom table creation/
> deletion

Yes you really should; Malcolm has done a superb job with this
branch. In my personal experience, my private merges with queryset-
refactor have immensely reduced the amount of required custom code.
Moreover, it is _much_ easier to subclass ORM components for specific
purposes.

In addition, as Malcolm noted, we also had to deal with custom SQL at
table creation time. Thus, I believe your stated problems are hardly
intractable -- even without queryset-refactor -- as GeoDjango has been
doing many of your requested customizations for several months now.

Best Regards,
-Justin

Ivan Illarionov

Dec 26, 2007, 11:12:13 AM
to Django developers
The main problem is that I already have working code that does
everything I have written about and a lot more. Trying to fit it into
an external Django extension would take too much time and effort. So I
think it would be better to start a new independent project based on
Django. The main advantage of this approach is that I can add all the
features I want to the core and change whatever I like (e.g. template
system, javascript helpers, durus cache). A Django Firebird backend
(and any other goodies) can be extracted from this project later by
interested people.

On 26 Dec, 18:32, Justin Bronn <jbr...@gmail.com> wrote:
> > 2. '%s' placeholders are hardcoded into Django. Existing solutions
> > (like in sqlite backend) introduce the unneeded overhead in each
> > database query. There are many other small things hardcoded into
> > Django with Postgres in mind. Other backends has workarounds with
> > added overhead and it makes them weaker than de facto default postgres
> > backend.
>
> In the GeoDjango branch, we had the need to override the '%s'
> placeholder for our Oracle and MySQL spatial backends. I accomplished
> this by adding a `get_placeholder` routine to the base Field class,
> which is then overloaded as needed. [Malcolm: this is one of the features
> I'll specify in response to your inquiry].
>
> See:http://code.djangoproject.com/browser/django/branches/gis/django/db/m...

Ivan Illarionov

Dec 26, 2007, 11:44:26 AM
to Django developers
The subject of this thread is not a question. I am just trying to
highlight the difficulties of creating third-party modules. It's not as
easy as suggested in http://www.pointy-stick.com/blog/2007/11/11/django-tip-external-database-backends/
