GSoC 2007: Implementing Check Constraints

1 view
Skip to first unread message

theju

unread,
May 25, 2007, 9:55:39 PM5/25/07
to Django developers
Hello everyone,
I am a participant in the Summer of Code 2007 and working on
"Implementing Check Constraints in Django". My mentor is Simon
Blanchard.

The hosting page for the projects is at http://code.google.com/p/django-check-constraints/
and the Proposed Features Wiki at http://code.google.com/p/django-check-constraints/wiki/Features

As you can see from the features page on the wiki, implementing check
constraints will actually reduce the need for writing validators (all
you require is to add the constraints in the models). Since the
constraints are being implemented at the database-level it provides
for better data integrity.

Lately, I have been studying Django's API and have realized that most
of the foundation is already present for me to work on( I would like
to thank Adrian, Jacob and all the developers of Django for this).

For my project,all that I have to do is actually append the sql
statements for the table (containing the check statements) when syncdb
is called. My problem is a typical one that "There is more than one
way of doing it"

Here's how I believe I might be able do it...
1) By creating a Check Field (that inherits the Field from
django.db.models.fields) and getting things done. So I might have to
make a few changes in the django.core.management [precisely in the
_get_sql_model_create function and not to forget the data_types]
2) As proposed by my mentor by getting it done from the
django.db.models.options file.

So which is the best method to go about this problem (even if one
doesn't exist in the list above, please suggest)?

Thanking You
Thejaswi Puthraya

Justin Bronn

unread,
May 26, 2007, 2:45:03 AM5/26/07
to Django developers
>I am a participant in the Summer of Code 2007.

Welcome to the development list!

>For my project,all that I have to do is actually append the sql
>statements for the table (containing the check statements) when syncdb
>is called. My problem is a typical one that "There is more than one
>way of doing it"

We encountered a similar problem on the GIS branch. Specifically, the
OpenGIS specification requires that compliant spatial databases add
geometry columns by calling the stored procedure AddGeometryColumn()
after the table definition.

The way we solved it was a bit different from your proposals. We
patched the management.py function get_custom_sql_for_model() to look
for a function on the field called "_post_create_sql." In our Field
subclass, GeometryField, we define a _post_create_sql method that
returns the needed SQL (a string). The field-generated SQL hitches a
ride on the output of the initial SQL data function that is already
called anyway.

Patch to management.py is the first shown:
http://code.djangoproject.com/changeset/4674

Look at GeometryField to see how _post_create_sql() works:
http://code.djangoproject.com/browser/django/branches/gis/django/contrib/gis/db/models/fields/__init__.py

Regards,
-Justin

Jacob Kaplan-Moss

unread,
May 26, 2007, 2:53:56 AM5/26/07
to django-d...@googlegroups.com, Adrian Holovaty, Malcolm Tredinnick
On 5/26/07, Justin Bronn <jbr...@gmail.com> wrote:
> We encountered a similar problem on the GIS branch. Specifically, the
> OpenGIS specification requires that compliant spatial databases add
> geometry columns by calling the stored procedure AddGeometryColumn()
> after the table definition.
>
> The way we solved it was a bit different from your proposals. We
> patched the management.py function get_custom_sql_for_model() to look
> for a function on the field called "_post_create_sql." In our Field
> subclass, GeometryField, we define a _post_create_sql method that
> returns the needed SQL (a string). The field-generated SQL hitches a
> ride on the output of the initial SQL data function that is already
> called anyway.

I'm a bit drunk right now, but I think this patch looks like a good
idea. If nobody's got any objections and if it still seems like a good
idea in the morning I'm thinking I'll apply this patch to trunk in the
morning. It seems this feature will help a lot in custom field
development.

Jacob

Malcolm Tredinnick

unread,
May 29, 2007, 5:49:07 AM5/29/07
to django-d...@googlegroups.com

Yo,

Did you think better of this, or haven't gotten around to it yet?

I'm going to try and put to bed a lot of the custom field stuff that has
been floating around this week (it's mostly a documentation issue; 90%
of the pieces exist) and _post_create_sql() or something like it is one
of the missing pieces (for custom field creation).

I am wondering a little bit why this isn't _get_field_sql(), though.
That could be as part of the table construction phase, so you can insert
the SQL for the column directly into the CREATE TABLE statement, rather
than adding it after the fact. To be clear, I'm thinking that
_get_field_sq() would return something like

column_name INTEGER NOT NULL

and management.py would add the trailing comma if necessary but
otherwise more or less include the line verbatim in the CREATE TABLE
block.

GIS guys, any particular reasons for the approach you took? -- laziness?
economy? "it works, dammit!"? technical reason against the above?

Of course, this is all peripheral to the SoC project, because I you
really would need _post_create_sql() to add a constraint to a field if
we didn't want to drastically change the SQL construction. So maybe it's
all moot, since _post_create_sql() is probably providing a super-set of
functionality of a mythical _get_field_sql() anyway.

Regards,
Malcolm


Jacob Kaplan-Moss

unread,
May 29, 2007, 11:58:28 AM5/29/07
to django-d...@googlegroups.com, Adrian Holovaty
On 5/29/07, Malcolm Tredinnick <mal...@pointy-stick.com> wrote:
>
> On Sat, 2007-05-26 at 01:53 -0500, Jacob Kaplan-Moss wrote:
> > On 5/26/07, Justin Bronn <jbr...@gmail.com> wrote:
> > > We encountered a similar problem on the GIS branch. Specifically, the
> > > OpenGIS specification requires that compliant spatial databases add
> > > geometry columns by calling the stored procedure AddGeometryColumn()
> > > after the table definition.
> > >
> > > The way we solved it was a bit different from your proposals. We
> > > patched the management.py function get_custom_sql_for_model() to look
> > > for a function on the field called "_post_create_sql." In our Field
> > > subclass, GeometryField, we define a _post_create_sql method that
> > > returns the needed SQL (a string). The field-generated SQL hitches a
> > > ride on the output of the initial SQL data function that is already
> > > called anyway.
> >
> > I'm a bit drunk right now, but I think this patch looks like a good
> > idea. If nobody's got any objections and if it still seems like a good
> > idea in the morning I'm thinking I'll apply this patch to trunk in the
> > morning. It seems this feature will help a lot in custom field
> > development.
>
> Yo,
>
> Did you think better of this, or haven't gotten around to it yet?

I hadn't heard back from Adrian yet; I want to make sure to have his OK.

I'll let the GIS folks answer your question about _get_field_sql(); my
impression is that PostGIS at least makes you declare GIS fields
*after* you declare your table.

Jacob

Adrian Holovaty

unread,
May 29, 2007, 12:11:39 PM5/29/07
to django-d...@googlegroups.com
On 5/29/07, Jacob Kaplan-Moss <jacob.ka...@gmail.com> wrote:
> I hadn't heard back from Adrian yet; I want to make sure to have his OK.
>
> I'll let the GIS folks answer your question about _get_field_sql(); my
> impression is that PostGIS at least makes you declare GIS fields
> *after* you declare your table.

Malcolm's _get_field_sql() question is a good one -- I'm +1 on adding
that method to the fields. Only thing is, does _get_field_sql() get
passed the database backend?

Adrian

--
Adrian Holovaty
holovaty.com | djangoproject.com

Joseph Kocherhans

unread,
May 29, 2007, 12:47:57 PM5/29/07
to django-d...@googlegroups.com
On 5/29/07, Adrian Holovaty <holo...@gmail.com> wrote:
>
> Malcolm's _get_field_sql() question is a good one -- I'm +1 on adding
> that method to the fields. Only thing is, does _get_field_sql() get
> passed the database backend?

If the method is going in, I think passing in the db backend would be
a great idea along with pushing django.db.backends.X.creation into the
specific field classes.

Joseph

Robert Coup

unread,
May 29, 2007, 6:28:40 PM5/29/07
to django-d...@googlegroups.com
Malcolm Tredinnick wrote:
> GIS guys, any particular reasons for the approach you took? -- laziness?
> economy? "it works, dammit!"? technical reason against the above?
>
In PostGIS, the AddGeometryColumn() function needs to be run separately
after the create table. It adds the geometry field to the table, adds
its definition (projection, geometry type, dimensions) to the
'geometry_columns' table, creates a number of constraints, etc.

Normal process for setting up a geometry table:
CREATE TABLE "mytable" (... every field /except/ geometry ones ...);
SELECT AddGeometryColumn("mytable", "mygeometryfield", 4326, "POLYGON", 2);
CREATE INDEX "mytable_mygeometryfield" ON "mytable" USING GIST
("mygeometryfield" GIST_GEOMETRY_OPS);

Thats why it needed to be post_create. post_create allows you to "fix"
anything you didn't like in a create as well or add special indexes or
constraints.

Not saying adding goodies to CREATE TABLE is bad, just that it doesn't
work everywhere.

Rob :)

--
One Track Mind Ltd.
PO Box 1604, Shortland St, Auckland, New Zealand
Phone +64-9-966 0433 Mobile +64-21-572 632
Web http://www.onetrackmind.co.nz


Malcolm Tredinnick

unread,
May 29, 2007, 8:51:17 PM5/29/07
to django-d...@googlegroups.com

I'm +1 on passing in the database backend.

I'm probably -1 on putting *all* of the creation stuff into field
classes, though. The current method, which is like, means that fields do
not have to worry about the database, by and large. Only in the case of
custom fields with very special creation requirements does the leakage
have to happen. I may not have explained this very well, but it smells a
little bad to me (deep technical reason, I realise, but without a
whiteboard and the ability to wave my arms a bit, it's hard to explain
in 250 words or less). There are going to be other opinions, though.

One "pie-in-the-sky" reason for my preferring the current way is that it
makes replacing the database layer (e.g. SQL Alchemy, rdflib, Fred's Own
Fantastic Storage System(tm), ...) cleaner, since there isn't
SQL-specific stuff in the Field classes, mostly. I've spent some time
over the months wondering how we could add creation for specialised
fields without encroaching on the field classes, but I'm not sure it's
particularly neat to try and do so.

Given the GIS guys' requirements (post-creation modification is
necessary), which is also probably the easiest way to add constraints, I
guess there's an argument for both _get_field_sql() and
_post_create_sql() hooks in there.

Regards,
Malcolm

Justin Bronn

unread,
May 29, 2007, 9:40:27 PM5/29/07
to Django developers
> In PostGIS, the AddGeometryColumn() function needs to be run separately
> after the create table. It adds the geometry field to the table, adds
> its definition (projection, geometry type, dimensions) to the
> 'geometry_columns' table, creates a number of constraints, etc.

Thanks, Rob, for the excellent summary; below, I'll give a bit more
detail.

> GIS guys, any particular reasons for the approach you took? -- laziness?
> economy? "it works, dammit!"? technical reason against the above?

Malcolm, the fact that AddGeometryColumn() must be called _after_ the
CREATE TABLE statement is not a choice of the GIS branch. As I stated
before, geometry column creation with the AddGeometryColumn() stored
procedure is part of the OpenGIS standard. [1]

When figuring out where to place the check for the _post_create_sql()
function, I did not randomly choose a function in management.py. My
goals were to (1) accomplish geometry column creation in the
aforementioned manner, (2) minimize patches outside of the contrib
directory, and (3) use the existing API to the fullest extent
possible. I believe this solution, a three-line patch to an existing
post-initialization SQL function, satisfies these criteria.

-Justin

[1] See Open GIS Consortium, Inc., OpenGIS Simple Feature
Specification For SQL Rev. 1.1, Document 99-049 (May 5, 1999), at 2-38
(s. 2.3.8, Geometry Values and Spatial Reference Systems, pg. 40),
available at http://www.opengis.org/docs/99-049.pdf.

theju

unread,
May 30, 2007, 11:28:07 PM5/30/07
to Django developers
Though I am not conversant with the details of the GIS fundas, here is
my observation....their application gave a very good pointer to people
who always wanted to do custom field development. I wouldn't be happy
to see SQL statements be used in the class (even though it gets the
work because I believe there is another place for it in the
management.py or the data_types).

Here is what I did (not complete as yet):

Created a Check class which inherits the Field Class and write all the
attributes or methods here.
Next, in the management.py I make small changes and write the
appropriate SQL statements.

In management.py ....
In _get_sql_model_create(),
...
for f in opts.fields:
if isinstance(f, (models.ForeignKey, models.OneToOneField)):
rel_field = f.rel.get_related_field()
while isinstance(rel_field, (models.ForeignKey,
models.OneToOneField)):
rel_field = rel_field.rel.get_related_field()
data_type = get_rel_data_type(rel_field)
elif isinstance(f,Check):
# Write my SQL statements here.
else:
rel_field = f
data_type = f.get_internal_type()
col_type = data_types[data_type]
if col_type is not None:
# Make the definition (e.g. 'foo VARCHAR(30)') for this
field.
field_output =
[style.SQL_FIELD(backend.quote_name(f.column)),
style.SQL_COLTYPE(col_type % rel_field.__dict__)]
field_output.append(style.SQL_KEYWORD('%sNULL' % (not
f.null and 'NOT ' or '')))
...
Personally, I believe not all cases for custom SQL development require
SQL to be appended (like mine), so is there a necessity for
_post_create_sql() for my project???

Reply all
Reply to author
Forward
0 new messages