database_check / Ticket #12

3 views
Skip to first unread message

Brant Harris

unread,
Sep 1, 2005, 11:59:12 AM9/1/05
to django-d...@googlegroups.com
Alright, so I'm trying to finish the database_check function in
management.py to generate ALTER TABLE commands. Effectively I'm
steamlining my own solution, but it was decided that the two-step
process I made was too 'messy'. Anyhow I wanted some opinions:

Here's the problem, it is impossible to find a general way to see if a
field in a table needs to be changed. In MySQL it seems to be
relatively easy to figure it out with DESCRIBE table; harder in
Postgres; and I believe impossible in SQLite. So there are two ideas
from there:

One is to add a keyword option for the fields in the model syntax,
something like "changed". So if I have a field,
CharField(maxlength=24, changed=True), django-admin will then CHANGE
that column in the database.

The other idea is to have it CHANGE every single field, regardless of
whether or not it needs to be. Here the logic goes that if you CHANGE
a field to what it already is, nothing occurs, meanwhile all the ones
that need to be changed are. This however, extremely bloats the SQL
code shot back from database_check.

There are some other problems in making this a single-step process.
The most obvious is having database_check decide if a field was
renamed. For instance, if I rename a CharField(maxlength=20) from
'lead' to 'title', then database_check will see that lead was changed
to title, and will simply rename the field in the database. However,
if I rename the CharField, and change it's maxlength to 40, how is it
to know that it's the same CharField? You see to database_check it
doesn't see you "changing" the CharField, it sees that there's a field
missing, and a new one added. So if you say yes to that previous
question, now when you remove, for instance,
"superfluous=CharField(maxlength=100)", and add instead
"important_string=CharField(maxlength=5)", database_check will merely
rename and alter the length of that field in the database, effectively
you now have a lot of fields with junk data in the "important_string"
field.

Martin Maney

unread,
Sep 1, 2005, 2:25:00 PM9/1/05
to django-d...@googlegroups.com
On Thu, Sep 01, 2005 at 10:59:12AM -0500, Brant Harris wrote:
> field in a table needs to be changed. In MySQL it seems to be
> relatively easy to figure it out with DESCRIBE table; harder in
> Postgres; and I believe impossible in SQLite. So there are two ideas
> from there:

I think you've missed the sane third possibility, which is to fail
(nicely, of course) when the request can't be fulfilled.

> One is to add a keyword option for the fields in the model syntax,

-1 Nasty hackish solution (we can't make it work, really, so we'll
add an override and leave it up to you to somehow figure out where to
invoke it. You *did* backup your data before trying this, right?). If
you have to do it that way, at least don't add the (correct once) flags
to the schema itself. It's a throwaway list, so *make* it just that,
and feed it into the change command however you like. I might get to
-0 for that. I doubt I'd care to use it.

> The other idea is to have it CHANGE every single field, regardless of
> whether or not it needs to be. Here the logic goes that if you CHANGE
> a field to what it already is, nothing occurs, meanwhile all the ones
> that need to be changed are. This however, extremely bloats the SQL
> code shot back from database_check.

At least. And who knows if it will really be a no-op for every
database? And thatstill fails for some cases, especially column
renamings (as you preovided an example of). -1

> There are some other problems in making this a single-step process.

Yes indeed. Like for example that it is in general NOT a single-step
process! Sometimes you *have* *to* make a partial change, futz with
things, then complete the change. The sane way to do that, IMO, is to
manipulate the database using a language designed for database
manipulation (eg., SQL), then get Django's model back in sync. The
alternative (that would work in general) would be to add all the
functionality of SQL to what is presently a simpler langauge designed
only to describe a static schema; of course, doing so will make it less
simple, compromising one of its current virtues.

--
The true danger is when liberty is nibbled away,
for expedients, and by parts. -- Edmund Burke

Tim Keating

unread,
Sep 2, 2005, 12:19:38 PM9/2/05
to Django developers
> Yes indeed. Like for example that it is in general NOT a single-step
> process! Sometimes you *have* *to* make a partial change, futz with
> things, then complete the change. The sane way to do that, IMO, is to
> manipulate the database using a language designed for database
> manipulation (eg., SQL), then get Django's model back in sync. The
> alternative (that would work in general) would be to add all the
> functionality of SQL to what is presently a simpler langauge designed
> only to describe a static schema; of course, doing so will make it less
> simple, compromising one of its current virtues.

Arguably, making the canonical descriptor of your data structures the
database (which AFAIK is how Rails does it) has advantages over making
it the code and building the db from that. However, I don't foresee
django changing to that model, and we play the hand we're dealt, right?

IMHO the best way for this process to work seamlessly is to forcibly
make the db tables match the structure dictated by the current version
of the model. Before doing so, however, do some analysis to detect
changes that would cause data loss, build scratch tables & move the
data over, and present the user with a set of template sql statements
-- "Hey, user, updating your table structures lost this data, if you
want it back, here it is . . . or if you don't want it anymore, run
this statement to drop the table."

Martin Maney

unread,
Sep 2, 2005, 1:12:00 PM9/2/05
to django-d...@googlegroups.com
On Fri, Sep 02, 2005 at 04:19:38PM -0000, Tim Keating wrote:
> Arguably, making the canonical descriptor of your data structures the
> database (which AFAIK is how Rails does it) has advantages over making
> it the code and building the db from that. However, I don't foresee
> django changing to that model, and we play the hand we're dealt, right?

Yeah, but that's not what I was saying, exactly. I was talking about
*hard* schema changes - the kind that aren't just adding or dropping a
column (with an obvious default or no need to preserve any of the
existing data, respectively). There's a set of simple changes that are
fairly easy to automate, but implementing only that subset would only
satisfy some users, some of the time. The rest of the time you'd have
to do it the old-fashioned way and sync up later.

Your description of how the automated change might work makes me
distinctly queasy, but I don't have time to sort out the reasons for
that, let alone offer any examples of potential pitfalls. The short
form is that sometimes the structure of the tables is a more vital part
of the representation of the data than you might expect, and what
appears to be a simple join or a reasonable division of a table may well
destroy data, or create [erroneous] data.

--
A delicate balance is necessary between sticking with the things
you know and can rely upon, and exploring things which have the
potential to be better. Assuming that either of these strategies
is the one true way is silly. -- Graydon Hoare

Reply all
Reply to author
Forward
0 new messages