[GSoC 2012] Schema Alteration API proposal


Kushagra Sinha

Mar 18, 2012, 7:38:38 AM
to django-d...@googlegroups.com
Abstract
------------------------------------------------------------------------------
A database migration helper has been one of the longest-standing feature
requests in Django. Though Django has an excellent database creation helper,
when faced with schema design changes, developers have to resort to either
writing raw SQL and manually performing the migrations, or using third-party
apps like South[1] and Nashvegas[2].

Clearly Django will benefit from having a database migration helper as an
integral part of its codebase.

From [3], the consensus seems to be on building a framework like Ruby on
Rails' ActiveRecord Migrations[4], which will essentially emit Python code
after inspecting user models and the current state of the database. The
generated Python code will then be fed to a 'migrations API' that will
actually handle the task of migration. This is the approach followed by South
(as opposed to Nashvegas's approach of generating raw SQL migration files),
and it ensures modularity, one of the trademarks of Django. Third-party
developers can create their own inspection and ORM versioning tools, provided
the inspection tool emits Python code conforming to our new migrations API.

To sum up, the complete migrations framework will need, at the highest level:
1. A migrations API that accepts Python code and actually performs the
   migrations.
2. An inspection tool that generates the appropriate Python code after
   inspecting models and the current state of the database.
3. A versioning tool to keep track of migrations. This will allow 'backward'
   migrations.
4. Glue code to tie the above three together.


Implementation plan
------------------------------------------------------------------------------
Before discussing the implementation plan for the migrations framework, I
would like to digress for a moment and describe what the final state of the
migrations framework will look like once it is implemented.

For the user, syncing and migrating databases will consist of issuing the
existing syncdb command and a new 'migrate' command.
syncdb will have to be rewritten, and a new migrate command will be written.

South's syncdb:
class Command(NoArgsCommand):
    def handle_noargs(self, migrate_all=False, **options):
        ...
        apps_needing_sync = []
        apps_migrated = []
        for app in models.get_apps():
            app_label = get_app_label(app)
            if migrate_all:
                apps_needing_sync.append(app_label)
            else:
                try:
                    migrations = migration.Migrations(app_label)
                except NoMigrations:
                    # It needs syncing
                    apps_needing_sync.append(app_label)
                else:
                    # This is a migrated app, leave it
                    apps_migrated.append(app_label)
        verbosity = int(options.get('verbosity', 0))
        # Run the original syncdb procedure for apps_needing_sync
        # If migrate is passed as a parameter, run migrate command for rest

The above code is from South's override of the syncdb command. It divides
INSTALLED_APPS into apps that have a migration history, which will be handled
by the migrations framework, and apps that do not, which will be handled by
Django's syncdb. South expects users to manually run a
'schemamigration --initial' command for every app they want to be handled by
South's migration framework.

If migrations become a core part of Django, every user app will have a
migrations folder (module) under it, created at the time of issuing
django-admin.py startapp. Thus, by modifying the startapp command to create a
migrations module for every app it creates, we will be able to use South's
syncdb code as-is and will also save users from issuing
schemamigration --initial for each of their apps.
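
A minimal sketch of that startapp change, for illustration (the helper name
is hypothetical; the real code would hook into the startapp command itself):

import os

def create_migrations_package(app_path):
    # Sketch: create an empty 'migrations' package inside the directory
    # that startapp just created.
    migrations_dir = os.path.join(app_path, 'migrations')
    os.makedirs(migrations_dir)
    # An empty __init__.py makes the directory an importable module.
    open(os.path.join(migrations_dir, '__init__.py'), 'w').close()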

Now that we have a guaranteed migrations history for every user app, the
migrate command will also be more or less a copy of South's migrate command.

Coming back to the migrations API: there are three fundamental operations
that can be performed during a migration:
1. Creation of a new model.
2. Alteration in an existing model.
3. Deletion of an existing model.

As much as I would have liked to reuse the code of Django's creation API for
creating and destroying models, we cannot. The reason is that Django's
creation API uses its inspection tools to generate *SQL*, which is then
directly fed to cursor.execute. What we need is a migrations API that
consumes *Python* code generated by the inspection tool. Moreover,
deprecating or removing Django's creation API in favour of the new migrations
API everywhere would give rise to performance issues, since time would be
wasted generating Python code and then converting it to SQL for Django's core
apps, which will never have migrations anyway.

The creation API and code that depends on it (syncdb, sql, django.test.simple
and django.contrib.gis.db.backends) will be left as is.

Therefore much of the code for our new migrations API will come from South.
For the models:
class Author(models.Model):
    name = models.CharField(max_length=100)

class Book(models.Model):
    title = models.CharField(max_length=100)
    author = models.ManyToManyField('Author')

the migrations file created under <app>/migrations will look like:
class Migration(SchemaMigration):
    def forwards(self, orm):
        # Adding model 'Book'
        db.create_table('myapp_book', (
            ('id', self.gf('django.db.models.fields.AutoField')(primary_key=True)),
            ('title', self.gf('django.db.models.fields.CharField')(max_length=100)),
        ))
        db.send_create_signal('myapp', ['Book'])

        # Adding M2M table for field author on 'Book'
        db.create_table('myapp_book_author', (
            ('id', models.AutoField(verbose_name='ID', primary_key=True, auto_created=True)),
            ('book', models.ForeignKey(orm['myapp.book'], null=False)),
            ('author', models.ForeignKey(orm['myapp.author'], null=False))
        ))
        db.create_unique('myapp_book_author', ['book_id', 'author_id'])

        # Adding model 'Author'
        db.create_table('myapp_author', (
            ('id', self.gf('django.db.models.fields.AutoField')(primary_key=True)),
            ('name', self.gf('django.db.models.fields.CharField')(max_length=100)),
        ))
        db.send_create_signal('myapp', ['Author'])

    def backwards(self, orm):
        # Deleting model 'Book'
        db.delete_table('myapp_book')

        # Removing M2M table for field author on 'Book'
        db.delete_table('myapp_book_author')

        # Deleting model 'Author'
        db.delete_table('myapp_author')

    models = {
        'myapp.author': {
            'Meta': {'object_name': 'Author'},
            'id': ('django.db.models.fields.AutoField', [], {'primary_key': 'True'}),
            'name': ('django.db.models.fields.CharField', [], {'max_length': '100'})
        },
        'myapp.book': {
            'Meta': {'object_name': 'Book'},
            'author': ('django.db.models.fields.related.ManyToManyField', [], {'to': "orm['myapp.Author']", 'symmetrical': 'False'}),
            'id': ('django.db.models.fields.AutoField', [], {'primary_key': 'True'}),
            'title': ('django.db.models.fields.CharField', [], {'max_length': '100'})
        }
    }
    complete_apps = ['myapp']

whereas the initial (blank) migration created at app creation time is:
class Migration(SchemaMigration):
    def forwards(self, orm):
        pass
    def backwards(self, orm):
        pass
    models = {}
    complete_apps = ['myapp']

The above code snippets were generated by South. Thus we already have
functionality for create_table, create_unique, delete_table etc., and the
inspection routines, which have to be integrated into Django's codebase
(at django.db.migrations, perhaps?).


Schedule and Goal
------------------------------------------------------------------------------
Week 1    : Discussion on API design and overriding django-admin startapp
Week 2-3  : Developing the base migration API
Week 4    : Developing migration extensions and overrides for PostgreSQL
Week 5    : Developing migration extensions and overrides for MySQL
Week 6    : Developing migration extensions and overrides for SQLite
Week 7    : Developing the inspection tools
Week 8    : Developing the ORM versioning tools and glue code
Week 9-10 : Writing tests/documentation
Week 11-12: Buffer weeks for the unexpected, Oracle DB? and
            django.contrib.gis.backends?

Note: Work on Oracle and GIS may not be possible as part of GSoC

I will personally consider my project to be successful if I have created and
tested at least the base API + PostgreSQL extension and inspection + version
tools.


About me and my inspiration for the project
------------------------------------------------------------------------------
I am Kushagra Sinha, a pre-final year student at Institute of Technology
(about to be converted to an Indian Institute of Technology),
Banaras Hindu University, Varanasi, India.

I can be reached at:
Gmail: sinha.kushagra
Alternative email: kush [at] j4nu5.com
IRC: Nick j4nu5 on #django-dev and #django
Twitter: @j4nu5
github: j4nu5

I had been happily using PHP for nearly all of my webdev work since my high
school days (CakePHP being my framework of choice) until I was introduced to
Django a year and a half ago. Comparing Django with CakePHP (which is Ruby on
Rails inspired), I felt more attached to Django's philosophy than RoR's
"hidden magic" approach. I have been in love ever since :)

Last year I had an internship at MobStac[5] (BusinessWorld magazine India's
hottest young startup[6]). Their stack is Django+MySQL. I was involved in a
heavy database migration of their analytics platform. Since they had not been
using a migrations framework, the situation looked grim. Fortunately, South
came to the rescue and we were able to carry out the migration, but it left
everyone a little frustrated and clearly in want of a migrations framework
built into Django itself.


Experience
------------------------------------------------------------------------------
I have experience working on a high-stakes database migration through my
internship, as stated before. I am also familiar with Django's contribution
guidelines and have written a couple of patches[7]. One patch has been
accepted and the second got blocked by 1.4's feature freeze.
My other projects can be seen on my GitHub[8].



--
Kushagra Sinha
B. Tech. Part III (Pre-final year)
Indian Institute of Technology
Varanasi
Contact: +91 9415 125 215

Russell Keith-Magee

Mar 18, 2012, 7:33:23 PM
to django-d...@googlegroups.com

On 18/03/2012, at 7:38 PM, Kushagra Sinha wrote:

> Abstract
> ------------------------------------------------------------------------------
> A database migration helper has been one of the most long standing feature
> requests in Django. Though Django has an excellent database creation helper,
> when faced with schema design changes, developers have to resort to either
> writing raw SQL and manually performing the migrations, or using third party
> apps like South[1] and Nashvegas[2].
>
> Clearly Django will benefit from having a database migration helper as an
> integral part of its codebase.
>
> From [3], the consensus seems to be on building a Ruby on Rails ActiveRecord
> Migrations[4] like framework, which will essentially emit python code after
> inspecting user models and current state of the database.

Check the edit dates on that wiki -- most of the content on that page is historical, reflecting discussions that were happening over 3 years ago. There have been many more recent discussions.

The "current consensus" (at least, the consensus of what the core team is likely to accept) is better reflected by the GSoC project that was accepted, but not completed last year. I posted to Django-developers about this a week or so ago [1]; there were some follow up conversations in that thread, too [2].

[1] http://groups.google.com/group/django-developers/msg/cf379a4f353a37f8
[2] http://groups.google.com/group/django-developers/msg/2f287e5e3dc9f459

> The python code
> generated will then be fed to a 'migrations API' that will actually handle the
> task of migration. This is the approach followed by South (as opposed to
> Nashvegas's approach of generating raw SQL migration files). This ensures
> modularity, one of the trademarks of Django.

I don't think you're going to be able to ignore raw SQL migrations quite that easily. Just like the ORM isn't able to express every query, there will be migrations that you can't express in any schema migration abstraction. Raw SQL migrations will always need to be an option (even if they're feature limited).

> Third party developers can create
> their own inspection and ORM versioning tools, provided the inspection tool
> emits python code conforming to our new migrations API.
>
> To sum up, the complete migrations framework will need, at the highest level:
> 1. A migrations API that accepts python code and actually performs the
> migrations.

This is certainly needed. I'm a little concerned by your phrasing of an "API that accepts python code", though. An API is something that Python code can invoke, not the other way around. We're looking for django.db.backends.migration as an analog of django.db.backends.creation, not a code consuming utility library.

> 2. An inspection tool that generates the appropriate python code after
> inspecting models and current state of database.

The current consensus is that this shouldn't be Django's domain -- at least, not in the first instance. It might be appropriate to expose an API to extract the current model state in a Pythonic form, but not a fully-fledged, user-accessible "tool".

> 3. A versioning tool to keep track of migrations. This will allow 'backward'
> migrations.

If backward migrations is the only reason to have a versioning tool, then I'd argue you don't need versioning.

However, that's not the only reason to have versioning, is it :-)

> South's syncdb:
> class Command(NoArgsCommand):
>     def handle_noargs(self, migrate_all=False, **options):

As a guide for the future -- large wads of code like this aren't very compelling as part of a proposal unless you're trying to demonstrate something specific. In this case, you're just duplicating some of South's internals -- "I'm going to take South's lead" is all you really needed to say.

> If migrations become a core part of Django, every user app will have a
> migration folder(module) under it, created at the time of issuing
> django-admin.py startapp. Thus by modifying the startapp command to create a
> migrations module for every app it creates, we will be able to use South's
> syncdb code as is and will also save the user from issuing
> schemamigration --initial for all his/her apps.
>
> Now that we have a guaranteed migrations history for every user app, migrate
> command will also be more or less a copy of South's migrate command.

What does this "history" look like? Are migrations named? Are they dated? Numbered? How do you handle dependencies? Ordering? Collisions between parallel development?

*This* is the sort of thing a proposal should be elaborating.

>
> As much as I would have liked to use Django creation API's code for creating
> and destroying models, we cannot. The reason for this is Django's creation API
> uses its inspection tools to generate *SQL* which is then directly fed to
> cursor.execute. What we need is a migrations API which gobbles up *python*
> code generated by the inspection tool. Moreover deprecating/removing Django's
> creation API to use the new migrations API everywhere will give rise to
> performance issues since time will be wasted in generating python code and then
> converting python to SQL for Django's core apps which will never have
> migrations anyways.

This sounds like a false economy to me. If we're talking about the core pipeline for handling a HTTP request, then every method call and abstraction counts. However, that's not what we're talking about. We're talking about utilities used to synchronize the database. They're called by manual invocation, infrequently, and *never* as part of the request/response cycle.

Yes, there will probably be a slowdown -- but we get the benefit of a consistent interface to database creation. However, unless the slowdown to syncdb is such that it becomes *seriously* observable -- e.g., turns syncdb into a 1 minute operation, rather than a 1 second operation -- then you're advocating for duplicating code paths in order to maintain a false economy.

> The creation API and code that depends on it (syncdb, sql, django.test.simple
> and django.contrib.gis.db.backends) will be left as is.
>
> Therefore much of the code for our new migrations API will come from South.

Again, the code snippet highlights nothing here. Anyone qualified to review your proposal is at least familiar with South, so there's no need to give a page long example of South's usage unless you're trying to say something specific about South's API and usage.

> Schedule and Goal
> ------------------------------------------------------------------------------
> Week 1 : Discussion on API design and overriding django-admin startapp
> Week 2-3 : Developing the base migration API
> Week 4 : Developing migration extensions and overrides for PostgreSQL
> Week 5 : Developing migration extensions and overrides for MySQL
> Week 6 : Developing migration extensions and overrides for SQLite
> Week 7 : Developing the inspection tools
> Week 8 : Developing the ORM versioning tools and glue code
> Week 9-10 : Writing tests/documentation
> Week 11-12: Buffer weeks for the unexpected, Oracle DB? and
> django.contrib.gis.backends?
>

Week 13 - profit.

Seriously, this is a very unconvincing timetable. What are you basing these estimates on?

Some of the things that raise flags for me:

* What makes you think that MySQL, PostgreSQL and SQLite are all equally complex when it comes to migrations? SQLite doesn't let you alter or drop a column. Tracking MySQL index changes is non-trivial.

* On what basis do you assert that "developing inspection tools" -- presumably for all three databases covered in weeks 4-6 -- will take 1 week?

* If you're not working on tests until week 9-10, how do you plan to establish that the work you do in week 1 actually works?

> Note: Work on Oracle and GIS may not be possible as part of GSoC
>
> I will personally consider my project to be successful if I have created and
> tested at least the base API + PostgreSQL extension and inspection + version
> tools.

If that's the case, then why does your schedule say you're going to complete MySQL and SQLite, and possibly Oracle as well?

I can see that you're obviously enthused by this project, but as it stands, I can't say this is a very compelling proposal.

* It ignores the most recent activity in the area (last year's GSoC, in particular)

* It is extremely light in detail on how some very big pieces (like your "versioning tools") will work

* The proposed schedule reads more like a list of things you know you need to do, not a detailed work breakdown backed by realistic estimates.

Thanks for taking the time to submit this proposal. I'd encourage you to have a second swing at this. Read the recent discussions on the topic; take a look at last year's GSoC proposal; and spend some time elaborating on the details that I've highlighted.

Yours,
Russ Magee %-)

Jonathan French

Mar 19, 2012, 7:08:13 AM
to django-d...@googlegroups.com
On 18 March 2012 23:33, Russell Keith-Magee <rus...@keith-magee.com> wrote:
>> 2. An inspection tool that generates the appropriate python code after
>>    inspecting models and current state of database.

> The current consensus is that this shouldn't be Django's domain -- at least, not in the first instance. It might be appropriate to expose an API to extract the current model state in a Pythonic form, but not a fully-fledged, user-accessible "tool".

Is there a writeup anywhere of why this is the consensus? AFAICT Django already provides half of this in the form of DatabaseIntrospection, which e.g. South actually uses, and which can generate a model class from the current state of the database. Doing the diff as well doesn't seem like much of a stretch, and might make it more likely for third-party custom fields to be made migratable, if the interface for doing so is in Django core.
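
For reference, that existing half already looks roughly like this (a sketch
against Django's DatabaseIntrospection API; generating model classes on top
of it is what inspectdb does):

from django.db import connection

def current_schema():
    # Sketch: list every table and its column names through Django's
    # existing introspection API.
    cursor = connection.cursor()
    schema = {}
    for table in connection.introspection.get_table_list(cursor):
        # get_table_description returns one row per column; row[0] is
        # the column's name.
        schema[table] = [row[0] for row in
                         connection.introspection.get_table_description(cursor, table)]
    return schema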

- ojno

Andrew Godwin

Mar 19, 2012, 7:15:56 AM
to django-d...@googlegroups.com
On 19/03/12 11:08, Jonathan French wrote:
> On 18 March 2012 23:33, Russell Keith-Magee <rus...@keith-magee.com> wrote:

No writeup that I know of - however, the main part of the work here
would be the "model differencing" code, which means creating a versioned
ORM, being able to load and save model definitions to some kind of
format, and the actual difference-creating code, which is all too much
to stick into Django.

I've long maintained that I want South to become just that automatic
differencing code, and to just move the actual database API across; this
is mostly because I see there being scope for other kinds of migration
systems apart from the kind South is (for example, a very declarative
one, whose model states are retrieved using the combination of all
migrations, rather than a lump on the bottom of the last one).

As for your proposal, Kushagra, Russ has said most of the points I would
have thought of and a few more - I'd recommend a good look into previous
discussions on this mailing list for most of the current views on how we
want the schema alteration API to work.

I would, however, definitely recommend not touching the Oracle or MSSQL
backends - three is already a lot of work, and they're harder databases
to get a hold of for testing.

Andrew

Jani Tiainen

Mar 19, 2012, 8:32:51 AM
to django-d...@googlegroups.com
19.3.2012 13:15, Andrew Godwin wrote:
> On 19/03/12 11:08, Jonathan French wrote:
>> On 18 March 2012 23:33, Russell Keith-Magee <rus...@keith-magee.com> wrote:
>>
>> > 2. An inspection tool that generates the appropriate python code after
>> > inspecting models and current state of database.
>>
>> The current consensus is that this shouldn't be Django's domain --
>> at least, not in the first instance. It might be appropriate to
>> expose an API to extract the current model state in a Pythonic form,
>> but not a fully-fledged, user-accessible "tool".
>>
>>
>
> I would, however, definitely recommend not touching the Oracle or MSSQL
> backends - three is already a lot of work, and they're harder databases
> to get a hold of for testing.

Here I would like to raise a concern - especially as a long-time Django
and Oracle user.. =)

First of all, everyone can get their hands on the Oracle Express database
free of charge, and standard Django stuff works very well on it. GeoDjango
doesn't work with it, though. AFAIK MSSQL is not officially supported by
Django, so that shouldn't be much of a problem if it's not touched.

Secondly, Django has in the past been very consistent in supporting four
databases: SQLite, PostgreSQL, MySQL and Oracle - all supported as well as
possible. I'm aware that doing migrations for all of these databases is a
time-consuming challenge, given all the peculiarities of the different
backends. So hopefully that consistency is kept even with new features like
this.

And yes, the second thing is of course the GeoDjango part, which takes the
complexity to a whole new level.

--

Jani Tiainen

Anssi Kääriäinen

Mar 19, 2012, 9:02:00 AM
to Django developers
On Mar 19, 2:32 pm, Jani Tiainen <rede...@gmail.com> wrote:
> Here I would like to raise a concern - especially as a long-time Django
> and Oracle user.. =)
>
> First of all, everyone can get their hands on the Oracle Express database
> free of charge, and standard Django stuff works very well on it. GeoDjango
> doesn't work with it, though. AFAIK MSSQL is not officially supported by
> Django, so that shouldn't be much of a problem if it's not touched.
>
> Secondly, Django has in the past been very consistent in supporting four
> databases: SQLite, PostgreSQL, MySQL and Oracle - all supported as well as
> possible. I'm aware that doing migrations for all of these databases is a
> time-consuming challenge, given all the peculiarities of the different
> backends. So hopefully that consistency is kept even with new features
> like this.

I think what was meant in the above postings is that it is not necessary to
do all the different backends during the GSoC, not that they do not need to
be done at all before commit.

What is important is that MSSQL support (or any other 3rd-party backend) is
possible to write without core changes. Django's backend architecture is such
that it is unlikely any problems will arise, but it is nevertheless important
to verify that the support isn't limited to the core backends.

> And yes, the second thing is of course the GeoDjango part, which takes the
> complexity to a whole new level.

Handling GeoDjango is an important proof that the API is correct.
GeoDjango support should ideally be a completely separate patch to
contrib, without any need for core changes. Of course, doing all of
this in the GSoC is just too much, but trying to do a little part
(preferably the _hardest_ part) is an important verification that the
API is correct.

My opinion is that in the long term Django should aim for better
support of custom model fields, and having good support for these
fields in schema migrations is one important part of that. GeoDjango
is an interesting example of hard-to-implement custom model fields.

- Anssi

Kushagra Sinha

Mar 19, 2012, 4:33:43 PM
to django-d...@googlegroups.com
Check the edit dates on that wiki -- most of the content on that page is historical, reflecting discussions that were happening over 3 years ago. There have been many more recent discussions.

Sincere apologies. The page says "Last modified 10 days ago". I thought it was pretty recent and didn't check the history.

I can see that you're obviously enthused by this project, but as it stands, I can't say this is a very compelling proposal.

 * It ignores the most recent activity in the area (last year's GSoC, in particular)

I did go through last year's proposal, but it was in conflict with the above wiki page. I sided with the one that *appeared* more recent but was actually 3 years old :(

Thanks for taking the time to submit this proposal. I'd encourage you to have a second swing at this. Read the recent discussions on the topic; take a look at last year's GSoC proposal; and spend some time elaborating on the details that I've highlighted.

Thanks. I will submit a revised proposal ASAP.

Also, I need some help/clarification regarding a few points:
I don't think you're going to be able to ignore raw SQL migrations quite that easily. Just like the ORM isn't able to express every query, there will be migrations that you can't express in any schema migration abstraction. Raw SQL migrations will always need to be an option (even if they're feature limited).

Andrew's thread[1] also mentions: "backends will always be able to
generate SQL for operations, but it won't necessarily be runnable
(things like index names can only be resolved at runtime, so you'd get
code like "DROP INDEX <<User.username-index>> ON users;")."

[1] https://groups.google.com/forum/?fromgroups#!topic/django-developers/usFXJvpelmI


Am I correct to assume that the plan is to allow migration files in Python as well as pseudo-SQL like the above?
In that case, I think I will concentrate on just the core part of the migrations API and nothing else as far as GSoC is concerned.

Another query:
Andrew's thread above also mentioned:
Some of these operations are already mostly implemented (add_table, 
add_index, etc.) in backends' creation modules, but they'll need a bit 
of rearranging and separating into a full public API. I also plan to 
modify them to take model names and field names, instead of table names 
and column names, so the API is exclusively using the Django model layer 
to represent changes (there's a possibility that some changes make sense 
for schemaless databases as well, specifically renames, so it's best not 
to tie it directly to relational databases).

As it happens, xtrqt last year implemented, documented and tested the migrations API for at least SQLite[2]. However, he used explicit table and column names in his API methods; obviously he put the task of table name translation on the API caller. Is there any consensus on the API design regarding this point?




Joe Tennies

Mar 19, 2012, 5:15:58 PM
to django-d...@googlegroups.com
I have been updating the wiki a bit (https://code.djangoproject.com/wiki/SchemaEvolutionDesign). If you look at the linked page about the design, it's based on a general approval I got from Russell. I haven't finished writing it up, but it has been in the 60s and 70s in Wisconsin in March. Needless to say, I've been outside.

I've been thinking of splitting it into 2 sections. The main part missing is that the database layer needs to grow modification APIs. Then there needs to be a contrib app for actually doing the migrations. Note that detecting changes is not part of this (as has been pointed out in this conversation already).

If you look, you can tell the ideas are mainly based on South at this point, but I have been looking at Nashvegas to see if they have anything else to add.

--
Joe Tennies
ten...@gmail.com

Andrew Godwin

Mar 19, 2012, 6:17:43 PM
to django-d...@googlegroups.com
On 19/03/12 20:33, Kushagra Sinha wrote:
> Andrew's thread[1] also mentions: "backends will always be able to
> generate SQL for operations, but it won't necessarily be runnable
> (things like index names can only be resolved at runtime, so you'd get
> code like "DROP INDEX <<User.username-index>> ON users;")."
>
> [1]
> https://groups.google.com/forum/?fromgroups#!topic/django-developers/usFXJvpelmI
>
> Am I correct to assume that the plan is to allow migration files in
> Python as well as pseudo-SQL like the above?
> In that case, I think I will concentrate on just the core part of the
> migrations API and nothing else as far as GSoC is concerned.

The actual migration file loading/running system was never intended to
be part of the GSoC (nor of my port, when I was planning it) - the idea was
to get just the database API in for a release, allowing South to lose
all that code, and then work on a migration file/running/dependency API for
the next one.

There's a lot more potential bikeshedding and design issues with writing
a migration-running API, so that's one of the reasons it's separated
out. I'd highly recommend focusing just on the database API - what's in
South currently can't be ported straight across, and it needs quite a
bit of cleanup (especially in the SQLite code), so it's still a decent
amount of work.


> Another query:
> Andrew's thread above also mentioned:
> Some of these operations are already mostly implemented (add_table,
> add_index, etc.) in backends' creation modules, but they'll need a bit
> of rearranging and separating into a full public API. I also plan to
> modify them to take model names and field names, instead of table names
> and column names, so the API is exclusively using the Django model layer
> to represent changes (there's a possibility that some changes make sense
> for schemaless databases as well, specifically renames, so it's best not
> to tie it directly to relational databases).
>
> As it happens xtrqt last year had implemented, documented and tested the
> migrations API for at least SQLite[2]. However he used explicit table
> and column names in his API methods. Obviously he put the task of table
> name translation on the API caller. Is there any consensus on the API
> design regarding this point?

I feel that table names should definitely be explicit, as models expose
their expected name. Column names are harder if you accept Django fields
as valid inputs for the type - South currently uses implicit column
names in add_table (i.e. _id gets auto-added) and explicit elsewhere.
I'd rather it was explicit everywhere, and the field's column
information was ignored in the schema alteration API (it's the migration
runner's job to feed it the right stuff, I'd say).
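
A hypothetical illustration of the two styles (the 'db' object and the
signatures below are illustrative, not South's exact API):

from django.db import models

def add_author_implicit(db):
    # Implicit: pass the *field* name and let the API derive the column,
    # so 'author' silently becomes an 'author_id' column.
    db.add_column('myapp_book', 'author', models.ForeignKey('myapp.Author'))

def add_author_explicit(db):
    # Explicit (the style preferred above): the caller names the column
    # itself, and the field's own column information is ignored.
    db.add_column('myapp_book', 'author_id', models.ForeignKey('myapp.Author'))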

Don't rely too heavily on xtrqt's work from last year - the work that
was done was basically a copy of South's code into a new module with a
few minor changes. We're looking for a more extensive, clean move, with
some partial rewriting - in particular, I'd like to make the dry_run
system more sensible (or possibly drop it in favour of having a SQL
generation mode that would serve a similar purpose).

SQL generation itself would be a nice feature - it's not always possible
(delete_index, delete_unique), but something that tries where it can,
and puts comments/invalid SQL where it can't, would be nice.

Andrew

Kushagra Sinha

Mar 21, 2012, 9:27:24 AM
to django-d...@googlegroups.com
One more thing:
The current creation API in Django has methods like "sql_create_model" which basically return SQL; it is the caller's responsibility to either call cursor.execute on it (syncdb) or output the SQL itself (the sql command).

South's (and xtrqt's) design is to have functions like "create_table" which execute the SQL themselves. This makes little difference to commands like syncdb once they are rewritten, but commands like sql will have to do something like start_transaction, get_sql, rollback_transaction, which seems hackish to me.
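
A sketch of the two styles being contrasted (the names below are
illustrative, not any existing API):

def sql_create_table(table, columns):
    # Style 1, like the creation API: build and *return* SQL; the caller
    # decides whether to execute it (syncdb) or print it (sql).
    cols = ', '.join('%s %s' % (name, db_type) for name, db_type in columns)
    return 'CREATE TABLE %s (%s);' % (table, cols)

def create_table(cursor, table, columns):
    # Style 2, like South: execute the SQL directly, which forces the sql
    # command into the transaction/rollback dance described above.
    cursor.execute(sql_create_table(table, columns))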

What should be the design of the new migrations API?

Any thoughts on this?


Andrew Godwin

Mar 21, 2012, 6:03:17 PM
to django-d...@googlegroups.com
On 21/03/12 13:27, Kushagra Sinha wrote:
> One more thing:
> The current creation API in Django has methods like "sql_create_model"
> which basically return SQL; it is the caller's responsibility to either
> call cursor.execute on it (syncdb) or output the SQL itself (the sql
> command).
>
> South's (and xtrqt's) design is to have functions like "create_table"
> which execute the SQL themselves. This makes little difference to
> commands like syncdb once they are rewritten, but commands like sql will
> have to do something like start_transaction, get_sql,
> rollback_transaction, which seems hackish to me.
>
> What should be the design of the new migrations API?

Choosing one of these options and justifying it is a decent part of the
application :)

I'd suggest not having multiple functions to do the same thing, but
rather a modification of the South API to allow it to output SQL
(probably replacing the dry-run mode). It won't be possible for all the
South functions, but crucially it will be possible for the functions in
.creation you'd be replacing.

Of course, this method carries with it a high testing cost as you'll
have to essentially remove .creation, migrate all the core code over to
the new stuff, and add a new shim back into .creation for the bits you
replaced. You'll also have to consider what happens to apps like
GeoDjango throughout all this.

You may want to opt for the simpler method and just get .alteration
working for now and mirror some of the methods. Just make sure that
whatever you choose, it fits into the schedule, and you can justify it.

Andrew

Kushagra Sinha

Mar 25, 2012, 3:59:35 PM
to django-d...@googlegroups.com
Here is a revised proposal.

Abstract
------------------------------------------------------------------------------
A database migration helper has been one of the longest-standing feature
requests in Django. Though Django has an excellent database creation helper,
when faced with schema design changes, developers have to resort to either
writing raw SQL and manually performing the migrations, or using third-party
apps like South[1] and Nashvegas[2].

Clearly Django will benefit from having a database migration helper as an
integral part of its codebase.

From the summary on the django-developers mailing list[3], the task of
building a migrations framework will involve:
1. Adding a db.backends module to provide an abstract interface to migration
   primitives (add column, add index, rename column, rename table, and so on).
2. Adding a contrib app that performs the high-level accounting of "has
   migration X been applied", plus management commands to "apply all
   outstanding migrations".
3. Providing an API that allows end users to define raw-SQL migrations, or
   native Python migrations using the backend primitives.
4. Leaving the hard tasks of determining dependencies, introspecting database
   models and so on to the toolset contributed by the broader community.

[3] http://groups.google.com/group/django-developers/msg/cf379a4f353a37f8

I would like to work on the 1st step as part of this year's GSoC.


Implementation plan
------------------------------------------------------------------------------
The idea is to have a CRUD interface to the database schema (with some
additional utility functions for indexing etc.), with functions like:
* create_table
* rename_table
* delete_table
* add_column
and so on, which will take the *explicit* names of the tables/columns to be
modified as parameters. It will be the responsibility of the higher-level API
caller (which will not be undertaken as part of GSoC) to translate
model/field names to explicit table/column names. These functions will be
directly responsible for modifying the schema, and any interaction with the
database schema will take place by calling them. Most of these functions
will come from South.
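
A sketch of what such an interface might look like (the class name and
signatures below are assumptions, not settled API):

class BaseDatabaseAlteration(object):
    # Hypothetical per-backend schema alteration primitives, analogous to
    # django.db.backends.creation; each backend overrides the SQL details.

    def __init__(self, connection):
        self.connection = connection

    def create_table(self, table_name, fields):
        # fields: a sequence of (column_name, Field) pairs -- names are
        # explicit; mapping models/fields to names is the caller's job.
        raise NotImplementedError

    def rename_table(self, old_table_name, new_table_name):
        raise NotImplementedError

    def delete_table(self, table_name):
        raise NotImplementedError

    def add_column(self, table_name, column_name, field):
        raise NotImplementedError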

These API functions will also have a "dry-run" or test mode, in which they
will output a raw SQL representation of the migration, or display errors if
any occur (see the usage sketch after this list). This will be useful in:
1. The MySQL backend. MySQL does not have transaction support for schema
   modification, so migrations will be run in dry-run mode first to capture
   any errors before the schema is altered.
2. The django-admin commands sql and sqlall, which return the SQL (for
   creation and indexing) for an app. They will capture the SQL returned
   from the API running in dry-run mode.
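
Usage might then look like this (again a sketch: 'connection.alteration',
'dry_run' and 'collected_sql' are assumed names, not existing Django API):

from django.db import connection, models

alteration = connection.alteration
# Collect SQL without touching the schema.
alteration.dry_run = True
alteration.add_column('myapp_book', 'isbn', models.CharField(max_length=13))
for statement in alteration.collected_sql:
    print(statement)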

As for the future of Django's current creation API, it will have to be
refactored (not under GSoC) to make use of the 'create' part of our new CRUD
interface, for consistency.

The GeoDjango backends will also have to be refactored to use the new API.
Since they build upon the base code in db.backends, db.backends will have to
be refactored first.

Last year xtrqt wrote, documented and tested code for at least the SQLite
backend[4]. As per Andrew's suggestion, I will not rely too much on that
code, but some parts can still be salvaged.

[4] https://groups.google.com/forum/?fromgroups#!searchin/django-developers/xtrqt/django-developers/pSICNJBJRy8/Hl7frp-O-dMJ


Schedule and Goal
------------------------------------------------------------------------------
Week 1     : Discussion on API design and writing tests
Week 2-3   : Developing the base migration API
Week 4     : Developing extensions and overrides for PostgreSQL
Week 5-6   : Developing extensions and overrides for MySQL
Week 7-8.5 : Developing extensions and overrides for SQLite (may be shorter or
             longer (by 0.5 week) depending on how much of xtrqt's code is
             considered acceptable)
Week 8.5-10: Writing documentation and leftover regression tests, if any
Week 11-12 : Buffer weeks for the unexpected

I will consider my project to be successful when we have working, tested and
documented migration primitives for Postgres, MySQL and SQLite. If we can
develop a working fork of South to use these primitives, that will be a strong
indicator of the project's success.


About me and my inspiration for the project
------------------------------------------------------------------------------
I am Kushagra Sinha, a pre-final year student at Institute of Technology
(about to be converted to an Indian Institute of Technology),
Banaras Hindu University, Varanasi, India.

I can be reached at:
Gmail: sinha.kushagra
Alternative email: kush [at] j4nu5 [dot] com
IRC: Nick j4nu5 on #django-dev and #django
Twitter: @j4nu5
github: j4nu5

I had been happily using PHP for nearly all of my webdev work since my high
school days (CakePHP being my framework of choice) until I was introduced to
Django a year and a half ago. Comparing Django with CakePHP (which is Ruby on
Rails inspired), I felt more attached to Django's philosophy than RoR's
"hidden magic" approach. I have been in love ever since :)

Last year I had an internship at MobStac[5] (BusinessWorld magazine India's
hottest young startup[6]). Their stack is Django+MySQL. I was involved in a
heavy database migration of their analytics platform. Since they had not been
using a migrations framework, the situation looked grim. Fortunately, South
came to the rescue and we were able to carry out the migration, but it left
everyone a little frustrated and clearly in want of a migrations framework
built into Django itself.

Experience
------------------------------------------------------------------------------
I have experience working on a high-stakes database migration through my
internship, as stated before. I am also familiar with Django's contribution
guidelines and have written a couple of patches[7]. One patch has been
accepted and the second got blocked by 1.4's feature freeze.
My other projects can be seen on my GitHub[8].


j4nu5

Apr 1, 2012, 5:02:04 AM
to django-d...@googlegroups.com
Less than a week remains before the student application deadline. Could someone please comment on the above revised proposal? Thanks a lot.

Russell Keith-Magee

Apr 1, 2012, 5:54:05 AM
to django-d...@googlegroups.com
Hi Kushagra,

On the whole, I think this proposal is looking fairly good. Your high-level explanation of the problem is solid, and you've given enough detail about the direction you intend to take the project that it gives me some confidence that you understand what you're proposing to do.

I have a couple of small concerns:

* You aren't ever going to eat your own dogfood. You're spending the GSoC building an API that is intended for use with schema migration, but you're explicitly not looking at any part of the migration process that would actually use that API. How will we know that the API you build is actually fit for the purpose for which it is intended? How do we know that the requirements of "step 2" of schema migration will be met by your API? I'd almost prefer to see more depth, and less breadth -- i.e., show me a fully functioning schema migration stack on just one database, rather than a fully functioning API on all databases that hasn't actually been shown to work in practice.

* It feels like there's a lot of padding in your schedule.

- A week of discussion at the start
- 2 weeks for a "base" migration API
- 2.5 weeks to write documentation
- 2 "buffer" weeks

Your project is proposing the development of a low level database API. While this should certainly be documented, if it's not going to be "user facing", the documentation requirements aren't as high. Also, because it's a low level database API, I'm not sure what common tools will exist -- yet your schedule estimates 1/6 of your overall time, and 1/3 of your active coding time, will be spent building these common tools. Having 1/6 of your project schedule as contingency is very generous; and you don't mention what you plan to look at if you don't have to use that contingency.

* Your references to testing are a bit casual for my taste. From my experience, testing schema migration code is hard. Normal view code and utilities are easy to test -- you set up a test database, insert some data, and check functionality. However, schema migration code is explicitly about making database changes, so the thing that Django normally considers "static" -- the database models -- are subject to change, and that isn't always an easy thing to accommodate. I'd be interested to see your thoughts on how you plan to test your API.

* Your proposal doesn't make any reference to the existing "migration-like" tasks in Django's codebase. For example, we already have code for creating tables and adding indices. How will your migration code use, modify or augment these existing capabilities?

Yours,
Russ Magee %-)


j4nu5

Apr 2, 2012, 5:06:21 PM
to django-d...@googlegroups.com
Hi Russell,

Thanks for the prompt reply.

* You aren't ever going to eat your own dogfood. You're spending the GSoC building an API that is intended for use with schema migration, but you're explicitly not looking at any part of the migration process that would actually use that API. How will we know that the API you build is actually fit for the purpose for which it is intended? How do we know that the requirements of "step 2" of schema migration will be met by your API? I'd almost prefer to see more depth, and less breadth -- i.e., show me a fully functioning schema migration stack on just one database, rather than a fully functioning API on all databases that hasn't actually been shown to work in practice.

I believe 'eating my own dogfood' - checking whether my low-level migration primitives are actually *usable* - can be done by:
1. Developing a working fork of South to use these primitives as I mentioned in my project goals, or
2. Aiming for less 'breadth' and more 'depth', as you suggested.

I did not opt for 2, since creating the '2nd level' of the migration framework (the caller of the lower-level API) is a huge beast by itself. Any reasonable solution will have to take care of 'Pythonic' as well as 'pseudo-SQL' migrations, as discussed above, not to mention versioning + dependency management + backwards migrations. I am against the development of a half-baked and/or inconsistent 2nd-level API layer. Trying to fully develop such a solution, even for one database, would exceed the GSoC timeline, in my humble opinion.


 * It feels like there's a lot of padding in your schedule.

   - A week of discussion at the start
   - 2 weeks for a "base" migration API
   - 2.5 weeks to write documentation
   - 2 "buffer" weeks

Your project is proposing the development of a low level database API. While this should certainly be documented, if it's not going to be "user facing", the documentation requirements aren't as high. Also, because it's a low level database API, I'm not sure what common tools will exist -- yet your schedule estimates 1/6 of your overall time, and 1/3 of your active coding time, will be spent building these common tools. Having 1/6 of your project schedule as contingency is very generous; and you don't mention what you plan to look at if you don't have to use that contingency.

I think the problem is that the 1st part - development of the lower-level migrations API - is a little bit small for the GSoC timeline, but the 2nd part - the caller of the API - is way too big for GSoC. As I said, I did not want to create a half-baked solution; that's why the explicit skipping of the 2nd level and thus the *padding*. I am still open to discussion and suggestions regarding this matter, though.


 * Your references to testing are a bit casual for my taste. From my experience, testing schema migration code is hard. Normal view code and utilities are easy to test -- you set up a test database, insert some data, and check functionality. However, schema migration code is explicitly about making database changes, so the thing that Django normally considers "static" -- the database models -- are subject to change, and that isn't always an easy thing to accommodate. I'd be interested to see your thoughts on how you plan to test your API.

On a high level, the testing code will have to check:
1. Whether the migration has been applied correctly.
2. Whether the models are behaving the way they are supposed to after the migration.

The 1st part will involve checking non-related fields and related fields (ManyToMany, ForeignKey etc.). Checking non-related fields is relatively easy, while checking related fields will involve checking for changes to the appropriate constraints and, as in the case of ManyToMany, whether the changes have *cascaded* properly to all the tables involved (see the sketch below).

The 2nd part involves checking whether the models/fields affected by the migration, either directly or indirectly, are working/throwing errors the way they are supposed to.
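
For the 1st part, a minimal sketch of such a check, using Django's existing
introspection API to read the schema back from the database:

from django.db import connection

def column_names(table_name):
    # Read a table's current columns straight from the database so the
    # test can verify the migration really changed the schema.
    cursor = connection.cursor()
    return [row[0] for row in
            connection.introspection.get_table_description(cursor, table_name)]

# e.g., after a migration that adds an 'isbn' column:
#     assert 'isbn' in column_names('myapp_book')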

* Your proposal doesn't make any reference to the existing "migration-like" tasks in Django's codebase. For example, we already have code for creating tables and adding indices. How will your migration code use, modify or augment these existing capabilities?

The current code, e.g. sql_create_model in django.db.backends.creation, is a mix of an *inspection* part and an *SQL generation* part. Our new API layers will divide these tasks. The SQL generation part will be handled by the dry-run mode of the new low-level API, while the inspection part is the responsibility of the higher levels. The refactored version of creation.py will make use of the higher-level inspection part of the new API, which in turn will call the lower-level API for the final SQL generation.


I will update the testing and refactoring parts in my final proposal. As for the timeline and the *depth* of my work, I am open to suggestions.


Russell Keith-Magee

Apr 2, 2012, 9:09:37 PM
to django-d...@googlegroups.com

On 03/04/2012, at 5:06 AM, j4nu5 wrote:

> Hi Russell,
>
> Thanks for the prompt reply.
>
> * You aren't ever going to eat your own dogfood. You're spending the GSoC building an API that is intended for use with schema migration, but you're explicitly not looking at any part of the migration process that would actually use that API. How will we know that the API you build is actually fit for the purpose for which it is intended? How do we know that the requirements of "step 2" of schema migration will be met by your API? I'd almost prefer to see more depth, and less breadth -- i.e., show me a fully functioning schema migration stack on just one database, rather than a fully functioning API on all databases that hasn't actually been shown to work in practice.
>
> I believe 'eating my own dogfood' - checking whether my low-level migration primitives are actually *usable* - can be done by:
> 1. Developing a working fork of South to use these primitives as I mentioned in my project goals, or
> 2. Aiming for less 'breadth' and more 'depth', as you suggested.
>
> I did not opt for 2, since creating the '2nd level' of the migration framework (the caller of the lower-level API) is a huge beast by itself. Any reasonable solution will have to take care of 'Pythonic' as well as 'pseudo-SQL' migrations, as discussed above, not to mention versioning + dependency management + backwards migrations. I am against the development of a half-baked and/or inconsistent 2nd-level API layer. Trying to fully develop such a solution, even for one database, would exceed the GSoC timeline, in my humble opinion.

Ok - there are two problems with what you've said here:

1) You don't make any reference in your schedule to implementing a "working fork of South". This isn't a trivial activity, so if you're planning on doing this, you should tell us how it is factored into your schedule.

2) You're making the assumption that you need to "fully develop" a solution. A proof of concept would be more than adequate. For example, in the 2010 GSoC, Alex Gaynor's project was split into two bits; a bunch of modifications to the core query engine, and a completely separate project, not intended for merging to trunk, that demonstrated that his core query changes would do what was necessary. You could take exactly the same approach here; don't try to deliver a fully functioning schema migration tool, just enough of a tool to demonstrate that your API is sufficient.

> * It feels like there's a lot of padding in your schedule.
>
> - A week of discussion at the start
> - 2 weeks for a "base" migration API
> - 2.5 weeks to write documentation
> - 2 "buffer" weeks
>
> Your project is proposing the development of a low level database API. While this should certainly be documented, if it's not going to be "user facing", the documentation requirements aren't as high. Also, because it's a low level database API, I'm not sure what common tools will exist -- yet your schedule estimates 1/6 of your overall time, and 1/3 of your active coding time, will be spent building these common tools. Having 1/6 of your project schedule as contingency is very generous; and you don't mention what you plan to look at if you don't have to use that contingency.
>
> I think the problem is that the 1st part - development of the lower-level migrations API - is a little bit small for the GSoC timeline, but the 2nd part - the caller of the API - is way too big for GSoC. As I said, I did not want to create a half-baked solution; that's why the explicit skipping of the 2nd level and thus the *padding*. I am still open to discussion and suggestions regarding this matter, though.

So, to summarize: what you're telling us is that you know, a priori, that your project isn't 12 weeks of work. This doesn't give us a lot of incentive to pick up your proposal for the GSoC. We have an opportunity to get Google to pay for 12 weeks of development. Given that we have that opportunity, why would we select a project that will only yield 6 weeks of output?

The goal here isn't to pick a project, and then make it fit 12 weeks by any means necessary. It's to pick something that will actually be 12 weeks of work. A little contingency is fine, but if you start padding too much, your proposal isn't going to be taken seriously.

My suggestion -- work out some small aspect of part 2 that you *can* deliver. Not necessarily the whole thing, but a skeleton, and try to deliver a fully fleshed-out part on that skeleton. If you're smart about it, this can also double as your dogfood requirement.

> * Your references to testing are a bit casual for my taste. From my experience, testing schema migration code is hard. Normal view code and utilities are easy to test -- you set up a test database, insert some data, and check functionality. However, schema migration code is explicitly about making database changes, so the thing that Django normally considers "static" -- the database models -- are subject to change, and that isn't always an easy thing to accommodate. I'd be interested to see your thoughts on how you plan to test your API.
>
> On a high level, the testing code will have to check:
> 1. Whether the migration has been applied correctly.
> 2. Whether models are behaving the way they are supposed to after the migration.
>
> The 1st part will involve checking non-related fields and related fields (ManyToMany, ForeignKey, etc.). Checking non-related fields is relatively easy, while checking related fields will involve checking for changes to the appropriate constraints and, as in the case of ManyToMany, whether the changes have *cascaded* properly to all the tables.
>
> The 2nd part involves checking whether the models/fields affected by the migration, either directly or indirectly, are working/throwing errors the way they are supposed to.

I think you're missing my point.

If you're planning on running actual Django code to test the functionality of models, you're going to need two models -- the initial model before migration, and the end model after migration. How do you plan to accommodate the existence of both in Django's app cache?

> * Your proposal doesn't make any reference to the existing "migration-like" tasks in Django's codebase. For example, we already have code for creating tables and adding indices. How will your migration code use, modify or augment these existing capabilities?
>
> The current code, e.g. sql_create_model in django.db.backends.creation, is a mix of an *inspection* part and an *sql generation* part. Our new layers of API will basically divide these tasks. The *sql generation* part will be handled by the dry-run mode of the new low level API, while the inspection part is the responsibility of the higher levels. The refactored version of creation.py will basically make use of the higher level inspection part of the new API for inspection, which in turn will call the lower level API for final sql generation.

None of this is mentioned in your proposal, or accounted for in your schedule. Seems like a large omission to me :-)

Yours
Russ Magee %-)

j4nu5

unread,
Apr 4, 2012, 11:50:26 AM4/4/12
to django-d...@googlegroups.com
Hi Russell,
Thanks for your immense patience :-)

These are some additions to my proposal above, based on your inputs:
Status of current 'creation' code in Django:

The current code, e.g. sql_create_model in
django.db.backends.creation, is a mix of an *inspection* part and an
*sql generation* part. Since the sql generation part will (should) now
be handled by our new CRUD API, I will refactor
django.db.backends.creation (and other backends' creation modules) to
continue using their inspection part but use our new CRUD API for
sql generation. The approach will be to get the fields using
model._meta.local_fields and feed them to our new CRUD API. This
will serve as a proof of concept for my API.
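
In sketch form, the refactored method might look like this (SchemaAPI
and create_table are placeholder names for the proposed CRUD API, not
a settled interface):

def sql_create_model(self, model, style, known_models=set()):
    # Inspection stays here: pull the concrete fields off the model.
    fields = model._meta.local_fields
    # SQL generation is delegated to the new API in dry-run mode, so it
    # returns SQL statements instead of executing them.
    api = SchemaAPI(self.connection, dry_run=True)
    output = api.create_table(model._meta.db_table, fields)
    # Second element: pending references, as in the current creation code.
    return output, {}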

As for testing using Django code, my models will be something like:
import os

from django.db import models

# Set by the test harness via a temporary environment variable.
BEFORE_MIGRATION = bool(os.environ.get('BEFORE_MIGRATION'))

class UnchangedModel(models.Model):
    eg = models.TextField()

if BEFORE_MIGRATION:
    class MyModel(models.Model):
        f1 = models.TextField()
        f2 = models.TextField()
else:
    # Deletion of a field: f2 is gone after the migration.
    class MyModel(models.Model):
        f1 = models.TextField()
The value of BEFORE_MIGRATION will be controlled by the testing code.
A temporary environment variable can be used for this purpose.
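
In sketch form, the harness side would be something like (the app name
'testapp' is illustrative):

import os

os.environ['BEFORE_MIGRATION'] = '1'   # pick the pre-migration definitions
from testapp import models            # parses models.py with the flag set

# ... apply the migration under test, then unset the flag and re-import
# models.py for the post-migration checks ...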

Also a revised schedule:
Bonding period before GSoC: Discussion on API design
Week 1    : Writing tests (using 2-part checks (checking the actual
            database and using Django models), as discussed above)
Week 2    : Developing the base migration API
Week 3    : Developing extensions and overrides for PostgreSQL
Weeks 4-5 : Developing extensions and overrides for MySQL
Weeks 6-7 : Developing extensions and overrides for SQLite (may be
            shorter or longer (by 0.5 week) depending on how much of
            xtrqt's code is considered acceptable)
Weeks 8-10: Refactoring django.db.backends.creation (and the PostgreSQL,
            MySQL, SQLite creation modules) to use the new API for
            SQL generation (approach discussed above)
Week 11   : Writing documentation and leftover tests, if any
Week 12   : Buffer week for the unexpected

Russell Keith-Magee

unread,
Apr 4, 2012, 7:18:45 PM4/4/12
to django-d...@googlegroups.com

On 04/04/2012, at 11:50 PM, j4nu5 wrote:

> Hi Russell,
>
> Thanks for your immense patience :-)
>
> These are some additions to my proposal above, based on your inputs:
> Status of current 'creation' code in Django:
> The current code, e.g. sql_create_model in
> django.db.backends.creation, is a mix of an *inspection* part and an
> *sql generation* part. Since the sql generation part will (should) now
> be handled by our new CRUD API, I will refactor
> django.db.backends.creation (and other backends' creation modules) to
> continue using their inspection part but use our new CRUD API for
> sql generation. The approach will be to get the fields using
> model._meta.local_fields and feed them to our new CRUD API. This
> will serve as a proof of concept for my API.

Hrm - not exactly ideal, but better than nothing I suppose. Ideally, there would actually be some migration task involved in your proof of concept.

> As for testing using Django code, my models will be something like:
>
> class UnchangedModel(models.Model):
>     eg = models.TextField()
>
> if BEFORE_MIGRATION:
>     class MyModel(models.Model):
>         f1 = models.TextField()
>         f2 = models.TextField()
> else:
>     # Deletion of a field
>     class MyModel(models.Model):
>         f1 = models.TextField()
>
> The value of BEFORE_MIGRATION will be controlled by the testing code.
> A temporary environment variable can be used for this purpose.

Unless your plan also includes writing a lot of extra code to purge and repopulate the app cache, this approach won't work. Just changing a setting doesn't change the class that has already been parsed and processed.


> Also a revised schedule:
> Bonding period before GSoC: Discussion on API design
> Week 1    : Writing tests (using 2-part checks (checking the actual
>             database and using Django models), as discussed above)
> Week 2    : Developing the base migration API
> Week 3    : Developing extensions and overrides for PostgreSQL
> Weeks 4-5 : Developing extensions and overrides for MySQL
> Weeks 6-7 : Developing extensions and overrides for SQLite (may be
>             shorter or longer (by 0.5 week) depending on how much of
>             xtrqt's code is considered acceptable)
> Weeks 8-10: Refactoring django.db.backends.creation (and the PostgreSQL,
>             MySQL, SQLite creation modules) to use the new API for
>             SQL generation (approach discussed above)
> Week 11   : Writing documentation and leftover tests, if any
> Week 12   : Buffer week for the unexpected
>

This looks a bit more convincing.

Yours,
Russ Magee %-)

Andrew Godwin

unread,
Apr 5, 2012, 11:55:19 AM4/5/12
to django-d...@googlegroups.com
Just thought I'd chime in now that I've had a chance to look over the
current proposal (I looked at the one you have in the GSOC system):

- When you describe feeding things in from local_fields, are you
referring to that being the method by which you're planning to implement
things like syncdb?

- I'd like to see a bit more detail about how you plan to test the
code - specifically, there are some backend-specific tests you may need,
as well as some detailed introspection in order to make sure things have
applied correctly.

- Russ is correct about your models approach - as I've said before in
other places, the models API in Django is not designed with models as
moveable, dynamic objects. South has one approach to these sorts of
tests, but I'd love to see a cleaner suggestion.

- There's been some discussion on south-users about the benefits of a
column-based alteration API versus a field/model-based alteration API -
why have you picked a column-based one? If you plan to continue using
Django fields as type information (as South does), what potential issues
do you see there?

- Some more detail on your background would be nice - what's your
specific experience with the 3 main databases you'll be handling
(postgres, mysql, sqlite)? What was a "high voltage database migration"?

Sorry for the late feedback, I've been far too busy.

Andrew

j4nu5

unread,
Apr 6, 2012, 1:34:52 AM4/6/12
to django-d...@googlegroups.com
On Thursday, 5 April 2012 21:25:19 UTC+5:30, Andrew Godwin wrote:
Just thought I'd chime in now I've had a chance to look over the current
proposal (I looked at the current one you have in the GSOC system):

  - When you describe feeding things in from local_fields, are you
referring to that being the method by which you're planning to implement
things like syncdb?

Actually I am not planning to mess with syncdb and other management
commands. I will only refactor the django.db.backends creation functions
like sql_create_model etc. to use the new API. Behaviour and functionality
will be the same after the refactor, so management commands like syncdb
will not notice a difference.
 

  - I'd like to see a bit more detail about how you plan to test the
code - specifically, there are some backend-specific tests you may need,
as well as some detailed introspection in order to make sure things have
applied correctly.

Currently, I can only think of things like the unique index handling on
SQLite and oddities in MySQL, mostly drawn again from South's test suite.
I will give another update before today's deadline.
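
For example, a backend-level check for the SQLite unique index case
could lean on Django's introspection layer (the table and column names
below are made up for illustration):

from django.db import connection

def assert_unique_index(table, column):
    # get_indexes() maps column names to {'primary_key': ..., 'unique': ...}
    # in Django's 1.x introspection backends.
    cursor = connection.cursor()
    indexes = connection.introspection.get_indexes(cursor, table)
    assert indexes.get(column, {}).get('unique'), \
        "expected a unique index on %s.%s" % (table, column)

assert_unique_index('myapp_mymodel', 'f1')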
 

- Russ is correct about your models approach - as I've said before in
other places, the models API in Django is not designed with models as
moveable, dynamic objects.

I have taken care of clearing the app cache after migrations.
Actually the entire point of using these 'Django code' based tests is that I
wanted to doubly ensure that Django will behave the way it's supposed to
after the migrations. I could have gone with an SQL-only approach, e.g.
running a SELECT against the table after calling db.delete_table, but
testing using Django code seemed a bit more comprehensive.
Now, to mimic migrations, I needed to alter model definitions. The closest
way to resemble an actual migration scenario seemed to be to change the
definitions in models.py itself. File rename/rewrite is ugly and
OS-dependent; that's why I used a 'temporary setting' based approach. I
know that messing with the app cache looks a bit hackish, but I cannot
think of anything else for now.
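
Concretely, the teardown between the two phases would be along these
lines (this pokes at Django 1.x internals -- loading.cache is not a
public API -- so treat it as an illustrative sketch; 'testapp' is a
made-up app label):

import os
from django.db.models.loading import cache

import testapp.models

os.environ.pop('BEFORE_MIGRATION', None)  # switch to post-migration definitions
cache.app_models.pop('testapp', None)     # purge the cached model classes
reload(testapp.models)                    # re-execute models.py (Python 2 builtin)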

South has one approach to these sorts of
tests, but I'd love to see a cleaner suggestion.

Are you referring to the fake ORM? Well, if you are satisfied with my above
explanation, there would be no need for it, since we will be using Django's
ORM.
 

- There's been some discussion on south-users about the benefits of a
column-based alteration API versus a field/model-based alteration API -
why have you picked a column-based one? If you plan to continue using
Django fields as type information (as South does), what potential issues
do you see there?

Well, you said it yourself above that "the models API in Django is not
designed with models as moveable, dynamic objects". That is why I used
a column-based approach. The advantage will be felt in live migrations.
As for using Django fields for type information, I frankly cannot think
of a major valid negative point for now; I will revert later today.
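
To make the column-vs-field distinction concrete (the method names
below are placeholders, not a settled API):

# Column-based: operates on table/column names plus a type description,
# so no live model class is needed.
api.add_column('myapp_mymodel', 'f2', 'text', null=True)

# Field/model-based: operates on the model class and its field objects,
# which assumes the "new" model is importable at migration time.
api.add_field(MyModel, MyModel._meta.get_field('f2'))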
 

- Some more detail on your background would be nice - what's your
specific experience with the 3 main databases you'll be handling
(postgres, mysql, sqlite)? What was a "high voltage database migration"?

Sure. I will update it.
 

Sorry for the late feedback, I've been far too busy.

No problem, as long as you reply to this before the deadline :D

j4nu5

unread,
Apr 6, 2012, 8:27:23 AM4/6/12
to django-d...@googlegroups.com
On Thursday, 5 April 2012 21:25:19 UTC+5:30, Andrew Godwin wrote:

If you plan to continue using
Django fields as type information (as South does), what potential issues
do you see there?

The only issue I can think of is the case of custom fields created by the user. 

Andrew Godwin

unread,
Apr 6, 2012, 8:36:09 AM4/6/12
to django-d...@googlegroups.com
On 06/04/12 06:34, j4nu5 wrote:
> Actually I am not planning to mess with syncdb and other management
> commands. I will only refactor the django.db.backends creation functions
> like sql_create_model etc. to use the new API. Behaviour and functionality
> will be the same after the refactor, so management commands like syncdb
> will not notice a difference.

Alright, that's at least going to leave things in a good working state,
then.

> Currently, I can only think of things like the unique index handling on
> SQLite and oddities in MySQL, mostly drawn again from South's test suite.
> I will give another update before today's deadline.

There are a few others that South handles - like booleans in SQLite -
but a look through the codebase would hopefully give you hints to most
of those.

> Are you referring to the fake ORM? Well, if you are satisfied with my above
> explanation, there would be no need for it, since we will be using Django's
> ORM.

Well, the "fake ORM" is exactly what you described above - models loaded
and then cleared from the app cache. I'm not saying it's a bad thing -
it beats what South had before (nothing) - but there could be alternatives.

> Well, you said it yourself above that "the models API in Django is not
> designed with models as moveable, dynamic objects". That is why I used
> a column-based approach. The advantage will be felt in live migrations.
> As for using Django fields for type information, I frankly cannot think
> of a major valid negative point for now; I will revert later today.

> If you plan to continue using
> Django fields as type information (as South does), what potential issues
> do you see there?
>
> The only issue I can think of is the case of custom fields created by the user.

That's one big issue; one of South's biggest issues today is custom
fields, though that's arguably more the serialisation side of them.
Still, I'd at least like to see how you would want something like, say,
GeoDjango to fit in, even though this GSOC wouldn't cover it - it has a
lot of custom creation code, and alteration types that differ from
creation types (much like SERIAL in postgres, which you _will_ have to
address), and room would have to be made for these kinds of problems.
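
To be concrete about the SERIAL point (table name made up; the SQL is
shown as Python strings of the kind a dry run might emit):

# "serial" is only valid at CREATE time, so the dry run for "add an
# AutoField to an existing table" has to emit the expanded form.
creation_sql = "CREATE TABLE mytable (id serial NOT NULL PRIMARY KEY)"
alteration_sql = [
    "CREATE SEQUENCE mytable_id_seq",
    "ALTER TABLE mytable ADD COLUMN id integer NOT NULL "
        "DEFAULT nextval('mytable_id_seq')",
    "ALTER SEQUENCE mytable_id_seq OWNED BY mytable.id",
]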

Andrew
