Re: Django should load custom SQL when testing


Anssi Kääriäinen

Jul 26, 2012, 12:11:36 PM
to Django developers
On Jul 26, 15:44, Danilo Bargen <gez...@gmail.com> wrote:
> As the ticket https://code.djangoproject.com/ticket/16550 is closed, I will
> continue the discussion here on the mailing list.
>
> The reason for the ticket is that the Django test runner does not provide a
> way to run custom SQL. This makes sense when users want to load custom SQL
> *data*, but not when modifying the way the database works.
>
> Here's my comment from the issue comment thread:
>
> I agree very much with the comment above, in that there should be a way to
> execute custom SQL before and after running syncdb. This should not be used
> to load custom data, which is truncated anyways, but to alter the schema,
> create new types or load extensions.
>
> In my case, I have to load the Postgres citext extension:
>
>     CREATE EXTENSION IF NOT EXISTS citext;
>
> Because I have to create a database manually before running syncdb, I can
> run my custom SQL in my database and then use it normally with Django. But
> because the extensions are database specific, when running the tests (which
> create their own database) they fail because the extension is not present.
> This means that contrary to the comment by russellm, the *test database*
> introduces an inconsistency between the way tables work during a test case
> and the way they work in production.
>
> Another widely used extension is the hstore extension.
>
> As a workaround, I loaded the extension in the default postgres template.
>
>     psql -d template1 -c 'CREATE EXTENSION citext;'
>
> But that means that any new database I create will contain the citext
> extension, which might not be what I want.
> Therefore I suggest that you add some pre syncdb and post syncdb hooks to
> run custom SQL, and to mention in the docs that this should *not* be used
> to load custom data, but to create new types and alter the schema, so that
> it *matches the production database*. The data is truncated anyways.
>
> I vote for reopen.
>
> What do you think about this issue? Does anyone else agree that there
> should be a way to run custom SQL in order not to create inconsistencies
> between the test and production database?

I agree, the ability to run raw SQL on syncdb would be useful to me.

There is an idea for allowing post_sync SQL for models (see
http://groups.google.com/group/django-developers/browse_thread/thread/2a1b068f56faae75).
It is clear one might want to define something other than post-sync SQL
per model. The full set of hooks seems to be:
- per project (pre + post)
- per app (pre + post)
- per model (pre + post)
I wonder if we want to support all of those...

The main gripe against any of the above is that once you start doing
schema migrations, you need to define the raw SQL in multiple places:
both in the app and in the migration files. This raises the question
whether migration files are the right place to do this to begin with.
The syntax could be something like:

    class Migration:
        apply_at = 'pre_sync'/'post_sync'/'migrate'

Where 'migrate' is the default and means the migration is only applied
on "manage.py migrate". pre_sync and post_sync would be applied on
syncdb, and also on migrate if they have not already been applied to the DB.
The idea is that you could define the raw SQL only once, and it would
be applied both to testdb on sync, and to production db on migrate.
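A minimal, self-contained sketch of how such an apply_at marker could drive the runner (the classes, SQL strings, and run_command helper below are all hypothetical illustrations, not an existing Django or South API):

```python
# Hypothetical sketch: migrations tagged 'pre_sync'/'post_sync' run on
# syncdb, and on migrate only if not already applied; plain migrations
# run only on migrate. Nothing here is a real Django/South API.
applied = []  # stand-in for a table tracking applied migrations

class CreateCitext:
    apply_at = "pre_sync"
    sql = "CREATE EXTENSION IF NOT EXISTS citext;"

class AddColumn:
    apply_at = "migrate"  # the proposed default
    sql = "ALTER TABLE foo ADD COLUMN bar citext;"

def run_command(command, migrations):
    for m in migrations:
        if command == "syncdb" and m.apply_at not in ("pre_sync", "post_sync"):
            continue  # plain migrations never run on syncdb
        if m.sql not in applied:  # apply each migration only once
            applied.append(m.sql)

run_command("syncdb", [CreateCitext, AddColumn])   # applies only the extension
run_command("migrate", [CreateCitext, AddColumn])  # applies the rest, skips duplicates
```

This is how the "define the raw SQL only once" property would fall out: the same migration object serves both the test database sync and the production migrate.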

Still another idea: maybe we want to take the easy route and add
special "pre_sync/post_sync" initial SQL file names. Document that
these files should not contain data, just schema definition. Once we
get the migrations framework in, then deprecate the whole "initial SQL
in a directory" concept.
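If the easy, file-based route were taken, the layout might look something like this (the pre/post file names are hypothetical, extending the existing per-model initial SQL convention):

```
myapp/
    sql/
        pre_sync.sql    -- e.g. CREATE EXTENSION IF NOT EXISTS citext;
        post_sync.sql   -- e.g. CREATE INDEX ... USING gist (...);
        mymodel.sql     -- existing initial-data file (would be deprecated later)
```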

- Anssi

Andrei Antoukh

Jul 26, 2012, 12:35:44 PM
to django-d...@googlegroups.com
2012/7/26 Anssi Kääriäinen <anssi.ka...@thl.fi>
I think all three are useful, but the most useful are per model and per project.

The main gripe against any of the above is that once you start doing
schema migrations, you need to define the raw SQL in multiple places:
both in the app and in the migration files. This raises the question
whether migration files are the right place to do this to begin with.
The syntax could be something like:

    class Migration:
        apply_at = 'pre_sync'/'post_sync'/'migrate'

Where 'migrate' is the default and means the migration is only applied
on "manage.py migrate". pre_sync and post_sync would be applied on
syncdb, and also on migrate if they have not already been applied to the DB.
The idea is that you could define the raw SQL only once, and it would
be applied both to testdb on sync, and to production db on migrate.

Still another idea: maybe we want to take the easy route and add
special "pre_sync/post_sync" initial SQL file names. Document that
these files should not contain data, just schema definition. Once we
get the migrations framework in, then deprecate the whole "initial SQL
in a directory" concept.

I find having the hooks as a method or function much more interesting than having them in files. Moreover, that allows logic in those methods, which is not possible with flat files.

 - Anssi

Andrey. 

--
Andrei Antoukh - <ni...@niwi.be>
http://www.niwi.be/page/about/

"Linux is for people who hate Windows, BSD is for people who love UNIX"
"Social Engineer -> Because there is no patch for human stupidity"

Anssi Kääriäinen

Jul 26, 2012, 2:33:00 PM
to Django developers
On Jul 26, 19:35, Andrei Antoukh <n...@niwi.be> wrote:
> I find having the hooks as a method or function much more interesting
> than having them in files. Moreover, that allows logic in those
> methods, which is not possible with flat files.

I also like raw SQL in Python hooks better. An example of why this is
preferable is that we will likely have a settings-configurable
db_schema one day. How would you use the correct schema in raw SQL
files when you don't control the db_schema? This could happen, for
example, in 3rd party apps.
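A small sketch of the advantage, assuming a hypothetical hook signature: a Python hook can build its SQL from runtime configuration such as a settings-controlled schema, which a flat .sql file cannot do.

```python
# Hypothetical sketch: the hook signature and index/table names are
# invented for illustration. run_sql stands in for cursor.execute.
def post_sync_hook(run_sql, db_schema="public"):
    # The schema is only known at runtime, so a flat .sql file
    # could not interpolate it.
    run_sql(
        'CREATE INDEX myapp_tags_gist ON "{schema}".myapp_entry '
        "USING gist (tags);".format(schema=db_schema)
    )

statements = []
post_sync_hook(statements.append, db_schema="tenant_a")
```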

While I am likely making this feature request more complex than it
needs to be, I do think the interaction with migrations is an
interesting aspect.

For normal model definition changes the migrations framework spots the
changes, and you get the migrations for free. Thus, you need to define
the changes in just one place.

For raw SQL it is impossible to automatically create migrations based
on changes in the SQL. Maybe the answer is that you have to define the
migrations separately, and the current state separately. Unfortunately
this could lead to situations where you test on a different schema
than what your production setup will be after migrations. Thus, it
would be nice to define the changes just once, to ensure you can't
have state errors between production and testing. And, as you can't do
the "define just once" in the raw SQL hooks, it must be the migrations
that contain the define-once setup.

- Anssi

Andrew Godwin

Jul 27, 2012, 5:01:13 AM
to django-d...@googlegroups.com
On 26/07/12 19:33, Anssi Kääriäinen wrote:
Chiming in here on the migration front, my proposal for the next-gen
migration solution that'll potentially land in Django includes support
for just running chunks of raw SQL, so there's potential to auto-include
those in an initial migration.

However, as you quite rightly point out, detecting changes is a pain.
Still, South sends the post_syncdb signal after model creation anyway,
so there's potential here for people who are lazy and don't want to port
their SQL across to migrations to have it Just Work (and then Just Fail
when they want to change it). It should be good enough until I get the
migration stuff in, I'd say.

Andrew

Anssi Kääriäinen

Jul 27, 2012, 7:16:19 AM
to Django developers
On Jul 27, 12:01, Andrew Godwin <and...@aeracode.org> wrote:
> Chiming in here on the migration front, my proposal for the next-gen
> migration solution that'll potentially land in Django includes support
> for just running chunks of raw SQL, so there's potential to auto-include
> those in an initial migration.
>
> However, as you quite rightly point out, detecting changes is a pain.
> Still, South sends the post_syncdb signal after model creation anyway,
> so there's potential here for people who are lazy and don't want to port
> their SQL across to migrations to have it Just Work. (and then Just Fail
> when they want to change it). It should be good enough until I get the
> migration stuff in, I'd say.

The problem with post_syncdb is that it is also fired after a db
flush. Thus the SQL gets applied after every transactional test case,
which means you can't have DDL commands in the post-sync SQL. This
should be easy enough to fix by introducing a new kwarg which tells
when the signal is sent (flush, sync, migrate).
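A sketch of how a receiver could use such a kwarg; note the trigger argument is the proposed addition, not a current Django signal argument, and the dispatch is simulated here to keep the example self-contained:

```python
# Hypothetical sketch: a post-sync receiver that runs DDL only on a
# real sync, skipping the flush fired after every transactional test
# case. The 'trigger' kwarg is the proposed addition, not current API.
executed = []

def run_sql(sql):
    executed.append(sql)  # stand-in for cursor.execute(sql)

def post_sync_receiver(sender=None, trigger="sync", **kwargs):
    if trigger == "flush":
        return  # DDL must not rerun after each test-case flush
    run_sql("CREATE EXTENSION IF NOT EXISTS hstore;")

post_sync_receiver(trigger="sync")   # initial test database creation
post_sync_receiver(trigger="flush")  # after a transactional test case
```

With the kwarg in place, the extension is created exactly once per database, not once per test case.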

In addition we could add post_sync without a sender for the whole
project, and post_sync with an App as sender once the app-loading
refactor gets in.

It seems we would need a pre-sync signal, too.

The only downside I can see is that this way you can't show the SQL as
part of sqlall. Maybe we should just forget about this limitation, and
continue with the signals.

Once the migrations get in, we can make sure we have a nice framework
for initial SQL, and then recommend not using the sync signals at all.

So, maybe this would work:
- add a new signal for pre sync.
- add a new kwarg to pre/post sync which tells if the signal is
fired on flush or on sync.
- add pre/post sync for the whole project.
- add pre/post sync for apps once the app-loading refactor gets in.

- Anssi

Andrei Antoukh

Jul 29, 2012, 10:02:53 AM
to django-d...@googlegroups.com
2012/7/27 Anssi Kääriäinen <anssi.ka...@thl.fi>
On Jul 27, 12:01, Andrew Godwin <and...@aeracode.org> wrote:
> Chiming in here on the migration front, my proposal for the next-gen
> migration solution that'll potentially land in Django includes support
> for just running chunks of raw SQL, so there's potential to auto-include
> those in an initial migration.
>
> However, as you quite rightly point out, detecting changes is a pain.
> Still, South sends the post_syncdb signal after model creation anyway,
> so there's potential here for people who are lazy and don't want to port
> their SQL across to migrations to have it Just Work. (and then Just Fail
> when they want to change it). It should be good enough until I get the
> migration stuff in, I'd say.

The problem with post_syncdb is that it is also fired after a db
flush. Thus the SQL gets applied after every transactional test case,
which means you can't have DDL commands in the post-sync SQL. This
should be easy enough to fix by introducing a new kwarg which tells
when the signal is sent (flush, sync, migrate).

In addition we could add post_sync without a sender for the whole
project, and post_sync with an App as sender once the app-loading
refactor gets in.

I dislike using the post_syncdb signal for everything (project, app, model); I think that stretches one feature to cover more than it should.


It seems we would need a pre-sync signal, too.

OK

The only downside I can see is that this way you can't show the SQL as
part of sqlall. Maybe we should just forget about this limitation, and
continue with the signals.

The SQL that runs in signal handlers is the only part that cannot be shown in the output of sqlall. But we can provide alternatives. See: https://gist.github.com/3198204
 

Once the migrations get in, we can make sure we have a nice framework
for initial SQL, and then recommend not using the sync signals at all.

So, maybe this would work:
  - add a new signal for pre sync.
  - add a new kwarg to pre/post sync which tells if the signal is
fired on flush or on sync.
  - add pre/post sync for the whole project.
  - add pre/post sync for apps once the app-loading refactor gets in.

 - Anssi

Now, in my spare time, I am working on a prototype; suggestions and improvements are welcome! I think the execution order of the "syncdb" command needs a small refactor, and I have doubts about how to deal with the signals for this.

Andrey.

 

Anssi Kääriäinen

Aug 9, 2012, 4:24:25 PM
to Django developers
Coming back to this: I think making the pre/post sync signals usable
for this seems like a good enough approach until the migrations
framework gives us a cleaner way to do it.

The problem is easiest to see with a custom HSTORE field. There is a
need to issue a "create extension hstore" before any model is
installed. Currently there doesn't seem to be any clear way to do this
that works for testing and production databases alike. In addition
there might be a need to create a GiST index for that field after the
model is created, and again my understanding is that this is hard if
not impossible to do.

So, what we need is a way to run pre/post-project-sync SQL and
pre/post-model-sync SQL. App sync doesn't need to be handled at the
moment. The pre/post signals give us a crude way to do this, but it is
still a lot better than what we have now.
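For the HSTORE case, the two hooks could look roughly like this; the signal plumbing is simulated and the hook and table names are illustrative, not an existing API:

```python
# Hypothetical sketch of the two hooks the HSTORE case needs:
# a pre-project-sync step for the extension, and a post-model-sync
# step for the GiST index. The signal dispatch is simulated.
executed = []

def pre_project_sync(**kwargs):
    # Must run before any model table that uses hstore is created.
    executed.append("CREATE EXTENSION IF NOT EXISTS hstore;")

def post_model_sync(model="myapp_entry", **kwargs):
    # Must run after the model's table exists.
    executed.append(
        "CREATE INDEX {0}_data_gist ON {0} USING gist (data);".format(model)
    )

# What the proposed syncdb sequence would do:
pre_project_sync()   # before any model table exists
# ... model table creation happens here ...
post_model_sync()    # after the model table is created
```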

A clean slate would IMHO call for three signal sets: pre/post_sync,
pre/post_flush and pre/post_project_sync. For backwards compatibility
reasons the pre/post_flush signals need to go together with
pre/post_sync. I think you are right that throwing the project signal
into the same mix (and later on migrate and app signals, too) is a
little ugly. Go with what you feel works best.

Sorry for ping-ponging this issue, this is just my design style: try a
lot of things and see what works.

- Anssi
