dump and restore tool for django

8 views
Skip to first unread message

limodou

unread,
Jan 18, 2007, 7:58:22 PM1/18/07
to django...@googlegroups.com
I'v finished a tool for dumping and restoring database for django, you
can find it at:
http://code.djangoproject.com/wiki/DbDumpScript

--
I like python!
UliPad <<The Python Editor>>: http://wiki.woodpecker.org.cn/moin/UliPad
My Blog: http://www.donews.net/limodou

rikl...@gmail.com

unread,
Jan 19, 2007, 9:26:23 AM1/19/07
to Django users
I was looking for something like this :) I'll test it today :)

Russell Keith-Magee

unread,
Jan 20, 2007, 3:54:30 AM1/20/07
to django...@googlegroups.com
On 1/19/07, rikl...@gmail.com <rikl...@gmail.com> wrote:
>
> I was looking for something like this :) I'll test it today :)

You should also keep in mind that the fixtures patch that is part of
ticket #2333 can be used to dump and reload database contents. This
patch should be applied shortly (it has been slated for inclusion
before v1.0)

Yours,
Russ Magee %-)

limodou

unread,
Jan 20, 2007, 4:15:01 AM1/20/07
to django...@googlegroups.com
Yeah, I of cause see your suggestion, but I found the process is not
very correctly and enough. for example: I cannot specify the output
directory. And if there is foreignkey relationship in one app, the
order of output data is very important. Because the restore process
will use model.objects.create to insert a new record, so if the
foreign object is not created yet, the inserting will be failed. And
what I do is using cursor.execute, so there are no these problem. And
also my tool can support the restore after you changed the model, and
I think is handy for me.

And I also want to proposal that manage.py should has a plugin
interface, so that user can contribute their own useful tool, and this
tool can be invoked through:

manage tool customtool

Russell Keith-Magee

unread,
Jan 20, 2007, 5:19:39 AM1/20/07
to django...@googlegroups.com
On 1/20/07, limodou <lim...@gmail.com> wrote:
>
> On 1/20/07, Russell Keith-Magee <freakb...@gmail.com> wrote:
> >
> > On 1/19/07, rikl...@gmail.com <rikl...@gmail.com> wrote:
> > >
> > > I was looking for something like this :) I'll test it today :)
> >
> > You should also keep in mind that the fixtures patch that is part of
> > ticket #2333 can be used to dump and reload database contents. This
> > patch should be applied shortly (it has been slated for inclusion
> > before v1.0)
> >
> Yeah, I of cause see your suggestion, but I found the process is not
> very correctly and enough. for example: I cannot specify the output
> directory.

./manage.py dumpdb > /foo/bar/out.data

dumpdb (this will be changed to 'dumpdata' in the final patch) dumps
to stdout; you can put that data wherever you want.

> And if there is foreignkey relationship in one app, the
> order of output data is very important. Because the restore process
> will use model.objects.create to insert a new record, so if the
> foreign object is not created yet, the inserting will be failed.

If you create objects and foreign key relations in a single
transaction, there shouldn't be any issue with object creation order.

> And
> what I do is using cursor.execute, so there are no these problem. And
> also my tool can support the restore after you changed the model, and
> I think is handy for me.

How does the patch in #2333 prevent you from doing this? Dump your
data, change your model, modify your data to add new required fields,
load your data.

> And I also want to proposal that manage.py should has a plugin
> interface, so that user can contribute their own useful tool, and this
> tool can be invoked through:
>
> manage tool customtool

There is no need for this. If you need to write your own tools, they
can be .py scripts, invoking whatever Django calls are required.
Manage.py doesn't do anything magical that couldn't be easily
replicated in your own script.

Yours,
Russ Magee %-)

limodou

unread,
Jan 20, 2007, 5:36:39 AM1/20/07
to django...@googlegroups.com
> ./manage.py dumpdb > /foo/bar/out.data

I don't like this. Because I want to save the data into different
files according to each table, but not a single file.

> > And if there is foreignkey relationship in one app, the
> > order of output data is very important. Because the restore process
> > will use model.objects.create to insert a new record, so if the
> > foreign object is not created yet, the inserting will be failed.
>
> If you create objects and foreign key relations in a single
> transaction, there shouldn't be any issue with object creation order.

Because the idea above, so it's not very suit for my requirement.

>
> > And
> > what I do is using cursor.execute, so there are no these problem. And
> > also my tool can support the restore after you changed the model, and
> > I think is handy for me.
>
> How does the patch in #2333 prevent you from doing this? Dump your
> data, change your model, modify your data to add new required fields,
> load your data.

I don't want to modify every line of data, but I need a default value
dealing appoach. And #2333 doesn't support this idea. I didn't try
#2333 for: if I remove some fields of a Model, and restore data from
data file, how about #2333 will do?

>
> > And I also want to proposal that manage.py should has a plugin
> > interface, so that user can contribute their own useful tool, and this
> > tool can be invoked through:
> >
> > manage tool customtool
>
> There is no need for this. If you need to write your own tools, they
> can be .py scripts, invoking whatever Django calls are required.
> Manage.py doesn't do anything magical that couldn't be easily
> replicated in your own script.
>

So your thought is very different from mine. I think core.management
has many common methods, and users like me hope to reuse them. If
there is no such plugin system, so I had to write my personal tool
just like db_dump.py, and it'll be a whole program, and I need
introduce many things I need from core.management which I think
useful.

Russell Keith-Magee

unread,
Jan 21, 2007, 8:18:44 PM1/21/07
to django...@googlegroups.com
On 1/20/07, limodou <lim...@gmail.com> wrote:
>
> > ./manage.py dumpdb > /foo/bar/out.data
>
> I don't like this. Because I want to save the data into different
> files according to each table, but not a single file.

#2333 doesn't do per-table, but it does do per-app:

./manage.py dumpdb firstapp > firstapp.data
./manage.py dumpbdb secondapp thirdapp > otherapps.data

Table-level doesn't strike me as a particularly good level for
serialization granularity, specifically because of cross application
dependencies.

> > If you create objects and foreign key relations in a single
> > transaction, there shouldn't be any issue with object creation order.
>
> Because the idea above, so it's not very suit for my requirement.

When you load data using #2333, what you provide is a label; Django
then searches each application for a data file matching that label. As
a result, you can have multiple data files with the same label, with
each application having a fixture directory to store the fixture data
files.

The transaction that is used is per-label, so all the 'initial_data'
fixtures for each app are loaded as a single transaction, eliminating
cross-app problems.

> I don't want to modify every line of data, but I need a default value
> dealing appoach. And #2333 doesn't support this idea. I didn't try
> #2333 for: if I remove some fields of a Model, and restore data from
> data file, how about #2333 will do?

I haven't tried - but either problem (default values for new fields,
and ignoring old fields) should be a relatively easy modification to
the existing serialization code.

> So your thought is very different from mine. I think core.management
> has many common methods, and users like me hope to reuse them. If
> there is no such plugin system, so I had to write my personal tool
> just like db_dump.py, and it'll be a whole program, and I need
> introduce many things I need from core.management which I think
> useful.

manage.py has useful methods? Then use them.

from django.core import management

print management.get_sql_all()

No need to add an extension API to manage.py.

Yours,
Russ Magee %-)

Rob Hudson

unread,
Jan 22, 2007, 12:14:23 PM1/22/07
to Django users
I've looked at the dump code in #2333 and I'm looking forward to this
being added. I brought up a question before about being able to dump
data in a database agnostic way. It's interesting how you simply use
the serializers. And JSON for data is an interesting choice too. It
looks like it works well.

This will be helpful if you want to switch database -- say from MySQL
to SQLite -- for some reason. Just change settings.py, build the
tables via syncdb, and reload the data. Simple!

For us it will also help backup/archive our projects that is more
flexible than just a straight mysqldump.

Nice work. (And I haven't even looked at the fixtures part of it yet.)

-Rob

Chris Brand

unread,
Jan 22, 2007, 12:39:51 PM1/22/07
to django...@googlegroups.com
I have the feeling that this is something that should be straightforward,
but I can't see how to do it.

I have an app with Applications, each of which is for a single Camp.
What I want to do is retrieve all the Camps for which there exists an
Application.

Thanks,

Chris

James Bennett

unread,
Jan 22, 2007, 1:04:54 PM1/22/07
to django...@googlegroups.com
On 1/22/07, Chris Brand <cbr...@redback.com> wrote:
> I have an app with Applications, each of which is for a single Camp.
> What I want to do is retrieve all the Camps for which there exists an
> Application.

The straightforward way is to retrieve a list of distinct values for
'camp_id' from the Application table, and then grab the corresponding
Camp objects for that list. In SQL the query would look something like
this (allowing for DB variations):

SELECT * FROM camps where id in (SELECT DISTINCT camp_id FROM application);

You might be able to do this with the 'extra' method of a QuerySet, by
putting the 'where' clause with subselect into the 'where' keyword
argument to extra(), but I haven't tested that to be certain.

--
"May the forces of evil become confused on the way to your house."
-- George Carlin

Tim Chase

unread,
Jan 22, 2007, 1:15:35 PM1/22/07
to django...@googlegroups.com
> SELECT * FROM camps where id in (SELECT DISTINCT camp_id FROM application);

Computing the DISTINCT portion here may be superfluous, and
possibly (depending on your DB) a premature mis-optimization.
Whether DISTINCT or not, membership via IN will still behave the
same, but DISTINCT will require a pass to determine
distinct-ness. Without the DISTINCT keyword, the DB should be
able to just do a hash-lookup for each ID in its Camps set to
test membership.

Or, alternatively, one could do something like

SELECT c.* FROM camps c INNER JOIN application a ON c.id = a.camp_id

which might also be another way to let the optimizer take a crack
at doing it well.

YMMV, so be sure to use your DB's EXPLAIN abilities to disclose
the nitty-gritty details and choose what works best.

-tim

Russell Keith-Magee

unread,
Jan 22, 2007, 7:07:47 PM1/22/07
to django...@googlegroups.com
On 1/23/07, Rob Hudson <trebor...@gmail.com> wrote:
>
> I've looked at the dump code in #2333 and I'm looking forward to this
> being added. I brought up a question before about being able to dump
> data in a database agnostic way. It's interesting how you simply use
> the serializers. And JSON for data is an interesting choice too. It
> looks like it works well.

Just to clarify - JSON is the default format, but you can use any
other supported serializer format. At the moment, the only alternative
serializer that has been implemented is the XML serializer, but others
could be defined.

> Nice work. (And I haven't even looked at the fixtures part of it yet.)

Glad you like it. I'm hoping to get some time in the next week or so
to finish off the few things that are needed before I bug Adrian for
permission to commit. In the meantime, if you have any suggestions or
comments, let me know.

Yours,
Russ Magee %-)

James Bennett

unread,
Jan 23, 2007, 11:05:17 AM1/23/07
to django...@googlegroups.com
On 1/22/07, Tim Chase <django...@tim.thechases.com> wrote:
> Or, alternatively, one could do something like
>
> SELECT c.* FROM camps c INNER JOIN application a ON c.id = a.camp_id
>
> which might also be another way to let the optimizer take a crack
> at doing it well.

Yeah, though AFAIK the 'extra' method on QuerySets can't specify an
inner join, so you're stuck executing the query via the cursor
instead. It's a tradeoff, really; if you need to squeeze every ounce
of optimization out of the DB, go with that. Otherwise, doing the 'id
in' with subselect is probably a little bit easier for the end user of
Django.

Rob Hudson

unread,
Jan 23, 2007, 1:24:44 PM1/23/07
to django...@googlegroups.com
Russell Keith-Magee wrote:
> Glad you like it. I'm hoping to get some time in the next week or so
> to finish off the few things that are needed before I bug Adrian for
> permission to commit. In the meantime, if you have any suggestions or
> comments, let me know.

Maybe I could just throw in another use case and we can see if the
dump/restore facility will work in this way too...

Right now I'm working on a project that we're developing a new data
model with. The content team is already starting to input data.
Occasionally I find a need to add a new database column to support
something new. This is your typical schema evolution need.

If I could dump the current data, run an sqlreset, and reload the data,
that would be amazingly helpful. (At least until schema evolution comes
online or even as an alternative approach after it's available.)

Would/Could it work like this?

Is the dump/restore facility tied to fixtures? I see a dumpdb option in
your patch but not a reloaddb (or the like)... just a 'installfixture'.

-Rob

Russell Keith-Magee

unread,
Jan 23, 2007, 5:27:30 PM1/23/07
to django...@googlegroups.com
On 1/24/07, Rob Hudson <trebor...@gmail.com> wrote:
> Is the dump/restore facility tied to fixtures? I see a dumpdb option in
> your patch but not a reloaddb (or the like)... just a 'installfixture'.

If you follow the discussion when I revealed the patch, you will see
that one of the suggestions was to rename installfixture to loaddata,
and dumpdb to dumpdata - this was specifically to reinforce the
symmetry between the two commands.

Using the fixtures as a 'schema evolution lite' was also suggested.
The #2333 patch doesn't address the question of filling in new fields
and removing old fields as part of an evolution step, but the
potential is certainly there.

Yours,
Russ Magee %-)

limodou

unread,
Jan 23, 2007, 7:56:55 PM1/23/07
to django...@googlegroups.com
On 1/24/07, Rob Hudson <trebor...@gmail.com> wrote:
>
Your case is just I mentioned before, and for simple situations,
db_dump.py tool can automatically add new column with default value
defined in Model, and if there is not a default value defined in
Model, you can change the data file, and add 'default' key to specify
the default value of a new column, or the new column is renamed from
an old column , you can add 'reference' type for new column, then the
value of new column will use the value of referred column. Details you
can find in wiki page.
Reply all
Reply to author
Forward
0 new messages