DB migration: Loading a massive fixture dump

200 views
Skip to first unread message

Fred Blaise

unread,
Sep 1, 2020, 6:21:19 PM9/1/20
to Django users
Hi,

I have to migrate a database from mysql to postgresql. Unfortunately, there is quite a bit of data in there.

As a strategy, I thought of using Django's ORM feature, dump the data from my setup using mysql, and load the fixtures on the other setup using postgresql, such as:

# on mysql
$ docker exec -ti madchap_ddojo_uwsgi_1 bash -c 'python manage.py dumpdata --indent=4 -e sessions -e admin --natural-foreign --natural-primary -e contenttypes -e auth.permission -v2' > ~/Downloads/datadump_orm_dump_1.json

This yields a 4GB json file.

Reload on the other side, such as:

# on postgresql
$  docker exec -ti madchap_ddojo_uwsgi_1 bash -c 'python manage.py loaddata datadump_orm_dump_small.json'

I let the loaddata run for 4hours, and it eventually returns with no error -- but no success either. I actually now think that the message indicating "Installed x object(s) from 1 fixture(s)" does not show, I will see again.

Looking at the postgresql side, querying metatables such as:

SELECT nspname || '.' || relname AS "relation",
    pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
  FROM pg_class C
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE nspname NOT IN ('pg_catalog', 'information_schema')
    AND C.relkind <> 'i'
    AND nspname !~ '^pg_toast'
  ORDER BY pg_total_relation_size(C.oid) DESC
  LIMIT 5;

tells me loads of data gets in there. a "\d+" does not agree, however (weird, but I don't know enough to get the difference between the two). And yet, I cannot see anything through any SELECT statement (hinting to the transaction isolation).

All things equal, this strategy works fine when working with small json files. I just successfully did it with a file less than 1MB.

I am guessing I have a few questions, I am stuck here.

1. Are there any hidden args/switches to manage.py loaddata or else that I could look at?
2. Are there specific django configuration that could help me load the data?
3. Does a loaddata actually result in a massive single transaction? (looks like so, sounds crazy to me :))
4. Is there a better way to migrate my data?

Thanks a lot for any pointers!
Cheers.

Mike Dewhirst

unread,
Sep 2, 2020, 12:24:37 AM9/2/20
to django...@googlegroups.com
loaddata is intended for small imports such as skeleton data for unit
testing


> 2. Are there specific django configuration that could help me load the
> data?
> 3. Does a loaddata actually result in a massive single transaction?
> (looks like so, sounds crazy to me :))

Probably does because that would be what you most likely want. But as
you have discovered it probably runs out of resources if the load is too
big.

> 4. Is there a better way to migrate my data?

Yes.

https://github.com/lanyrd/mysql-postgresql-converter/blob/master/db_converter.py

I used this successfully a few months ago. I had to convert it from
py2.7 to 3.x.

Briefly, you will have to dump MySQL again using a similar incantation
to this  ...

mysqldump --compatible=postgresql --default-character-set=utf8 -r
databasename.mysql -u root databasename

If you (or anyone) would like a copy of my adjusted db_converter.py,
drop me a line off-list.

Cheers

Mike

> Thanks a lot for any pointers!
> Cheers.
> --
> You received this message because you are subscribed to the Google
> Groups "Django users" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to django-users...@googlegroups.com
> <mailto:django-users...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-users/d6f5e874-e96b-4984-b69d-d7857e95a0a4n%40googlegroups.com
> <https://groups.google.com/d/msgid/django-users/d6f5e874-e96b-4984-b69d-d7857e95a0a4n%40googlegroups.com?utm_medium=email&utm_source=footer>.


--
Signed email is an absolute defence against phishing. This email has
been signed with my private key. If you import my public key you can
automatically decrypt my signature and be sure it came from me. Just
ask and I'll send it to you. Your email software can handle signing.


signature.asc

Fred Blaise

unread,
Sep 2, 2020, 1:26:04 AM9/2/20
to django...@googlegroups.com
Hi Mike,

Thanks for the answer.

Yes, I was aware yet hopeful regarding the size.

Regarding the converter, I would be interested to take a look at your py3 work, if you could forward it along.

Thanks.


--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/d6f5e874-e96b-4984-b69d-d7857e95a0a4n%40googlegroups.com.

Fred Blaise

unread,
Sep 3, 2020, 7:35:11 AM9/3/20
to Django users
Just for the follow-up, I ended up using pgloader and after some argument tweaking, it worked. Cheers.
Reply all
Reply to author
Forward
0 new messages