Backup from PROD and restore to STAGE database not working

722 views
Skip to first unread message

Alex Leith

unread,
Feb 22, 2016, 5:52:43 PM2/22/16
to Wagtail support
Hi All

I've got a fresh new Wagtail system working, and I'm hoping to set up a nice backup system, including being able to restore to a staging database (as well as prod).

So, I have a production instance and a staging instance, both running 1.9.2 of django and wagtail 1.3.1. I'm using Postgres 9.3 and as the db.

So, I'm backing up two ways now. One uses vanilla postgres (pgdump) and the other uses django (manage.py dumpdata).

I can restore to the database using postgres as follows:

pg_restore --clean --create -h database.url -U website /tmp/restore/website_db_backup.dump


and that works fine, from postgres' point of view. But django/wagtail doesn't like it and errors as attached in pg_restore_error.txt result.

I have used the 'dumpdata' command with all combinations of the flags --natural-foreign and --natural-primary and attempted restoring each of the output .json files using loaddata. Only the dump using both flags restores cleanly, with the output:

Installed 428 object(s) from 1 fixture(s)


(The other dumps complain about duplicate keys and such.)

Visiting the site's root page, the /admin page or the /django-admin page after restoring all result in 'Internal Server Error' and the logs show exactly the same output as the pg_restore_error.txt attached.

I'm at a loss, and would appreciate any assistance that could be provided.

Alex
pg_restore_error.txt

Brett Grace

unread,
Feb 22, 2016, 8:03:07 PM2/22/16
to Wagtail support
Thanks for the stack trace. It looks like Wagtail is trying to reverse the Site from the URL and finding more than one matching. Wagtail's schema is pretty portable, but the hostname of the machine you are running on is significant, so you can get different behavior from machine to machine. If it can't find a Site which matches the domain name in the request (which it probably won't if you are going from a prod to staging machine), then it will fall back to  Site.objects.get(is_default_site=True). There aren't any database constraints which enforce uniqueness so you could easily have two for the same domain name, or even two that have is_default_site set to True. I would examine those to see if you have anything that would result in more than one Site being returned.

Alex Leith

unread,
Feb 22, 2016, 8:07:19 PM2/22/16
to Wagtail support
Thanks Brett

So our production instance has only one site, which is 'www.example.com' and the staging one is hosted at 'www.stage.example.com'.

The production instance's only site is set as the default site.

When you say 'I would examine those' do you mean in the database?

Brett Grace

unread,
Feb 23, 2016, 10:48:06 AM2/23/16
to Wagtail support
To be clear, when I write Site I'm talking about the wagtailcore.models.Site Django model, not "website". So when I say, "examine those," I mean the wagtailcore_sites table in the database (you can do this with psql, the django-admin area, or of the django console).

Take a look at the stack trace you posted, in particular the highlighted line:

 File "/usr/local/lib/python3.4/dist-packages/wagtail/wagtailcore/models.py", line 132, in find_for_request
 return Site.objects.get(is_default_site=True)
 File "/usr/local/lib/python3.4/dist-packages/django/db/models/manager.py", line 122, in manager_method
 
return getattr(self.get_queryset(), name)(*args, **kwargs)
 
File "/usr/local/lib/python3.4/dist-packages/django/db/models/query.py", line 391, in get
 
(self.model._meta.object_name, num)
wagtail
.wagtailcore.models.MultipleObjectsReturned: get() returned more than one Site -- it returned 2!

If I had to place a bet, I would wager that if you connect to the database with psql, the following query would return more than one row:

select * from wagtailcore_site where is_default_site is true;

If I'm wrong, well, you'll have to trace through the code to Site.find_for_request (https://github.com/torchbox/wagtail/blob/master/wagtail/wagtailcore/models.py#L102) to see what's going wrong here.

Alex Leith

unread,
Feb 23, 2016, 7:02:49 PM2/23/16
to Wagtail support
Ok, thanks Brett.

So everything you said was true.

There were two sites set to default (that was clear in the error message) and if I set the old one (stage.website.com) to 'is_default_site='f'' then it works.

So my restore process will be

psql -h stage.example.com -U website -c "update wagtailcore_site set is_default_site = 'f';"

python3 manage.py loaddata /tmp/restore/website_db_backup_nfp.json


I think this makes sense! It does seem to work.

Thanks again, Brett.

Brett Grace

unread,
Feb 24, 2016, 11:09:29 AM2/24/16
to Wagtail support
Glad it worked. My preferred solution would be to set is_default_site to False on that Site within your production database. Then you only have to do this once and your backups will be usable immediately. I would argue that having only one default site is a kind of invariant that just happens to not be enforced in the data model, so this is just a bug waiting to happen. You can do this through the admin so it should be pretty easy.

A caveat to the point above about not needing to tweak the database: if you restore directly from prod.example.com to staging.example.com, you might still want to change the domain for the Site object to match the new deployment, because it is occasionally used to generate some URLs, but in practice most things work fine. (I think it's a minor error that the domain name for each Site is stored in the database, it's really part of the environment and it should be configured at runtime.)

Brett Grace

unread,
Feb 24, 2016, 11:26:19 AM2/24/16
to Wagtail support
Oh, also, not to belabor the point, I think that the database dumps make the most sense as your primary backup strategy. I would be leery of relying on a backup solution based on serialization (dumpdata/loaddata). Eventually something isn't going to serialize or deserialize correctly or there will be an external dependency on some key.

If you're on a Debian/Ubuntu system, you can install the autopostgresqlbackup package.

Another thing to think about in your backup strategy, if you aren't already, is your media directory. For the backup to be strictly correct you need a snapshot of it from the same moment as your database snapshot.

Alex Leith

unread,
Feb 24, 2016, 4:40:24 PM2/24/16
to Wagtail support
Thanks again, Brett.

Strangely, in the production system, there is only one site. And that is obviously the default. On restore, it seems to merge the two systems together, which is a little concerning.

And regarding backups, firstly, I am backing up the /media folder at the same time, and restoring that (that's easy!). But on PG native backup, I started out doing that, but ran into little issues with restoring, and the need to drop the database first (I think). Perhaps I'll spend a bit more time on that to get it working properly. Seems like I need to kick all the users out, drop the database, then restore it again. Sound reasonable?

Finally, the autoPGbackup script looks nice, but I'm using duplicity, which gives me point-in-time restore (for both media and any DB dumps), so I'm happy with a vanilla PG backup.

Brett Grace

unread,
Feb 25, 2016, 11:53:21 AM2/25/16
to Wagtail support
Hmm... do you get the same result (an additional Site) if you drop the database from your staging database first? At first glance it looks like your restore command should be recreating any objects but maybe something else is going on here. If you figure out the cause of this I'm really interested.

As for the rest, it seems like you've got it covered. I agree that the issue with connected users is a pain for restoring the database... usually I can sidestep it because I'm either working with my personal dev environment or in a staging environment from a freshly spun up VM. duplicity looks pretty cool, I hadn't heard of it before.

Alex Leith

unread,
Feb 25, 2016, 6:15:49 PM2/25/16
to Wagtail support
Ok, so I can do a simple postgres restore now. Process is as follows:

psql -h website-rds.internal.stage.example.com -U website -c 'SELECT pg_terminate_backend(pg_stat_activity.pid) FROM pg_stat_activity WHERE pg_stat_activity.datname = current_database() AND pid <> pg_backend_pid();'
dropdb -h website-rds.internal.stage.example.com -U website website

createdb -h website-rds.internal.stage.example.com -U website -T template0 website
pg_restore -h website-rds.internal.stage.example.com -U website -d website /tmp/restore/website_db_backup.dump

this throws up one error and a couple of warnings, but they seem ok to me:

pg_restore: [archiver (db)] Error while PROCESSING TOC:

pg_restore: [archiver (db)] Error from TOC entry 3394; 0 0 COMMENT EXTENSION plpgsql 

pg_restore: [archiver (db)] could not execute query: ERROR:  must be owner of extension plpgsql

    Command was: COMMENT ON EXTENSION plpgsql IS 'PL/pgSQL procedural language';


pg_restore: WARNING:  no privileges could be revoked for "public"

pg_restore: WARNING:  no privileges could be revoked for "public"

pg_restore: WARNING:  no privileges were granted for "public"

pg_restore: WARNING:  no privileges were granted for "public"

WARNING: errors ignored on restore: 1


Now, the result is that the ADMIN interface works, but none of the pages do. Loading the / page results in the attached errors.

It does only have one Site, though, so there's that!

Regards,

Alex
pg_restore_results.txt

Alex Leith

unread,
Feb 25, 2016, 6:24:29 PM2/25/16
to Wagtail support
Actually, I think it does work. There may have been something wrong with the server. I redeployed the server and now the pages and the admin interface work. So I'll work with restoring the entire thing using postgres.

Brett Grace

unread,
Feb 28, 2016, 12:14:07 PM2/28/16
to Wagtail support
Thanks for the update. Did dropping the database make the difference? I always do this, more or less out of superstition, yet it seems like it should to cleanly restore a database in one command without leaving old objects behind.

Alex Leith

unread,
Feb 28, 2016, 5:04:55 PM2/28/16
to Wagtail support
Yeah, dropping the database did make a difference.

Using Postgres commands, dropping and restoring the database _just works_.

Using the django restore, for whatever reason, results in there being two sites, with both set as default. Since django/wagtail doesn't work with two default sites, perhaps there should be a database constraint on there? I haven't run through enough scenarios to be certain about the cause of two sites in the restored database, though, really. I wonder if the old and new systems get merged, rather than overwritten (I expected it to be overwritten)?

So, dropping and re-creating the database using standard postgres sounds safer to me, even though it is 4 commands rather than 1 (or 2 including the query to remove the old default site) using django tools.

Tim Heap

unread,
Mar 2, 2016, 11:14:42 PM3/2/16
to wag...@googlegroups.com
The cause is likely due to two things: a default site and page structure is created as part of model migrations, and loaddata does not clear the database out before running. If you do a series of steps similar to:

# On source system
./manage.py dumpdata > data.json

# On destination system
./manage.py migrate
./manage.py loaddata data.json

Then you would have two sites - one from each system, and each set as default. You would also have two page trees, although I have no idea what kind of issues that would cause! In addition, there is at least one known bug when using loaddata with StreamFields.

Dropping and recreating the database, and then restoring from a Postgres dump is much more error proof than the Django dumpdata/loaddata commands.

--
You received this message because you are subscribed to the Google Groups "Wagtail support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to wagtail+u...@googlegroups.com.
To post to this group, send email to wag...@googlegroups.com.
Visit this group at https://groups.google.com/group/wagtail.
To view this discussion on the web, visit https://groups.google.com/d/msgid/wagtail/88578ad1-7043-4c6b-b09c-641d138e8901%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Alex Leith

unread,
Mar 4, 2016, 7:21:57 PM3/4/16
to Wagtail support
Thanks Tim, good to know.
Reply all
Reply to author
Forward
0 new messages