On database backups

Thadeus Burgess

Dec 7, 2009, 4:52:23 PM
to web...@googlegroups.com
What would the preferred method of data backup be?

* Use web2py cron with export_to_csv_file, then tarfile to compress
the dump and save it in its correct location, all from within web2py.
* Use the backend database's built-in backup utilities (such as
postgres pg_dump), using cron to tar.gz the files and put them in
their respective locations.
* Use a filesystem-level backup, backing up the actual database files
on disk.
* Use online backup with PITR (as noted in the postgres manual:
http://www.postgresql.org/docs/8.1/static/backup-online.html).

-Thadeus

mdipierro

Dec 7, 2009, 7:47:33 PM
to web2py-users
I suggest

1) Use the backend database's built-in backup utilities (such as
postgres pg_dump), using cron to tar.gz the files and put them in
their respective locations.

If that is not available:

2) Use web2py cron with export_to_csv_file, then tarfile to compress
the dump and save it in its correct location, all from within web2py.
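
Something along these lines should work for 2) - an untested sketch,
with the app name and paths as placeholders. It needs to run inside
the web2py environment, e.g. from web2py cron or with

  python web2py.py -S yourapp -M -R applications/yourapp/private/backup_csv.py

  # applications/yourapp/private/backup_csv.py (hypothetical path)
  import os, time, tarfile

  backup_dir = '/var/backups/yourapp'   # placeholder destination
  stamp = time.strftime('%Y%m%d-%H%M%S')
  csv_path = os.path.join(backup_dir, 'db-%s.csv' % stamp)

  # dump every table defined in the models to a single csv file
  db.export_to_csv_file(open(csv_path, 'wb'))

  # compress it, then drop the uncompressed copy
  tar = tarfile.open(csv_path + '.tar.gz', 'w:gz')
  tar.add(csv_path, arcname=os.path.basename(csv_path))
  tar.close()
  os.remove(csv_path)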

Thadeus Burgess

Dec 7, 2009, 8:37:05 PM
to web...@googlegroups.com
why 1 over 2?

-Thadeus

Yarko Tymciurak

Dec 7, 2009, 8:50:25 PM
to web2py-users


On Dec 7, 7:37 pm, Thadeus Burgess <thade...@thadeusb.com> wrote:
> why 1 over 2?

The simple answer is that csv files are only about saving data, and
may involve conversions at that.
[1] - by using a native database backup, you do not risk conversion
effects, and, perhaps more to the point, it will include backups of
things that web2py cannot back up, e.g. meta-data for the db
(permissions, logins, etc.).

Whenever backing up, work from a source as close to the "original" as
possible.

It's the same reason a disk image is more likely to work like the
original than a file-system based backup of an operating system
environment.

- Yarko

villas

Dec 7, 2009, 9:23:40 PM
to web2py-users
Indeed. Take Firebird, for example: I would only ever choose to back
up using the official utility, which backs up all data, stored
procedures and generators (for autoinc), does garbage collection,
etc., and produces a clean 'transportable' backup which can be
restored on any platform. Sqlite users might appreciate something
built in, though.

IMO even more useful would be a simple replication feature - I think
that would be a great selling point!
David

Thadeus Burgess

Dec 8, 2009, 2:24:05 PM
to web...@googlegroups.com
Any reason to not use both 1 and 2 if hdd space permits?

-Thadeus

Yarko Tymciurak

Dec 8, 2009, 2:31:35 PM
to web2py-users
On Dec 8, 1:24 pm, Thadeus Burgess <thade...@thadeusb.com> wrote:
> Any reason to not use both 1 and 2 if hdd space permits?

Think of them as 2 separate activities, addressing 2 very different
needs / outcomes:

[1] Database native backup: provides a dependable recovery path for
_this_ deployment (or any w/ this db backend), or any place where a
compatible version of this backend / server is available (i.e.
portable across machines, but not backends).

[2] csv data backup: provides a portable data backup, easily
accessible and manipulated - e.g. it can be imported into other
backends, or worked on with spreadsheets, python, etc. Note: it may
(or may not) provide a complete one-step recovery path for a running
system.

Then choose appropriately.
Note: if you have [1] and _that_ db server is available somewhere,
then you can always create [2] from it. If [1] is a proprietary
server whose license may not be available everywhere, then [2] starts
to look like it has more utility.

- Yarko

Thadeus Burgess

Dec 8, 2009, 2:39:26 PM
to web...@googlegroups.com
I am thinking that if the database from 1) goes down, a sqlite db can
be set up fairly quickly if a csv is available, allowing time to get
another 1)-style database up and running.

That is assuming that there are no warm/cold standby servers.

-Thadeus

Yarko Tymciurak

Dec 8, 2009, 3:03:17 PM
to web2py-users

On Dec 8, 1:39 pm, Thadeus Burgess <thade...@thadeusb.com> wrote:
> I am thinking that if the database from 1) goes down, a sqlite db can
> be set up fairly quickly if a csv is available, allowing time to get
> another 1)-style database up and running.

Don't forget that a native db backup creates a file - it can live on
any machine (e.g. copy it to another machine and start the server up).

While you _could_ set up sqlite as the db on some backup machine, why
not set up the SAME db for a fallback instead? (sqlite has all sorts
of limits and behaviors that may not be consistent with "full size"
dbs, so you may have trouble.)

That said, a final fallback to _whatever_ db from a csv seems like a
fine last line of defense (I am only saying there are other, easily
available and better recovery lines of defense - set those up ahead
of the "last one" :-)).

- Yarko

villas

Dec 8, 2009, 3:23:53 PM
to web2py-users
On Dec 8, 7:39 pm, Thadeus Burgess <thade...@thadeusb.com> wrote:
> I am thinking that if the database from 1) goes down, a sqlite db can
> be set up fairly quickly if a csv is available.

Although I appreciate the idea that the backup data could be
transportable across different DBs, it still seems a trivial matter
to restore the original DB from its native backup. Rebuilding from
csv, any triggers, SPs etc would be missing. Therefore, apart from
maybe Sqlite, this is not so much a reliable backup as a migration
tool.

When a machine goes down, it is that day's latest data that is lost.
That's why I suggested a simple replication idea. Or, at least some
way of regularly (every few minutes) copying new and updated records
to another machine. If not replication, then a log. Perhaps an
extension of Massimo's audit trail slice. For free reliable storage,
the log could perhaps be emailed to a Gmail account.

Of course, if we had native DB backups, a web2py DB-agnostic backup,
and either simple replication or an audit trail log, we would have
everything available to rebuild our data. We might still lose the
last few minutes, but anyone with crucially important data will
already have serious RAID and replication services, which perhaps
makes this discussion irrelevant for them.

D

Thadeus Burgess

Dec 8, 2009, 3:27:25 PM
to web...@googlegroups.com
Agreed, ideally there would be a warm database server ready and
waiting, or at least a cold server that just needs to have the data
uploaded.

However, at this point in time we have no resources to dedicate to a
backup machine (stupid, I know). I just want to cover all my bases
and, if something does happen, be able to get the system running
(even if slowly) asap.

Running the system on sqlite while setting up a proper postgres
database would not be desirable, since exporting from sqlite back to
postgres would be a pita.

As far as database redundancy and keeping backups on that end, I am
ok. I was curious whether the web2py export_to_csv_file solution
should be considered as an option. At this point, it is looking to be
not so much a "backup" option as a "migration" option.

-Thadeus

Yarko Tymciurak

Dec 8, 2009, 6:28:48 PM
to web2py-users
On Dec 8, 2:27 pm, Thadeus Burgess <thade...@thadeusb.com> wrote:
> Agreed, ideally there would be a warm database server ready and
> waiting, or at least a cold server that just needs to have the data
> uploaded.
>
> However, at this point in time we have no resources to dedicate to a
> backup machine (stupid, I know). I just want to cover all my bases
> and, if something does happen, be able to get the system running
> (even if slowly) asap.

A couple of comments: this kind of redundancy is appropriately
achieved outside of the influence of web2py: that is, db replication
from the db server end, or (lower level) redundant RAID, etc.

>
> Running the system on sqlite while setting up a proper postgres
> database would not be desirable, since exporting from sqlite back to
> postgres would be a pita.

Setting up a postgres server is really pretty easy - even a minimal,
cheap windows pc (and why, why, oh why would you even bother running
windows! ;-)) will run a postgres server (I think I have one on my
netbook, actually, but that's ubuntu).

I think this is preferable / easier in the bigger scheme of things
than thinking about csv / sqlite (blech! spit! arrrr.... hehehe...)

:-)

Thadeus Burgess

Jan 4, 2010, 3:00:44 PM
to web...@googlegroups.com
For the moment, I am running two cron jobs. One executes web2py,
generates a CSV file of the database, and tars it up. The other runs
an hour later, uses pg_dump, and tars the sql statements up. It then
uses SSH to ship the data to our backup file server.
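
The pg_dump job is roughly the following (database name, paths, and
the backup host are made up here):

  # rough sketch of the second cron job, run from the system crontab
  import os, time, subprocess

  stamp = time.strftime('%Y%m%d')
  dump_path = '/var/backups/yourapp/pg-%s.sql' % stamp  # placeholder path

  # native dump of the database with pg_dump
  subprocess.check_call(['pg_dump', '-f', dump_path, 'yourdb'])

  # compress, ship to the backup file server over ssh, clean up
  subprocess.check_call(['tar', 'czf', dump_path + '.tar.gz',
                         '-C', os.path.dirname(dump_path),
                         os.path.basename(dump_path)])
  subprocess.check_call(['scp', dump_path + '.tar.gz',
                         'backup@fileserver:/backups/yourapp/'])
  os.remove(dump_path)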

For the moment this seems to be the best solution, considering both
backups combined come to only 800kb; as this size increases I will
most likely stick with one or the other.

This has two advantages. One, I have access to the data in a web2py
format that I can load on my development machine running sqlite, so I
can test with real data. The other is that I still have a native
database backup in case the server crashes.
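
Loading that CSV into a fresh sqlite db on the dev box only takes a
couple of lines run in the web2py environment (file name below is a
placeholder):

  # e.g. python web2py.py -S yourapp -M -R applications/yourapp/private/load_csv.py
  # assumes the dev app defines db against sqlite in its models
  db.import_from_csv_file(open('/tmp/db-20100104.csv', 'rb'))
  db.commit()  # needed when running a script from the command line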

-Thadeus
