Thanks
GK
Bacula?
Why not use rsync? I can assure you that it is as good as rsync.
And why not make it a cron job-- then you do not even need a gui or text
box for it.
Amanda? I have always been happy with it. For any backup utility that
works on the file level the MySQL database will have to be dumped to a
file prior to running the backup.
Günther
Which makes rsync or rsnapshot perfect for the job. Do the mysql
backup first, then the rsync or rsnapshot, all from an automated cron
job. No need for any UI at all.
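A minimal sketch of that ordering, as a Python script that cron could call. The paths are placeholders, and it assumes MySQL credentials are already supplied by an option file such as ~/.my.cnf; treat it as an illustration rather than a finished backup script.

```python
import subprocess

def backup_commands(dump_file, source_dir, dest_dir):
    """Return the commands in the order they must run: dump first, then copy."""
    return [
        # Dump every database to a plain file so the file-level copy is consistent.
        ["mysqldump", "--all-databases", f"--result-file={dump_file}"],
        # Mirror the tree (which should contain the dump file) to the backup location.
        ["rsync", "-a", "--delete", f"{source_dir.rstrip('/')}/", dest_dir],
    ]

def run_backup(dump_file, source_dir, dest_dir, runner=subprocess.run):
    """Run the dump and then the copy, stopping at the first failure."""
    for cmd in backup_commands(dump_file, source_dir, dest_dir):
        # check=True raises if a command fails, so rsync is skipped
        # when the dump did not complete.
        runner(cmd, check=True)

if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    run_backup("/srv/dumps/all.sql", "/srv", "/mnt/backup/srv")
```

A crontab entry would then only need to name the script and a time of day; no UI is involved.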
--
Joe - Linux User #449481/Ubuntu User #19733
joe at hits - buffalo dot com
"Hate is baggage, life is too short to go around pissed off all the
time..." - Danny, American History X
As others have said, the obvious choice here would be ... rsync.
Far and away the biggest issue with backups is recovery. If you haven't
tested your recovery system, or if it is inconvenient or unreliable,
then your backups are useless. The great thing about rsync is that you
have a straight copy of your data - recovery is a simple file copy.
rsync has a lot of options and functionality, which can be useful for
automated backups. In particular, when combined with hard link copies
you can get snapshot backups where each backup takes only the space
needed to store the difference (like a traditional incremental backup),
yet you've still got everything directly available.
<http://www.mikerubel.org/computers/rsync_snapshots/>
rsnapshot is an example automation of this sort of system.
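To make the hard-link idea concrete, here is a small Python sketch of one snapshot step. It is not how rsync or rsnapshot are implemented; it compares file contents instead of size and timestamp, and it ignores symlinks and ownership. The function name and arguments are made up for the example.

```python
import filecmp
import os
import shutil

def snapshot(source, previous, target):
    """Create `target` as a full view of `source`, hard-linking files
    that are unchanged since the `previous` snapshot (which may be None)."""
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        os.makedirs(os.path.normpath(os.path.join(target, rel)), exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            new = os.path.join(target, rel, name)
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                # Unchanged: the new snapshot shares the old file's data blocks.
                os.link(old, new)
            else:
                # New or changed: store a fresh copy.
                shutil.copy2(src, new)
```

Every snapshot directory looks like a complete copy, so a restore is still a plain file copy, but only changed files consume extra space.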
For rsync backups of things like databases, the best idea is to do a
database dump before the rsync. However, if you can't conveniently
arrange for that sort of thing, it may not actually be necessary.
Running an rsync on a database's data directory will give you a snapshot
of the database files at the time. If you were then to stop the
database server and copy those files back again (simulating a restore),
and start the server, it would seem to the server that it had suffered a
system crash or power outage, and it would recover the data using the
journals and logs in these files. You will very likely lose some of the
latest changes to the database, and it's conceivable that there might be
inconsistencies due to writes to the files during the time taken to do
the rsync, but the server should fix these on startup. Whether this
level of backup is acceptable or not depends on the sort of data you are
dealing with, and how often it is changing. Proper dumps are normally a
better choice, but simple file copies are better than nothing.
In my opinion the most important part of the above is "Proper dumps are
normally a better choice". I would expand that to *much* better. If
your database is small, you might as well just dump it. And if it's
large, then a filesystem rsync (without table locking or similar) might
result in odd issues with your data that might not be resolved by the
dbms' recovery system.
For taking a backup of MyISAM tables in mysql, you can use mysqlhotcopy,
which properly locks the tables, copies the raw database files, then
unlocks the tables. This way you get a consistent database snapshot
without possibly having to generate and store a full text dump, which
might take longer and take more disk space. (The downside is that,
since the backup is stored as binary data, not text, a backup system
that works off of diffs won't be as efficient.)
--keith
--
kkeller...@wombat.san-francisco.ca.us
(try just my userid to email me)
Grsync.
It depends somewhat on your needs, your database usage, and your
database server. I don't know about MySQL, but postgresql seems to be
good at working with such straight file copy backups. If you don't
write much to your database, your chances of a corrupt copy become
smaller (and then you just use the previous backup instead). If you are
using a system like rsnapshot with multiple backup copies that are
hardlinked when the files are unchanged, you get very small incremental
costs per backup. With monolithic dump files, even a single change to
the database means that the whole file must be saved for each backup
(though rsync will still minimise the traffic transferred in the copy).
You can take it a stage further by putting your database files on an LVM
volume and doing an LVM snapshot, which you then use for the backup.
Then your files are consistent exactly as though the power was turned
off when the snapshot is taken - and a database server worth the name
should be able to recover from that.
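As a rough illustration of the LVM approach, the Python sketch below only builds the command sequence; it does not run anything. The volume group, volume name, mount point and snapshot size are placeholders, and some filesystems need extra mount options for a snapshot, so check this against your own setup before relying on it.

```python
def lvm_snapshot_backup_plan(vg, lv, mountpoint, dest, snap_size="1G"):
    """Return the commands for: snapshot, mount read-only, copy, clean up."""
    snap = f"{lv}_snap"  # hypothetical name for the temporary snapshot volume
    return [
        # Freeze a point-in-time view of the volume holding the database files.
        ["lvcreate", "--snapshot", "--size", snap_size, "--name", snap, f"/dev/{vg}/{lv}"],
        # Mount the frozen view read-only so the copy cannot alter it.
        ["mount", "-o", "ro", f"/dev/{vg}/{snap}", mountpoint],
        # Copy from the snapshot, not from the live filesystem.
        ["rsync", "-a", "--delete", f"{mountpoint}/", dest],
        # Clean up so the snapshot does not fill up and become invalid.
        ["umount", mountpoint],
        ["lvremove", "--force", f"/dev/{vg}/{snap}"],
    ]
```

A real script would need to make sure the last two steps still run when the copy fails.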
> For taking a backup of MyISAM tables in mysql, you can use mysqlhotcopy,
> which properly locks the tables, copies the raw database files, then
> unlocks the tables. This way you get a consistent database snapshot
> without possibly having to generate and store a full text dump, which
> might take longer and take more disk space. (The downside is that,
> since the backup is stored as binary data, not text, a backup system
> that works off of diffs won't be as efficient.)
>
You get a fully consistent database snapshot in this way (and as I said,
proper dumps like this are normally the best choice). But you have
several downsides. First, you have extra processes to run and
synchronise (not a problem with cron and a script, but it might be an
issue for people wanting a simpler system, or if the backup is initiated
from a different computer). Second, you are locking all the tables
during the backup - that may or may not be an issue. Third, as you
say your backups may take more time, and they take more space since you
can't take advantage of hard-linked copies (diff-based backups are
horrible - I wouldn't recommend them).
It's all a matter of balancing your needs. If you are already doing an
rsync backup of everything else on the machine, including the database
files is very simple. It may be good enough for you. Doing database
dumps is the "right" way to do the backups, so that should be the method
to use unless you have good reason not to. But wanting a simple and
easy solution without having to learn about dumps may well count as a
good enough reason.
Whatever method you choose, do a practice restore to make sure you can
recover your data!
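One simple way to check a practice restore is to compare checksums of the original tree against the restored tree. The Python sketch below does that; the function names are made up, and it reads each file fully into memory, which is fine for a test but not for very large files.

```python
import hashlib
import os

def tree_digests(root):
    """Map each file's path relative to `root` to the SHA-256 of its contents."""
    digests = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                digests[os.path.relpath(path, root)] = hashlib.sha256(fh.read()).hexdigest()
    return digests

def restore_problems(original, restored):
    """Return relative paths that are missing from, or differ in, the restored tree."""
    want = tree_digests(original)
    got = tree_digests(restored)
    return sorted(path for path in want if got.get(path) != want[path])
```

An empty result means every file in the original was restored with identical contents.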
I don't know if this is true with newer versions of postgresql, but in the
versions I've used, the data store was not portable across different
architectures. Indeed, there was no guarantee of portability even
across different machines on the same architecture, even using the same
version of PostgreSQL!
(I think that I've done it, but that would have been a long time ago.
I did get caught by this issue, though: my main home server's power supply
died, and the only other machine I had available was a ppc box. My naive
solution, simply using the data store from the old machine's hard drive,
didn't work, and I had to resort to an old dump (because I'd recently
broken my backup script). Fortunately this was not a huge problem for
me, but I could imagine it being a big problem if you had no dumps at
all.)
So I think it's not wise to depend on this mode as a primary backup--it
could serve as a desperation backup, but the primary backup should be a
proper dump (or mysqlhotcopy if you can and desire).
For MyISAM tables, MySQL works fine with filesystem backups; I believe
these files are portable across architectures, but don't quote me on
that. I don't know how portable filesystem snapshots are for InnoDB
tables.
> If you don't
> write much to your database, your chances of a corrupt copy become
> smaller (and then you just use the previous backup instead). If you are
> using a system like rsnapshot with multiple backup copies that are
> hardlinked when the files are unchanged, you get very small incremental
> costs per backup. With monolithic dump files, even a single change to
> the database means that the whole file must be saved for each backup
> (though rsync will still minimise the traffic transferred in the copy).
This is all true, but I don't see a way to reliably get a good backup
otherwise. And you'll only have real difficulties if your database is
quite large--most typical dbs should not create enormous dump files
anyway.
> If you are already doing an
> rsync backup of everything else on the machine, including the database
> files is very simple. It may be good enough for you. Doing database
> dumps is the "right" way to do the backups, so that should be the method
> to use unless you have good reason not to. But wanting a simple and
> easy solution without having to learn about dumps may well count as a
> good enough reason.
Well...I think that anyone intent on using a database regularly should
not allow themselves to get lazy and rely on filesystem backups. They
should start the ''right'' way, and only use a filesystem backup if they
really don't care all that much about their databases in the first
place (or can recreate it quickly from what they already have on their
filesystem).
> It's all a matter of balancing your needs.
> Whatever method you choose, do a practice restore to make sure you can
> recover your data!
Double-plus-yes to the above! In addition, do a practice restore to a
different machine if at all possible.
>On 2009-10-27, David Brown <da...@westcontrol.removethisbit.com> wrote:
>>
...
>> If you don't
>> write much to your database, your chances of a corrupt copy become
>> smaller (and then you just use the previous backup instead). If you are
>> using a system like rsnapshot with multiple backup copies that are
>> hardlinked when the files are unchanged, you get very small incremental
>> costs per backup.
I'm using rsync and hard-linked backups run from a cron job every 30 min,
with another daily cron job eating the backup tail -- I think at age 90
days. Only problem I had was slocate filling the /var partition -- so I
uninstalled slocate, I don't use it anyway.
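The "eating the tail" job could look something like this Python sketch. It assumes snapshot directories are named by timestamp in the form YYYY-MM-DD_HHMM, which is an invented convention for the example, and it only decides what is old enough to delete; the deletion itself is left out.

```python
from datetime import datetime, timedelta

def expired_snapshots(names, now, max_age_days=90, fmt="%Y-%m-%d_%H%M"):
    """Return the snapshot directory names older than `max_age_days`."""
    cutoff = now - timedelta(days=max_age_days)
    expired = []
    for name in names:
        try:
            taken = datetime.strptime(name, fmt)
        except ValueError:
            # Not a snapshot directory in our naming scheme; leave it alone.
            continue
        if taken < cutoff:
            expired.append(name)
    return sorted(expired)
```

Skipping names that do not parse is deliberate, so the job never touches directories it did not create.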
...
>> It's all a matter of balancing your needs.
>> Whatever method you choose, do a practice restore to make sure you can
>> recover your data!
>
>Double-plus-yes to the above! In addition, do a practice restore to a
>different machine if at all possible.
Yes, test the restore ability _before_ you need it. Way back I discovered
a booboo in a backup script -- it looked normal but was overwriting files
with the most recently altered version, instead of saving the altered
file -- easy to fix. But the backup was bust at the time I needed a file
and found the buglet instead.
Grant.
--
http://bugsplatter.id.au
That's a good point - I didn't know about such limitations. It worked
for me in the tests I did, but it certainly emphasises the point that
backups are only as good as the recovery procedure, and that you have to
test recovery.
All I'm really saying is that file-based copies are better than nothing,
and that different backup strategies have different pros and cons. But
you are right to mention other issues with file-copy backups.
>> If you are already doing an
>> rsync backup of everything else on the machine, including the database
>> files is very simple. It may be good enough for you. Doing database
>> dumps is the "right" way to do the backups, so that should be the method
>> to use unless you have good reason not to. But wanting a simple and
>> easy solution without having to learn about dumps may well count as a
>> good enough reason.
>
> Well...I think that anyone intent on using a database regularly should
> not allow themselves to get lazy and rely on filesystem backups. They
> should start the ''right'' way, and only use a filesystem backup if they
> really don't care all that much about their databases in the first
> place (or can recreate it quickly from what they already have on their
> filesystem).
>
That's a reasonable summary, and matches my backup strategy. I've got
dump-based backups on a couple of important databases, and filecopy
backups on a couple of others that could be recreated if necessary.
Remember, "lazy" is not necessarily bad - it's better to have a quick
but poor backup system, than to plan a perfect system but not get the
time to implement it!
If you want something like rdiff and rsync, then rdiff-backup might be a
good match. It combines the features of a mirror and an incremental
backup, so you can restore to any previous backup point. The increments
are kept as a series of reverse diffs from the current mirror. Syntax
is similar to rsync, and librsync is used to generate efficient reverse
diffs. Biggest disadvantage is the difficulty of deleting that
multi-gigabyte file that accidentally got included in your regular
backup. Learning curve can be a bit steep at first.
http://www.nongnu.org/rdiff-backup/
--
Bob Nichols AT comcast.net I am "RNichols42"
> Gabriel Knight wrote:
>> Hi all, I need a free program to back up an Ubuntu server for my school
>> class. It has to be as good as or better than Rdiff and Rsync. The server
>> will use SSH and MySQL, be a file and web server, and do a couple of
>> other things. I need it to be either a GUI or text-box program.
>>
>>
> As others have said, the obvious choice here would be ... rsync.
*sync isn't a backup system. It is just a copy system.
A copy *IS* a backup.
>>*sync isn't a backup system. It is just a copy system.
>
> And a backup is not a copy?
A backup system has multiple copies, with historical content.
How do you come to that definition? A backup system has as many copies
as desired/needed by business practice. If one copy without history is
sufficient, then that's a good backup.
Well, you _can_ do that with rsync. see e.g. Rubel's article, in
http://www.mikerubel.org/computers/rsync_snapshots/
But, strictly you are right; rsync is not, by itself, a backup system.
So does rsync, if you use it that way. Rsnapshot is rsync, and I can
give you a restore from my backup drive from any day in the last week,
any week in the last month, and any month in the last year. GFS.
Rsync, on its own, can do the same, but takes a little more leg work.
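For anyone unfamiliar with GFS (grandfather-father-son) rotation, the retention rule can be sketched in a few lines of Python. This is only an illustration of the idea, not rsnapshot's actual logic; the function names and the 7/4/12 defaults are assumptions chosen to match "a week of days, a month of weeks, a year of months".

```python
from datetime import date, timedelta

def _newest_per_bucket(dates_newest_first, bucket, limit):
    """Keep the newest date in each of the `limit` most recent buckets."""
    chosen = {}
    for d in dates_newest_first:
        key = bucket(d)
        if key not in chosen:
            if len(chosen) == limit:
                break
            chosen[key] = d
    return set(chosen.values())

def gfs_keep(snapshot_dates, today, daily=7, weekly=4, monthly=12):
    """Return the snapshot dates to keep; everything else may be pruned."""
    dates = sorted(set(snapshot_dates), reverse=True)  # newest first
    keep = {d for d in dates if 0 <= (today - d).days < daily}
    # Newest snapshot in each of the most recent ISO weeks.
    keep |= _newest_per_bucket(dates, lambda d: tuple(d.isocalendar()[:2]), weekly)
    # Newest snapshot in each of the most recent calendar months.
    keep |= _newest_per_bucket(dates, lambda d: (d.year, d.month), monthly)
    return sorted(keep)
```

With daily snapshots this keeps roughly twenty directories while still covering a full year.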
A backup system means having copies of your data along with a way to
restore it. You are right that rsync on its own is not a backup system,
but it can form the backbone of a backup system. If you use it
appropriately (either with your own scripts or command line arguments,
or using a ready-made tool such as rsnapshot or dirvish), you then have
a backup system.
> On 2009-10-28, terryc <newsnine...@woa.com.au> wrote:
>> On Wed, 28 Oct 2009 14:57:55 +1100, Grant wrote:
>>
>>>>*sync isn't a backup system. It is just a copy system.
>>>
>>> And a backup is not a copy?
>>
>> A backup system has multiple copies, with historical content.
>
> How do you come to that definition? A backup system has as many copies
> as desired/needed by business practice. If one copy without history is
> sufficient, then that's a good backup.
The basic reason is that problems can happen with "the copy". No matter
what your system, one copy is no copy at certain times.
As others have pointed out, it is the system, not the tools (tar, cpio,
rsync, etc).
As you say, it is about what he really needs.
My 2c is that depending on amount of data and frequency of changing, he
could probably just dump his databases and just burn a DVD each night.
$AUS50 for a DVD burner and AUS50c/DVD. Even cheap DVDs should last a few
years before they are toasters.
For an example, consider what happened to a very large software company
recently when trying to do a hardware upgrade of the storage system.
There was only one copy of the data (an old copy at that). Before doing
the hardware upgrade, they wanted to make a new copy - but there was not
enough space on the backup system. So they deleted their only copy, and
started making a new copy. Since it was going to take days to make the
new copy, and they couldn't be bothered waiting, they cancelled the copy
and did the hardware upgrade anyway. When that failed, everything was gone.
As you say, having a single copy is not a backup system by itself.
And even multiple copies on multiple devices at multiple locations is
not a backup system - you need a tried and tested recovery procedure to
make it a backup system.
> As others have pointed out, it is the system, not the tools (tar, cpio,
> rsync, etc).
>
However, the tools form the central part of the backup system, and
determine the main features you have available. Thus it does make sense
to talk about an rsync backup system.
> Since it was going to take days to make the
> new copy, and they couldn't be bothered waiting, they cancelled the copy
> and did the hardware upgrade anyway. When that failed, everything was
> gone.
I actually worked, briefly, for a company that would not buy new backup
tapes. After weeks of requesting new tapes, I eventually walked into the
EDP manager's office and dumped a backup report that took up half a box of
paper, normally 50 pages, and said "It is my belief that if we needed to
recover from a backup, the company could not do it". Then I held up a
sample of the 2nd hand tapes they were so proud of obtaining and left
tape coating over the manager's desk. {:-).
>
> As you say, having a single copy is not a backup system by itself.
>
> And even multiple copies on multiple devices at multiple locations is
> not a backup system - you need a tried and tested recovery procedure to
> make it a backup system.
First thing you check when you become responsible for the company
backups. I've also come across a couple of those in my time. This isn't
too bad if you have a good GFS system and the paper trail (or transaction
logs) still exists.
>
>> As others have pointed out, it is the system, not the tools (tar, cpio,
>> rsync, etc).
>>
>>
> However, the tools form the central part of the backup system, and
> determine the main features you have available. Thus it does make sense
> to talk about an rsync backup system.
Between scripts running tar, cpio and diff, there hasn't been much else
in many "branded" backup systems I've used. The part that makes them easy
to use is the database, query and file selection facilities.
The only other trick you need to know is that many put multiple jobs onto
one tape so you need to use /dev/nst0 (no rewind) and move past the end
of tape(?) marker to be able to manually recover the contents of some of
these tapes.
In case anyone didn't spot the reference:
<http://www.hiptop3.com/archives/sidekick-backup-problems-blamed-on-management>
> I actually worked, briefly, for a company that would not buy new backup
> tapes. After weeks of requesting new tapes, I eventually walked into the
> EDP manager's office and dumped a backup report that took up half a box of
> paper, normally 50 pages, and said "It is my belief that if we needed to
> recover from a backup, the company could not do it". Then I held up a
> sample of the 2nd hand tapes they were so proud of obtaining and left
> tape coating over the manager's desk. {:-).
I've heard of that kind of thing many times - people often make the
mistake of equating "tape" with "good backup system", regardless of the
procedures used for the backup. And since backup tapes cost money, it's
easy to think that you can save money by buying fewer tapes without
thinking through the consequences.
>> As you say, having a single copy is not a backup system by itself.
>>
>> And even multiple copies on multiple devices at multiple locations is
>> not a backup system - you need a tried and tested recovery procedure to
>> make it a backup system.
>
> First thing you check when you become responsible for the company
> backups. I've also come across a couple of those in my time. This isn't
> too bad if you have a good GFS system and the paper trail (or transaction
> logs) still exists.
>
You don't necessarily need something like a GFS. It's often okay for a
recovery procedure to be somewhat slow and inconvenient, as long as it
works reliably, and within the required timeframe.
But while /you/ may check the recovery procedure, a great many people do
not - that's why I emphasise it so much each time the topic comes up.
Again, it's often a fault with tape-based backups. People think they've
got a great system with incremental and differential backups, but when
disaster strikes they have huge troubles. Maybe they find that they
need to feed in a week's worth of differential backup tapes just to
restore a single file. Or they find that when the hardware failure /
fire / break-in put their server and backup machine out of commission,
they can't read the tapes on a new machine. Or perhaps their backup
software won't let them restore part of their data - so they can't
restore part of yesterday's backup without destroying today's good data.
There are many ways for restores to fail even when the data is
technically safe, but people often don't think through and test their
recovery procedures.
>>> As others have pointed out, it is the system, not the tools (tar, cpio,
>>> rsync, etc).
>>>
>>>
>> However, the tools form the central part of the backup system, and
>> determine the main features you have available. Thus it does make sense
>> to talk about an rsync backup system.
>
> Between scripts running tar, cpio and diff, there hasn't been much else
> in many "branded" backup systems I've used. The part that makes them easy
> to use is the database, query and file selection facilities.
>
I find rsync particularly useful for doing offsite backups over the
Internet, since it minimises the traffic.
> The only other trick you need to know is that many put multiple jobs onto
> one tape so you need to use /dev/nst0 (no rewind) and move past the end
> of tape(?) marker to be able to manually recover the contents of some of
> these tapes.
My experience with tape-based backups is that it involved too much
manual work (changing tapes), recovery was slow and awkward, it was very
difficult to be sure of the reliability of the system, and we had no
practical (or cheap) way to test recovery via another machine - if the
server and the backup machine with the tape drive died, we would have
great difficulty getting the data back. For many years now, I've used
rsync to two separate machines in two locations, with hardlinked copies
to give snapshots without taking more space than necessary. Of course,
the type of backup strategy you need depends entirely on the situation
and the quantity of data.
man rsync
Look for --link-dest
And if you want it slightly more automated, use rsnapshot, which moves the old
directories for you.
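For reference, this is roughly what a --link-dest invocation looks like, written as a Python helper that builds the argument list. The directory names are placeholders and the helper itself is hypothetical. It assumes snapshot_root is an absolute path, because rsync resolves a relative --link-dest against the destination directory.

```python
import os

def link_dest_command(source, snapshot_root, new_name, previous_name=None):
    """Build an rsync command that writes a new snapshot directory,
    hard-linking unchanged files to the previous snapshot if there is one."""
    cmd = ["rsync", "-a"]
    if previous_name:
        # Files identical to the ones in this directory are hard-linked, not copied.
        cmd.append(f"--link-dest={os.path.join(snapshot_root, previous_name)}")
    # The trailing slash on the source copies its contents, not the directory itself.
    cmd += [f"{source.rstrip('/')}/", os.path.join(snapshot_root, new_name)]
    return cmd
```

The first run has no previous snapshot and is a full copy; later runs pass the name of the most recent snapshot.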
>terryc wrote:
>> On Tue, 27 Oct 2009 21:48:21 -0700, Keith Keller wrote:
>>
>>> On 2009-10-28, terryc <newsnine...@woa.com.au> wrote:
>>>> On Wed, 28 Oct 2009 14:57:55 +1100, Grant wrote:
>>>>
>>>>>> *sync isn't a backup system. It is just a copy system.
>>>>> And a backup is not a copy?
>>>> A backup system has multiple copies, with historical content.
>>> How do you come to that definition? A backup system has as many copies
>>> as desired/needed by business practice. If one copy without history is
>>> sufficient, then that's a good backup.
>>
>> The basic reason is that problems can happen with "the copy". No matter
>> what your system, one copy is no copy at certain times.
>>
>For an example, consider what happened to a very large software company
>recently when trying to do a hardware upgrade of the storage system.
>There was only one copy of the data (an old copy at that). Before doing
>the hardware upgrade, they wanted to make a new copy - but there was not
>enough space on the backup system. So they deleted their only copy, and
>started making a new copy. Since it was going to take days to make the
>new copy, and they couldn't be bothered waiting, they cancelled the copy
>and did the hardware upgrade anyway. When that failed, everything was gone.
>As you say, having a single copy is not a backup system by itself.
Sorry, but your example does not show that at all. No backup system is proof
against total human stupidity.
>And even multiple copies on multiple devices at multiple locations is
>not a backup system - you need a tried and tested recovery procedure to
>make it a backup system.
And rsync is a recovery system as well. That is the huge advantage of rsync. What
is saved is the same as what is needed. No compression, no weird format, no
concatenation that can be unreadable if a single byte changes.
> On Tue, 27 Oct 2009 21:48:21 -0700, Keith Keller wrote:
>
>> On 2009-10-28, terryc <newsnine...@woa.com.au> wrote:
>>> On Wed, 28 Oct 2009 14:57:55 +1100, Grant wrote:
>>>
>>>>>*sync isn't a backup system. It is just a copy system.
>>>>
>>>> And a backup is not a copy?
>>>
>>> A backup system has multiple copies, with historical content.
>>
>> How do you come to that definition? A backup system has as many copies
>> as desired/needed by business practice. If one copy without history is
>> sufficient, then that's a good backup.
>
> The basic reason is that problems can happen with "the copy". No matter
> what your system, one copy is no copy at certain times.
>
> As others have pointed out, it is the system, not the tools (tar, cpio,
> rsync, etc).
>
> As you say, it is about what he really needs.
Well said. The OP did not show up again, so it is not easy to guess what
he needs. One aspect that I miss in the discussion is networking. This is
a networking group, and the OP's system is at a school, which will likely
be a small network.
If more than one machine has to be included in a backup scheme, a simple
script-based system running on every single host quickly becomes too
demanding to administer. In such a situation the big packages like
bacula, amanda or TVM show their strength in maintaining a single
database and backup documentation for all hosts. They are also able to do
load balancing and thus distribute costly full backups over the backup
cycle.
Günther
The point is, before they started this upgrade process, they did have a
single copy. But they did not have a backup system.
But as you say, no backup system is proof against this level of
stupidity and irresponsibility.
>
>
>> And even multiple copies on multiple devices at multiple locations is
>> not a backup system - you need a tried and tested recovery procedure to
>> make it a backup system.
>
> And rsync is a recovery system as well. That is the huge advantage of rsync. What
> is saved is the same as what is needed. No compression, no weird format, no
> concatenation that can be unreadable if a single byte changes.
>
Absolutely agreed.
We can talk about rsync and networks if you like! rsync is very
network-friendly, as it only copies over the differences between
directories when doing a synchronisation (assuming you have appropriate
flags set). For large files, it can even copy only the changed parts of
the file. And all the network transfers can be compressed. This all
makes it very bandwidth friendly, and is very useful for doing offsite
backups.
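To show why sending only the changed parts of a file saves bandwidth, here is a deliberately simplified Python sketch. It compares fixed, aligned blocks by checksum; the real rsync algorithm is smarter and uses a rolling checksum so it can match blocks at any offset. The tiny block size and all the names are assumptions for the example.

```python
import hashlib

BLOCK = 4  # tiny block size so the example is easy to follow; not rsync's default

def block_hashes(data, block=BLOCK):
    """Checksum each aligned block of `data`."""
    return [hashlib.sha256(data[i:i + block]).hexdigest()
            for i in range(0, len(data), block)]

def delta(old, new, block=BLOCK):
    """List (block_index, bytes) for blocks of `new` the receiver does not already hold."""
    have = block_hashes(old, block)
    changes = []
    for idx, digest in enumerate(block_hashes(new, block)):
        if idx >= len(have) or have[idx] != digest:
            changes.append((idx, new[idx * block:(idx + 1) * block]))
    return changes

def patch(old, changes, new_len, block=BLOCK):
    """Rebuild the new file on the receiving side from the old file plus the changes."""
    out = bytearray(old[:new_len].ljust(new_len, b"\0"))
    for idx, chunk in changes:
        out[idx * block:idx * block + len(chunk)] = chunk
    return bytes(out)
```

Only the blocks returned by delta() would need to cross the network; the rest is reused from the copy the receiver already has.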
For backup of multiple machines, it's best to install an rsync server on
the servers, and use an rsync client on the backup machine. This gives
you a single place to organise most of the backup system.
> Günther Schwarz wrote:
>> If more than one machine has to be included in a backup scheme, a simple
>> script-based system running on every single host quickly becomes too
>> demanding to administer. In such a situation the big packages
>> like bacula, amanda or TVM show their strength in maintaining a single
>> database and backup documentation for all hosts.
Edit: TSM, not TVM.
>> They are also able to
>> do load balancing and thus distribute costly full backups over the
>> backup cycle.
> We can talk about rsync and networks if you like! rsync is very
> network-friendly, as it only copies over the differences between
> directories when doing a synchronisation (assuming you have appropriate
> flags set). For large files, it can even copy only the changed parts of
> the file. And all the network transfers can be compressed. This all
> makes it very bandwidth friendly, and is very useful for doing offsite
> backups.
That's all true and these are useful features of rsync. I use the tool
often and like it.
> For backup of multiple machines, it's best to install an rsync server on
> the servers, and use an rsync client on the backup machine. This gives
> you a single place to organise most of the backup system.
Still it will be demanding to monitor and administer such a scheme.
With amanda I get an email every morning showing me which hosts have been
in the backup last night, possible errors, amount of data copied etc. It
takes me just a few seconds to verify that all went well, and I keep my
ass safe. Also I do not have to worry about distributing the load evenly
over the backup cycle as this is done automatically by amanda. I have
tools at hand which allow me to analyze the backup cycle in detail. For me
all this makes the time spent on the configuration of a relatively
complex backup package well spent even for a small network with about 200
partitions and directories in the backup pool. It might take me a lot
more work to get the same convenience and functionality from home-made
scripts.
Günther
There are, of course, exceptions to any rule. I ran some tests a while
back and discovered that in our situation it was faster to simply use
rcp than rsync. We were syncing very large files across a fast network
and it took rsync longer to figure out what to copy than to just blindly
copy it all. In fact, for files that were identical at both ends, rcp was
*still* faster.
However, I don't disagree with the above as a general guideline.