Is Bareos suitable for this scenario?


Steve Eppert

Apr 15, 2021, 10:49:02 AM
to bareos-users
Hi.
I need to back up around 200 TB of data (many small files), with around 1 TB of new/changed data per week. Currently I simply rsync the data to an offsite location over a 100 Mbit/s connection.

While searching for ways to make the rsync faster (because of the many small files, rsync almost never uses the full 100 Mbit/s) I stumbled across Bareos.

A question I could not find an answer to in the docs is: how does the bareos-filedaemon check for changed data when doing an incremental backup? Does the daemon keep some kind of local database, or does it check each file against the Bareos server? I'm wondering whether a Bareos incremental backup job might be faster than the rsync.

Also, after looking at the docs, I'm considering purchasing a tape loader to back up a specific, more valuable subset of the data to tape.
Is it possible to have incremental backups go to disk and do a regular full backup of only a subset of this data to tape?

Is it possible to get filesystem access to the incrementally backed-up data on disk, or is the Bareos interface the only way to access this data?

Thanks!
Steve


Brock Palen

Apr 16, 2021, 9:21:53 AM
to Steve Eppert, bareos-users
I haven't seen any replies to your question. I can't speak to that volume of data directly, but I see no reason why Bareos couldn't handle it. Here are my thoughts on how I would approach it, along with answers to some of your other questions.

* The number of files will impact things more than the total data size: it increases database size, scan time, etc.
* I have easily seen Bareos saturate well above 100 Mbit networking. That said, 100 Mbit is very slow for the initial full backup of 200 TB: you are looking at a minimum of about 6 months, assuming the data does not compress. For the initial backup you might want to do sneaker-net with a Raspberry Pi and a Drobo. This is what I do: the full backup is done on site at gigabit speeds, then I carry the entire setup to the other site and do a volume migration to the real server.
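For the record, the seed-then-migrate step described above maps onto a Bareos migration job. A minimal sketch, with hypothetical pool/storage/fileset names ("SeedDisk", "Offsite", "PortableDisk", "BigData"); check the migration chapter of the Bareos docs before using:

```
# Hypothetical sketch: seed to a portable pool on site, then migrate.
Pool {
  Name = SeedDisk             # pool on the portable disk
  Pool Type = Backup
  Storage = PortableDisk
  Next Pool = Offsite         # migration target pool on the real server
}

Job {
  Name = "migrate-seed"
  Type = Migrate
  Pool = SeedDisk             # source pool
  Selection Type = Volume     # pick volumes by name pattern
  Selection Pattern = ".*"    # migrate everything in the pool
  Client = bareos-fd          # nominal; migration itself runs storage-side
  FileSet = "BigData"         # nominal as well
  Messages = Standard
}
```

After migration the catalog points at the new volumes on the real server, so subsequent incrementals build on the seeded full.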
https://fasterdata.es.net/home/requirements-and-expectations/

* Look at the Bareos client-side compression options. On bandwidth-constrained hosts (this includes cloud, because of cost) I use gzip turned all the way up. This will peg one CPU core, but for text data it drastically reduces the volume of data over the wire. Something like LZ4 has a much lower CPU impact and still gets about 70% of gzip's compression. If you have a CPU core to burn, and a test shows gzip still saturates your 100 Mbit link, maybe use it to get the backup time down. If this is all video, already-compressed images, or CRAM files, it likely just burns CPU for no benefit. Bareos gives you a report at the end of a job showing how well it compressed.
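Compression is configured per FileSet, in the Options block that the client applies. A hedged sketch (the fileset name and path are hypothetical; GZIP9 and LZ4 are the built-in Bareos compression keywords):

```
FileSet {
  Name = "BigData"          # hypothetical name
  Include {
    Options {
      Signature = MD5
      Compression = GZIP9   # max gzip: pegs one core, best ratio for text
      # Compression = LZ4   # cheaper on CPU, still a good fraction of gzip's ratio
    }
    File = /data
  }
}
```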

* As for how Bareos checks for files: with the Accurate setting (recommended), the server uploads a list of the files it knows about to the client, and the client compares against it. This process is very fast. By default Bareos won't use checksums to compare, only: 1. does the file exist, and 2. is the filesystem metadata newer than what's in the database/catalog (i.e. has the file changed). Incrementals with Bareos are much faster than rsync. (I have moved PBs of data with rsync.)
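In config terms, this comparison behavior is enabled by the Accurate directive on the job, optionally refined by an Accurate option in the FileSet. A sketch with hypothetical names:

```
# Hypothetical job; Accurate = yes makes the director send its known file
# list to the client, which then compares metadata locally.
Job {
  Name = "backup-data-inc"
  Type = Backup
  Level = Incremental
  Accurate = yes
  Client = client-fd
  FileSet = "BigData"
  Storage = File
  Pool = DiskPool
  Messages = Standard
}

# Optionally choose which attributes get compared, in the FileSet Options:
#   Accurate = mcs     # mtime, ctime, size only (no checksums; fast)
#   Accurate = mcs5    # additionally compare MD5 (slower, catches silent changes)
```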


With 200 TB of data you will want a lot of tape; otherwise you're looking at 400 TB+ of disk. If you're new to backup: you have to build a new "full" every so often. Given that your network is 100 Mbit, I would look at the Always Incremental feature of Bareos. This lets you avoid the ~180 days a new full backup would take over the wire. You still have to write 200 TB every so often, but it can all be done on the Bareos server side. I recommend tape just for cost, as you would need 66 LTO-7 tapes or 33 LTO-8 tapes. LTO-7 is still the best value, but LTO-8 has come down in cost a lot and LTO-9 is scheduled for GA this year. You will also want a few tape drives and a fast spool pool of disks to do this right. This 2x-minimum-size requirement is one downside backup systems have compared to rsync.
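The Always Incremental scheme mentioned above is a handful of job directives plus a server-side consolidate job. A hedged sketch with hypothetical names and retention values; the Always Incremental chapter of the docs covers the tuning:

```
Job {
  Name = "ai-backup-data"
  Type = Backup
  Level = Incremental
  Accurate = yes                             # required for Always Incremental
  Always Incremental = yes
  Always Incremental Job Retention = 7 days  # jobs older than this get consolidated
  Always Incremental Keep Number = 7         # minimum number of incrementals to keep
  Always Incremental Max Full Age = 60 days  # how often the full is folded forward
  Client = client-fd
  FileSet = "BigData"
  Storage = File
  Pool = DiskPool
  Messages = Standard
}

Job {
  Name = "Consolidate"
  Type = Consolidate     # merges old incrementals server-side, no client traffic
  Client = client-fd     # nominal
  FileSet = "BigData"    # nominal
  Messages = Standard
}
```

The consolidate job is what keeps the rewrite traffic off your 100 Mbit link: the client only ever ships incrementals.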

An all-disk solution will be faster, because a big RAID-Z2 will have greater bandwidth for the VirtualFull, but it will be expensive. You could look at something like 45 Drives to turn into your SD. I do a mix (again, at a fraction of your size, with Bareos).

I would personally split this into several jobs using wildcards in filesets: not one 200 TB job, but several few-TB jobs. This will also let you run jobs in parallel, recover better from a full-backup failure, avoid copying 200 TB when you do a full, etc.
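A hedged sketch of that wildcard-fileset split (paths and names are hypothetical; the match-then-catch-all-exclude pattern follows the FileSet examples in the docs, and is worth verifying with bconsole's estimate command before relying on it):

```
# One of several filesets, each covering a slice of /data.
FileSet {
  Name = "data-a-to-h"
  Include {
    Options {
      WildDir = "/data/[a-h]*"   # include only these top-level directories
    }
    Options {
      Exclude = yes
      RegexDir = ".*"            # catch-all: exclude everything not matched above
    }
    File = /data
  }
}
```

Cloning this per range gives several few-TB jobs that can run in parallel; `estimate job=<name> listing` in bconsole shows what a fileset would actually select.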



Brock Palen
bro...@mlds-networks.com
www.mlds-networks.com
Websites, Linux, Hosting, Joomla, Consulting

Spadajspadaj

Apr 16, 2021, 9:29:56 AM
to bareos...@googlegroups.com
I somehow missed the original email.

I suspect that with many, many small files you're mostly limited by the source filesystem (and whole-system) performance rather than by the backup tool itself. Regardless of the method used to decide whether a file needs backing up, its metadata still has to be read from the filesystem. Tuning the source system (giving more memory to the metadata cache) could probably help a little.

But I wouldn't expect big differences just from switching from rsync to Bareos.

Just my three cents.