On May 8, 2013, at 1:18 AM, Netflix Boundaries <netflix.b...@gmail.com> wrote:
> I think, based on your statement, that this suboptimal behavior may then apply to ANY backup-related kind of use. Encrypting or compressing sacrifices performance. We would need to give it more power (spend more money) or remove features.
No, s3ql should be fine for at least some, if not many, backup scenarios. However, it optimizes for space rather than performance, in part because the network is the biggest bottleneck in most backup scenarios. It also optimizes for storage cost, because anyone using S3 as their backup medium wants pretty much the lowest possible cost per byte stored and is willing to sacrifice almost everything else to get it.
You are right that the initial full backup will take a long time and will burn a lot of CPU on compression. That is true for just about every remote backup solution I know of.
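If you want a rough feel for how long that initial run will take, a back-of-envelope calculation is usually good enough. Here's a quick sketch in Python; every number in it is a made-up placeholder, so substitute your own measurements:

#!/usr/bin/env python
# Back-of-envelope estimate for the initial full backup. Every number
# here is a hypothetical placeholder -- measure your own values.

data_gb = 500.0            # total data to back up (GB)
compress_ratio = 0.5       # compressed size / original size
cpu_mb_per_s = 5.0         # compression throughput (MB/s), e.g. lzma
uplink_mbit_per_s = 10.0   # upstream bandwidth to S3 (Mbit/s)

data_mb = data_gb * 1024
cpu_hours = data_mb / cpu_mb_per_s / 3600
net_hours = (data_mb * compress_ratio) / (uplink_mbit_per_s / 8) / 3600

# Compression and upload can overlap, so the slower stage dominates.
print("CPU-bound estimate:     %5.1f hours" % cpu_hours)
print("network-bound estimate: %5.1f hours" % net_hours)
print("expect roughly:         %5.1f hours" % max(cpu_hours, net_hours))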
My reference to Continuous Data Protection (or Near-CDP) was more about the kind of backup solution you seem to be shooting for, where you are pretty much constantly backing up the data as soon as it changes.
CDP and Near-CDP are pretty much the current state of the art for high-end solutions, but they are mega-expensive, both in the network bandwidth you need between the front-end and back-end systems and in the hardware required to make this kind of thing happen.
See <http://en.wikipedia.org/wiki/Continuous_data_protection> for more information on CDP. For specific products, see <http://www.backupcentral.com/wiki/index.php/Continuous_Data_Protection_%28CDP%29_Software> and <http://www.backupcentral.com/wiki/index.php/Near-Continuous_Data_Protection_%28Near-CDP%29_Software>.
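Just to make the distinction concrete, here is a toy sketch of the Near-CDP idea in Python, reacting to changes as they happen instead of sweeping the tree on a schedule. It assumes the third-party 'watchdog' package, and backup_file() is a hypothetical stand-in for whatever actually ships the data off-host:

#!/usr/bin/env python
# Toy illustration of the Near-CDP idea: push each file as soon as it
# changes, instead of sweeping the whole tree on a schedule. Uses the
# third-party 'watchdog' package; backup_file() is a hypothetical
# stand-in for whatever actually ships data to the backup target.

import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

def backup_file(path):
    # A real implementation would queue/copy the file to the backend.
    print("would back up: %s" % path)

class OnChange(FileSystemEventHandler):
    def on_created(self, event):
        if not event.is_directory:
            backup_file(event.src_path)
    on_modified = on_created  # treat modifications the same way

observer = Observer()
observer.schedule(OnChange(), '/data', recursive=True)  # path is an example
observer.start()
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()

Every write becomes near-immediate network traffic, which is exactly where the bandwidth and hardware costs come from.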
So long as you're running a more traditional backup scenario, and you don't have too much data that changes too quickly, I would imagine that s3ql should be sufficient.
Of course, serving as a filesystem-based backup solution is not the primary intention of s3ql, so there are going to be limits to what you can feasibly achieve in this space.
> Since this has a big impact during the first full backup, the tendency could be for it to affect subsequent backups too, if a lot of data keeps flowing into the system regardless of the recurrence window (every 24h, every 12h, every week, etc.). At that point, even if we skip files with the same size or timestamp, we will end up investing more resources (more money) in multiple parallel processes to keep the tradeoff balanced enough to maintain performance.
Note that rsync is pretty good at detecting which files have and have not changed, and it does so with relatively minimal impact on both the server and the client. It's the other steps that suck up the CPU. The lzma compression algorithms will suck up more CPU than bzip2, and bzip2 will suck up more than gzip. But in return, lzma usually gets just about the best compression currently available, with bzip2 in the middle and gzip at the back of the pack.
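If you want to see that tradeoff on your own data, the Python standard library (3.3 or later, for the lzma module) makes a quick comparison easy. The sample file path below is just an example; point it at something representative of what you'd actually back up:

#!/usr/bin/env python
# Quick comparison of the three compressors discussed above, using only
# the standard library (Python 3.3+ for the lzma module). The input
# file below is just an example -- use a representative sample of your
# own data, since ratios vary a lot by content.

import bz2, gzip, lzma, time

with open('/var/log/syslog.1', 'rb') as f:
    data = f.read()

for name, compress in [('gzip', gzip.compress),
                       ('bzip2', bz2.compress),
                       ('lzma', lzma.compress)]:
    start = time.time()
    out = compress(data)
    elapsed = time.time() - start
    print('%-5s: %5.1f%% of original size, %6.2f s' %
          (name, 100.0 * len(out) / len(data), elapsed))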
But your available network bandwidth to your backup medium is usually the single biggest limiting factor for online backup solutions. In this case, that limit is on your side, between your server and S3.