Sorry for reviving this very ancient thread.
On Monday, 12 August 2013 at 10:35:18 UTC+2, tonyalbers wrote:
> Hi guys,
>
> Now that we are seeing a huge growth in the number of systems that can deduplicate data, like DataDomain and Quantum DXi's, it would be nice if Bareos could support these.
This is a dead end. Hardware dedup appliances are hugely expensive, and sooner or later they will be a very small niche. What we are seeing now is huge growth in software-defined storage: connect a lot of cheap commodity disks over a network, run the software, and you get very robust, elastic and fast storage. More and more backup systems are implementing inline deduplication and benefiting from it. With an external hardware deduplicator you cannot do deduplication on the client, which is what saves a lot of network bandwidth.
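To show why client-side dedup saves bandwidth, here is a minimal sketch in plain Python. Everything in it (the ChunkStore stand-in, backup_chunks(), the chunk sizes) is made up for illustration and has nothing to do with the actual Bareos FD/SD protocol; the idea is just that the client fingerprints each chunk, asks the storage side whether it already knows that fingerprint, and ships only the chunks that are new.

    import hashlib

    class ChunkStore:
        """Stand-in for the server side: remembers which fingerprints it already holds."""
        def __init__(self):
            self.blobs = {}
        def has(self, digest):
            return digest in self.blobs
        def put(self, digest, data):
            self.blobs[digest] = data

    def backup_chunks(chunks, store):
        """Client side: fingerprint every chunk, send only the ones the store does not have."""
        sent = skipped = 0
        for data in chunks:
            digest = hashlib.sha256(data).hexdigest()   # fingerprint computed on the client
            if store.has(digest):
                skipped += 1                            # duplicate: only the digest crosses the wire
            else:
                store.put(digest, data)                 # new chunk: the only payload actually transferred
                sent += 1
        return sent, skipped

    # Backing up the same chunks twice: the second run transfers nothing.
    store = ChunkStore()
    chunks = [b"A" * 4096, b"B" * 4096, b"A" * 4096]
    print(backup_chunks(chunks, store))   # (2, 1)
    print(backup_chunks(chunks, store))   # (0, 3)

With a hardware box sitting behind the Storage Daemon, the full stream has to cross the wire first and only gets deduplicated after it has arrived.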
> AFAIK bacula has a way of storing data on tape that is optimized so that the space is used effectively. This unfortunately makes the saved data bad for dedupe.
Well, if his hardware deduper uses variable-size blocks with a sliding window, it should dedup Bareos volumes easily. If not, forget it and do inline deduplication instead.
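"Variable block with sliding window" usually means content-defined chunking with a rolling hash: cut points follow the data itself, so they survive the shifting offsets that volume block headers introduce. A toy sketch in plain Python, with the window size, prime and mask picked purely for illustration (real appliances use their own parameters):

    import os

    WINDOW = 48                 # sliding-window size in bytes (illustrative)
    MASK = (1 << 13) - 1        # boundary condition -> ~8 KiB average chunks (illustrative)
    PRIME = 31
    POW = PRIME ** (WINDOW - 1) # weight of the byte that slides out of the window

    def chunk_boundaries(data):
        """Yield end offsets of content-defined chunks found with a rolling hash."""
        h = 0
        start = 0
        for i, byte in enumerate(data):
            if i >= WINDOW:
                h -= data[i - WINDOW] * POW   # drop the byte leaving the window
            h = h * PRIME + byte              # bring the new byte in
            if i - start + 1 >= WINDOW and (h & MASK) == 0:
                yield i + 1                   # cut here: chunk is data[start:i+1]
                start = i + 1
        if start < len(data):
            yield len(data)                   # last, possibly short, chunk

    payload = os.urandom(64 * 1024)               # random test data
    print(list(chunk_boundaries(payload))[:5])    # cut points roughly 8 KiB apart

Because the boundaries depend only on the bytes inside the window, identical file data ends up in identical chunks even when it sits at different offsets in two volumes; a fixed-block deduper loses exactly that.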
> So, are there any plans for changing the code so that you can store the data exactly as it is streamed from the client?
> That is without any optimization at all? This would in theory make deduplication possible with Bareos.
>
> That would mean a huge advantage, since we could also use ZFS with dedupe turned on.
If you wait a moment, you can check out the Bacula Enterprise Aligned Volumes plugin, as described by Kern in his "Bacula status report":
(...)
4. The Enterprise Aligned Volume plugin (deduplication for zfs, btrfs,
NetApp, ... deduping filesystems) available free for the community.
(...)
I don't know what "available free for the community" means.
As I described above, this is a dead end. Don't waste your time developing it; better to spend your development time on a real deduplication solution.
What I'm thinking of is real block-level storage defined on the Storage Daemon, with different storage backends and clustering (network mirroring and replication). Storage backends like local filesystems, object stores (S3, etc.), remote filesystems and so on, all connected together, self-balancing and distributed. Run out of storage space? Just add another storage node to make some room, or better yet, turn some of the backup clients into storage nodes and connect everything together - the easiest way. If you ensure a proper level of automatic redundancy (something like RAID5/6 built on nodes and data rather than disks), you can survive some crashes too. All in software. :)
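To make that a bit more concrete, here is a toy placement sketch in plain Python. The node names, the replica count and the place_block() function are assumptions made up for illustration, not anything that exists in Bareos: each block is ranked against the node list by a hash of (node, block) - rendezvous-style hashing - and written to the top N nodes, so data spreads itself out, the same block always lands on the same nodes, and adding a node only shifts the blocks it now ranks highest for.

    import hashlib

    NODES = ["sd-node1", "sd-node2", "sd-node3", "sd-node4"]  # hypothetical storage nodes
    REPLICAS = 2                                              # copies kept per block

    def place_block(block_id, nodes=NODES, replicas=REPLICAS):
        """Rank nodes by hash of (node, block) and keep the top `replicas` of them."""
        ranked = sorted(nodes, key=lambda n: hashlib.sha256((n + "/" + block_id).encode()).digest())
        return ranked[:replicas]

    # Deterministic placement: re-running this always returns the same pair of nodes.
    print(place_block("volume0001/block42"))

Plain replicas are the simple case here; computing parity across groups of nodes instead would give the RAID5/6-on-nodes behaviour mentioned above, at the cost of much more complicated self-healing.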
I know it is not an easy task, especially the clustering with self-healing and automatic redundancy. Right now I'm designing the block-level storage with storage pools and replication; other features will come later.
best regards