Support for dedupe H/W or filesystems


tonyalbers

Aug 12, 2013, 4:35:18 AM
to bareos...@googlegroups.com
Hi guys,

Now that we are seeing a huge growth in the number of systems that can deduplicate data, like DataDomain and Quantum DXi's, it would be nice if Bareos could support these.

AFAIK Bacula stores data on tape in a format that is optimized to use the space efficiently. Unfortunately, this makes the saved data a poor fit for dedupe.

So, are there any plans to change the code so that you can store the data exactly as it is streamed from the client, that is, without any optimization at all? In theory this would make deduplication possible with Bareos.
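
A minimal sketch of why this matters for a fixed-block deduper such as ZFS. Everything here is an assumption for illustration: the 4 KiB block size, the 24-byte record header, and the 64 KiB record size are made up and are not the real Bacula or Bareos volume format. The "as streamed" case also assumes each backup starts on a block boundary.

```python
import hashlib
import os

BLOCK = 4096  # assumed block size of the deduplicating filesystem

def unique_blocks(data: bytes, block: int = BLOCK) -> int:
    """Count distinct fixed-size blocks, as a fixed-block deduper sees them."""
    return len({hashlib.sha256(data[i:i + block]).digest()
                for i in range(0, len(data), block)})

def wrap_in_records(payload: bytes, header: bytes, record: int = 65536) -> bytes:
    """Hypothetical volume format: every record starts with a header."""
    body = record - len(header)
    out = bytearray()
    for i in range(0, len(payload), body):
        out += header + payload[i:i + body]
    return bytes(out)

file_data = os.urandom(64 * BLOCK)  # one file, 64 blocks of random data

# Two backups of the same file, stored exactly as streamed.
as_streamed = file_data + file_data

# The same two backups, with per-record headers interleaved.
with_headers = (wrap_in_records(file_data, b"job=1".ljust(24, b"\0"))
                + wrap_in_records(file_data, b"job=2".ljust(24, b"\0")))

print(unique_blocks(as_streamed))   # 64: the second backup dedupes completely
print(unique_blocks(with_headers))  # about twice as many: nothing lines up
```

In the second case the headers shift the payload off the 4 KiB grid, so no block of the second backup matches a block of the first.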

That would mean a huge advantage, since we could also use ZFS with dedupe turned on.


/tony

Philipp Storz

Aug 13, 2013, 5:20:28 AM
to bareos...@googlegroups.com
Hello Tony,

this is an excellent and, in my view, a totally obvious idea.

The only problem is that Bacula Systems says they have a patent pending on their deduplication
solution (see
http://www.baculasystems.com/wp-content/uploads/bacula-enterprise-v6-deduplication-volumes1.pdf).

We doubt this patent will be granted; however, it shows that software patents are a big problem
for open source software. We wonder what Bacula Systems wants to do with this patent while calling
themselves "The Enterprise Open Source Backup Company".

We would really like to have something that supports deduplication, but we have to keep an eye on
the patent problems.

Still, it would be very nice to have something dedupable, and anybody who would like to implement
something there is invited to submit patches.

Best regards,

Philipp
--
Kind regards

Philipp Storz philip...@bareos.com
Bareos GmbH & Co. KG Phone: +49221630693-92
http://www.bareos.com Fax: +49221630693-10

Registered office: Cologne | Cologne District Court: HRA 29646
Managing directors: Stephan Dühr, M. Außendorf,
J. Steffens, P. Storz, M. v. Wieringen

lst_...@kwsoft.de

Aug 13, 2013, 5:59:21 AM
to bareos...@googlegroups.com

Quoting Philipp Storz <philip...@bareos.com>:

> Hello Tony,
>
> this is an excellent and, in my view, a totally obvious idea.
>
> The only problem is that Bacula Systems says they have a patent pending on their deduplication
> solution (see
> http://www.baculasystems.com/wp-content/uploads/bacula-enterprise-v6-deduplication-volumes1.pdf).
>
> We doubt this patent will be granted; however, it shows that software patents are a big problem
> for open source software. We wonder what Bacula Systems wants to do with this patent while
> calling themselves "The Enterprise Open Source Backup Company".
>
> We would really like to have something that supports deduplication, but we have to keep an eye
> on the patent problems.

From what I understand, they basically want to patent alignment to
some block size? Software has been doing this for ages in all sorts of places, no?
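
For reference, a minimal sketch of what generic block alignment looks like. This is not a description of the Bacula Systems method; the 4 KiB block size and the `append_aligned` helper are assumptions made up for illustration.

```python
BLOCK = 4096  # assumed dedupe block size of the underlying filesystem

def padding_needed(offset: int, block: int = BLOCK) -> int:
    """Bytes of padding so that the next write starts on a block boundary."""
    return -offset % block

def append_aligned(volume: bytearray, metadata: bytes, payload: bytes) -> int:
    """Append metadata, pad, then append the payload on a block boundary.

    Returns the offset at which the payload starts.
    """
    volume += metadata
    volume += b"\0" * padding_needed(len(volume))
    start = len(volume)
    volume += payload
    volume += b"\0" * padding_needed(len(volume))
    return start

volume = bytearray()
first = append_aligned(volume, b"job-1 header", b"x" * 5000)
second = append_aligned(volume, b"job-2, a longer header", b"x" * 5000)
print(first, second)  # both offsets are multiples of BLOCK
```

Because both payloads start on a block boundary regardless of header length, identical file data produces identical blocks, at the cost of some padding.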

Regards

Andreas

igorkow...@gmail.com

Jan 14, 2015, 11:42:13 AM
to bareos...@googlegroups.com
Sorry for reviving this ancient thread.

On Monday, August 12, 2013 at 10:35:18 UTC+2, tonyalbers wrote:
> Hi guys,
>
> Now that we are seeing a huge growth in the number of systems that can deduplicate data, like DataDomain and Quantum DXi's, it would be nice if Bareos could support these.

This is a dead end. HW dedup systems are very expensive, and sooner or later they will be a very small niche. What we are seeing now is huge growth in software-defined storage: connect a lot of cheap commodity disks over a network, run some software on top, and you have very robust, elastic and fast storage. More and more backup systems are implementing inline deduplication and benefiting from it. With an external HW deduplicator you cannot deduplicate on the client, which is what saves a lot of network bandwidth.

> AFAIK bacula has a way of storing data on tape that is optimized so that the space is used effectively. This unfortunately makes the saved data bad for dedupe.

Well, if his HW deduper uses variable-size blocks with a sliding window, it should dedup Bareos volumes easily. If not, forget it and do inline deduplication instead.
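
To show what variable-size blocks with a sliding window buy you, here is a minimal sketch of content-defined chunking. It assumes a gear-style rolling hash, which is only an illustrative choice and not what any particular appliance uses; the parameters (`avg_bits`, `min_size`, `max_size`) are made up.

```python
import hashlib
import os

# Deterministic table of 256 pseudo-random 32-bit values, one per byte value.
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:4], "big")
        for i in range(256)]

def chunk(data: bytes, avg_bits: int = 12, min_size: int = 512,
          max_size: int = 65536) -> list:
    """Split data where the rolling hash of the last 32 bytes hits a pattern."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Older bytes are shifted out of the 32-bit hash: an implicit window.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and h >> (32 - avg_bits) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def fingerprints(data: bytes) -> set:
    return {hashlib.sha256(c).digest() for c in chunk(data)}

original = os.urandom(512 * 1024)
shifted = os.urandom(100) + original  # e.g. a header in front of the same data

a, b = fingerprints(original), fingerprints(shifted)
print(f"{len(a & b)} of {len(a)} chunks are shared despite the 100-byte shift")
```

The cut points depend on the content, not on the offset, so the chunker resynchronizes shortly after the inserted bytes. A fixed-block deduper would share no blocks at all here.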

> So, are there any plans for changing the code so that you can store the data exactly as it is streamed from the client?
> That is without any optimization at all? This would in theory make deduplication possible with Bareos.
>
> That would mean a huge advantage, since we could also use ZFS with dedupe turned on.

If you wait a while, you can check out the Bacula Enterprise Aligned Volumes Plugin, as described by Kern in his "Bacula status report":
(...)
4. The Enterprise Aligned Volume plugin (deduplication for zfs, btrfs,
NetApp, ... deduping filesystems) available free for the community.
(...)
I don't know what "available free for the community" means.

As I described above, this is a dead end. Don't waste your time developing it. Better to spend your development time on a real deduplication solution.

What I'm thinking of is real block-level storage defined on the Storage Daemon, with different storage backends and clustering (network mirroring and replication). Backends could be local filesystems, object stores (S3, etc.), remote filesystems, and so on, all connected together, self-balancing and distributed. Running out of storage space? Just add another storage node. Or better, turn some of the backup clients into storage nodes. Connecting everything together is the easiest way. If you ensure a proper level of automatic redundancy (something like RAID 5/6 based on nodes/data, not disks), you can survive some crashes too. All in software. :)
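
A rough sketch of the idea, just to make it concrete. All names here (`Backend`, `MemoryBackend`, `BlockStore`) are hypothetical and not part of Bareos. It uses fixed-size blocks and plain replication across nodes rather than RAID 5/6-style parity, and an in-memory dict stands in for a real filesystem, object store or remote node.

```python
import hashlib
from typing import Dict, List, Protocol

class Backend(Protocol):
    def put(self, key: str, block: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...
    def has(self, key: str) -> bool: ...

class MemoryBackend:
    """Stand-in for a local filesystem, object store or remote node."""
    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}
    def put(self, key: str, block: bytes) -> None:
        self.blocks[key] = block
    def get(self, key: str) -> bytes:
        return self.blocks[key]
    def has(self, key: str) -> bool:
        return key in self.blocks

class BlockStore:
    """Content-addressed block store that replicates each block to N nodes."""
    def __init__(self, backends: List[Backend], replicas: int = 2,
                 block_size: int = 4096) -> None:
        if not 1 <= replicas <= len(backends):
            raise ValueError("replicas must be between 1 and the node count")
        self.backends, self.replicas, self.block_size = backends, replicas, block_size

    def _targets(self, key: str) -> List[Backend]:
        # Pick `replicas` distinct nodes, starting at one derived from the key.
        first = int(key[:8], 16) % len(self.backends)
        return [self.backends[(first + i) % len(self.backends)]
                for i in range(self.replicas)]

    def write(self, data: bytes) -> List[str]:
        recipe = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            key = hashlib.sha256(block).hexdigest()
            for node in self._targets(key):
                if not node.has(key):  # inline dedup: store each block once
                    node.put(key, block)
            recipe.append(key)
        return recipe

    def read(self, recipe: List[str]) -> bytes:
        out = bytearray()
        for key in recipe:
            for node in self._targets(key):
                if node.has(key):
                    out += node.get(key)
                    break
            else:
                raise KeyError(f"block {key} lost on all replicas")
        return bytes(out)

nodes = [MemoryBackend() for _ in range(3)]
store = BlockStore(nodes, replicas=2)
data = b"A" * 4096 * 3 + b"B" * 4096
recipe = store.write(data)
nodes[0].blocks.clear()              # simulate one node crashing
print(store.read(recipe) == data)    # True: the other replica is still there
```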

I know it is not an easy task, especially clustering with self-healing and automatic redundancy. Right now I'm designing block-level storage with storage pools and replication; other features will come later.

best regards