i'd love to see actual results also, but based on my reading of the helpful
http://constantin.glez.de/blog/2011/07/zfs-dedupe-or-not-dedupe
i don't see that it would buy much, if anything, since you need *entire duplicate* blocks to get anything from zfs dedup.
with big disks and the resulting likely big block sizes there are unlikely to be duplicates of whole blocks in logs.
for example, repeating events would probably have different time stamps.
even if you wrote the same events to multiple different logfiles, the chance of the entire on-disk block being identical
is quite small for a long-ish message. (even for identical messages, the chance of a message aligning with a block
boundary in just the right way is only about 1/n for a block size of n, right?)
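here's a quick sketch of that alignment problem (python; sha-256 standing in for zfs's block checksum, a hypothetical 4k block size for illustration -- real zfs recordsize is often much larger, 128k by default -- and made-up log content):

```python
import hashlib

BLOCK = 4096  # hypothetical block size, just for illustration
msg = b"Oct 24 13:10:00 host snort[1234]: alert: example event\n" * 100

# same content in two "files", but the second is shifted by a few bytes
file_a = msg
file_b = b"padding\n" + msg

def block_hashes(data, size=BLOCK):
    # hash each fixed-size block, the granularity at which dedup operates
    return {hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)}

shared = block_hashes(file_a) & block_hashes(file_b)
print(len(shared))  # 0 -- no identical blocks, so dedup saves nothing
```

pad the second file out to a multiple of the block size instead, and the blocks suddenly line up and dedup -- which is exactly the alignment sensitivity i mean.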
on the other hand, zfs compression is likely to win because it can compress substrings such as the timestamp field and the
event field.
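a tiny illustration of why (python, with zlib standing in for lzjb and made-up log lines -- the point is just that varying timestamps defeat block dedup but not substring compression):

```python
import zlib

# repeated event text with a varying timestamp/counter: no two blocks are
# identical, but the redundant substrings compress very well
lines = [f"2012-10-24T13:10:{i % 60:02d} host snort[1234]: "
         f"alert: example event {i}\n" for i in range(1000)]
data = "".join(lines).encode()

compressed = zlib.compress(data, 6)
print(len(data), len(compressed))  # compressed is a small fraction of raw
```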
this article
http://don.blogs.smugmug.com/2008/10/13/zfs-mysqlinnodb-compression-update
shows substantial improvements in space, and sometimes even in time, on varying data using the default (LZJB)
compression, and compares the various compression modes on both space and time.
On Oct 24, 2012, at 1:10 PM, Jesse Endahl <jesse...@gmail.com> wrote:
> Yes, I think it would be incredibly useful, and I do imagine that there is enough duplication in log files to make it worth it. That said, you need to be careful about choosing the right hardware (specifically the right controller) if you want to get the most out of ZFS. If you go forward with it, please let me know what kind of de-dupe numbers you see.
>
> On Thursday, October 4, 2012 9:07:21 AM UTC-7, OrbitData wrote:
>> Since the new version allows for the installation of a custom Ubuntu, does anyone think running SO on ZFS with dedupe would be helpful? Is ZFS fast enough for logging and is there enough duplication in the snort logs to make the space/cost tradeoff worth it?
>>
>> Thanks.
>