an even faster lzo...


Emmanuel Anne

Apr 4, 2011, 7:08:54 AM
to zfs-...@googlegroups.com
I just added some code to initialize lzo before any operation is done, and to pre-allocate its work memory. The speed improvement is noticeable; it might even be possible to improve it further by also pre-allocating the buffer used to decompress the data, but that might be a waste of RAM. The same thing can be done for lzma, I'll try it later.
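For reference, a minimal sketch of the idea, assuming the standard liblzo2 API (compress_init and compress_block are made-up names, and this is not the actual zfs-fuse patch): call lzo_init() once at startup and reuse one pre-allocated work buffer instead of allocating LZO1X_1_MEM_COMPRESS bytes on every compression call.

/* Sketch only: one-time LZO init + pre-allocated work memory (liblzo2). */
#include <stdlib.h>
#include <lzo/lzo1x.h>

static lzo_voidp lzo_wrkmem;           /* allocated once, reused for every block */

int compress_init(void)
{
    if (lzo_init() != LZO_E_OK)        /* one-time library initialization */
        return -1;
    lzo_wrkmem = malloc(LZO1X_1_MEM_COMPRESS);
    return lzo_wrkmem ? 0 : -1;
}

/* dst must hold at least src_len + src_len/16 + 64 + 3 bytes,
 * liblzo2's documented worst case for incompressible input. */
int compress_block(unsigned char *src, size_t src_len,
                   unsigned char *dst, size_t *dst_len)
{
    lzo_uint out_len = 0;
    int r = lzo1x_1_compress(src, src_len, dst, &out_len, lzo_wrkmem);
    *dst_len = out_len;
    return (r == LZO_E_OK) ? 0 : -1;
}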

For info, I did some tests with my big 1 GB 7z file. Extracting it from an NTFS directory:
to the uncompressed zfs-fuse version from September 2010 (before the buffers): 9:54
to the uncompressed current zfs-fuse version: 9:35
to zfs-fuse compressed using lzo: 9:23
For comparison, extracting it to an ext4 directory takes 3:15. A little frustrating, but I guess it's on the right track!

--
my zfs-fuse git repository : http://rainemu.swishparty.co.uk/cgi-bin/gitweb.cgi?p=zfs;a=summary

Marcin Szychowski

Apr 5, 2011, 7:13:11 AM
to zfs-fuse
On Apr 4, 1:08 pm, Emmanuel Anne <emmanuel.a...@gmail.com> wrote:
> I just added some code to initialize lzo before any operation is done, and
> to pre-allocate its work memory. The speed improvement is noticeable, it
> might even be possible to improve it further by also pre-allocating the
> buffer to uncompress the data but it might be a waste of ram. The same thing
> can be done for lzma, I'll try it later.

Goooood! ;-)

>
> For info, I did some tests with my big 1 Gb 7z file. Uncompressing from a
> ntfs directory :
> to uncompressed zfs version from september 2010 (before the buffers) : 9:54
> to uncompressed current zfs-fuse version : 9:35
> to zfs-fuse compressed using lzo : 9:23
> For comparison, uncompressing it to an ext4 directory takes 3:15. A little
> frutrating, but I guess it's on a good way !

Btw.: does zfs-fuse use the high- or low-level fuse interface? Is there
room for improvement? Is this page: http://b.andre.pagesperso-orange.fr/fuse-interfaces.html
relevant to our issues? (As I said, I am not a C hacker and have no
idea of fuse (nor ZFS nor NTFS) internals.) ;-)

Keep up the good work!

Emmanuel Anne

Apr 5, 2011, 12:35:34 PM
to zfs-...@googlegroups.com
Thanks for the link, I didn't know about it. Yes, ntfs-3g has already been useful for its test suite on POSIX conformance. This page is a little outdated now (2.6.30), and this war has already been fought!

We use the 3rd case in their table, that is the low-level interface, kernel permissions (i.e. with the default_permissions parameter), and with the cache enabled.
It reminds me that if you don't care about permissions (which can be the case if you use zfs only for a data directory, or for your home), then you can try to just comment out the line with default_permissions in zfsrc. In that case permission checks become... lousy, and you won't gain a lot of speed (I ran it with my 1 GB 7z file, and gained 8 s on the 1st run and 4 s on the 2nd one, almost nothing).

No, the biggest source of slowdown is probably the fact that fuse stops too often to call user-level functions. For example, there was a fuss about "big writes" at one point, a command-line parameter which is now automatically passed to fuse if you use at least fuse-2.8. Well, all it does is make sure that write requests can be bigger than... 4 KB!!! Actually, even if you don't pass it, it's probably already enabled by default. You are back to the dark ages of computing with this 4 KB! And anyway requests are still cut at 128 KB, because that's the size of the buffer used to handle them. Quite a shame...
And I guess that on ext4 opendir/readdir/closedir is cached, and a change of attributes doesn't stop everything just to write the attribute out (it's just stored and updated later).
These things are very hard to change; that's what zfs on linux aims to do, but the road is long. Not sure what we can do about it for now.
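To make the big_writes point concrete, here is a minimal sketch (not the actual zfs-fuse mount code) of how a FUSE 2.8 filesystem would pass those options through the low-level interface; big_writes, max_write=131072 (the 128 KB ceiling mentioned above) and default_permissions are real FUSE mount options, the surrounding code is only illustrative.

/* Sketch only: requesting large writes and kernel permission checks
 * through standard FUSE 2.8 mount options. */
#define FUSE_USE_VERSION 26
#include <fuse/fuse_lowlevel.h>
#include <fuse/fuse_opt.h>
#include <stdio.h>

int main(void)
{
    struct fuse_args args = FUSE_ARGS_INIT(0, NULL);

    fuse_opt_add_arg(&args, "zfs-fuse");   /* argv[0] placeholder */
    /* big_writes: allow write requests larger than 4 KB;
     * max_write:  cap them at 128 KB per request;
     * default_permissions: let the kernel do the POSIX permission checks. */
    fuse_opt_add_arg(&args, "-obig_writes,max_write=131072,default_permissions");

    /* A real filesystem would now pass &args to fuse_mount() and
     * fuse_lowlevel_new() together with its operations table. */
    for (int i = 0; i < args.argc; i++)
        printf("%s\n", args.argv[i]);

    fuse_opt_free_args(&args);
    return 0;
}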

2011/4/5 Marcin Szychowski <szy...@gmail.com>

Marcin Szychowski

Apr 5, 2011, 10:15:56 PM
to zfs-fuse
I have spent some time testing another fuse filesystem, called
lessfs. It is a database-backed, data-deduplicating solution,
exposed to the user as a fully POSIX-compliant filesystem. I
remember the 4k-block thing: it required a 'recent' kernel (2.6.28?
.29?) and fuse in order to use bigger blocks. At first I tried to
use what I had and cope with 4k blocks. Well... I couldn't even get
near a limit of 1 megabyte per second when writing to it. After a
kernel upgrade, I switched to 128k blocks, which boosted all
operations by a ratio of circa 128/4 = 32 times. Lessfs' internal
block size depends on fuse's max_read/max_write parameter, which is
not the case with ZFS as I understand it.

There is, however, a big difference in deduplication philosophy
between them. As I understand it, ZFS takes a block of data,
compresses it, checks whether it gained enough from compression,
decides which version (compressed or not) to store, computes a
checksum and then decides whether to write the block or just a
reference (correct me if I am wrong). A different compression
method or a different checksumming algorithm therefore influences
the deduplication process/ratio.

Lessfs first computes a checksum of the arriving data block,
decides whether it already has it or not, and then either writes a
reference or compresses and stores the block, always in compressed
form (that's my guess). That's why writing subsequent copies of
e.g. an iso image is circa 4 times faster than the first copy, and
can reach speeds beyond the nominal raw device capabilities.
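A rough sketch of the two write paths as I read the two paragraphs above (all helper names are hypothetical; this is not actual ZFS or lessfs code):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder stubs so the sketch is self-contained; a real implementation
 * would call the actual compressor, hash function and dedup table. */
static size_t try_compress(const void *src, size_t len, void *dst) { (void)src; (void)dst; return len; }
static void   checksum(const void *buf, size_t len, uint64_t out[4]) { (void)buf; out[0] = out[1] = out[2] = out[3] = (uint64_t)len; }
static bool   dedup_lookup(const uint64_t ck[4]) { (void)ck; return false; }
static void   write_block(const void *buf, size_t len) { (void)buf; (void)len; }
static void   write_reference(const uint64_t ck[4]) { (void)ck; }

/* ZFS-style, as described above: compress first, checksum what will actually
 * hit the disk, then consult the dedup table. */
void zfs_style_write(const void *data, size_t len, void *scratch)
{
    const void *payload = data;
    size_t plen = len;
    size_t clen = try_compress(data, len, scratch);
    if (clen < len) {                 /* "gained enough" from compression */
        payload = scratch;
        plen = clen;
    }
    uint64_t ck[4];
    checksum(payload, plen, ck);      /* checksum of the stored (compressed) form */
    if (dedup_lookup(ck))
        write_reference(ck);
    else
        write_block(payload, plen);
}

/* lessfs-style, as described above: checksum the raw data first, and only pay
 * for compression when the block turns out to be new. */
void lessfs_style_write(const void *data, size_t len, void *scratch)
{
    uint64_t ck[4];
    checksum(data, len, ck);          /* checksum of the raw data */
    if (dedup_lookup(ck)) {
        write_reference(ck);
        return;
    }
    size_t clen = try_compress(data, len, scratch);
    write_block(scratch, clen);       /* always stored compressed (per the guess above) */
}

With the second ordering, a duplicate block costs one checksum instead of one compression plus one checksum, which is exactly why the subsequent iso copies go so much faster.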

If the two paragraphs above are true, then with ZFS and dedup one
can waste significant amounts of CPU power, right? Suppose you copy
a file between two zfs filesystems: the file gets reassembled
(de-deduplicated), possibly decompressed, and compressed again, and
only after checksumming it (suddenly) turns out there is no need to
write anything. With lessfs, you save the compression step: the
file needs to be decompressed, reassembled and checksummed, and
after that it is already clear that compression is needless. It
also becomes clear that one should choose lightweight compression
methods, such as lzjb/lzo. In this light, what you did for lzo and
deduplication is hard to overvalue! Thank you once again.

I'll double-check that max_read/max_write fuse parameter. Btw.
zfs-fuse can easily reach '1550%' CPU usage on an 8-core,
HT-enabled system, performing better than in-kernel XFS on my
laptop ;-)

--
Goodnight... err, good morning,
Marcin.

Emmanuel Anne

Apr 6, 2011, 4:29:08 AM
to zfs-...@googlegroups.com
Yes, that's also what wikipedia says:
http://en.wikipedia.org/wiki/ZFS#Encryption

If it's true, then I would call that a bug: it's quite stupid to compute the checksum after compression/encryption, it should be done before that.
This should be verified first, and then we could possibly change it in zfs-fuse.

2011/4/6 Marcin Szychowski <szy...@gmail.com>

sgheeren

Apr 6, 2011, 5:37:23 AM
to zfs-...@googlegroups.com
On 04/06/2011 10:29 AM, Emmanuel Anne wrote:
> Yes that's also what wikipedia says :
> http://en.wikipedia.org/wiki/ZFS#Encryption
>
> If it's true, then I would call that a bug, it's quite stupid to
> compute the checksum after compression/encryption, it should be done
> before that.
> It should be checked first, and we should eventually change that in
> zfs-fuse.
>

Always the subtle analysis guy :) You might want to think that over just
once (especially regarding the encryption bit)

...

Emmanuel Anne

Apr 6, 2011, 5:58:24 AM
to zfs-...@googlegroups.com
Yeah, I might have been a little on edge after the 2 bugs of yesterday, but even after thinking it over (the longer version!):
it's probably because they wanted to share the checksums used to check the validity of the data on disk with the ones used for dedup.
Well, if all it takes to get the more natural order is to add a new field for a checksum computed before compression/encryption, it should be possible and is worth testing (the checksum is not what takes the most time in dedup, far from it!). But maybe it's hard to do, otherwise they would probably have done it already. Anyway it's probably worth investigating.
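To make that concrete, here is a purely illustrative sketch of the proposal (hypothetical names and layout, nothing like the real ZFS block pointer or dedup table): keep the existing post-compression checksum for on-disk integrity, and add a second checksum taken over the raw data to use as the dedup key.

#include <stdint.h>

/* Hypothetical illustration of the idea above -- not actual ZFS structures. */
typedef struct dedup_entry {
    uint64_t raw_cksum[4];    /* checksum of the uncompressed, unencrypted data: dedup key */
    uint64_t disk_cksum[4];   /* checksum of the block as written to disk: integrity check */
    uint64_t block_addr;      /* where the existing copy lives on disk                     */
    uint8_t  compress_alg;    /* how that existing copy happens to be compressed           */
} dedup_entry_t;

With the dedup key computed on the raw data, two writes of the same content would dedup against each other even if one copy were stored with lzjb and the other with lzo, at the cost of one extra checksum per block.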

For encryption, it's not an issue. I know we don't have it, mainly because there are other ways to encrypt filesystems on linux. Now if one day we wanted to add encryption, it would be done the same way compression is done (according to wikipedia), so it just means adding some more callbacks to zio on the same model as compression. Doesn't sound too complicated. But for now I'd say it's useless.

2011/4/6 sgheeren <sghe...@hotmail.com>


sgheeren

Apr 6, 2011, 7:29:59 AM
to zfs-...@googlegroups.com
On 04/06/2011 11:37 AM, sgheeren wrote:
> Always the subtle analysis guy :) You might want to think that over just
> once (especially regarding the encryption bit)
>
Ok, I'll elaborate (as for the longer version of thinking about it):

Encrypted content is not supposed to be 'known', so by definition
it cannot be deduped before encryption. And it wouldn't exactly
make sense to dedup an identical block of content against one that
already exists in the encrypted case, because

(a) the block will (possibly) have another encryption key
(b) it would be possible to detect the presence of a file in
encrypted space (by simply adding the same file elsewhere and
seeing whether it gets deduplicated). That would be quite the
security hole in, say, VPS environments... not to mention the
privacy implications

Similarly, would you be willing to 'dedup' against an existing
compressed block even if that block used a different compression
algorithm than the one the current 'active' filesystem mandates?

Also, would you need to rewrite the dedup master blocks as
compressed as soon as a single write is done in compression mode?
How would this behave in the case of dedup=on (or more specific)
and copies=5? Would all copies have to have the same compression /
have to be rewritten?

ZFS is versatile and combines several layers of the classic
storage stack. This means that many more subtle combinations can
arise, precisely _because_ the layers are not isolated. Probably
because of that, some 'counterintuitive' restrictions do apply here
and there that you would label 'stupid' at first sight.

That said, the encryption scenario is exactly the same with
dm-crypt; if you ran any deduping FS on dm-crypt, obviously the
encryption happens at the lowest level, i.e. _after_ deduplication.
But then it would simply be impossible to _also_ store blocks
without encryption (or with different encryption) anywhere on the
volume; I prefer the zpool way of working by a mile or ten!


Emmanuel Anne

Apr 6, 2011, 7:49:38 AM
to zfs-...@googlegroups.com
2011/4/6 sgheeren <sghe...@hotmail.com>

On 04/06/2011 11:37 AM, sgheeren wrote:
> Always the subtle analysis guy :) You might want to think that over just
> once (especially regarding the encryption bit)
>
Ok, I'll elaborate (as for the longer version of thinking about it):

encrypted contents is not supposed to be 'known' so it cannot by
definition be deduped before encryption. It wouldn't exactly make sense
really dedup an identical block of content when it already exists in the
case of encryption, because

 (a) the block will (possibly) have another encryption key
 (b) it would be possible to detect the presence of a file in encrypted
space (by simply adding the same file elsewhere and seeing whether it
gets deduplicated). This would be quite the security hole in say VPS
environments... not to mention the privacy implications

Ok for encryption... But anyway, for this to work the attacker would need to have a decrypted version of the files. And in that case encryption becomes mostly useless, since he already has the decrypted version! (except for testing the presence of a given file, which is really a very specific situation).
So all in all it's not a big risk.

Similarly, would you be willing to 'dedup' with an existing compressed
block even if the existing block used another compression algorithm than
current 'active' filesystem mandates?

Yes, because you can mix many compression algorithms, even in the same file. So if for some reason you are running some tests, or just want some part of a disk to be better compressed, you wouldn't want to lose dedup because of that.

Also, would you need to rewrite the dedup master blocks to be compressed
as soon as a single write is done in compressio mode? How would this
behave in the case of dedup=on (or more specific) and copies=5? Would
all copies have to have the same compression/have to be rewritten

?
?

Nope and nope, it's just for reading, not for writing.
The idea: it tries to write a compressed block, notices the data already exists in another form (uncompressed, or compressed with another method), and then simply dedups against the version which already exists, that's all. No rewriting of anything is needed.


sgheeren

Apr 6, 2011, 8:04:51 AM
to zfs-...@googlegroups.com
On 04/06/2011 01:49 PM, Emmanuel Anne wrote:
Ok for encryption... But anyway for this to work you would need to have an decrypted version of the files. In this case, encryption becomes mainly useless if you already have the decrypted version ! (except for testing the presence of a given file, it's really a very specific situation).
So all in all it's not a big risk.

I'm sure there are many many people who would love to disagree with you here.

If my life depended on it, I wouldn't want people to be able to find out, just by 'dedup counting', exactly which version of ssh is installed on my system... And I wouldn't exactly be thrilled to find that a malicious government was able to detect the presence of forbidden media on my encrypted volumes...

In the VPS situation, one might devise an attack to find out whether your host runs VPSes that still have the default configuration for, say, passwd, the email server, the database server, etc., and then target those hosts. Or, as in SQL injection attacks (specifically, timing of error responses), it could be used to externally check whether an attack over another attack vector had the intended result (by checking for the existence of a resulting special block on disk).

The risk in this type of attack has been published before. For reference, I just dug up this quote from the IBM Tivoli whitepapers:

One drawback of data deduplication is that it cannot be used effectively with client-side encryption. For example, encrypted data does not deduplicate well (new extents are not likely to match extents stored on the server), and encrypting data after deduplication requires all users to share the same encryption key. However, you can secure data while using deduplication.
http://www.ibm.com/developerworks/wikis/display/tivolistoragemanager/Data+deduplication+in+Tivoli+Storage+Manager+V6.2+and+V6.1#DatadeduplicationinTivoliStorageManagerV6.2andV6.1-Security

HTH

sgheeren

Apr 6, 2011, 12:02:42 PM
to zfs-...@googlegroups.com

Nope and nope, it's just for reading, not for writing.
The idea : it tries to write a compressed block, then notices it already exists in another form (uncompressed or compressed with another method), then it just uses dedup to the version which already exists, that's all. No rewriting needed of anything.

That's what I think too. It is actually 'decidable', but a lot more involved than you made it seem by stating it was 'stupid' :)
The same effects happen now once you alter the 'copies' or 'compression' properties (but: you can't change the casesensitivity from sensitive to insensitive after the fact... :)), so it would be acceptable to just continue working with the new settings from then on.


Emmanuel Anne

Apr 6, 2011, 12:40:08 PM
to zfs-...@googlegroups.com
2011/4/6 sgheeren <sghe...@hotmail.com>

On 04/06/2011 01:49 PM, Emmanuel Anne wrote:
Ok for encryption... But anyway for this to work you would need to have an decrypted version of the files. In this case, encryption becomes mainly useless if you already have the decrypted version ! (except for testing the presence of a given file, it's really a very specific situation).
So all in all it's not a big risk.

I'm sure there are many many people who would love to disagree with you here.

If my life depended on it, I wouldn't like for people to be able to just by 'dedup-counting' be able to find out exactly which version of ssh is installed on my system... And I wouldn't exactly be thrilled to find that a malicious government was able to detect the presence of forbidden media on my encrypted volumes...

Ah ah! This is getting away from the subject, or from the usefulness of encryption here:
 - for ssh: well, if someone really wants to run an ssh attack against you, he won't ask for your version, he will try all the known vulnerabilities. Guessing your version won't help a lot at this stage, unless it's very old, which is unlikely knowing you!
 - for governments: refuse to give up your encryption key and go to jail (I don't think they have killed anyone for that yet, but they could in some countries!). You have to be much more subtle than that if you want to stand a chance.

In the VPS situation, one might devise an attack to find whether your host hosts VPS-es that still have the default configuration for, say, passwd, emailserver, database server, etc. and be able to target those hosts. Or like in SQL injection (specifically timing error responses) attacks, it could be used to externally check whether an attack over another attack vector had the intended result (by checking the existence of a resulting special block on disk).

The risk in this type of attack has been published before. For reference I just drug up this quote from the IBM Tivoli whitepapers:

One drawback of data deduplication is that it cannot be used effectively with client-side encryption. For example, encrypted data does not deduplicate well (new extents are not likely to match extents stored on the server), and encrypting data after deduplication requires all users to share the same encryption key. However, you can secure data while using deduplication.
http://www.ibm.com/developerworks/wikis/display/tivolistoragemanager/Data+deduplication+in+Tivoli+Storage+Manager+V6.2+and+V6.1#DatadeduplicationinTivoliStorageManagerV6.2andV6.1-Security

Encryption after deduplication forces all users to share the same encryption key???
It must be in a different context; there is nothing forcing that here.

I remind you I said ok for encryption! You could make this an option, but it would probably be too much of a hassle, so if the paranoid say it's better this way, then leave it this way and avoid dedup if you can (in most situations it can be avoided anyway!).

HTH


sgheeren

Apr 6, 2011, 1:41:17 PM
to zfs-...@googlegroups.com
On 04/06/2011 06:40 PM, Emmanuel Anne wrote:
Encryption after deduplication forces all users to share the same encryption key ???
It must be in a different context, there is nothing to force that here.


hold that thought (A)


I remind you I said ok for encryption ! You could make this an option, but it would probably be too much of a hassle, so if paranoids say it's better this way, then leave it this way and avoid dedup if you can (in most situations it can be avoided anyway !).

Thanks for the general ack there, but I'd rather you think this over for a while, until you at least understand how (A) is _not_ in a different context. That was my main point all along.

Avoiding dedup is my +1 :)