On Mon, Nov 23, 2009 at 11:57 PM, ijuma <ism...@juma.me.uk> wrote:
> Hey Tatu,
>
> Thanks for the update.
>
> On Nov 24, 7:42 am, Tatu Saloranta <tsalora...@gmail.com> wrote:
>> With defaults, for example, LZF compression was slightly slower than gzip, but when adjusting things it
>> became much faster.
>
> Any suggestions for changes we should make?
I have not looked at how easy this would be to do, but if the caller
could reuse the byte arrays being passed in, it would help with
small/medium-sized content (say, 4k or less).
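Roughly what I have in mind -- just a sketch, and the compress()
signature here is made up, not the current API:

    import java.io.*;

    public class ReuseSketch {
        // Hypothetical reuse-friendly signature, not the current API
        interface BlockCompressor {
            // returns number of bytes written into 'out'
            int compress(byte[] in, int inOff, int inLen, byte[] out, int outOff);
        }

        static void copy(InputStream src, OutputStream dst, BlockCompressor c)
                throws IOException {
            // caller-owned buffers, allocated once and reused per block,
            // instead of the compressor allocating a fresh byte[] each time
            byte[] inBuf = new byte[4096];
            byte[] outBuf = new byte[4096 + 256]; // headroom for worst-case expansion
            int n;
            while ((n = src.read(inBuf)) > 0) {
                int outLen = c.compress(inBuf, 0, n, outBuf, 0);
                dst.write(outBuf, 0, outLen);
            }
        }
    }

For small blocks this should cut a good chunk of per-call garbage.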
Another possibility would be to allow the caller to specify the hash
table size, which defaults to 1 << 13 entries (8k ints, i.e. 32kB).
Since this table needs to be allocated or cleared for each block, it
is kind of big for small content. At least allowing it to be reduced
to 4kB would seem sensible.
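To illustrate -- a hypothetical constructor knob, not the real class:

    // Hypothetical; the real code hard-codes the equivalent of hashLog = 13
    public final class EncoderSketch {
        private final int[] hashTable;

        public EncoderSketch(int hashLog) {
            // hashLog 13 -> 8k entries == 32kB; 10 -> 1k entries == 4kB
            this.hashTable = new int[1 << hashLog];
        }
    }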
A simple benchmark could help to test variations out. And I think it
should also test gzip as a baseline. One could then just give it a
file or files to test, and see how the settings fare.
Perhaps with such a tool one could manually find good default
settings; and users could also run it on their own data if they care
enough?
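Something along these lines, say -- the gzip side is plain JDK, and
the LZF side is left as a comment since I don't remember the exact
entry point offhand:

    import java.io.*;
    import java.util.zip.GZIPOutputStream;

    public class CompressBench {
        public static void main(String[] args) throws IOException {
            byte[] data = readAll(new File(args[0]));
            // several rounds: the first ones double as JIT warmup
            for (int round = 0; round < 5; round++) {
                long t0 = System.nanoTime();
                byte[] gz = gzip(data);
                long t1 = System.nanoTime();
                System.out.printf("gzip: %d -> %d bytes, %.1f ms%n",
                        data.length, gz.length, (t1 - t0) / 1e6);
                // same timing wrapped around the LZF call would go here
            }
        }

        static byte[] gzip(byte[] in) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            GZIPOutputStream gs = new GZIPOutputStream(bos);
            gs.write(in);
            gs.close();
            return bos.toByteArray();
        }

        static byte[] readAll(File f) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            InputStream in = new FileInputStream(f);
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) > 0) {
                bos.write(buf, 0, n);
            }
            in.close();
            return bos.toByteArray();
        }
    }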
Also: one thing I noticed is that the decompression part does not
actually depend on any state in the compressor object. The method
could thus be made static (or copied to LZFInputStream), to avoid
whatever allocations the compressor does (I forget if it does those
eagerly).
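From memory of the C code, block decoding is simple enough that a
fully static version could look about like this (untested sketch; the
signature is illustrative, but the format handling should match
liblzf):

    // Static: no instance state needed; returns bytes written to 'out'
    public static int decompress(byte[] in, int inPos, int inEnd,
                                 byte[] out, int outPos) {
        while (inPos < inEnd) {
            int ctrl = in[inPos++] & 0xff;
            if (ctrl < 32) {
                // literal run of ctrl+1 bytes
                int len = ctrl + 1;
                System.arraycopy(in, inPos, out, outPos, len);
                inPos += len;
                outPos += len;
            } else {
                // back-reference: length in top 3 bits (7 means extended)
                int len = ctrl >> 5;
                if (len == 7) {
                    len += in[inPos++] & 0xff;
                }
                len += 2; // minimum match length is 3
                int ref = outPos - (((ctrl & 0x1f) << 8) + (in[inPos++] & 0xff)) - 1;
                // byte-by-byte copy since source and target may overlap
                while (len-- > 0) {
                    out[outPos++] = out[ref++];
                }
            }
        }
        return outPos;
    }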
>> ps. One thing I still haven't figured out is whether the 4-byte magic
>> cookie (header) being used is "standard" for LZF, or just one that H2
>> project uses. For interoperability purposes it'd be good to know for
>> sure.
>
> Good point.
I will build the LZF library on my Linux box and see what it uses, if
anything. I hope it does use something -- the main reason for H2 to
use their own marker would be if the core lib did not use anything
(there is a disclaimer saying the wrapper that comes with the lib is
not really mature).
-+ Tatu +-