--
You received this message because you are subscribed to the Google Groups "blosc" group.
To unsubscribe from this group and stop receiving emails from it, send an email to blosc+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/blosc/023c99c6-12b1-43af-a448-482912b81390%40googlegroups.com.
Francesc,

Thank you for your reply, and I appreciate the warning about the beta status. I have observed BLOSCLZ to be really quite fast, so perhaps that shouldn't surprise me. I've been enjoying using this library, along with caterva, to store multi-dimensional numeric data. I wonder if you could answer a few follow-up questions too:

1) I had assumed that bitshuffle on arrays with typesize=1 would take the first bit from 8 elements to fill the first shuffled byte, then the second bit from the same 8 elements, and so on, working on blocks of 8*typesize bytes at a time. When I tried doing this myself prior to blosc2, the resulting compression ratio was not as good as when I use blosc2's bitshuffle. Can you describe what bitshuffle is actually doing? Does it operate on a whole page or chunk at once? I didn't see it documented, and it seems important if I'm constructing a filter pipeline. I'm also curious what happens when bitshuffle and byteshuffle are both enabled.
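For concreteness, here is a sketch of the bit transpose I had assumed (my own interpretation, not necessarily what blosc2's bitshuffle actually implements): within each block of 8 bytes, output byte k collects bit k from each of the 8 input bytes.

```python
def bitshuffle_typesize1(data: bytes) -> bytes:
    """Naive bit transpose for typesize=1, operating on 8-byte blocks.

    Output byte k of each block holds bit k of all 8 input bytes in
    that block (the bit from input byte j lands at position j).
    This is the transform I assumed; blosc2 may do something different.
    """
    assert len(data) % 8 == 0, "pad input to a multiple of 8 bytes"
    out = bytearray(len(data))
    for base in range(0, len(data), 8):
        for k in range(8):          # which bit plane we are gathering
            b = 0
            for j in range(8):      # which input byte in the block
                b |= ((data[base + j] >> k) & 1) << j
            out[base + k] = b
    return bytes(out)
```

For example, eight bytes that all equal 0x01 have only bit plane 0 populated, so the transposed block is one 0xFF byte followed by seven zero bytes, which is exactly the kind of run a compressor likes.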
2) My real data is typically random-ish signed integers with a mean of 0 and fewer significant bits than the full bit depth. Because of the usual two's-complement encoding, the MSBs often flip between 1s and 0s, which prevents any of the compression algorithms from achieving decent compression even after a bitshuffle. Looking at the filter pipeline options, I couldn't think of a way to make this data more attractive to the compressor. If I convert my input to a different signed representation (such as protobuf's zig-zag encoding, or a sign-magnitude encoding), then compression succeeds once again, since many of the MSBs are now the same bit. Would you consider that out of scope for a type-unaware compression library? Or, since blosc2 (and caterva) seem targeted at binary data, might it be acceptable to propose a new filter that accomplishes something like this?
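To illustrate what I mean by zig-zag encoding (this is the standard protobuf mapping, sketched here for 32-bit values, not an existing blosc2 filter): small-magnitude signed values of either sign map to small unsigned values, so their high bits become uniformly zero.

```python
def zigzag_encode(n: int, bits: int = 32) -> int:
    """Map a signed integer to an unsigned one: 0->0, -1->1, 1->2, -2->3, ...

    Small magnitudes of either sign produce small codes, so the MSBs of
    near-zero data are all zero instead of flipping with the sign bit.
    """
    mask = (1 << bits) - 1
    return ((n << 1) ^ (n >> (bits - 1))) & mask

def zigzag_decode(z: int) -> int:
    """Inverse of zigzag_encode."""
    return (z >> 1) ^ -(z & 1)
```

After this transform, a bitshuffle groups those constant-zero high bit planes together, which is what lets the compressor succeed.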