Compression with both input and output as streams


second arc

Dec 12, 2020, 5:45:40 PM12/12/20
to LZ4c
Greetings,

Earlier I used block compression. For example, the input was a 64K block and the output an allocated 64K buffer, so this was easy.

Now the same 64K could be split into multiple blocks, let's say four 16K buffers. This also seems easy using the streaming API.
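To check my understanding of that easy case, here is a stubbed sketch of four 16K chunks going into one contiguous output buffer. The stub just copies bytes 1:1 so the example runs without liblz4; with the real library the call would be LZ4_compress_fast_continue(), and the streaming API additionally expects earlier input blocks to stay readable at their original addresses. All names here are my own:

```c
#include <string.h>

#define NCHUNKS 4
#define CHUNK   16384

/* Stand-in for LZ4_compress_fast_continue(): the real call returns the
 * number of compressed bytes written for this block.  Here we just copy
 * 1:1 so the sketch is self-contained without liblz4. */
static int fake_compress_continue(const char* src, char* dst, int srcSize)
{
    memcpy(dst, src, (size_t)srcSize);
    return srcSize;
}

/* Compress four 16K chunks back to back into one output buffer,
 * recording each block's compressed size so a decoder can split them. */
static int compress_chunks(const char* input, char* output, int blockSizes[NCHUNKS])
{
    int outPos = 0;
    for (int i = 0; i < NCHUNKS; i++) {
        int n = fake_compress_continue(input + i*CHUNK, output + outPos, CHUNK);
        blockSizes[i] = n;
        outPos += n;
    }
    return outPos;  /* total compressed size */
}
```

One thing this made clear to me: each block's compressed size has to be recorded somewhere (here blockSizes[]), since the decompressor needs to know where one block ends and the next begins.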

But now suppose the output is also four 16K buffers instead of a single 64K buffer. How do we do this? I think the streaming API would still work, and something similar to LZ4_compress_destSize_extState() is needed. LZ4_compress_destSize_extState() itself would have worked, except that it calls LZ4_initStream().

Would this be correct? (Note the stream is initialized once for each 64K block.)

/* Same as LZ4_compress_destSize_extState() in lz4.c, minus the LZ4_initStream() call */
static int LZ4_compress_destSize_extState_noinit (LZ4_stream_t* state, const char* src, char* dst, int* srcSizePtr, int targetDstSize)
{
    if (targetDstSize >= LZ4_compressBound(*srcSizePtr)) {  /* compression success is guaranteed */
        return LZ4_compress_fast_extState(state, src, dst, *srcSizePtr, targetDstSize, 1);
    } else {
        if (*srcSizePtr < LZ4_64Klimit) {
            return LZ4_compress_generic(&state->internal_donotuse, src, dst, *srcSizePtr, srcSizePtr, targetDstSize, fillOutput, byU16, noDict, noDictIssue, 1);
        } else {
            tableType_t const addrMode = ((sizeof(void*)==4) && ((uptrval)src > LZ4_DISTANCE_MAX)) ? byPtr : byU32;
            return LZ4_compress_generic(&state->internal_donotuse, src, dst, *srcSizePtr, srcSizePtr, targetDstSize, fillOutput, addrMode, noDict, noDictIssue, 1);
    }   }
}

Now we call LZ4_compress_destSize_extState_noinit() until either the input or the output is exhausted. The targetDstSize will be the available space in the current 16K output chunk.

For example, suppose that after compressing two 16K input chunks only the first output chunk has been used, and only up to 14K, leaving 2K of targetDstSize in it.
The next 16K input chunk is most likely not going to fit in that 2K. Say 4K of the input compresses into the remaining 2K; we then move on to the next 16K output buffer.
So the next call will have
a srcSizePtr of 12K and a targetDstSize of 16K.
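To make that accounting concrete, here is a self-contained sketch of the driver loop I have in mind. The stub pretends everything compresses at exactly 2:1; with the real library the call would be the LZ4_compress_destSize_extState_noinit() above, and all names and the ratio here are just for illustration:

```c
#define CHUNK 16384

/* Stand-in for the proposed LZ4_compress_destSize_extState_noinit():
 * pretends data compresses at exactly 2:1.  Same contract shape as
 * LZ4_compress_destSize(): on return *srcSizePtr holds the number of
 * source bytes consumed; the return value is bytes written to dst. */
static int stub_compress_destSize(int* srcSizePtr, int targetDstSize)
{
    int consumed = *srcSizePtr;
    if (consumed / 2 > targetDstSize) consumed = targetDstSize * 2;
    *srcSizePtr = consumed;
    return consumed / 2;
}

/* Drive four 16K input chunks through a sequence of 16K output chunks.
 * On return, *finalChunk is the index of the last output chunk touched
 * and *finalUsed the number of bytes used in it. */
static void drive_chunks(int* finalChunk, int* finalUsed)
{
    int outChunk = 0, outUsed = 0;
    for (int i = 0; i < 4; i++) {
        int remaining = CHUNK;                 /* one 16K input chunk */
        while (remaining > 0) {
            if (outUsed == CHUNK) { outChunk++; outUsed = 0; }  /* fresh output chunk */
            int srcSize = remaining;
            int written = stub_compress_destSize(&srcSize, CHUNK - outUsed);
            remaining -= srcSize;              /* srcSize now = bytes consumed */
            outUsed += written;
            if (srcSize == 0) {                /* nothing fit: advance the output chunk */
                outChunk++; outUsed = 0;
            }
        }
    }
    *finalChunk = outChunk;
    *finalUsed = outUsed;
}
```

With the 2:1 stub, the four 16K inputs fill exactly two 16K output chunks, ending in chunk index 1 with all 16384 bytes used.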

I still have a lot of reading to do, but does this seem like a reasonable approach? Also, I have to understand whether decompression will have any issues.

Thanks