Using LZ4HC algorithm in an ARM Cortex-M3 processor with limited SRAM


Mosi

unread,
Jan 5, 2017, 5:41:54 AM1/5/17
to LZ4c

I have an embedded system with a flash memory on the board to store a large amount of data. The main controller is an ARM Cortex-M3 processor, and I'm supposed to compress the data placed in one part of the flash and put the compressed data in another part of the flash.


Now, since the amount of SRAM is limited in these kinds of systems, how exactly can I use the LZ4HC algorithm?

Obviously I can't compress all the data at once like we do on a PC, so I guess I have to do this on small chunks of data, or "block by block" (for example, every 512 or 4096 bytes of data). I'm just not sure how. I couldn't fully understand the functions.

Is it even possible to do this block by block?
I couldn't find any example, and the open-source code does not come with good documentation. Actually, I think there is no documentation.

Cyan

unread,
Jan 5, 2017, 10:18:21 AM1/5/17
to LZ4c
How limited is the SRAM budget?
This can make a large difference.

The smallest memory footprint can be achieved this way:
- Don't use the HC version; use the regular "fast" compression algorithm
- Modify LZ4_MEMORY_USAGE to a value which is compatible with your memory budget
- Cut your input into small blocks, whose size will depend on your memory budget
- If you want to keep it simple, just call LZ4_compress_default() repetitively on each successive block,
  flushing the output before starting a new block.

For better compression ratios:
- Use the double-buffer strategy.
  So if you have a memory budget of ~12 KB, for example, define 3 buffers:
  input1 (4 KB), input2 (4 KB), and output (LZ4_compressBound(4 KB)).
  Alternate input between input1 and input2.
- Use LZ4_compress_fast_continue() on successive blocks.
  It's a more complex API, but it delivers higher compression ratios.

Mosi

unread,
Jan 5, 2017, 3:06:37 PM1/5/17
to LZ4c
Well, the processor has 96 KB of SRAM in total, but this is a pretty big project and most of the SRAM is used, so I would say there is maybe something like 10 KB available, more or less. Maybe more, but I can't check for sure right now because I don't have the board.
- Does this mean I'm still better off using the regular fast mode than the HC mode? Because I kind of insist on using the HC mode.


Well, about the HC version I have a question first:
Does it matter to the decompression algorithm whether it's the HC version or the regular fast one?
Because in the system specification they clearly asked for HC, but I'm not sure whether they would be able to detect if the data has been compressed using the HC version or not. I thought about this too, so if worse comes to worst I may be able to trick them this way. I'm aware of the fact that the output size is different in each mode, but I mean besides that. So does the other side (the decompressor) know?
I mean, are there two completely different methods to decompress HC and regular fast output?
(It says on GitHub:
"For more compression at the cost of compression speed, the High Compression variant lz4hc is available. It's necessary to add lz4hc.c and lz4hc.h. The variant still depends on regular lz4 source files. In particular, the decompression is still provided by lz4.c.")


So I figured out that LZ4_compress_default() uses LZ4_compress_fast().
- What do you mean by "flush the output before starting a new block"?

- I'm sort of confused. When it comes to blocks, do I have to use LZ4_compress_fast() or LZ4_compress_fast_continue()? What are the differences?
- Does the "continue" at the end mean it's meant for blocks of data? If that's the case, then why would you say to use LZ4_compress_default() repetitively to keep it simple?
The more important question: if I use LZ4_compress_default() repetitively and save the output to flash (appending it to the previously generated outputs), is the decompressor still able to decompress it correctly? Because the outputs are appended to each other.

Cyan

unread,
Jan 5, 2017, 4:50:36 PM1/5/17
to LZ4c
That's a lot of questions.

> the processor has 96 KB of SRAM in total, but this is a pretty big project and most of the SRAM is used, so I would say there is maybe something like 10 KB

That's really very small.
The capabilities of the algorithm will be impacted.
That being said, you can make the "fast" version work within this budget.
You'll need, at a minimum:
- a memory segment for the compression table (LZ4_MEMORY_USAGE)
- an input buffer
- an output buffer

Try different mixes, and keep the best one for your use case.

The LZ4_HC algorithm requires a lot more memory.
Though it's possible to create a "lower memory" version, this is fairly advanced stuff,
so I wouldn't advise going in that direction.
One needs to properly master the basics before attempting more advanced topics.


> Does it matter to the decompression algorithm whether it's the HC version or the regular fast one?

No


> What do you mean by "flush the output before starting a new block"?

It's an obvious thing:
do whatever you need to do with the output block (apparently, save the result into flash),
since compressing the next block will overwrite the content of the output buffer.


> I figured that LZ4_compress_default() uses LZ4_compress_fast(). 

Yes


> I'm sort of confused. When it comes to blocks, do I have to use LZ4_compress_fast() or LZ4_compress_fast_continue()? What are the differences?

LZ4_compress_fast() deals with each block independently.
LZ4_compress_fast_continue() makes each block dependent on the previous one.
This is a pretty advanced topic.
If you are unfamiliar with the LZ4 API, I suggest you keep it simple and only use the first variant (LZ4_compress_default()).


> if I use LZ4_compress_default() repetitively and save the output to flash (appending it to the previously generated outputs), is the decompressor still able to decompress it correctly?

You will also need to save relevant metadata somewhere.
Typically the compressed size, and potentially the original size (or at least a guaranteed upper bound on the original size).

Mosi

unread,
Jan 6, 2017, 3:36:33 AM1/6/17
to LZ4c
> That's a lot of questions.

Yeah, I'm sorry. I wouldn't have asked them if there were some documentation, and this is not my area, so I apologize again.

> That's really very small.
> The capabilities of the algorithm will be impacted.
> That being said, you can make the "fast" version work within this budget.
> You'll need, at a minimum:
> - a memory segment for the compression table (LZ4_MEMORY_USAGE)
> - an input buffer
> - an output buffer
 
So basically I need something like 4 KB for LZ4_MEMORY_USAGE, 4 KB for the input buffer, and LZ4_COMPRESSBOUND(InputBufferSize) bytes for the output buffer.
I might not be able to allocate these buffers statically, so I would have to allocate them on the heap, which is the thing I hate most in an embedded system!
Plus, what is the relationship between LZ4_MEMORY_USAGE and the input/output buffer sizes? Would you say the bigger LZ4_MEMORY_USAGE is, the better the compression ratio?


> Try different mixes, and keep the best one for your use case.

I'm not sure. Better in what way?
 

> The LZ4_HC algorithm requires a lot more memory.
> Though it's possible to create a "lower memory" version, this is fairly advanced stuff,
> so I wouldn't advise going in that direction.
> One needs to properly master the basics before attempting more advanced topics.

OK, great. So HC is off the table. Since they wouldn't know about it, I think it's better this way.

 
> LZ4_compress_fast() deals with each block independently.
> LZ4_compress_fast_continue() makes each block dependent on the previous one.
> This is a pretty advanced topic.
> If you are unfamiliar with the LZ4 API, I suggest you keep it simple and only use the first variant (LZ4_compress_default()).

Yeah, of course I'm not familiar with the API.
But I'm going to assume that using LZ4_compress_fast() on each block and appending the outputs together will produce a bigger output in the end than using LZ4_compress_fast_continue() on each block. Right?
As long as the decompressor doesn't mind that I have compressed each block independently from the other blocks, it's fine. The output size does not matter very much.
 

> if I use LZ4_compress_default() repetitively and save the output to flash (appending it to the previously generated outputs), is the decompressor still able to decompress it correctly?
> You will also need to save relevant metadata somewhere.
> Typically the compressed size, and potentially the original size (or at least a guaranteed upper bound on the original size).

OK, I didn't understand this. To make it clearer: I want to compress each block (say 4 KB) independently and save the blocks to flash by appending them. After a while, these blocks will be written directly into a file by sending them to another system over a serial protocol. The decompressor will get the whole file, since it runs on the other system under an operating system without these limitations. I mean, I'm NOT going to send each compressed block individually to the decompressor; I will send the whole file. So compression is block by block, but decompression is not. Now, would you say the decompressor can decompress this file correctly?
Also, where in the file would I save that metadata you're talking about?

Cyan

unread,
Jan 6, 2017, 6:31:28 PM1/6/17
to LZ4c
> I will send the whole file. So compression is block by block, but decompression is not.
> Now, would you say the decompressor can decompress this file correctly?
> Also, where in the file would I save that metadata you're talking about?

If you want to maximize the chances that the resulting file will be decompressed correctly,
the best advice is to follow the Frame Format specification:

This way, whatever the block size, and whatever the relation between blocks (independent or linked),
the resulting file will be decodable by any system conformant with the specification,
including the lz4 command-line utility,
and including the lz4frame API (which is more likely to be used on the server side).

The lz4frame API does the job of respecting the specification,
but it requires too much memory for your usage.
So for such a small amount of RAM, you'll have to use the "low level" compression functions, where you can precisely control memory usage.
Inserting the required metadata is then a matter of following the specification, which states precisely which fields must be present, where, in which format, with which limitations, etc.

