Fast LZ4 unpacking for 68000 CPU (LZ4W)

448 views
Skip to first unread message

Stephane Dallongeville

unread,
Aug 23, 2017, 7:46:36 AM8/23/17
to LZ4c
Hi there,

I just wanted to share out my own implementation of LZ4 on 68000 CPU as i think it can be useful for people programming with this CPU :)
Myself I'm programming on the Sega Megadrive system which use a 7.67 Mhz 68000 CPU as main CPU (and a Z80 for the sound part) and i was looking for a fast decompression algo so i could use it for real time sprite data unpacking. After having investigated many codecs i found LZ4 was the best candidate for what i was looking for (very fast unpacking).

I started by porting LZ4 unpacking on the 68000 CPU, my first port attempt was in C and i was able to achieve a decompression rate of ~3 KB/frame (about 180 KB/s). As i wasn't not satisfied with the decompression speed rate I moved to assembly and after a lot of optimizations i was able to reach about 5.5 KB/frame (330 KB/s). Much better... but still not fast enough for my needs :-(
I wanted to reach about 10 KB/frame: ideally i would only require to unpack up to 5/6 KB of data per frame but i need some free CPU time for other tasks so unpacking at 10KB/frame would leave me a bit of free CPU time, that was the idea...
I quickly realized i couldn't get more than that with the classical LZ4 format so i customized it to improve the decompression speed at the trade off of a lower compression rate, then this bring me to LZ4W format which is based on LZ4 but optimized for word operation (the 68000 does word operation as fast than byte operation so it's better to stick with word when possible).

So with LZ4W i was able to achieve a much better decompression speed: an average 12KB/frame (760KB/s) varying between 10.5 up to 15KB/frame depending the dataset. I think that is a really good decompression speed for a simple 7.67 Mhz 68000 CPU.

Also here are some numbers about the compression level of LZ4W compared to LZ4 (HC mode) and UFTC (another very fast custom unpacker)  :

Tiles data:                  original=12672   LZ4HC=6622    LZ4W=6746    UFTC=11524
single BMP sprite
:           original=4798    LZ4HC=2202    LZ4W=2168    UFTC=3580
text file
:                   original=40992   LZ4HC=16088   LZ4W=27458   UFTC=44700
highly compressible tiles
:   original=346656  LZ4HC=24613   LZ4W=48202   UFTC=114932
map data
:                    original=4576    LZ4HC=1367    LZ4W=1382    UFTC=3212

As you can see the compression ratio is still really good for small dataset (which is my use case, usually my sprite data block are below 1 KB) but become worst when dataset size increase (due to the limited LZ4W search window size).

You can find a more detailed description of LZ4W here:
https://github.com/Stephane-D/SGDK/blob/master/bin/lz4w.txt

And here you can find the 68000 LZ4W unpacking code and the java LZ4W compressor code :
https://www.dropbox.com/sh/smgwbi8g6y50n60/AAC47jpBZKxPqeY_NXo703vQa?dl=0

Hope it can be useful :)

- Stephane

Cyan

unread,
Aug 23, 2017, 2:27:09 PM8/23/17
to LZ4c
It's a great story, you could build a blog entry with it.

Minor detail : Megadrive is the french name for Sega's console. In the rest of the world, it's more likely known as Genesis.

I'm curious to know which use case requires you to decompress sprites at each and every frame.
In most designs I'm aware of, these sprites are cached after decompression, and only dropped from memory when they are no longer needed.

Technicality : you probably noticed, LZ4 expresses distances and length in bytes, but tries to use "register size" access (either 32 or 64 bits depending on target) as much as possible.
It makes the code slightly more complex to handle remainings, but that part is manageable.
More critically, this design works well with CPU supporting unaligned access, but those which don't (and I guess 68000 is among them) have to downgrade to byte access.

As a fix, LZ4W design enforces every match to start and end exactly at word boundaries.
Some classes of matches become impossible to recover with this limitation (say, an identical sequence, just shifted by one byte),
but it's an acceptable trade off to get more speed.

In theory, it's possible to apply this restriction to LZ4 too : on the compression side, it would only search on even bytes, and all lengths would be even.
It would keep compatibility with LZ4, but is less efficient than a custom scheme, since one bit of every field becomes useless. 
That being said, for the usage you have in mind, compatibility is an unlikely objective.

Cyan

unread,
Aug 23, 2017, 2:34:27 PM8/23/17
to LZ4c
Seems I got it wrong for the name :
I used to think that Genesis was console's worldwide name, but after verification, it seems to be an American thing.
For the rest of the world, Megadrive is indeed correct.

Stephane Dallongeville

unread,
Aug 24, 2017, 5:26:52 AM8/24/17
to LZ4c
Haha, thanks for the technical information ! i should have give more details why i did these choices indeed ;)
Megadrive is the official name of the system, it was intended to be called Megadrive as well in US but because of some copyright issue  (name was already in use) they changed it to Genesis (but so only in US).
In fact what happen normally about the sprite is that you can indeed decompression graphics data at level loading then load these data inside the video memory and don't touch it during the level. Almost games does that for background graphics and some redundant enemies sprites when possible, but sprites having many different animation (as main player) couldn't fit in VRAM (video memory), you have to upload the new frame in real time, because of that almost time the sprites data were not compressed in ROM and directly send to VRAM through DMA. Still these sprite data eats a lot of ROM space, if you want to have big sprite with smooth animation you really need to spent a big amount of ROM in it. Allowing real time decompression could save a lot of ROM for that kind of big and smooth animation, that is something i really wanted to address. Depending your data, graphics generally compress down to 50%, it's already a nice improvement :)

About maintaining LZ4 compatibility, first that was my objective but i realized it will prevent me to achieve the best i could, also i believe it's a very specific use. If needed i can probably find back my original LZ4 unpacking implementation (hope i didn't trashed it :p) which was LZ4 compliant :)
Reply all
Reply to author
Forward
0 new messages