
efinix bit stream question


John Larkin

Nov 26, 2022, 11:34:29 PM

We use the Efinix T20 Trion FPGA.

Questions about the config bit streams:

Are they always the same size, or does it depend on how much logic is
compiled? Would a simple application use less?

Are the streams very compressible? We have done some simple run-length
coding to greatly reduce the storage requirement for other FPGAs.
Configs tend to have long runs of 0's.

The T20/256 claims to need 5.4 megabits. I'd like to store the fpga
config and application code in a Raspberry Pi Pico, which has 2 MB of
onboard flash. Storing the full config would use about a third of
that, so reducing that would be useful.
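
For the curious, the kind of zero-run RLE I have in mind is tiny. A rough
sketch in C, with an escape-byte format I just made up here, nothing
Efinix-specific:

/* Rough sketch of zero-run RLE.  Invented format: a 0x00 byte is
   followed by a run count 1..255; every other byte passes through
   unchanged.  Worst case (lots of isolated zero bytes) the output
   can be up to twice the input, so size 'out' accordingly. */
#include <stdint.h>
#include <stddef.h>

static size_t rle0_encode(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; ) {
        if (in[i] == 0x00) {
            size_t run = 1;
            while (i + run < n && in[i + run] == 0x00 && run < 255)
                run++;
            out[o++] = 0x00;
            out[o++] = (uint8_t)run;      /* run length, 1..255 */
            i += run;
        } else {
            out[o++] = in[i++];
        }
    }
    return o;                             /* compressed size */
}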

Gerhard Hoffmann

Nov 27, 2022, 4:46:52 AM

On 27.11.22 at 05:34, John Larkin wrote:
> We use the Efinix T20 Trion FPGA.
>
> Questions about the config bit streams:
>
> Are they always the same size, or does it depend on how much logic is
> compiled? Would a simple application use less?

With Xilinx it would, for sure. I've never used Efinix, but I would
consider it broken if it didn't.

> Are the streams very compressible?

I would simply test example files with zip, zcat and similar.
IIRC, there is even a flow-through decompressor.

> We have done some simple run-length
> coding to greatly reduce the storage requirement for other FPGAs.
> Configs tend to have long runs of 0's.
>
> The T20/256 claims to need 5.4 megabits. I'd like to store the fpga
> config and application code in a Raspberry Pi Pico, which has 2 MB of
> onboard flash. Storing the full config would use about a third of
> that, so reducing that would be useful.

cheers, Gerhard


John Larkin

Nov 27, 2022, 11:17:10 AM

On Sun, 27 Nov 2022 10:46:47 +0100, Gerhard Hoffmann <dk...@arcor.de>
wrote:
I'm at home and don't have access to a compiled bitstream, and this is
a discussion group.

I'll get a T20 bit stream Monday or Tuesday and see what it looks
like. If there are many runs of 0's, compression and decompression are
very simple. Or maybe a typical stream is just shorter than the max.

I recall a Xilinx or maybe Altera stream that compressed about 3:1
with a very simple algorithm. I think I compressed runs of 0's and 1's
on that one, with a PowerBasic program.

We considered fancier dictionary-based schemes, sort of like Zip, but
they weren't worth the hassle.
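
The whole attraction is how small the decoder is. The matching decoder
for the zero-run format I sketched earlier is about all the Pico would
have to run (still just an illustration, not the Efinix format):

/* Matching decoder for the invented zero-run format: 0x00 followed
   by a count expands to that many zero bytes; anything else copies
   straight through.  No bounds checking - sketch only. */
#include <stdint.h>
#include <stddef.h>

static size_t rle0_decode(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; ) {
        if (in[i] == 0x00) {
            uint8_t run = in[i + 1];
            for (uint8_t k = 0; k < run; k++)
                out[o++] = 0x00;
            i += 2;
        } else {
            out[o++] = in[i++];
        }
    }
    return o;                             /* decompressed size */
}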



John Larkin

Nov 27, 2022, 11:24:19 AM

I recall the conclusion that the best dictionary entry for a random
data block is itself. Zip doesn't compress random binary data files
very well.

FPGA bit streams are nonrandom in having long runs of 0's.



John Larkin

Dec 1, 2022, 4:12:31 PM

Here's a T20 bit stream. The length seems to be constant regardless of
the functions coded, but there are enough runs of all 0's that it's
probably worth compressing.

https://www.dropbox.com/s/vm247lntp78jm20/Efinix_T20_bitstream.hex?dl=0

The actual config file will be binary, not hex of course.

Clifford Heath

Dec 1, 2022, 4:51:06 PM

Gzip compresses your 2.0MB down to 105kB. The decompressor isn't tiny,
but it's fairly small. The lz4 decompressor is tiny and still gets to
221kB. Possibly less if you RLE first. bz2 gets it to 76kB, and xz or
lzma to 72kB.
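
If you go the lz4 route, the decode side on the Pico is essentially one
call into the stock LZ4 library (sketch only; the buffer size below is a
placeholder, not the real T20 figure):

/* Sketch: single-shot decode with the stock LZ4 library (lz4.h).
   BITSTREAM_MAX is a placeholder budget, not the real T20 size. */
#include "lz4.h"

#define BITSTREAM_MAX  (700 * 1024)

int unpack_bitstream(const char *packed, int packed_len,
                     char *out /* at least BITSTREAM_MAX bytes */)
{
    /* Returns the decompressed byte count, or a negative value if
       the compressed data is malformed. */
    return LZ4_decompress_safe(packed, out, packed_len, BITSTREAM_MAX);
}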

Compression is one area where it's best to rely on work done by people
who understand the theory. Some of these algorithms have a tiny
decompressor; the magic is in the compressor.

CH

Martin Brown

Dec 1, 2022, 6:06:01 PM

Quick scan with one of my utilities gives:

Filename : \users\martin\downloads\Efinix~1.hex
File size = 4071902
Entropy = 1.225 ( max. 5.545 )
States used = 3.40 ( max. 256 )

Zero frequency : 0-9 11-47 58-64 71-255

Most frequent bytes (dec hex char count):
48 30 "0" 2198086
10 A ... 1357302
49 31 "1" 98740
52 34 "4" 97072
56 38 "8" 96870
50 32 "2" 94906
54 36 "6" 26994
51 33 "3" 26880
67 43 "C" 26478
57 39 "9" 25500
65 41 "A" 6820
53 35 "5" 5944

The hex file consists mostly of character "0" bytes and linefeeds.
Simple run length encoding would compact it a lot.
It seems "7","B","D","E","F" are quite rare in these files.

The raw binary file obviously won't have the linefeeds and will be only
one byte for every three in the ASCII .hex file, so about 1.3M.

Back-of-the-envelope, RLE might get you a ~20x decrease in size.

With the right compressor it could be made a lot smaller.
If you put up the binary I'll scan that for byte entropy too.
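
(The scan is nothing exotic; something along the lines of the sketch
below would reproduce the byte counts, though my own utility does rather
more. Entropy here is in natural-log units, hence the 5.545 maximum.)

/* Crude byte-frequency and entropy scan, roughly reproducing the
   numbers above.  Entropy is in natural-log units, so the maximum
   for 256 equiprobable byte values is ln(256) ~= 5.545. */
#include <stdio.h>
#include <math.h>

int main(int argc, char **argv)
{
    unsigned long count[256] = {0}, total = 0;
    FILE *f;
    int c;

    if (argc < 2 || !(f = fopen(argv[1], "rb"))) return 1;
    while ((c = fgetc(f)) != EOF) { count[c]++; total++; }
    fclose(f);

    double h = 0.0;
    for (int i = 0; i < 256; i++) {
        if (!count[i]) continue;
        double p = (double)count[i] / total;
        h -= p * log(p);
        printf("%3d %02X %lu\n", i, i, count[i]);
    }
    printf("File size = %lu  Entropy = %.3f ( max. %.3f )\n",
           total, h, log(256.0));
    return 0;
}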

--
Regards,
Martin Brown

Martin Brown

Dec 2, 2022, 7:16:02 AM

Binary looks to have incredibly high redundancy and compressibility.
One of the lowest byte entropy scores I have seen in a long time.

There appear to be strong correlations of identical blocks at strides of
9, 12, 24, 36 as well as huge runs of nul bytes. The odd one of 0a.

Also a quick eyeball reveals walking ones 80,40,20,10,08,04,02,01,00
at around 107227 (stride 9).

There is an incredibly long run of 15372 nul bytes at offset 143811

RLE the nul bytes should get you most of the way there and maybe some
code to RLE the most obvious repeated sequences if you need a bit more.
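
(If anyone wants to repeat the exercise, a crude scan like the one
sketched below is enough to find the long nul runs and the repeats at a
fixed stride. It is not the correlator I used, just an outline.)

/* Crude scan for the longest run of nul bytes and for how often a
   byte equals the byte 'stride' positions further on.  Outline only. */
#include <stdio.h>
#include <stdlib.h>

static void scan(const unsigned char *buf, long n, long stride)
{
    long run = 0, best_run = 0, best_off = 0, matches = 0;

    for (long i = 0; i < n; i++) {
        if (buf[i] == 0) {
            if (++run > best_run) { best_run = run; best_off = i - run + 1; }
        } else {
            run = 0;
        }
        if (i + stride < n && buf[i] == buf[i + stride])
            matches++;
    }
    printf("longest nul run: %ld bytes at offset %ld\n", best_run, best_off);
    printf("stride %ld: %.1f%% of bytes match the byte %ld further on\n",
           stride, 100.0 * matches / (double)(n - stride), stride);
}

int main(int argc, char **argv)
{
    if (argc < 2) return 1;
    FILE *f = fopen(argv[1], "rb");
    if (!f) return 1;
    fseek(f, 0, SEEK_END);
    long n = ftell(f);
    rewind(f);
    unsigned char *buf = malloc(n);
    if (!buf || fread(buf, 1, n, f) != (size_t)n) return 1;
    fclose(f);
    scan(buf, n, 9);        /* also worth trying 12, 24, 36 */
    free(buf);
    return 0;
}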

--
Regards,
Martin Brown

John Larkin

Dec 2, 2022, 10:22:50 AM

My comment was about really random data. An FPGA bit stream certainly
has repeated patterns. One might build an N-bit structure, a multiplier
or accumulator or filter or DDS, and its bit-slice blocks are very
likely repeated N times.

Maybe I can find some college kid who'd like to do a project or thesis
to find or code a minimal decompression algorithm for Efinix + Raspberry
Pi, in exchange for some pittance.

I can imagine some dictionary-based thing where a dictionary entry is
its own first occurrence in the bit file. The decompressor is
basically scissors and a pot of glue.
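
Something like this, maybe, with a token format invented on the spot
just to show how little code the glue side needs:

/* "Scissors and glue" decoder sketch.  Invented token format, nothing
   Efinix-specific, no bounds checking:
     0x00 len              emit len zero bytes
     0x01 offH offL len    copy len bytes from 'off' bytes back in output
     0x02 len b0..bN       emit len literal bytes                        */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

static size_t glue_decode(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t i = 0, o = 0;
    while (i < n) {
        uint8_t op = in[i++];
        if (op == 0x00) {                        /* zero run */
            uint8_t len = in[i++];
            memset(out + o, 0, len);
            o += len;
        } else if (op == 0x01) {                 /* copy from earlier output */
            size_t off = ((size_t)in[i] << 8) | in[i + 1];
            uint8_t len = in[i + 2];
            i += 3;
            for (uint8_t k = 0; k < len; k++)    /* byte-wise, overlap-safe */
                out[o + k] = out[o - off + k];
            o += len;
        } else {                                 /* 0x02: literal bytes */
            uint8_t len = in[i++];
            memcpy(out + o, in + i, len);
            i += len;
            o += len;
        }
    }
    return o;
}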


>
>There appear to be strong correlations of identical blocks at strides of
>9, 12, 24, 36 as well as huge runs of nul bytes. The odd one of 0a.
>
>Also a quick eyeball reveals walking ones 80,40,20,10,08,04,02,01,00
>at around 107227 (stride 9).
>
>There is an incredibly long run of 15372 nul bytes at offset 143811
>
>RLE the nul bytes should get you most of the way there and maybe some
>code to RLE the most obvious repeated sequences if you need a bit more.

I was thinking of just compressing runs of 0's, but there could be a
few other smallish patterns that might not be horrible to stash in the
decompressor dictionary. That raises the question: are there
patterns that are common to *all* T20 bit streams?

I need a low-paid lackey.

Martin Brown

Dec 5, 2022, 6:01:40 AM

I don't think an FPGA bitstream is anything remotely like random data.
The vast majority of the bytes are zeroes (70%), bytes with 1 bit set
are ~2% each, and bytes with 2 bits set are <0.7%; bytes with more than
3 bits set are comparatively rare. How far you take it depends on how
hard you are prepared to work.

In your example the bytes 8A, A7, BF, DB, ED all appeared just once and
the token BE did not occur at all.

In principle for this application you can afford to use insane amounts
of CPU power to encode if it makes the decoder simpler and faster. My
instinct is that it is only worth compressing enough to make room for
whatever code has to fit into the same space.

I recall, way back, jumping through endless hoops to fit slightly more
firmware code into 8k ROMs, in the days when 64k was a lot of RAM.

> Maybe I can find some college kid who'd like to do a project or thesis
> to find or code a minimal decompression algorithm for Efinix + Raspberry
> Pi, in exchange for some pittance.

I used to have a university sandwich student for a year, and sometimes a
student over the long vacation, and I would give them projects that were
interesting and otherwise wouldn't get done. The occasional one turned
out to be exceptionally good. The rest did an OK job. It is only worth
doing if they can finish a project that you don't have the time to do.

Usually something that involves taking a lot of raw data and looking to
see if there is anything interesting going on.
>
> I can imagine some dictionary-based thing where a dictionary entry is
> its own first occurrence in the bit file. The decompressor is
> basically scissors and a pot of glue.

Judging by the way it looks to my correlator, I would expect LHA-type
algorithms to do rather well on it. There is an inordinate amount of
block duplication. A few simple subs will easily get you under 250k.

>> There appear to be strong correlations of identical blocks at strides of
>> 9, 12, 24, 36 as well as huge runs of nul bytes. The odd one of 0a.
>>
>> Also a quick eyeball reveals walking ones 80,40,20,10,08,04,02,01,00
>> at around 107227 (stride 9).
>>
>> There is an incredibly long run of 15372 nul bytes at offset 143811
>>
>> RLE the nul bytes should get you most of the way there and maybe some
>> code to RLE the most obvious repeated sequences if you need a bit more.
>
> I was thinking of just compressing runs of 0's, but there could be a
> few other smallish patterns that might not be horrible to stash in the
> decompressor dictionary. That raises the question: are there
> patterns that are common to *all* T20 bit streams?
>
> I need a low-paid lackey.

What stops you from having one?
But you will get more use out of one that is paid the going rate.

--
Regards,
Martin Brown

John Larkin

Dec 5, 2022, 10:28:02 AM

On Mon, 5 Dec 2022 11:01:34 +0000, Martin Brown wrote:
Just kidding. We pay very well.

If we do a product line around Raspberry Pi, we could piggyback on the
enormous ecosystem of hardware and people around it. I've never seen
anything like it.

https://www.raspberrypi.org/

We might sponsor 5 or 10 smart, poor high school or college kids, steer
their paths a bit, give them summer projects or jobs, and hire a couple
of the best when they graduate.

The Pi has enormous momentum, so it should be around for a while.
