I don't think an FPGA bitstream is anything remotely like random data.
The vast majority of the bytes are zeroes (about 70%), then come bytes
with 1 bit set at ~2% each and bytes with 2 bits set at <0.7%. Bytes with
more than 3 bits set are comparatively rare. How much of that you exploit
depends on how hard you are prepared to work.
In your example the bytes 8A, A7, BF, DB, ED all appeared just once and
the token BE did not occur at all.
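
For anyone who wants to repeat the count on their own bit file, here is a
rough sketch of the sort of tally I mean. The function name byte_stats is
just something made up for illustration, and it assumes the whole file
fits in memory as a bytes object.

```python
from collections import Counter

def byte_stats(data: bytes):
    """Return (histogram of byte values, share of bytes by number of set bits)."""
    hist = Counter(data)
    total = len(data) or 1          # avoid dividing by zero on an empty file
    by_popcount = [0] * 9           # index = number of set bits, 0..8
    for value, count in hist.items():
        by_popcount[bin(value).count("1")] += count
    return hist, [n / total for n in by_popcount]

def report(data: bytes) -> None:
    """Print the set-bit distribution and any byte values that never occur."""
    hist, share = byte_stats(data)
    for bits, fraction in enumerate(share):
        print(f"{bits} bits set: {fraction:7.2%}")
    missing = [f"{v:02X}" for v in range(256) if v not in hist]
    print("never seen:", " ".join(missing) or "none")
```

Feed report() the contents of the file read in binary mode. Byte values
that never occur are handy as escape codes for a simple compressor.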
In principle for this application you can afford to use insane amounts
of CPU power to encode if it makes the decoder simpler and faster. My
instinct is that it is only worth compressing enough to make room for
whatever code has to fit into the same space.
I recall, way back, jumping through endless hoops to fit slightly more
firmware code into 8k ROMs, in the days when 64k was a lot of RAM.
> Maybe I can find some college kid who'd like to do a project or thesis
> to find or code a minimal decomp algorithm for Efinix+Raspberry Pi, in
> exchange for some pittance.
I used to have a university sandwich student for a year, and sometimes a
student over the long vacation, and gave them projects that were
interesting and otherwise wouldn't get done. The occasional one turned
out to be exceptionally good. The rest did an OK job. It is only worth
doing if they can finish a project that you don't have the time to do,
usually something that involves taking a lot of raw data and looking to
see if there is anything interesting going on.
>
> I can imagine some dictionary-based thing where a dictionary entry is
> its own first occurrence in the bit file. The decompressor is
> basically scissors and a pot of glue.
Judging by the way it looks to my correlator I would expect LHA-type
algorithms to do rather well on it. There is an inordinate amount of
block duplication. A few simple substitutions will easily get you under
250k.
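
To make the "scissors and a pot of glue" idea concrete, here is a minimal
sketch of an LZ77-style scheme where every match is a copy from earlier
in the already decoded output. This is not LHA itself (no Huffman stage),
and the token format is purely my own assumption for illustration: a
control byte below 0x80 means "that many plus one literal bytes follow",
otherwise it is a match length followed by a 16-bit little-endian
distance. All the work is in the encoder; the decoder is a dozen lines.

```python
MIN_MATCH = 3
MAX_MATCH = MIN_MATCH + 0x7F      # longest match one control byte can express
MAX_DIST = 0xFFFF                 # distance is stored in two bytes

def lz_encode(data: bytes) -> bytes:
    """Greedy encoder: remembers only the most recent position of each 3-byte key."""
    data = bytes(data)
    out, literals, last_seen = bytearray(), bytearray(), {}

    def flush():
        # Emit pending literals in chunks of up to 128 bytes.
        for k in range(0, len(literals), 128):
            chunk = literals[k:k + 128]
            out.append(len(chunk) - 1)
            out.extend(chunk)
        literals.clear()

    i, n = 0, len(data)
    while i < n:
        key = data[i:i + MIN_MATCH]
        cand = last_seen.get(key) if len(key) == MIN_MATCH else None
        length = 0
        if cand is not None and i - cand <= MAX_DIST:
            while (length < MAX_MATCH and i + length < n
                   and data[cand + length] == data[i + length]):
                length += 1
        if length >= MIN_MATCH:
            flush()
            dist = i - cand
            out.append(0x80 | (length - MIN_MATCH))
            out.extend((dist & 0xFF, dist >> 8))
            step = length
        else:
            literals.append(data[i])
            step = 1
        for j in range(i, i + step):       # index every position we pass over
            k = data[j:j + MIN_MATCH]
            if len(k) == MIN_MATCH:
                last_seen[k] = j
        i += step
    flush()
    return bytes(out)

def lz_decode(src: bytes) -> bytes:
    """Decoder: copy literals, or copy bytes from earlier in the output."""
    out = bytearray()
    i = 0
    while i < len(src):
        c = src[i]
        i += 1
        if c < 0x80:
            out.extend(src[i:i + c + 1])
            i += c + 1
        else:
            dist = src[i] | (src[i + 1] << 8)
            i += 2
            for _ in range((c & 0x7F) + MIN_MATCH):
                out.append(out[-dist])     # byte at a time, so overlaps work
    return bytes(out)
```

Because the copy goes one byte at a time, a match with distance 1 acts as
run-length coding, so the long nul runs come out cheaply without a
separate RLE stage. A smarter encoder (longer search, lazy matching)
would squeeze more out without changing the decoder at all.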
>> There appear to be strong correlations of identical blocks at strides of
>> 9, 12, 24, 36 as well as huge runs of nul bytes. The odd one of 0a.
>>
>> Also a quick eyeball reveals walking ones 80,40,20,10,08,04,02,01,00
>> at around 107227 (stride 9).
>>
>> There is an incredibly long run of 15372 nul bytes at offset 143811
>>
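
A rough sketch of how the stride correlations and the long nul run quoted
above could be checked on another bit file. This is not my correlator,
just a simple stand-in; the function names and the 4-byte block size are
assumptions picked for illustration.

```python
def stride_matches(data: bytes, stride: int, block: int = 4) -> int:
    """Count offsets where a non-zero block repeats exactly `stride` bytes later."""
    zero = bytes(block)
    hits = 0
    for i in range(len(data) - stride - block + 1):
        chunk = data[i:i + block]
        # Skip all-zero blocks, otherwise the nul runs swamp every stride.
        if chunk != zero and chunk == data[i + stride:i + stride + block]:
            hits += 1
    return hits

def longest_run(data: bytes, value: int = 0):
    """Return (offset, length) of the longest run of `value`, or (-1, 0) if none."""
    best_off, best_len = -1, 0
    i, n = 0, len(data)
    while i < n:
        if data[i] == value:
            j = i
            while j < n and data[j] == value:
                j += 1
            if j - i > best_len:
                best_off, best_len = i, j - i
            i = j
        else:
            i += 1
    return best_off, best_len
```

Running stride_matches over a range of strides and looking for the ones
that stand well above their neighbours should show which strides matter.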
>> RLE of the nul bytes should get you most of the way there and maybe some
>> code to RLE the most obvious repeated sequences if you need a bit more.
>
> I was thinking of just compressing runs of 0's, but there could be a
> few other smallish patterns that might not be horrible to stash in the
> decompressor dictionary. That presents the question, are there
> patterns that are common to *all* T20 bit streams?
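
Compressing only the runs of zeroes can be sketched in a few lines. The
format here is an assumption of mine, not anything standard: a 00 byte in
the compressed stream is followed by a 16-bit little-endian run length,
and every other byte stands for itself.

```python
MAX_RUN = 0xFFFF   # longest run a single 16-bit count can hold

def rle0_encode(data: bytes) -> bytes:
    """Replace each run of nul bytes with 00 followed by a 2-byte count."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        if data[i] == 0:
            j = i
            while j < n and data[j] == 0 and j - i < MAX_RUN:
                j += 1
            run = j - i
            out.extend((0, run & 0xFF, run >> 8))
            i = j
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

def rle0_decode(src: bytes) -> bytes:
    """Expand 00 + count back into nul bytes; copy everything else through."""
    out = bytearray()
    i = 0
    while i < len(src):
        if src[i] == 0:
            out.extend(bytes(src[i + 1] | (src[i + 2] << 8)))
            i += 3
        else:
            out.append(src[i])
            i += 1
    return bytes(out)
```

The catch is that an isolated single zero grows from one byte to three,
so whether this wins depends on how the zeroes are actually distributed.
A one-byte count would reduce that penalty at the cost of splitting the
very long runs into more pieces.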
>
> I need a low-paid lackey.
What stops you from having one?
But you will get more use out of one that is paid the going rate.
--
Regards,
Martin Brown