I'm working on a hardware implementation (FPGA) of a lossless compression
algorithm for a real-time application. The data will be fed in to the
system, will then be compressed on-the-fly and then transmitted further.
The average compression ratio is 3:1, so I'm gonna use some FIFOs of a
certain size and start reading data out of the FIFO after a fixed
startup-time. The readout rate will be 1/3 of the input data rate The size
of the FIFOs is determined by the experimental variance of the mean
compression ratio. Nonetheless there are possible circumstances in which no
compression can be achieved. Since the overall system does not support
variable bitrates a faster transmission is no solution here.
So my idea was to put the question to all of you what to do in case of
uncompressibility? Any ideas?
Denkedran Joe
Given the constraints you have put on the problem, you're kinda left
with reducing the incoming data rate or dropping packets.
Best regards,
Spehro Pefhany
--
"it's the network..." "The Journey is the reward"
sp...@interlog.com Info for manufacturers: http://www.trexon.com
Embedded software/hardware/analog Info for designers: http://www.speff.com
If the compression must be lossless, and you can not increase the bit
rate, you need to allow for the buffering of the input data to grow in
size to accommodate the worst case. If you can not build a big enough
FIFO inside the FPGA, add some external memory and use it as a FIFO.
Is the hardware already designed? What are your data rates, and do you
know what the worst case compression is?
Regards,
John McCaskill
www.fastertechnology.com
Bigger buffers. Smarter algos.
--
Chuck F (cbfalconer at maineline dot net)
<http://cbfalconer.home.att.net>
Try the download section.
--
Posted via a free Usenet account from http://www.teranews.com
Ah okay, now that's why I'm here! Let's discuss some ideas of smarter algos!
>Ah okay, now that's why I'm here! Let's discuss some ideas of smarter algos!
Let's not.
Instead, start here and study for a few weeks, then get back to us:
You cannot solve your problem losslessly. You must guarantee your
image is in a state that will guarantee compressibility or your stream
will occasionally require more bandwidth than is available; you'd need
infinite FIFOs to cover worst-case situations.
You MUST have a lossy fallback OR supply enough bandwidth to
accommodate uncompressed data as a fallback. Variable bit-rate multi-
channel systems can borrow bandwidth from each other since -
statistically - all channels do not experience poor compression at
once unless they're all transmitting similarly uncompressible images.
If your video is dynamic in movement and in color for a length of
time, your stream will exceed your channel bandwidth. If you have
very large (disk based?) FIFOs, you can drop video for a short time
and pick back up when the compressed stream is better behaved and you
can receive continuous video again. You will not be able to recover
the delay that you introduced from the compression on the receive side
unless you skip some received video (which is lossy) or speed up the
playback. Can you deal with a fixed delay of seconds or minutes once
you've experienced a period of poor compression?
No lossless compression scheme can compress everything. You can only
have better compression schemes that will fail less often or present a
fallback: lossy compression or variable bit rates.
There are no finite alternatives; it's one of the basic principles of
compression. There are no smarter algos, just better compression
schemes that fail less often.
- John_H
You will need to specify your data integrity contraints.
If your real-time constraints are soft, then big buffers
will make the problem less common, but not get rid of it
entirely. If you've got hard real time constraints, then
they won't, as the data will be stale before you've had
a chance to send it. Either way, you must be prepared to
throw some data away under some situations. The question
then becomes chosing what to throw away, and how to make
sure that the recipient can also cope with the lost data.
However, if you know something about your source data that
general purpose algorithms don't know, then perhaps you
can achieve a guaranteed compression ratio from a bespoke
algorithm. What's in your data?
Phil
--
Dear aunt, let's set so double the killer delete select all.
-- Microsoft voice recognition live demonstration
If you have cast this in concrete : "The readout rate will be 1/3 of the
input data rate" and you hit any compression case above 33.33%, then
you are dead in the water : something HAS to give - either discard data,
or take longer.
You can tolerate 'errant peaks' in the data compression,
by using larger buffers, but the _average_ must remain under 33.33% over
the buffer size.
-jg
Lossless algorithms are perfect in storage systems for space saving.
But transport channel should be wide enough
for the worst case.
From my knowledge of life the robust solution in your case is the
redesigning system with wide channel at the current stage.
Don't play with thin air, force oneself now and eliminate great
troubles in the future.
Digitally yours,
Michael Tsvetkov (JPEG-lossless IP Core developer)
> It isin't right to applicate lossless algorithms to fixed-bandwidth
> systems.
Depends on the algorithm. E.g. RLE will never cause an increase in size,
what's sufficient even for use with fixed-bandwidth channels.
DoDi
> Depends on the algorithm. E.g. RLE will never cause an increase in size,
Wrong. It's trivially easy to prove that *no* lossless compression
algorithm can avoid to sometimes cause an increase in size, while still
compressing at least some inputs:
There are 2^N different inputs with N bits, and as long as one of them
is compressed by at least 1 bit, that means you have only 2^N-2 possible
outputs left to represent the remaining 2^N-1 inputs with --- impossible.
[F'up2 comp.compression, which is the only place this could possibly be
on-topic]
Even RLE can increase the size. Two of the most used methods to
incidate runs are 1) prefix byte, 2) two consequtive bytes are
the same. For both of these methods you can construct (or the
file sometimes happens to be) a file containing 1) the value
of the prefix byte 2) exactly two equal bytes.
Even if you use some other method of indicating the runs, you
can always constuct the pathological file that increases in size
after compression. This fact does not change depending on what
the compression algorithm is.
The best you can do is have one bit to select uncompressed/compressed.
followups set to comp.compression
-Pasi
--
"You're the type of guy who doesn't like to give up control."
-- Wade to Garibaldi in Babylon 5:"The Exercise of Vital Powers"
Hi,
Generally, a compression algorithm may be optimized for a type of data, or for a
"packet" of data (ie with dynamic table building huffman compression, with
different algorithms for video, sound or exe files, etc).
If you can find two (or more) complementary algorithms (or sets of fixed
parameters for a single algo) that covers the whole flows, you can dynamically
switch between them, without transmitting the parameters.
The known of the data flow may help you. Ie, if you may have to transmit an
already compressed packet, your problem may have not any solution.
Nit. Depends on the data. You have to have something to separate
a repetition count from a new value, which implies a lost char (or
more) somewhere. However, any such expansion can be expected to be
small.
Most RLEs can be caused to explode horribly, simply by
peppering the file with the value used as the escape code(s).
Packbits solves this problem, and it will never expand
more than a small fraction, as it escapes both blocks
of literals and runs.
>> Depends on the algorithm. E.g. RLE will never cause an increase in size,
>
>
> Wrong. It's trivially easy to prove that *no* lossless compression
> algorithm can avoid to sometimes cause an increase in size, while still
> compressing at least some inputs:
Right, in the RLE case I missed that at least one symbol must be
reserved for a repeated sequence. Sorry :-(
> There are 2^N different inputs with N bits, and as long as one of them
> is compressed by at least 1 bit, that means you have only 2^N-2 possible
> outputs left to represent the remaining 2^N-1 inputs with --- impossible.
That's a valid argument for the net data size. Talking about channels
and protocols, there might be an acceptable overhead for marking
compressed and uncompressed parts (packages...) of an data stream.
DoDi
Given that uncompressible data often resembles noise, you have to ask
yourself: what would be lost?
> Since the overall system does not support
> variable bitrates a faster transmission is no solution here.
>
> So my idea was to put the question to all of you what to do in case of
> uncompressibility? Any ideas?
If you can identify the estimated compression beforehand and then split
the stream into a 'hard' part and an 'easy' part, then you have a way to
retain the average.
--
Gemaakt met Opera's revolutionaire e-mailprogramma:
http://www.opera.com/mail/
*Much* more information than if the signal was highly redundant.
> > Since the overall system does not support
> > variable bitrates a faster transmission is no solution here.
> >
> > So my idea was to put the question to all of you what to do in case of
> > uncompressibility? Any ideas?
>
> If you can identify the estimated compression beforehand and then
> split the stream into a 'hard' part and an 'easy' part, then you have
> a way to retain the average.
Yeah, right. And if you juggle the bowling balls when crossing
the rope-bridge, you'll not break the bridge.
The message! Just because the message "resembles" noise does not mean
it has no information. In fact, just the opposite. Once you have a
message with no redundancy, you have a message with optimum
information content and it will appear exactly like noise.
Compression takes advantage of the portion of a message that is
predictable based on what you have seen previously in the message.
This is the content that does not look like noise. Once you take
advantage of this and recode to eliminate it, the message looks like
pure noise and is no longer compressible. But it is still a unique
message with information content that you need to convey.
> > Since the overall system does not support
> > variable bitrates a faster transmission is no solution here.
>
> > So my idea was to put the question to all of you what to do in case of
> > uncompressibility? Any ideas?
>
> If you can identify the estimated compression beforehand and then split
> the stream into a 'hard' part and an 'easy' part, then you have a way to
> retain the average.
Doesn't that require sending additional information that is part of
the message? On the average, this will add as much, if not more to
the message than you are removing...
If you are trying to compress data without loss, you can only compress
the redundant information. If the message has no redundancy, then it
is not compressible and, with *any* coding scheme, will require some
additional bandwidth than if it were not coded at all.
Think of your message as a binary number of n bits. If you want to
compress it to m bits, you can identify the 2**m most often
transmitted numbers and represent them with m bits. But the remaining
numbers can not be transmitted in m bits at all. If you want to send
those you have to have a flag that says, "do not decode this number".
Now you have to transmit all n or m bits, plus the flag bit. Since
there are 2**n-2**m messages with n+1 bits and 2**m messages with m+1
bits, I think you will find the total number of bits is not less then
just sending all messages with n bits. But if the messages in the m
bit group are much more frequent, then you can reduce your *average*
number of bits sent. If you can say you will *never* send the numbers
that aren't in the m bit group, then you can compress the message
losslessly in m bits.
If you are compressing reliably transmitted pure binary data, then you are
absolutely right. But if there is less information per datum, like in an
analog TV signal, something that resembles noise might very well be noise.
> Once you have a
> message with no redundancy, you have a message with optimum
> information content and it will appear exactly like noise.
>
> Compression takes advantage of the portion of a message that is
> predictable based on what you have seen previously in the message.
> This is the content that does not look like noise. Once you take
> advantage of this and recode to eliminate it, the message looks like
> pure noise and is no longer compressible. But it is still a unique
> message with information content that you need to convey.
>
>
>> > Since the overall system does not support
>> > variable bitrates a faster transmission is no solution here.
>>
>> > So my idea was to put the question to all of you what to do in case of
>> > uncompressibility? Any ideas?
>>
>> If you can identify the estimated compression beforehand and then split
>> the stream into a 'hard' part and an 'easy' part, then you have a way to
>> retain the average.
>
> Doesn't that require sending additional information that is part of
> the message?
Usually, yes.
> On the average, this will add as much, if not more to
> the message than you are removing...
Possibly.
> If you are trying to compress data without loss, you can only compress
> the redundant information. If the message has no redundancy, then it
> is not compressible and, with *any* coding scheme, will require some
> additional bandwidth than if it were not coded at all.
>
> Think of your message as a binary number of n bits. If you want to
> compress it to m bits, you can identify the 2**m most often
> transmitted numbers and represent them with m bits. But the remaining
> numbers can not be transmitted in m bits at all. If you want to send
> those you have to have a flag that says, "do not decode this number".
> Now you have to transmit all n or m bits, plus the flag bit. Since
> there are 2**n-2**m messages with n+1 bits and 2**m messages with m+1
> bits, I think you will find the total number of bits is not less then
> just sending all messages with n bits. But if the messages in the m
> bit group are much more frequent, then you can reduce your *average*
> number of bits sent. If you can say you will *never* send the numbers
> that aren't in the m bit group, then you can compress the message
> losslessly in m bits.
--
But noise and signal that "resembles noise" are two different things.
You can characterize noise and send a description of it. But it is
*isn't* noise you have just turned part of your signal into noise. So
to take advantage of the fact that noise can be compressed by saying
"this is noise" requires that you separate the noise from the signal.
If you could do that, why would you even transmit the noise? You
wouldn't, you would remove it.
So the only type of "noise like" signal left is the part that *is*
signal and the part that can't be separated from the signal. Since
you can't distinguish between the two, you have to transmit them both
and suffer the inability to compress them.
> > Once you have a
> > message with no redundancy, you have a message with optimum
> > information content and it will appear exactly like noise.
>
> > Compression takes advantage of the portion of a message that is
> > predictable based on what you have seen previously in the message.
> > This is the content that does not look like noise. Once you take
> > advantage of this and recode to eliminate it, the message looks like
> > pure noise and is no longer compressible. But it is still a unique
> > message with information content that you need to convey.
>
...snip...
>
> >> If you can identify the estimated compression beforehand and then split
> >> the stream into a 'hard' part and an 'easy' part, then you have a way to
> >> retain the average.
>
> > Doesn't that require sending additional information that is part of
> > the message?
>
> Usually, yes.
How can you flag the "easy" (compressible) part vs. the "hard" part
without sending more bits?
> > On the average, this will add as much, if not more to
> > the message than you are removing...
>
> Possibly.
As I describe below, compression only saves bits if your *average*
content has sufficient redundancy. So what does "possibly" mean?
The algorithm is either lossless, or it isn't. Identifying and
removing the noise is
exactly what a lossy algortihm does.
For any lossless algorithm there must be an input that produces an
output at
least the size of the input. The proof is rather simple. You just
count the number of possible
inputs (2^N for N bits) and the number of bits necessary to
distinguish that many different outputs.
(That's N bits, surprise). With less bits two inputs would code to the
same output.
To guarantee compression for a lossless algorithm therefore is only
possible if most input sequences
can't occcur. If you can guarantee that half of the input combinations
can't happen you could
guarantee to save a single bit.
Cases where that occurs is image half toning with a fixed frequency.
In that case you have a worst
case run length distribution .
Kolja Sulimma
If you could, yes. Costs put limits on the available processing power.
> So the only type of "noise like" signal left is the part that *is*
> signal and the part that can't be separated from the signal. Since
> you can't distinguish between the two, you have to transmit them both
> and suffer the inability to compress them.
>
>
>> > Once you have a
>> > message with no redundancy, you have a message with optimum
>> > information content and it will appear exactly like noise.
>>
>> > Compression takes advantage of the portion of a message that is
>> > predictable based on what you have seen previously in the message.
>> > This is the content that does not look like noise. Once you take
>> > advantage of this and recode to eliminate it, the message looks like
>> > pure noise and is no longer compressible. But it is still a unique
>> > message with information content that you need to convey.
>>
> ...snip...
>>
>> >> If you can identify the estimated compression beforehand and then
>> split
>> >> the stream into a 'hard' part and an 'easy' part, then you have a
>> way to
>> >> retain the average.
>>
>> > Doesn't that require sending additional information that is part of
>> > the message?
>>
>> Usually, yes.
>
> How can you flag the "easy" (compressible) part vs. the "hard" part
> without sending more bits?
In the context of the OP's hardware implementation, you may be able to
distribute these two streams over the available output pins without
sending extra bits.
>> > On the average, this will add as much, if not more to
>> > the message than you are removing...
>>
>> Possibly.
>
> As I describe below, compression only saves bits if your *average*
> content has sufficient redundancy. So what does "possibly" mean?
If compression saves 'a lot of' bits ands flagging needs 'a few' bits,
then it will not "add as much, if not more to the message than [I am]
removing..." Your description below only applies to certain compression
algorithms, so any conclusion derived from it may or may not apply to the
general case.
>> > If you are trying to compress data without loss, you can only compress
>> > the redundant information. If the message has no redundancy, then it
>> > is not compressible and, with *any* coding scheme, will require some
>> > additional bandwidth than if it were not coded at all.
>>
>> > Think of your message as a binary number of n bits. If you want to
>> > compress it to m bits, you can identify the 2**m most often
>> > transmitted numbers and represent them with m bits. But the remaining
>> > numbers can not be transmitted in m bits at all. If you want to send
>> > those you have to have a flag that says, "do not decode this number".
>> > Now you have to transmit all n or m bits, plus the flag bit. Since
>> > there are 2**n-2**m messages with n+1 bits and 2**m messages with m+1
>> > bits, I think you will find the total number of bits is not less then
>> > just sending all messages with n bits. But if the messages in the m
>> > bit group are much more frequent, then you can reduce your *average*
>> > number of bits sent. If you can say you will *never* send the numbers
>> > that aren't in the m bit group, then you can compress the message
>> > losslessly in m bits.
>
--
This proof shows, that for ANY lossless algorithm there is an input
that can't be
compressed. I find it rather funny that you counter that proof by the
assertion
that it only applies to certain algorithms.
For the fun of it: Would you be so kind and present a single example
of a compression
algorithm that the proof does not apply to? Could be worth a PhD if
you manage.
Kolja Sulimma
Yes. Multiple times.
> He outlined the formal prove that I was referencing to in a little
> more detail.
>
> This proof shows, that for ANY lossless algorithm there is an input
> that can't be
> compressed. I find it rather funny that you counter that proof by the
> assertion that it only applies to certain algorithms.
I didn't mean to counter the proof itself, only claims to the relation
between compression ratio and the bandwidth needed to split a stream.
> For the fun of it: Would you be so kind and present a single example
> of a compression
> algorithm that the proof does not apply to? Could be worth a PhD if
> you manage.
Nah. I'd rather waste my time on something else. :)
That is irrelevant to the conversation. No one has mentioned data
rates, processing power or time requirements. So there is no point is
raising issues that we have no information on. But my point remains.
If your algorithm can remove the true noise separated from signal that
looks like noise, then you would just toss the real noise and improve
the signal while compressing at the same time. That is my point and
from what you have said, it requires no extra processing, but is part
of your compression.
> > How can you flag the "easy" (compressible) part vs. the "hard" part
> > without sending more bits?
>
> In the context of the OP's hardware implementation, you may be able to
> distribute these two streams over the available output pins without
> sending extra bits.
If the OP has two streams, one for compressible signal and one for
uncompressible signal, then he could just send the original message
over the uncompressible channel and avoid the issue of compression
altogether.
> > As I describe below, compression only saves bits if your *average*
> > content has sufficient redundancy. So what does "possibly" mean?
>
> If compression saves 'a lot of' bits ands flagging needs 'a few' bits,
> then it will not "add as much, if not more to the message than [I am]
> removing..." Your description below only applies to certain compression
> algorithms, so any conclusion derived from it may or may not apply to the
> general case.
Compression can only save bits in the subset of signals that actually
are reducible. If you do the math you will find that if the signal is
randomly distributed, any coding scheme can not reduce the total
number of bits sent. It is only when some signal patterns are more
frequent than others that you can exploit the non-randomness of the
signal to compress it into fewer bits.
I haven't said anything about algorithms, so everything I have said on
this applies to *all* compression algorithms. My discussion below is
intended to apply to the general case. That is why I reduce the
discussion to one of compressing a large number to a smaller number.