For lossless compression, the only way you can know how many times you can gain by recompressing a file is by trying. It's going to depend on the compression algorithm and the file you're compressing.
The reason that the second compression sometimes works is that a compression algorithm can't do omniscient perfect compression. There's a trade-off between the work it has to do and the time it takes to do it. Your file is being changed from all data to a combination of data about your data and the data itself.
We'll grow by one byte per iteration for a while, but it will actually get worse. One byte can only hold negative numbers to -128. We'll start growing by two bytes when the file surpasses 128 bytes in length. The growth will get still worse as the file gets bigger.
There's a headwind blowing against the compression program--the meta data. And also, for real compressors, the header tacked on to the beginning of the file. That means that eventually the file will start growing with each additional compression.
RLE is a starting point. If you want to learn more, look at LZ77 (which looks back into the file to find patterns) and LZ78 (which builds a dictionary). Compressors like zip often try multiple algorithms and use the best one.
Generally the limit is one compression. Some algorithms results in a higher compression ratio, and using a poor algorithm followed by a good algorithm will often result in improvements. But using the good algorithm in the first place is the proper thing to do.
If you have a large number of duplicate files, the zip format will zip each independently, and you can then zip the first zip file to remove duplicate zip information. Specifically, for 7 identical Excel files sized at 108kb, zipping them with 7-zip results in a 120kb archive. Zipping again results in an 18kb archive. Going past that you get diminishing returns.
Suppose we have a file N bits long, and we want to compress it losslessly, so that we can recover the original file. There are 2^N possible files N bits long, and so our compression algorithm has to change one of these files to one of 2^N possible others. However, we can't express 2^N different files in less than N bits.
This means that a compression algorithm can only compress certain files, and it actually has to lengthen some. This means that, on the average, compressing a random file can't shorten it, but might lengthen it.
Practical compression algorithms work because we don't usually use random files. Most of the files we use have some sort of structure or other properties, whether they're text or program executables or meaningful images. By using a good compression algorithm, we can dramatically shorten files of the types we normally use.
However, the compressed file is not one of those types. If the compression algorithm is good, most of the structure and redundancy have been squeezed out, and what's left looks pretty much like randomness.
No compression algorithm, as we've seen, can effectively compress a random file, and that applies to a random-looking file also. Therefore, trying to re-compress a compressed file won't shorten it significantly, and might well lengthen it some.
Corruption only happens when we're talking about lossy compression. For example, you can't necessarily recover an image precisely from a JPEG file. This means that a JPEG compressor can reliably shorten an image file, but only at the cost of not being able to recover it exactly. We're often willing to do this for images, but not for text, and particularly not executable files.
In this case, there is no stage at which corruption begins. It starts when you begin to compress it, and gets worse as you compress it more. That's why good image-processing programs let you specify how much compression you want when you make a JPEG: so you can balance quality of image against file size. You find the stopping point by considering the cost of file size (which is more important for net connections than storage, in general) versus the cost of reduced quality. There's no obvious right answer.
since there are no repeating patterns. Similarly if the pattern replacement methods converts long patterns to 3 char ones, reapplying it will have little effect, because the only remaining repeating patterns will be 3-length or shorter. Generally applying compression to a already compressed file makes it slightly bigger, because of various overheads. Applying good compression to a poorly compressed file is usually less effective than applying just the good compression.
In general, not even one. Whatever compression algorithm you use, there must always exists a file that does not get compressed at all, otherwise you could always compress repeatedly until you reach 1 byte, by your same argument.
It is a very good question. You can view to file from different point of view. Maybe you know a priori that this file contain arithmetic series.Lets view to it as datastream of "bytes", "symbols", or "samples".
One of the main concept in information theory is entropy.If you have a stream of "bytes"....Entropy of that bytes doesn't depend on values of your "bytes", or "samples"...If was defined only by frequencies with which bytes retrive different values.Maximum entropy has place to be for full random datastream.Minimum entropy, which equal to zero, has place to be for case when your "bytes" has identical value.
So the entropy is minimum number of bits per your "byte", which you need to use when writing information to the disk. Of course it is so if you use god's algorithm. Real life compression lossless heuristic algorithms are not so.
I dont understand sense of the question. You can write no bits to the disk and you will write a corrupted file to the disk with size equal to 0 bits. Of course it is corrupted, but his size is zero bits.
Here is the ultimate compression algorithm (in Python) which by repeated use will compress any string of digits down to size 0 (it's left as an exercise to the reader how to apply this to a string of bytes).
The program outputs 12 11 10 09 08 07 06 05 04 03 02 01 00 9 8 7 6 5 4 3 2 1 0 then empty string. It doesn't compress the string at each pass but it will with enough passes compress any digit string down to a zero length string. Make sure you write down how many times you send it through the compressor otherwise you won't be able to get it back.
I would like to state that the limit of compression itself hasn't really been adapted to tis fullest limit. Since each pixel or written language is in black or write outline. One could write a program that can decompile into what it was, say a book, flawlessly, but could compress the pixel pattern and words into a better system of compression. Meaning It would probably take a lot longer to compress, but as a system file gets larget gigs or terra bytes, the repeated letters of P and R and q and the black and white deviations could be compressed expotentially into a complex automated formula. THe mhcien doesn't need the data to make sense, it just can make a game making a highly compressed pattern. This in turn then allows us the humans to create a customized compression reading engine. Meaning now we have real compression power. Design an entire engine that can restore the information on the user side. The engine has its own language that is optimal, no spaces, just fillign black and white pixel boxes of the smallest set or even writing its own patternaic language. Nad thus it can at the same time for the mostoptiaml performace, give out a unique cipher or decompression formula when its down, and thus the file is optimally compressed and has a password that is unique for the engine to decompress it later. The machine can do amost limitlesset of iterations to compress the file further. Its like having a open book and putting all the written stories of humanity currently on to one A4 sheet. I don't know but it is another theory. So what happens is split volume, because the formula to decrompress would have its own size, evne the naming of the folder and or icon information has a size so one could go further to put every form of data a a string of information. hmm..
In computer science and mathematics, the term full employment theorem has been used to refer to a theorem showing that no algorithm can optimally perform a particular task done by some class of professionals. The name arises because such a theorem ensures that there is endless scope to keep discovering new techniques to improve the way at least some specific task is done. For example, the full employment theorem for compiler writers states that there is no such thing as a provably perfect size-optimizing compiler, as such a proof for the compiler would have to detect non-terminating computations and reduce them to a one-instruction infinite loop. Thus, the existence of a provably perfect size-optimizing compiler would imply a solution to the halting problem, which cannot exist, making the proof itself an undecidable problem.
One "high-profile Hollywood sound professional who wishes to remain anonymous even blamed an abundance of new technologies: "more tracks to play with, more options, therefore more expected and asked for from the sound editors... We literally have hundreds of tracks at our disposal."
Watch the movie Snatch [wikipedia.org] on DVD with the "Pikey" subtitle track on, which only shows subtitles for the character Mickey (Brad Pitt), and you'll see that sometimes even it doesn't know what he's saying.
If theaters turned the volume up any higher, the theater floor would be slick with the blood of ruptured eardrums. It's already painful. Loud does not equal intelligible, beyond a certain volume it becomes less intelligible.
I am an old man saying this.
If I watch a movie made in the 70s or earlier, I have no problem hearing the dialog. Newer movies, it's very hard to pick it out of all the background noise. Have the same problem hearing an individual speak while in a big loud crowd. Could be the 55 years of motorcycles, helicopters, and airplanes tho!