
Algorithm that makes maximum compression of completely diffused data.


jonas.t...@gmail.com

unread,
Oct 30, 2013, 2:21:54 PM10/30/13
to
I am searching for the program or algorithm that makes the best possible compression of completely diffused data (random noise), and wonder what the state of the art in compression is.

I understand this is not the correct forum, but since I think I have an algorithm that can do this very well, and I do not know where to turn with such a question, I thought I would start here.

It is of course lossless compression I am speaking of.

Mark Lawrence

unread,
Oct 30, 2013, 2:53:59 PM10/30/13
to pytho...@python.org
I can't help with compression but I can help with a marvellous source of
the opposite, expansion, it's google groups.

--
Python is the second best programming language in the world.
But the best has yet to be invented. Christian Tismer

Mark Lawrence

jonas.t...@gmail.com

unread,
Oct 30, 2013, 3:01:34 PM10/30/13
to
And your still a stupid monkey i dare you to go test your IQ.

Mark Lawrence

unread,
Oct 30, 2013, 3:18:30 PM10/30/13
to pytho...@python.org
On 30/10/2013 19:01, jonas.t...@gmail.com wrote:
>
> And your still a stupid monkey i dare you to go test your IQ.
>

It's you're as in you are and not your as in belongs to me.

I have no intention of getting my IQ tested, but I do know that it's a
minimum of 120 as that was required for me to pass the old UK 11+
examination. Given that I spent my time at a grammar school in the top
stream I'd guess that my IQ is actually higher, but there you go.

Not that that really matters. What does is that I'm smart enough to be
able to follow a set of instructions when requested to do so, for
example I could probably follow these
https://wiki.python.org/moin/GoogleGroupsPython if I needed to. I'm
therefore assuming that you're not bright enough to follow these
instructions and so have annoyed thousands of people with your double
spaced crap, which I've again snipped.

jonas.t...@gmail.com

unread,
Oct 30, 2013, 3:22:02 PM10/30/13
to
I do not follow instructions, i make them accesible to anyone. And you just following them is a clear example to your lack of IQ.

jonas.t...@gmail.com

unread,
Oct 30, 2013, 3:23:25 PM10/30/13
to
On Wednesday, 30 October 2013 at 20:18:30 UTC+1, Mark Lawrence wrote:
What i actually saying is that you are indeed... an anal code monkey that never ever had a selfsustained thought of your own.

Think about it.

Dan Stromberg

unread,
Oct 30, 2013, 3:29:35 PM10/30/13
to jonas.t...@gmail.com, Python List

xz compression is pretty hard, if a little bit slow.  Also, if you want really stellar compression ratios and you don't care about time to compress, you might check out one of the many paq implementations.

I have a module that does xz compression in 4 different ways:
http://stromberg.dnsalias.org/svn/backshift/tags/1.20/xz_mod.py
It's only for smallish chunks in the ctypes version, because that was all I needed.  The others should be able to handle relatively large inputs.
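A minimal sketch of the same idea using only the standard-library lzma module (assuming Python 3.3 or later) looks roughly like this; it's an illustration, not a drop-in replacement for xz_mod:

import lzma

def xz_roundtrip(data):
    # Compress with the xz container at the highest standard preset,
    # then decompress and check the round trip is lossless.
    compressed = lzma.compress(data, preset=9)
    assert lzma.decompress(compressed) == data
    print(len(data), "->", len(compressed), "bytes")

xz_roundtrip(b"hello " * 10000)   # redundant text shrinks dramatically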

Mark Lawrence

unread,
Oct 30, 2013, 3:31:51 PM10/30/13
to pytho...@python.org
I suggest that you reread what I wrote above, assuming that you can find
it amongst the double spaced crap. I said I would follow the
instructions if I needed to. So clearly I'm not just following them, as
I've no need to, as I'm smart enough to use a vastly superior tool.

Tim Delaney

unread,
Oct 30, 2013, 3:35:59 PM10/30/13
to jonas.t...@gmail.com, Python-List
This is not an appropriate forum for this question. If you know it's an inappropriate forum (as you stated) then do not post the question here. Do a search with your preferred search engine and look up lossless compression on Wikipedia. And read and understand the following link:


paying special attention to the following parts:


If you have *python* code implementing this algorithm and want help, post the parts you want help with (and preferably post the entire algorithm in a repository).

However, having just seen the following from you in a reply to Mark ("I do not follow instructions, i make them accesible to anyone"), I am not going to give a second chance - fail to learn from the above advice and you'll meet my spam filter.

If the data is truly completely random noise, then there is very little that lossless compression can do. On any individual truly random data set you might get a lot of compression, a small amount of compression, or even expansion, depending on what patterns have randomly occurred in the data set. But there is no current lossless compression algorithm that can take truly random data and systematically compress it to be smaller than the original.

If you think you have an algorithm that can do this on truly random data, you're probably wrong - either your data has patterns the algorithm can exploit, or you've simply been lucky with the randomness of your data so far.

Tim Delaney 

Mark Lawrence

unread,
Oct 30, 2013, 3:35:45 PM10/30/13
to pytho...@python.org
I just have to bow down to your vast superiority over me.

How is your job with your country's diplomatic corps going, by the way?

Modulok

unread,
Oct 30, 2013, 3:46:57 PM10/30/13
to jonas.t...@gmail.com, Python mailing list
>> I am searching for the program or algorithm that makes the best possible of
>> completly (diffused data/random noise) and wonder what the state of art
>> compression is.

None. If the data to be compressed is truly homogeneous, random noise as you
describe (for example a 100 MB file read from a cryptographically secure random
bit generator such as /dev/random on *nix systems), the state-of-the-art
lossless compression is zero and will remain that way for the foreseeable
future.

There is no lossless algorithm that will reduce truly random (high entropy)
data by any significant margin. In classical information theory, such an
algorithm can never be invented. See: Kolmogorov complexity

Real world data is rarely completely random. You would have to test various
algorithms on the data set in question. Small things such as non-obvious
statistical clumping can make a big difference in the compression ratio from
one algorithm to another. Data that might look "random", might not actually be
random in the entropy sense of the word.
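A quick sanity check along those lines - just a sketch, assuming Python 3 with the standard os and zlib modules - shows the difference between high-entropy and redundant input:

import os
import zlib

random_blob = os.urandom(1000000)              # ~1 MB of high-entropy bytes
text_blob = b"the quick brown fox " * 50000    # ~1 MB of redundant text

for label, blob in (("random", random_blob), ("text", text_blob)):
    packed = zlib.compress(blob, 9)
    print(label, len(blob), "->", len(packed), "bytes")

# Typically the random blob ends up slightly *larger* after "compression",
# while the redundant text shrinks to a small fraction of its size.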

>> I understand this is not the correct forum but since i think i have an
>> algorithm that can do this very good, and do not know where to turn for such
>> question i was thinking to start here.

Not to sound like a downer, but I would wager that the data you're testing your
algorithm on is not as truly random as you imply or is not a large enough body
of test data to draw such conclusions from. It's akin to inventing a perpetual
motion machine or an inertial propulsion engine or any other classically
impossible solutions. (This only applies to truly random data.)

-Modulok-

jonas.t...@gmail.com

unread,
Oct 30, 2013, 3:47:09 PM10/30/13
to
No i am not wrong.
End of story

jonas.t...@gmail.com

unread,
Oct 30, 2013, 3:47:48 PM10/30/13
to
Well then i have news for you.

jonas.t...@gmail.com

unread,
Oct 30, 2013, 3:49:00 PM10/30/13
to
On Wednesday, 30 October 2013 at 20:46:57 UTC+1, Modulok wrote:
My algorithm will compress data from any random data source.

Antoon Pardon

unread,
Oct 30, 2013, 3:28:01 PM10/30/13
to pytho...@python.org
On 30-10-13 20:01, jonas.t...@gmail.com wrote:
Are you sure it is this game you want to play? Please consider
carefully: For what purpose did you come to this group? Is your
behaviour accomplishing that purpose?

You were asked to take responsibility for the detrimental effect your
choice of news reader software has on this newsgroup. Your answer boiled
down to a very clear "I don't care about the detrimental effect I cause".
Well, until you start caring and adapt your behaviour, people will tend
not to care about your questions/problems, and the most likely responses
you will get are people pointing to your anti-social behaviour.

You may feel very righteous in your response to Mark, but you will just
further alienate the regulars. If that is your goal you can continue
as you did before and soon you will be in a lot of kill files. Or, if
you hope for some cooperation from the regulars, you'd better show
you can be cooperative too.

--
Antoon Pardon

Gene Heskett

unread,
Oct 30, 2013, 4:32:38 PM10/30/13
to pytho...@python.org
On Wednesday 30 October 2013 16:29:12 jonas.t...@gmail.com did opine:
Congratulations Jonas. My kill file for this list used to have only one
name, but now has 2. Unfortunately I will still see the backscatter of
others trying to tell you how to post and interact with this list. But for
now, this person with, since we are quoting IQs here, a tested 147 says
goodbye.

Cheers, Gene
--
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)

BOFH excuse #275:

Bit rot
A pen in the hand of this president is far more
dangerous than 200 million guns in the hands of
law-abiding citizens.

Grant Edwards

unread,
Oct 30, 2013, 5:18:36 PM10/30/13
to
On 2013-10-30, jonas.t...@gmail.com <jonas.t...@gmail.com> wrote:

> I am searching for the program or algorithm that makes the best
> possible of completly (diffused data/random noise) and wonder what
> the state of art compression is.

[...]

> It is of course lossless compression i am speaking of.

For completely random noise, the CAT compression algorithm will
achieve the maximum theoretical result. It's been available on Unix
systems for decades via the "cat" command.

It's also trivial to implement in your own code if you desire.
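A sketch in Python, in case anyone wants it spelled out (it is, of course, just the identity transform):

import shutil

def cat_compress(src, dst):
    # Achieves a compression ratio of exactly 1.0 -- which, for truly
    # random input, is the best any lossless scheme can do on average.
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)

cat_decompress = cat_compress   # decompression is the same operation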

--
Grant Edwards    grant.b.edwards at gmail.com
Yow! I put aside my copy of "BOWLING WORLD" and think about GUN CONTROL
legislation...

Mark Janssen

unread,
Oct 30, 2013, 5:26:04 PM10/30/13
to jonas.t...@gmail.com, Python List
On Wed, Oct 30, 2013 at 11:21 AM, <jonas.t...@gmail.com> wrote:
> I am searching for the program or algorithm that makes the best possible of completly (diffused data/random noise) and wonder what the state of art compression is.

Is this an April Fool's Joke? A key idea of "completely" random is
that you *can't* compress it.
--
MarkJ
Tacoma, Washington

Joshua Landau

unread,
Oct 30, 2013, 5:30:24 PM10/30/13
to python-list
On 30 October 2013 19:18, Mark Lawrence <bream...@yahoo.co.uk> wrote:
> On 30/10/2013 19:01, jonas.t...@gmail.com wrote:
>>
>> And your still a stupid monkey i dare you to go test your IQ.
>
> It's you're as in you are and not your as in belongs to me.
>
> I have no intention of getting my IQ tested, but I do know that it's a
> minimum of 120 as that was required for me to pass the old UK 11+
> examination. Given that I spent my time at a grammar school in the top
> stream I'd guess that my IQ is actually higher, but there you go.

What I'm confounded about is this list's inability to recognise a
troll when it slaps it vocally in the face.

This isn't like Nikos. There's no "troll vs. incompetent" debate to be
had. This guy started talking about compressing *random data* and
immediately descended into insults. Your IQ, whatever it may be, is
going to waste ;).

Mark Lawrence

unread,
Oct 30, 2013, 5:52:00 PM10/30/13
to pytho...@python.org
On 30/10/2013 21:30, Joshua Landau wrote:
> On 30 October 2013 19:18, Mark Lawrence <bream...@yahoo.co.uk> wrote:
>> On 30/10/2013 19:01, jonas.t...@gmail.com wrote:
>>>
>>> And your still a stupid monkey i dare you to go test your IQ.
>>
>> It's you're as in you are and not your as in belongs to me.
>>
>> I have no intention of getting my IQ tested, but I do know that it's a
>> minimum of 120 as that was required for me to pass the old UK 11+
>> examination. Given that I spent my time at a grammar school in the top
>> stream I'd guess that my IQ is actually higher, but there you go.
>
> What I'm confounded about is this list's inability to recognise a
> troll when it slaps it vocally in the face.
>
> This isn't like Nikos. There's no "troll vs. incompetent" debate to be
> had. This guy started talking about compressing *random data* and
> immediately descended into insults. Your IQ, whatever it may be, is
> going to waste ;).
>

In my defence I did say this earlier in the exchange "I can't help with
compression but I can help with a marvellous source of the opposite,
expansion, it's google groups." Definitely a troll though, to go in my
dream team with Xah Lee and Ilias Lazaridis amongst others and good old
rr on the subs bench :)

Tim Chase

unread,
Oct 30, 2013, 7:01:39 PM10/30/13
to pytho...@python.org
On 2013-10-30 21:30, Joshua Landau wrote:
> started talking about compressing *random data*

If it's truly random bytes, as long as you don't need *the same*
random data, you can compress it quite easily. Lossy compression is
acceptable for images, so why not random files? :-)

import os

inname = "random.txt"
namez = inname + '.rnz'

# compress the file: all we keep is the original size
with open(namez, 'w') as f:
    f.write(str(os.stat(inname).st_size))

# uncompress the file: any random bytes are as good as the originals
with open(namez) as f:
    size = int(f.read())
with open('/dev/random', 'rb') as rnd, open(inname, 'wb') as out:
    for i in range(size):
        out.write(rnd.read(1))


There are optimizations that can be made, and I didn't make it run
on Win32, but I leave those as exercises for the reader. That
said, this compresses *remarkably* well for large files ;-)

-tkc




Chris Angelico

unread,
Oct 30, 2013, 7:41:11 PM10/30/13
to pytho...@python.org
On Thu, Oct 31, 2013 at 10:01 AM, Tim Chase
<pytho...@tim.thechases.com> wrote:
> On 2013-10-30 21:30, Joshua Landau wrote:
>> started talking about compressing *random data*
>
> If it's truly random bytes, as long as you don't need *the same*
> random data, you can compress it quite easily. Lossy compression is
> acceptable for images, so why not random files? :-)

Maybe. But what if it's not truly random, but only pseudo-random?

# create a file full of random data
import random
seed = random.getrandbits(32)
length = random.getrandbits(16)  # in four-byte units
random.seed(seed)
inname = "random.txt"
namez = inname + '.rnz'
with open(inname, "wb") as bigfile:
    for _ in range(length):
        bigfile.write(random.getrandbits(32).to_bytes(4, "big"))

# compress that file
with open(namez, "wb") as smallfile:
    smallfile.write(seed.to_bytes(4, "big"))
    smallfile.write(length.to_bytes(4, "big"))

# uncompress it
with open(namez, "rb") as f:
    seed = int.from_bytes(f.read(4), "big")
    length = int.from_bytes(f.read(4), "big")
random.seed(seed)
with open("out_" + inname, "wb") as bigfile:
    for _ in range(length):
        bigfile.write(random.getrandbits(32).to_bytes(4, "big"))

Voila! Very impressive compression ratio, and exploits the very
randomness of the data!

ChrisA

Dave Angel

unread,
Oct 30, 2013, 11:22:37 PM10/30/13
to pytho...@python.org
See http://gailly.net/05533051.html for a discussion of a patent on a
similarly ludicrous "algorithm."

Maybe you can hoodwink the patent office into granting you one as well.

--
DaveA


rusi

unread,
Oct 31, 2013, 8:54:49 AM10/31/13
to
On Thursday, October 31, 2013 3:00:24 AM UTC+5:30, Joshua Landau wrote:

> What I'm confounded about is this list's inability to recognise a
> troll when it slaps it vocally in the face.

> This isn't like Nikos. There's no "troll vs. incompetent" debate to be
> had.


It's usually called "entertainment". Something related:

http://onceuponatimeinindia.blogspot.in/2009/07/hard-drive-weight-increasing.html

Tim Roberts

unread,
Nov 2, 2013, 5:31:09 PM11/2/13
to
jonas.t...@gmail.com wrote:
>
>Well then i have news for you.

Well, then, why don't you share it?

Let me try to get you to understand WHY what you say is impossible. Let's
say you do have a function f(x) that can produce a compressed output y for
any given x, such that y is always smaller than x. If that were true, then
I could call f() recursively:
f(f(...f(f(f(f(f(x)))))...))
and eventually the result gets down to a single bit. I hope it is clear
that there's no way to restore a single bit back into different source
texts.

Here's another way to look at it. If f(x) is smaller than x for every x,
that means there MUST be multiple values of x that produce the same f(x).
Do you see? If x is three bits and f(x) is two bits, that means there are
8 possible values for x but only 4 values for f(x). So, given an f(x), you
cannot tell which value of x it came from. You have lost information.
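To make the counting concrete, here is a small sketch (truncation stands in for any 3-bit-to-2-bit "compressor"):

from itertools import product

def f(bits):
    return bits[:2]          # some "compressor" from 3 bits down to 2 bits

collisions = {}
for x in ("".join(p) for p in product("01", repeat=3)):   # all 8 inputs
    collisions.setdefault(f(x), []).append(x)

for y, xs in collisions.items():
    print(y, "<-", xs)       # each 2-bit output is produced by two inputs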
--
Tim Roberts, ti...@probo.com
Providenza & Boekelheide, Inc.

Mark Janssen

unread,
Nov 2, 2013, 5:37:45 PM11/2/13
to Tim Roberts, Python List
> Let me try to get you to understand WHY what you say is impossible. Let's
> say you do have a function f(x) that can produce a compressed output y for
> any given x, such that y is always smaller than x. If that were true, then
> I could call f() recursively:
> f(f(...f(f(f(f(f(x)))))...))
> and eventually the result get down to a single bit. I hope it is clear
> that there's no way to restore a single bit back into different source
> texts.

Hey, that's a nice proof!

Cheers,

Mark Janssen

Steven D'Aprano

unread,
Nov 2, 2013, 11:17:57 PM11/2/13
to
On Sat, 02 Nov 2013 14:31:09 -0700, Tim Roberts wrote:

> jonas.t...@gmail.com wrote:
>>
>>Well then i have news for you.
>
> Well, then, why don't you share it?
>
> Let me try to get you to understand WHY what you say is impossible.
[snip reasons]

Expanding on Tim's post... the first scenario Tim gives, applying the
compression repeatedly to its own output until you eventually get a
single byte, can be overcome if there are data sets that are unchanged by
compression. That is, if f() is the compression function:

f(f(f(data))) == f(f(data)) == f(data) == data

for some values of data. So if you start with some other data, and
compress it, then compress it again, and again, eventually you end up
with one of these attractors, after which repeated compression doesn't
give you any more benefit.

[Aside: the attractors aren't necessarily a single point. The attractor
could be a cycle of two or more points, f(x) == y and f(y) == x. Or you
could even have a "strange attractor", which brings us to chaos theory.]

But your second reason, better known as the pigeonhole principle,
demonstrates that for any lossless compression method, there must be data
sets that actually expand the data. It doesn't matter how cleverly you
compress the data, you can't fit 20kg of potatoes in a 10kg bag, so to
speak. Suppose your compression algorithm compresses a single byte into a
nybble (four bits). There are 256 different input data sets (0x00,
0x01, ... 0xFF) and only 16 different outputs (0x0, 0x1, ... 0xF). There
is no way for 256 pigeons to fit in 16 pigeon holes unless you put two or
more pigeons in at least one hole. Ergo, if the compression algorithm is
lossless, *some* data must be expanded rather than compressed.

Alternatively, the compression may be lossy. Or both!

Obviously data isn't compressed a byte at a time, that would be silly.
But the principle still applies.
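If you want the counting spelled out: for any length n there are more n-bit files than there are strictly shorter files, so no lossless scheme can shrink them all. A sketch:

n = 16                                            # pretend files are n bits long

files_of_length_n = 2 ** n                        # 65536 possible inputs
strictly_shorter = sum(2 ** k for k in range(n))  # 65535 possible shorter outputs

print(files_of_length_n, strictly_shorter)
# 65536 65535 -- one pigeonhole short, so at least two inputs must collide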

There is a way to apparently get around these limits: store data
externally, perhaps inside the compression application itself. Then, if
you just look at the compressed file (the "data.zip" equivalent, although
I stress that zip compression is *not* like this), you might think it has
shrunk quite a lot. But when you include the hidden data, the compression
is not quite so impressive...


--
Steven

Chris Angelico

unread,
Nov 3, 2013, 12:10:30 AM11/3/13
to pytho...@python.org
On Sun, Nov 3, 2013 at 2:17 PM, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
> There is a way to apparently get around these limits: store data
> externally, perhaps inside the compression application itself. Then, if
> you just look at the compressed file (the "data.zip" equivalent, although
> I stress that zip compression is *not* like this), you might think it has
> shrunk quite a lot. But when you include the hidden data, the compression
> is not quite so impressive...

Storing externally is, of course, a very useful thing - it's just not
compression. For instance, a git repository (and quite likely a
Mercurial one too) keeps track of files as blobs of data referenced by
their cryptographic hashes. The current state of a directory can be
described by listing file names with their object hashes, which is a
very compact notation; but it doesn't have _all_ the information, and
the "decompression" process involves fetching file contents from the
library. It's a tool in the arsenal, but it's not compression in the
same way that PK-ZIP is.

With real compression algorithms, there's usually an "out" clause that
detects that the algorithm's doing a bad job. The file format for
PK-ZIP and, I expect, every other archiving+compression file format,
has allowances for some of the files to be stored uncompressed. That
way, there's no need for any file to be enlarged by more than some
signature - say, a one-byte "compression type" marker, or even a
one-bit mark somewhere. But enlarged they must be, for the reasons
Steven explained.

ChrisA

Ethan Furman

unread,
Nov 3, 2013, 12:26:03 AM11/3/13
to pytho...@python.org
On 10/30/2013 01:32 PM, Gene Heskett wrote:
>
> Congratulations Jonas. My kill file for this list used to have only one
> name, but now has 2.

You have more patience than I! Jonas just made mine seven. :)

--
~Ethan~

Ethan Furman

unread,
Nov 3, 2013, 12:26:44 AM11/3/13
to pytho...@python.org
On 10/30/2013 12:23 PM, jonas.t...@gmail.com wrote:
>
> What i actually saying is that you are indeed... [insult snipped]

*plonk*

Mark Janssen

unread,
Nov 3, 2013, 1:09:46 AM11/3/13
to Ethan Furman, Python List
>> Congratulations Jonas. My kill file for this list used to have only one
>> name, but now has 2.
>
> You have more patience than I! Jonas just made mine seven. :)

Gosh, don't kill the guy. It's not an obvious thing to hardly anyone
but computer scientists. It's an easy mistake to make.

--
MarkJ
Tacoma, Washington

Gene Heskett

unread,
Nov 3, 2013, 4:50:46 AM11/3/13
to pytho...@python.org
On Sunday 03 November 2013 04:40:45 Ethan Furman did opine:

> On 10/30/2013 01:32 PM, Gene Heskett wrote:
> > Congratulations Jonas. My kill file for this list used to have only
> > one name, but now has 2.
>
> You have more patience than I! Jonas just made mine seven. :)
>
> --
> ~Ethan~

Yeah, well there are a couple others in the mugwump category here yet. I
lurk here to try and learn, and baseless arguments are just noise. To be
filtered. And it's working!

But it may be that this old dog has learned his last "new" trick in the
language arena too, too many "irons in the fire", and fine tuning machinery
to run the GCode I write to carve metal or wood is the primary interest
ATM. At 79yo, the short term memory needs help. I'm smart enough to
understand that, but it doesn't mean I like it. It's a right PIMA TBE.

Cheers, Gene
--
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)

All the evidence concerning the universe has not yet been collected,
so there's still hope.

Michael Torrie

unread,
Nov 3, 2013, 10:14:11 AM11/3/13
to pytho...@python.org
On 11/03/2013 12:09 AM, Mark Janssen wrote:
>>> Congratulations Jonas. My kill file for this list used to have only one
>>> name, but now has 2.
>>
>> You have more patience than I! Jonas just made mine seven. :)
>
> Gosh, don't kill the guy. It's not an obvious thing to hardly anyone
> but computer scientists. It's an easy mistake to make.

I don't think he's being plonked for not understanding computational
theory. He's being plonked for resorting to name calling on his second
post! If he was a smart computer scientist type, then engaging in a
discussion about the theoretical aspects of his algorithm would have
been welcomed by him, because that's what science is all about. But he
failed that early on.

Thanks to everyone in this part of the thread for turning this
ridiculous farce into a really educational discussion on the theory of
information compression. Too bad the OP has tuned out a long time ago.

Joshua Landau

unread,
Nov 3, 2013, 10:34:07 AM11/3/13
to Steven D'Aprano, python-list
On 3 November 2013 03:17, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
> On Sat, 02 Nov 2013 14:31:09 -0700, Tim Roberts wrote:
>
>> jonas.t...@gmail.com wrote:
>>>
>>>Well then i have news for you.
>>
>> Well, then, why don't you share it?
>>
>> Let me try to get you to understand WHY what you say is impossible.
> [snip reasons]
>
> But your second reason, better known as the pigeonhole principle,
> demonstrates that for any lossless compression method, there must be data
> sets that actually expand the data. It doesn't matter how cleverly you
> compress the data, you can't fit 20kg of potatoes in a 10kg bag, so to
> speak. Suppose your compression algorithm compresses a single byte into a
> nybble (four bits). There are 256 different input data sets (0x00,
> 0x01, ... 0xFF) and only 16 different outputs (0x0, 0x1, ... 0xF). There
> is no way for 256 pigeons to fit in 16 pigeon holes unless you put two or
> more pigeons in at least one hole. Ergo, if the compression algorithm is
> lossless, *some* data must be expanded rather than compressed.

You have to be careful to make this totally rigorous, too.

Note that I *can* make a "compression" algorithm that takes any
length-n sequence and compresses all but one sequence by at least one
bit, and does not ever expand the data.

"00" -> ""
"01" -> "0"
"10" -> "1"
"11" -> "00"

This, obviously, is just 'cause the length is an extra piece of data,
but sometimes you have to store that anyway ;). So if I have a list of
N length-Y lists containing only 1s or 0s, I can genuinely compress
the whole structure by N log2 Y items.
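As a sketch in code, to make explicit that decoding only works because the original length (2) is known out of band:

ENCODE = {"00": "", "01": "0", "10": "1", "11": "00"}
DECODE = {v: k for k, v in ENCODE.items()}  # invertible only because every
                                            # plaintext had the same length

def compress(bits):
    return ENCODE[bits]

def decompress(code):
    return DECODE[code]

assert all(decompress(compress(x)) == x for x in ENCODE)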

Joshua Landau

unread,
Nov 3, 2013, 10:51:09 AM11/3/13
to Steven D'Aprano, python-list
On 3 November 2013 15:34, Joshua Landau <jos...@landau.ws> wrote:
>I can genuinely compress
> the whole structure by N log2 Y items.

By which I mean 2N items.

Mark Janssen

unread,
Nov 3, 2013, 10:40:36 PM11/3/13
to Joshua Landau, python-list, Steven D'Aprano
> Note that I *can* make a "compression" algorithm that takes any
> length-n sequence and compresses all but one sequence by at least one
> bit, and does not ever expand the data.
>
> "00" -> ""
> "01" -> "0"
> "10" -> "1"
> "11" -> "00"
>
> This, obviously, is just 'cause the length is an extra piece of data,
> but sometimes you have to store that anyway ;).

And how many bits will you use to store the length?

> So if I have a list of
> N length-Y lists containing only 1s or 0s, I can genuinely compress
> the whole structure by N log2 Y items.

But you cheated by using a piece of information from "outside the
system": length. A generic compression algorithm doesn't have this
information beforehand.
--
MarkJ
Tacoma, Washington

Tim Chase

unread,
Nov 4, 2013, 8:08:03 AM11/4/13
to pytho...@python.org
On 2013-11-03 19:40, Mark Janssen wrote:
> But you cheated by using a piece of information from "outside the
> system": length. A generic compression algorithm doesn't have this
> information beforehand.

By cheating with outside information, you can perfectly compress any
one data-set down to 1 bit. Just store the data in the program, then
store 1 bit of "is this file the data we have stored in the
program?". Granted, in modern OSes, you waste 7 extra bits since
they require you to write an entire byte at a time. :-)

And with that, you could even have an empty file and test for a file
extension. ;-)

-tkc



jonas.t...@gmail.com

unread,
Nov 4, 2013, 8:53:28 AM11/4/13
to
Well let me try to explain why it is working and i have implemented one.
I only need to refresh my memory it was almost 15 years ago.
This is not the solution but this is why it is working.
65536=256^2=16^4=***4^8***=2^16

Yes i am aware that 256 is a single byte 8 bits, but the approach is valid anyway.

jonas.t...@gmail.com

unread,
Nov 4, 2013, 9:00:31 AM11/4/13
to
You see, if the program behaves intelligently, some small operations can perform wonders on a number.

Dave Angel

unread,
Nov 4, 2013, 9:27:19 AM11/4/13
to pytho...@python.org
On Mon, 4 Nov 2013 05:53:28 -0800 (PST), jonas.t...@gmail.com wrote:
> On Saturday, 2 November 2013 at 22:31:09 UTC+1, Tim Roberts wrote:
> > Here's another way to look at it. If f(x) is smaller than x for every x,
> > that means there MUST be multiple values of x that produce the same f(x).
> > Do you see? If x is three bits and f(x) is two bits, that means there are
> > 8 possible values for x but only 4 values for f(x). So, given an f(x), you
> > cannot tell which value of x it came from. You have lost information.
>
> Well let me try to explain why it is working and i have implemented one.
> I only need to refresh my memory it was almost 15 years ago.
> This is not the solution but this is why it is working.
> 65536=256^2=16^4=***4^8***=2^16
>
> Yes i am aware that 256 is a single byte 8 bits, but the approach is valid
> anyway.

And e ^ (I * pi) == -1
So what. ?

Better file that patent, before the patent office realizes the
analogy to the perpetual motion machine.

--
DaveA

rusi

unread,
Nov 4, 2013, 9:46:58 AM11/4/13
to
On Monday, November 4, 2013 7:57:19 PM UTC+5:30, Dave Angel wrote:
> On Mon, 4 Nov 2013 05:53:28 -0800 (PST), Jonas wrote:
> > Well let me try to explain why it is working and i have implemented one.
> > I only need to refresh my memory it was almost 15 years ago.
> > This is not the solution but this is why it is working.
> > 65536=256^2=16^4=***4^8***=2^16

> > Yes i am aware that 256 is a single byte 8 bits, but the approach
> > is valid anyway.

> And e ^ (I * pi) == -1
> So what. ?

> Better file that patent, before the patent office realizes the
> analogy to the perpetual motion machine.

Now I am too much of a dim-wit to comprehend all this compressified
profundification but I think I have a (dim) clue how such a patent
will be obtained:

Just make a submission in such an unreadable double (then triple,
quadruple...) spaced format that the patent office will grant it in
exasperation.

jonas.t...@gmail.com

unread,
Nov 4, 2013, 5:34:23 PM11/4/13
to
e is an approximation... and your idea is not general for any n.

Dave Angel

unread,
Nov 4, 2013, 8:29:48 PM11/4/13
to pytho...@python.org
On Mon, 4 Nov 2013 14:34:23 -0800 (PST), jonas.t...@gmail.com
wrote:
> e is an approximation... and your idea is not general for any n.

e is certainly not an approximation, and I never mentioned n.

--
DaveA

Steven D'Aprano

unread,
Nov 4, 2013, 11:33:46 PM11/4/13
to
On Mon, 04 Nov 2013 14:34:23 -0800, jonas.thornvall wrote:

> On Monday, 4 November 2013 at 15:27:19 UTC+1, Dave Angel wrote:
>> On Mon, 4 Nov 2013 05:53:28 -0800 (PST), jonas.t...@gmail.com
>> wrote:
[...]
>> > This is not the solution but this is why it is working.
>>
>> > 65536=256^2=16^4=***4^8***=2^16

"this" being Jonas' alleged lossless compression method capable of
compressing random data.


>> > Yes i am aware that 256 is a single byte 8 bits, but the approach
>> > is valid anyway.

I must say, I cannot see the connection between the fact that 256**2 ==
2**16 and compression of random data. I might as well state that I have
squared the circle, and offer as proof that 3+4 == 5+2.



>> And e ^ (I * pi) == -1
>>
>> So what. ?
>>
>>
> e is an approximation... and your idea is not general for any n.

e is not an approximation, it is a symbolic name for an exact
transcendental number which cannot be specified exactly by any finite
number of decimal places.

Your comment about "n" is irrelevant, since Euler's Identity e**(i*pi)=-1
has nothing to do with "n". But in case you actually meant "i", again you
are mistaken. i is a symbolic name for an exact number.



--
Steven

Steven D'Aprano

unread,
Nov 4, 2013, 11:36:44 PM11/4/13
to
On Tue, 05 Nov 2013 04:33:46 +0000, Steven D'Aprano wrote:

> On Mon, 04 Nov 2013 14:34:23 -0800, jonas.thornvall wrote:
>
>> On Monday, 4 November 2013 at 15:27:19 UTC+1, Dave Angel wrote:
>>> On Mon, 4 Nov 2013 05:53:28 -0800 (PST), jonas.t...@gmail.com
>>> wrote:
> [...]
>>> > This is not the solution but this is why it is working.
>>>
>>> > 65536=256^2=16^4=***4^8***=2^16
>
> "this" being Jonas' alleged lossless compression method capable of
> compressing random data.

/facepalm

I meant "it", not "this", as in "why it (the compression method) is
working". Sorry for any confusion.



--
Steven

Tim Roberts

unread,
Nov 7, 2013, 3:05:53 AM11/7/13
to
jonas.t...@gmail.com wrote:
>
>Well let me try to explain why it is working and i have implemented one.
>I only need to refresh my memory it was almost 15 years ago.
>This is not the solution but this is why it is working.
>65536=256^2=16^4=***4^8***=2^16

All of those values are indeed the same, and yet that is completely
unrelated to compression. Did you honestly believe this was actually
explaining anything?

>Yes i am aware that 256 is a single byte 8 bits, but the approach is valid anyway.

This whole post is a most amazing collection of non sequiturs. If you
would like to describe your compression scheme, there really are people
here who would be interested in reading it (although that number gets less
and less as this thread goes on).

Mark Janssen

unread,
Nov 7, 2013, 1:59:43 PM11/7/13
to Tim Roberts, Python List
>>Well let me try to explain why it is working and i have implemented one.
>>I only need to refresh my memory it was almost 15 years ago.
>>This is not the solution but this is why it is working.
>>65536=256^2=16^4=***4^8***=2^16
>
> All of those values are indeed the same, and yet that is completely
> unrelated to compression. Did you honestly believe this was actually
> explaining anything?

I think the idea is that you could take any arbitrary input sequence,
view it as a large number, and then find what exponential equation can
produce that result. The equation becomes the "compression".

MarkJ
Tacoma, Washington

Tim Roberts

unread,
Nov 7, 2013, 2:22:45 PM11/7/13
to Mark Janssen, Python List
Mark Janssen wrote:
>>> Well let me try to explain why it is working and i have implemented one.
>>> I only need to refresh my memory it was almost 15 years ago.
>>> This is not the solution but this is why it is working.
>>> 65536=256^2=16^4=***4^8***=2^16
>>
>> I think the idea is that you could take any arbitrary input sequence,
>> view it as a large number, and then find what exponential equation can
>> produce that result. The equation becomes the "compression".

Interesting -- I hadn't noticed that. Of course, now you have the
problem of expressing 4^8 in a compact way.

Chris Angelico

unread,
Nov 7, 2013, 5:26:45 PM11/7/13
to pytho...@python.org
On Fri, Nov 8, 2013 at 5:59 AM, Mark Janssen <dreamin...@gmail.com> wrote:
> I think the idea is that you could take any arbitrary input sequence,
> view it as a large number, and then find what exponential equation can
> produce that result. The equation becomes the "compression".

Interesting idea, but I don't see how this has anything to do with
compressing arbitrary data. We already have a compact notation for
representing a large number as 2^24*x+2^16*y+2^8*z+q, so I don't see
how this will be any better, other than eliding NULs.

ChrisA

jonas.t...@gmail.com

unread,
Nov 7, 2013, 9:05:55 PM11/7/13
to
I guess what matter is how fast an algorithm can encode and decode a big number, at least if you want to use it for very big sets of random data, or losless video compression?

Chris Angelico

unread,
Nov 7, 2013, 9:17:36 PM11/7/13
to pytho...@python.org
On Fri, Nov 8, 2013 at 1:05 PM, <jonas.t...@gmail.com> wrote:
> I guess what matter is how fast an algorithm can encode and decode a big number, at least if you want to use it for very big sets of random data, or losless video compression?

I don't care how fast. I care about the laws of physics :) You can't
stuff more data into less space without losing some of it.

Also, please lose Google Groups, or check out what other people have
said about making it less obnoxious.

ChrisA

Mark Janssen

unread,
Nov 7, 2013, 9:24:44 PM11/7/13
to Chris Angelico, Python List
On Thu, Nov 7, 2013 at 6:17 PM, Chris Angelico <ros...@gmail.com> wrote:
> On Fri, Nov 8, 2013 at 1:05 PM, <jonas.t...@gmail.com> wrote:
>> I guess what matter is how fast an algorithm can encode and decode a big number, at least if you want to use it for very big sets of random data, or losless video compression?
>
> I don't care how fast. I care about the laws of physics :) You can't
> stuff more data into less space without losing some of it.

Technically, the universe could expand temporarily or reconfigure to
allow it; the question is who or what will have to shift out to allow
it?

--
MarkJ
Tacoma, Washington

jonas.t...@gmail.com

unread,
Nov 7, 2013, 9:25:18 PM11/7/13
to
Please, you are he obnoxious, so fuck off or go learn about reformulation of problems. Every number has an infinite number of arithmetical solutions. So every number do has a shortest arithmetical encoding. And that is not the hard part to figure out, the hard part is to find a generic arithmetic encoding.

I am not sure if it is just stupidness or laziness that prevent you from seeing that 4^8=65536.

Chris Angelico

unread,
Nov 7, 2013, 9:27:17 PM11/7/13
to pytho...@python.org
On Fri, Nov 8, 2013 at 1:24 PM, Mark Janssen <dreamin...@gmail.com> wrote:
> On Thu, Nov 7, 2013 at 6:17 PM, Chris Angelico <ros...@gmail.com> wrote:
>> On Fri, Nov 8, 2013 at 1:05 PM, <jonas.t...@gmail.com> wrote:
>>> I guess what matter is how fast an algorithm can encode and decode a big number, at least if you want to use it for very big sets of random data, or losless video compression?
>>
>> I don't care how fast. I care about the laws of physics :) You can't
>> stuff more data into less space without losing some of it.
>
> Technically, the universe could expand temporarily or reconfigure to
> allow it; the question is who or what will have to shift out to allow
> it?

... okay, I bow to your superior power. If you can make the very
*UNIVERSE* bow to your compression algorithm, I have to admit defeat.
:D

ChrisA

rusi

unread,
Nov 7, 2013, 9:36:51 PM11/7/13
to
On Friday, November 8, 2013 7:55:18 AM UTC+5:30, jonas wrote:
> On Friday, 8 November 2013 at 03:17:36 UTC+1, Chris Angelico wrote:
> > On Fri, Nov 8, 2013 at 1:05 PM, jonas.thornvall wrote:
> > > I guess what matter is how fast an algorithm can encode and decode a big number, at least if you want to use it for very big sets of random data, or losless video compression?

> > I don't care how fast. I care about the laws of physics :) You can't
> > stuff more data into less space without losing some of it.
> > Also, please lose Google Groups, or check out what other people have
> > said about making it less obnoxious.
> > ChrisA

> Please, you are he obnoxious, so fuck off or go learn about
> reformulation of problems. Every number has an infinite number of
> arithmetical solutions. So every number do has a shortest
> arithmetical encoding. And that is not the hard part to figure out,
> the hard part is to find a generic arithmetic encoding.

Since we are discussing profound matters such as cosmic laws,
here's my 'all-universal' law:

When you engage with a troll you get trolled.

Chris Angelico

unread,
Nov 7, 2013, 9:36:49 PM11/7/13
to pytho...@python.org
On Fri, Nov 8, 2013 at 1:25 PM, <jonas.t...@gmail.com> wrote:
> Please, you are he obnoxious, so fuck off or go learn about reformulation of problems. Every number has an infinite number of arithmetical solutions. So every number do has a shortest arithmetical encoding. And that is not the hard part to figure out, the hard part is to find a generic arithmetic encoding.
>
> I am not sure if it is just stupidness or laziness that prevent you from seeing that 4^8=65536.

I can see that 4^8 = 65536. Now how are you going to render 65537? You
claimed that you could render *any* number efficiently. What you've
proven is that a small subset of numbers can be rendered efficiently.

Maybe what you can do is render a large number of small subsets of
numbers efficiently. (Let's say all instances of integer^integer, and
all cases of Fibonacci numbers.) You'll still have some holes in
between which you can't render as tidily, and these are the ones where
the best representation is the raw one, plus a small tag saying that
it's raw. That's how most compression algorithms work.
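A sketch of that raw-plus-tag fallback, using zlib as a stand-in back end (the names are made up; the point is that the one-byte tag bounds the worst-case growth):

import os
import zlib

def pack(data):
    packed = zlib.compress(data, 9)
    if len(packed) < len(data):
        return b"\x01" + packed   # tag 1: payload is deflate-compressed
    return b"\x00" + data         # tag 0: stored raw, cost is one tag byte

def unpack(blob):
    tag, payload = blob[:1], blob[1:]
    return zlib.decompress(payload) if tag == b"\x01" else payload

for sample in (os.urandom(1000), b"a" * 1000):
    assert unpack(pack(sample)) == sample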

Also, please don't swear. When I described Google Groups posts as
"obnoxious", I was using the word in a strictly correct way, and that
is not an invitation for profanity. Of course, if you want to be seen
as *yourself* obnoxious, that's one of the easier ways to accomplish
that.

ChrisA

Mark Janssen

unread,
Nov 7, 2013, 9:43:17 PM11/7/13
to Chris Angelico, Python List
>> I am not sure if it is just stupidness or laziness that prevent you from seeing that 4^8=65536.
>
> I can see that 4^8 = 65536. Now how are you going to render 65537? You
> claimed that you could render *any* number efficiently. What you've
> proven is that a small subset of numbers can be rendered efficiently.

I think the idea would be to find the prime factorization for a given
number, which has been proven to be available (and unique) for any and
every number. Most numbers can compress given this technique. Prime
numbers, of course, would not be compressed.

--
MarkJ
Tacoma, Washington

R. Michael Weylandt <michael.weylandt@gmail.com>

unread,
Nov 7, 2013, 9:43:10 PM11/7/13
to jonas.t...@gmail.com, pytho...@python.org
Chris's point is more subtle: the typical computer will store the number 65536 in a single byte, but it will also store 4 and 8 in one byte. So if your choice is between sending "65536" and "(4,8)", you actually loose efficiency in your scheme. Don't think in decimal, but in terms of information needing transfer.

You might reply that you don't need a whole byte for 4 or 8 -- that's true. You could, e.g., just encode the fourth and eight bits of a single byte and send that: certainly gives some compression but at the cost of generality-- what would you do with 65^65? It's sort of like the mean-variance tradeoff in statistics; it's much easier to encode certain data sets (e.g. powers of two as you noted) but only if you concede your algorithm will only work for those values.

More generally, check out the work of Claude Shannon; a very accessible and worthwhile author.

Chris Angelico

unread,
Nov 7, 2013, 10:05:55 PM11/7/13
to pytho...@python.org
On Fri, Nov 8, 2013 at 1:43 PM, Mark Janssen <dreamin...@gmail.com> wrote:
>>> I am not sure if it is just stupidness or laziness that prevent you from seeing that 4^8=65536.
>>
>> I can see that 4^8 = 65536. Now how are you going to render 65537? You
>> claimed that you could render *any* number efficiently. What you've
>> proven is that a small subset of numbers can be rendered efficiently.
>
> I think the idea would be to find the prime factorization for a given
> number, which has been proven to be available (and unique) for any and
> every number. Most numbers can compress given this technique. Prime
> numbers, of course, would not be compressed.

Also an interesting theory.

1) How do you represent the prime factors?
2) How do you factor large numbers efficiently? Trust me, if you've
solved this one, there are a *LOT* of people who want to know. People
with money, like Visa.
3) Still doesn't deal with all possible numbers.

ChrisA

Roy Smith

unread,
Nov 7, 2013, 10:08:30 PM11/7/13
to
In article <mailman.2179.1383879...@python.org>,
Chris Angelico <ros...@gmail.com> wrote:

> 2) How do you factor large numbers efficiently? Trust me, if you've
> solved this one, there are a *LOT* of people who want to know. People
> with money, like Visa.

Not to mention the NSA.

Chris Angelico

unread,
Nov 7, 2013, 10:24:27 PM11/7/13
to pytho...@python.org
On Fri, Nov 8, 2013 at 1:43 PM, R. Michael Weylandt
<michael....@gmail.com> <michael....@gmail.com> wrote:
> Chris's point is more subtle: the typical computer will store the number 65536 in a single byte, but it will also store 4 and 8 in one byte. So if your choice is between sending "65536" and "(4,8)", you actually loose efficiency in your scheme. Don't think in decimal, but in terms of information needing transfer.

Well, 65536 won't fit in a single byte, nor even in two (only just). A
typical binary representation of 65536 would take 3 bytes, or 4 for a
common integer type: 0x00 0x01 0x00 0x00 (big-endian). So it would be
possible to represent that more compactly, if you deem one byte each
for the base and the exponent: 0x04 0x08. However, that system allows
you to represent just 62,041 possible numbers:

>>> decomp = {}
>>> for base in range(256):
...     for exp in range(256):
...         decomp[base**exp] = base, exp
...
>>> len(decomp)
62041

The trouble is, these numbers range all the way up from 0**0 == 0 to
255**255 == uhh...
>>> 255**255
46531388344983681457769984555620005635274427815488751368772861643065273360461098097690597702647394229975161523887729348709679192202790820272357752329882392140552515610822058736740145045150003072264722464746837070302159356661765043244993104360887623976285955058200326531849137668562738184397385361179287309286327712528995820702180594566008294593820621769951491324907014215176509758404760451335847252744697820515292329680698271481385779516652518207263143889034764775414387732372812840456880885163361037485452406176311868267428358492408075197688911053603714883403374930891951109790394269793978310190141201019287109375

which would decompress to (obviously) 255 bytes. So you can represent
just over sixty thousand possible 255-byte strings in two bytes with
this system.

To generalize it, you'd need to devote a bit somewhere to saying
"There are more to add in". Let's say you do this on the exponent byte
(high bit for convenience), so you have 0x04 0x08 means 65536, and
0x04 0x88 0x01 0x01 means 65536+1 = 65537. Now we have something that
generalizes; but the efficiency is gone - and there are too many ways
to encode the same value. (Bear in mind, for instance, that 0x01 0xNN
for any NN will still just represent 1, and 0x00 0xNN will represent
0. That's wasting a lot of bits.)

The application I can see for this sort of thing is not data
compression, but puzzles. There are lots of puzzles that humans find
enjoyable that represent an entire grid with less data than it
contains - for one popular example, look at Sudoku. You have a 9x9
grid where each cell could contain any of nine digits, which means a
theoretical information density of 9**81; but the entire grid can be
described by a handful of digits and heaps of blank space. This could
be a similarly-fun mathematical puzzle: 3359232 can be represented as
B1**E1 + B2**E2, where all numbers are single-digit. Find B1, B2, E1,
and E2. In this case, you're guaranteed that the end result is shorter
(four digits), but it's hardly useful for general work.
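For the curious, a brute-force sketch of that puzzle - nothing clever, just trying every single-digit base and exponent:

target = 3359232
found = set()
for b1 in range(1, 10):
    for e1 in range(1, 10):
        for b2 in range(1, 10):
            for e2 in range(1, 10):
                if b1 ** e1 + b2 ** e2 == target:
                    found.add(tuple(sorted([(b1, e1), (b2, e2)])))

print(found)   # {((6, 8), (6, 8))}, i.e. 6**8 + 6**8 == 3359232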

ChrisA

R. Michael Weylandt <michael.weylandt@gmail.com>

unread,
Nov 7, 2013, 11:05:21 PM11/7/13
to Chris Angelico, pytho...@python.org


On Nov 7, 2013, at 22:24, Chris Angelico <ros...@gmail.com> wrote:

> On Fri, Nov 8, 2013 at 1:43 PM, R. Michael Weylandt
> <michael....@gmail.com> <michael....@gmail.com> wrote:
>> Chris's point is more subtle: the typical computer will store the number 65536 in a single byte, but it will also store 4 and 8 in one byte.
>
> Well, 65536 won't fit in a single byte, nor even in two (only just). A
> typical binary representation of 65536 would take 3 bytes, or 4 for a
> common integer type: 0x00 0x01 0x00 0x00 (big-endian).

Quite right -- meant C's int type, (machine word) not char. My mistake!

Michael

Chris Angelico

unread,
Nov 7, 2013, 11:06:49 PM11/7/13
to pytho...@python.org
Sure. Assuming at least 32-bit words, yes, that's correct. Of course,
this is still just proving that it's {possible, not possible} to
compress specific values, while the OP claimed to compress anything.

ChrisA

Dave Angel

unread,
Nov 7, 2013, 11:12:06 PM11/7/13
to pytho...@python.org
On Thu, 7 Nov 2013 18:43:17 -0800, Mark Janssen <dreamin...@gmail.com> wrote:
> I think the idea would be to find the prime factorization for a given
> number, which has been proven to be available (and unique) for any and
> every number. Most numbers can compress given this technique. Prime
> numbers, of course, would not be compressed.

No, very few numbers can be compressed using this technique. And
primes of course expand greatly.

--
DaveA

Steven D'Aprano

unread,
Nov 7, 2013, 11:47:46 PM11/7/13
to
On Thu, 07 Nov 2013 18:43:17 -0800, Mark Janssen wrote:

>>> I am not sure if it is just stupidness or laziness that prevent you
>>> from seeing that 4^8=65536.
>>
>> I can see that 4^8 = 65536. Now how are you going to render 65537? You
>> claimed that you could render *any* number efficiently. What you've
>> proven is that a small subset of numbers can be rendered efficiently.
>
> I think the idea would be to find the prime factorization for a given
> number, which has been proven to be available (and unique) for any and
> every number. Most numbers can compress given this technique.

Sadly, they would not.

Let's suppose you wish to compress the number 143. So you find the prime
factorization, 11*13=143. Have you compressed anything? No you have not
-- the original number, 143, fits in a single byte, while it requires
*two* bytes to deal with 11 and 13 (one byte for 11, and a second byte
for 13).

We can try a different strategy: while 143 requires a full eight bits
(one byte), both 11 and 13 require only four bits (one nybble) each. So
by hard coding our expectation that the result will be two nybbles, we
can avoid expanding the data and end up with exactly zero compression:

143 = binary 10001110 or eight bits in total
11, 13 = binary 1011 1101 or eight bits in total

But while that trick works for 143, it doesn't work for 154 (one byte,
binary 10011010) which has prime factorization 2*7*11 (three nybbles 0010
0111 1011). Even if you can somehow drop the leading zeroes, it requires
nine bits versus eight.
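The bookkeeping is easy to check with a few lines of code (plain trial division, nothing fancy):

def prime_factors(n):
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

for value in (143, 154):
    fs = prime_factors(value)
    print(value, fs, value.bit_length(), "bits vs",
          sum(f.bit_length() for f in fs), "bits for the factors")

# 143 [11, 13]   8 bits vs 8 bits  -- no saving at all
# 154 [2, 7, 11] 8 bits vs 9 bits  -- already bigger, before any bookkeeping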

Suppose instead of using bytes, you use decimal digits. 11 and 13 use
four digits, while 143 uses only three. Likewise, 154 has three decimal
digits while 2*7*11 has four digits. In both cases, there is no
compression.

How about recording which prime number it is, instead of the actual value
of the prime? So 154=2*7*11 expands to the 1st, 4th and 5th prime, so
maybe we can record 1 4 5 instead of 2 7 11 and save some bits. (We'll
save more bits the larger the prime.) Of course, to reverse the
compression you need to keep a table of the prime numbers somewhere, and
that's going to be a lot bigger than the space you save...

Now, I accept that, possibly, with sufficient work we might be able to
find some cunning mechanism by which we can compress *any* integer value.
But it won't be the same mechanism for every value! For example, we might
compress (2**1000+1) using a run-length encoding of the binary bits,
while compressing 14629839399572435720381495231 as its prime
factorization (the 319th prime number, the 479th prime, the 499th prime
six times), and 10**1000 as an encoding of the decimal digits. But we
still need some sort of tag to distinguish between these different
compression schemes, and once you take into account the extra information
in the tag, there will be cases where some numbers aren't compressed at
all.

The ability to losslessly compress *everything* is sheer pseudo-
mathematics, up there with finding an exact rational value for pi, or
squaring the circle using only a compass and straight-edge. But the
ability to losslessly compress *useful data* is real, and such algorithms
operate by finding non-random patterns and redundant information in the
data.



--
Steven

Steven D'Aprano

unread,
Nov 8, 2013, 12:32:46 AM11/8/13
to
On Thu, 07 Nov 2013 18:25:18 -0800, jonas.thornvall wrote:

> Please, you are he obnoxious, so fuck off

Pot, meet kettle.

> I am not sure if it is just stupidness or laziness that prevent you from
> seeing that 4^8=65536.

And what is it that prevents you from seeing that 4**8=65536 is
irrelevant to the compression of random data?

All your talk about "reformulation of problems", "infinite number of
arithmetical solutions" and "generic arithmetic encoding" is just a smoke
screen to distract from the fact that you don't, in fact, have a
compression algorithm that can losslessly compress arbitrary random data
of any length.

I am as sure as this as I am sure you don't have a perpetual motion
machine, a method for squaring the circle using only a compass and
straight-edge, or a fool-proof system for winning the lottery every week
without any chance of failure.


--
Steven

Gregory Ewing

unread,
Nov 8, 2013, 2:09:15 AM11/8/13
to
Steven D'Aprano wrote:
> Of course, to reverse the
> compression you need to keep a table of the prime numbers somewhere,

That's not strictly necessary -- you could calculate them
as needed. It wouldn't be *fast*, but it would work...

You've got me thinking now about how viable a compression
scheme this would be, efficiency issues aside. I suppose
it would depend on things like the average density of primes
and the average number of prime factors a number has.
Any number theorists here?

--
Greg

Gregory Ewing

unread,
Nov 8, 2013, 2:16:56 AM11/8/13
to
Mark Janssen wrote:
> Technically, the universe could expand temporarily or reconfigure to
> allow it;

Oh, no! If Jonas ever succeeds in getting his algorithm to
work, the Void will expand and swallow us all!

http://en.wikipedia.org/wiki/The_Dreaming_Void

--
Greg

Chris Angelico

unread,
Nov 8, 2013, 2:21:59 AM11/8/13
to pytho...@python.org
On Fri, Nov 8, 2013 at 6:09 PM, Gregory Ewing
<greg....@canterbury.ac.nz> wrote:
> You've got me thinking now about how viable a compression
> scheme this would be, efficiency issues aside. I suppose
> it would depend on things like the average density of primes
> and the average number of prime factors a number has.
> Any number theorists here?

Start by coming up with an efficient storage representation for the
primes. Don't forget that you can't use a bitfield, because eg 98 =
7*7*2, so you somehow need to represent that the 7 comes up twice. And
you can't limit the size of your primes (eg by representing 98 as
"\x07\x07\x02").

And that's without dealing with the major issue of _finding_ prime
factors for a large number.

ChrisA

jonas.t...@gmail.com

unread,
Nov 8, 2013, 10:48:05 AM11/8/13
to
3^2-2^2=5

rusi

unread,
Nov 8, 2013, 10:57:53 AM11/8/13
to
> 3^2-2^2=5

Oh my! What a profound insight!!

Ian Kelly

unread,
Nov 8, 2013, 1:48:38 PM11/8/13
to Python
On Fri, Nov 8, 2013 at 8:48 AM, <jonas.t...@gmail.com> wrote:
> 3^2-2^2=5

How do you intend to encode 3**2 - 2**2 in such a way that it is more
compact than simply encoding 5? If you actually have an algorithm,
you should share it instead of dropping these cryptic one-line
non-explanations and leaving us guessing about the details. But I'm
starting to think that you don't actually have an algorithm at all,
whereas my initial impression was that you did and were simply
mistaken about its effectiveness.
0 new messages