
compression type


mcj...@gmail.com

unread,
Jul 17, 2008, 8:24:55 AM7/17/08
to
A primitive idea maybe?

from what I understand, most compression techniques use redundancy
elimination as their main approach. So, for example, take repeat
occurrences of the same string and give a token of the replica instead.
What would happen if you placed everything that gets a token into a
geometric area, and said not a token but a geometric location as what
stands for it? That comes out about the same, maybe, if the token is
always the same size, right? so what happens if you put into the
geometric area not just what repeats itself, but a sequence of words,
say, that occur in different orders, neighbouring each other, where you
can say in the geometric area a plotted course that draws the string
pattern different ways as the token? to be crude, as an example: words
spaced in a sphere, with a curve that intersects the words, so the words
that find themselves ordered other ways have that curve as the token.

so you may see this in a file:

the first sentence to see
another sentence like this
a good sentence to run on about
a final sentence to finish
to see another sentence
about a good sentence

but put the word "sentence" in this sphere, and surround all the words
that make each sentence so that all you do to token the whole sentense
is to draw say a curve in the sphere through each word as what you say
is the token, like math that makes the curve and where it starts. but
also see the word "see" and "good" once because they show up in more
than one sentence, find the curve to be at the same place for those
words as the token for those setences.

isn't this, no matter what, at least the same as redundancy reduction
done right? it says at least as much, doesn't it? but it has better
potential, nothing worse, only better? in idea? since the token space is
allowed to be the same for whatever geometry there is to say a place,
but extended with the idea of patterns that rearrange?

doesn't this have a better theory to it altogether, with what rearranged
patterns can be, as something that can be found commonly enough, I
suppose.

Jim Leonard

unread,
Jul 17, 2008, 2:01:03 PM7/17/08
to
On Jul 17, 7:24 am, mcja...@gmail.com wrote:
> So take for example repeat
> occurances of the same and give token of replica instead.

This is what dictionary-based compression already does, but with a "1-
dimensional geographical location" token (match offset, match
length). If you want to represent the dictionary using 2 or 3
dimensions, that's up to you but I think you'll find the extra
housekeeping isn't worth the cost.
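
(For illustration, a minimal Python sketch of that (offset, length) idea;
a toy, not any particular LZ77 implementation, and the names are made up:)

# Toy LZ77-style tokenizer: emit ("match", offset, length) for a repeat
# found in the window behind the current position, or a literal character.
def toy_lz(data, window=4096, min_match=3):
    out, i = [], 0
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):      # scan the window for a match
            k = 0
            while i + k < len(data) and j + k < i and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_off, best_len = i - j, k
        if best_len >= min_match:
            out.append(("match", best_off, best_len))   # the "location" is one number back
            i += best_len
        else:
            out.append(("lit", data[i]))
            i += 1
    return out

print(toy_lz("the first sentence to see another sentence like this"))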

Phil Carmody

unread,
Jul 17, 2008, 8:07:12 PM7/17/08
to

'\n's in the stream provide the illusion of the 2nd dimension.

Phil

--
Dear aunt, let's set so double the killer delete select all.
-- Microsoft voice recognition live demonstration

mcj...@gmail.com

unread,
Jul 17, 2008, 11:52:14 PM7/17/08
to

in a geometric area, say, so you can connect-the-dots between words that
fit a pattern, like going from one end to the other is the sentence
backwards.

so place, say, every word in a file into a sphere area, for example,
where the words, spaced apart, can make sentences through curves inside
the sphere, the math specification of the curves being the file content
that expands back out.

then it's different in idea from redundancy reduction, because patterns
in a different order are called the same thing, with a different
coordinate map to say which it is.

Jim Leonard

unread,
Jul 18, 2008, 12:08:47 AM7/18/08
to

Yes, I know what you were trying to say in the first post. I still
think (read: I'm sure) you're going to find that the amount of
information necessary to keep track of the vectors will make this
nowhere near as efficient as established standards.

mcj...@gmail.com

unread,
Jul 18, 2008, 5:08:52 AM7/18/08
to
> as efficient as established standards.

Wouldn't it be a whole different proportion of compression though, since
it's not reducing redundancy, but instead saying patterns reordered? like

the whole line
a fine line
whole fine line

would only be stored as "the whole line fine", where you connect the
words in a different order to say each token. I would go so far as to
assume a proportion that can't be compared to redundancy reduction, with
how that can really turn out. and no matter what, it says at least the
same anyway, even if tokens for single words aren't given a resolution
modifier, say to make further of a pattern.


mcj...@gmail.com

unread,
Jul 18, 2008, 5:17:54 AM7/18/08
to
> say to make further of a pattern.

forgot to....

mention that I don't think this has ever been tried. It doesn't seem to
be a difficult algorithm to test, for an idea that reaches far.
and applying it over and over again might as well be what you try for
something compressed this way, because it may keep landing patterns by
saying patterns until you get something with no patterns in it at all;
I see that being such an easy case.

I bet random doesn't say much against patterns... the only thing that
ought to be held in the highest regard as what can't be compressed is
what doesn't have any patterns at all (though the overhead of tokens,
file allocation, and distance stretches would say different in an
implementation).

like, random isn't exactly (and could only be exactly) what shows no
recurring pattern; that's completely different thinking than redundancy
reduction. that's no pattern of pattern of pattern of pattern of
pattern, and so on.

Thomas Richter

unread,
Jul 18, 2008, 5:30:06 AM7/18/08
to
mcj...@gmail.com wrote:

> Wouldn't it be a whole different proportion of compression though
> since it's not reducing redundancy,

Hardly "pattern reordering" is one of the basic compression techniques,
see for example the Lempel-Ziv algorithm.

Greetings,
Thomas

mcj...@gmail.com

unread,
Jul 18, 2008, 5:55:02 AM7/18/08
to
On Jul 18, 5:30 am, Thomas Richter <t...@math.tu-berlin.de> wrote:

so i mean this....

in a sphere geometry area for example, the words:

"in a while once"

now the tokens are curved lines, say, each beginning somewhere in the
sphere area and drawing a line that passes through each word.

so now a curved line, as a math figure, is available as a token for any
way the words can be ordered.

"it's not once in a while, but often that I come across an idea that
takes once to see right, and a while to see through"

it's not ^(curve through "once in a while"), but often that I come
across an idea that takes ^(curve at "once") to see right, and ^(curve
through "a while") to see through

Marco Al

unread,
Jul 18, 2008, 7:13:57 AM7/18/08
to
mcj...@gmail.com wrote:

> A primitive idea maybe?
>
> from what I understand most compression techniques try to use
> redundancy elimination as their main score. So take for example repeat
> occurances of the same and give token of replica instead. What would
> happen say if you placed in a geometric area all of what gives a
> token

I can't decide whether you are trying to describe a geometrical
equivalent of an HMM or whether your text is being generated by an HMM ...

Marco

Willem

unread,
Jul 18, 2008, 12:50:51 PM7/18/08
to
mcj...@gmail.com wrote:
) Wouldn't it be a whole different proportion of compression though
) since it's not reducing redundancy, but instead saying patterns
) reordered. like

Compression *is* reducing redundancy. The two are equivalent.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

mcj...@gmail.com

unread,
Jul 19, 2008, 4:03:35 PM7/19/08
to
On Jul 18, 12:50 pm, Willem <wil...@stack.nl> wrote:

why can't compression be reducing reorganized patterns?

so saying this once...

"hot dog vendor meal"

in a sphere, ok, placed with, say, one word in the middle and the other
words around it.

now a token that says "dog meal" is a curved line in the sphere that
connects the words dog and meal but misses any other words.
now a small token says "hot dog", "dog vendor", "hot meal", "dog
meal", "meal dog", "hot dog meal",

so each token is a mathematical remark on a curve in the sphere.

that's not redundancy reduction, but i don't see it not being the idea
of compression somehow.

Willem

unread,
Jul 19, 2008, 4:10:41 PM7/19/08
to
mcj...@gmail.com wrote:
) On Jul 18, 12:50 pm, Willem <wil...@stack.nl> wrote:
)> Compression *is* reducing redundancy.  The two are equivalent.
)
) why can't compression be reducing reorganized patterns?

Suppose you have a 1000 byte file, and you have a compressor that
reduces that file to 500 bytes.  That means that you are able to
represent the information in the original file with 500 bytes.

In other words: The original file represents some information with
1000 bytes, but that information is such that it can be represented
in 500 bytes. So, the other 500 bytes in that file must be redundant.

Conclusion: Compression *is* reducing redundancy.
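
(The counting behind that, as a tiny sketch: there are more n-bit files
than there are strictly shorter files to map them to, so no lossless
compressor can shrink every input; it can only shrink the redundant ones:)

# 2**n distinct n-bit files, but only 2**n - 1 files of any shorter length.
n = 16
print(2 ** n, sum(2 ** k for k in range(n)))   # 65536 possible inputs, 65535 shorter outputs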

mcj...@gmail.com

unread,
Jul 19, 2008, 4:16:09 PM7/19/08
to
On Jul 17, 8:24 am, mcja...@gmail.com wrote:

if I'm right, the idea should have a different proportion to what
redundancy reduction yields. it's not the same basis at all.


a few strings that occur like:

L1/Rule1/Entry1
L2/Rule1/Entry2
L3/Rule4/Entry1
L1/Rule2/Entry2
L2/Rule3/Entry1

has it so only this is kept in the geometry area:

L1/ Rule1/ Entry1 L2/ L3/ Rule2/ Rule3/ Rule4/ Entry2

So a token says each line.


mcj...@gmail.com

unread,
Jul 19, 2008, 4:38:55 PM7/19/08
to
> So a token says each line.

Can't this be seen as a different proportion, though, than anything that
works like redundancy reduction? because in the geometry area you find
driving instructions for putting an ordering together, and the parts fit
together other ways too when the same data is stored but another token
is used.

because redundancy reduction is saying less for the same thing that
shows up often: say once what appears more often, and a token for each
further time it's found. so say a pattern another way instead, and think
of it working like this....

so the geometry fills with parts of words, or whatever, where, say,
starting at a place in the middle, you build around how going from the
middle to somewhere forms a pattern, and not just from the middle, from
the other side too maybe. like the word coyote in a sphere has around it:

ugly pretty stunning
coyote
den run dance
movie
good bad best
speed

now a token for a curve that starts at ugly, goes through coyote, and
ends up at dance.

another curve starts at ugly and ends up at den.

another curve starts at coyote, goes through ugly, and ends up at movie.

now many curves to make up sentences, with a small token for each.

Industrial One

unread,
Jul 20, 2008, 12:04:25 AM7/20/08
to
On Jul 19, 2:03 pm, mcja...@gmail.com wrote:
>
> why can't compression be reducing reorganized patterns?

Any kind of pattern, direct or indirect, is a redundancy. If you build
a compressor based on "reorganizing patterns" it would require
exhaustive processing and not compress significantly more than already-
existing LZW techniques. I thought like you once, when I noticed that
re-arranging paragraphs/sentences in many usenet posts would make it
way more redundant (cuz of all the quotes) but when I tested the idea
(by hand) on one thread and compared it to RAR, the gain was roughly
1.5%.
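
(That experiment is easy to redo with a general-purpose compressor; a
rough Python sketch, with zlib standing in for RAR, the file name being
whatever thread you saved, and "rearranging" crudely approximated by
sorting the lines; a real scheme would also have to store the permutation
needed to undo the sort:)

import sys, zlib

text = open(sys.argv[1], "rb").read()           # a saved thread, quotes and all
lines = text.splitlines(keepends=True)
reordered = b"".join(sorted(lines))             # group the repeated quoted lines together

print("as-is:", len(zlib.compress(text, 9)),
      "lines sorted:", len(zlib.compress(reordered, 9)))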

mcj...@gmail.com

unread,
Jul 20, 2008, 12:34:05 AM7/20/08
to
On Jul 20, 12:04 am, Industrial One <industrial_...@hotmail.com>
wrote:

I think it misses the point of redundancy altogether though... it's
calling abhcgdef dhbfcega dbfegcha acgdfbhe each a curve of about the
same size, i guess, if you put those letters in a sphere and make math
that draws a curve through the letters as the token.

where those letters sit in the sphere once, like abcdefgh, with a curved
line drawn across the letters for each way they appear.. i see no repeat
occurrences of a string worth much there, but i see abcdefgh once and a
curved line, in math a few characters long maybe, for each way those
letters reorganize.
a very different proportion, if you look carefully.

i think if i were to try an algorithm to build this in a sphere, i'd
just find whatever pattern is somewhat already there and carry on with
where the curve takes it, adding whatever more is needed and giving that
another token... but granted, the sphere and curve are just a silly way
to put it; there's really another way to say it better altogether.

but it has a fundamentally different idea than reducing redundancy...
i don't think that has to be the only way compression works.

it's finding a pattern reorganized, so nothing to do with reducing
redundancy at all.

see.. build the words or string parts in a sphere area, and see how the
parts already built there let another curve cover mostly anything else
that needs a token sometimes?

so it's not redundancy reduction at all. it's finding a reorganization
of a pattern: say, the words of a sentence, in whatever arrangement,
stored once, and a token for each of the many ways the words organize
another way. it has no equivalence to the idea of redundancy reduction
at all, if you think about what a proportional difference this can be.
for example, how effective it is has nothing to do with how much
redundancy there is. it's effective on account of how you can find one
pattern that is part of another pattern, say what's in both patterns
once, and have tokens that say the pattern every which way. and most
importantly, to figure what a proportion it has, even something that's
only partly in a stored pattern just needs a curve for the new part,
with the rest as what's already there.

see how that could have been compressed if every word showed up only
once in the sphere, and all that's left to say about how it's written is
curves drawn through the sphere connecting the words... so just say math
curve designations in the sphere area. it's not just finding the longest
strings in common to make a token, it's finding reorganized words. it
may seem the same as saying a word once and tokening it, but it's not
the same as saying how a sentence reads backwards and forwards with the
sentence stored once and a different token for each direction. even
though that's a few tokens, right? that's not a proportion that
compares: the idea of how much of what reads backwards can read forwards
the other way, and not even just that, but something in the middle that
has after it part of another pattern already accounted for, and before
it part of yet another. like, depending on where you put them in the
sphere maybe?

it's another proportion though, beyond what redundancy reduction can
even achieve in idea... because it's not working with that limit, it's
seeing a pattern reorganized another way. see... it's putting words
before and after the other way, but other sentences, any way that has
those words, only need to be the part of the sentence that isn't those
words; for the before and after, in the sphere you draw a curve that
connects the beginning, the middle part, then the end part.

Jim Leonard

unread,
Jul 20, 2008, 12:46:33 AM7/20/08
to
On Jul 19, 3:38 pm, mcja...@gmail.com wrote:
> now many curves to make up sentences with a small token for each.

It doesn't matter how you're representing the relationship between the
words, it's all the same thing.

Typical LZ77 compression already does what you're describing. The
"curves" are a series of codes that describe where in the dictionary
the next "words" come from.

Your idea is not new, other than the fact that it would take up more
data to "point" to other words than existing methods need.

Willem

unread,
Jul 20, 2008, 4:11:58 AM7/20/08
to
mcj...@gmail.com wrote:
) but it has a fundamentally different idea than reducing redundancy...
) i don't think that has to be the only way compression works.

Did you miss the post where I explained quite clearly that this is
a fundamentally flawed way of thinking ?

If you compress a file, then there *must have been* redundant information
in that file.

mcj...@gmail.com

unread,
Jul 20, 2008, 2:24:42 PM7/20/08
to
On Jul 20, 4:11 am, Willem <wil...@stack.nl> wrote:

what would happen with this though...


say a string of text another way, as a connected line pattern, which
simply comes out to be a different representation of whatever the text
is, and have math that takes that connected line pattern and finds how
to transform it into another line pattern... now say you find another
string of text, and you can find a line pattern for it that has a
mathematical transform from the first line pattern... like saying a
curved line, then a transform of that curve instead of another curved
line.. which should, in some cases, be less information than saying the
other curved line. So say one curved line, plus math that turns that
curved line into the other, and say that instead of the other curved
line.. so a curved line and math to transform it into another curved
line, as less than both curved lines together, as how you say two
curved lines.

isn't that far from redundant information? at least from the idea of
repeat instances of the same thing.

i betcha there must be 50/50 odds on whether saying the other curve as a
transform of the first comes out bigger or smaller than saying the other
curve outright. I bet, ultimately.

and what a proportion that would be too... I bet 50/50 on bigger or
smaller has to be it... with how much it takes to say a curve, and how
much it takes to say a curve as an already-stored curve but different,
instead of saying another curve.. i bet 50/50 that seeing it as one
curve changed into another comes out, in the size of the math, to less
information than the whole second curve.

so translate text into the idea of a curve for the text, store that
curve, but say further text as curve-transforming math, which is less
than another whole curve for the further text.... that should work as a
type of compression if it were possible, and in idea it should be
somehow. different in idea than lessening repeat occurrences.

not a curve quite like in the other example though.. but I think that
idea aims at the same kind of proportion. especially since compressing
over and over should work, because nothing should come out devoid of
patterns in what you lay out from how you saw it before.
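
(What this paragraph gropes toward does exist in ordinary practice as
delta, or difference, coding: store one value and the small "transform"
to the next one instead of the next value itself. A minimal sketch of
that standard technique, not of the curve idea as described:)

# Delta coding: keep the first value plus differences; when neighbouring
# values are close, the differences are small numbers that code cheaply.
values = [1000, 1002, 1001, 1005, 1004, 1008]
deltas = [values[0]] + [b - a for a, b in zip(values, values[1:])]
print(deltas)                                  # [1000, 2, -1, 4, -1, 4]

restored, total = [], 0
for d in deltas:
    total = d if not restored else total + d
    restored.append(total)
print(restored == values)                      # True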

Willem

unread,
Jul 20, 2008, 3:59:52 PM7/20/08
to
mcj...@gmail.com wrote:
) what would happen if this though...

If you ignore fundamental principles and simple arguments,
then you will either get laughed at or get ignored.

mcj...@gmail.com

unread,
Jul 20, 2008, 5:35:07 PM7/20/08
to
On Jul 20, 3:59 pm, Willem <wil...@stack.nl> wrote:

What am I ignoring that's fundamental?

I'm taking the understanding into account that compression works with
the idea of reducing redundancy...
so far the only idea I see in that is repeat occurrences that can be
said once and referenced each further time, right?

what a way to achieve the reduction of information... but I wouldn't say
it's the only way, it's just taken to be the way to go about it.

I was trying to stay close to an idea that says something different,
that would work if you think it through, and that has something else to
it when it comes to what proportion of information reduction can be
achieved.

now think of a string of text... find the string of text said another
way, as a shape somehow, where every different wording would draw a
different shape. ok?
now find another string of text, find the shape for it, now find math
that transforms one shape into the other, and find that in some cases
the math to transform the first shape into the second shape is smaller
than the second shape itself... so say this now: hold the first shape
and the math to transform the shape as the stored information...
so now, in idea, it's compression working not off repeat occurrences,
but off whether the math transform of a shape is bigger or smaller in
size than the other shape.
not that there's any worked-out math for the idea, or any example of how
it fares; it's just the idea of how it would go about achieving
compression.

see how that's completely different than finding repeat occurrences of
even the same string?
see how it doesn't even depend on how many repeat occurrences can be
found?

so at that same kind of proportion, put simply, I think this idea of
compression would work; I don't get stuck thinking about it anyway...

like... say every time abc, cba, or bca is found as part of the file,
you say a coordinate in the area and a curve, where you start at the
first letter and the curve follows through each letter. so now only the
letters "abc" are in the geometry area, but there's a token that says
the letters rearranged any way.

so that achieves it another way besides repeat occurrences of the same
string.

I think files say a lot more about rearranged patterns than repeat
occurrences.. and _no matter what_ it's doing at least exactly the
redundancy reduction that repeat occurrences do, it definitely says that
_at least_, but it could only be better.

This is different from redundant information, if that only means repeat
occurrences, is it not?

mcj...@gmail.com

unread,
Jul 20, 2008, 5:54:29 PM7/20/08
to

I would think of it working like....

say, first of all, none of the real file mixed in with the tokens; the
idea goes like this....
put all of the file into a geometric area, where parts sit further apart
or closer together.
keep putting it into the geometric area, so that if "had been here" is
already in there, it might be broken apart into words, or kept together
maybe?
and now "here already" is to be put in, so put the word "already" near
the word "here".

so now for "had been here" and "here already" you only keep "had been
here already", because the word "here" was already found, not as a
repeat occurrence, but as a pattern found another way.

so it should be that "gunsmith", "muts", "record", "buns", "thrill" has
the geometric area hold maybe

g uns mi th muts record b rill

and then for those words a token that is a plot coordinate map said
shorter, like just a curved line to connect the parts in an ordering.

see how this can achieve better? it doesn't have the same limit as
finding repeat occurrences this way.

mcj...@gmail.com

unread,
Jul 20, 2008, 6:12:23 PM7/20/08
to
> repeat occurrences this way.

I think this idea could really go over well...

It doesn't seem to follow the same thinking as how random data is hard
to compress because repeat occurrences won't show up frequently enough
to be any benefit....
it seems like random can still have small strings rearranged commonly
enough... to store that once, though, and a token each time...

and i think even once it's compressed into what you call a geometric
area and tokens, you can find that to have patterns in it that you can
say again.

mcj...@gmail.com

unread,
Jul 20, 2008, 6:27:40 PM7/20/08
to
> repeat occurrences this way.

also, I'd like to add... something to think about maybe?

I'd like to find the theoretical compression limit said a better way
maybe...

isn't it always possible to write a small software program that, when
run, generates a greater amount of information at runtime?
so not said of any particular example that could work, but can't there
always be a smaller software program that runs to output more
information?

can't a small program be something that goes through a loop transforming
a string, inside another loop, inside another loop, inside another
loop... where all the loops transform the string in some complicated,
bizarre math that ends up being the output? like a few strings being
transformed, where the loops are a run-on series of string transforms
that produce the output, maybe?

so can't any file, in its compressed form, just be a small program that
runs and outputs the file, if it runs in whatever way computes that
output?

so can't compression, in idea, go the way of finding the smallest
possible program that can run to generate the output? nothing to say how
that could be found, but in idea... ?

isn't it fair to say the smallest program that can be made to run and
generate file output is the best it can be ?
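
(This last question is, in essence, Kolmogorov complexity, which Mark
Nelson brings up further down the thread. For a very regular output the
idea is trivially true; a sketch of a program far smaller than what it
prints:)

# A program of well under a hundred bytes whose output is a megabyte.
import sys
sys.stdout.write("abcdefgh" * 131072)          # 1,048,576 characters of output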


mcj...@gmail.com

unread,
Jul 20, 2008, 6:49:08 PM7/20/08
to
> generate file output is the best it can be ?

wouldn't this idea work...

for whoever says random can't be compressed...

how about a small software program that carries the code that made the
random data in the first place, with a software equivalent of the
entropy source... like any entropy source is hard to think about if it's
complicated, but a perfectly good entropy source can be a poor radio
signal, right? so a software machine that makes the poor radio signal
exactly the same as the real one did, and then runs to generate the same
random data?

that could be a small software program each time, to say the idea of a
large random file being compressed.

that it's pretty good at being random every which way, yet always a
small program you can run to output the random content, makes me think
differently about what trend the idea of random should set in thinking
about how good compression can be.
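
(The catch, which the regulars keep pointing at, is that this only works
when the data is pseudo-random and the generator plus seed are already
known; data from a real entropy source has no such short description. A
sketch of the favourable case, with a made-up seed:)

import random

seed = 845                                     # a few bytes stand in for a megabyte
data = bytes(random.Random(seed).randrange(256) for _ in range(10 ** 6))
again = bytes(random.Random(seed).randrange(256) for _ in range(10 ** 6))
print(data == again)                           # True: same seed, same "random" stream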

mcj...@gmail.com

unread,
Jul 20, 2008, 7:01:41 PM7/20/08
to
> generate file output is the best it can be ?

I'd like to think that the small software program is the same as saying
the random data compressed, because it runs to produce the same
information.
who cares if there's no general way to get a program like that; that's
not the point.

the webpage is kind of hard on this idea, but such a simple analogy must
prove that idea wrong. random is hard to compress maybe, but certainly
possible, and it's not an idea to refer to at all except in the case of
compression meaning redundancy reduction by reducing repeat strings.
I've already given a few examples of how that's not the only way to
compress a file.

randomness is no problem at all to compress if I can call any randomly
generated content the same as a small program containing the same
poor-radio-signal simulator that made the random data, plus the same
random-generating routine set up the same way. it says exactly the same
thing as the random content, and easily it says the idea of it being
compressed too. I don't want to see how random is made some other way;
I want to see the idea of random itself, as good as that would always be
at saying random. Because it's certainly compressible, if that idea can
work.

Also look at my idea of saying the file data to compress is a shape...
say it's a shape, like find a way to convert information into a shape.
the shape doesn't have to be bigger; in idea there are as many shapes as
things to say, so a shape can be the same size in data as what it stands
for... but now say this: find some other file data to be a shape too,
ok? but now instead of saying the other file content again, with the
first shape already said for another part of the data, say a
mathematical formula that transforms the first shape into the other
shape.

so also see how putting things in a geometry area, like I said before,
is the same in idea. It works a different way than reducing repeat
occurrences.

mcj...@gmail.com

unread,
Jul 20, 2008, 7:09:39 PM7/20/08
to
> generate file output is the best it can be ?

this might be a hard proportion to make sense of...

if i can say a shape in math terms, then I can say another shape in math
terms, then I can say a shape transform, in math, that transforms one
shape into another; doesn't the math transform then have a chance, at
the size it is, of being smaller than the shape it transforms into? in
some examples I think so, so it has a chance of being bigger too...

doesn't that work out to some shapes being bigger than the math that
transforms into them, and some shapes smaller?

like say the text is a shape, like some math curve that changes if you
add another adjustment to it. so say a curve, but then don't say another
curve, say how to bend the line more.. so say how to bend the line until
it's the other curve, so that instead of two curves you have one curve
and a math transform to the other curve.

so in some cases the math would be smaller than the other curve, when
there's already a curve there to change?

so if i say the text is a curve shape, then math to transform the curve,
I can say other text with less, right?

Thomas Richter

unread,
Jul 21, 2008, 3:37:06 AM7/21/08
to
mcj...@gmail.com schrieb:

> On Jul 20, 3:59 pm, Willem <wil...@stack.nl> wrote:
>> mcja...@gmail.com wrote:
>>
>> ) what would happen if this though...
>>
>> If you ignore fundamental principles and simple arguments,
>> then you will either get laughed at or get ignored.
>>
>> SaSW, Willem
>> --
>> Disclaimer: I am in no way responsible for any of the statements
>> made in the above text. For all I know I might be
>> drugged or something..
>> No I'm not paranoid. You all think I'm paranoid, don't you !
>> #EOT
>
> What am I ignoring that's fundamental?

You claim you have found a way to reduce the size of the representation
of information, but you state that your ideas do not make use of
redundancy in the text.

No matter what I think about your ideas otherwise, what you do here *is*
redundancy reduction. (FACT!)

And, if you would look a bit further, you will find that very similar
(though less graphical) ideas are used by existing algorithms, like the
Lempel-Ziv.

In fact, I suggest that you do some reading first, just to get a broader
view on the subject, and, afterwards, go over your ideas and come back.
You'll then probably see that your ideas and the ideas of Ziv aren't
very far apart, just that Ziv's algorithm is a bit more "down to earth",
and practical and implementable. The principles are similar, though.

So long,
Thomas

Thomas Richter

unread,
Jul 21, 2008, 3:39:52 AM7/21/08
to
mcj...@gmail.com schrieb:

> for who says random can't be compressed...

Ok, here's an exercise for you, or rather a question:

Please give a definition of "random". (I'm asking for not more).


Then, once you have that definition, we can work from there backwards.

So long,
Thomas

Willem

unread,
Jul 21, 2008, 3:52:09 AM7/21/08
to
mcj...@gmail.com wrote:
) What am I ignoring that's fundamental?
)
) I'm taking the understanding into account that compression works with
) the idea of reducing redundancy...

Compression does not "work with" reducing redundancy.
Compression *IS* reducing redundancy.
Two different names for the same thing.

) so far the only idea I think is repeat occurances that can be said
) once and explained more often right?

Wrong. Reducing redundancy is not only about finding repeat occurrences.
Perhaps you should first research existing methods of redundancy reduction.

mcj...@gmail.com

unread,
Jul 22, 2008, 12:02:33 AM7/22/08
to
On Jul 21, 3:52 am, Willem <wil...@stack.nl> wrote:

Does anybody see what proportion a method like this has for
compression?

I think it might be different than the ideas used in any other way of
compressing...


Because for "frog toad" and "ford goat" only "f o r d g at" would be
what gets stored. Like, stored in a sphere where each part is oriented
so that a curved line can connect the parts in different ways. so then
the tokens themselves are actually curved-line descriptors, in math,
that connect the parts together in the order the token means. So for
example, with just that stored in the sphere, there can now be a token
for "gord, grad, rat, droat, foad, dorf, dator, fat rat, rat fog", each
taking only the size of a token for more space.

Anybody want to check the math on this one?

Thomas Richter

unread,
Jul 22, 2008, 3:23:53 AM7/22/08
to
mcj...@gmail.com schrieb:

> Does anybody see what proportion a method like this has for
> compression?

Well, why don't you implement it then and test it? My personal
impression is: Not much, because LZ is not so different and works pretty
well - it's not so easy to compete against.

> Anybody want to check the math on this one ?

It's not yet in a form where math can be applied in the first place. Only
you can do that - and I doubt many will volunteer. If you want to prove
that your method works, I think it's up to you to provide an implementation.

So long,
Thomas

mcj...@gmail.com

unread,
Jul 22, 2008, 5:11:09 AM7/22/08
to
On Jul 22, 3:23 am, Thomas Richter <t...@math.tu-berlin.de> wrote:
> mcja...@gmail.com schrieb:

I'm convinced this way of compressing has a proportional difference to
the methods tried so far...

I even see no reason why random can't be compressed quite well,
actually.

say, given a random string:

"jht845mnh82hk"

right.. no repeat occurrences significant enough to be worth a token
with traditional compression's repeated-string tactic...

however,

"j h t 8 4 5 m n 2 k"

can be stored, and tokens that say:

curve line j - h - t - 8 - 4 - 5 - m - n
curve line 2 - h - k

maybe that's not how it actually works out, with the size it takes to
say a curved line, but it's not stringent on any limit that stops this
from being a proportion that achieves for random a compression ratio
better off.


mcj...@gmail.com

unread,
Jul 22, 2008, 6:34:00 AM7/22/08
to
> better off.

I think for sure this method would achieve a good result with
random...

take for example how it can work this way...

say all the file is put in a sphere... but just as one block in the
sphere; now you can just say the sphere in the file and a token to the
one block.. ok?
so say everything in the file always goes in the sphere, and the file
content itself can only be tokens.
so now no compression, but no loss either, for having a pattern-token
system in place, get it? a sphere and 1 token to the data block of the
whole file. same size.
now say that instead of one big block of data, you have it in the sphere
broken apart, and the file as many tokens.. so now the file is only
getting bigger, right? with tokens pointing inside the sphere?

but with the whole file as a big block in the sphere, you can break the
block up where pieces reorganize to be parts of the file, and you can
also get rid of any pieces of the block that show a reorganization
strategy that is already found, or that extend one already there.

so a file the same size to start with, and only smaller as you keep
saying that instead of having something in the file stored as a whole
block, it's either nothing at all, if the blocks already there can have
a curved line drawn through them, or just the part that extends what a
curved line can partly say.

so isn't that like saying that even if, among a lot of random data, all
you can find is "qlkzg[p387b" and "nalncqgbzz", then you get a lesser
storage size since
you can just say "q lk z g[p387b nalnc gb" in the sphere, and file
content said as a few lines that connect the blocks together in order.

so now file contents is lines, like curved lines within the sphere, as
tokens.

this isn't edging a condition of token size, weighing the tradeoff of
whether or not it's worth making a token for a repeat occurrence; it
isn't like that at all.

mcj...@gmail.com

unread,
Jul 22, 2008, 6:54:58 AM7/22/08
to
> generate file output is the best it can be ?

so given the idea of this as what to compress:

"abcdefghijklmnopqrstuvwxyz abstract wizard ward start"

you can store:

"ab cdefghijklmnopqrstuvwxyz stract w izard ar d st t"
or
"ab cdefghijklmnopqrstuvwxyz st ract wiz ar d ward t"
or
"ab cdefghijklmnopqrstuv w xyz st ract iz ar d ard t"
or
"a b cdefghijklmnopqrs t uv w xyz st rct iz ar d rd"
or
"a b c d efghijklmnopqrs t uv w xyz st rct iz ar r"
or
"a b c d efghijklmnopqrs t uv w xyz s rc iz ar r"
or
"a b c d efghijklmnopq r s t uv w xyz s rc iz"

and so on..

each way has connect-the-dots to say the same.

Mark Nelson

unread,
Jul 22, 2008, 7:30:37 AM7/22/08
to
On Jul 21, 2:39 am, Thomas Richter <t...@math.tu-berlin.de> wrote:
> mcja...@gmail.com schrieb:

I think a good definition for the purpose of discussion here is
something like this:

"A random sequence is defined as a any sequence which cannot be
generated with a program shorter than itself. "

The only catch to this definition is that it is with respect to the
machine on which the program is going to run. Other than that I think
it works very well.

It also stops any discussion of compressing random data dead in its
tracks by defining the problem away.

|
| Mark Nelson - http://marknelson.us
|

mcj...@gmail.com

unread,
Jul 22, 2008, 7:33:16 AM7/22/08
to
> each way has connect-the-dots to say the same.


reorganized patterns _not_ recurring patterns.

say "ab gh tu" as stored, then a single token can be for
tughab ghtuab ghabtu abtugh abghtu tuabgh


mcj...@gmail.com

unread,
Jul 22, 2008, 7:47:52 AM7/22/08
to
> tughab ghtuab ghabtu abtugh abghtu tuabgh

"abcdefgh eic feg cad bad gag"

can be stored as

"a b c d efg h eic feg"

think of it as maybe even one big curved line, with just that stored..

Thomas Richter

unread,
Jul 22, 2008, 2:57:43 PM7/22/08
to
Mark Nelson wrote:
> On Jul 21, 2:39 am, Thomas Richter <t...@math.tu-berlin.de> wrote:
>> mcja...@gmail.com schrieb:
>>> for who says random can't be compressed...
>> Ok, here's an exercise for you, or rather a question:
>> Please give a definition of "random". (I'm asking for not more).
>> Then, once you have that definition, we can work from there backwards.
>
> I think a good definition for the purpose of discussion here is
> something like this:
>
> "A random sequence is defined as a any sequence which cannot be
> generated with a program shorter than itself. "
>
> The only catch to this definition is that it is with respect to the
> machine on which the program is going to run. Other than that I think
> it works very well.

That (namely, Kolmogorov's) is a very good definition indeed, and it
ends the argument.

For the purpose of the thread, I actually considered a simpler one,
namely "data the compressor did not expect" - speaking as a
mathematician, I would say it is a matter of how you place your
quantifiers: The OP claimed:

There is a compressor such that, for any data sequence, the output of
the sequence under the compressor is shorter than the input.

which is wrong - and which is what I would call "compression of random
data" or (equivalently) "recursive compression". Correct is, however, only:

For any data sequence there exists a compressor such that the output of
the sequence under the compressor is shorter than the input.

> It also stops any discussion of compressing random data dead in its
> tracks by defining the problem away.

Well, indeed. Though it would have been nice if the OP had made up his
mind himself. The question is not quite as trivial as it seems.

So long,
Thomas
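
(To see why the second, correctly quantified statement is almost
trivially true: for any one fixed sequence there is a "compressor"
specialized to it. A toy sketch, nothing more:)

# For a fixed sequence S, map S to a single byte and prefix everything else
# with a one-byte flag: S shrinks, every other input grows by one byte.
def make_compressor(s):
    compress = lambda data: b"\x01" if data == s else b"\x00" + data
    decompress = lambda blob: s if blob == b"\x01" else blob[1:]
    return compress, decompress

c, d = make_compressor(b"jht845mnh82hk")
print(len(c(b"jht845mnh82hk")), d(c(b"jht845mnh82hk")) == b"jht845mnh82hk")   # 1 True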

Jim Leonard

unread,
Jul 22, 2008, 3:28:16 PM7/22/08
to
On Jul 22, 4:11 am, mcja...@gmail.com wrote:
> I even see no reason why random can't be compressed quite well
> actually.
>
> say given a random string:
>
> "jht845mnh82hk"
>
> right.. no repeat occurances significant enough to call tokened with
> traditional compression's string repeat occurance tactic...
>
> however,
>
> "j h t 8 4 5 m n 2 k"
>
> can be stored, and tokens that say:
>
> curve line j - h - t - 8 - 4 - 5 - m - n
> curve line 2 - h - k

Right. So how would you describe these "curves"? What would the data
look like? Probably some floating-point numbers for vectors/points
and, what, bezier curves? Floating-point numbers for those too,
right? For sufficient precision in the mantissa and exponent you'd
use IEEE 80-bit floating point numbers, yes?

Why don't you work this all out. You'd see that the "curve"
definitions would actually take up more space just for themselves than
1. the data they were trying to reconstruct, for small datasets, or 2.
other established methods like LZ77, for larger datasets.

If you want to stick to one dimension, you can convert your "curves"
into "lines" since it doesn't matter if the "lines" are curved or not
as long as they point to the right data. So you'd only use integer
numbers to point to matches. And that's LZ77.
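
(Rough, purely illustrative byte budgets for that comparison: a cubic
Bezier "curve token" with four 3-D control points, even at 32-bit floats
rather than 80-bit, against a 16-bit offset plus 8-bit length:)

import struct

curve_token = struct.calcsize("<12f")   # 4 control points x 3 coordinates = 48 bytes
lz77_token = struct.calcsize("<HB")     # offset + length = 3 bytes
print(curve_token, lz77_token)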

Jim Leonard

unread,
Jul 22, 2008, 3:31:08 PM7/22/08
to
On Jul 22, 5:34 am, mcja...@gmail.com wrote:
> you can just say "q lk z g[p387b nalnc gb" in the sphere, and file
> content said as a few lines that connect the blocks together in order.
>
> so now file contents is line, like a curved line for in the sphere, as
> tokens.

You are completely neglecting the size of the data necessary to
describe these spheres and curves/lines. Saying "put all the file in
a sphere" implies that you are defining a sphere and indicating where
the file goes. That takes up space. It is not free. Try working it
out and you'll see that this arbitrary arrangement takes up more space
than it saves.

mcjason

unread,
Jul 22, 2008, 9:53:14 PM7/22/08
to

Ok, so given nothing different about the size, I was thinking it would
be a file of about the same size, where what it says is (sphere)TOKEN,
if you just put all the data into the sphere not broken apart.

so that's the same size; so then everything that makes the size smaller
comes down to where something like
ajkl12 and ajlk12 can be said another way, and that other way to say it
is less than what it takes to say one data block.
so a sphere with something like "(data but not aj k, and 12: aj:k:12:
data)TOKENS"...

so for a lot of data it's like saying you only need, in all of it, a
reorganized pattern a few times to come out better no matter what.
right, maybe?

mcjason

unread,
Jul 22, 2008, 10:03:54 PM7/22/08
to

i mean to say..

isn't it like saying that no matter what the size of the data is, the
way to just say it and not be smaller is
(sphere)TOKEN, but then to come out smaller you only have to find, _in
any size it can be_, a few patterns that reorganize.

it's neat how the size of the curves and how you lay out the sphere area
matters.. but the tradeoff mostly goes another way.

right.. lines or curves or something.. where a token plots an
arrangement.

mcjason

unread,
Jul 22, 2008, 10:06:57 PM7/22/08
to
On Jul 22, 9:53 pm, mcjason <mcja...@gmail.com> wrote:

anybody know how to explain that tradeoff better than me?

mcjason

unread,
Jul 23, 2008, 12:28:41 AM7/23/08
to
On Jul 17, 8:24 am, mcja...@gmail.com wrote:
> A primitive idea maybe?
>
> from what I understand most compression techniques try to use
> redundancy elimination as their main score. So take for example repeat
> occurances of the same and give token of replica instead. What would
> happen say if you placed in a geometric area all of what gives a
> token, and said not token but geometric location as what happens to be
> it? Pretty much it maybe if the token is always the same size to be
> what works right? so what happens if you say in geometric area what
> out to be instead of just what repeats itself a sequence of words say
> for example that occur in different order what is neighbouring
> eachother where you can say in geometric area a plot course that draws
> a string pattern different ways as the token, to be crude as any
> example words spaced in a sphere, with a curve that intersects the
> words that find themself ordered other ways as the token to say.
>
> so you may see this in a file:
>
> the first sentence to see
> another sentence like this
> a good sentence to run on about
> a final sentence to finish
> to see another sentence
> about a good sentence
>
> but put the word "sentence" in this sphere, and surround all the words
> that make each sentence so that all you do to token the whole sentense
> is to draw say a curve in the sphere through each word as what you say
> is the token, like math that makes the curve and where it starts. but
> also see the word "see" and "good" once because they show up in more
> than one sentence, find the curve to be at the same place for those
> words as the token for those setences.
>
> isn't this no matter what the same as redundancy reduction done right?
> it says at least the same doesn't it? but has yet better potention to
> say nothing worse but better? in idea? since token space is allowed to
> be the same for what geometry there can be to say a place? but
> extended for the idea of patterns that rearrange?
>
> doesn't this have a better theory to it altogether with what
> rearranged patterns can be as what can be found common enough I
> suppose.

For the point of what I tried saying about random...

since it can be said that any random data is compressed if you have a
software program with the same entropy source seeding the generation of
the random data, isn't that the same as saying that random carries no
trend in its nature against compressibility?

Like, duh... isn't the random-generating software program just the same
as the random data compressed?

just to say, though... because it's enough to see what the idea tries to
be: saying redundancy isn't found often enough in random data is why
it's "mathematically impossible?" to compress? just because it has few
repeat occurrences worth giving a token about?

I hated that too...

am I maybe onto something when I say how reorganized patterns can
achieve a type of compression that should have no ordering against the
nature of random data?

there should be many instances of finding, in random data, things like a
few characters together in groups that reorganize another way, and not
all of them reorganized another way, but some of them being part of how
another pattern shares some of another pattern.


"cherth church theater lightning ghost storm rover"
stored like
ch er th ur ea t lightning o st orm rov


and then the space of a few tokens.

can you see how this way doesn't work like the pigeonhole problem?

i would even say nothing but a sphere with blocks, and the rest as
tokens. because you can say just a sphere with 1 block and 1 token to be
only that much overhead, and then saying better, toward a smaller size,
is only about finding a tradeoff that seems certain of catching a
benefit with random.

so no pigeonhole problem there at all, actually... just the tradeoff of
token size and blocks broken apart.


I'm almost certain this achieves for random data.

mcjason

unread,
Jul 23, 2008, 1:02:12 AM7/23/08
to
> achieve a type of compression that should have no ordering against the

> nature of random data?
>
> there should be many instances of finding in random things like a few
> characters together in groups that reorganize another way, but not all
> of them reorganized another way but for some to be part of how another
> pattern has some of another pattern.
>
> ""cherth church theater lightning ghost storm rover"
> stored like
> ch er th ur ea t lightning o st orm rov
>
> and then space of a few tokens.
>
> can you see how more this way doesn't work like the pigeon hole
> problem?
>
> i would even say nothing but a sphere with blocks, and the rest as
> tokens. because you can say just a sphere with 1 block and 1 token to
> be only that overhead, but now to say better about a smaller size is
> to only find a tradeoff that seems certain of catching a benefit with
> random.
>
> so no pigeon hole problem there at all actually... just the tradeoff
> of token size and blocks broken apart.
>
> I'm almost certain this achieves for random data.

so it's like saying that in data of any size, no matter what, there's a
benefit if it can be found to have a pattern that's bigger than the
token and reorganized more than one way, right?

like, in all of the data to compress... just find in all of it
"uiJKllnq" and also "KlluiB", to store "ui J K ll n q b", and then
anything more like "opqKb" stores only "op" more, because now a token
can cover it with how the rest is already there.


I like how it looks to put it all in a sphere area, say, and have the
file 'content' only be tokens. like, the file content is only tokens
that draw a pattern-organization plot.

mcjason

unread,
Jul 23, 2008, 1:12:40 AM7/23/08
to

and maybe the file format would be where, at the beginning, you just say
one thing after the other, in a way where each thing after another
starts from the center and rotates around, so that a curved line can be
drawn to connect things together? then the rest of the file is just
curved-line tokens?

mcjason

unread,
Jul 23, 2008, 1:14:41 AM7/23/08
to

and i bet that what's compressed this way is able to be compressed again
the same way.. i mean, there shouldn't be any reason why there aren't
patterns to find like this in how you say curved lines and sphere
areas...

Jim Leonard

unread,
Jul 23, 2008, 1:24:06 PM7/23/08
to
On Jul 22, 8:53 pm, mcjason <mcja...@gmail.com> wrote:
> where something like
> ajkl12 and ajlk12 can be said another way, and another way to say is
> less than what it takes to say one data block.
> so a sphere with like "(data but not aj k, and 12: aj:k:12:
> data)TOKENS"...

What is a "data block"? Data is not free. If you constrain your idea
of a "data block" to a single byte then you can see there is no space
savings.

Already you are disproving your own argument. You are saying that it
should be simple to represent:
"ajkl12"
in a different form with:


"(data but not aj k, and 12: aj:k:12: data)TOKENS"

which is clearly longer than the source data, unless you think you can
define it in less than 6 bytes (which you can't).

Please research LZ77 so that you can see your method has already been
surpassed 30 years ago.

Jim Leonard

unread,
Jul 23, 2008, 1:37:17 PM7/23/08
to
On Jul 23, 12:14 am, mcjason <mcja...@gmail.com> wrote:
> and i bet though that what's compressed is able to be compressed again
> the same way.. i mean, there should be any reason why there isn't
> patterns to find like this in how you say curved lines and sphere
> areas...

We're done here. I think your next course of action is to stop
ranting and program an LZ77 compressor so that you gain actual
experience writing a compressor. Here's a few links to help you get
started:

http://datacompression.dogma.net/index.php?title=FAQ:Intro_to_Data_Compression-_Huffman_and_Arithmetic_Coding%2C_LZ77%2C_LZ78

http://www.fadden.com/techmisc/hdc/index.htm

Read these completely, and if you don't understand the LZ77 portions,
find a different hobby.

mcjason

unread,
Jul 24, 2008, 1:46:33 PM7/24/08
to
On Jul 23, 1:37 pm, Jim Leonard <MobyGa...@gmail.com> wrote:
> On Jul 23, 12:14 am, mcjason <mcja...@gmail.com> wrote:
>
> > and i bet though that what's compressed is able to be compressed again
> > the same way.. i mean, there should be any reason why there isn't
> > patterns to find like this in how you say curved lines and sphere
> > areas...
>
> We're done here.  I think your next course of action is to stop
> ranting and program an LZ77 compressor so that you gain actual
> experience writing a compressor.  Here's a few links to help you get
> started:
>
> http://datacompression.dogma.net/index.php?title=FAQ:Intro_to_Data_Co...

>
> http://www.fadden.com/techmisc/hdc/index.htm
>
> Read these completely, and if you don't understand the LZ77 portions,
> find a different hobby.

Hello?

I'd like to think I'm not doing a poor job approaching the idea of why
random might not be a challenge at all to compress; is this not
interesting for this discussion group?


Let me try to restate my idea....


I'm trying to explain how the idea of what I'm saying is fundamentally
different than any other compression, I believe.

It's trying to be the idea of pattern reorganization, definitely not
proportionally comparable to redundancy reduction as it is done by
reducing repeat occurrences.

It's actually so simple to think about that I believe I've embraced the
idea enough to say it's _almost definitely_ true that it would be good
on random data.

I'd like to try to explain what tradeoff proportion the idea has...

so given no compression at all, say just the whole file as one data
block in the sphere, and 1 token to it. so a file with no overhead to
care about really, for what matters: about the same size.

so now, to think of any benefit of compressing, it's to realize this:

- instead of one block,
like "loqenalqoq",
have the few blocks "l:o:q:ena" in the geometric area, say with each
block separated by distance and given a location.
now for the token, say _a curved line_ that connects the blocks in
order. so a curved line through each block, like l-o-q-ena-l-q-o-q

so now the storage is sized "loqena" bytes + the size of a curved line
(like a bezier curve or something)

and now say more of the file has "hghalauiqen";
then it's to store, additionally, "h:g:h:a:ui:en"

but then in the algorithm, I suppose, the block "ena" can be made "en"
and the "a" portion dropped, because there's already an "a" block with
which to organize the "ena" pattern.

and now another curve token, or the curve token already there gets
extended.

- so as long as in data of _ANY LENGTH_ there can be found _EVEN 1_
occurrence of a pattern that is part of another pattern, or that
organizes another way, there's a benefit to claim. You can see the size
of the token being funny too, for what there is to see about a tradeoff.

like, in data of any length... once you've found "tyqi" and another time
"tynnqi", the rest of the file would be stored like
"REST0:ty:qi:REST1:REST2:nn" (no bigger), and maybe even with a
curved-line token of the same size!

REST0 before "tyqi"
REST1 after "tyqi" until "tynnqi"
REST2 after "tynnqi"

before, there could have been one token for the whole file content;
now there can still be one token, as a curved line, but instead of being
for one place, it curves through
REST0:ty:qi:REST1:ty:nn:qi:REST2

so now the size is.. well!

before: ("ALL")curved line
after: ("REST0:ty:qi:REST1:REST2:nn" <- the size of "ALL" before, minus
the second "ty:qi")curved line

but some overhead to say more blocks.


so basically it's to draw a pattern-organization plot of blocks that
organize together as different patterns to piece together the file
content.
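
(A crude way to put numbers on that accounting, with fixed-size blocks
standing in for the sphere and a list of indices standing in for the
curve; the block and token sizes are made up, not a real format:)

import os

# Unique blocks stored once; the file becomes a stream of block indices,
# which is exactly the per-token overhead under discussion.
def sphere_size(data, block=4, token_bytes=2):
    blocks = [data[i:i + block] for i in range(0, len(data), block)]
    unique = list(dict.fromkeys(blocks))            # the "sphere": each block once
    tokens = [unique.index(b) for b in blocks]      # the "curve": the order to replay them
    return sum(len(b) for b in unique) + len(tokens) * token_bytes

repetitive = b"abcdefgh" * 24                       # 192 bytes with lots of structure
random_ish = os.urandom(192)                        # 192 bytes with (almost surely) no repeats
print(sphere_size(repetitive), sphere_size(random_ish))   # well under 192 vs. well over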

mcjason

unread,
Jul 24, 2008, 1:57:51 PM7/24/08
to
On Jul 24, 1:46 pm, mcjason <mcja...@gmail.com> wrote:
> On Jul 23, 1:37 pm, Jim Leonard <MobyGa...@gmail.com> wrote:
>
>
>
>
>
> > On Jul 23, 12:14 am, mcjason <mcja...@gmail.com> wrote:
>
> > > and i bet though that what's compressed is able to be compressed again
> > > the same way.. i mean, there should be any reason why there isn't
> > > patterns to find like this in how you say curved lines and sphere
> > > areas...
>
> > We're done here.  I think your next course of action is to stop
> > ranting and program an LZ77 compressor so that you gain actual
> > experience writing a compressor.  Here's a few links to help you get
> > started:
>
> >http://datacompression.dogma.net/index.php?title=FAQ:Intro_to_Data_Co...
>
> >http://www.fadden.com/techmisc/hdc/index.htm
>
> > Read these completely, and if you don't understand the LZ77 portions,
> > find a different hobby.
>
> Hello?
>
> I'd like to think I'm not doing a poor job approaching the idea of why
> random might not be a challenge at all to compress; is this not
> interesting for this discussion group?

I don't think random has any trend against patterns that reorganize, the
way it has against string reoccurrences.

so in enough data, the tradeoff of finding patterns that reorganize
starts to show a better gain than finding pattern parts as string
reoccurrences. because "aa:bb:cc" is ccbbaa, aabbcc, bbccaa, etc., and
even part of "aaeegg"

mcjason

unread,
Jul 24, 2008, 2:01:20 PM7/24/08
to

one token of about the same size might as well cover fewer or smaller
parts of what organizes together.

Jim Leonard

unread,
Jul 24, 2008, 2:08:12 PM7/24/08
to
On Jul 24, 12:46 pm, mcjason <mcja...@gmail.com> wrote:
> I'd like to think I'm not doing a poor job approaching the idea of why
> random might not be a challenge at all to compress, is this not
> interesting for this discussion group?

Not when you are ignoring what we're saying when trying to help you.

> I'm trying to explain how the idea of what I'm saying is fundamentally
> different than any other compression I believe so.

It's not. I just told you why in a previous post; I guess you don't
bother reading posts.

> so basically it's to draw a pattern organization plot of blocks that
> organize together as different patterns to piece together the file
> content.

THIS HAS ALREADY BEEN DONE IN PREVIOUS METHODS INVENTED 30 YEARS AGO.
Was that clear enough? What you describe is no different than the
match/offset codes used in LZ77, with the exception that LZ77 is more
efficient than what you have been describing.

Until you can accurately describe LZ77, and how your method exceeds it
for all test cases, please stop posting about your idea.

mcjason

unread,
Jul 25, 2008, 2:30:13 AM7/25/08
to
On Jul 24, 2:08 pm, Jim Leonard <MobyGa...@gmail.com> wrote:
> On Jul 24, 12:46 pm, mcjason <mcja...@gmail.com> wrote:
>
> > I'd like to think I'm not doing a poor job approaching the idea of why
> > random might not be a challenge at all to compress, is this not
> > interesting for this discussion group?
>
> Not when you are ignoring what we're saying when trying to help you.
>
> > I'm trying to explain how the idea of what I'm saying is fundamentally
> > different than any other compression I believe so.
>
> It's not.  I just told you why in a previous post; I guess you don't
> bother reading posts.
>
> > so basically it's to draw a pattern organization plot of blocks that
> > organize together as different patterns to piece together the file
> > content.
>
> THIS HAS ALREADY BEEN DONE IN PREVIOUS METHODS INVENTED 30 YEARS AGO.


Only if the idea of pattern reorganization you're thinking about is
to land the redundant portions in locations that are better
found with match/offset codes... what I'm talking about here is _very
different_!

I'm supposing the idea that portions of the file can be broken
apart and reorganized in different ways.
say "not a long string" is found to be the data to compress...
then it's to store "not a long string" as any of these ways...

- "not:a:long:string"
also "a string long"
also "not a string"
also just "not a long"

- "n:ot:a:lo:g:stri:n"
also "got log lot"

- "no:t:a:lo:ng:s:r:i"
also "nostalgia"

- "n:o:t:a:l:ng:stri"
also "nostril tang"

- "not:a:lo:ng:stri"


- "n:o:t:a:l:g:s:ri"
also "analog"
also "tall snot"
also "latta nosrita"

"no:t:a:lo:ng:s:tri:ng"

> Was that clear enough?  What you describe is no different than the
> match/offset codes used in LZ77, with the exception that LZ77 is more
> efficient than what you have been describing.


I think you're giving too much credit here to the idea that
compression only works the one way that is known to be good. From what
I gather, that tries to contradict the example I'm trying to make of how
pattern reorganization is actually a completely different idea
altogether, one that has nothing to do with having to find repeat
occurrences of the same thing, which is said to be the better way of
taking less to say more.

>
> Until you can accurately describe LZ77, and how your method exceeds it
> for all test cases, please stop posting about your idea.

I think you're using a sliding window to see something else.

So each of the next characters is exactly equivalent to the characters
found at an offset difference behind where we're at.

It may be only to find how repeating an example tries to say something
else mostly that gets stuck at only being a little less than the more
it has, because although
some was said as part of something else and the rest can't be found as
anything but new, it's to find in part what is mostly said another way
that already got eliminated
as a part found in something else that looked better in another
example of where it was.

mcjason

unread,
Jul 25, 2008, 2:50:36 AM7/25/08
to
> example of where it was.


Maybe people can see this isn't even in the same line of thinking as
where the idea of a pigeonhole problem can be seen?

It's to see as only as the content inside a sphere, and curved lines
through it to say how it comes together that way.

I'm feeding off an area that has parts that come together to say what
makes sense of what's plainspoken the way that parts can come together
many ways, and I'm always finding parts to come together to say
something different many ways, but with so many parts to put together
as what can only get a little harder, and ways to put them together
that only gets a little more complicated, it's seeing how the stuffing
is without the pigeonhole problem the way only so much fits explaining
well for what I understand of how to say it better. It's try over and
over again even and still find a pattern that can be said smaller as
better that might be the outstanding difference to see. I'd probably
find a pattern after a while that doesn't say much different than
being a pattern too complicated to see smaller.

Thomas Richter

unread,
Jul 25, 2008, 2:59:11 AM7/25/08
to
mcjason schrieb:

>
> Hello?

Hello, hello? Someone home?

> I'd like to think I'm not doing a poor job approaching the idea of why
> random might not be a challenge at all to compress, is this not
> interesting for this discussion group?

NO, it isn't, you're wasting bandwidth by repeating endlessly. You have
homework to do, so please do. There are things you have to *read* first
(seriously!), and a couple of questions you must become clear about. All
this has been mentioned before, by other people.

o) Read about LZ77. Even if you *think right now* it is unrelated to
your idea, nevertheless read it. There is nothing you must be afraid of,
please do and then(!) think about it. The best you can do is actually
work LZ77 out in one example yourself (emulating the machine by pen and
paper.) I'm serious.

o) Please think about what "random" actually means. Your concepts are
still not cleaned up.


> It's trying to be the idea of pattern reorganization, definitely not a
> proportional comparison to redundancy reduction as it is to reduce
> repeat occurances.
>
> It's actually so simple to think about, I believe I embraced the idea
> of how it's _almost definitely_ true to the idea of being good about
> random.

Second point above. Please state what "random" means. You haven't done
so yet. Please do your homework - it's really about helping you, not
about annoying you. Nobody can do that for you, you must learn it yourself.

> - instead of one block,
> like "loqenalqoq"
> have the few blocks "l:o:q:ena" in geometric area, like say each
> block is seperated by distance and locationed.
> now for the token, say _a curved line_ that connects the blocks in
> order. so curved line through each block like l-o-q-ena-l-q-o-q

Which is, in a nutshell, LZ77/LZ78 in a variant. It separates patterns
into blocks, locating the longest possible substring in its
dictionary and adding the pattern plus its extension to the dictionary.

> so now storage is sized "loqena" bytes + size of curved line (like
> beizer curve or something)

Just consider: How much data do you need to describe the curve (just
estimate!) LZ77 is simpler, it stores (entropy encoded) a
length/distance pair. If you would think about it, you'll find that the
parameters describing your curve are pretty long compared to what the LZ
variants do.
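
To make that concrete with a back-of-the-envelope estimate (the window
size, match limit and coordinate precision below are assumptions picked
for illustration, not from any particular format):

import math

window    = 65536       # assumed sliding-window size in bytes
max_match = 258         # assumed maximum match length

# One LZ length/distance pair, before entropy coding:
lz_pair_bits = math.log2(window) + math.log2(max_match)   # about 24 bits

# One cubic Bezier segment in 3D: 4 control points, 2 bytes per coordinate.
bezier_segment_bits = 4 * 3 * 16                           # 192 bits

print(round(lz_pair_bits), bezier_segment_bits)

Even under generous assumptions a single curve segment costs several
times what one LZ pair costs, and the curve still has to be anchored to
the block layout somehow.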

> and now say more of the file to see has "hghalauiqen"
> then it's to store as more "h:g:h:a:ui:en"

Separating data in blocks is an elementary feature of all the lookalikes
of LZ.

> but then in algorithm I suppose now the block "ena" can be made
> "en", and the "a" portion dropped because there's already an "a" block
> to organize the "ena"
> pattern.
>
> and now another curve token, or to extend the curve token already
> there.

And you still have to store those instructions to the decoder. Well,
fine then, but note that you need to allocate rate (file size) to do so,
and while storing an "a" takes one byte, how many bytes does it take to
describe the curve? That rate isn't for free!


> - so as long as in data _ANY LENGTH_ there can be found _EVEN 1_
> occurance of a pattern that is part of another pattern even, or
> organizes another way, it's to
> claim a benefit. You can see the size of the token being funny too
> for what there is to see about a tradeoff.
>
> like, in data any length... once found "tyqi" and another time
> found "tynnqi" and the rest of the file
> would be like storing "REST0:ty:qi:REST1:REST2:nn" (no bigger), and
> maybe even a curved line token the same size!
>
> REST0 before "tyqi"
> REST1 after "tyqi" until "tynnqi"
> REST2 after tynnqi
>
> now before there could have been one token for the whole file
> content,
> now there can be still one token as a curved line, but instead of
> being for one place, it curves through
> REST0:ty:qi:REST1:ty:nn:qi:REST2"
>
> so now the size is.. well!

No, the data you gave is insufficient for the decoder. You *also* need
to describe the curves, which takes more rate; which is exactly why you
won't be able to compress "random" data at all.

> before: ("ALL")curved line
> after: ("REST0:ty:qi:REST1:REST2:nn" <- size of "ALL" before negative
> other "ty:qi")curved line
>
> but some overhead to say more blocks.
>
>
> so basically it's to draw a pattern organization plot of blocks that
> organize together as different patterns to piece together the file
> content.

Actually, do you read our posts here? Hello? Anybody home?

So long,
Thomas

Willem

unread,
Jul 25, 2008, 4:10:41 AM7/25/08
to
mcjason wrote:
) <a lot of nonsense, repeated over and over>

I notice you have chosen to completely ignore my posts about how
'compression' and 'redundancy reduction' are two phrases for the same
thing.

Furthermore, I notice that most of your replies are in the line of
'but my method is completely different', followed by the umpteenth
explanation of 'your method', without actually addressing the points
that are made. You sound like a broken record.


Are you trolling, or just plain stupid ? My vote is on trolling.
(The bit about 'random' kinda gives it away.)

mcjason

unread,
Jul 25, 2008, 4:10:56 AM7/25/08
to

data where the trend tends to be few repeat occurrences of a length of
data, where it's usually not a worthwhile tradeoff to say one
occurrence of what repeats, for there to be a token, for how tokens
have a limited way of being said for what else is said. Because in
random data the allocation space for a token is usually too exhausted
for there to be a worthwhile way of saying what a token is for what else
is said, for how a repeat occurrence of a length of data can be said
once with a token otherwise.

I understand perfectly why this can be seen as a problem when it comes
to compressing with the technique of saying what repeats once with a
token for other occurrences. It's intuitive to think of this the way
the problem is well described. But I can't find anywhere a claim that
random being hard to compress isn't connected with the idea of only
working the way that repeat occurrences are made fewer, with tokens
taking a naming allocation.

It's very limited to think that's the only way to compress; I gave A
PERFECT analogy of how this is VERY WRONG.

it's to say this proves how random is compressible, take it whatever
way you want I know it's right.

say for every length of data there can be a shape, a shape where it's
a shape different for every way the data is different.
given perfect math it would be a shape the same size as the data,
because of that making a different shape for every way data is
different.

now say for two lengths of data, a shape for each.

now.. this might be a little harder to believe is right.

given a shape, and another shape, there is math to say the shape but
made different, to the other shape, where the math to say one shape
different to the other shape is smaller than the other shape. So
instead of saying two shapes, say one shape and the math to make the
shape different as the other shape.

given a perfect idea of how this would work, shouldn't it be that the
math has a 50% rightful claim of being smaller than the other shape,
and a 50% rightful claim of being bigger than the other shape?
Shouldn't it though, just to think of the most ideal condition there
should be?

doesn't that make sense when there could be some math smaller to say
one shape made to be changed is another shape, smaller than the other
shape? and some math bigger than the other shape? shouldn't the idea
round off as a 50/50 of smaller and bigger than the other shape? to
say a shape changed is another shape.

so now if "BLUE" is a box shape, and "RED" is almost a box shape
punched hard in the corner, I can store a box shape, and how hard
to punch the box in the corner.

I think this strongly argues for the idea of how random is compressible.

Or you can just think the software that makes for random data is the
random data itself compressed if you run it the same way.

I think given at least that, there's nothing to think about at all
when it comes to how random is compressible except the restrictions of
redundancy reduction's way of reducing repeat occurrences, which is MOST
CERTAINLY not the only way to compress information.
The pigeonhole problem can find its way there too.


>
> > - instead of one block,
> >   like "loqenalqoq"
> >   have the few blocks "l:o:q:ena" in geometric area, like say each
> > block is seperated by distance and locationed.
> >   now for the token, say _a curved line_ that connects the blocks in
> > order. so curved line through each block like l-o-q-ena-l-q-o-q
>
> Which is, in a nutshell, LZ77/LZ78 in a variant. It separates patterns
> into blocks, locating the the longest possible substring in its
> dictionary and adds the pattern plus its extension to the dictionary.
>
> >   so now storage is sized "loqena" bytes + size of curved line (like
> > beizer curve or something)
>
> Just consider: How much data do you need to describe the curve (just
> estimate!) LZ77 is simpler, it stores (entropy encoded) a
> length/distance pair. If you would think about it, you'll find that the
> parameters describing your curve are pretty long compared to what the LZ
> variants do.

How many blocks to organize together can a curve draw through?

I imagine this idea....


If there was a big sphere, inside this sphere are many pieces of the
puzzle to put the file together.
Near er is amp, p, ack, tr, and age because the words pack and
amperage come together.
Near amp already in the sphere is cr, and l because the words cramp
and crack come together.
Near cr already in the sphere is im, and son, because the words
crimson and trim come together.
Near ack already in the sphere is att, because the word attacker comes
together.

Now try putting more in the sphere for more data...

then it's to say for example "try improving" to store is only
"y:oving" as more to keep and a curve to either extend or add.

>
> >  before: ("ALL")curved line
> >  after: ("REST0:ty:qi:REST1:REST2:nn" <- size of "ALL" before negative
> > other "ty:qi")curved line
>
> >  but some overhead to say more blocks.
>
> > so basically it's to draw a pattern organization plot of blocks that
> > organize together as different patterns to piece together the file
> > content.
>
> Actually, do you read our posts here? Hello? Anybody home?
>
> So long,

>         Thomas

mcjason

unread,
Jul 25, 2008, 4:20:40 AM7/25/08
to

It's a thinking point to see how I might be right about something.


but if I said the same thing over and over again it would be to say
once the way that might work, but it also might be to say it
backwards, forwards, from the middle out, and from the end to
beginning, over and over again a different way, that might make a
major point smaller to see.

I was being anywhere with saying that, but it organizes a few ways
together like it can be found better than saying part then the rest as
one part but many rests, to say part and rest many ways with other
parts where each part is only once the same, but never is part of
another part in more than one part, because of how one part is
about being better on its own but with some of another part anyways.
So for any part to be seen once, is for it to not be part of a part
that's together as seperate parts because they come with a bigger part
that is seen on its own. It takes so few parts to say a lot, because
they come together as with other parts to be many parts together like
many parts together is just better than some of a part in some of
another part, because it's a part together better than apart.


Willem

unread,
Jul 25, 2008, 4:28:24 AM7/25/08
to
mcjason wrote:
) It's a thinking point to see how I might be right about something.

You're working from the assumption that you're right,
and everybody else is wrong.

Yet, in your posts you demonstrate that you don't know anything
about existing compression techniques.


SaSW, Willem

mcjason

unread,
Jul 25, 2008, 4:28:36 AM7/25/08
to
On Jul 23, 1:37 pm, Jim Leonard <MobyGa...@gmail.com> wrote:
> On Jul 23, 12:14 am, mcjason <mcja...@gmail.com> wrote:
>
> > and i bet though that what's compressed is able to be compressed again
> > the same way.. i mean, there should be any reason why there isn't
> > patterns to find like this in how you say curved lines and sphere
> > areas...
>
> We're done here.  I think your next course of action is to stop
> ranting and program an LZ77 compressor so that you gain actual
> experience writing a compressor.  Here's a few links to help you get
> started:
>
> http://datacompression.dogma.net/index.php?title=FAQ:Intro_to_Data_Co...

>
> http://www.fadden.com/techmisc/hdc/index.htm
>
> Read these completely, and if you don't understand the LZ77 portions,
> find a different hobby.

I can put everything you said into a smaller program and make it run
to say the same if it were trying to be the simplest way to program a
rejection letter servant with no manners and takes only a keyword as a
hint, and it would also serve the purpose of answering any post that
tries to be better than there isn't to think about.

does that compress you? I found it saying a lot more than one thing.

Thomas Richter

unread,
Jul 25, 2008, 5:05:42 AM7/25/08
to
mcjason schrieb:

>
>> Second point above. Please state what "random" means. You haven't done
>> so yet. Please do your homework - it's really about helping you, not
>> about annoying you. Nobody can do that for you, you must learn it yourself.
>
> data where the trend tends to be few repeat occurances of a length of
> data, where it's usually not a worthwhile tradeoff to say one
> occurance of what repeats, for there to be a token, for how tokens
> have a limited way of being said for what else is said. Beause in
> random data the allocation space for a token is usually too exhausted
> for
> there to be a worthwhile way of saying what a token is for what else
> is said, for how a repeat occurance of a length of data can be said
> once with a token otherwise.

Not a very reasonable definition, but for the time being, let's take
this. According to this definition, the following string

12345678910111213141516171819202122232425262728293031323334353637383940...

is random, (nothing repeats, provably) though still a ten-year old can
see its construction algorithm.


Hint: You seem to believe that "random" is an attribute that you can
apply to a sequence you can point at. "Random" is the property of a
process, not of a specific string in particular. Depending on the
process, the string

1111111111111111111111111111111111111111111111111111111111111....

is as likely as the above.
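
Written out, the construction algorithm of the first string is nothing
more than concatenating the integers; a couple of lines in any language
suffice, which is exactly the point:

# Concatenate 1, 2, 3, ... : a tiny program generating an arbitrarily
# long string with no obvious repeats.
def counting_string(n):
    return "".join(str(i) for i in range(1, n + 1))

print(counting_string(40))   # 1234567891011...383940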

> I understand perfectly why this can be seen as a problem when it comes
> to compressing with the technique of saying what repeats once with a
> token for other occurances.

I'm not saying this. *You* say this.

> It's intuititive to think of this the way
> the problem is well described. But I can't find anywhere the say so of
> random being hard to compress isn't connected with the idea of only
> working the way that repeat occurances are made fewer, with tokens
> taking a naming allocation.
>
> It's very limited to think that's the only way to compress, I gave A
> PERFECT analagy of how this is VERY WRONG.

*Sigh* You gave a non-working example. What makes you believe that I
think in "patterns"? I don't. My field is *image compression*, yet you
can compress them even though there are no patterns, and the algorithms
used there do not look for matched patterns. Hence, please do not try to
tell me what I do and do not know - I think it's time for you to
deepen your research.

> it's to say this proves how random is compressable, take it whatever
> way you want I know it's right.

Using a definition of "random" that makes sense (your definition
doesn't, I wouldn't call either of the strings random), you cannot
compress random strings.

> say for every length of data there can be a shape, a shape where it's
> a shape different for everyway the data is different.
> given perfect math it would be a shape the same size as the data,
> because of that making a different shape for everyway data is
> different.

That's a "data model"; the question is "is this data model" reasonable
to compress data? And the answer is: For every model one can construct
data that cannot be successfully modeled by it (IOW, cannot be
compressed, using an optimal entropy coding algorithm on the output of
the model). In your case, the model would be to draw shapes or curves or
spheres. As long as you don't give better arguments as why you believe
the model you have is good, and for which type of data it is good for,
this is a lost attempt.

What you don't seem to realize is that while it is fairly true that more
complex models can describe more complex data, these models *also*
require more modeling parameters you somehow have to encode as part of
the message. It is a trade-off between simplicity of the model against
the size of the model parameters. Choosing a simple pattern repetition
model (as in LZ77) leaves only few model parameters (length and offset),
but it is only sufficient to match patterns exactly (from the past) and
not to describe sequences with a more complicated construction algorithm
(as the one I gave above). You can surely introduce models that do that
better, but then you also need more parameters.

In the end, you'll never have an algorithm that "perfectly compresses
everything" because even though your model is then very complete, it is
so complicated that you need to transmit too much data just to describe
it. You *cannot* win this game, it's a logical constraint about maps
between finite sets, a very elementary one.

> now say for two lengths of data, a shape for each.
>
> now.. this might be a little harder to believe is right.

I'm not arguing at this level - you don't seem to understand.

> given a shape, and another shape, there is math to say the shape but
> made different, to the other shape, where the math to say one shape
> different to the other shape is smaller than the other shape. So
> instead of saying two shapes, say one shape and the math to make the
> shape different as the other shape.

All very well, but you still need data to describe this "different", and
you'll soon find out (once you would dare to try to implement it) that
the overall byte budget required to describe this "different" is higher
than the byte budget you save by using this model, at least for *most* data.

If you don't believe this, I urge you to implement your idea in an
algorithm and observe this yourself. Depending on the data set, the most
successful models are simple.


> given a perfect idea of how this would work, shouldn't it be that the
> math has a 50% rightful claim of being smaller than the other shape,
> and a 50% rightful claim of being bigger than the other shape?
> Shouldn't it though just to think of the most idea condition there
> should be?
>
> doesn't that make sense when there could be some math smaller to say
> one shape made to be changed is another shape, smaller than the other
> shape? and some math bigger than the other shape? shouldn't the idea
> round off as a 50/50 of smaller and bigger than the other shape? to
> say a shape changed is another shape.

It all makes sense to say so, but your algorithm also has to say so,
namely has to communicate this to the decoder. And *that* is where your
problem is.

Again, if you don't believe me, construct this algorithm and you'll see
yourself.

So long,
Thomas
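
For completeness, the "logical constraint about maps between finite
sets" mentioned above can be checked by brute force for tiny sizes; a
small illustrative sketch of the counting argument:

from itertools import product

# There are 2**n bit strings of length n, but only 2**n - 1 strictly
# shorter strings (including the empty one).  So no lossless compressor
# can map every length-n input to a shorter output: two must collide.
n = 3
inputs  = [''.join(p) for p in product('01', repeat=n)]
shorter = [''.join(p) for k in range(n) for p in product('01', repeat=k)]

print(len(inputs), len(shorter))     # 8 inputs, only 7 possible shorter outputs
assert len(inputs) > len(shorter)    # pigeonhole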

mcjason

unread,
Jul 25, 2008, 5:15:02 AM7/25/08
to
On Jul 22, 7:30 am, Mark Nelson <snorkel...@gmail.com> wrote:

> On Jul 21, 2:39 am, Thomas Richter <t...@math.tu-berlin.de> wrote:
>
> > mcja...@gmail.com schrieb:
> > > for who says random can't be compressed...
> > Ok, here's an exercise for you, or rather a question:
> > Please give a definition of "random". (I'm asking for not more).
> > Then, once you have that definition, we can work from there backwards.
>
> I think a good definition for the purpose of discussion here is
> something like this:
>
> "A random sequence is defined as a any sequence which cannot be
> generated with a program shorter than itself. "
>
> The only catch to this definition is that it is with respect to the
> machine on which the program is going to run. Other than that I think
> it works very well.
>
> It also stops any discussion of compressing random data dead in its
> tracks by defining the problem away.
>

so should you, because it never gets far.

> |
> | Mark Nelson -http://marknelson.us
> |

I see an opposite idea here...

so any random data no matter what, can be alternatively represented by
a program that runs to generate it.
but it's not to say random is compressible to the size of that program
looked at another way?
I get it, I get why that might be well said, because there's nothing
to say about random to make it smaller.

But it's always true that there's a way to represent random data as
the program that runs to make it.

I know, try another entropy source that can't be made into a program
that runs the same way.

But my point is this....

it seems the only point there tries to be about how random can't be
compressed is because of its trend of showing no repeat occurrences
as what would be called redundancy, in any way that's worthwhile.

that's to say that the only idea there can be of compression is
redundancy reduction, and even to use tokens.

ok.. to stoopify that idea, what if I did this...
say before lots of data I say with a math that expands many locations
and offsets for each token found afterwards, then I say the rest of
the data as it is.

so...

dsfkhnsdkjf^TOKEN1sdfljsd^TOKEN2

neato...

how about

what if I could say a line that draws straight, but spikes at
different heights as it's drawn, in math?
what if the math to say that line is small enough to say a line like
that which is smaller than many tokens?
so what if I put that before data that wasn't compressed, and say the
spike means which offset and the distance between spikes is where
there's a token?

of course this could exist, but it doesn't as far as I know, and that
matters little.


isn't it like random tries to be an example of how compression was made
to not work, like, it says exactly what opposes the fashion of
compressing in a way that tries nothing smarter than exactly the only
way to do anything at all.

it's so well to think random is to not find recurring trend in a long
length of data, because redundancy reduction is exactly opposite the
way it's meant to work.

find in random data so long what exhausts the option of tokens to use,
say strings that recur with a token, but find a token to be unique in
identifying an explanation for something else.

explanation isn't even the right word, it's just dead dumb about
saying what goes here is something else, because here is special about
not being the same as anything else.

I would want explanation to be the word that meant what a token could
do.. it could explain what goes there instead of just pointing it out.

like, for block of data said with a token, the token can be what tries
to carry on ongoing changing trend with tokens before even to be
better.

so say in one place a token that says something is a way, then make
the next token what carries on how to be different than what the token
before says, and so on.

but make it what the token examines. ok fine.. that's probably done,
that's the only way to think of it next.

so a token can say for what was there before, what is there now as
what is different than last time... but it's how it's different as
data that hasn't even been tried but can work in idea.

see how that can get random? because I can say one place is "jbgka"
with a token, but I can next token, as the token itself, what says the
difference to make about "jbgka" to be say "89gffg"

ok who cares... it's so many ideas that are 'mathematically distant',
and it's so close to a right idea to examine random in its fancy way
of being as what is 'impossible to compress' because it carries no
recurring trend as the only thing trying to reduce a recurring trend
finds hard. like, it's to say exactly random is what redundancy
reduction tries not to speak for.

so let's see such a conclusion reached about how, because random has no
recurring trend, and compression tries to find a recurring trend,
random can't be compressed. but some wildcats out there have
recursion as the answer.

so it's like to say try every small idea of what has a mathematical
expanse to say bigger information for what math can be said smaller
than the information it is.
so it's to beat with a hammer a math formula that expands to the
information wanting to be said smaller. like there's just no other
math known to do this stunt, but can that ever work too... because I
know of such a thing as small math formulas that say more. it's like,
fractals and stuff, but then it's like knowing how to make any fractal
there can be, but then it's like saying data is a fractal to see it
another way, but then it's like having a math formula be what comes
out to be a fractal that is smaller than the fractal itself is at
explaining information, so it's like, hard to find a math formula for
this or something... so let's jump to a conclusion there and say random
isn't compressible, because it has no recurring trend, because it
doesn't find repeat occurrences of data often, because that's the
simplest idea of compression, because that's all there is.

I like my shape analogy, it serves well at being simple about proving
something. you can actually find it right if you want to, because
every way to say right and wrong is to find a formula that can work
some anyways at making a shape changed to another shape, and the
weakest way too. it's no recurring trend to not find that to work in
random, except how bad the math is anyways.

mcjason

unread,
Jul 25, 2008, 5:20:30 AM7/25/08
to
On Jul 20, 12:46 am, Jim Leonard <MobyGa...@gmail.com> wrote:
> On Jul 19, 3:38 pm, mcja...@gmail.com wrote:
>
> > now many curves to make up sentences with a small token for each.
>
> It doesn't matter how you're representing the relationship between the
> words, it's all the same thing.
>
> Typical LZ77 compression already does what you're describing.  The
> "curves" are a series of codes that describe where in the dictionary
> the next "words" come from.
>
> Your idea is not new, other than the fact that it would take up more
> data than necessary to "point" to other words than existing methods.

if LZ77 had this idea it would be the same as what I'm talking
about.....

a token that gets bigger to say more... but say the token says to
start at a match, but then the rest of the token says something
like...

start 12 move up 1, move left 2, match, move up 3, move down right 1,
match


and that's to build the whole match

where a token that says to go another way matches some of the same but
different for the direction it takes.

that would be like the same idea.

mcjason

unread,
Jul 25, 2008, 5:55:59 AM7/25/08
to

I have an easy time believing one thing....

say for all there is to compress... put it in a geometry area.
now say it's just that.

now the file is just that, and 1 token to say that's what expands, is
just the block there in the geometry area.

so nothing different about the size really.

now.. instead of one block, this instead...

find every instance of BBBB, and seperate the block.

so in
"sdfjl44tn98324jbBBBB098wutjk0982kjaerjtjkbBBBBsejh2348095bb23ybyBBBB2hi2u553vb23bnjfngBBBB"

now say one BB block and the blocks before and after each BB

now the geometry area is with that

now one curved line as the token to draw that pattern.

so lost is every occurrence of BBBB except one, so 12 bytes lost.
gained is what it took to say more blocks, and a curved line that
might be slightly bigger but not much?

so the tradeoff of finding a data block of _ANY SIZE_ that has
occurrences of BB, like in any size this can happen once in a while.

no pigeonhole concept here because tokens aren't mixed with data, it's
the geometry area and curves outside it as all there is to expect.

to say separate blocks there might as well be the simplest way....
say one block after another, but make it so one block after another is
at a location starting different like it is to say a spiral starting
at the center, but one that a curve
can always find its way through easily maybe?

see how this proves random is compressible?

because in random data any size it's good to see BBBB once in a while,
but it's only a curve slightly more complicated and saying blocks like
before and after each BBBB... but for what there is to say about size
being bigger, it's to say a separate block and a curve slightly more
complicated for each time BBBB is found?

it's like.. easy to see maybe?
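
The trade-off claimed above can be checked directly on the example
string. The byte costs assumed below for block boundaries and for each
stop on the curve are guesses, chosen only to make the bookkeeping
visible:

data = ("sdfjl44tn98324jbBBBB098wutjk0982kjaerjtjkbBBBB"
        "sejh2348095bb23ybyBBBB2hi2u553vb23bnjfngBBBB")
pattern = "BBBB"

pieces = data.split(pattern)            # the "rest" blocks between the matches
visits = data.count(pattern)            # how often the curve must revisit "BBBB"

saved    = (visits - 1) * len(pattern)  # every copy of BBBB after the first is dropped
sep_cost = len(pieces) + 1              # assume 1 byte to mark each block boundary
hop_cost = (len(pieces) + visits) * 2   # assume 2 bytes per stop on the curve

print("saved:", saved, "bookkeeping:", sep_cost + hop_cost)

With those assumptions the bookkeeping already outweighs the 12 bytes
saved; whether any curve encoding can get the per-stop cost low enough
is exactly the question the replies in this thread keep asking.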

mcjason

unread,
Jul 25, 2008, 6:27:02 AM7/25/08
to
> "sdfjl44tn98324jbBBBB098wutjk0982kjaerjtjkbBBBBsejh2348095bb23ybyBBBB2hi2u5­53vb23bnjfngBBBB"

>
> now say one BB block and the blocks before and after each BB
>
> now the geometry area is with that
>
> now one curved line as the token to draw that pattern.
>
> so lost is every occurance of BBBB except one, so 6 bytes lost.
> gained is what it took to say more blocks, and a curved line that
> might be slightly bigger but not much?
>
> so the tradeoff of finding a data block of _ANY SIZE_ that has
> occurances of BB, like in any size this can happen once in a while.
>
> no pigeonhole concept here because tokens aren't mixed with data, it's
> the geometry area and curves outside it as all there is to expect.
>
> to say seperate blocks there might as well be the simplest way....
> say one block after another, but make it so one block after another is
> at a location starting different like it is to say a spiral starting
> at the center, but one that a curve
> can always find it's way through easily maybe?
>
> see how this proves random is compressable?
>
> because in random data any size it's good to see BBBB once in a while,
> but it's only a curve slightly more complicated and saying blocks like
> before and after each BBBB... but for what there is to say about size
> being bigger, it's to say a seperate block and a curve slightly more
> complicated for each time BBBB is found?
>
> it's like.. easy to see maybe?


See how I can say this....


in data any length, no matter what....

store in a geometry area, but say no different than the data together
and one token.
so no bigger really....


now say this is what is being compressed...

... any length ... "abcdefghijklmnopqrstuvwxyz efcdab cderfab" ... any
length... "erfab 123456789 da" .,.. any length ...

then it's to store...

BLOCK, "ab", "cd", "ef", "ghijklmnopqrstuvwxyz", BLOCK, "erf",
"123456789", "da", BLOCK

and one token...

a curved line... BLOCK - "ab" - "cd" - "ef" - "qghijklmnopqrstuvwxyz "
- "ef" - "cd" - "ab" - "cd" - "er" - BLOCK"f" - "ab" - BLOCK -
BLOCK"erf" - "ab" - "123456789 " - BLOCK"da" - BLOCK

so it has to say 14 blocks instead of 1, and a curved line that isn't
just saying at one place, but is saying through 14 blocks like how
they're situated.

now that's to lose 15 bytes, but gained is explaining 14 blocks
instead of one, and gained is a curved more complicated.

so that's about at odds with saying nothing better.

so what makes this better now?

isn't it to find that going on forever is to find better than what it
takes to explain a new block and how a curve becomes more complicated
for every
"ab", "cd", "ef", "qghijklmnopqrstuvwxyz ", "f", "erf, "123456789 ",
and "da" found, it's to say that size less but a block more and a
curve slightly more complicated?

mcjason

unread,
Jul 25, 2008, 6:55:11 AM7/25/08
to
> "ab", "cd", "ef", ...

did I ever screw that up... hehe

... any amount ... "abcdefghijklmnop" ... "opmnklijghefcdab" ... any
amount
stored as....

BLOCK_BEFORE, "ab", "cd", "ef", gh", "ij", "kl", "mn", "op",
BLOCK_AFTER

so then a curved line BLOCK_BEFORE - "ab" - "cd" - "ef" - "gh" - "ij"
- "kl" - "mn" - "op" - "op" - "mn" - "kl" - "ij" - "gh" - ef" - "cd"
- "ab" - BLOCK_AFTER

so....

stored with block separation, to say one block after another makes a
spiral say for example but one a curve draws through well.

so...

total size now... each block, as seperated, and a curved line.


16 bytes lost, 10 blocks separated instead of 1, and a curved line
more complex.


so it's to say that forever as the size of data, any 2 bytes as found
to be "ab", "cd", "ef", "gh", "ij", or "kl" is for one block
separation, and a curved line slightly more complicated.

that's about even right? unless it's slightly better right?

so now it's only to find in data of arbitrary length more, 3
characters found together more than once to be at even better odds.

mcjason

unread,
Jul 25, 2008, 7:01:07 AM7/25/08
to

it's to say that each block is a plot point in a 3d space, and the
curved line is a spiral say that touches each plot point for how the
pattern organizes.

Willem

unread,
Jul 25, 2008, 7:10:24 AM7/25/08
to
mcjason wrote:
) it's to say that each block is a plot point in a 3d space, and the
) curved line is a spiral say that touches each plot point for how the
) pattern organizes.

How much space is used storing the shape (a spiral say) ?


SaSW, Willem
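
One way to put a floor under that cost, whatever curve or spiral
representation is chosen (a rough estimate, not tied to any particular
encoding): each stop must single out one of the N blocks, so replaying
k stops needs roughly k*log2(N) bits -- the same kind of budget an
offset-based coder already pays.

import math

def ordering_bits(n_blocks, n_stops):
    # Lower bound: each stop must distinguish between n_blocks possibilities.
    return n_stops * math.log2(n_blocks)

# The "abcdefghijklmnop ... opmnklijghefcdab" example: 10 blocks, 18 stops.
print(ordering_bits(10, 18))   # about 60 bits, before storing the blocks themselves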

mcjason

unread,
Jul 25, 2008, 9:27:27 AM7/25/08
to
On Jul 20, 6:12 pm, mcja...@gmail.com wrote:
> On Jul 20, 5:54 pm, mcja...@gmail.com wrote:
>
>
>
>
>
> > On Jul 20, 5:35 pm, mcja...@gmail.com wrote:
>
> > > On Jul 20, 3:59 pm, Willem <wil...@stack.nl> wrote:
>
> > > > mcja...@gmail.com wrote:
>
> > > > ) what would happen if this though...
>
> > > > If you ignore fundamental principles and simple arguments,
> > > > then you will either get laughed at or get ignored.

>
> > > > SaSW, Willem
> > > > --
> > > > Disclaimer: I am in no way responsible for any of the statements
> > > >             made in the above text. For all I know I might be
> > > >             drugged or something..
> > > >             No I'm not paranoid. You all think I'm paranoid, don't you !
> > > > #EOT
>
> > > What am I ignoring that's fundamental?
>
> > > I'm taking the understanding into account that compression works with
> > > the idea of reducing redundancy...
> > > so far the only idea I think is repeat occurances that can be said
> > > once and explained more often right?
>
> > > what a way to achieve the reduction of information... but I wouldn't
> > > say the only way, it's just said so the way to be about it.
>
> > > I was trying to not be far from an idea that says differerent, would
> > > work in idea of thinking about it, and has something else to it when
> > > it comes to what proportion can be achieved in how much information
> > > can be reduced.
>
> > > now think of a string of text... find the string of text said another
> > > way as a shape somehow, where every word there can be would draw a
> > > different shape. ok ?
> > > now find another string of text, find the shape for it, now find math
> > > that transforms one shape to another, find that in some cases the math
> > > to transform the first shape into the second shape is smaller than the
> > > second shape itself... so say this now, hold the first shape and the
> > > math to transform the shape as the information...
> > > so now in idea it's compression not working for the idea of repeat
> > > occurances, but for how a shape is math transform in size bigger or
> > > smaller than another shape.
> > > so not like there's any math for the idea, or how even any example
> > > tries to fare, it's just the idea of how it's working to achieve
> > > compression.
>
> > > see how that's completely different than finding repeat occurances of
> > > even the same string?
> > > see how it doesn't even depend on how many repeat occurances can be
> > > found?
>
> > > so in simplicity of the same proportion I think this idea of
> > > compression would work, I don't get stuck thinking of it anyway...
>
> > > like... say for everytime abc cba or bca is found as part of the file,
> > > you say coordinate in area and a curve where you start at the first
> > > letter and the curve follows through across each letter. so now only
> > > the letters "abc" are in the geometry area, but a token that says the
> > > letters rearranged any way.
>
> > > so that achieves another way besides repeat occurances of the same
> > > string.
>
> > > I think files say alot better about rearranged patterns than repeat
> > > occurances..  and _no matter what_ it's doing exactly redundancy
> > > reductiotion the same as repeat occurances is too, it definitely says
> > > that _at least_, but could only be better.
>
> > > This is being different than redundant information if that only says
> > > repeat occurances, is it not ?
>
> > I would think of it working like....
>
> > say first of all none of the file for real mixed with tokens, but the
> > idea like this....
> > put all of the file in a geometric area, where parts are further or
> > closer apart.
> > keep putting it in the geometric area where like if "had been here" is
> > already in there, it might be broken apart as words or together maybe?
> > but now putting "here already" in is what, so put the word "already"
> > like near the word "here".
>
> > so now for "had been here" and "here already" you only keep "had been
> > here already", because the word "here" already found but not like a
> > repeat occurance, but like a pattern to find another way.
>
> > so it should be like "gunsmith", "muts", "record", "buns", "thrill"
> > has it so there's maybe in geometric area
>
> > g    uns    mi    th    muts    record    b    rill
>
> > and then for those words a token that has a plot coordinate map said
> > shorter, like just a curved line to connect the parts in an ordering.
>
> > see how this can achieve better? it has no limit the same as finding
> > repeat occurances this way.

>
> I think this idea could really go over well...
>
> It doesn't seem to follow the same thinking as how random data is hard
> to compress with how repeat occurances won't frequent enough to call
> it any benefit....
> seems like random can mostly have small strings rearranged as common
> enough... to call that stored information once though and a token each
> time...
>
> and i think even once compressed in what you say is a geometric area
> and tokens, you can find that to even be patterns like you can say
> again.


The proof that random data is compressible...

BLOCK_BEFORE ... "abcdefghijklmnop" ... "opmnklijghefcdab" ...
BLOCK_AFTER

stored as....


BLOCK_BEFORE, "ab", "cd", "ef", gh", "ij", "kl", "mn", "op",
BLOCK_AFTER


so then a curved line connecting BLOCK_BEFORE - "ab" - "cd" - "ef" -


"gh" - "ij" - "kl" - "mn" - "op" - "op" - "mn" - "kl" - "ij" - "gh" -
ef" - "cd" - "ab" - BLOCK_AFTER


so....

stored with separation of each block.
To say one block after another the way it is to start somewhere and
work around a centerpoint of plot points
to be how only block separation is to keep and for plot points to be
how a spiral or curve can always connect any points together.

so...


total size now... each block, as seperated, and a curved line.


16 bytes reduced, 10 blocks separated instead of 1, and a curved line
more complex.


so it's to say for an arbitrary amount more data, any 2 bytes as found
to be "ab", "cd", "ef", "gh", "ij", or "kl" is for one block
separation for the block found before, and a curved line made more
complicated.

that should be about breaking even right?

it would be even better to find something like "abijkl" like parts
already for another pattern, because now that's 6 bytes reduced, 1
more block separation,
and a stretched curved line.

so find "uiopijqref" found after to be only "uiop", and "qr" blocks as
new, for example, and a stretched curved line.

I mean like...

in a 3d space plot points, and a spiral zig-zag curve line that
connects each plot in the order to arrange the pattern.


see how the idea of a pigeonhole problem isn't even there? because the
data blocks to organize together are kept separate from the token
area.

stan

unread,
Jul 26, 2008, 1:42:02 AM7/26/08
to

You don't seem to believe conventional thinking is correct, and you think
you have a totally new idea.

Can you actually code this up? Or maybe show some pseudo code
explanation? Are current computers capable of executing your idea?

Failing any of that, can you show a specific original file and a complete
compressed file? You have repeatedly mentioned uncompressed examples and
shown some reorganizations, but you haven't shown how to represent the
geometry that is needed to rebuild the original text. Your method of
compression can reproduce the original information, I hope.

How about showing:

1. An uncompressed string.

2. The complete results of applying your idea to 1.
The reorganized parts and the geometry required to uncompress
the reorganized parts.( The information needed to map the
reorganized parts back to the original data. )

Maybe I'm not clearly seeing your idea. You're basically saying that in
your head, even random data can be compressed. Can you make the idea
concrete?

See in my head I can imagine that perpetual motion works, gas is cheap,
and teenagers aren't the least bit annoying. Then I wake up and, oh
well.

mcjason

unread,
Jul 28, 2008, 5:45:09 AM7/28/08
to
On Jul 17, 8:24 am, mcja...@gmail.com wrote:
> A primitive idea maybe?
>
> from what I understand most compression techniques try to use

Anybody catch how pattern reorganization finds another reason to
compress besides reducing repeat occurrences of the same string?

the difference I think would be where it's better one way or the other
to say a small string or big string as what gets a token besides how
it's not good to say both parts as a token.

like when it's to find

"clap" "timer" "brim" "rim" "brime" "rims" "timber" "climb"

then which way is better to keep "imb"?
should there be "imb", "tim", "er", and "cl"
or should there be "im", "er", "ber", "bri"

as the rest too... for which way is better to store as smallest.
but it could be "imb", but also "tim" right? how they both have "im",
but "im" alone isn't as good?


isn't that what there is to most compression methods ?
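
A crude way to compare the two candidate dictionaries in the post is to
invent a cost model and count bytes. The model below (one byte per
stored dictionary character, one byte per reference into the
dictionary, one byte per leftover literal) and the greedy matching are
assumptions for illustration only:

words = ["clap", "timer", "brim", "rim", "brime", "rims", "timber", "climb"]

def cost(dictionary):
    total = sum(len(entry) for entry in dictionary)   # store each entry once
    for word in words:
        rest = word
        for entry in sorted(dictionary, key=len, reverse=True):
            while entry in rest:
                rest = rest.replace(entry, "", 1)     # reference the entry
                total += 1
        total += len(rest)                            # remaining literal bytes
    return total

print(cost(["imb", "tim", "er", "cl"]))
print(cost(["im", "er", "ber", "bri"]))

Either answer only means something relative to the cost model; that is
the trade-off real dictionary coders settle by measuring, not by
intuition.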

mcjason

unread,
Jul 28, 2008, 6:11:23 AM7/28/08
to
> isn't that what there is to most compression methods ?


like, it's sometimes better for how "imb" is often together but "im"
on its own too, but not "im" as how there's "imb" and "im", but for
"imb" it's to not keep "im" and use "b" each time... like, one way or
another is better or worse but overall good about something.

"horse tail" "sailing" "wailing" "bait", "wait"
so which way?

ail, ait ... but now ai alone ?

mcjason

unread,
Jul 28, 2008, 9:18:08 AM7/28/08
to

Ok, so in idea....

"my cat walks on my living room floor"

m:y:cat:wa:l:ks:on:living:r:oo:fl:r

or a few other ways to say

so that's what's kept in the file.. right...
so say for that in the file, but the way it's in a sphere or
something, like where there's just a point in the sphere to mean each
block, i guess each point in the sphere
would be like starting from the center and some kind of spiral outward
where nothing can be a problem as much for how you get from one point
to the next if you draw a curved line through it.


so then as a token a math spiral line connecting parts together
right...

so, it's to see the spiral how it draws across each point in the
geometry. then those are the parts together that become the original
content.


so however big that would be.

even for a spiral to not be mixed with regular file data, where
tokens have big business about being something not like anything
else.
right... so file format: (GEOMETRY AREA)(SPIRAL)
see... not even a pigeonhole problem, because tokens aren't mixed with
real information.


so I tried explaining what I saw a tradeoff be that might remark on
random's compressibility.

so.. as file untouched except in format I guess..
(ALL FILE CONTENTS AS ONE BLOCK)(SPIRAL JUST OF POINT OF BLOCK)

ehem.. ok, so just to say, nothing at all bigger really...

so.. since it says nothing now about the file being bigger just to say
nothing different, what is it to say any detail of file size
shrinking...

so each block takes space to say... it can be said one block after
another like not where it goes but how each after another spirals from
a center point,
so I guess each block can be said as just what the block is and how it
separates.

so blocks would be in a different part than tokens, because it's to
think doing it the other way where it's like saying at least one token
means the whole file as the whole file is a big block in the geometry,
or many blocks with a token saying how they are together.. because
tokens don't have to mix with real file data, like where it's a
problem to say a token and the real data.

so it's to see nothing bigger to just say one block, and a spiral that
goes nowhere but at the location of the block for the whole content.

so it's no bigger really...

now, it's to say only better with size.

so what now says smaller?

if anywhere in the whole file is just "ab", 2 letters, and it's more
than once, it's to do this each time:
say for the block before and after "ab", is a seperate block, so it's
a pattern to organize.
say "ab" only once.. right. the point :)
then a spiral that organizes the pattern together.
so the spiral is from first block, to "ab" block, to next block, to
"ab" block, and so on...

so that's a spiral right... in a geometry area like a sphere, where
each block is to say just a point in the sphere somehow.

so right now... if for each time "ab" is found, it's a block
separation and the spiral getting more complicated, then is it worth
it?
so a spiral to say more complicated to connect more points together as
how it comes together, and a block separation, to be all there
is to say more to size, for every "ab" there is...

see how the file is allowed to be any size though, for the way that
for all of it, you can see this tradeoff for how it is to find "ab"
only once ?
see how that is at par though, see the idea that's at par with? it's
even odds of working this way, it's an even tradeoff..
especially to notice how the token idea is a spiral separate from file
content.

so that's just to say 2 bytes, as pretty much even probably, with what
separating a block is at expense and a spiral being more complicated.

see how this figure would play in any way, and it's just to say ONLY
ONE recurring string for a file ANY SIZE, is to reduce it's size.
mostly because of how a token isn't mixed with real data, but also
because it's pattern organization the way it can work further even..

because it's to say better than 2 bytes now, but now 2 bytes found the
same ANYWHERE is only a block separation and spiral more complicated
for each if that's worth it.. maybe not 2 bytes, but in arbitrary
sized data, any size bigger, even more bytes together in all of it to
say the size reduced? because in geometry for plotpoint of blocks to
connect together it can be anything like how you say one block after
another for how the spiral works at being able to put it together.

see how that's a tradeoff though that says _even_ for a file any size
of a recurring string that it's always worth it.. especially how it's
a spiral that puts it together as not a token mixed with file data? <-
see though

see how that has to be right to think about it... because to separate
a block is only to say one block separate from another, if one block
after another just says from the center out or something, and then
after all that a spiral putting them together in order... see how from
now on for any size it's just that?

do I make as much sense?

>
> 2. The complete results of applying your idea to 1.
> The reorganized parts and the geometry required to uncompress
> the reorganized parts.( The information needed to map the
> reorganized parts back to the original data. )
>
> Maybe I'm not clearly seeing your idea. You're basically saying that in
> your head, even random data can be compressed. Can you make the idea
> concrete?
>
> See in my head I can imagine that perpetual motion works, gas is cheap,
> and teenagers aren't the least bit annoying. Then I wake up and, oh

> well.

mcjason

unread,
Jul 28, 2008, 10:59:26 AM7/28/08
to


Do you see how it's exactly the same as redundancy reduction too, how
it's to say once for multiple occurrences, but has a better idea too
for having less to regard more information?

because it doesn't have the problem of finding how it's to say a token
for
"platapus" "plank" as pla but then tapus and nk, but maybe pl is the
better idea and atapus ank , for how pl is given a token for more too,
for everything else.

but that's not really the point...

for example "record sale" and "reorder" is like "re:c:ord:sal:e:r :"
as always good if a spiral can say across 6 places and 4 places like
it's smaller than 10 bytes + spiral + 6 block separations to say
instead of 18 bytes.. probably not, but because there's to say more
without a problem the way now "reader" is to add "ade" block, and to
say spiral slightly more complicated, but to say reader now by
"re:e:r:" already there.... so right, "re:e:r" but "re" because it
pieces as together for the size of the spiral or something. so more to
add as "recall" is to maybe only say "call" as a new block, or maybe
to say "sal" block as "s:al" but add "l" block too, for "recall".


I'm having a hard time trying to say what I think the way it works...

it's like having a big sphere, where word parts for example are spaced
close together and apart where a curve can always connect the parts
together. but then it's for words that have the same prefixes and
suffixes for example to be the same part for the prefix/suffix but
near in the sphere are the other parts of words where a curve draws
through a few parts to put the word parts together in order to make
the word, then say that as the token.

but like pattern parts right... because better yet than redundancy of
recurrences is redundancy of patterns rearrangeable... but is
redundancy of recurrences anyways.

see how patterns rearrangeable is different though? the way backwards
or forwards is like to hold "wards:back:for" but now back and forth is
"back:th"?

see how in different order though, parts that can be any order..

see _MOST IMPORTANTLY_ though, how putting big and small parts
together as a pattern to organize, is like saying the parts only once
and making a spiral only somewhat more complicated? besides how it
already is to say the content so far? see how that can be? see how
that has a tradeoff besides what it's worth to keep a token for a part
that recurs for example... because a 1 byte small part for example
might always be worth it in saying how the spiral crosses to put other
parts together another way? because the spiral is not like saying each
token for where the spiral goes is not the point, that's not what.
it's a spiral that works better at what is already there and has
ultimate reach at what a token has problems referring to maybe? I
think I say it wrong, there's a better way to think of it..

it's like how you can never escape being able to use a part already as
together with other parts to be part of other parts together, to put
together. like, a token is to just say one thing usually right? match/
length offset? so that's like where a token is good for something if
it's smaller than something it says, and so with how a token can be
said too. but a token like that never makes better the idea of
"cinder" and "cider" for "ci" and "der" except to say a token each..
but that's fine because to say it this way would be "ci:n:der" and a
spiral like through ci-der-ci-n-der like it's no better for size. but
then to say more as what has parts "ci", "n", or "der" in it is not
the problem with finding out if the rest is more, like "citadel", but
there's also "citrus" and "recite", then maybe a token for "cit"
instead.. nah, that's not it.. it's something else about how this
would work.

it's how with "th:er:ch:amb:und:gh:gn:at" one spiral as the token can
say the parts together, in any order, better than a bunch of tokens,
one per part, when they come together another way... nah, that's not
_exactly_ the point... but it's close.

like how one spiral can say "thunder", "chamber", "thunder", "gnat",
"chat"... whatever, a spiral big enough to put it together in a way
that's smaller than a token for each part, where with tokens you
sometimes need still other parts just to say the tokens well... nah,
it's not really that, but close.

it all comes together somehow in one sphere, so there's also thinking
about how to say the spiral, but it's how "chat" and "attenuate" are
what curves in the same area can put together, and how curves can also
put together "attention" and "wallpaper", where "pappa" can come
together from the same area too.

but that's not quite it either... it's how a curve from far across the
sphere on the other side cuts through a middle portion or something...
not really worth dwelling on though...

it's how altogether pretty much everything that's said forwards can be
said backwards... like how, close together in a sphere, there's every
single way to form an html tag with its values/attributes, where every
value name and attribute type is a data block stored only once, and a
curve connects the pieces of how an html tag is formed one way or
another.

other approaches, by contrast, and they aren't really comparable,
would sometimes give "<h1 space=" a token of its own, and "<h1
indent=" another, if that happens to be the better way.

so can you see how a sphere might work like that? like, it says all
the parts of html tags, say "H1:Width:Height:HR:H2:Redisplay", as
separate blocks, and they piece together into a full tag the way a
curve connects the pieces?

then all you say for all the html is one big spiral or so, plus all
the data blocks like that, and that becomes all of it, but maybe said
much smaller?
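
a rough sketch of how I read that, flattened to a list of tag-part
blocks plus an index path standing in for the curve (the blocks and
attribute names here are made up for illustration):

# toy sketch: unique HTML-tag parts stored once, a path of indices as the "curve"
blocks = ["<h1 ", "width=", "height=", '"100"', ">", "</h1>", "Hello"]

tag_a = [0, 1, 3, 4, 6, 5]   # <h1 width="100">Hello</h1>
tag_b = [0, 2, 3, 4, 6, 5]   # <h1 height="100">Hello</h1>

def expand(path):
    # follow the curve, revisiting the same block wherever it repeats
    return "".join(blocks[i] for i in path)

print(expand(tag_a))
print(expand(tag_b))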

mcjason

unread,
Jul 28, 2008, 11:51:24 AM7/28/08
to
> see how in different ...

right.. like for HTML

so think of all the parts of HTML tags put in a sphere, along with
parts of the contents, like words only once, though mostly word parts
I guess.

then one big spiral connects the parts together, going back to the
same place whenever the same part is used again...

so just say it as data to store, plain and simple... each block
becomes a point in the sphere, then just one spiral as one big token,
not tokens mixed in with real data... so there isn't even a pigeonhole
problem in finding a way to say the token.

so each block, the block separations, and a spiral. that's the whole file.

especially since the token isn't mixed in, it's just the blocks, plus
math for a spiral connecting the parts together.

see how the better idea here might be to keep "redo" as "re:do",
because there can also be "donut" and "relapse", kept as just
"nut:lapse", since a single curve line is the token that says each of
them? it's not simply worthwhile or worthless the way a token for "re"
is, and maybe "don" versus "do" if there's also "donna", which is not
how most compressors work... not well explained, i guess; I'm having
trouble finding a way to think about it where the proportions of how
it would work come out clearly.

it's like.. there's never a place where saying a token for a string
part is a problem, but that's not quite it, only partly...


it's like having another way of thinking about it: it's always about
making a pattern that organizes one way with one curve, and organizes
another way still with another curve.. but how that is altogether
instead.

for how it explains what might be done overall this way.

it's like taking every word there is and breaking off the prefix/
suffix/etc. parts, keeping the base words and such, and then for any
word just saying a curve line in the sphere that connects the prefix/
suffix/base word... and how even a bigger curve can be cheaper to say
than a smaller curve, for much more content together, maybe.. or not.
it probably would be, but it's how patterns can organize another way
that keeps this different from just respecting repeat occurrences with
a token.

i guess it's like how "abracadabra" "brass" "homeplate" can be said
as "a:bra:c:d:s:homeplate", instead of "aBRAcadaBRA" "BRAss"
"homeplate" where only the repeat occurrences matter... whichever is
worth it. because "a" is worth storing separately in one way but maybe
not in another.. like, it's always worth it if a curve draws through
it as something small while also drawing through what's bigger, so
that the curve altogether, as a token, says more than it costs... not
exactly though.. not exactly.
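
to pin that example down, a small sketch (my own choice of blocks,
only to illustrate the split):

# "abracadabra", "brass", "homeplate" from shared blocks stored once
blocks = ["a", "bra", "c", "d", "s", "homeplate"]

words = {
    "abracadabra": [0, 1, 2, 0, 3, 0, 1],   # a+bra+c+a+d+a+bra
    "brass":       [1, 4, 4],               # bra+s+s
    "homeplate":   [5],
}

for word, path in words.items():
    rebuilt = "".join(blocks[i] for i in path)
    assert rebuilt == word

print("all words rebuilt from", len("".join(blocks)), "bytes of blocks")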

but kind of.... with more to it, like where "a" is a middle part of
what comes together and so on... where a curve always says how things
come together one way or another, which is something different...
because if you think of everything as blocks that come together
another way, and reorganize anyway, you say the blocks once and then
say how they organize in different ways, using the same blocks, not
identically for each different organization, but usually the same for
the parts they share...

it should have a different ratio to it... it works with a different
kind of benefit when reducing data size.

it's how it's all packed together: say once what can be organized in
different ways, and then say how it comes together...

it's how things can be so small that a token usually isn't worth
spending on them, but they're still worth keeping as part of 2 bigger
parts, for example, where a curve can say something that comes out
bigger than the curved line itself.

it's not that though but yeah....

it's how I bet that on already-compressed data, just by how a sphere
and spiral get written, this can reduce it yet again, not because of
any judgment of how well it was compressed to start with, but because
this tries to be about any data..

it's like saying, across all data to be compressed, the only thing
that doesn't compress at least some is data where you can't find even
3 characters or so that recur once.. because that's how it works when
tokens aren't mixed in with real data... instead it says a sphere area
of points that stand for blocks, and then a spiral over the points in
order, with the spiral going back and forth between the same points
to put it all together.

for what this would work on... it's nearly impossible for something
not to be largely like this.

but see how, once 3 characters recur even one more time, it's good to
say those 3 characters only once...

because it's instead to say this...

plain data: "ndfnnBBBefnnnBBBdsf98hfwennBBB"
is

compressed: "ndfnn:efnnn:dsf98hfwenn:BBB":SPIRAL("ndfnn" - "BBB" -
"efnnn" - "BBB" - "dsf98hfwenn" - "BBB")

so is it smaller to say it compressed like that? only if the SPIRAL
takes no more than about 6 bytes of math to describe... plus the block
separations.

but if there were infinitely more BBB... no matter what, forever it's
worth saying less information, if taking away a BBB and adding what
replaces it always works out... like, take away a BBB but add a block
separation and make the spiral go one place more.

so then it's any size anyway, as long as there are 3 characters that
say the same, right?
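
a minimal sketch of that accounting, assuming one byte per block
separator and one byte per spiral step (my assumption, just for
illustration):

# rough byte accounting for the "ndfnn...BBB" example above
plain = "ndfnnBBBefnnnBBBdsf98hfwennBBB"

blocks = ["ndfnn", "efnnn", "dsf98hfwenn", "BBB"]     # each stored once
path   = [0, 3, 1, 3, 2, 3]                           # ndfnn BBB efnnn BBB dsf98hfwenn BBB

assert "".join(blocks[i] for i in path) == plain

block_bytes = sum(len(b) for b in blocks)             # 24
sep_bytes   = len(blocks)                             # assumed 1 byte per separator
path_bytes  = len(path)                               # assumed 1 byte per spiral step
print(len(plain), "plain vs", block_bytes + sep_bytes + path_bytes, "encoded")
# only a win once BBB repeats enough more times to pay off the overhead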

but there's no extra overhead... no pigeonhole problem at all.. see how
there's no pigeonhole problem at all? see how that's never an issue?
because the spiral is on its own to put the data blocks together? see
what that says: everything that makes this tradeoff keeps saying less,
forever.. it says nothing more to start with, and it never costs more
in size except in how the spiral connects more points.. unless the
points can't be arranged in a way that works well... and there can be
any number of blocks.. see how there's no range/extent/reach/vicinity
problem?

see what block separation is.. it's what sits between one BBB and the
next, separating them, saying these are blocks that organize
together... like from one BBB to another, what's in between is blocks
connected together.. right.... with BBB as the same block to visit
each time.

so take "jsdflskdjfBBBsdflsdjflBBBsdfdsdfsdBBB" going on forever...
some chance of BBB showing up again, maybe not.. but forever

it says less size for every BBB, as long as a block separation plus a
slightly more complicated spiral is smaller than BBB.

see how that holds forever though? see how that tradeoff works
forever, at any size?

what does it even mean to ask whether random data is compressible,
when the only thing you should be looking for is anything bigger than
the cost of a block separation plus a slightly more complicated
spiral, at any data size? like, that's how this should be thought
about, to answer that. maybe?

mcjason

unread,
Jul 28, 2008, 1:48:11 PM7/28/08
to
On Jul 25, 5:05 am, Thomas Richter <t...@math.tu-berlin.de> wrote:
> mcjason schrieb:
>
>
>
> >> Second point above. Please state what "random" means. You haven't done
> >> so yet. Please do your homework - it's really about helping you, not
> >> about annoying you. Nobody can do that for you, you must learn it yourself.
>
> > data where the trend tends to be few repeat occurrences of a length
> > of data, where it's usually not a worthwhile tradeoff to say one
> > occurrence of what repeats, for there to be a token, for how tokens
> > have a limited way of being said for what else is said. Because in
> > random data the allocation space for a token is usually too exhausted
> > for there to be a worthwhile way of saying what a token is for what
> > else is said, for how a repeat occurrence of a length of data can be
> > said once with a token otherwise.
>
> Not a very reasonable definition, but for the time being, let's take
> this. According to this definition, the following string
>
> 1234567891012131415161718191202122232425262728293031323334353637383940...

see, take each pair of adjacent digits in that together, like...

12 23 34 45 56 67 78 89 91 10 01 12 21 13 31 14 41 15 51 16 61 17 71
18 81 19 91 12 20 02 21 12 22 22 23 32 24 42 25 52 26 62 27 72 28 82
29 93 30 03 31 13 32
23 33 33 34 43 35 53 36 63 37 73 38 83 39 94 40

so say this...

say each distinct 2-digit pair only once.
put a point in the sphere for each pair.
draw a spiral inside the sphere connecting the points in the order
that says all the digits in order, going back to the same place in the
sphere whenever the same pair comes up, right...

so then that can be stored as a file saying just the sphere with the
points, one per distinct pair, and a spiral, right?
so the file ... separates each pair on its own.. lists them one after
the other in the order they go into the sphere, like starting at the
center of the sphere for the first and plotting points around the
center for each pair, points placed, say, so that it's easy to draw a
spiral back and forth between them. plus the math of a spiral loop.

get it.. how a spiral loop connects the parts together, one after the
other, wherever the spiral touches points in the sphere?

see how the spiral would run through the digits two at a time one way,
and through the reversed pairs the other way and so on... see how you
can connect-the-dots of the whole number that comes together?

so i mean for the spiral to say the digits, two at a time, in the
order that reconstructs the file content... like saying parts that
organize another way.

so it's just saying what goes in the sphere.. and a spiral loop drawn
inside it.. then wherever the spiral goes is what comes next... like
making the spiral start somewhere, with the way it spirals saying how
the digits get put back together.
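
a rough sketch of that, with the sphere flattened to a list of
distinct pairs and the spiral to an index path (my simplification,
using the counting string 1..40):

# overlapping 2-digit pairs, as listed above: store each distinct pair once,
# keep only the order the spiral visits them in
data = "".join(str(n) for n in range(1, 41))            # "1234567891011...40"

pairs = [data[i:i + 2] for i in range(len(data) - 1)]   # adjacent overlapping pairs
points = sorted(set(pairs))                             # one point per distinct pair
spiral = [points.index(p) for p in pairs]               # where the spiral goes next

rebuilt = "".join(points[i][0] for i in spiral) + points[spiral[-1]][1]
assert rebuilt == data
print(len(data), "digits,", len(points), "distinct pairs,", len(spiral), "spiral steps")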

Industrial One

unread,
Jul 29, 2008, 1:52:40 AM7/29/08
to
On Jul 19, 10:34 pm, mcja...@gmail.com wrote:
> On Jul 20, 12:04 am, Industrial One <industrial_...@hotmail.com>
> wrote:
>
> > On Jul 19, 2:03 pm, mcja...@gmail.com wrote:
>
> > > why can't compression be reducing reorganized patterns?
>
> > Any kind of pattern, direct or indirect is a redundancy. If you build
> > a compressor based on "reorganizing patterns" it would require
> > exhaustive processing and not compress significantly more than already-
> > existing LZW techniques. I thought like you once, when I noticed that
> > re-arranging paragraphs/sentences in many usenet posts would make it
> > way more redundant (cuz of all the quotes) but when I tested the idea
> > (by hand) on one thread and compared it to RAR, the gain was roughly
> > 1.5%.
>
> I think it misses the point of redundancy altogether though... it's
> calling abhcgdef dhbfcega dbfegcha acgdfbhe a curve about the same
> size i guess if you say those letters in a sphere and make math that
> draws a curve work through the letters as the token.
>
> where those letters sit once in the sphere as abcdefgh, with a curve
> line drawn across the letters for each arrangement.. i don't see any
> good repeat occurrences of a string there, but i do see abcdefgh once
> plus a curve line in math, maybe a few characters each, for each way
> those letters reorganize.
> a very different proportion to look at carefully.
>
> i think if i were to try an algorithm to build this in a sphere i'd
> just find the pattern that's maybe somewhat already there and carry
> on with where the curve takes it, for anything more to add that gets
> another token... but given that a sphere and a curve is a silly way
> to put it, there's really another way altogether to say it better.

That's cool, but the problem is all the extensive processing it would
require. And when we talk about practical data to compress such as
common English text, all the side information representing your
mathematical curves n shit will cancel most of the benefits of your
exhaustive searching for less uniform redundancies. As I said, you
will beat modern LZW techniques by 1 or 2% and take 2 or 3x more time
to encode. Bad tradeoff.

> but it has a fundamentally different idea than reducing redundancy...
> i don't think that has to be the only way compression can work.

And it isn't the only method around. It's the most efficient as it
takes less than a minute and brings the size down close to the
entropy limit.

> it's finding a pattern reorganized, so nothing to do with reducing
> redundancy at all.
>
> see.. build the words or string parts in a sphere area, and see how
> the ones already built there let another curve cover most of anything
> else that would sometimes need a token?
>
> so it's not redundancy reduction at all. it's finding a
> reorganization of a pattern, say the words in a sentence arranged any
> way, where the words are stored once and a token says the words
> organized another way, many ways over. it has no equivalence to the
> idea of redundancy reduction at all if you think about what a
> proportional difference this can be. for example, how effective it
> can be has nothing to do with how much redundancy there is. it's
> effective because you can find a pattern that's part of another
> pattern, say what's in both patterns once, and have tokens that say
> the pattern every which way. and most importantly, to figure out what
> proportion it has, it can take something that's mostly not part of
> the pattern and have a curve say the new part while the rest is
> already there.
>
> see how that could have been compressed if every word showed up only
> once in the sphere, and all it takes to say how it reads is curves
> that draw through the sphere connecting the words... so just say math
> curve designations in the sphere area. it's not just finding the
> longest strings in common to make a token, it's finding reorganized
> words. it may seem the same as saying a word once and tokenizing it,
> but it's not, the way a sentence backwards and forwards has the
> sentence said once with just a different token. even though that's a
> few tokens instead, right?
> that's not a proportion that compares to the idea of how much
> backwards what can be forwards the other way, but even not that way
> but in the middle to be what has after it some of another pattern
> already made for and before some of even another. like where you put
> them in the sphere maybe?
>
> it's another proportion though besides what redundancy reduction can
> even achieve in idea... because it's not working with that limit of
> seeing a pattern reorganized another way. see... it's putting words
> before and after the other way, but them to say other sentences any
> way that has those words is to be only the part of the sentence that
> isn't those words before and after where in the sphere you draw a
> curve that connects beginning, the middle part, then the end part as
> what.

I follow you perfectly, and I'm tellin ya again: that is way too CPU
intensive and the benefits are minimal. If you got all this spare time
why don't you dedicate it to video encoding. It's still far from
complete.

mcjason

unread,
Jul 29, 2008, 6:42:58 AM7/29/08
to

Overall though, isn't compression usually about finding a tradeoff
that sits at an easy line to see...

a token for what repeats, instead of saying it each time it repeats; a
token that isn't itself real data but is mixed in with data that can
be anything, so the token has to stay separate from real data.

so the tradeoff is finding what's smaller to say a token for, among
what repeats, since a token can mean something bigger than the token.

so a token that's worth using instead of the real data is a token
smaller than the real data, said mixed in with real data, a token that
means something bigger than itself.

so it's a tradeoff of how a token is worth saying, given the ways a
token can be said, for something bigger than the token, for whatever
is found to repeat often enough that it's better to keep what repeats
once and use a token for each time it repeats.

don't most compressors find that to be about right?
where it's a token smaller than what it means, because the token means
something else, and the token is mixed in with data that has no token,
marked so it reads as a token and not as real data.

so it's about finding a better ratio on that idea... what gets a token
instead, how a token is said mixed with data, how a token is unique to
each thing a token can mean, how mixing tokens with real data means
the real data uses up some of what a token could be, and how much
something has to repeat, given how much smaller the token is to say
mixed with real data, before it's worth saying what repeats only once
and giving it a token. whatever's worth it.


so the best compression ratio they can get comes down to how well that
idea works, right?
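
for comparison, the usual break-even arithmetic for that kind of
token, as a sketch (the byte costs here are assumptions I'm making up
for illustration, not any particular format):

# when is replacing a repeated substring with a dictionary token worth it?
def worth_it(match_len, repeats, token_bytes=3, framing_bytes=0):
    # keep one copy of the match, pay token_bytes per occurrence,
    # plus whatever framing keeps tokens distinguishable from literals
    plain = match_len * repeats
    tokenized = match_len + repeats * token_bytes + framing_bytes
    return tokenized < plain, plain, tokenized

print(worth_it(match_len=3, repeats=2))    # short match, few repeats: not worth it
print(worth_it(match_len=11, repeats=4))   # longer match, more repeats: clear win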


I'm trying to find a good way to say how the idea I'm trying to
express is very much distinct from this idea...

it's mostly trying to say how the size of the curve line doesn't
matter as much, compared to what the sizes of the blocks can be..

so there might be many small blocks like single characters, but then a
few big ones, where the curved line makes a whole part out of them
together while the curved line stays smaller.
but then it's also how, with those same parts, you can add other parts
and have another curved line say something else, reusing some of the
parts already there. the parts already there aren't added again. and
now a curved line of about the same size says something maybe much
bigger, if the parts already there can mostly add up to it with only a
few new parts.

how ultimately it's like trying to say every word there is in
english.. but like this...

where every prefix/suffix/base word/etc. is separated... then each
part is put in a sphere, right? add a space in there.. numbers and
such too. draw a spiral curve in there so a whole sentence fits
together... but keep going though...
how big do you think the spiral would get, relative to how much it
says, to reach each point from where the spiral begins?
and better yet.. say whole sentences, or two words that show up
together too often, as a single block, which is better still for the
size of the spiral.

see how the spiral doesn't even mix with real data? because you make a
sphere out of the parts and keep the spiral separate from it.

is anyone trying to see this right though for what it says....

"ewiruyweBBBBsdkjfbsadBBBBshfdsjkhnfsdBBBBsdfhksjBBBBsdjfhsdkjhfdBBBB"
so said like
"ewiruywe:BBBB:sdkjfbsad:shfdsjkhnfsd:sdfhksj:sdjfhsdkjhfd"

see how... see how for that it says those blocks, but separated?
and BBBB once, right? the part that repeats.

so now to say the whole string... it's a curve that connects
"ewiruywe:BBBB:sdkjfbsad:shfdsjkhnfsd:sdfhksj:sdjfhsdkjhfd" parts
together....
so a curved line that does this...
from "ewiruywe" to "BBBB" to "sdkjfbsad" to "BBBB" to "shfdsjkhnfsd"
to "BBBB" to "sdfhksj" to "BBBB" to "sdjfhsdkjhfd" to "BBBB"

so see how the compressed file says each block once, plus a curved
line connecting the blocks?

see how though?

easy... I get it.

but now see this... ?

how big can this get, for however many repeat occurrences of BBBB
there are?

forever big, right? so forever it can say that each BBBB has a block
separated out before and after it... right? while never adding BBBB
itself again.
that holds forever though... at any size, you keep seeing this
tradeoff making the file smaller...
if separating at a BBBB means saying one more block before and one
after, and the curved line, already big enough to say the blocks
together so far, only grows enough to say another two blocks,
then at any size there can be more... as long as separating before and
after BBBB, not adding BBBB again, and growing the curved line.. is
smaller.

so if that's always smaller, then what? does it follow that data of
any size... even data that has repeat occurrences... gets the whole
file smaller for each repeat occurrence... without the pigeonhole
problem? with how this works?

if the block separation and the bigger curved line cost less than one
"BBBB"... then that takes care of data of any size that keeps
repeating BBBB, making it smaller?
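
a quick sketch of that "any size" claim, again with made-up per-byte
costs just to show the shape of the growth:

# encoded size as "BBBB" keeps repeating, under assumed 1-byte separator/step costs
def encoded_size(filler_len, n, sep_cost=1, step_cost=1, repeated="BBBB"):
    # n distinct filler blocks stored once each, the repeated block stored once,
    # one separator per stored block, one path step per block visited
    block_bytes = n * filler_len + len(repeated)
    sep_bytes = (n + 1) * sep_cost
    path_bytes = 2 * n * step_cost        # filler, BBBB, filler, BBBB, ...
    return block_bytes + sep_bytes + path_bytes

for n in (3, 10, 100):
    plain = n * (9 + 4)                   # 9-byte filler then "BBBB", n times over
    print(n, "repeats:", plain, "plain vs", encoded_size(9, n), "encoded")
# the encoding wins once n is big enough that the 4 bytes saved per "BBBB"
# outweigh the 1-byte separator and the extra path steps each repeat adds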

Jim Leonard

unread,
Jul 29, 2008, 2:15:32 PM7/29/08
to
On Jul 29, 5:42 am, mcjason <mcja...@gmail.com> wrote:
> so if that's always smaller, then what? does it follow that data of
> any size... even data that has repeat occurrences... gets the whole
> file smaller for each repeat occurrence... without the pigeonhole
> problem? with how this works?

No.

Please get back on your meds.
