OT - frequency statistics over Base64 encoded files

Soviet_Mario

Mar 4, 2021, 12:57:25 PM

I would like to use a very slightly modified version of
Base64 to encode binary data in a PRINTABLE way, with only a
slight expansion (4/3).

The output format would have to be amenable to both plain
text editors and web browsers. Most dislike huge LINES
without interruptions (newline or carriage return or both).

So two of the symbols in the standard Base64 set have to be
replaced.

I am choosing symbols 62 and 63 ('+' and '/', respectively)
to be replaced by "SPACE" (ASCII 32) and "NEWLINE" (ASCII 10),
one each.

Does anyone happen to know whether they (62 and 63) are equivalent
in FREQUENCY in encoded "real world" files?
I don't know exactly what "real world" means here: I will mainly be
encoding PDF (zipped or not?), ODT (zipped, I assume), and .7Z
archives.
The latter will probably approximate a RANDOM BYTE file, with a flat
frequency spectrum of symbols.

But even assuming such an even distribution of the 256 byte values
in the source file, would Base64 produce an equally even
distribution of symbols in the printable output?
I don't have enough math to estimate this myself.
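A quick way to check this empirically is to encode a large block of uniformly random bytes and count each output symbol; treating well-compressed .7z or zipped data as uniformly random is itself an assumption. Since every bit of such input is an independent fair coin flip, every 6-bit group, and therefore every symbol including '+' and '/', should occur with probability close to 1/64; only the last group before any '=' padding is skewed by its zero fill bits. The sketch needs Python 3.9+ (for `random.Random.randbytes`); the function names are illustrative.

```python
import base64
import random
from collections import Counter

# Standard Base64 alphabet in index order: 'A' is 0, '+' is 62, '/' is 63.
ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def symbol_frequencies(data: bytes) -> Counter:
    """Count each Base64 symbol in the encoding of `data` ('=' padding excluded)."""
    return Counter(b for b in base64.b64encode(data) if b != ord("="))


def uniformity_report(n_bytes: int = 600_000, seed: int = 1) -> dict:
    """Encode `n_bytes` of seeded pseudo-random data and summarize symbol counts."""
    data = random.Random(seed).randbytes(n_bytes)  # needs Python 3.9+
    counts = symbol_frequencies(data)
    per_symbol = [counts[c] for c in ALPHABET]
    return {
        "plus": counts[ord("+")],
        "slash": counts[ord("/")],
        "expected": sum(per_symbol) / 64,  # mean count per symbol
        "min": min(per_symbol),
        "max": max(per_symbol),
    }


if __name__ == "__main__":
    print(uniformity_report())
```

With 600,000 input bytes there are 800,000 symbols, so each should appear about 12,500 times; '+' and '/' end up within ordinary sampling noise of each other.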

I ask because I would prefer to assign SPACE to the more
frequent of symbols 62 and 63, and NEWLINE to the less frequent
one. In a window this would be rendered as a smaller number of
longer text lines (rather than a great number of shorter lines).
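A minimal sketch of the proposed scheme, assuming the mapping '+' to SPACE and '/' to LF (swap the two characters in `MODIFIED` for the opposite choice): encode with standard Base64, then translate symbols 62 and 63. All names are hypothetical. The result only survives transports that leave spaces and line ends untouched; the decoder here drops CR bytes on the assumption that some tool rewrote LF as CRLF, but it cannot recover from, say, an editor trimming trailing spaces.

```python
import base64

# Hypothetical mapping: '+' (index 62) -> SPACE, '/' (index 63) -> LF.
STANDARD = b"+/"
MODIFIED = b" \n"
TO_MODIFIED = bytes.maketrans(STANDARD, MODIFIED)
TO_STANDARD = bytes.maketrans(MODIFIED, STANDARD)


def encode_modified(data: bytes) -> bytes:
    """Standard Base64 output with symbols 62 and 63 replaced."""
    return base64.b64encode(data).translate(TO_MODIFIED)


def decode_modified(text: bytes) -> bytes:
    """Inverse of encode_modified.

    CR bytes are removed first, assuming they were inserted by a tool that
    rewrote LF as CRLF. Any other change, such as trimmed trailing spaces,
    makes the strict decode below fail or return wrong data.
    """
    restored = text.replace(b"\r", b"").translate(TO_STANDARD)
    return base64.b64decode(restored, validate=True)


if __name__ == "__main__":
    blob = bytes(range(256))
    encoded = encode_modified(blob)
    print(encoded.count(b" "), "spaces,", encoded.count(b"\n"), "newlines")
    assert decode_modified(encoded) == blob
```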


If the question is not relevant, then the choice becomes
arbitrary.
I don't need any "proof"; opinions would be enough.

BTW: I am also assuming NEWLINE (LF, 10) is the most convenient
and portable line-break symbol (compared with CR, 13, or the CRLF
pair, 13-10). Is that wise?


--
1) Resist, resist, resist.
2) If everybody pays the taxes, the taxes are paid by everybody
Soviet_Mario - (aka Gatto_Vizzato)

marrgol

Mar 4, 2021, 2:29:07 PM
On 2021-03-04 at 18:57, Soviet_Mario wrote:
>
> I would be willing to use a very slightly modified version
> of Base64 to encode binary in a PRINTABLE way, with only
> slight expansion (4/3).
>
> The output format would have to be amenable to both plain
> text editors and web browsers. Most dislike huge LINES
> without interruptions (newline or carriage return or both)
>
> So two of the standard set of symbols of Base 64 have to be
> replaced.

Why? You can insert newlines anywhere in the encoded data
and make the lines as long (or as short) as you want without
touching the encoding at all. And if your decoder has an option
to ignore all characters not used by the base64 alphabet, you can
insert spaces anywhere too.

E.g. (with base64 from GNU coreutils on Linux):

$ echo -e "U292 aWV0 X0\n1hc mlvCg\n ==" | base64 -di
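The same tolerance exists in Python's standard library: `base64.b64decode` with its default `validate=False` discards characters outside the Base64 alphabet before decoding, so the spaced-out text from the shell example decodes cleanly, while `validate=True` rejects it. A small sketch:

```python
import base64
import binascii

# The text from the shell example: spaces and newlines mixed into Base64.
MESSY = "U292 aWV0 X0\n1hc mlvCg\n =="

# Default validate=False: non-alphabet characters are discarded before
# decoding, similar in effect to `base64 -di`.
print(base64.b64decode(MESSY))  # b'Soviet_Mario\n'

# validate=True: the same input is rejected.
try:
    base64.b64decode(MESSY, validate=True)
except binascii.Error as exc:
    print("strict mode rejects it:", exc)
```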


--
mrg

Jasen Betts

Mar 4, 2021, 3:00:53 PM
The most frequent is 0 ('A'). Probably 63 ('/') is the second most frequent.
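Whether this holds for a particular set of PDF, ODT or .7z files can be measured directly. A plausible reason for the claim: runs of zero bytes, common in uncompressed binary formats, encode to 'A' (index 0), while runs of 0xFF bytes encode to '/' (index 63). The sketch below (function names hypothetical) counts each alphabet index in the Base64 encoding of the files named on the command line and reports the ranks of indices 62 and 63; it reads each file into memory whole, which is fine for a quick test.

```python
import base64
import sys
from collections import Counter

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def index_counts(data: bytes) -> list:
    """counts[i] = how often alphabet index i occurs in base64(data)."""
    counts = Counter(base64.b64encode(data).decode("ascii"))
    return [counts[ch] for ch in ALPHABET]


def summarize(path: str) -> None:
    with open(path, "rb") as f:  # whole file in memory: fine for a quick test
        counts = index_counts(f.read())
    total = sum(counts) or 1
    for i in (0, 62, 63):
        share = 100 * counts[i] / total
        print(f"  index {i:2d} {ALPHABET[i]!r}: {counts[i]} ({share:.2f}%)")
    ranking = sorted(range(64), key=lambda i: counts[i], reverse=True)
    print(f"  rank of 62: {ranking.index(62) + 1}, rank of 63: {ranking.index(63) + 1}")


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path)
        summarize(path)
```

Usage would be something like `python b64stats.py doc.pdf archive.7z`, where the script and file names are placeholders.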

Anyway, encode with Base85 instead and just add line breaks at
regular intervals. Base85 gives a ratio of 5 symbols to 4 bytes, so by
adding a 2-character break after every 75 symbols, that's 60 bytes in
77 characters, slightly better than the 80 symbols Base64 needs for
those 60 bytes (before any line breaks).

> BTW : I am also assuming NEWLINE is the most convenient and
> portable line breaker symbol (over CR 13 or a couple CRLF
> 13-10). Is it wise?

Does "NEWLINE" occur in Unicode somewhere?

LF is the POSIX standard; CRLF is the Internet standard.

--
Jasen.

jak

Mar 4, 2021, 8:57:42 PM
Hi,
instead of modifying base64 you could use the method by which e-mails are
encoded when the 'Content-Transfer-Encoding' is base64. The text is
wrapped every 72 characters. Here are some example lines, from an
attachment containing a piece of a .pdf file:

QWdnaXVuZ28gZGVsbCdhbHRybyB0ZXN0bzoKCnRvdGFsZSA0MjgKZHJ3eC0tLS0tLSAgMyBq
YWsgIGphayAgIDQwOTYgYXByICAzICAyMDE2IC5hZG9iZQpkcnd4cnd4ci14ICAzIGphayAg
amFrICAgNDA5NiBtYWcgMjQgMDk6MzggQXBwRGF0YQotcnctcnctci0tICAxIGphayAgamFr
ICAgICA4NiBhcHIgMTUgMTk6NTUgLmFzb3VuZHJjCi1ydy0tLS0tLS0gIDEgamFrICBqYWsg
IDQzNjc1IGx1ZyAgMiAxNTo1NCAuYmFzaF9oaXN0b3J5Ci1ydy1yLS1yLS0gIDEgamFrICBq
YWsgICAgMjIwIGFwciAgMiAgMjAxNiAuYmFzaF9sb2dvdXQKLXJ3LXJ3LXItLSAgMSBqYWsg
IGphayAgICAgNDcgYXByICAzICAyMDE2IC5iYXNoX3Byb2ZpbGUKLXJ3LXItLXItLSAgMSBq
.
.
.

cheers
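For reference, this layout (plain Base64 with a line break every 72 characters) can be sketched in Python as follows; the function name is hypothetical. RFC 2045 allows base64 body lines of up to 76 characters, the standard library's `base64.encodebytes` uses exactly 76 with LF line ends, and on Linux `base64 -w 72` from coreutils produces 72-column lines. A width that is a multiple of 4 keeps every full line a whole number of 3-byte groups (72 characters = 54 bytes).

```python
import base64


def mime_style_b64(data: bytes, width: int = 72, eol: bytes = b"\r\n") -> bytes:
    """Plain Base64 broken into lines of at most `width` characters, each ending in `eol`."""
    enc = base64.b64encode(data)
    lines = [enc[i:i + width] for i in range(0, len(enc), width)]
    return b"".join(line + eol for line in lines)


if __name__ == "__main__":
    sample = mime_style_b64(bytes(range(256)) * 4)
    print(sample.decode("ascii")[:148])  # first two lines
    # Decoding needs no unwrapping: b64decode skips the CR/LF bytes by default.
    assert base64.b64decode(sample) == bytes(range(256)) * 4
```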

Soviet_Mario

Mar 5, 2021, 12:01:40 PM
On Thu, 04 Mar 2021 20:29:04 +0100, marrgol wrote:

> On 2021-03-04 at 18:57, Soviet_Mario wrote:
>>
>> I would be willing to use a very slightly modified version of Base64
>> to encode binary in a PRINTABLE way, with only slight expansion (4/3).
>>
>> The output format would have to be amenable to both plain text editors
>> and web browsers. Most dislike huge LINES without interruptions (newline
>> or carriage return or both)
>>
>> So two of the standard set of symbols of Base 64 have to be replaced.
>
> Why? You can insert newlines anywhere in the encoded data and make the
> lines as long (or as short) as you want without touching the encoding at
> all.

Hmm... the original idea was to also ship some HASHES (SHA-256 and
the like) together with the data, so that a user could detect
"corruption" (alteration).

In the case you suggest, would I have to hash the file before or after
manually adding something to the content?
Also, I would be unhappy about increasing the SIZE of some (possibly)
huge attachments.
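One way to make the before/after question moot, offered as a suggestion rather than a fixed recipe: publish the SHA-256 of the original binary file and have the receiver decode first and hash the result. Because the decoder skips line breaks, any wrapping width yields the same decoded bytes and therefore the same hash; a hash of the encoded text, by contrast, changes with every layout. The payload and the `wrap` helper below are made up for illustration.

```python
import base64
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def wrap(text: bytes, width: int) -> bytes:
    """Insert LF every `width` characters (hypothetical helper)."""
    return b"\n".join(text[i:i + width] for i in range(0, len(text), width))


# Made-up payload standing in for a PDF/ODT/7z attachment.
original = bytes(range(256)) * 100
published_hash = sha256_hex(original)  # hash of the binary file itself

# Two different layouts of the same encoded data.
text_a = wrap(base64.b64encode(original), 72)
text_b = wrap(base64.b64encode(original), 64)

# The receiver decodes (line breaks are skipped) and hashes the result.
for text in (text_a, text_b):
    assert sha256_hex(base64.b64decode(text)) == published_hash

# A hash of the encoded text, by contrast, depends on the layout.
print(sha256_hex(text_a) == sha256_hex(text_b))  # False
```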

> And if your decoder

Up to now I was writing my own 64-symbol encoding from scratch (using
spaces and newlines in place of PLUS and SLASH).
Later I recalled the existence of base64 and had the idea of just
piping its output, replacing PLUS and SLASH with SPACE and NEWLINE
(not necessarily in that order).

> has an option to ignore all characters not
> used by the base64 alphabet, you can insert spaces anywhere too.
>
> E.g. (with base64 from coreutils on linux):
>
> $ echo -e "U292 aWV0 X0\n1hc mlvCg\n ==" | base64 -di

I'll try this; I can't quite figure it out yet.
Thanks.






--
I'll set up the signature later

Soviet_Mario

Mar 5, 2021, 12:06:58 PM
I will look into this encoding, to see whether the content can also be
hashed and whether there are standard tools for the conversion in both
directions (I mean converting TO and FROM). Thanks.

> The text
> is wrapped every 72 characters. These are some example lines. It is an
> attachment of a piece of .pdf file:
>
> QWdnaXVuZ28gZGVsbCdhbHRybyB0ZXN0bzoKCnRvdGFsZSA0MjgKZHJ3eC0tLS0tLSAgMyBq
> YWsgIGphayAgIDQwOTYgYXByICAzICAyMDE2IC5hZG9iZQpkcnd4cnd4ci14ICAzIGphayAg
> amFrICAgNDA5NiBtYWcgMjQgMDk6MzggQXBwRGF0YQotcnctcnctci0tICAxIGphayAgamFr
> ICAgICA4NiBhcHIgMTUgMTk6NTUgLmFzb3VuZHJjCi1ydy0tLS0tLS0gIDEgamFrICBqYWsg
> IDQzNjc1IGx1ZyAgMiAxNTo1NCAuYmFzaF9oaXN0b3J5Ci1ydy1yLS1yLS0gIDEgamFrICBq
> YWsgICAgMjIwIGFwciAgMiAgMjAxNiAuYmFzaF9sb2dvdXQKLXJ3LXJ3LXItLSAgMSBqYWsg
> IGphayAgICAgNDcgYXByICAzICAyMDE2IC5iYXNoX3Byb2ZpbGUKLXJ3LXItLXItLSAgMSBq


Nice! I don't see the PLUS and SLASH symbols. Is it just a coincidence?
Which tool did you use to get this nice encoding?
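This is most likely a property of the sample rather than of Base64 in general. The first quoted line decodes to plain 7-bit ASCII text, and for bytes below 0x80 the first three characters of every 4-character group can never be index 62 or 63. Only the fourth can, and only when the third byte of its group is '>' or '~' (giving '+') or '?' or DEL (giving '/'). The sketch decodes that line and checks the slot claim exhaustively; the helper names are illustrative, and the result says nothing about binary files such as PDFs, where all 256 byte values occur.

```python
import base64
from itertools import product

# First line of the quoted sample.
SAMPLE = "QWdnaXVuZ28gZGVsbCdhbHRybyB0ZXN0bzoKCnRvdGFsZSA0MjgKZHJ3eC0tLS0tLSAgMyBq"
PLUS_SLASH = {ord("+"), ord("/")}


def plus_slash_slots(data: bytes) -> set:
    """Slots (0-3 within each 4-character group) where '+' or '/' occur in base64(data)."""
    enc = base64.b64encode(data)
    return {slot for slot in range(4) if set(enc[slot::4]) & PLUS_SLASH}


def ascii_triples() -> bytes:
    """Enough 7-bit 3-byte groups to reach every value of every slot.

    Slots 0-2 depend only on byte 0, byte 1 and the top two bits of byte 2
    (bit 7 is 0 for ASCII), so byte 2 = 0x00 or 0x40 covers them; slot 3 is
    the low six bits of byte 2, so all 128 values of byte 2 are added too.
    """
    out = bytearray()
    for b0, b1, b2 in product(range(128), range(128), (0x00, 0x40)):
        out += bytes((b0, b1, b2))
    for b2 in range(128):
        out += bytes((0, 0, b2))
    return bytes(out)


if __name__ == "__main__":
    decoded = base64.b64decode(SAMPLE)
    print(decoded)
    print("all bytes < 0x80:", all(b < 0x80 for b in decoded))
    print("slots able to hold '+' or '/':", plus_slash_slots(ascii_triples()))
```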

> .
> .
> .
>
> cheers

marrgol

Mar 5, 2021, 2:52:05 PM
On 2021-03-05 at 18:01, Soviet_Mario wrote:
>>> I would be willing to use a very slightly modified version of Base64
>>> to encode binary in a PRINTABLE way, with only slight expansion (4/3).
>>>
>>> The output format would have to be amenable to both plain text editors
>>> and web browsers. Most dislike huge LINES without interruptions (newline
>>> or carriage return or both)
>>>
>>> So two of the standard set of symbols of Base 64 have to be replaced.
>>
>> Why? You can insert newlines anywhere in the encoded data and make the
>> lines as long (or as short) as you want without touching the encoding at
>> all.
>
> mmm ... the original idea was to also deploy some HASHES (sha256 and
> alike) together with the data, so that an user could detect
> "corruption" (alteration).
>
> In the case you suggest, I would have to hash the file before or after
> manually adding sth to the content ?

Err on the side of caution and do both. :-)

> Also I would be unhappy of increasing the SIZE of some (possibly) huge
> attachments.

You are willing to increase the size of the original by 33.3% just by
encoding it; adding an LF after every 72 output characters will make
it 35.2% (or less than that if you make the lines longer). That's less
than 15kB more per 1MB of the encoded file -- is it really unacceptable?
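The arithmetic can be checked with a small size function. This is a sketch that assumes padded Base64 and one line terminator after every line, including a final partial one; the function name is hypothetical.

```python
def encoded_size(n_bytes: int, line_len: int = 0, eol_len: int = 1) -> int:
    """Bytes of padded Base64 output for `n_bytes` of input.

    If line_len > 0, an end-of-line sequence of `eol_len` bytes is counted
    after every line, including a final partial one (an assumption).
    """
    chars = 4 * ((n_bytes + 2) // 3)  # 4 symbols per started 3-byte group
    if line_len <= 0:
        return chars
    lines = (chars + line_len - 1) // line_len
    return chars + lines * eol_len


if __name__ == "__main__":
    n = 54_000  # a multiple of 54 bytes, i.e. whole 72-character lines
    print(f"plain:   {encoded_size(n) / n - 1:.1%}")          # 33.3%
    print(f"LF @ 72: {encoded_size(n, 72) / n - 1:.1%}")      # 35.2%
    # Extra bytes per 1 MB (10**6 bytes) of encoded text:
    print(encoded_size(750_000, 72) - encoded_size(750_000))  # 13889
```

With `eol_len=2` the same function covers CRLF line ends.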


--
mrg

jak

Mar 7, 2021, 5:18:07 AM
The Linux tool called "base64" (see: man base64) can do this for you.
With it you can choose the column at which the encoded text is wrapped
(the -w option). It also decodes. However, I needed it on Windows, Linux
and OS X alike, so I wrote this routine in C; if you need it, I'll show
you the source code.
