adapted base64 encoding


lukas....@gmail.com

Jul 10, 2013, 8:15:26 PM
to passli...@googlegroups.com
Hi!
I've got some questions about this encoding.

Is it your own invention, or is there an RFC or other standard for it?
If it is your own invention, why did you decide to create another "b64like" standard? (Slightly shorter output is not a good reason for me.)

"the output of this function is identical to stdlib’s b64_encode, except that it uses . instead of +, and omits trailing padding = and whitespace."
Maybe it would be better to change the word "identical" to "similar".

"This is enlarged by approximately 4/3 by the base64 encoding, resulting in a checksum size of 27, 43, and 86 for each of the respective algorithms listed above."
Since you omit padding it is not exactly 27, 43, 86 but at most. As you see decision to trail output is error prone.

Lukas

P.S.
No offence, I really like your lib. Thank you for working on it!

Eli Collins

Jul 11, 2013, 5:27:30 PM
to passli...@googlegroups.com, lukas....@gmail.com
Hi!

Happy to explain that design decision, it was an irritating one to make.

That encoding is one of my own invention, and the docstring for ab64_encode() is the specification.  That's why the docstring uses the phrasing "identical ... except ...", it's not informally saying the encoding is similar in some way, it's laying out exactly how the encoding is formed: take base64 as defined by RFC4648, replace index-62 character "+" with a ".", omit the padding "=" and any whitespace.  The output should in all other ways be identical to the base64 spec, or it's a bug in the code.  I don't feel this was treading new ground, since there's already a wide range of base64 variants which make modifications like this, and the RFC acknowledges such modifications may be needed for various domain-specific purposes.
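
The rule described above (standard base64, "+" swapped for ".", padding dropped) can be sketched in a few lines. This is only an illustration built on Python's standard base64 module; the helper names ab64_encode_sketch and ab64_decode_sketch are hypothetical and are not passlib's actual implementation.

```python
import base64


def ab64_encode_sketch(data: bytes) -> str:
    """Hypothetical helper: RFC 4648 base64 with '+' -> '.' and no '=' padding."""
    # b64encode() emits no whitespace, so only padding and '+' need handling.
    encoded = base64.b64encode(data).decode("ascii")
    return encoded.rstrip("=").replace("+", ".")


def ab64_decode_sketch(text: str) -> bytes:
    """Hypothetical helper: reverse of ab64_encode_sketch()."""
    # Undo the character swap, then restore the padding that was stripped
    # so the standard decoder accepts the string.
    standard = text.replace(".", "+")
    standard += "=" * (-len(standard) % 4)
    return base64.b64decode(standard)
```

Because the length of the padding can be recomputed from the string length, dropping it loses no information, and a round trip through both helpers returns the original bytes.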

As to why I even bothered? Sadly the answer is backwards compatibility.

My apologies for the digression, but a little background...  Originally most Unix systems used the des_crypt hash to store passwords.  It encoded its data using a (never formalized) scheme that was similar to base64, but used the charset "A-Za-z0-9./", no padding character, big-endian octet-packing, and a character map that's completely different from base64.  For some reason, md5_crypt (des_crypt's primary successor) kept the custom character map, but swapped to little-endian octet packing (à la base64), and added a crazy byte-transposition step as well.  Bcrypt couldn't leave well enough alone either: it went back to des_crypt's encoding scheme, uses the same charset, but completely rearranged the character map yet again.  Gaaah!  It was maddening getting all those implemented in passlib, and to work up test vectors -- there are no standards or good references for any of those encodings.

So when I sat down to design a PBKDF2-based hash format, I decided I needed to retain the same character set that had been in use since des_crypt, in order to minimize the chances of storage incompatibilities.  But I wanted this format to be portable and easy to reimplement, so instead of using one of des_crypt / md5_crypt / bcrypt's crazy encoding schemes, I settled on standard base64 plus the two small changes.  That way, other pbkdf2_sha256 implementations could leverage their language's existing base64 routines, and just make a couple of simple character replacements afterwards.  Hence the "ab64" scheme.

Regarding the padding characters, I'm not sure what you mean by "Since you omit padding it is not exactly 27, 43, 86 but at most. As you see decision to trail output is error prone." The digest portion of a pbkdf2_sha256 hash will always be 32 raw bytes, and will always require 43 characters when encoded (the governing equation is `encoded_size = ceil(raw_size * 4/3.0)`; the rounding accounts for the 0, 2, or 4 unused bits left over in the final character). Because the size is already known, the pbkdf2_sha256 parser will throw an error if the digest is the wrong size, and the trailing "=" characters are just wasted space. Omitting the padding is allowed by RFC 4648 Section 3.2, but more to the point, RFC 4648 Section 5 Paragraph 3 states that if the correct length is known implicitly, then the padding can be omitted.
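
As a quick check of that size equation, here is a small sketch that compares the formula against what Python's standard base64 module actually produces once the padding is stripped. The digest sizes of 20, 32, and 64 bytes are an assumption about which algorithms the quoted documentation lists (SHA-1, SHA-256, SHA-512), and unpadded_b64_len is a hypothetical helper name. It also computes how many bits of the final character carry no data.

```python
import base64
import math


def unpadded_b64_len(raw_size: int) -> int:
    """Hypothetical helper: characters needed for raw_size bytes, no padding."""
    return math.ceil(raw_size * 4 / 3)


# Assumed digest sizes in bytes for the three hash functions.
for name, raw_size in [("sha1", 20), ("sha256", 32), ("sha512", 64)]:
    predicted = unpadded_b64_len(raw_size)
    # Encode a dummy digest of the right size and strip the '=' padding.
    actual = len(base64.b64encode(bytes(raw_size)).rstrip(b"="))
    # Each character carries 6 bits; the remainder is unused filler.
    unused_bits = predicted * 6 - raw_size * 8
    print(name, predicted, actual, unused_bits)
```

For a fixed digest size the encoded length is a single fixed number, which is why a parser can reject any digest of a different length.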

Hope that clears things up!

- Eli



Lukas Odzioba

Jul 18, 2013, 7:28:42 AM
to Eli Collins, passli...@googlegroups.com
2013/7/11 Eli Collins <el...@astllc.org>:
> Happy to explain that design decision, it was an irritating one to make.
>
> That encoding is one of my own invention, and the docstring for
> ab64_encode() is the specification. That's why the docstring uses the
> phrasing "identical ... except ...", it's not informally saying the encoding
> is similar in some way, it's laying out exactly how the encoding is formed:
> take base64 as defined by RFC4648, replace index-62 character "+" with a
> ".", omit the padding "=" and any whitespace. The output should in all
> other ways be identical to the base64 spec, or it's a bug in the code. I
> don't feel this was treading new ground, since there's already a wide range
> of base64 variants which make modifications like this, and the RFC
> acknowledges such modifications may be needed for various domain-specific
> purposes.

Because the translation of "identical ... except ..." sounds a bit weird in
my native language, I just wanted to ask about that. Thanks for the
explanation.

> So when I sat down to design a PBKDF2-based hash format, I decided I needed
> to retain the same character set that had been in use since des_crypt, in
> order to minimize the chances of storage incompatibilities. But I wanted
> this format to be portable and easy to reimplement, so instead of using
> one of des_crypt / md5_crypt / bcrypt's crazy encoding schemes, I settled on
> standard base64 plus the two small changes. That way, other pbkdf2_sha256
> implementations could leverage their language's existing base64 routines,
> and just make a couple of simple character replacements afterwards. Hence
> the "ab64" scheme.

OK, now it is all clear to me.

> Regarding the padding characters, I'm not sure what you mean by "Since you
> omit padding it is not exactly 27, 43, 86 but at most. As you see decision
> to trail output is error prone." The digest portion of a pbkdf2_sha256 hash
> will always be 32 raw bytes, and will always require 43 characters when
> encoded (the governing equation is `encoded_size = ceil(raw_size * 4/3.0)`;
> the rounding accounts for the 0, 2, or 4 unused bits left over in the final
> character).

Of course you're right, I misunderstood that.

> Hope that clears things up!

Yes, thank you very much, and sorry for the inconvenience,
Lukas