Possible to make base64 pick the decode encoding automatically?

hey...@gmail.com

Feb 2, 2021, 4:08:06 AM
to golang-nuts
Hi,

I have an io.Reader whose content is base64-encoded, but I don't know which encoding variant was used. Since there shouldn't be any ambiguity between the variants, is it possible to make the base64 package automatically pick the right one when decoding?

Currently I have to read everything out to pin down the encoding, which defeats the purpose of using an io.Reader.

Is there a solution to this problem?

Thanks in advance.


roger peppe

Feb 2, 2021, 7:50:08 AM
to hey...@gmail.com, golang-nuts
In case you find it helpful, here's a clone of the base64 command that I wrote in Go. I did it precisely because I wanted to be able to decode any encoding scheme interchangeably.


I agree that it might be useful to have some of this functionality available in the standard library.

  cheers,
    rog.

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/0ccee37d-319e-41b3-9bfd-3dc46e0fad78n%40googlegroups.com.

Robert Engels

Feb 2, 2021, 8:37:08 AM
to roger peppe, hey...@gmail.com, golang-nuts
Base64 is always ASCII. The encoded data may be in an arbitrary format. You need to pass additional metadata or try and detect its encoding. 

Axel Wagner

Feb 2, 2021, 9:17:19 AM
to Robert Engels, roger peppe, hey...@gmail.com, golang-nuts
This question isn't about the decoded data, but about *which* base64 format is used - i.e. if it uses padding or not and what 2 characters are used outside of a-zA-Z0-9. The most common ones use +/ and -_, so it's easy to tell which is used and just accept either (and padding can be viewed as optional during decoding anyway).
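
To make the four variants concrete, here is a minimal sketch that encodes the same two bytes with each of the encodings exported by Go's encoding/base64 package. The helper name encodeAll and the sample input are only for illustration; the bytes 0xfb 0xff were picked because their encoding uses both special characters and needs one padding character.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodeAll (hypothetical helper) returns data encoded with each of the four
// encodings exported by encoding/base64, in a fixed order.
func encodeAll(data []byte) [4]string {
	return [4]string{
		base64.StdEncoding.EncodeToString(data),    // +/ alphabet, padded
		base64.URLEncoding.EncodeToString(data),    // -_ alphabet, padded
		base64.RawStdEncoding.EncodeToString(data), // +/ alphabet, unpadded
		base64.RawURLEncoding.EncodeToString(data), // -_ alphabet, unpadded
	}
}

func main() {
	// Prints +/8=, -_8=, +/8 and -_8, one per line.
	for _, s := range encodeAll([]byte{0xfb, 0xff}) {
		fmt.Println(s)
	}
}
```

The only differences are the last two alphabet characters and whether the trailing `=` is present, which is why a decoder can accept all of them without ambiguity.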

Amnon

Feb 2, 2021, 11:43:23 AM
to golang-nuts
Reading through a bufio.Reader is often useful for these situations.
You can peek the beginning of the input in order to decide which decoder to use.

Another option is to use an io.TeeReader to duplicate the reader,
and then send one copy to each decoder.
One will succeed, and give you the output.
But you will need to drain the one that fails to prevent the TeeReader from stalling.

Axel Wagner

Feb 2, 2021, 11:57:53 AM
to Amnon, golang-nuts
Roger's approach seems like the best one to me - wrap the input in a custom `io.Reader` that transparently replaces `-_` with `+/` respectively (and drops trailing `=`). The bufio approach doesn't work, because there is no guarantee that one of the distinguishing characters is early in the stream, and the "send it to multiple decoders" approach duplicates effort and wastes resources.

Robert Engels

Feb 2, 2021, 12:43:59 PM
to Axel Wagner, Amnon, golang-nuts
What “padding” are you referring to? Each must be 2 characters. And there is a standard that covers this https://tools.ietf.org/html/rfc4648

Axel Wagner

Feb 2, 2021, 1:35:30 PM
to Robert Engels, Amnon, golang-nuts
On Tue, Feb 2, 2021 at 6:43 PM Robert Engels <ren...@ix.netcom.com> wrote:
What “padding” are you referring to? Each must be 2 characters. And there is a standard that covers this https://tools.ietf.org/html/rfc4648

Yes, there indeed is. Section 5 describes a second encoding scheme, used for URLs and the like. Section 3.2 also talks about the padding I'm referring to (it's defined elsewhere in the standard) and mentions that, in certain situations, it can be omitted. In particular, you can omit padding and, in the decoder, implicitly pad to a multiple of 4 bytes.

I don't really understand what the argument is here. The question was whether it is possible to handle all four encoding schemes supported by the Go base64 package in one go, because as-is, the API requires you to pick one scheme and just see if it returns an error. Roger provided, IMO, a pretty good answer to that: you can wrap the io.Reader in one that transparently rewrites any of the four into one well-known one, which can then be handled by the corresponding decoder. His link provides the code for an implementation of such a reader.

hey...@gmail.com

Feb 2, 2021, 7:04:55 PM
to golang-nuts
Your translate reader works really well, thanks for sharing it.

I have seen code in the wild that tried to decode base64 four times, which is what led me to post this. I hope something like this can be incorporated into the standard library.
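
For comparison, the decode-four-times pattern looks roughly like this for data that is already in memory. This is a sketch of the general idea, not the code I saw, and decodeAny is a made-up name.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// decodeAny (hypothetical helper) tries each of the four encodings exported
// by encoding/base64 in turn and returns the first successful result.
func decodeAny(s string) ([]byte, error) {
	encodings := []*base64.Encoding{
		base64.StdEncoding,
		base64.URLEncoding,
		base64.RawStdEncoding,
		base64.RawURLEncoding,
	}
	var firstErr error
	for _, enc := range encodings {
		out, err := enc.DecodeString(s)
		if err == nil {
			return out, nil
		}
		if firstErr == nil {
			firstErr = err // remember the first failure for reporting
		}
	}
	return nil, firstErr
}

func main() {
	for _, in := range []string{"+/8=", "-_8=", "+/8", "-_8", "!!!!"} {
		out, err := decodeAny(in)
		fmt.Printf("%-4s -> %x %v\n", in, out, err)
	}
}
```

It needs the whole input up front and may decode it up to four times, which is exactly what does not fit an io.Reader.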

robert engels

Feb 3, 2021, 12:10:32 AM
to Axel Wagner, Amnon, golang-nuts
Sorry, it just doesn’t “feel right”. There are different encoding schemes, as laid out in the RFC, and other RFCs that cover their uses.

If you have a system that states “send us Base64 data”, it is poorly specified - better to state: send us Base64 data according to the RFC 4648 base64url format, or according to RFC 2045.

In fact, the RFC states:

"This encoding may be referred to as "base64url".  This encoding
   should not be regarded as the same as the "base64" encoding and
   should not be referred to as only "base64".  Unless clarified
   otherwise, "base64" refers to the base 64 in the previous section."

It also states:

"If non-alphabet characters are ignored, instead of causing rejection
   of the entire encoding (as recommended), a covert channel that can be
   used to "leak" information is made possible."

So having a “meta/relaxed decoder” usually leads to specification/interoperability/security problems down the road. I realize that in the “real world” you are often forced to interoperate with these “bad” systems, but as with most things in Go, it is better to be explicit and report errors rather than be clever.



hey...@gmail.com

Feb 3, 2021, 1:26:10 AM
to golang-nuts

> So having a “meta/relaxed decoder” usually leads to specification/interoperability/security problems down the road
I respectfully disagree. Since it's only relaxed with regard to decoding, it follows the robustness principle where you are liberal in what you accept.

Within a system, the encoding should be explicitly defined, but when that system has to consume base64 data from outside, being liberal actually avoids interoperability problems.

Wojciech S. Czarnecki

Feb 3, 2021, 7:37:45 AM
to golan...@googlegroups.com
On 2021-02-02, at 22:26:10,
"hey...@gmail.com" <hey...@gmail.com> wrote:

> > So having a “meta/relaxed decoder” usually leads to
> specification/interoperability/security problems down the road

> I respectfully disagree. Since it's only relaxed with regard to decoding,
> it follows the robustness principle where you are liberal in what you accept.

I disagree with such disagreement in this (security) context.
"Robustness" stated as "accept lousy data" goes against the security principle "vet your input thoroughly".

> Within a system, the encoding should be explicitly defined, but when that
> system has to consume base64 data from outside, being liberal actually
> avoids interoperability problems.

In a security context, "avoids interoperability problems" may morph into the more accurate "avoids preventing access to our systems by an adversary" - as adversaries are known to eagerly and clandestinely interoperate with our software using whatever means we left them to exploit. (Off-the-cuff example: consuming "liberal" JSON input may allow an attacker to disrupt data guarded by a simple MAC scheme.)

TC,

--
Wojciech S. Czarnecki
<< ^oo^ >> OHIR-RIPE

Axel Wagner

Feb 3, 2021, 8:05:32 AM
to golang-nuts
Maybe it helps to point out that the statements "you should design your system to thoroughly validate input and reject it if it's invalid" and "there are contexts where it is useful to be as flexible as possible in trying to make sense of an input" can both be true.

For example, I would agree that if you build a service, you should generally try not to be too liberal in what you accept, because Hyrum's Law implies that over time, this would create a more and more complex de-facto standard you (and others) need to implement.
But I would also argue that it's useful to have an interactive tool you can just throw things at and have it try and make sense of them - if I just find a Base64 string on the internet and want to know what it means, it's inconvenient to have to fully specify or convert the format, especially if I'm just doing what that tool would be doing anyway.
So, this argument is a false dichotomy. Both ends of the spectrum have their place in practice.

I would also emphasize that we should give way to the actual question. It is frustrating to ask "how can I do X" and only be told "you shouldn't want X". Of course we can present arguments for why we think X is not a good idea, but answering the question should come first and foremost.
And we can always have a thorough discussion of whether such an API should exist or not, once someone proposes to add it to the stdlib.

Wojciech S. Czarnecki

Feb 3, 2021, 1:37:12 PM
to golan...@googlegroups.com, Axel Wagner
On 2021-02-03, at 14:04:42,
"'Axel Wagner' via golang-nuts" <golan...@googlegroups.com> wrote:

> Maybe it helps to point out that the statements "you should design your
> system to thoroughly validate input and reject it if it's invalid" and
> "there are contexts where it is useful to be as flexible as possible in trying
> to make sense of an input" can both be true.

Of course, often both are desired. We can just rarely attain both at once.
That's why I emphasized "in this (security) context".

As for the OP's problem: the best would be to have a flexible decoder that also
returns an indicator of which unexpected format discrepancies it forgave. This would
be easy to do for static data; I see no easy solution for such signaling on streams
though.
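
For the static-data case, a rough sketch of what such a decoder could look like. The report type and the decodeReporting function are invented for this example, and it is deliberately lenient: it trims any number of trailing `=` characters and accepts inputs that mix both alphabets, flagging what it saw rather than rejecting it.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// report (hypothetical type) records which format features the input used,
// so the caller can decide whether a forgiven discrepancy is acceptable.
type report struct {
	URLAlphabet bool // input contained '-' or '_'
	StdAlphabet bool // input contained '+' or '/'
	Padded      bool // input ended with at least one '='
}

// decodeReporting (hypothetical helper) normalizes s to unpadded standard
// base64, decodes it, and reports what it had to normalize.
func decodeReporting(s string) ([]byte, report, error) {
	var rep report
	trimmed := strings.TrimRight(s, "=") // lenient: any amount of padding
	rep.Padded = len(trimmed) != len(s)
	rep.StdAlphabet = strings.ContainsAny(trimmed, "+/")
	rep.URLAlphabet = strings.ContainsAny(trimmed, "-_")
	normalized := strings.NewReplacer("-", "+", "_", "/").Replace(trimmed)
	out, err := base64.RawStdEncoding.DecodeString(normalized)
	return out, rep, err
}

func main() {
	out, rep, err := decodeReporting("-_8=")
	fmt.Printf("%x %+v %v\n", out, rep, err)
}
```

A caller that expected padded standard base64 could then treat URLAlphabet being set as a warning or as an error, depending on how strict it wants to be.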