[ruby-core:67570] [ruby-trunk - Bug #10740] [Open] Base64 urlsafe methods are not urlsafe

drago...@gmail.com

Jan 13, 2015, 6:07:55 PM

to ruby...@ruby-lang.org

Issue #10740 has been reported by Scott Blum.

----------------------------------------
Bug #10740: Base64 urlsafe methods are not urlsafe
https://bugs.ruby-lang.org/issues/10740

* Author: Scott Blum
* Status: Open
* Priority: Normal
* Assignee:
* ruby -v: ruby 2.1.3p242 (2014-09-19 revision 47630) [x86_64-darwin14.0]
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN
----------------------------------------
Base64.urlsafe_decode64 is not to spec, because it currently REQUIRES appropriate trailing '=' characters.
Base64.urlsafe_encode64 produces trailing '=' characters.

'=' is not URL-safe, and its use is not recommended for base64url. Some specs even disallow it.
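Here is a minimal sketch of the complaint. Instead of requiring lib/base64, it uses Ruby's core `Array#pack("m0")`, the strict RFC 4648 directive that `strict_encode64` is built on. The two input bytes are arbitrary and were picked so that '+', '/' and '=' all appear.

```ruby
# Two bytes whose standard Base64 encoding contains '+', '/' and one '=' pad.
bytes = "\xFB\xFF".b

standard = [bytes].pack("m0")       # strict RFC 4648 alphabet, no line feeds
urlsafe  = standard.tr("+/", "-_")  # base64url alphabet, padding still present
unpadded = urlsafe.delete("=")      # base64url with the pad stripped

puts standard  # +/8=
puts urlsafe   # -_8=
puts unpadded  # -_8
```

The alphabet swap alone removes '+' and '/'. Only the final `delete` removes the '=' that this report is about.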

Suggested fix:

~~~
# Returns the Base64-encoded version of +bin+.
# This method complies with ``Base 64 Encoding with URL and Filename Safe
# Alphabet'' in RFC 4648.
# The alphabet uses '-' instead of '+' and '_' instead of '/'
# and has no trailing pad characters.
def urlsafe_encode64(bin)
  strict_encode64(bin).tr("+/", "-_").delete("=")
end

# Returns the Base64-decoded version of +str+.
# This method complies with ``Base 64 Encoding with URL and Filename Safe
# Alphabet'' in RFC 4648.
# The alphabet uses '-' instead of '+' and '_' instead of '/'.
# Trailing pad characters are optional.
def urlsafe_decode64(str)
  str = str.tr("-_", "+/")
  # Pad with '=' up to the next multiple of 4 before strict decoding.
  str = str.ljust((str.length + 3) & ~3, "=")
  strict_decode64(str)
end
~~~

--
https://bugs.ruby-lang.org/

drago...@gmail.com

Jan 13, 2015, 6:11:35 PM

to ruby...@ruby-lang.org

Issue #10740 has been updated by Scott Blum.

Note that SecureRandom.urlsafe_base64 does the right thing by default, with the note: 'By default, padding is not generated because "=" may be used as a URL delimiter.'
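You can check the SecureRandom behavior directly. This sketch assumes the `SecureRandom.urlsafe_base64(n = nil, padding = false)` signature from Ruby's securerandom library. It asks for 16 random bytes because that length needs two pad characters.

```ruby
require "securerandom"

# 16 bytes -> 24 Base64 chars, the last two being "==" when padding is kept.
unpadded = SecureRandom.urlsafe_base64(16)        # padding defaults to false
padded   = SecureRandom.urlsafe_base64(16, true)  # explicitly request padding

puts unpadded.length          # 22
puts padded.length            # 24
puts padded.end_with?("==")   # true
```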

----------------------------------------
Bug #10740: Base64 urlsafe methods are not urlsafe

https://bugs.ruby-lang.org/issues/10740#change-50981

drago...@gmail.com

Jan 13, 2015, 6:16:00 PM

to ruby...@ruby-lang.org

Issue #10740 has been updated by Scott Blum.

https://github.com/ruby/ruby/pull/815

----------------------------------------
Bug #10740: Base64 urlsafe methods are not urlsafe

https://bugs.ruby-lang.org/issues/10740#change-50982

a...@fsij.org

Jan 13, 2015, 7:45:00 PM

to ruby...@ruby-lang.org

Issue #10740 has been updated by Akira Tanaka.

File base64-urlsafe-encode64-search-result.txt added

I like this feature.
(I think this is a feature request, not a bug.)

However, I think the current behavior should remain selectable for compatibility.

I searched for uses of Base64.urlsafe_encode64 in gems: base64-urlsafe-encode64-search-result.txt
Not every caller removes the "=".
I suspect some of them would break if we changed the behavior.

----------------------------------------
Bug #10740: Base64 urlsafe methods are not urlsafe

https://bugs.ruby-lang.org/issues/10740#change-50984

---Files--------------------------------
base64-urlsafe-encode64-search-result.txt (19.9 KB)

--
https://bugs.ruby-lang.org/

ma...@ruby-lang.org

Jan 14, 2015, 12:43:58 AM

to ruby...@ruby-lang.org

Issue #10740 has been updated by Yusuke Endoh.

Tracker changed from Bug to Feature
Status changed from Open to Feedback
Assignee set to Yusuke Endoh

Hello, I'm a maintainer of lib/base64.

I don't think that this is a bug. RFC 4648 is still the latest standard of Base64. (Note that RFC 6920 does not obsolete RFC 4648.) Because lib/base64 is an implementation of Base64, it should comply with RFC 4648, at least, by default. Moving to the feature tracker.

I found Python's ticket about the same issue: http://bugs.python.org/issue1661108
They decided to follow the spec, as-is, even though it looks broken. I respect them.

That being said, I understand that the current behavior is not useful for some people. I don't think it is a good idea to change the behavior because of the compatibility issue (as akr said), but I'm happy to add something like a "no padding" option. However, RFC 4648 also says:

> The pad character "=" is typically percent-encoded when used in an
> URI [9], but if the data length is known implicitly, this can be
> avoided by skipping the padding; see section 3.2.

I have no idea what it is talking about; the data length is known with or without padding. But the spec is the spec. Taken literally, I think urlsafe_decode64 would have to receive the data length as an argument. I have no idea how the method should handle that argument, though ;-( I'm unsure if this is the right direction.
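The length is known either way because each 3-byte group becomes 4 characters. The length of an unpadded string modulo 4 therefore tells you how many '=' were dropped. A remainder of 1 can never come out of an encoder. Here is a small sketch; `pad_count` is a hypothetical helper, not part of lib/base64.

```ruby
# How many '=' an unpadded Base64 string of this length had stripped.
# A final group of 2 chars carried 1 byte ("=="), 3 chars carried 2 bytes
# ("="); a remainder of 1 cannot come from any encoder, so it is invalid.
def pad_count(unpadded_length)
  case unpadded_length % 4
  when 0 then 0
  when 2 then 2
  when 3 then 1
  else raise ArgumentError, "invalid unpadded Base64 length: #{unpadded_length}"
  end
end

(0..5).each do |n|
  unpadded = ["x" * n].pack("m0").delete("=")
  puts "#{n} bytes -> #{unpadded.length} chars, #{pad_count(unpadded.length)} '=' dropped"
end
```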

Related discussion: http://stackoverflow.com/questions/4080988/why-does-base64-encoding-requires-padding-if-the-input-length-is-not-divisible-b

So, I'm uncertain what to do. Any ideas?

--
Yusuke Endoh <ma...@ruby-lang.org>

----------------------------------------
Feature #10740: Base64 urlsafe methods are not urlsafe
https://bugs.ruby-lang.org/issues/10740#change-50986

* Author: Scott Blum
* Status: Feedback
* Priority: Normal
* Assignee: Yusuke Endoh

bas...@gmail.com

Jan 14, 2015, 2:52:57 PM

to ruby...@ruby-lang.org

Issue #10740 has been updated by Tony Arcieri.

Hi Yusuke,

RFC6920 is just an example of an RFC which refers to RFC4648 and stipulates that something encoded in base64url MUST NOT be padded. According to RFC4648 this is allowed.

Specifically in the case of RFC6920, the data length is known implicitly because we are parsing the data out of a URI.

I don't think there is a need to pass the length in as a parameter. I just think that Base64.urlsafe_decode64 should tolerate unpadded inputs.

----------------------------------------
Feature #10740: Base64 urlsafe methods are not urlsafe

https://bugs.ruby-lang.org/issues/10740#change-51010

drago...@gmail.com

Jan 14, 2015, 2:56:45 PM

to ruby...@ruby-lang.org

Issue #10740 has been updated by Scott Blum.

I suspect the spec is that way because it's easier to calculate the decoded length if the encoded length is always divisible by 4, since it's just `(encoded_len / 4) * 3`. It makes more sense in the context of wire protocols such as email MIME, where base64 originally came from. In a language like Ruby, where strings have lengths, the data length is always known, so I suspect it's less relevant.
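Here is that arithmetic worked through. `(encoded_len / 4) * 3` is the decoded size before padding is taken into account. The exact size subtracts one byte for each trailing '='. For unpadded input, the same count is `encoded_len * 3 / 4` with integer division. The helper names below are hypothetical.

```ruby
# Exact decoded size of padded (strict) Base64: 3 bytes per 4-char group,
# minus one byte for each '=' (which only ever appears at the end).
def decoded_length_padded(encoded)
  (encoded.length / 4) * 3 - encoded.count("=")
end

# Exact decoded size of unpadded Base64 whose length % 4 is 0, 2 or 3.
def decoded_length_unpadded(encoded)
  encoded.length * 3 / 4
end

(0..6).each do |n|
  padded   = ["x" * n].pack("m0")
  unpadded = padded.delete("=")
  puts "#{n} bytes: #{padded.inspect} -> #{decoded_length_padded(padded)}, " \
       "#{unpadded.inspect} -> #{decoded_length_unpadded(unpadded)}"
end
```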

It is worth noting that SecureRandom.urlsafe_base64 has an optional `padding` parameter which defaults to false. I think ideally we should follow that example and default to no padding on the encode side. But if that's too risky, we could default padding to true to maintain the current behavior.

On the decoding side, it seems like a no-brainer to be lenient and fill in the proper padding. Otherwise, you have the bizarre situation where:

`Base64.urlsafe_decode64(SecureRandom.urlsafe_base64(len)) # raises if len % 3 != 0`
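You can reproduce the `len % 3 != 0` case with core methods alone. This sketch uses `String#unpack1("m0")`, the strict RFC 4648 decoder underneath `Base64.strict_decode64`. It then restores the padding with the same `ljust` expression used in the suggested fix.

```ruby
require "securerandom"

# 10 bytes: 10 % 3 == 1, so the unpadded token had "==" stripped (14 chars).
token = SecureRandom.urlsafe_base64(10)
std   = token.tr("-_", "+/")  # back to the standard alphabet

# Strict decoding rejects the unpadded string outright.
strict_error = begin
  std.unpack1("m0")
  nil
rescue ArgumentError => e
  e
end
puts "strict decode: #{strict_error ? 'ArgumentError' : 'ok'}"

# Re-padding to a multiple of 4 lets the same strict decoder succeed.
padded = std.ljust((std.length + 3) & ~3, "=")
bytes  = padded.unpack1("m0")
puts "decoded #{bytes.bytesize} bytes"
```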

----------------------------------------
Feature #10740: Base64 urlsafe methods are not urlsafe

https://bugs.ruby-lang.org/issues/10740#change-51011

ma...@ruby-lang.org

Jan 14, 2015, 10:47:06 PM

to ruby...@ruby-lang.org

Issue #10740 has been updated by Yusuke Endoh.

My point is simple: lib/base64 should comply with RFC 4648 as far as possible. Please explain your proposal based on RFC 4648, not on RFC 6920 (which is NOT a spec of Base64), the behavior of other libraries, etc. If you think RFC 4648 is unreasonable, please tell the IETF.

Tony Arcieri wrote:
> According to RFC4648 this is allowed.

I know. RFC 6920 makes such an exception. But there is no reason why THIS library should do so. Note that this library is general-purpose, not for a specific use case such as URLs.

Scott Blum wrote:
> Otherwise, you have the bizarre situation where:
>
> `Base64.urlsafe_decode64(SecureRandom.urlsafe_base64(len)) # raises if len % 3 != 0`

The situation itself is unfortunate.

I noticed that RFC 4648 does not mention the case where padding is missing. It only says that an implementation MAY ignore excess padding, though.

> If more than the allowed number
> of pad characters is found at the end of the string (e.g., a base 64
> string terminated with "==="), the excess pad characters MAY also be
> ignored.

So, it might be acceptable to tolerate unpadded input. Of course, we still need to take care of compatibility.

--
Yusuke Endoh <ma...@ruby-lang.org>

----------------------------------------
Feature #10740: Base64 urlsafe methods are not urlsafe

https://bugs.ruby-lang.org/issues/10740#change-51018

bas...@gmail.com

Jan 15, 2015, 11:45:09 PM

to ruby...@ruby-lang.org

Issue #10740 has been updated by Tony Arcieri.

Hi Yusuke,

The specific text in RFC4648 is here:

"Implementations MUST include appropriate pad characters at the end of encoded data **unless the specification referring to this document explicitly states otherwise.**"

There is a very specific allowance in RFC4648 to support unpadded base64url encoding for *any* RFC which chooses to omit it.

----------------------------------------
Feature #10740: Base64 urlsafe methods are not urlsafe

https://bugs.ruby-lang.org/issues/10740#change-51038

ma...@ruby-lang.org

Jan 16, 2015, 8:18:28 AM

to ruby...@ruby-lang.org

Issue #10740 has been updated by Yusuke Endoh.

File urlsafe_base64.patch added

Tony Arcieri wrote:
> My interpretation of RFC4648 would suggest this behavior:
>
> Base64.urlsafe_encode64(bin) should produce padded output like it does today
> Base64.urlsafe_decode64(str) should work on both padded and unpadded inputs,

Thank you, sounds reasonable. I like the behavior of Java's Base64.Decoder:

https://docs.oracle.com/javase/8/docs/api/java/util/Base64.Decoder.html

> The Base64 padding character '=' is accepted and interpreted as the end of the encoded byte data, but is not required. So if the final unit of the encoded byte data only has two or three Base64 characters (without the corresponding padding character(s) padded), they are decoded as if followed by padding character(s). If there is a padding character present in the final unit, the correct number of padding character(s) must be present, otherwise IllegalArgumentException ( IOException when reading from a Base64 stream) is thrown during decoding.

How about this?

~~~
   # This method complies with ``Base 64 Encoding with URL and Filename Safe
   # Alphabet'' in RFC 4648.
   # The alphabet uses '-' instead of '+' and '_' instead of '/'.
+  # Note that the result can still contain '='.
+  # You can remove the padding by setting "padding" as false.
+  def urlsafe_encode64(bin, padding: true)
+    str = strict_encode64(bin).tr("+/", "-_")
+    str = str.delete("=") unless padding
+    str
   end

   # Returns the Base64-decoded version of +str+.
   # This method complies with ``Base 64 Encoding with URL and Filename Safe
   # Alphabet'' in RFC 4648.
   # The alphabet uses '-' instead of '+' and '_' instead of '/'.
+  #
+  # The padding characters are optional.
+  # This method accepts both correctly-padded and unpadded input.
+  # Note that it still rejects incorrectly-padded input.
+  def urlsafe_decode64(str)
+    str = str.tr("-_", "+/")
+    if !str.end_with?("=") && str.length % 4 != 0
+      str = str.ljust((str.length + 3) & ~3, "=")
+    end
+    strict_decode64(str)
   end
~~~
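You can try the patched behavior without touching lib/base64. This standalone sketch mirrors the patch but calls core `pack("m0")`/`unpack1("m0")` in place of `strict_encode64`/`strict_decode64`. The module name `UrlSafeSketch` is hypothetical.

```ruby
module UrlSafeSketch
  module_function

  # Padded by default, as before; padding: false strips the '=' characters.
  def urlsafe_encode64(bin, padding: true)
    str = [bin].pack("m0").tr("+/", "-_")
    str = str.delete("=") unless padding
    str
  end

  # Accepts correctly padded or fully unpadded input; anything else is
  # left to the strict decoder, which raises ArgumentError.
  def urlsafe_decode64(str)
    str = str.tr("-_", "+/")
    if !str.end_with?("=") && str.length % 4 != 0
      str = str.ljust((str.length + 3) & ~3, "=")
    end
    str.unpack1("m0")
  end
end

p UrlSafeSketch.urlsafe_encode64("A")                  # "QQ=="
p UrlSafeSketch.urlsafe_encode64("A", padding: false)  # "QQ"
p UrlSafeSketch.urlsafe_decode64("QQ==")               # "A"
p UrlSafeSketch.urlsafe_decode64("QQ")                 # "A"

begin
  UrlSafeSketch.urlsafe_decode64("QQ=")  # incorrectly padded
rescue ArgumentError => e
  puts "rejected: #{e.class}"
end
```

As in Java's Base64.Decoder, padding is optional, but if any '=' is present the strict rules apply to the whole final unit.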

Off topic:

> because RFC4648 allows other RFCs that implement RFC4648-compliant base64url encoding to explicitly stipulate that there is no padding.

RFC 4648 says that the encoder MUST NOT add line feeds, unless bla bla:

> Implementations MUST NOT add line feeds to base-encoded data unless
> the specification referring to this document explicitly directs base
> encoders to add line feeds after a specific number of characters.

Also, it says that the decoder MUST reject the input containing line feeds, unless bla bla:

> Implementations MUST reject the encoded data if it contains
> characters outside the base alphabet when interpreting base-encoded
> data, unless the specification referring to this document explicitly
> states otherwise.

An RFC 4648-compliant encoder WITH the exemption emits data containing line feeds, and an RFC 4648-compliant decoder WITHOUT the exemption rejects that data. Which one is broken? IMO, RFC 4648 is broken ;-)

--
Yusuke Endoh <ma...@ruby-lang.org>

----------------------------------------
Feature #10740: Base64 urlsafe methods are not urlsafe

https://bugs.ruby-lang.org/issues/10740#change-51051

urlsafe_base64.patch (2.97 KB)

--
https://bugs.ruby-lang.org/

bas...@gmail.com

Jan 16, 2015, 12:36:34 PM

to ruby...@ruby-lang.org

Issue #10740 has been updated by Tony Arcieri.

That looks good to me, thank you!

----------------------------------------
Feature #10740: Base64 urlsafe methods are not urlsafe

https://bugs.ruby-lang.org/issues/10740#change-51056

drago...@gmail.com

Jan 16, 2015, 2:06:46 PM

to ruby...@ruby-lang.org

Issue #10740 has been updated by Scott Blum.

That looks awesome. I'll update my PR.

----------------------------------------
Feature #10740: Base64 urlsafe methods are not urlsafe

https://bugs.ruby-lang.org/issues/10740#change-51057

drago...@gmail.com

Jan 16, 2015, 3:57:38 PM

to ruby...@ruby-lang.org

Issue #10740 has been updated by Scott Blum.

Updated https://github.com/ruby/ruby/pull/815 and merged in changes from Yusuke Endoh

----------------------------------------
Feature #10740: Base64 urlsafe methods are not urlsafe

https://bugs.ruby-lang.org/issues/10740#change-51060

no...@ruby-lang.org

Jan 16, 2015, 9:31:00 PM

to ruby...@ruby-lang.org

Issue #10740 has been updated by Nobuyoshi Nakada.

Why does `urlsafe_decode64` use `strict_decode64`, but not just `unpack("m")`?

----------------------------------------
Feature #10740: Base64 urlsafe methods are not urlsafe

https://bugs.ruby-lang.org/issues/10740#change-51064

ma...@ruby-lang.org

Jan 16, 2015, 9:49:44 PM

to ruby...@ruby-lang.org

Issue #10740 has been updated by Yusuke Endoh.

Nobuyoshi Nakada wrote:
> Why does `urlsafe_decode64` use `strict_decode64`, but not just `unpack("m")`?

unpack("m") and Base64.decode64 are based on RFC 2045. unpack("m0"), Base64.strict_decode64, and Base64.urlsafe_decode64 (base64url) are based on RFC 4648.

RFC 2045 allows characters outside the base alphabet, such as CR and LF, and RFC 4648 does not (by default).
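You can see the difference with the core directives themselves. The sample string ("ABC" and "DEF" encoded, separated by a line feed) is only an illustration.

```ruby
encoded = "QUJD\nREVG"  # "ABC" and "DEF", split by a line feed

# RFC 2045 ("m"): characters outside the alphabet, such as LF, are skipped.
lenient = encoded.unpack1("m")
puts lenient  # ABCDEF

# RFC 4648 ("m0"): the same input is rejected.
strict_rejected = begin
  encoded.unpack1("m0")
  false
rescue ArgumentError
  true
end
puts strict_rejected  # true

# The encoders differ the same way: "m" appends line feeds, "m0" never does.
p ["ABCDEF"].pack("m")   # "QUJDREVG\n"
p ["ABCDEF"].pack("m0")  # "QUJDREVG"
```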

--
Yusuke Endoh <ma...@ruby-lang.org>

----------------------------------------
Feature #10740: Base64 urlsafe methods are not urlsafe

https://bugs.ruby-lang.org/issues/10740#change-51065

ma...@ruby-lang.org

Jan 31, 2015, 10:38:23 AM

to ruby...@ruby-lang.org

Issue #10740 has been updated by Yusuke Endoh.

Status changed from Feedback to Assigned

Thank you all. I'll commit the patch in a few days unless there is objection.

--
Yusuke Endoh <ma...@ruby-lang.org>

----------------------------------------
Feature #10740: Base64 urlsafe methods are not urlsafe

https://bugs.ruby-lang.org/issues/10740#change-51312

* Author: Scott Blum
* Status: Assigned

drago...@gmail.com

Jan 31, 2015, 6:32:15 PM

to ruby...@ruby-lang.org

Issue #10740 has been updated by Scott Blum.

Awesome. :)

----------------------------------------
Feature #10740: Base64 urlsafe methods are not urlsafe

https://bugs.ruby-lang.org/issues/10740#change-51313

ma...@ruby-lang.org

Feb 13, 2015, 7:44:53 AM

to ruby...@ruby-lang.org

Issue #10740 has been updated by Yusuke Endoh.

Sorry for the late action, I've committed the patch. Thank you!

--
Yusuke Endoh <ma...@ruby-lang.org>

----------------------------------------
Feature #10740: Base64 urlsafe methods are not urlsafe

https://bugs.ruby-lang.org/issues/10740#change-51489

* Author: Scott Blum
* Status: Closed

drago...@gmail.com

Feb 13, 2015, 1:26:33 PM

to ruby...@ruby-lang.org

Issue #10740 has been updated by Scott Blum.

Awesome, thanks!

----------------------------------------
Feature #10740: Base64 urlsafe methods are not urlsafe

https://bugs.ruby-lang.org/issues/10740#change-51498
