UTF8 characters in HTTP headers


r.kasi...@samsung.com

Aug 21, 2013, 12:55:03 AM
to blin...@chromium.org
Hi,

Currently, if an HTTP header value contains non-ASCII but valid UTF-8 characters, header parsing converts it to a UChar string (through the call to String::fromUTF8(const char*)) instead of an LChar string. This is wrong and breaks use cases such as http://www1.w3c-test.org/webappsec/tests/cors/submitted/opera/staging/resources/cors-headers.php, which sends an X-Custom-Header-Bytes header containing valid UTF-8 characters; the value is not rendered properly because it has been converted to UChar. Firefox, Safari and IE parse the header value correctly, but Chrome (as well as nightly WebKit/WebKit2 builds) does not.
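
To make the difference concrete, here is a minimal Python sketch (not Blink code). It assumes a made-up header value and only contrasts the two ways of turning the same bytes into a string: decoding them as UTF-8, or mapping each byte to one 8-bit character as an ISO-8859-1 style string would. Which of the two the test expects is my assumption from the description above.

```python
# Hypothetical header value bytes: the UTF-8 encoding of U+00E9 and U+20AC.
raw = "\u00e9\u20ac".encode("utf-8")  # b'\xc3\xa9\xe2\x82\xac'

# Decoding as UTF-8 yields code points that may not fit in 8 bits.
as_utf8 = raw.decode("utf-8")

# Mapping each byte to one character keeps every code unit within 8 bits
# and preserves the original bytes exactly.
as_latin1 = raw.decode("latin-1")

print(len(as_utf8), [hex(ord(c)) for c in as_utf8])
print(len(as_latin1), [hex(ord(c)) for c in as_latin1])
```

The byte-per-character string can always be turned back into the original bytes, which is why the two representations are observably different to a script reading the header.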

As far as I understand RFC 2616, this is a valid case and we should support it. I intend to file a bug for this and upload a CL to fix it. Please let me know if I am wrong or if there are any other concerns.

Regards,
Ravi Kasibhatla.

r.kasi...@samsung.com

Aug 21, 2013, 1:12:12 AM
to blin...@chromium.org
I have filed crbug.com/276769 and attached sample HTTP content that demonstrates the issue. The sample can also be seen at http://pastebin.com/SH2KGY2h.

Glenn Adams

Aug 21, 2013, 1:12:30 AM
to r.kasi...@samsung.com, blink-dev
You should review RFC5987, Character Set and Language Encoding for Hypertext Transfer Protocol (HTTP) Header Field Parameters [1].

r.kasi...@samsung.com

Aug 21, 2013, 2:23:23 AM
to blin...@chromium.org, r.kasi...@samsung.com
Thanks for the reference, Glenn. As I read RFC 5987, the character sets used in HTTP headers must be ISO-8859-1 and UTF-8, both of which are basically 8-bit, single-byte encodings. So I still view this as a bug in Chromium.

Regards,
Ravi Kasibhatla.

Glenn Adams

Aug 21, 2013, 3:15:13 AM
to r.kasibhatla, blink-dev
You need to read up on UTF-8, e.g., at [1][2]. UTF-8 is not a character set; it is a character encoding for the Unicode character set that uses one to four bytes per encoded character.
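
As a quick check of the one-to-four-byte point, a small Python sketch (the sample characters are arbitrary choices for illustration):

```python
# One sample character from each UTF-8 length class.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# Number of bytes each character occupies once encoded as UTF-8.
byte_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in byte_lengths.items():
    print(f"U+{ord(ch):04X} -> {n} byte(s)")
```

Only the ASCII range encodes to a single byte; everything else takes two, three or four.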

Using the techniques in RFC5987, one may encode HTTP header parameters that include Unicode code points in the range U+000000 to U+10FFFF.
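
A rough Python sketch of that technique, assuming the ext-value form charset'language'percent-encoded-value from RFC 5987. The helper names encode_ext_value and decode_ext_value are made up for this illustration, and the sketch skips validation of the charset and language fields.

```python
from urllib.parse import quote, unquote

# Punctuation that RFC 5987 allows unescaped (attr-char), besides letters and digits.
ATTR_CHARS = "!#$&+-.^_`|~"

def encode_ext_value(text, language=""):
    """Build a UTF-8 ext-value: charset, optional language, percent-encoded bytes."""
    encoded = quote(text, safe=ATTR_CHARS, encoding="utf-8")
    return "UTF-8'{}'{}".format(language, encoded)

def decode_ext_value(ext_value):
    """Split an ext-value on its two apostrophes and decode the payload."""
    charset, _language, encoded = ext_value.split("'", 2)
    return unquote(encoded, encoding=charset)

value = encode_ext_value("\u20ac rates")
print(value)
print(decode_ext_value(value) == "\u20ac rates")
```

The header itself then stays pure ASCII on the wire, e.g. a parameter written as title*= followed by the encoded value, and the receiver recovers the Unicode text by decoding.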
