HTTP client fails to handle non-ASCII HTTP Location header

Jazz

Nov 7, 2014, 11:00:31 PM
to ve...@googlegroups.com
Hello,

I am using Vert.x version 2.1.4.

When the HTTP client receives a redirect, the "Location" header contains garbled characters instead of the original non-ASCII text. Could this be a string decoding issue?

I tried to trace the issue and found this in the "DefaultHttpClientResponse" class:
headers = new HttpHeadersAdapter(response.headers());

response is an instance of io.netty.handler.codec.http.HttpResponse
response.headers() is io.netty.handler.codec.http.HttpMessage.headers()

The value shown in the Eclipse debugger is:
DefaultHttpResponse(decodeResult: success)
HTTP/1.1 302 Found
Date: Sat, 08 Nov 2014 03:39:29 GMT
Server: Apache/2.4.10 (Unix) PHP/5.3.29 mod_wsgi/3.4 Python/2.7.6 OpenSSL/1.0.1h
X-Powered-By: PHP/5.3.29
Location: redirect3.php?XXXXXXXX << X stands for garbled characters shown in place of the original non-ASCII text
Content-Length: 0
Content-Type: text/html; charset=utf-8

In the Chrome browser, the same response looks like this:
HTTP/1.1 302 Found
Date: Sat, 08 Nov 2014 03:56:56 GMT
Server: Apache/2.4.10 (Unix) PHP/5.3.29 mod_wsgi/3.4 Python/2.7.6 OpenSSL/1.0.1h
X-Powered-By: PHP/5.3.29
Location: redirect3.php?%D8%A7%D8%AE%D8%AA%D8%A8%D8%A7%D8%B1
Content-Length: 0
Keep-Alive: timeout=5, max=99
Connection: Keep-Alive
Content-Type: text/html; charset=utf-8

Thanks
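
For reference, the percent-encoded query that Chrome displays decodes to Arabic text when the escaped bytes are read as UTF-8. A minimal Java sketch of that check follows; the class and method names are illustrative only and are not part of Vert.x or Netty.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class LocationQueryDecode {
    // Percent-decodes a query string, interpreting the escaped bytes as UTF-8.
    static String decodeQuery(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every JVM, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Query part of the Location header exactly as Chrome displayed it.
        String query = "%D8%A7%D8%AE%D8%AA%D8%A8%D8%A7%D8%B1";
        // Prints a six-letter Arabic word (U+0627 U+062E U+062A U+0628 U+0627 U+0631).
        System.out.println(decodeQuery(query));
    }
}
```

Note that URLDecoder implements form decoding, so it would also turn "+" into a space; that makes no difference for this particular value.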



Tim Fox

Nov 8, 2014, 1:57:48 AM
to ve...@googlegroups.com
I don't think HTTP permits non-ASCII characters in HTTP headers.

Alexander Lehmann

Nov 8, 2014, 4:53:43 AM
to ve...@googlegroups.com
HTTP does not permit unencoded characters in headers; AFAIK there is no default charset defined for them in HTTP. The Content-Type header obviously applies only to the content.

Having said that, it would be useful if Vert.x applied a character encoding to the Location header, with a reasonable default, to fix that.


BTW, while we are on the subject of standards: Location was defined as an absolute URI in RFC 2616 (RFC 7231 has since allowed relative references). A relative URL will mostly work, but may break some clients.
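
A client that wants to tolerate relative Location values can resolve them against the URI of the request that produced the redirect. The sketch below uses java.net.URI for that; the host example.com, the request path, and the class and method names are hypothetical and chosen only for illustration.

```java
import java.net.URI;

final class LocationResolve {
    // Resolves a Location header value against the URI of the original request.
    // An absolute Location is returned unchanged; a relative one is merged with
    // the request URI's scheme, host, and path.
    static URI resolveLocation(URI requestUri, String location) {
        return requestUri.resolve(location);
    }

    public static void main(String[] args) {
        // Hypothetical request URI; only the redirect3.php target comes from the thread.
        URI request = URI.create("http://example.com/app/redirect2.php");
        String location = "redirect3.php?%D8%A7%D8%AE%D8%AA%D8%A8%D8%A7%D8%B1";
        // prints http://example.com/app/redirect3.php?%D8%A7%D8%AE%D8%AA%D8%A8%D8%A7%D8%B1
        System.out.println(resolveLocation(request, location));
    }
}
```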

Alexander Lehmann

Nov 8, 2014, 5:12:39 AM
to ve...@googlegroups.com
Sorry, I should have asked a more general question beforehand:

Can you please check why the server is sending unencoded URLs at all? If the server runs PHP, it should handle the charset correctly and percent-encode the Arabic characters as UTF-8 before sending the header. (If not, you may have a security hole waiting to happen.)
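
The encoding step described above amounts to percent-encoding the UTF-8 bytes of the query value before it goes into the header. The test server in this thread is PHP; the sketch below shows the same idea in Java purely as an illustration, and its class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class LocationQueryEncode {
    // Percent-encodes a query value using its UTF-8 bytes, so the header stays pure ASCII.
    static String encodeQueryValue(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every JVM, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The Arabic word from the thread, written with Unicode escapes.
        String value = "\u0627\u062e\u062a\u0628\u0627\u0631";
        // prints Location: redirect3.php?%D8%A7%D8%AE%D8%AA%D8%A8%D8%A7%D8%B1
        System.out.println("Location: redirect3.php?" + encodeQueryValue(value));
    }
}
```

URLEncoder produces form encoding (a space becomes "+"), which is fine for this value but may not suit every part of a URL.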

Jazz

Nov 8, 2014, 10:34:59 AM
to ve...@googlegroups.com
Unfortunately, not everyone follows standards.
I have seen some servers send an unencoded, non-ASCII Location header, and since the major browsers are able to handle those responses, I wanted my HTTP client to handle them too.

Alexander,
I created a PHP redirect page to test the HTTP client. I left the text unencoded on purpose, because I wanted to see how the client would handle it.
Also, I have seen some sites that use relative URLs, and Chrome and Firefox handle those just fine.

Anyway, such responses are luckily not common, so I consider this issue low priority.

Alexander Lehmann

Nov 8, 2014, 2:06:37 PM
to ve...@googlegroups.com
Yeah, if everybody followed the standards, programming would be much easier.

It seems that the conversion of the headers happens in Netty, so you will probably have to ask them whether they would change that.

Alexander Lehmann

Nov 9, 2014, 5:48:26 AM
to ve...@googlegroups.com
It is possible to fix the header if you really need to. The string has been decoded with the wrong encoding (probably because the encoding is not handled at all, since non-ASCII data is not expected).

https://gist.github.com/alexlehm/9b8d3202552e223bc55e

I assume this operation is a no-op when the URL is correctly percent-encoded as UTF-8, but it will break a string that already contains decoded Unicode characters.
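
One common way to do this kind of repair is sketched below. It is not necessarily what the gist does, and it rests on an assumption that the thread does not confirm: that the header bytes were turned into a string one byte per char, as an ISO-8859-1 decode would do. The class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;

final class LocationRepair {
    // ASSUMPTION: the header was decoded one byte per char (ISO-8859-1).
    // Recover the raw bytes and decode them again as UTF-8.
    static String repair(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "redirect3.php?\u0627\u062e\u062a\u0628\u0627\u0631";
        // Simulate the assumed failure: UTF-8 bytes on the wire, read as ISO-8859-1.
        String garbled = new String(original.getBytes(StandardCharsets.UTF_8),
                                    StandardCharsets.ISO_8859_1);

        System.out.println(repair(garbled).equals(original)); // true
        // A pure ASCII (percent-encoded) value passes through unchanged.
        System.out.println(repair("redirect3.php?%D8%A7"));
        // A string that already holds real Unicode characters is damaged:
        // characters outside ISO-8859-1 are replaced during getBytes.
        System.out.println(repair(original).equals(original)); // false
    }
}
```

Because the repair damages strings that were decoded correctly, it only makes sense at a point where the mis-decoding is known to have happened.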