Sorry, Jay; I’ve just tested this and I hit:
Servlet (@ /...) exception:
bytes->string/utf-8: string is not a well-formed UTF-8 encoding
string: #"timmeh \351"
context...:
/usr/local/racket/extra-pkgs/web-server/web-server-lib/web-server/http/bindings.rkt:9:7
loop
/usr/local/racket-6.5/share/racket/collects/racket/contract/private/arrow-val-first.rkt:357:18
This is in web-server/http/bindings.rkt (where I count no fewer than five
`bytes->string/utf-8`s), and I really do think that it should be
bytes->string/latin-1, both because it accepts all 256 byte values AND
because it is what HTTP asks for.
That would fix my issue (I hope).
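To illustrate the difference on the exact bytes from the trace above, here
is a minimal sketch (the names raw, decoded and utf-8-failed? are mine, for
illustration only; this is not code from bindings.rkt):

```racket
#lang racket/base

;; The bytes from the error report: "timmeh " followed by byte 233 (\351).
(define raw #"timmeh \351")

;; Latin-1 decoding is total: each byte maps to the code point of the
;; same value, so this cannot raise.
(define decoded (bytes->string/latin-1 raw))

;; UTF-8 decoding rejects this input, because \351 opens a multi-byte
;; sequence that is never completed. Catch the exception and record it.
(define utf-8-failed?
  (with-handlers ([exn:fail:contract? (lambda (e) #t)])
    (bytes->string/utf-8 raw)
    #f))
```

With Latin-1 the last character of decoded is code point 233 (e-acute),
whereas the UTF-8 call is the one that produced the servlet exception.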
Also, looking at byte-upcase / bytes-ci=? in
web-server-lib/web-server/private/util, can I make a couple of
suggestions:
1. I think Eli points out in an issue that \277 and \276 are not ci=?
to each other. I’m not sure about his specific example, because in
Latin-1 they are an upside-down "?" and "3/4" -- which I wouldn’t
personally consider ci=? anyway. But further up the character set, I
would say that \311 E' and \351 e' ARE ci=?, but only in Latin-1.
So should there not be a byte-upcase/latin-1 and a byte-upcase/ascii-7,
and a bytes-ci=?/latin-1 and a bytes-ci=?/ascii-7?
2. Since this is implemented in a web-server / HTTP context (and for the
reasons I set out above w.r.t. the bindings), should util.rkt not use
bytes-ci=?/latin-1?
Since I have an ISO-8859-1 table in front of me:
(define (byte-upcase/latin-1 b)
  (if (or (<= 97 b 122)    ; ascii-7: a-z range
          (<= 224 b 246)   ; latin-1: a` to o"
                           ; latin-1: 247 (-:-) is not the lower case of 215 (x)
          (<= 248 b 254))  ; latin-1: o/ to |p
      (- b 32)             ; 97 - 65 = 32
      b))
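
For suggestion 1, a sketch of how a bytes-ci=?/latin-1 could sit on top of
that function. This is only my proposal, not what util.rkt currently does;
byte-upcase/latin-1 is repeated so the snippet runs on its own:

```racket
#lang racket/base

;; Upcase a single byte under ISO-8859-1. Bytes outside the three
;; lower-case ranges (including 247, the division sign) are unchanged.
(define (byte-upcase/latin-1 b)
  (if (or (<= 97 b 122)
          (<= 224 b 246)
          (<= 248 b 254))
      (- b 32)
      b))

;; Proposed comparison: equal lengths, and every pair of bytes is equal
;; once both have been upcased under Latin-1.
(define (bytes-ci=?/latin-1 a b)
  (and (= (bytes-length a) (bytes-length b))
       (for/and ([x (in-bytes a)]
                 [y (in-bytes b)])
         (= (byte-upcase/latin-1 x) (byte-upcase/latin-1 y)))))
```

With this, #"caf\351" and #"CAF\311" compare equal, while \276 and \277
do not.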
On 05/05/16 18:46, Jay McCarthy wrote:
> Hi Tim,
>
> I consider this an error. The Web server tries to avoid interpreting
> anything as UTF-8 unless asked by the servlet. Header comparison
> incorrectly converted to UTF-8 and I just pushed a fix. Can you verify
> that it works now with your workload?
>
> Jay