Hi all!
I've recently raised an issue [1] that relates to character encoding and Content-Length in Pedestal (and probably Ring in general). It could be that I'm very much mistaken or missing something, but I can't for the life of me figure out how Pedestal actually converts a String to a byte array using an explicit encoding.
The suggestion was that JSON encoding handles this for you, but it still gives you back a `String`, i.e. a sequence of UTF-16 code units. You can escape non-ASCII characters if you want ASCII back (e.g. `{:escape-non-ascii true}` in Cheshire), but you still get a String back. If you do choose to escape non-ASCII characters, the String will contain only ASCII characters, which encode to the same bytes in UTF-8 and the other common ASCII-compatible charsets, so everything works fine. But you shouldn't need to, especially since JSON pretty much dictates that UTF-8 be used as the serialisation format.
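To illustrate why escaping "works", here is a minimal Java sketch. `escapeNonAscii` is a hypothetical helper that only approximates what an escape-non-ASCII option does; it is not Cheshire's implementation. Once every non-ASCII character has been replaced by a backslash-u escape, the String is pure ASCII, so UTF-8 and ISO-8859-1 produce identical bytes; without escaping they do not.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EscapeNonAscii {
    // Hypothetical helper: replaces every non-ASCII UTF-16 code unit with a
    // JSON-style backslash-u escape (four lowercase hex digits).
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                // Supplementary characters arrive as two surrogate code units,
                // each escaped separately, which is how JSON represents them.
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String json = "{\"name\":\"caf\u00e9\"}";
        String escaped = escapeNonAscii(json);

        // Escaped: only ASCII left, so both charsets agree byte for byte.
        System.out.println(Arrays.equals(
                escaped.getBytes(StandardCharsets.UTF_8),
                escaped.getBytes(StandardCharsets.ISO_8859_1))); // true

        // Unescaped: the e-acute is 2 bytes in UTF-8 but 1 byte in ISO-8859-1.
        System.out.println(Arrays.equals(
                json.getBytes(StandardCharsets.UTF_8),
                json.getBytes(StandardCharsets.ISO_8859_1))); // false
    }
}
```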
The only way to get UTF-8 out of a Java String would, *I think*, be to get a raw `byte[]` using `(.getBytes s StandardCharsets/UTF_8)`. So if you want to *guarantee* that UTF-8 is sent, you currently have to do the encoding yourself and pass the byte array to Pedestal, which then copies it to the Servlet response.
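The Content-Length side of this can be shown with a short Java sketch; `toUtf8` is a hypothetical name, and it is just the Java form of the `(.getBytes s StandardCharsets/UTF_8)` call above. The point is that `String.length()` counts UTF-16 code units, so it only matches the byte count on the wire when the body is pure ASCII.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Body {
    // Encode explicitly so the bytes sent do not depend on the
    // platform default charset.
    static byte[] toUtf8(String body) {
        return body.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String body = "{\"greeting\":\"caf\u00e9\"}";
        byte[] bytes = toUtf8(body);

        // UTF-16 code units: the wrong number for a Content-Length header.
        System.out.println("chars: " + body.length());               // chars: 19
        // Actual UTF-8 byte count: the e-acute takes 2 bytes.
        System.out.println("Content-Length: " + bytes.length);       // Content-Length: 20
    }
}
```

Computing the header from the encoded array, rather than from the String, keeps the two consistent whatever the content.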
Am I mistaken? Is the default platform charset guaranteed to be UTF-8, so that the OutputStreamWriter that Pedestal uses implicitly converts with it? Or am I just being too paranoid about this?
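A small Java sketch of what is at stake, assuming the writer is created without an explicit charset (how Pedestal actually constructs it is not shown here). The one-argument `OutputStreamWriter(OutputStream)` constructor uses `Charset.defaultCharset()`. Before JDK 18 that default came from the OS locale or the `file.encoding` property, so it was not guaranteed to be UTF-8; JDK 18 (JEP 400) made UTF-8 the default.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WriterCharset {
    // Writes s through an OutputStreamWriter with an explicit charset and
    // returns the bytes that reach the underlying stream.
    static byte[] write(String s, Charset cs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, cs)) {
            w.write(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Same, but with the one-argument constructor, which silently falls back
    // to the platform default charset.
    static byte[] writeDefault(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out)) {
            w.write(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String s = "caf\u00e9";
        System.out.println("default charset:  " + Charset.defaultCharset());
        System.out.println("UTF-8 bytes:      " + write(s, StandardCharsets.UTF_8).length);      // 5
        System.out.println("ISO-8859-1 bytes: " + write(s, StandardCharsets.ISO_8859_1).length); // 4
        // Varies with the JVM's default charset.
        System.out.println("default bytes:    " + writeDefault(s).length);
    }
}
```

Passing the charset explicitly, as `write` does, removes the dependency on how the JVM happened to be launched.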
Thanks and sorry for keeping on about this!
Orestis
[1]: https://github.com/pedestal/pedestal/issues/582