Encoding in requests

Claes Bergsten

Mar 3, 2006, 8:48:58 AM

Hi everyone!

I have WAS 6 installed together with IBM HTTP Server 6, and I've
deployed several web applications on it.

The requests coming in to the servlets/actions appear to be encoded at
random with either ISO-8859-1 or UTF-8.

request.getAttribute("attr") returns the correct value in some
applications, but with å, æ and ø missing in other applications on the
SAME server.

Where is the setting that defines which encoding is used in each
application?

Can someone please point me in the right direction here?

Dexthor

Mar 3, 2006, 9:08:17 AM

You can set the preferred encoding through JVM system properties:

client.encoding.override=UTF-8
default.client.encoding=UTF-8

If you don't want the client to specify the encoding, use
client.encoding.override. default.client.encoding only supplies a
fallback for requests that do not declare an encoding.
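
One way to confirm that these properties actually reached the server
JVM is to read them back as system properties. The sketch below is only
a diagnostic aid: the class name EncodingProps and its describe method
are made up for illustration, and it assumes the two properties were
set as JVM system properties as described above.

```java
import java.util.Properties;

final class EncodingProps {
    // Property names taken from the post above.
    static final String OVERRIDE = "client.encoding.override";
    static final String DEFAULT = "default.client.encoding";

    /** Returns a one-line summary of both properties, "(not set)" when one is missing. */
    static String describe(Properties props) {
        return OVERRIDE + "=" + props.getProperty(OVERRIDE, "(not set)")
                + ", " + DEFAULT + "=" + props.getProperty(DEFAULT, "(not set)");
    }
}
```

Calling EncodingProps.describe(System.getProperties()) from a servlet
in each application and logging the result should show whether both
applications see the same values.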

HTH
Dexthor.

Claes Bergsten

Mar 6, 2006, 4:19:34 AM

Thanks for your response.

I will try the JVM settings, but I'm still a bit puzzled:
request.getCharacterEncoding() is always empty, even if I manually
change the encoding in my browser.

How can you know which encoding the parameters in the request are
encoded with?

There must also be something more to it since, as I said, there are
several applications on the same server. All of them set pageEncoding
to ISO-8859-1, but some of them still encode the request parameters
with UTF-8.

Claes

Cyrille Le Clerc

Mar 6, 2006, 6:01:10 AM

Hello Claes,

You seem to be suffering from a well-known browser encoding issue: many
browsers do not tell the server which character encoding they used.

The Servlet specification addresses this issue with
ServletRequest.setCharacterEncoding() (see the spec extract below).
Typically, this call is made in a servlet filter.
Sun provides a sample of such a filter here:
http://java.sun.com/products/servlet/Filters.html

Hope this helps,

Cyrille
--
Cyrille Le Clerc
cyrille...@pobox.com
cyrille...@fr.ibm.com

Extract from the Servlet API Specification (I took version 2.3 fcs)

"SRV.4.9 Request data encoding

Currently, many browsers do not send a char encoding qualifier with the
Content-Type header, leaving open the determination of the character
encoding for reading HTTP requests. The default encoding of a request
the container uses to create the request reader and parse POST data
must be "ISO-8859-1", if none has been specified by the client
request. However, in order to indicate to the developer in this case
the failure of the client to send a character encoding, the container
returns null from the getCharacterEncoding method.
If the client hasn't set character encoding and the request data is
encoded with a different encoding than the default as described above,
breakage can occur. To remedy this situation, a new method
setCharacterEncoding(String enc) has been added to the ServletRequest
interface. Developers can override the character encoding supplied by
the container by calling this method. It must be called prior to
parsing any post data or reading any input from the request. Calling
this method once data has been read will not affect the encoding."
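
The rule quoted above (set the encoding before anything reads the
request) can be sketched as follows. To keep the example compilable
without the servlet API on the classpath, Request is a hypothetical
stand-in that exposes only the two ServletRequest methods involved; in
a real filter the same check would sit in Filter.doFilter, before the
call to chain.doFilter(request, response). The encoding passed in is an
assumption and has to match what the pages actually send.

```java
/**
 * Hypothetical stand-in for the two javax.servlet.ServletRequest methods
 * used here. The real setCharacterEncoding also declares
 * UnsupportedEncodingException, which is left out to keep the sketch short.
 */
interface Request {
    String getCharacterEncoding();
    void setCharacterEncoding(String enc);
}

final class ForceEncoding {
    /**
     * Sets the request encoding only when the client did not declare one,
     * i.e. when getCharacterEncoding() returns null as described in SRV.4.9.
     * Must run before any parameter or body data is read.
     */
    static void apply(Request request, String encoding) {
        if (request.getCharacterEncoding() == null) {
            request.setCharacterEncoding(encoding);
        }
    }
}
```

Leaving an encoding declared by the client untouched is a design
choice; a filter that should always win would call
setCharacterEncoding unconditionally.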

Claes Bergsten

Mar 6, 2006, 6:38:23 AM

Thank you Cyrille,

This is the second week I've been struggling with this, and after lots
of reading I'm starting to grasp the entire encoding process.

But let me explain the exact part that I cannot understand.

I have two applications deployed on the same server; call them App1 and
App2. (Both use Struts 1.1.)

I've created a testpost.jsp in both applications, and both post to a
Struts action.
The JSP sets both pageEncoding and Content-Type to ISO-8859-1, and I
double-check before I post that the browser has ISO-8859-1 selected.
The Struts action just writes the parameter to System.out.

If I post, for example, abcøabc, I get the following output:
App1: abcøabc
App2: abcbc (the øa disappears)

new String(parameter.getBytes(), "UTF-8") makes App2 output
abcøabc correctly.

Naturally, my question is: why does App2 encode in UTF-8 of its own
accord? With getCharacterEncoding() == null there is no way to make
this universally safe.
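
The symptom described above is what a charset mismatch looks like at
the byte level. The sketch below uses only the JDK: it encodes a string
with one charset and decodes it with another, which is what happens
when the page and the container disagree. CharsetMismatch and transcode
are illustrative names, not part of any API.

```java
import java.nio.charset.Charset;

final class CharsetMismatch {
    /**
     * Encodes text with sentAs (what the browser did) and decodes the
     * resulting bytes with readAs (what the server assumed).
     */
    static String transcode(String text, Charset sentAs, Charset readAs) {
        return new String(text.getBytes(sentAs), readAs);
    }
}
```

Sending abcøabc as UTF-8 and reading it as ISO-8859-1 turns ø into the
two characters Ã¸. In the other direction, ø is the single byte 0xF8 in
ISO-8859-1, which is never valid in UTF-8, so a UTF-8 decoder cannot
return the original character.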

Thanks for helping!

Claes

Cyrille Le Clerc

Mar 6, 2006, 10:02:53 AM

Hi Claes,

Here are a few comments:

- I don't know why App2 is encoding in UTF-8 of its own accord. Did you
try to reproduce this in an open source servlet container? If you do,
you will be able to debug step by step and find the tricky code.

- I have already played with new String(parameter.getBytes(), "UTF-8").
It was hell; at the end of the day, I didn't even remember my name.
In your sample, parameter is a String, so the conversion problem
(byte->String) has already occurred. It's too late.

- Could you change all your JSP/HTML pages to UTF-8? That would make
things simpler.

- As you force the response encoding (i.e. the JSP encoding), you
should force the request encoding too. You should do this with a
servlet filter (I didn't see any way to do it elegantly in Struts).
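
The second comment above, that re-encoding an already decoded String is
too late, can be illustrated with the JDK alone. A wrong ISO-8859-1
decode happens to be reversible because every byte maps to a character,
while a wrong UTF-8 decode replaces invalid bytes and loses them.
Recode is an illustrative name; note that the sketch names both
charsets explicitly, unlike parameter.getBytes(), which depends on the
platform default charset.

```java
import java.nio.charset.Charset;

final class Recode {
    /** Re-interprets a String that was decoded with wrong as if it had been decoded with right. */
    static String recode(String decoded, Charset wrong, Charset right) {
        return new String(decoded.getBytes(wrong), right);
    }

    /** True when the original text can be recovered after it was decoded with the wrong charset. */
    static boolean survives(String original, Charset sentAs, Charset wronglyReadAs) {
        // Simulate the container decoding the bytes with the wrong charset.
        String garbled = new String(original.getBytes(sentAs), wronglyReadAs);
        // Try to undo the damage after the fact.
        return recode(garbled, wronglyReadAs, sentAs).equals(original);
    }
}
```

For abcøabc, survives returns true when UTF-8 bytes were wrongly read
as ISO-8859-1 and false when ISO-8859-1 bytes were wrongly read as
UTF-8, which is why fixing the encoding before the parameters are
parsed is the safer route.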