
UTF-8 and the euro and Japanese characters from a <FORM/> POST


Richard M

Dec 6, 2002, 4:04:20 AM
We have a form on a web page whose contents we are trying to interpret
as Unicode inside a C++ COM object called from an ASP page. The form has
accept-charset="UTF-8" and the page has a
<meta http-equiv="content-type" content="text/html; charset=utf-8">
tag to get the browser to encode the data as UTF-8.

However, when I retrieve the string from the Request.Form object I
get data that doesn't make sense to me.

The value submitted on the form was A(euro symbol)Z

In the raw Request.Form data we see the UTF-8 encoding:

...&artabs=A%E2%82%ACZ&...

But when I try to interpret the binary contents of the BSTR returned
from the form object, I get this in hex:


41 00 -- A
E2 00 1A 20 AC 00 -- UTF-8, I don't think so
5A 00 -- Z
00 00 -- NULL
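
One reading consistent with this dump is that the percent-decoded UTF-8 bytes were widened one byte at a time through the Windows-1252 code page instead of being decoded as UTF-8. The sketch below reproduces the dump under that assumption in portable C++. The helper names (percentDecode, cp1252ToUtf16, widenCp1252, dumpLE) are hypothetical, and the mapping table covers only the bytes that appear in this thread, not the whole code page.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Decode %XX escapes in a URL-encoded value into raw bytes.
std::string percentDecode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            // Two hex digits follow the '%'.
            out.push_back(static_cast<char>(std::stoi(in.substr(i + 1, 2), nullptr, 16)));
            i += 2;
        } else {
            out.push_back(in[i]);
        }
    }
    return out;
}

// Map one byte to a UTF-16 code unit the way Windows-1252 would.
// ASSUMPTION: only 0x82 and 0x83 are tabulated because they are the only
// remapped bytes in this thread; all other bytes are passed through 1:1,
// which matches what the dumps show for 0x81 and 0x90.
char16_t cp1252ToUtf16(unsigned char b) {
    switch (b) {
        case 0x82: return 0x201A;  // SINGLE LOW-9 QUOTATION MARK
        case 0x83: return 0x0192;  // LATIN SMALL LETTER F WITH HOOK
        default:   return b;
    }
}

// Widen every byte separately, ignoring UTF-8 multi-byte structure.
std::u16string widenCp1252(const std::string& bytes) {
    std::u16string out;
    for (unsigned char b : bytes) out.push_back(cp1252ToUtf16(b));
    return out;
}

// Render UTF-16 code units as little-endian byte pairs, as a memory dump
// of a BSTR would show them.
std::string dumpLE(const std::u16string& s) {
    std::string out;
    char buf[8];
    for (char16_t c : s) {
        std::snprintf(buf, sizeof buf, "%02X %02X ",
                      static_cast<unsigned>(c & 0xFF),
                      static_cast<unsigned>(c >> 8));
        out += buf;
    }
    if (!out.empty()) out.pop_back();  // drop the trailing space
    return out;
}
```

Calling dumpLE(widenCp1252(percentDecode("A%E2%82%ACZ"))) gives "41 00 E2 00 1A 20 AC 00 5A 00", the same bytes as the dump (without the terminating NULL). The byte 0x82 becomes U+201A, which is where the unexpected 1A 20 pair comes from.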

I would have expected something different in this case; the Unicode
code point for the euro symbol is U+20AC. I am also getting strange
behaviour for some Japanese characters. The text pasted into the
textarea is the characters corresponding to these numeric character
references:

&#12398;&#12513;&#12531;&#12496;&#12540;

Again, in the raw Request.Form data we get the UTF-8 encoded characters
as we would expect:

...&artabs=%E3%81%AE%E3%83%A1%E3%83%B3%E3%83%90%E3%83%BC&...

But again when I look at the string that I have to deal with from the
form object I end up with strange things...

E3 00 81 00 AE 00 -- This is the UTF-8 encoding of the first character
E3 00 92 01 A1 00 -- This and the rest aren't any sort of UTF-8 that I understand
E3 00 92 01 B3 00
E3 00 92 01 90 00
E3 00 92 01 BC 00
00 00

In every character after the first, the second byte of the UTF-8
sequence seems to be offset by 0x10F (0x83 has become 0x0192) for
some strange reason.
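
If the BSTR really holds UTF-8 bytes that were each widened through Windows-1252 (an assumption, but 0x83 turning into 0x0192 is what that code page does), the original text may be recoverable by narrowing each code unit back to its byte and then decoding the bytes as UTF-8. The following is a minimal portable C++ sketch of that round trip. The names utf16ToCp1252, decodeUtf8, and recover are hypothetical, and the inverse table again covers only the code units seen in this thread.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Inverse of the Windows-1252 widening for the code units in this thread.
// ASSUMPTION: anything not tabulated is a byte that was passed through 1:1.
unsigned char utf16ToCp1252(char16_t c) {
    switch (c) {
        case 0x201A: return 0x82;
        case 0x0192: return 0x83;
        default:
            assert(c <= 0xFF);  // a full table would be needed otherwise
            return static_cast<unsigned char>(c);
    }
}

// Decode UTF-8 into code points. Handles 1- to 3-byte sequences only,
// which is enough for the Basic Multilingual Plane; malformed or longer
// input yields U+FFFD one byte at a time.
std::vector<char32_t> decodeUtf8(const std::vector<unsigned char>& b) {
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < b.size();) {
        if (b[i] < 0x80) {
            out.push_back(b[i]);
            i += 1;
        } else if ((b[i] & 0xE0) == 0xC0 && i + 1 < b.size()) {
            out.push_back(((b[i] & 0x1F) << 6) | (b[i + 1] & 0x3F));
            i += 2;
        } else if ((b[i] & 0xF0) == 0xE0 && i + 2 < b.size()) {
            out.push_back(((b[i] & 0x0F) << 12) |
                          ((b[i + 1] & 0x3F) << 6) |
                          (b[i + 2] & 0x3F));
            i += 3;
        } else {
            out.push_back(0xFFFD);
            i += 1;
        }
    }
    return out;
}

// Narrow the mangled UTF-16 string back to bytes, then decode as UTF-8.
std::vector<char32_t> recover(const std::u16string& mangled) {
    std::vector<unsigned char> bytes;
    for (char16_t c : mangled) bytes.push_back(utf16ToCp1252(c));
    return decodeUtf8(bytes);
}
```

Feeding recover the code units 00E3 0192 00A1 from the second row of the dump gives U+30E1, which is the character 12513 from the numeric references. On Windows the same round trip could presumably be done with WideCharToMultiByte using code page 1252 followed by MultiByteToWideChar with CP_UTF8, though that has not been tried here.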

Can anybody out there help me? I'm about to go mad.
The object is written in C++ using ATL and I am retrieving the
information from the Request.Form object in the C++ object.

Ta muchly.

Richard Mitchell
Softie
VBN

deligentman

Dec 11, 2002, 1:31:09 AM

You might have done this already, but it helps to separate the problem
into two pieces.

1st is
get correctly encoded data from the form to the ASP page.
You can examine the string on the ASP page itself to check that it is
properly encoded.

2nd is
write the string to a database (nvarchar) field, then read it back from
the database and display it in the browser using the simple
Server.CreateObject("ADODB.Recordset") technique.

Then you would know that the front end and the back end are reliable,
and you can go for the nuts and bolts of the back end. ATL components
must be built as Unicode versions and use MDAC 2.6 and OLESTRs.

regards
Del

"Richard M" <richard....@vbnonline.com> wrote in message
news:eb3c2c87.02120...@posting.google.com...
