
XMLHTTP responseText unicode conversion?


Will Koffel

Feb 2, 2003, 1:41:33 PM
I've been struggling for a couple of days with this problem. I'll try to
describe it simply, and hopefully someone will have a pointer or two.

My test case uses MSXML2.XMLHTTP to retrieve a URL.

var url = "http://www.google.com/search?q=bli";
var http = new ActiveXObject("MSXML2.XMLHTTP");
http.open("GET",url,false);
http.send();
var tmpData = http.responseText;

That search returns a page with a number of accented characters (char
codes above 127). The HTML returned is UTF-8, which is standard.
However, tmpData seems to treat those raw bytes as if they were already
UTF-16 (the native JScript Win32 encoding).

If I create a textarea and manually cut-and-paste the offending
strings into it, and then set tmpData to be
document.getElementById("textIn").value, then it works fine, but
there's obviously some magic going on with the character encoding.
And I need to be able to retrieve HTML programmatically, of course.

Using responseXML doesn't help, because then tmpData.xml is empty
(presumably because the response simply isn't XML). Most of the cases
I've found on the newsgroups are from people using XMLHTTP to retrieve
actual XML.

Is there another ActiveX object which I haven't found that will do an
HTTP request and return me a JScript string properly converted from
UTF-8 to UTF-16? Does anyone have a javascript function that will do
that conversion, or is it too late by the time the HTML gets into the
variable tmpData?

The ultimate goal is that I need to do some manipulation of the
results, and then send them back to the browser, and I need the
browser to still be able to render the accents and special chars.
Right now, the browser just renders question-marks and/or boxes.

Any ideas are greatly appreciated! I can provide more details where
necessary, of course.

-Will

Joe Fawcett

Feb 2, 2003, 2:14:32 PM

"Will Koffel" <wko...@alum.mit.edu> wrote in message
news:ab0547b3.03020...@posting.google.com...

According to the docs the server is not setting the encoding correctly then.
What is the source url?

Joe


Will Koffel

Feb 2, 2003, 3:41:03 PM
> According to the docs the server is not setting the encoding correctly
> then.
> What is the source url?
>
> Joe

Can you point me at the docs to which you are referring? None of the
documentation I've found specifies that XMLHTTP.responseText will have
encoding conversion done on it.

The source URL is in the code in my original post: just a Google search.
The response HTML does indeed include at least a meta tag specifying
UTF-8, and I expect the header specifies it as well.

-Will
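
Whether the response really declares a charset in its Content-Type header can be checked from script rather than assumed. A minimal sketch, assuming the `http` object from the original post; `charsetFromContentType` is a hypothetical helper name introduced here, not part of MSXML:

```javascript
// Extract the charset parameter from a Content-Type header value.
// Returns the charset in lower case, or null when none is declared.
// (Hypothetical helper for illustration; not an MSXML API.)
function charsetFromContentType(headerValue) {
    if (!headerValue) { return null; }
    var match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(headerValue);
    return match ? match[1].toLowerCase() : null;
}
```

After http.send(), it could be used as `var declared = charsetFromContentType(http.getResponseHeader("Content-Type"));`. A null result would suggest that only the meta tag, and not the header, names the encoding.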


Han

Feb 3, 2003, 12:52:22 AM
Hi Will

One way is to: 1. read it as binary first, 2. save it to a file, 3. read
it again with the proper charset using ADODB.Stream.

sFile = "test.txt"
sURL = "http://localhost/test.asp"

Set objXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP.4.0")

objXMLHTTP.Open "GET", sURL, False
objXMLHTTP.Send

' Steps 1 and 2: write the raw response bytes to a file.
Set strm1 = CreateObject("ADODB.Stream")
With strm1
    .Type = 1                 ' adTypeBinary
    .Open
    .Write objXMLHTTP.responseBody
    .SaveToFile sFile, 2      ' adSaveCreateOverWrite
    .Close
End With

' Step 3: read the file back as text in the proper charset.
Set strm2 = CreateObject("ADODB.Stream")
With strm2
    .Type = 2                 ' adTypeText
    .Charset = "euc-kr"       ' Use any proper charset
    .Open
    .LoadFromFile sFile
    MsgBox .ReadText
    .Close
End With

--
Have a nice day.
Han Pohwan, Microsoft MVP, Korea

"Will Koffel" <wko...@alum.mit.edu> wrote in message
news:ab0547b3.03020...@posting.google.com...

Joe Fawcett

Feb 3, 2003, 5:19:16 AM

"Will Koffel" <wko...@alum.mit.edu> wrote in message
news:uC89ntvyCHA.1620@TK2MSFTNGP11...

> > According to the docs the server is not setting the encoding correctly
> > then.
> > What is the source url?
> >
> > Joe
>
> Can you point me at the docs to which you are referring? None of the
> documentation I've found specifies that XMLHTTP.responseText will have
> encoding conversion done on it.
>

Just the SDK that comes with the MSXML 4 parser.

Will Koffel

Feb 3, 2003, 11:45:04 AM
Thanks, Han. A variant on this definitely provides a valid
work-around for me. If I get responseBody (the raw Bytes) and write
it to a file, and then read that file off disk, the conversion is
perfect!

Now I'm trying to find a way to do it without the intermediate stage
of writing to disk. I'm wondering if, using responseStream, there is
a way to get the same file-loading-and-converting characteristics, but
by using a stream instead.

I tried:

var stream = new ActiveXObject("ADODB.Stream");
stream.Type = 1;
stream.Open();
stream.Write(http.responseBody);
var htmlData = stream.ReadText();

But it errors on the ReadText command. I was hoping that by writing
binary data, and asking to read text, it would do the intelligent
thing, and convert the UTF-8 back into a BSTR, but no such luck.

Anyone know a way to do this?

-Will


"Han" <hp4...@kornet.net> wrote in message news:<#hnTfh0yCHA.2308@TK2MSFTNGP09>...

Paul Randall

Feb 3, 2003, 1:08:05 PM

"Will Koffel" <wko...@alum.mit.edu> wrote in message news:ab0547b3.03020...@posting.google.com...
> Thanks, Han. A variant on this definitely provides a valid
> work-around for me. If I get responseBody (the raw Bytes) and write
> it to a file, and then read that file off disk, the conversion is
> perfect!
>
> Now I'm trying to find a way to do it without the intermediate stage
> of writing to disk. I'm wondering if, using responseStream, there is
> a way to get the same file-loading-and-converting characteristics, but
> by using a stream instead.
>
> I tried:
>
> var stream = new ActiveXObject("ADODB.Stream");
> stream.Type = 1;
> stream.Open();
> stream.Write(http.responseBody);
> var htmlData = stream.ReadText();
>
> But it errors on the ReadText command. I was hoping that by writing
> binary data, and asking to read text, it would do the intelligent
> thing, and convert the UTF-8 back into a BSTR, but no such luck.
>
> Anyone know a way to do this?
>
> -Will

Hi, Will
The ADODB stream object may be able to do this for you. The stream's type
has to match the method you use to access the stream: .Read/.Write can
only be done if .Type is binary, and .ReadText/.WriteText can only be done
if .Type is text. Many of the stream's properties can only be changed
while the position pointer (the .Position property) is set to zero, so I
routinely set it to zero before changing any of the other properties.
Setting the position to zero does not destroy any information in the
stream; you can set it back to what it was if you need to.

-Paul Randall
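
Following Paul's description, an in-memory version might look like the sketch below: write the bytes while the stream is binary, rewind, then switch the type to text and set the charset before reading. `bytesToText` and its optional `createStream` parameter are hypothetical names introduced here (the factory exists only so the function can be exercised without ActiveX). The charset "utf-8" in the usage line is an assumption about what the server sent, and this has not been checked against every ADO version.

```javascript
// Convert raw response bytes to a string in memory with ADODB.Stream.
// body:         the byte array from http.responseBody
// charset:      the encoding the server actually used, e.g. "utf-8"
// createStream: hypothetical optional factory; when omitted, a real
//               ADODB.Stream is created through ActiveX.
function bytesToText(body, charset, createStream) {
    var stream = createStream ? createStream()
                              : new ActiveXObject("ADODB.Stream");
    stream.Type = 1;               // adTypeBinary: required for Write
    stream.Open();
    stream.Write(body);
    stream.Position = 0;           // rewind before changing Type or Charset
    stream.Type = 2;               // adTypeText: required for ReadText
    stream.Charset = charset;
    var text = stream.ReadText();  // with no argument, reads to the end
    stream.Close();
    return text;
}
```

With the `http` object from the earlier posts, the call would be `var htmlData = bytesToText(http.responseBody, "utf-8");`. The only difference from the failing attempt quoted above is the rewind and the switch to text mode before ReadText.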


Han

Feb 3, 2003, 10:12:57 PM
Hi Will

I don't know of a way. If only we could use StrConv or something like it in script...
I had tried several tricks before posting, without success.
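
A script-only conversion of the kind wished for here is possible in principle, since UTF-8 decoding is plain bit arithmetic. The sketch below is a hypothetical helper, not an MSXML or ADO API. It assumes the raw bytes are already available as an array of numbers from 0 to 255; getting such an array out of responseBody in JScript is a separate problem that is not covered here. Malformed or truncated input becomes U+FFFD, and overlong encodings are not rejected.

```javascript
// Decode an array of UTF-8 byte values (0-255) into a UTF-16 script string.
// (Hypothetical helper for illustration only.)
function utf8BytesToString(bytes) {
    var out = [];
    var i = 0;
    while (i < bytes.length) {
        var b = bytes[i++];
        var extra, cp;
        if (b < 0x80) { extra = 0; cp = b; }                            // 1-byte (ASCII)
        else if (b >= 0xC0 && b < 0xE0) { extra = 1; cp = b & 0x1F; }   // 2-byte lead
        else if (b >= 0xE0 && b < 0xF0) { extra = 2; cp = b & 0x0F; }   // 3-byte lead
        else if (b >= 0xF0 && b < 0xF8) { extra = 3; cp = b & 0x07; }   // 4-byte lead
        else { out.push("\uFFFD"); continue; }   // stray continuation or invalid lead

        var ok = true;
        for (var k = 0; k < extra; k++) {
            // Each continuation byte must look like 10xxxxxx.
            if (i >= bytes.length || (bytes[i] & 0xC0) !== 0x80) { ok = false; break; }
            cp = (cp << 6) | (bytes[i] & 0x3F);
            i++;
        }
        if (!ok || cp > 0x10FFFF) { out.push("\uFFFD"); continue; }

        if (cp < 0x10000) {
            out.push(String.fromCharCode(cp));
        } else {
            // Code points above U+FFFF need a UTF-16 surrogate pair.
            cp -= 0x10000;
            out.push(String.fromCharCode(0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)));
        }
    }
    return out.join("");
}
```

For example, the bytes 0x63 0x61 0x66 0xC3 0xA9 decode to the four-character string "café". Where ADODB.Stream is available, letting it do the conversion is likely the simpler route.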
