My test case uses MSXML2.XMLHTTP to retrieve a URL:
var url = "http://www.google.com/search?q=bli";
var http = new ActiveXObject("MSXML2.XMLHTTP");
http.open("GET",url,false);
http.send();
var tmpData = http.responseText;
That search returns a page containing a number of accented characters
(character codes above 127). The HTML returned is UTF-8, which is
standard. However, tmpData seems to treat those raw bytes as though
they were already UTF-16 (the native JScript string encoding on Win32).
If I create a textarea and manually cut-and-paste the offending
strings into it, and then set tmpData to be
document.getElementById("textIn").value, then it works fine, but
there's obviously some magic going on with the character encoding.
And I need to be able to retrieve HTML programmatically, of course.
Using responseXML doesn't help, because then tmpData.xml is empty
(presumably because the response simply isn't XML). Most of the cases
I've found on the newsgroups are from people using XMLHTTP to retrieve
actual XML.
Is there another ActiveX object which I haven't found that will do an
HTTP request and return me a JScript string properly converted from
UTF-8 to UTF-16? Does anyone have a javascript function that will do
that conversion, or is it too late by the time the HTML gets into the
variable tmpData?
The ultimate goal is that I need to do some manipulation of the
results, and then send them back to the browser, and I need the
browser to still be able to render the accents and special chars.
Right now, the browser just renders question-marks and/or boxes.
Any ideas are greatly appreciated! I can provide more details where
necessary, of course.
-Will
According to the docs the server is not setting the encoding correctly then.
What is the source url?
Joe
Can you point me at the docs to which you are referring? None of the
documentation I've found specifies that XMLHTTP.responseText will have
encoding conversion done on it.
The source URL is in my code below. Just a Google search. The response
HTML does indeed include at least a meta tag specifying UTF-8, and I
expect it has the header as well.
-Will
One way is to (1) read the response as binary, (2) save it to a file, and
(3) read it back with the proper charset using ADODB.Stream:
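sFile = "test.txt"
sURL = "http://localhost/test.asp"

' Fetch the page; responseBody holds the raw, undecoded bytes
Set objXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP.4.0")
objXMLHTTP.Open "GET", sURL, False
objXMLHTTP.Send

' Steps 1-2: write the bytes to a file through a binary stream
Set strm1 = CreateObject("ADODB.Stream")
With strm1
    .Type = 1                   ' adTypeBinary
    .Open
    .Write objXMLHTTP.responseBody
    .SaveToFile sFile, 2        ' adSaveCreateOverWrite
    .Close
End With

' Step 3: read the file back through a text stream with the right charset
Set strm2 = CreateObject("ADODB.Stream")
With strm2
    .Type = 2                   ' adTypeText
    .Charset = "euc-kr"         ' Use any proper charset
    .Open
    .LoadFromFile sFile         ' same file that was saved above
    MsgBox .ReadText
    .Close
End With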
--
Have a nice day.
Han Pohwan, Microsoft MVP, Korea
"Will Koffel" <wko...@alum.mit.edu> wrote in message
news:ab0547b3.03020...@posting.google.com...
Just from the SDK that comes with the MSXML 4 parser.
Now I'm trying to find a way to do it without the intermediate stage
of writing to disk. I'm wondering if, using responseStream, there is
a way to get the same file-loading-and-converting characteristics, but
by using a stream instead.
I tried:
var stream = new ActiveXObject("ADODB.Stream");
stream.Type = 1;
stream.Open();
stream.Write(http.responseBody);
var htmlData = stream.ReadText();
But it errors on the ReadText command. I was hoping that by writing
binary data, and asking to read text, it would do the intelligent
thing, and convert the UTF-8 back into a BSTR, but no such luck.
Anyone know a way to do this?
-Will
"Han" <hp4...@kornet.net> wrote in message news:<#hnTfh0yCHA.2308@TK2MSFTNGP09>...
Hi, Will
The ADODB Stream object may be able to do this for you. The stream's
Type has to match the method you use to access the stream:
.Read/.Write can only be used while .Type is binary (1), and
.ReadText/.WriteText can only be used while .Type is text (2). Many of
the stream's properties, including .Type, can only be changed while
the .Position property (the stream pointer) is zero, so I routinely
set .Position to zero before changing any of the other properties.
Setting .Position to zero does not destroy any information in the
stream; you can change it back to what it was if you need to.
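
The Type and pointer rules described here suggest an in-memory conversion along the following lines. This is only a sketch, not something tested against the Google response in this thread. The function name bytesToText is made up for illustration, and passing the stream in as a parameter is a choice made here so the steps are easy to follow. It assumes that "utf-8" is a charset name the machine's ADODB.Stream accepts.

```javascript
// Hypothetical helper: converts raw response bytes to a JScript string
// entirely in memory, using an ADODB.Stream that the caller creates.
function bytesToText(stream, bytes, charset) {
    stream.Type = 1;            // adTypeBinary, required before Write
    stream.Open();
    stream.Write(bytes);        // leaves the pointer at the end of the data
    stream.Position = 0;        // rewind; Type can only change at zero
    stream.Type = 2;            // adTypeText, required before ReadText
    stream.Charset = charset;   // assumed to be a charset the system knows
    var text = stream.ReadText();
    stream.Close();
    return text;
}

// Intended use under Windows Script Host or IE (assumption, untested here):
//   var stream = new ActiveXObject("ADODB.Stream");
//   var htmlData = bytesToText(stream, http.responseBody, "utf-8");
```

The key difference from a plain Write followed by ReadText is the rewind and the switch of Type to text before reading.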
-Paul Randall
I don't know of a way. If only we could use StrConv or something similar in script...
I tried several tricks before posting, without success.
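
For what it's worth, the conversion itself can be written in plain script once the raw bytes are available as numbers. The sketch below assumes the bytes have already been obtained as an ordinary array of values 0 to 255; how to get such an array out of responseBody is not covered here. The name decodeUtf8 is made up for illustration, and the function is deliberately lenient: it does not reject overlong encodings or out-of-range code points.

```javascript
// Hypothetical helper: decodes an array of UTF-8 byte values (0..255)
// into a JScript string (UTF-16). Malformed sequences become U+FFFD.
function decodeUtf8(bytes) {
    var out = [];
    var i = 0;
    while (i < bytes.length) {
        var b = bytes[i++];
        var cp, extra;
        if (b < 0x80) { cp = b; extra = 0; }                         // 1-byte (ASCII)
        else if (b >= 0xC0 && b < 0xE0) { cp = b & 0x1F; extra = 1; } // 2-byte lead
        else if (b >= 0xE0 && b < 0xF0) { cp = b & 0x0F; extra = 2; } // 3-byte lead
        else if (b >= 0xF0 && b < 0xF8) { cp = b & 0x07; extra = 3; } // 4-byte lead
        else { out.push("\uFFFD"); continue; }  // stray continuation or invalid lead
        var ok = true;
        for (var k = 0; k < extra; k++) {
            // Each continuation byte must look like 10xxxxxx
            if (i >= bytes.length || (bytes[i] & 0xC0) !== 0x80) { ok = false; break; }
            cp = (cp << 6) | (bytes[i++] & 0x3F);
        }
        if (!ok) { out.push("\uFFFD"); continue; }
        if (cp > 0xFFFF) {
            // Outside the BMP: emit a UTF-16 surrogate pair
            cp -= 0x10000;
            out.push(String.fromCharCode(0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)));
        } else {
            out.push(String.fromCharCode(cp));
        }
    }
    return out.join("");
}
```

For example, the bytes 0xC3 0xA9 decode to the single character U+00E9 (e with acute accent), which is the kind of accented character that goes wrong in the original test case.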