The code for the channel's asyncOpen method is something like this:
asyncOpen: function(observer, context) {
    var html = getStringFromJava(this.URI);
    var sc = Components.classes["@mozilla.org/io/string-input-stream;1"];
    var ist = sc.createInstance(Components.interfaces.nsIStringInputStream);
    observer.onStartRequest(this, context);
    var len = html.length;
    ist.setData(html, len);
    observer.onDataAvailable(this, context, ist, 0, len);
    ist.close();
    observer.onStopRequest(this, context, Components.results.NS_OK);
};
Problem: The above works well for regular ASCII text, but it fails
miserably if the HTML string returned contains characters outside
ASCII (e.g. Chinese characters). I just get mangled text in my browser
instead. If I log the HTML string to the JavaScript console or display
it in an alert box, it looks just fine, with all the characters
preserved. I have tried setting the contentCharset property on the
channel to UTF-8, UTF-16, UCS-2 and ISO-10646 (Unicode?) with no luck.
AFAIK, even if you do not set the charset explicitly on the channel,
you can change it on the fly in FF using View > Character Encoding;
this did not help either. Note that the HTML returned does not specify
any encoding, just a style and hyperlinked text.
Notes:
1) Instead of using the channel, if I set the contents of the current
browser's document to the string returned, it looks fine:
    gBrowser.selectedBrowser.contentDocument.childNodes[1].innerHTML = html;
This was actually my initial implementation before I ran into the
encoding problem, but it really is a hack and I would like to use the
channel instead.
2) Instead of HTML, I can return an array composed of the raw string
bytes (using UTF-8 encoding). In such a case I have a loop which
reconstructs the string like so:
    var html = "";
    for (var i = 0; i < bytes.length; i++) {
        html += String.fromCharCode(bytes[i]);
    }
If I use this string (html) while specifying UTF-8 as the
contentCharset, it works and I get the expected text in my browser.
This is my current solution, but it is an obviously inefficient and
unwanted O(n) routine. On my machine it takes nearly 3 seconds for 5
pages' worth of text, though most regular queries have smaller results
and only take a quarter of a second.
3) The JS file which contains this code is encoded using UTF-8, so I
can hardcode a JavaScript string to include some Chinese characters.
These look fine in my editor, but when I display this string using the
console or an alert, it looks mangled. However, if I use normal
channel code (without any of these mods), it looks just fine in the
browser if the contentCharset is set to UTF-8.
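For reference, the byte-by-byte loop in note 2 could probably be
tightened by converting a chunk of bytes per call instead of one byte
per iteration. The following is only a rough sketch: bytesToByteString
and the chunk size are hypothetical, and it assumes bytes is a plain
JavaScript array of values in the range 0-255 (a Java array handed
over directly may need to be copied into one first).

```javascript
// Hypothetical helper: turns an array of byte values (0-255) into a
// string with one character per byte, converting a chunk per call.
function bytesToByteString(bytes) {
    var CHUNK = 4096; // assumed chunk size, kept small to stay within argument limits
    var parts = [];
    for (var i = 0; i < bytes.length; i += CHUNK) {
        // fromCharCode accepts many arguments, so apply() handles a whole slice at once.
        parts.push(String.fromCharCode.apply(null, bytes.slice(i, i + CHUNK)));
    }
    return parts.join("");
}
```

The result is the same kind of "one character per byte" string the
loop builds, so it would be used the same way with contentCharset set
to UTF-8.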
I don't think the Java source is the problem, because JavaScript is
capable of displaying the string correctly in the console / alert box.
It seems to be merely a question of finding the right encoding to use,
but most of the common options have led to disappointing results. Can
I use a different input stream? Any suggestions? What am I missing /
doing wrong?
Thanks!
Brian.
<snip>
http://lxr.mozilla.org/mozilla/source/xpcom/io/nsIStringStream.idl#55
setData's first parameter is "string", which is XPIDL/XPConnect for
"random 8-bit data that got that way by taking UCS2 and chopping off the
high byte". You want to first encode your string into bytes then send
it over. nsIScriptableUConv is probably useful for you.
Basically, just send the *raw* bytes over (this means showing it in JS
won't show the right characters, but that's fine because decoding is
just done later).
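To illustrate the difference, here is a small plain-JavaScript sketch
that needs no XPCOM. Both function names are made up for this example:
chopHighByte only imitates the truncation described above, and
toUtf8ByteString is one possible way to get a string whose characters
are the UTF-8 bytes (it relies on encodeURIComponent, which throws on
unpaired surrogates).

```javascript
// Imitates the truncation: keep only the low byte of each UTF-16 code unit.
function chopHighByte(str) {
    var out = "";
    for (var i = 0; i < str.length; i++) {
        out += String.fromCharCode(str.charCodeAt(i) & 0xFF);
    }
    return out;
}

// Hypothetical helper: returns a string in which every character code
// is one UTF-8 byte (0-255) of the input.
function toUtf8ByteString(str) {
    // encodeURIComponent emits the UTF-8 bytes as %XX escapes; turning
    // each escape back into a single character yields one char per byte.
    return encodeURIComponent(str).replace(/%([0-9A-F]{2})/g, function (match, hex) {
        return String.fromCharCode(parseInt(hex, 16));
    });
}

var original = "\u4E2D";                  // one Chinese character, U+4E2D
var truncated = chopHighByte(original);   // low byte 0x2D only, i.e. "-"
var encoded = toUtf8ByteString(original); // three characters: 0xE4 0xB8 0xAD
```

The encoded form survives the truncation unchanged because every
character code in it already fits in 8 bits, which is why it looks
wrong when shown in JS but decodes correctly later.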
HTH
--
Mook
That was the ticket. Here is the revised asyncOpen method which works
like a charm.
asyncOpen: function(observer, context) {
    var html = getStringFromJava(this.URI);
    observer.onStartRequest(this, context);
    var converter = Components.classes["@mozilla.org/intl/scriptableunicodeconverter"]
        .createInstance(Components.interfaces.nsIScriptableUnicodeConverter);
    // Encode the string as UTF-8 bytes and wrap them in an input stream.
    converter.charset = "UTF-8";
    var ist = converter.convertToInputStream(html);
    // available() is the byte count, which can exceed html.length.
    var len = ist.available();
    observer.onDataAvailable(this, context, ist, 0, len);
    observer.onStopRequest(this, context, Components.results.NS_OK);
}
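One detail in the revised method: the count passed to onDataAvailable
comes from ist.available(), i.e. the number of encoded bytes, rather
than from html.length as in the first version. For non-ASCII text the
two differ. A minimal sketch of that difference in plain JavaScript,
where utf8ByteLength is a hypothetical helper written for this
example and not part of any Mozilla API:

```javascript
// Hypothetical helper: number of bytes the string occupies once
// encoded as UTF-8. Unpaired surrogates are counted as 3 bytes.
function utf8ByteLength(str) {
    var bytes = 0;
    for (var i = 0; i < str.length; i++) {
        var code = str.charCodeAt(i);
        if (code < 0x80) {
            bytes += 1;
        } else if (code < 0x800) {
            bytes += 2;
        } else if (code >= 0xD800 && code <= 0xDBFF && i + 1 < str.length &&
                   str.charCodeAt(i + 1) >= 0xDC00 && str.charCodeAt(i + 1) <= 0xDFFF) {
            bytes += 4; // surrogate pair: one code point outside the BMP
            i++;        // skip the low surrogate
        } else {
            bytes += 3;
        }
    }
    return bytes;
}

var text = "\u4E2D\u6587";           // two Chinese characters
var chars = text.length;             // 2 UTF-16 code units
var bytes = utf8ByteLength(text);    // 6 UTF-8 bytes
```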
Thanks! I appreciate the quick response.
Brian.