<script type="text/javascript" src="dirrerentlyEncodedScriptFile.js"
charset="ISO-8859-1"/>
The content of that JS file is not converted properly.
Online test case:
http://getfirebug.com/tests/issues/429/index.xhtml
I checked the channel object (related to the
dirrerentlyEncodedScriptFile.js request) and its contentCharset is
not set.
According to the docs, contentCharset uses the charset given in
contentType (after ';'), but if the contentType doesn't specify any
charset, the "charset" attribute from the <script> element should be
respected, correct?
I'll file a new bug if this is expected behavior.
Honza
This testcase worksforme (in the sense that in a build without Firebug
installed it throws because |console| is undefined, and in a build with
Firebug installed it logs "got something:foo" to the console).
> According to the docs, contentCharset uses the charset given in
> contentType (after ';'), but if the contentType doesn't specify any
> charset, the "charset" attribute from the <script> element should be
> respected, correct?
Yes, and it is.
-Boris
The bug here is that the Script panel is not readable for the .js files.
jjb
How are you getting the source text? If you refetch the JS and don't specify
the charset bits yourself, then that'd be the same as requesting the code
yourself in the web browser - which has it in UTF-16, which would explain why
it's not readable...
~ Gijs
Is the Script panel using the <script> node's charset attribute?
In fact, how is the Script panel getting the text it's getting?
-Boris
Oh, I see what you're asking. You want to know whether the charset
reported by the _channel_ should take into account the @charset
attribute of the <script>?
There's no guarantee of that whatsoever. The consumer can use @charset
as a charset hint (and set it on the channel), or it can just detect
when the channel didn't set a charset and look at @charset at that
point. Right now it does the latter. I suppose we could switch to
doing the former; that might help you if you're relying solely on the
channel to get the encoding information.
Of course you'll still have to do the BOM and document character set
stuff yourself. There really has to be a better way to do this.
Perhaps we should have a way for a debugger to be notified with the
Unicode script text instead of having to sniff channels for data? Or
something?
-Boris
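For reference, the "BOM and document character set stuff" mentioned
above could be sketched roughly as follows. This is only a minimal
illustration, not Gecko's or Firebug's actual code: sniffBOM and
pickScriptCharset are hypothetical names, and the precedence order
(BOM, then channel charset, then @charset, then document charset) is
an assumption chosen for the example.

```javascript
// Hypothetical helper: return the encoding implied by a leading byte
// order mark, or null if the bytes do not start with a known BOM.
// UTF-32 is not handled in this sketch.
function sniffBOM(bytes) {
  if (bytes.length >= 3 &&
      bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) {
    return "UTF-8";
  }
  if (bytes.length >= 2 && bytes[0] === 0xFE && bytes[1] === 0xFF) {
    return "UTF-16BE";
  }
  if (bytes.length >= 2 && bytes[0] === 0xFF && bytes[1] === 0xFE) {
    return "UTF-16LE";
  }
  return null;
}

// Hypothetical helper: choose a charset for a script response.
// ASSUMPTION: the order of the fallbacks is illustrative only.
//   bytes             - raw response bytes (array or Uint8Array)
//   channelCharset    - charset from the Content-Type header, or ""
//   scriptCharsetAttr - value of the <script charset="..."> attribute, or ""
//   documentCharset   - character set of the including document, or ""
function pickScriptCharset(bytes, channelCharset, scriptCharsetAttr,
                           documentCharset) {
  return sniffBOM(bytes) ||
         channelCharset ||
         scriptCharsetAttr ||
         documentCharset ||
         null;
}
```

With an empty channel charset, pickScriptCharset([0x61], "",
"ISO-8859-1", "UTF-8") falls through to the @charset value, which is
the case discussed in this thread.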
To be clear, this would only work with HTTP. Other channel impls do not
have this behavior, and the API doesn't describe it as an option (unlike
with content-type). So this is not in fact something the consumer can do.
-Boris
Firebug currently uses nsITraceableChannel to register a stream tee
listener and get the HTTP response from the tee using a pipe. All
responses are cached (to avoid any refetch from the server) and used
later by the Script panel (and e.g. by the Net panel to show
responses).
Honza
In this specific case, the JS script included in the page uses a
different charset than the page itself, so instead of:
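// scriptableunicodeconverter component
var conv = Components.classes["@mozilla.org/intl/scriptableunicodeconverter"]
    .createInstance(Components.interfaces.nsIScriptableUnicodeConverter);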
conv.charset = document.characterSet;
return conv.ConvertToUnicode(text);
There should be something like (where request is an nsIHttpChannel):
conv.charset = request.contentCharset;
return conv.ConvertToUnicode(text);
So the conversion uses the character set of the request instead of the
'global' page character set.
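To illustrate why the choice of charset matters, here is a small
standalone sketch. It is an assumption-laden stand-in, not Firebug
code: it uses the standard TextDecoder API instead of the
scriptableunicodeconverter component, and decodeResponse is a
hypothetical helper name.

```javascript
// Hypothetical helper: decode raw response bytes with the charset
// reported for that response, and fall back to the page charset only
// when the response does not report one.
function decodeResponse(bytes, responseCharset, pageCharset) {
  var label = responseCharset || pageCharset;
  return new TextDecoder(label).decode(bytes);
}

// The bytes of "café" encoded as ISO-8859-1 (0xE9 is "é").
var bytes = new Uint8Array([0x63, 0x61, 0x66, 0xE9]);

// Decoded with the charset of the request: the text is readable.
var withRequestCharset = decodeResponse(bytes, "ISO-8859-1", "UTF-8");

// Decoded with the page charset: 0xE9 is not valid UTF-8 here, so the
// decoder substitutes the replacement character U+FFFD.
var withPageCharset = decodeResponse(bytes, "", "UTF-8");
```

The same bytes produce different text depending on which charset is
passed in, which is the effect seen in the Script panel.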
> There's no guarantee of that whatsoever. The consumer can use @charset
> as a charset hint (and set it on the channel), or it can just detect
> when the channel didn't set a charset and look at @charset at that
> point.
Is there any way to get the <script> element, and consequently the
@charset attribute, from the _channel_ object? The code tracking HTTP
responses currently doesn't have any clue about the UI. It starts in
the http-on-examine-response event. This could actually be useful for
other things too, but I doubt it's possible.
> Right now it does the latter. I suppose we could switch to
> doing the former; that might help you if you're relying solely on the
> channel to get the encoding information.
Yes, this would make more sense.
I have filed a bug for this.
https://bugzilla.mozilla.org/show_bug.cgi?id=536529
Of course, the charset from the response Content-Type header (if any)
can take higher priority.
> Of course you'll still have to do the BOM and document character set
> stuff yourself. There really has to be a better way to do this.
> Perhaps we should have a way for a debugger to be notified with the
> Unicode script text instead of having to sniff channels for data? Or
> something?
Where can I get more info about how to do the BOM stuff? Should I also
report a bug for this? Not sure what options we have here...
Honza
nsITraceableChannel gives you bytes. How do you convert the bytes to
characters?
-Boris
No.
> I have filed a bug for this.
> https://bugzilla.mozilla.org/show_bug.cgi?id=536529
And I just marked it invalid per my later post in this thread....
>> Of course you'll still have to do the BOM and document character set
>> stuff yourself. There really has to be a better way to do this.
>> Perhaps we should have a way for a debugger to be notified with the
>> Unicode script text instead of having to sniff channels for data? Or
>> something?
> Where can I get more info about how to do the BOM stuff? Should I also
> report a bug for this? Not sure what options we have here...
Didn't I just mention an option above? The end of the paragraph you quoted.
-Boris