while parsing a file with Johnson, the following error occurred:
Assertion failed: (chars = js_InflateString(context->js,
StringValuePtr(file_contents), &length)), function parse_io, file
immutable_node.c, line 51.
I guess it has to do with the encoding of the string being parsed.
Can you suggest a workaround for this problem?
If you can create a test case, this'll be easier to track down and fix.
I'm not sure if this is right, but if you're talking about a string that's not UTF-8, I don't know what's going to happen. Pretty sure everything in JS has to be UTF-8.
That is what I thought too, but I read a thread on another mailing list
stating that some files are encoded in other Unicode
encodings.
Here is an example of code that trips the Johnson assertion:
http://cecil.auckland.ac.nz/scripts/menu.js
It is perfectly executable, though; I even ran this sample in
SpiderMonkey, which Johnson should be using.
Regards,
greg/
On May 20, 6:28 AM, Steven Parkes <smpar...@smparkes.net> wrote:
> If you can create a test case, this'll be easier to track down and fix.
> I'm not sure if this is right, but if you're talking about a string that's not UTF-8, I don't know what's going to happen. Pretty sure everything in JS has to be UTF-8.
> On May 17, 2010, at 8:00 AM, greg wrote:
> > Dear Johnson developers,
> > while parsing a file with Johnson, the following error occurred:
> > Assertion failed: (chars = js_InflateString(context->js,
> > StringValuePtr(file_contents), &length)), function parse_io, file
> > immutable_node.c, line 51.
> > I guess it has to do with the encoding of the string being parsed.
> > Can you suggest any workaround to overcome this problem.
I'm not a unicode expert. I more or less learn it on an as-needed basis then tend to forget fairly quickly.
I do know trying to store binary in a JS string can cause it to barf. That's happened to me.
I'm just guessing here, but if this gets into SM via the shell or some other native I/O, I would imagine the problem is getting the encodings through Ruby sanely. It's entirely possible that having Ruby do the read and then pass the result off to SM is mucking things up. I think this can be a little tricky, but I haven't done much with it.
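One way to test the "encoding through Ruby" guess is to look at the raw bytes before they ever reach SM. The following is only a rough sketch, assuming Ruby 1.9 or later (where strings carry an encoding); utf8_report is a made-up helper name, not part of Johnson.

```ruby
# Hypothetical helper (not part of Johnson): report whether raw bytes are
# valid UTF-8 and, if not, the byte offset of the first offending byte.
# Assumes Ruby 1.9+ for String#force_encoding and String#valid_encoding?.
def utf8_report(bytes)
  text = bytes.dup.force_encoding(Encoding::UTF_8)
  offset = 0
  text.each_char do |ch|
    # each_char yields malformed bytes as one-byte chunks that fail the check
    return { valid: false, first_bad_offset: offset } unless ch.valid_encoding?
    offset += ch.bytesize
  end
  { valid: true, first_bad_offset: nil }
end

latin1_bytes = "caf\xE9 menu".b    # "café menu" as ISO-8859-1 bytes
utf8_bytes   = "caf\u00E9 menu".b  # the same text as UTF-8 bytes

latin1_report = utf8_report(latin1_bytes)  # invalid, first bad byte at offset 3
utf8_result   = utf8_report(utf8_bytes)    # valid
```

For a real file, the bytes could come from File.binread(path), which reads without any transcoding.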
On Thu, 2010-05-20 at 09:43 -0700, Steven Parkes wrote:
> I'm just guessing here, but If this gets into SM via the shell or some
> other native I/O, I would imagine the problem is getting the encodings
> through Ruby sanely. It's entirely possible that having Ruby do the
> read and passing the result off to SM is mucking things up. I think
> this can be a little tricky but I haven't done much with it.
This.
The problem is that JavaScript strings are UCS-2, and Ruby strings are ASCII bytes.
So we have to decide how we convert between them:
1. We treat JS strings as being full of bytes too, and ignore the upper 8 bits. This has the advantage that all Ruby strings can be converted to JS.
2. Or, we treat Ruby strings as UTF-8, and convert between them as necessary. This has the advantage that all JavaScript strings can be converted to Ruby... and it assumes that anyone using multi-byte characters in JS is probably doing the same in Ruby, and knows what they're doing. But, it means we can only handle valid UTF-8 strings.
3. The third option is of course to translate byte-for-byte between the two formats, so one JS character corresponds to two Ruby characters... but while that will allow all valid characters in both environments, the ruby-land strings would be near-useless.
Initially, we were going with Option 1, mostly due to lack of any specific consideration. Someone tried to use multi-byte strings, and complained that JavaScript wasn't seeing useful values; absent a compelling reason to keep the previous behaviour, I changed it to implement Option 2.
I guess another possibility would be to choose between the Option 1 and Option 2 behaviours based on $KCODE or something.
The final option is the most complicated... try as hard as we possibly can to avoid ever converting from one type of string to the other, and instead provide duck-type-compatible String-like proxies.
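To make the first three options concrete, here is a rough sketch in plain Ruby (1.9 or later assumed). It models a JS string as an array of 16-bit integers. All the method names are made up for illustration, and none of this is Johnson's actual conversion code, which lives in C.

```ruby
# Option 1: every Ruby byte becomes one 16-bit unit; the upper 8 bits stay zero.
def bytes_to_units(str)
  str.bytes
end

# Option 1 in reverse: keep only the low 8 bits of each unit (lossy above 0xFF).
def units_to_bytes(units)
  units.map { |u| u & 0xFF }.pack("C*")
end

# Option 2: treat the Ruby string as UTF-8 and decode it to UTF-16 code units.
# Raises an EncodingError subclass when the bytes are not valid UTF-8.
def utf8_to_units(str)
  str.dup.force_encoding(Encoding::UTF_8).encode(Encoding::UTF_16LE).unpack("v*")
end

# Option 3: byte-for-byte, each 16-bit unit becomes two Ruby bytes.
# Little-endian order is an arbitrary choice made for this sketch.
def units_to_byte_pairs(units)
  units.pack("v*")
end

def byte_pairs_to_units(str)
  str.unpack("v*")
end

cafe = "caf\u00E9"                      # 5 bytes in UTF-8: 63 61 66 C3 A9

option1 = bytes_to_units(cafe)          # [99, 97, 102, 195, 169], 5 units
option2 = utf8_to_units(cafe)           # [99, 97, 102, 233], 4 units
option3 = units_to_byte_pairs(option2)  # 8 bytes, every other one zero here
```

Under Option 1 the JS side sees five units instead of four characters, which matches the "JavaScript wasn't seeing useful values" complaint. Under Option 2 a stray 0xE9 byte that is not part of a valid UTF-8 sequence raises instead of converting, which is the Ruby-level analogue of the failed assertion.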
Thank you very much for the clarification. As Matthew said, I gather that
option 2 was implemented, which means that any JS string can be
translated to a Ruby string, provided it is valid UTF-8. In my
case, is the JS string not valid UTF-8, then?
Which methods could I use to implement the 3rd option, then
(assuming it would help solve my case)?
Regards,
greg/
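If the file does turn out not to be valid UTF-8, one possible workaround is to transcode it to UTF-8 in Ruby before handing it to the parser. This is only a sketch, assuming Ruby 1.9 or later. The name to_utf8 is made up, and the ISO-8859-1 fallback is a guess, since the real encoding of the file is not known from this thread.

```ruby
# Hypothetical helper (not part of Johnson): return a valid UTF-8 string.
# Bytes that already form valid UTF-8 are kept unchanged; otherwise they are
# reinterpreted in a fallback encoding and transcoded. The ISO-8859-1 default
# is an assumption: it maps every byte to some character, so it never fails,
# but the result is wrong if the source was really in another encoding.
def to_utf8(bytes, fallback = Encoding::ISO_8859_1)
  text = bytes.dup.force_encoding(Encoding::UTF_8)
  return text if text.valid_encoding?

  bytes.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end

from_latin1 = to_utf8("caf\xE9".b)    # 0xE9 becomes the two bytes C3 A9
from_utf8   = to_utf8("caf\u00E9".b)  # already valid, returned unchanged
```

The returned string could then be passed to the parser in place of the raw file contents.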
On 2010/05/22, at 1:58, Matthew Draper <matt...@trebex.net> wrote:
> On Thu, 2010-05-20 at 09:43 -0700, Steven Parkes wrote:
>> I'm just guessing here, but If this gets into SM via the shell or some
>> other native I/O, I would imagine the problem is getting the encodings
>> through Ruby sanely. It's entirely possible that having Ruby do the
>> read and passing the result off to SM is mucking things up. I think
>> this can be a little tricky but I haven't done much with it.
> This.
> The problem is that JavaScript strings are UCS-2, and Ruby strings are
> ASCII bytes.
> So we have to decide how we convert between them:
> We treat JS strings as being full of bytes too, and ignore the upper 8
> bits. This has the advantage that all Ruby strings can be converted to
> JS.
> Or, we treat Ruby strings as UTF-8, and convert between them as
> necessary. This has the advantage that all JavaScript strings can be
> converted to Ruby... and it assumes that anyone using multi-byte
> characters in JS is probably doing the same in Ruby, and knows what
> they're doing. But, it means we can only handle valid UTF-8 strings.
> The third option is of course to translate byte-for-byte between the two
> formats, so one JS character corresponds to two Ruby characters... but
> while that will allow all valid characters in both environments, the
> ruby-land strings would be near-useless.
> Initially, we were going with Option 1, mostly due to lack of any
> specific consideration. Someone tried to use multi-byte strings, and
> complained that JavaScript wasn't seeing useful values; absent a
> compelling reason to keep the previous behaviour, I changed it to
> implement Option 2.
> I guess another possibility would be to choose between the Option 1 and
> Option 2 behaviours based on $KCODE or something.
> The final option is the most complicated... try as hard as we possibly
> can to avoid ever converting from one type of string to the other, and
> instead provide duck-type-compatible String-like proxies.