This.
The problem is that JavaScript strings are UCS-2 (sequences of 16-bit
code units), while Ruby strings are just sequences of 8-bit bytes.
So we have to decide how we convert between them:
Option 1: we treat JS strings as being full of bytes too, and ignore
the upper 8 bits of each code unit. This has the advantage that every
Ruby string can be converted to JS; the downside is that any JS
character above 255 can't survive the round trip.
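
To make that concrete, here's a minimal sketch of Option 1, assuming
the bridge hands us JS strings as arrays of 16-bit code units (the
function names are just for illustration):

    # Option 1: JS -> Ruby keeps only the low byte of each code unit;
    # Ruby -> JS promotes each byte to a code unit unchanged.
    def js_units_to_ruby(units)       # units: integers in 0..0xFFFF
      units.map { |u| u & 0xFF }.pack("C*")
    end

    def ruby_to_js_units(str)
      str.unpack("C*")                # one JS code unit per byte
    end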
Option 2: we treat Ruby strings as UTF-8, and convert between the two
representations as necessary. This has the advantage that all
JavaScript strings can be converted to Ruby... and it assumes that
anyone using multi-byte characters in JS is probably doing the same in
Ruby, and knows what they're doing. But it means we can only handle
Ruby strings that are valid UTF-8.
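
A similarly rough sketch of Option 2, leaning on Ruby's UTF-8
pack/unpack support (again, the names are made up, and anything
outside the BMP would really need surrogate-pair handling on the JS
side):

    # Option 2: Ruby strings are assumed to be valid UTF-8.
    def ruby_to_js_codepoints(str)
      str.unpack("U*")          # raises ArgumentError on malformed UTF-8
    end

    def js_codepoints_to_ruby(codepoints)
      codepoints.pack("U*")     # back to a UTF-8 byte string
    end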
The third option is of course to translate byte-for-byte between the
two formats, so one JS character corresponds to two Ruby characters...
but while that would allow every character in both environments
through unchanged, the Ruby-land strings would be near-useless as
text.
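
For completeness, that lossless-but-ugly version might look something
like this (big-endian 16-bit code units chosen arbitrarily; odd-length
Ruby strings would need extra handling):

    # Option 3: each 16-bit JS code unit becomes exactly two Ruby bytes.
    def js_units_to_ruby(units)
      units.pack("n*")          # "n" = 16-bit big-endian
    end

    def ruby_to_js_units(str)
      str.unpack("n*")          # odd-length strings need extra care
    end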
Initially, we were going with Option 1, mostly due to lack of any
specific consideration. Someone tried to use multi-byte strings, and
complained that JavaScript wasn't seeing useful values; absent a
compelling reason to keep the previous behaviour, I changed it to
implement Option 2.
I guess another possibility would be to choose between the Option 1 and
Option 2 behaviours based on $KCODE or something.
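
Something like the following, presumably (just a sketch; I believe
Ruby 1.8 itself only looks at the first letter of $KCODE, hence the
regexp):

    # Hypothetical: pick Option 2 when $KCODE says UTF-8, else Option 1.
    def ruby_to_js_units(str)
      if $KCODE =~ /^u/i        # "UTF8", "u", etc.
        str.unpack("U*")        # Option 2: decode as UTF-8 code points
      else
        str.unpack("C*")        # Option 1: one code unit per raw byte
      end
    end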
The final option is the most complicated... try as hard as we possibly
can to avoid ever converting from one type of string to the other, and
instead provide duck-type-compatible String-like proxies.
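
That could look roughly like this, where JSString#to_utf8 stands in
for whatever the bridge's real conversion call would be:

    # Hypothetical proxy: forward String-ish calls to a lazily
    # converted copy, so conversion only happens when it's needed.
    class JSStringProxy
      def initialize(js_string)
        @js_string = js_string
      end

      def to_s
        @ruby ||= @js_string.to_utf8    # convert at most once, on demand
      end
      alias_method :to_str, :to_s

      def method_missing(name, *args, &block)
        to_s.send(name, *args, &block)  # otherwise duck-type as a String
      end
    end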
Matthew