TurKit and Unicode?

Michael Bernstein

Jun 4, 2010, 2:26:30 PM
to Greg Little, turkit-d...@googlegroups.com
Can TurKit run UTF-8 scripts?  I have some input with non-Latin character sets, so I began encoding my .js file in UTF-8.

For example, the one-line script (attached):
print("The web – is full of “data-driven apps.” 汎用=最大公約数幻想に訣別を。");

Gives me:
Retrying Script Evaluation: illegal character (C:\Users\msbernst\Documents\Soylent\code\turkit\utf8.js#1)
org.mozilla.javascript.EvaluatorException: illegal character (C:\Users\msbernst\Documents\Soylent\code\turkit\utf8.js#1)
    at org.mozilla.javascript.DefaultErrorReporter.runtimeError(DefaultErrorReporter.java:109)
    at org.mozilla.javascript.DefaultErrorReporter.error(DefaultErrorReporter.java:96)
    at org.mozilla.javascript.Parser.addError(Parser.java:146)
    at org.mozilla.javascript.TokenStream.getToken(TokenStream.java:825)
    at org.mozilla.javascript.Parser.peekToken(Parser.java:172)
    at org.mozilla.javascript.Parser.primaryExpr(Parser.java:2408)
    at org.mozilla.javascript.Parser.memberExpr(Parser.java:1955)
    at org.mozilla.javascript.Parser.unaryExpr(Parser.java:1813)
    at org.mozilla.javascript.Parser.mulExpr(Parser.java:1742)
    at org.mozilla.javascript.Parser.addExpr(Parser.java:1723)
    at org.mozilla.javascript.Parser.shiftExpr(Parser.java:1703)
    at org.mozilla.javascript.Parser.relExpr(Parser.java:1677)
    at org.mozilla.javascript.Parser.eqExpr(Parser.java:1633)
    at org.mozilla.javascript.Parser.bitAndExpr(Parser.java:1622)
    at org.mozilla.javascript.Parser.bitXorExpr(Parser.java:1611)
    at org.mozilla.javascript.Parser.bitOrExpr(Parser.java:1600)
    at org.mozilla.javascript.Parser.andExpr(Parser.java:1588)
    at org.mozilla.javascript.Parser.orExpr(Parser.java:1576)
    at org.mozilla.javascript.Parser.condExpr(Parser.java:1559)
    at org.mozilla.javascript.Parser.assignExpr(Parser.java:1544)
    at org.mozilla.javascript.Parser.expr(Parser.java:1523)
    at org.mozilla.javascript.Parser.statementHelper(Parser.java:1202)
    at org.mozilla.javascript.Parser.statement(Parser.java:707)
    at org.mozilla.javascript.Parser.parse(Parser.java:401)
    at org.mozilla.javascript.Parser.parse(Parser.java:359)
    at org.mozilla.javascript.Context.compileImpl(Context.java:2370)
    at org.mozilla.javascript.Context.compileReader(Context.java:1321)
    at org.mozilla.javascript.Context.compileReader(Context.java:1293)
    at org.mozilla.javascript.Context.evaluateReader(Context.java:1132)
    at edu.mit.csail.uid.turkit.RhinoUtil$2.func(RhinoUtil.java:108)
    at edu.mit.csail.uid.turkit.RhinoUtil.evaluate(RhinoUtil.java:72)
    at edu.mit.csail.uid.turkit.RhinoUtil.evaluateFile(RhinoUtil.java:106)
    at edu.mit.csail.uid.turkit.TurKit.runOnce(TurKit.java:252)
    at edu.mit.csail.uid.turkit.TurKit.runOnce(TurKit.java:287)
    at edu.mit.csail.uid.turkit.gui.Main.onRun(Main.java:584)
    at edu.mit.csail.uid.turkit.gui.Main.onEvent(Main.java:537)
    at edu.mit.csail.uid.turkit.gui.SimpleEventManager.fireEvent(SimpleEventManager.java:30)
    at edu.mit.csail.uid.turkit.gui.SimpleEventManager.fireEvent(SimpleEventManager.java:24)
    at edu.mit.csail.uid.turkit.gui.Main$6.actionPerformed(Main.java:131)
    at javax.swing.AbstractButton.fireActionPerformed(Unknown Source)
    at javax.swing.AbstractButton$Handler.actionPerformed(Unknown Source)
    at javax.swing.DefaultButtonModel.fireActionPerformed(Unknown Source)
    at javax.swing.DefaultButtonModel.setPressed(Unknown Source)
    at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(Unknown Source)
    at java.awt.AWTEventMulticaster.mouseReleased(Unknown Source)
    at java.awt.Component.processMouseEvent(Unknown Source)
    at javax.swing.JComponent.processMouseEvent(Unknown Source)
    at java.awt.Component.processEvent(Unknown Source)
    at java.awt.Container.processEvent(Unknown Source)
    at java.awt.Component.dispatchEventImpl(Unknown Source)
    at java.awt.Container.dispatchEventImpl(Unknown Source)
    at java.awt.Component.dispatchEvent(Unknown Source)
    at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source)
    at java.awt.LightweightDispatcher.processMouseEvent(Unknown Source)
    at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source)
    at java.awt.Container.dispatchEventImpl(Unknown Source)
    at java.awt.Window.dispatchEventImpl(Unknown Source)
    at java.awt.Component.dispatchEvent(Unknown Source)
    at java.awt.EventQueue.dispatchEvent(Unknown Source)
    at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source)
    at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source)
    at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
    at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
    at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
    at java.awt.EventDispatchThread.run(Unknown Source)

Help or advice would be appreciated.  I'm guessing this is an issue with Rhino?
- Michael

utf8.js

Michael Bernstein

Jun 8, 2010, 7:27:18 PM
to Greg Little, turkit-d...@googlegroups.com, David Crowell
Based on a UID group brainstorm today, it turns out that UTF-8 is actually supported.  (If you look at the source for slurp(), you see that it explicitly loads as UTF-8.)

The actual problem is that the TurKit GUI loads the main script using the machine's default character set, which is often Latin-1 or some other non-Unicode encoding.  This leads to the odd behavior that if I eval(read("a_unicode_file.js")) from a main file that is plain ASCII, everything works great.

I'm attaching a patch to RhinoUtil.java that fixes the problem by loading the main file as UTF-8, which should be backwards compatible with plain ASCII scripts (a Latin-1 script that contains non-ASCII characters would need to be re-saved as UTF-8).  I hope you might consider applying it and rolling out a new release?  In the meantime, I built my own version of TurKit with this patch so that I can run a UTF-8 encoded main file.
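
For anyone curious about the general idea, here is a minimal sketch of decoding a script with an explicit charset instead of the platform default. This is not the attached patch: the class name ScriptDecoding and the method readAll are made up for illustration, and the in-memory byte array just stands in for the main .js file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of TurKit or Rhino.
final class ScriptDecoding {

    // Reads the whole stream as text, decoding the bytes with the given
    // charset rather than whatever the machine's default happens to be.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A tiny script containing two CJK characters, saved as UTF-8 bytes.
        String source = "print(\"\u6c4e\u7528\");";
        byte[] utf8Bytes = source.getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset the file was saved in recovers the text.
        String asUtf8 = readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8);
        // Decoding the same bytes as Latin-1 turns each CJK character into
        // several unrelated characters.
        String asLatin1 = readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 decode matches source:   " + asUtf8.equals(source));
        System.out.println("Latin-1 decode matches source: " + asLatin1.equals(source));
    }
}
```

A Reader built this way is the kind of object that Context.evaluateReader (visible in the stack trace above) accepts, so the charset chosen when constructing the reader decides which characters the parser sees.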
- Michael

2010/6/7 Greg Little <gli...@gmail.com>
Empirically, based on that error, I don't think UTF-8 is supported.
However, you can encode Unicode characters using \u escapes, like "\u8a23". You can paste
汎用=最大公約数幻想に訣別を into
http://glittle.org/JavaStringEditor/JavaStringEditor.html to get
"\u6c4e\u7528\uff1d\u6700\u5927\u516c\u7d04\u6570\u5e7b\u60f3\u306b\u8a23\u5225\u3092".

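The \u workaround above can also be done locally rather than through a web page. A minimal sketch, assuming every non-ASCII UTF-16 code unit should become a four-digit \u escape; the class name UnicodeEscaper and the method escapeNonAscii are hypothetical and are not part of TurKit or the JavaStringEditor page:

```java
// Hypothetical helper for illustration only.
final class UnicodeEscaper {

    // Returns a copy of s in which every non-ASCII UTF-16 code unit is
    // replaced by a backslash, the letter u, and four lowercase hex digits.
    // ASCII characters are copied through unchanged.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The input is written with escapes so this source file stays ASCII.
        System.out.println(escapeNonAscii("\u6c4e\u7528\uff1d\u6700\u5927"));
    }
}
```

The result is plain ASCII, so it can be pasted into a script that is read with any ASCII-compatible default charset. Characters outside the Basic Multilingual Plane come out as two escapes (a surrogate pair), which JavaScript string literals also accept.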
2010/6/4 Michael Bernstein <msbe...@mit.edu>:
> Can TurKit run UTF-8 scripts?  I have some input with non-latin character
> sets, so I began encoding my .js file in UTF-8.
>
> For example, the one-line script (attached):
> print("The web - is full of “data-driven apps.” 汎用=最大公約数幻想に訣別を。");
RhinoUtil.java.patch

Greg Little

Jun 8, 2010, 8:03:23 PM
to Michael Bernstein, turkit-d...@googlegroups.com, David Crowell
oh sweet, thanks! I have uploaded a new version with that patch.
~Greg
