weird unicode characters in the generated JS

Brett Cohen

Nov 23, 2015, 11:02:18 AM
to RapydScript
I'm fighting dragons.

rapydscript file.pyj -m

(function(){"use strict";var ՐՏ_Temp;(function(){var __name__ = "__main__";function test(){ՐՏ_print("hello")}test()})();})();

rapydscript file.pyj -m | cat -v

(function(){"use strict";var M-UM-^PM-UM-^O_Temp;(function(){var __name__ = "__main__";function test(){M-UM-^PM-UM-^O_print("hello")}test()})();})();
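The `cat -v` line shows the UTF-8 encoding of the prefix. Ր (U+0550) and Տ (U+054F) each take two bytes, `D5 90` and `D5 8F`. `cat -v` prints any byte of 128 or above as `M-` followed by its low seven bits, using `^` notation for control characters. The small Node sketch below mimics that notation and reproduces the line above. `catV` is a hypothetical helper written only for illustration.

```javascript
// Hypothetical helper that renders bytes the way `cat -v` does:
// bytes >= 128 get an "M-" prefix and are reduced to their low 7 bits,
// control characters (other than tab and newline) become ^X, DEL becomes ^?.
function catV(bytes) {
  let out = "";
  for (const b of bytes) {
    if (b === 9 || b === 10) {
      out += String.fromCharCode(b); // cat -v passes tab and newline through
      continue;
    }
    let c = b;
    if (c >= 128) {
      out += "M-";
      c -= 128;
    }
    if (c < 32) out += "^" + String.fromCharCode(c + 64);
    else if (c === 127) out += "^?";
    else out += String.fromCharCode(c);
  }
  return out;
}

// "\u0550\u054F" is the "ՐՏ" prefix (Armenian REH + TIWN) from the output above.
const bytes = Buffer.from("\u0550\u054F_print", "utf8");

console.log(bytes.toString("hex")); // d590d58f5f7072696e74
console.log(catV(bytes));           // M-UM-^PM-UM-^O_print
```

The bytes on disk are therefore valid UTF-8. The problem lies in how the consumer decodes them.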

locale

LANG=en_ZA.UTF-8
LANGUAGE=en_ZA:en
LC_CTYPE="en_ZA.UTF-8"
LC_NUMERIC="en_ZA.UTF-8"
LC_TIME="en_ZA.UTF-8"
LC_COLLATE="en_ZA.UTF-8"
LC_MONETARY="en_ZA.UTF-8"
LC_MESSAGES="en_ZA.UTF-8"
LC_PAPER="en_ZA.UTF-8"
LC_NAME="en_ZA.UTF-8"
LC_ADDRESS="en_ZA.UTF-8"
LC_TELEPHONE="en_ZA.UTF-8"
LC_MEASUREMENT="en_ZA.UTF-8"
LC_IDENTIFICATION="en_ZA.UTF-8"

using nvm

nvm current

v5.1.0 (also tested with v0.12.7)

A sed replace makes it browser friendly; otherwise Firefox chokes on an invalid character.
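The sed script itself isn't shown. The sketch below is a rough Node equivalent. It assumes the goal is just to swap the `ՐՏ_` prefix for an ASCII one and uses `_$rapyd$_`, the older prefix Alexander mentions in his reply. Any ASCII prefix that can't clash with your own names would work. `asciifyPrefix` is a hypothetical helper, not part of RapydScript.

```javascript
// "ՐՏ_" written as escapes, so this file is plain ASCII.
const UNICODE_PREFIX = "\u0550\u054F_";

// Hypothetical helper: swap every occurrence of RapydScript's unicode prefix
// for an ASCII one, much like a global sed substitution would.
// split/join replaces all occurrences and treats "$" literally, whereas
// String.prototype.replace with a string pattern only replaces the first match.
function asciifyPrefix(js, asciiPrefix = "_$rapyd$_") {
  return js.split(UNICODE_PREFIX).join(asciiPrefix);
}

// The compiled output from the top of the thread, with the prefix escaped.
const compiled =
  '(function(){"use strict";var \u0550\u054F_Temp;' +
  '(function(){var __name__ = "__main__";function test(){\u0550\u054F_print("hello")}test()})();})();';

const fixed = asciifyPrefix(compiled);
console.log(fixed);
console.log(/[^\x00-\x7F]/.test(fixed)); // false: only ASCII remains
```

In a real build you would read the compiled file with `fs.readFileSync(file, "utf8")` and write the result back with `fs.writeFileSync`. The trade-off is that an ASCII prefix is easier to collide with by accident.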

Any ideas what's up?


Alexander Tsepkov

Nov 23, 2015, 11:17:02 AM
to Brett Cohen, RapydScript
The unicode you're seeing is the result of RS's "ՐՏ_" prefix. I changed it to unicode a few months ago, both to make the output prettier than the original _$rapyd$_ prefix and to make it extremely hard for the user to cause name collisions with RS's auto-generated variables/functions. Modern browsers should have no problems with unicode. You mentioned that you're running this in the browser ("browser friendly")? Or are you actually using it in Node, since you provide its version number? I'd need more information on your environment to understand what's going on.

Also, another member had a similar issue about a month ago (I don't remember if it was asked on the GitHub issue list or on this mailing list); he was using multiple environments. For Node, I believe updating the version fixed the issue (he was using a really old one), and for his other environment, specifying that the file was UTF-8 fixed it. That may be the same issue. I also plan to add a compile/config flag later this week to let users change the prefix, since I didn't expect different JS environments to be so bad about supporting unicode that's valid by JS standards.

Brett Cohen

Nov 24, 2015, 9:36:00 AM
to RapydScript, brett...@gmail.com
I'm using RapydScript under Node to compile .pyj to a .js file, then loading the .js file in my HTML with a script tag.

Firefox 42 running on 64-bit Manjaro Linux craps out with "SyntaxError: illegal character" when it sees the unicode char.
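One plausible explanation for "illegal character" is that Firefox decoded the script with a single-byte Western encoding (windows-1252) because no charset was declared. This is an assumption. Under that decoding, each UTF-8 byte of the prefix becomes its own character. `D5` becomes `Õ`, which is a legal identifier character. `90` and `8F` become the C1 control characters U+0090 and U+008F, which are neither identifier characters nor whitespace, so the parser stops there. The sketch simulates this in current Node with Buffer's `latin1` decoding, which maps these particular bytes to the same code points that browsers' windows-1252 decoder does.

```javascript
// Simulate a browser decoding RapydScript's UTF-8 output as a
// single-byte Western encoding (assumption: no charset was declared).
const source = "var \u0550\u054F_Temp;"; // what the compiler wrote
const utf8Bytes = Buffer.from(source, "utf8");

const asUtf8 = utf8Bytes.toString("utf8");     // correct decoding
const asLatin1 = utf8Bytes.toString("latin1"); // one character per byte

// Returns true if the code compiles; new Function does not run the body.
function parses(code) {
  try {
    new Function(code);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// Code points of the mis-decoded prefix: the bytes D5 90 D5 8F as four characters.
console.log(
  [...asLatin1.slice(4, 8)].map(ch => ch.codePointAt(0).toString(16))
); // [ 'd5', '90', 'd5', '8f' ]

console.log(parses(asUtf8));   // true
console.log(parses(asLatin1)); // false
```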

I think your compile switch will work perfectly; I run the generated JS through a sed script that replaces the unicode, and it works 100%.

Cool project, thanks for all the effort you've put into it.

John P Charlesworth

Dec 2, 2015, 11:34:23 AM
to RapydScript, brett...@gmail.com
Modern browsers will handle unicode, but you have to tell them what the encoding is. You do that by adding a charset attribute to your script tags; charset="UTF-8" will work fine with the JavaScript produced by RapydScript.

Jacques de Hooge

Dec 12, 2015, 12:42:00 PM
to RapydScript
@Alexander:
I have the same problem. Have you already added that switch to alter the prefix? I don't see it when I use --help. Which switch is it?
Kind regards

Juan Francisco Roco

Dec 14, 2015, 11:56:29 PM
to RapydScript
Try this (you don't need to change the ՐՏ_ prefix):

<script type="text/javascript" src="script.js"  charset="UTF-8"></script>


charset="UTF-8" does the magic.

John P Charlesworth

Jan 21, 2016, 1:06:53 PM
to RapydScript
There is another aspect to this which I only recently discovered. There seems to be a bug in Chrome, in that it does not always send the charset in the Content-Type header, resulting in unrecognised token errors. Usually a hard reload with Ctrl-F5 will remedy the problem, but the charset does not get cached, so you have to use Ctrl-F5 every time. This happens even with <meta charset="UTF-8"> right at the beginning of the <head> section and with charset="UTF-8" in all the <script> tags.

This seems to depend on how the server is set up, because I have code that loads perfectly well on localhost (XAMPP Apache server) but not on my website (also Apache, but seemingly set up with different options).
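If you control the server, a more direct fix is to have it state the encoding in the response itself, for example `Content-Type: application/javascript; charset=utf-8`. On Apache, mod_mime's `AddCharset` directive is the usual place to attach a charset to an extension. The configurations in this thread aren't shown. As an illustration only, here is a minimal Node `http` sketch that always sends the charset for `.js` and `.html` files. The `contentTypeFor` and `startServer` helpers, the served directory, and port 8000 are all hypothetical choices.

```javascript
const http = require("http");
const fs = require("fs");
const path = require("path");

// Hypothetical helper: choose a Content-Type that always names the charset
// for text assets, so the browser never has to guess how to decode them.
function contentTypeFor(filePath) {
  switch (path.extname(filePath).toLowerCase()) {
    case ".js":
      return "application/javascript; charset=utf-8";
    case ".html":
      return "text/html; charset=utf-8";
    default:
      return "application/octet-stream";
  }
}

// Minimal static file server rooted at the current directory, for local
// testing only. Port 8000 is an arbitrary choice.
function startServer(port = 8000) {
  const root = process.cwd();
  return http
    .createServer((req, res) => {
      const urlPath = req.url.split("?")[0];
      const filePath = path.join(root, urlPath === "/" ? "index.html" : urlPath);
      // Refuse anything that resolves outside the root directory.
      const rel = path.relative(root, filePath);
      if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
        res.writeHead(403, { "Content-Type": "text/plain; charset=utf-8" });
        res.end("Forbidden");
        return;
      }
      fs.readFile(filePath, (err, data) => {
        if (err) {
          res.writeHead(404, { "Content-Type": "text/plain; charset=utf-8" });
          res.end("Not found");
          return;
        }
        res.writeHead(200, { "Content-Type": contentTypeFor(filePath) });
        res.end(data);
      });
    })
    .listen(port);
}
```

To try it, call `startServer()` from a script in the directory that holds the page and the compiled .js files, then open http://localhost:8000/. With the charset in the response header, the browser does not have to guess how to decode the script.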

However, I discovered a workaround that seems to work perfectly: change the extension of your JavaScript files from .js to .utf8.js. Apache interprets extensions like this as a hint to serve the file with the UTF-8 charset even if the browser hasn't asked for it. I haven't got round to checking whether any other browsers have the same problem, or whether any other servers will understand the utf8 in the file extension.

I got the idea from an interesting article on character encoding. The URL is