Re: [squeak-dev] compiled squeakjs

Eliot Miranda

Jan 18, 2015, 1:38:10 PM
to The general-purpose Squeak developers list, Squeak Virtual Machine Development Discussion, newspeak...@googlegroups.com
Hi Florin,

On Jan 18, 2015, at 9:32 AM, Florin Mateoc <florin...@gmail.com> wrote:

> Hi,
>
> Inspired by Bert's project, I started thinking about how to get Smalltalk compiled to Javascript instead of interpreted.
> I do have previous experience in compiling Smalltalk to Java (after type inference, which we thankfully don't need
> here). But, the requirements are a bit tighter here: we have to take an unknown image, get it translated on the fly,
> completely automatically, and even allow the translated image to self-modify. Plus we cannot just decree that become:
> cannot be used.
>
> Given that the input is an image, not sources, we'd better rely on the decompiler, so I started there. I think I fixed
> it, so that it can now decompile everything correctly.
> I also implemented a few AST transformations (similar to the ones that were necessary for Java, like normalizing the
> various boolean constructs and making them statements).
> I then started to write a Javascript pretty-printer, but I stopped when I realized that there were a few things missing:
> while non-local returns and resumable exceptions can be implemented using exceptions and an explicit stack of handlers,
> preemption (and Smalltalk's processes in general) were harder. After some research I came to the conclusion that this
> was doable if, instead of doing a direct pretty-printing of the Smalltalk nodes to Javascript, we also used the
> translation process to transform the code in continuation passing style. Then non-local returns become trivial and
> preemption can be implemented with closures, without needing access to the underlying execution stack.
> An interrupted context would have a no-arg closure representing the continuation instead of a pc. In general, only
> preemption points (which all have a corresponding continuation closure) would have to be mapped, and this would happen
> at image read time as well. The exception would be the debugger - I am not sure about that one yet.
> The primitive code would be inlined in the primitive methods, followed by a preemption point and the failure code.
> Unfortunately invocation would still not be direct, but looked up (and invoking DNU if needed), but I would store all
> the translated methods directly in the class prototype, so there would be no need for explicit superclass chain lookup.
> The instvars would also be stored directly in the class prototype (but with a prefix, to not conflict with the methods
> or with reserved keywords), and they would be accessed directly (with dot notation), except for assignments, which would
> record the owner (and the index in the owner), for all non-primitive types (not sure what to do about strings).
> Every method (and formerly Smalltalk block closure) would have a single temp called "thisContext", which would be an
> owner for the actual temps. The owners info would be used for implementing become: and allReferences.
>
> The ProtoObject and Object methods coming from Smalltalk would be stored in Object.prototype. Proxy classes would have
> their prototypes cleared and only contain the ProtoObject methods.
> Primitive type classes would have to be massaged a little: Number would have a union of methods from the Smalltalk
> Number subclasses, as well as the methods inherited from Magnitude.
> String would also have the methods inherited from Collection and SequenceableCollection, as well as from Character (and
> Magnitude) and Symbol - this one could be a little nastier, but I think it could be made to work.
> I would also map Array to Array, IdentitySet to Set and IdentityDictionary to Map. Weak collections are harder, because
> Javascript decided to make them not enumerable. Because of this, allInstances would also be a challenge.
>
> I am not sure yet about the bootstrap process. I just have a fuzzy feeling that Craig's Context running under SqueakJS
> might make it easier.
>
> I hope this gives a general idea about the approach. Please do point out weaknesses that I may have missed. For me this
> is fun and I will proceed slowly, as time permits, since I cannot do it at work.
> Of course, I am very interested to hear Bert's opinion :)
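The continuation idea in the quoted proposal (an interrupted context holding a no-arg closure instead of a pc) could be sketched roughly as follows. This is only an illustration under stated assumptions: `countTo` and `run` are hypothetical names, the scheduler is a plain round-robin loop, and real translated Smalltalk code would carry much more state.

```javascript
// Each "process" is represented only by its pending continuation:
// a no-argument closure that runs up to the next preemption point and
// returns the following continuation, or null when the process is done.
function countTo(name, limit, log) {
  function step(i) {
    return function () {
      if (i > limit) return null; // process finished
      log.push(name + i);
      return step(i + 1); // preemption point: hand back the rest of the work
    };
  }
  return step(1);
}

// Round-robin scheduler: a process is resumed by calling its continuation,
// without any access to the underlying JavaScript execution stack.
function run(processes) {
  var queue = processes.slice();
  while (queue.length > 0) {
    var next = queue.shift()();
    if (next !== null) queue.push(next);
  }
}

var log = [];
run([countTo("a", 2, log), countTo("b", 3, log)]);
console.log(log.join(" ")); // a1 b1 a2 b2 b3
```

The two processes interleave because each one yields a closure at every step, which is the property that makes preemption possible without a pc.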



I like your approach, that of making everything work, not taking the simple approach of translating what will work directly and disallowing the rest (as do Amber, Clamato, etc.). You might want to talk to Ryan Macnak and Gilad Bracha about their Newspeak implementation above JavaScript (they're also doing one above Dart).

I do think Bert's approach is fun, too. But I do feel extremely frustrated that no one is taking the obvious route of making a plugin to allow the Cog VM to be used directly, gaining much higher performance and reducing the number of execution platforms we have to support.

A plugin would use JavaScript to collect events, to render and to access the DOM (all of this code can be stolen from Bert's VM). The JavaScript component would connect to the VM via a socket. The VM itself would be quite small (it's already only around a megabyte of executable). For me arguments about the inconvenience and slowness of downloading and installing are not compelling given the ubiquity of Flash.

And then there really is /no/ difference in the execution semantics, and /no/ performance degradation, and the code is as portable as Bert's VM provided Cog runs on the platform.

I'd be doing this myself if I weren't working on getting Spur released, getting 64-bit Spur working, working with Clément on Sista and looking at hosting Cog over Xen. Come on folks; someone out there must think this is useful and interesting.

> Florin

florin...@gmail.com

Jan 18, 2015, 3:59:48 PM
to newspeak...@googlegroups.com, squea...@lists.squeakfoundation.org, vm-...@lists.squeakfoundation.org
Hi Eliot,

Thank you for pointing me to this project (in particular to NS2V8).

I have just read Ryan's post "Update on compilation to Dart and JavaScript" and I have a couple of questions:



Ryan,

Can you please explain what you do for initialization (especially for large arrays, sets, etc)? Assuming that Newspeak also has something like Smalltalk's nil, do you fill them at creation time with nil?
I was thinking of avoiding that and testing the receiver at every invocation for JavaScript's undefined instead. Of course, this just moves the pain point, I am not sure which is better. Also, the test for nil can be avoided when the receiver is "this" or "super" or some literal - one can optimize this even in other cases with some static analysis.

I also don't understand the line:
"NS2JS and NS2V8 both map Newspeak's basic types onto JavaScript's basic types by installing functions on the prototypes of Number, String, etc. We apply strict mode, so these functions do not operate on boxed values."

E.g. the following snippet works:

"use strict"
Number.prototype.test = function() {return 5};
var n = 2;
n.test()

So what does it mean that "these functions do not operate on boxed values"?


Thank you,
Florin

Ryan Macnak

Jan 18, 2015, 7:17:46 PM
to newspeak...@googlegroups.com, The general-purpose Squeak developers list, Squeak Virtual Machine Development Discussion
On Sun, Jan 18, 2015 at 10:38 AM, Eliot Miranda <eliot....@gmail.com> wrote:
> I do feel extremely frustrated that no one is taking the obvious route of making a plugin to allow the Cog VM to be used directly, gaining much higher performance and reducing the number of execution platforms we have to support.

I thought there were already a few projects that attached a Squeak VM to NaCl.

Ryan Macnak

Jan 18, 2015, 7:26:29 PM
to newspeak...@googlegroups.com, The general-purpose Squeak developers list, Squeak Virtual Machine Development Discussion
On Sun, Jan 18, 2015 at 12:59 PM, <florin...@gmail.com> wrote:
> Ryan,
>
> Can you please explain what you do for initialization (especially for large arrays, sets, etc)? Assuming that Newspeak also has something like Smalltalk's nil, do you fill them at creation time with nil?

We eagerly assign nil to all slots. For regular objects, it is important that slots for a given class are always initialized in the same order, otherwise a JS engine like V8 won't consider all the instances to be of the same Map (hidden class), things will seem more polymorphic than they are, and optimizations won't happen. For arrays, I don't know whether this is more harmful to performance because we pollute type data about the array's elements with UndefinedObject or this is more helpful because we don't need checks for undefined in #at:. Certainly in terms of implementation effort, eager initialization is much better than sprinkling checks in all the places that access slots.
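
A minimal sketch of what eager initialization might look like in generated code. The names `nil`, `Point3D` and `newArray` are hypothetical and are not taken from the Newspeak compilers; the only point illustrated is that every slot is assigned, in a fixed order, at creation time.

```javascript
// Hypothetical stand-in for the language's nil object.
var nil = { isNil: true };

// Every constructor assigns all of its slots, always in the same order,
// so an engine with hidden classes can give all instances the same shape.
function Point3D() {
  this.x = nil;
  this.y = nil;
  this.z = nil;
}

// Arrays are filled with nil at creation, so element access never has to
// test for JavaScript's undefined.
function newArray(size) {
  var a = new Array(size);
  for (var i = 0; i < size; i++) {
    a[i] = nil;
  }
  return a;
}

var p = new Point3D();
var arr = newArray(4);
console.log(Object.keys(p).join(","), arr.length);
```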
 
> I was thinking of avoiding that and testing the receiver at every invocation for JavaScript's undefined instead. Of course, this just moves the pain point, I am not sure which is better. Also, the test for nil can be avoided when the receiver is "this" or "super" or some literal - one can optimize this even in other cases with some static analysis.

We do very little static analysis beyond the standard cheats for #ifTrue: and friends. Generally, it makes implementing reflection much more difficult. dart2js is a good example of a compiler with this problem.

> I also don't understand the line:
> "NS2JS and NS2V8 both map Newspeak's basic types onto JavaScript's basic types by installing functions on the prototypes of Number, String, etc. We apply strict mode, so these functions do not operate on boxed values."
>
> E.g. the following snippet works:
>
> "use strict"
> Number.prototype.test = function() {return 5};
> var n = 2;
> n.test()
>
> So what does it mean that "these functions do not operate on boxed values"?

When the function test is not in strict mode, every invocation requires the allocation of a number object to use as the receiver. In strict mode, the receiver is the number value directly. (ECMAScript 5.1 10.4.3)
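
A small sketch that makes the difference observable, assuming any ES5-capable engine. The method names `kindSloppy` and `kindStrict` are made up for illustration. The non-strict function is built with the `Function` constructor so that it stays non-strict even if the surrounding file is loaded as strict code.

```javascript
// Non-strict function: a primitive receiver is wrapped in a Number object.
// (Functions made by the Function constructor do not inherit strictness.)
Number.prototype.kindSloppy = new Function("return typeof this;");

// Strict function: the receiver stays the primitive number value.
Number.prototype.kindStrict = function () {
  "use strict";
  return typeof this;
};

var n = 2;
var sloppy = n.kindSloppy(); // "object": a wrapper was allocated
var strict = n.kindStrict(); // "number": no boxing
console.log(sloppy, strict);
```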

Ryan

Eliot Miranda

Jan 18, 2015, 8:05:39 PM
to newspeak...@googlegroups.com, The general-purpose Squeak developers list, Squeak Virtual Machine Development Discussion
As I understand it, running under NaCl requires reworking the JIT and has real problems doing the self-modifying code involved in inline caches, etc. I want something that doesn't involve running under a managed run-time (the VM is a managed run-time; layering two on top of each other has always seemed like a poor choice to me). And if NaCl made it really easy to do, why haven't any of these projects delivered yet?

--
best,
Eliot

Florin Mateoc

Jan 19, 2015, 9:03:50 AM
to newspeak...@googlegroups.com
I just realized that I had hit reply to list (and the chosen list was a Squeak one) instead of reply all, and some of you are most likely not subscribed to the Squeak mailing lists, so I am resending my reply here.

Hi Ryan,

Thank you for your prompt reply.



On 1/18/2015 7:26 PM, Ryan Macnak wrote:
 


> On Sun, Jan 18, 2015 at 12:59 PM, <florin...@gmail.com> wrote:
>> Ryan,
>>
>> Can you please explain what you do for initialization (especially for large arrays, sets, etc)? Assuming that Newspeak also has something like Smalltalk's nil, do you fill them at creation time with nil?
>
> We eagerly assign nil to all slots. For regular objects, it is important that slots for a given class are always initialized in the same order, otherwise a JS engine like V8 won't consider all the instances to be of the same Map (hidden class), things will seem more polymorphic than they are, and optimizations won't happen. For arrays, I don't know whether this is more harmful to performance because we pollute type data about the array's elements with UndefinedObject or this is more helpful because we don't need checks for undefined in #at:. Certainly in terms of implementation effort, eager initialization is much better than sprinkling checks in all the places that access slots.
 


There is no need to check for undefined in #at:, or in any place that just references slots. Undefined can be assigned and passed around in general. The only checks needed are for receivers at invocation time (and even those can sometimes be avoided); if the receiver is undefined, the send is delegated to the UndefinedObject instance.
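
A rough sketch of that receiver check, under stated assumptions: `theNil` is a hypothetical object standing in for the UndefinedObject instance, and `send` is a hypothetical helper that generated code would call for every message send. Neither name comes from an existing implementation.

```javascript
// Hypothetical stand-in for the single UndefinedObject instance.
var theNil = {
  isNil: function () { return true; },
  printString: function () { return "nil"; }
};

// Hypothetical send helper: only the receiver is tested for undefined.
// Slot reads, assignments and argument passing need no check at all.
function send(receiver, selector, args) {
  var target = receiver === undefined ? theNil : receiver;
  return target[selector].apply(target, args || []);
}

var slots = new Array(3); // never filled; elements read as undefined
slots[1] = { printString: function () { return "an Object"; } };

var first = send(slots[0], "printString");  // delegated to theNil
var second = send(slots[1], "printString"); // ordinary receiver
console.log(first, second);
```

Sends whose receiver is known not to be undefined (this, super, literals) could call the method directly and skip the helper.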


>> I was thinking of avoiding that and testing the receiver at every invocation for JavaScript's undefined instead. Of course, this just moves the pain point, I am not sure which is better. Also, the test for nil can be avoided when the receiver is "this" or "super" or some literal - one can optimize this even in other cases with some static analysis.
>
> We do very little static analysis beyond the standard cheats for #ifTrue: and friends. Generally, it makes implementing reflection much more difficult. dart2js is a good example of a compiler with this problem.
>
>> I also don't understand the line:
>> "NS2JS and NS2V8 both map Newspeak's basic types onto JavaScript's basic types by installing functions on the prototypes of Number, String, etc. We apply strict mode, so these functions do not operate on boxed values."
>>
>> E.g. the following snippet works:
>>
>> "use strict"
>> Number.prototype.test = function() {return 5};
>> var n = 2;
>> n.test()
>>
>> So what does it mean that "these functions do not operate on boxed values"?
>
> When the function test is not in strict mode, every invocation requires the allocation of a number object to use as the receiver. In strict mode, the receiver is the number value directly. (ECMAScript 5.1 10.4.3)


Ah, ok, so you meant that by using strict mode, "these functions" operate more efficiently for primitive values, not that they do not work when the values are boxed.

Florin