> * No global/static state - I've found that, with very few exceptions, this leads to more modular and composable designs. If you're compiling a language that depends on global state, you can always add an implicit first parameter to every method call that's a "global state" object.Would this return the new global state or mutate the original state?
Hi Bob,I saw and read a few of your blog posts on Magpie, and they definitely piqued my interest. There are several things I particularly like about the design of Magpie:
* Multimethods (as far as I can tell) are the _only_ way to define / call methods
* One argument, one return value
* minimal syntax (no dot between callee and method, etc)
In some cases, I would have made different decisions in designing a language, but these are mostly quibbles and stylistic choices, and thus of lesser importance. Great job!
What follows is largely a brain dump; feel free to skim or ignore completely. :)I've on and off dabbled in designing and implementing my own language, but I've never gotten as far as you have with Magpie. One of the things I learned in doing so is that I'm actually much more interested in the internals of the language - the compiler and the virtual machine (as applicable). More fundamentally, I'm interested in the platform - the interface between the compiler and the virtual machine - the bytecode, the JIT compiler, the runtime services, etc.Where do you see Magpie going, in terms of the environments it runs on, interoperability with other languages, etc? Will it forever run in it's own VM, isolated and lonely? Will it compile to, for instance, java and CLR bytecode? Perhaps it will grow a javascript backend?
I ask, because in my mind, if you intend to go the direction of interoperability, one of the most important considerations is what your IR looks like. It's not something to be squirreled away, only for VM and compiler hackers. It should be highly-visible and well understood.
Below are a few of my own thoughts on the good design of an IR / platform / whatever. I'd be interested in discussing how (and if) these line up with your vision for Magpie.* Simplicity - It should be possible to bootstrap a brain-dead-simple interpreter in very little code.
My yardstick would be "around a couple thousand lines in pick-your-favorite-language." To this end, I think the only possible form of control flow (branching and looping, as well as function invocation) should be invoking a multimethod.
If statements, for loops, and other language-level control structures should be compiled down to potentially recursive invocations between a group of methods. Method implementations at the language level will become several IR-level methods, roughly corresponding to basic blocks in SSA form.* Static typing - This admits a much wider variety of static analysis, and many other benefits that I don't really want to argue here. The point I'd like to make, in relation to Magpie, is that static and dynamic typing, particularly in the context of multimethods, actually get along quite well.
When compiling a dynamicly-typed language down to a staticly-typed platform that supports multimethods, a call just become a cast to an implicitly-defined (structurally typed) interface, followed by a normal type-checked method call.* Few IR-level primitives - things like strings can be implemented cleanly as arrays (or lists, equivalently) of characters. Characters are really just integers, and even integers can be broken down into arrays of bits. Bits of course, can be different instances of some type.
This doesn't mean anything but a brain-dead-simple interpreter would actually implement them directly like that. Ideally, the VM would recognize the structure of the types and a set of core operations on them (addition, multiplication, concatenation, ...) and lower them to fast hardware operations. Worst case, the types in the standard library can be annotated for the benefit of the VM, to indicate where it can perform this optimization.* No global/static state - I've found that, with very few exceptions, this leads to more modular and composable designs. If you're compiling a language that depends on global state, you can always add an implicit first parameter to every method call that's a "global state" object.
* Strict dynamic code loading - Dynamically generating and loading code should be possible, but only in a new "VM context" (something like AppDomain in the .NET world). As a corollary, you should never be able to add (or delete or replace) implementations in a multimethod. You can always create a new context, deriving from the current context, and add you're new implementation there - but even then, the new code can't affect the dispatch of method calls within the original context.
--------------------As a side note, I've been contributing to a small JVM called Avian (https://github.com/ReadyTalk/avian). It comes with a simple JIT compiler, and while it can't compete with hotspot in terms of performance (it's not a push-over either),
it's much more hackable. I've long been interested in porting a new language / runtime to Avian. Magpie looks like a compelling choice. Thoughts?
Other back-ends would be fantastic too. One tricky part is that fibers are pretty core to Magpie so you'll have to figure out how make those cheap and lightweight. This is relatively easy on the C++ VM because the callstacks are just objects. It doesn't use the native C stack. That may be trickier on the CLR or JVM, but I don't know. I think there's an Erlang port to the JVM. I wonder how it handles processes?
But I think you're talking more about being able to share Magpie code that's been compiled to some IR between implementations? The C++ VM does have a bytecode format, but I appreciate the luxury of having that not be "standardized". It gives an implementation freedom to tailor it to that implementation.
I definitely find that appealing. Magpie used to be much more in that vein but (ironically enough) it became less so when I took out single dispatch and added multimethods. Most of the control flow structures were syntax extensions implemented in Magpie.My limited experience was that it was a really cool idea in terms of implementing the language, but it wasn't as useful in terms of ending up with a practical, useful language.
For what it's worth the current bytecode format for the VM does not consider a multimethod invocation to be a primitive. Picking the right method from a set of methods is actually a pretty complicated chunk of branching logic. So that decision tree gets compiled down to more primitive operations (equality checks, type tests, and branches).
If statements, for loops, and other language-level control structures should be compiled down to potentially recursive invocations between a group of methods. Method implementations at the language level will become several IR-level methods, roughly corresponding to basic blocks in SSA form.* Static typing - This admits a much wider variety of static analysis, and many other benefits that I don't really want to argue here. The point I'd like to make, in relation to Magpie, is that static and dynamic typing, particularly in the context of multimethods, actually get along quite well.I like static types but (and maybe this is just my problem) I find designing a good type system to be fantastically hard. I tried to strike a balance with Magpie where the language is dynamically typed but there are all these type patterns everywhere that look an awful lot like type annotations. My hope is that a smart implementation can eventually use those to make optimizations. And if Magpie ever does have tooling support, those type patterns will certainly help.
I think that works less well when you start worrying about encodings. Text is hard.
I'm definitely not a fan of global state, especially global mutable state. Magpie has top-level state at the module level, but modules don't implicitly share any state.
One thing I really wanted to do with Magpie is have the top level of a program be pure definitions, not imperative code, like it is in C, C++, Java, C#, etc. I think that makes things like static compilation (say to JS) and tooling much easier.
Unfortunately, I wasn't able to figure out a design where that made sense with the rest of Magpie's features. Since method patterns can reference top-level variables, which can in turn be intialized by calling methods (even new is a method after all), I ended up just admitting defeat and making the top-level imperative code like it is in most "scripting" languages. (In practice, its closest to Scheme where the top level is imperative but has implicit support for mutual recursion.)
I'm a big fan of Gilad Bracha's work on mirrors. (My day job is on the Dart language project, which Gilad also works on.) When Magpie gets some reflective capabilities, they'll likely be based heavily on that.
I would be really interested for you to try that and see how it goes. Having another Magpie implementation, especially by someone else will very likely help flush out ambiguities and weird corners of the language. (Not to mention having another place where people can run Magpie programs.)
Bob,Thanks for the in-depth reply!Other back-ends would be fantastic too. One tricky part is that fibers are pretty core to Magpie so you'll have to figure out how make those cheap and lightweight. This is relatively easy on the C++ VM because the callstacks are just objects. It doesn't use the native C stack. That may be trickier on the CLR or JVM, but I don't know. I think there's an Erlang port to the JVM. I wonder how it handles processes?You might be interested to know that Avian has native support for continuations (ala scheme). If by fibers, you mean something like green threads, coroutines, etc., I think they won't be too difficult to implement on Avian. Is the idea to have something like goroutines (from Go)?
Interesting. I don't have any concrete evidence, but I would expect keeping such a (near) minimal IR would confer a couple of advantages: it should be easier to grow a collection of tools around the platform/IR, and (more significantly, I think) optimizations have the potential to be much more general. If branches are just multimethod invocations, you can merge the efforts of optimizing branches and multimethods. Multimethods calls that are functionally just simple branches (perhaps they dispatch on one parameter, between two possible types) can be optimized equally with ifs and loops, for instance.
While this isn't a problem when you have access to the source code (in the compiler), it limits the ability to have separate compile and link steps. In the case of Magpie, this would limit the ability to maintain a cache of pre-compiled code, as python does in .pyo and .pyc files (IIRC).
I like static types but (and maybe this is just my problem) I find designing a good type system to be fantastically hard. I tried to strike a balance with Magpie where the language is dynamically typed but there are all these type patterns everywhere that look an awful lot like type annotations. My hope is that a smart implementation can eventually use those to make optimizations. And if Magpie ever does have tooling support, those type patterns will certainly help.I haven't stumbled upon a multimethod type system that I'm fully satisfied either. Perhaps the first step, proof-of-concept could be a linter that checks you Magpie code for type consistency, inferring types as appropriate.
I think that works less well when you start worrying about encodings. Text is hard.A very good point. I wouldn't try to come up with my own encoding, but rather pick a standard encoding (utf-8, for instance) and stuff it into a byte array (which itself can be easily, rigorously modeled from first principles).
Unfortunately, I wasn't able to figure out a design where that made sense with the rest of Magpie's features. Since method patterns can reference top-level variables, which can in turn be intialized by calling methods (even new is a method after all), I ended up just admitting defeat and making the top-level imperative code like it is in most "scripting" languages. (In practice, its closest to Scheme where the top level is imperative but has implicit support for mutual recursion.)I can't immediately point out a solution, but I wouldn't give it up as hopeless.
I've been browsing through the code for the C++ VM. It's very nice & clean!
I'd like to make a few suggestions: (impact)* (minor) I'm running on OS X, but I'm a great fan of the terminal (and not so much of xcode). I was able to modify the run_gyp script to generate a Makefile, but it'd be nice if this were a little more accessible.
* (major) I've really liked the idea of making operators _partially_ ordered, instead of completely ordered. Obviously, all of the commonly-accepted operator precedences remain the same ([+, -] < [*, /], [and, or] < [==, !=], ...), but some precedence relationships are left undefined ( [..., ..] ? [+, -], [is] ? [==, !=], perhaps others). For the precedence orderings that are undefined, the compiler forces the programmer to use parens to signal their intent; omitting parens in such cases is a syntax error.
This has the potential to improve the readability of code for people who aren't intimately familiar with the language's precedence hierarchy. For example, "1 .. 2 + 3" is somewhat ambiguous to the uninformed reader. If the programmer is forced to write "1 .. (2 + 3)" (because the ordering between .. and + is undefined), it makes the code much clearer. What do you think?
-Joshua