Fixing a corner case with method patterns

24 views
Skip to first unread message

Bob Nystrom

unread,
May 5, 2012, 5:31:45 PM5/5/12
to magpi...@googlegroups.com
One of the things I like about Magpie is how it uses patterns,
records, and destructuring both for things like pattern matching but
also for method parameters. Getting a lot of mileage out of a few
primitives is good, so once you have records/tuples, you might as well
use them for argument lists, right?

So far, it's worked pretty well. However, there is one corner case of
how patterns and records work that bugs me. Let me walk through some
examples. Consider this simple method:

def takeTwo(a, b)
print(a + " " + b)
end

If we call it like this:

takeTwo(1, 2)

It prints "1 2" just like we expect. If we call it with extra arguments:

takeTwo(1, 2, 3)

It *still* prints "1 2". The "a, b" pattern that the method uses for
its argument is destructuring the "1, 2, 3" record and it only cares
about the first two fields. I think this is generally pretty cool.
After all, the record we're passing to it *does* have the fields that
takeTwo expects, it just happens to have some extra ones which get
ignored.

What happens if we don't give it enough?

takeTwo(1)

Here, we get a NoMethodError. That's good too. takeTwo() requires a
record with two fields, and it can't be invoked if we don't meet its
requirements. No weird "undefined" arguments like JS.

So far, I think these semantics are fine. Now let's change it a bit:

def takeOne(a)
print("<" + a + ">")
end

If we call it like this:

takeOne(1)

It of course prints "<1>". What happens if we give it too many?

takeOne(1, 2)

If you guessed "<1>", you unfortunately guessed wrong. It prints "<1,
2>". We pass a single argument (the record "1, 2") to takeOne() and
its pattern ("a") just binds that to a variable. There's no
destructuring at all since "a" is just a simple variable pattern, so
we bind the entire record to that variable.

I think the ability to have records be first-class is good. But I
think the default behavior here is surprising and unpleasant.

For completeness, let's take another example, passing too few arguments:

takeOne()

Are you expecting NoMethodError? Again, you'd be wrong. Here we're
still passing an argument to takeOne(), the implicit (but still very
real) value "nothing", which is basically Magpie's "null". "nothing"
is a perfectly cromulent object, and variable patterns like "a" in
takeOne() bind to it with no problems. So here, we just pass "nothing"
to takeOne(), which then prints "<nothing>".

Again, I think that behavior is unpleasantly surprising.

## The Solution ##

I think the core issue here is that methods that (conceptually) take
multiple arguments in Magpie destructure, but methods like takeOne
that only take a single argument don't. That inconsistency is
confusing. It would be like if Java implicitly did rest param behavior
(the ... syntax) on every method that happened to take a single
parameter.

Weird, right?

Here's my proposed fix. When you define a method, it's parameter
pattern is *always* a record pattern. So this:

def takeOne(n)
...
end

Is implicitly identical to:

def takeOne(0: n)
...
end

(The "0:" syntax here is actually valid Magpie now. It means a record
whose field is named "0". In other words a 1-element tuple.)

So methods always take records and always destructure. It the method's
pattern is *already* a record, we don't again wrap that in a
one-element record, so takeTwo() here would be unchanged. This is how
we make one-parameter methods consistent with others.

Likewise, when you *call* a method, if you only pass a single
argument, that implicitly gets wrapped in a one-field record.

Given that, let's go through the examples again:

takeOne(1)

Here, since "1" is a single argument, it implicitly gets wrapped in a
1-tuple, as "0: 1". The pattern for takeOne is "0: a", so it
destructures out the "1", and prints "<1>" as expected.

takeOne(1, 2)

Here, "3, 4" is already a record, so we don't wrap it. So "0: 1, 1: 2"
gets passed to takeOne(). That matches "0: a", so we destructure out
the first field "1" and print "<1>" as desired. Good!

takeOne()

Here, we're passing "nothing". Magpie will not wrap that in a record
(it only does that when there is *explicitly* one argument), so
"nothing" gets passed straight in. That doesn't match the record
pattern "0: a" so we get a NoMethodError like we expect.

## One Loss

This does come at a slight loss of genericity. Magpie currently lets
you do this:

var b = 1, 2
takeTwo(b)

Here, we're stuffing an entire record into "b" and then calling
takeTwo(). It gets that record, destructures it and prints "1 2". So
you can call methods of any "arity" generically since all methods
actually do only take a single argument.

With this change, the above wouldn't work. The parser will see that
there is only a single argument to takeTwo and implicitly wrap it in a
record, so this would pass "0: (1, 2)" to takeTwo and not just the raw
record.

For clarities' sake, that's actually probably an improvement since
someone looking at this code would likely be surprised at the above
behavior. But it does prohibit you from doing some stuff you might
want to do.

I think this can be fixed at some point, by introducing an explicit
"don't box me bro" syntax, exactly like the spread operator in JS. So
you'd do something more or less like:

var b = 1, 2
takeTwo(b...)

And the "..." (or whatever other syntax) means "I know this looks like
just one argument, but don't mess with it".

## Bonus

This loss of generality actually has a really nice bonus too. This
change means that we can tell from the *text* of the callsite what
kind of record a method is being called with. That means we can use
that *statically* to bind to a more precise method. That makes it much
easier to make Magpie go fast.

For example, consider:

def say(a)
print("one " + a)
end

def say(a, b)
print("two " + a + " " + b)
end

def callSay(c)
say(c)
end

callSay(1)
callSay(1, 2)

Imagine we are compiling "callSay" to some kind of bytecode. Right now
with the current semantics, all we can do is compile to a call to a
single "say" multimethod that contains both specializations. That's
because we can't tell until *runtime* whether "c" is a record that
matches the second specialization or not.

With this proposed change, we always know the argument's record
pattern at compile time. That means say(a) and say(a, b) are basically
overloads that we can resolve statically. We can get a call to say(a,
b) and compile it directly to a call to the right specialization and
pass the set of arguments straight. Otherwise (without trying to be
more sophisticated in the compiler), we have to:

1. Allocate and create a record to hold "1, 2".
2. Pass it to the "say" multimethod.
3. Go through all of the specializations and find the best match.
4. Destructure out "1" and "2" into "a" and "b".

So this should make it easier to optimize the VM.

This *doesn't* mean that *all* dispatch is static. That would suck!
Multimethods still do runtime dispatch for things like type and value
patterns, and for records nested within the top-level method pattern.
That's where all the fun magic of multimethods come into play.

If you have:

def say(a is Int)
print("a number")
end

def say(a is String)
print("a string")
end

Here when you call say(), we'll still be picking the best method at
runtime, which is what's great about multimethods.

Overall, I think this is an improvement that cleans up a nasty corner
case that's bugged me for a while. But it's entirely possible I
haven't thought through the ramifications of this all the way.
Thoughts?

- bob

PS. A quick status update to reward you for making it this far.
Progress on the real C++ VM is going well. There's still a ton of work
to do, but I feel like it's going smoothly. There's a suite of tests
under test/ that all pass on the new VM. The GC seems to more or less
work. So far, performance seems to be in the ballpark of other dynamic
languages like Python, which makes me really happy since I haven't
done any optimization yet and there's plenty of low-hanging fruit.

If things keep going like they're going, Magpie may end up being in
the same league performance-wise with other successful dynamic
languages, which would be awesome.

Mike Austin

unread,
May 6, 2012, 5:16:07 AM5/6/12
to magpi...@googlegroups.com
At first glance, this makes sense and looks like a good way to go - no more special cases for 1 argument functions.  I actually don't like variable arguments since it complicates a language - might as well wrap it up in an array and be more explicit.  The splat operator in Ruby is handy, but sometimes looks too much like magic.

I'm glad to see Magpie is making progress.  Mike

Alexander Solovyov

unread,
May 30, 2012, 8:09:26 AM5/30/12
to magpi...@googlegroups.com
On Sun, May 6, 2012 at 12:31 AM, Bob Nystrom <munifi...@gmail.com> wrote:
> PS. A quick status update to reward you for making it this far.
> Progress on the real C++ VM is going well. There's still a ton of work
> to do, but I feel like it's going smoothly. There's a suite of tests
> under test/ that all pass on the new VM. The GC seems to more or less
> work. So far, performance seems to be in the ballpark of other dynamic
> languages like Python, which makes me really happy since I haven't
> done any optimization yet and there's plenty of low-hanging fruit.

BTW, on topic of speed/etc - did you consider using PyPy for writing Magpie VM?

--
Alexander

Bob Nystrom

unread,
May 30, 2012, 12:58:48 PM5/30/12
to magpi...@googlegroups.com
I did, briefly. The blog post (that I can't seem to find now) that
someone wrote about using PyPy for their language VM was fascinating.

Technically speaking, it may be that using that for Magpie's VM would
end up being faster. But I also have to factor in:

1. What kind of codebase is easy for me and others to hack on.
2. Keeping things simple and comprehensible.
3. How much time I have to spent wading through code and learning some
tech stack versus writing code and getting stuff done.
4. Toolchain complexity.

Writing a vanilla VM in C++ makes all of those pretty straightforward,
especially given that I already know C++ fairly well. I worry that
PyPy and RPython would just suck up all of my time and take the fun
out of hacking on the VM.

- bob
Reply all
Reply to author
Forward
0 new messages