Thanks,
/Autrijus/
Not unless you want to write the Halting engine that determines that 3
is in fact more specific than 2..10. It's based on definition order,
so that if you have dependencies in your conditions (which you
oughtn't), you'd better define the multis together to get well-defined
semantics.
> > The upshot is that these are now errors:
> >
> > sub foo ($x) is rw { $x }
> > my $a;
> > foo($a) = 4; # runtime error - assign to constant
>
> I assumed lvalue subs would implicitly return void and an
> assignment goes to the function slot of the args used in the assignment
> and subsequent calls with these args return exactly this value.
> In that respect arrays and hashes are the prime examples of lvalue
> subs. Other uses are interpolated data, Delaunay triangulation etc.
Well, in the absence of optimization, what's usually going on is that
the lvalue sub is returning a tied proxy object, which you then call
STORE on.
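To make the proxy mechanics concrete, here is a rough sketch in Python (the `FETCH`/`STORE` names mirror Perl's tie protocol; everything else here is invented for illustration, not how Pugs actually implements it):

```python
# Sketch of an lvalue sub returning a tied proxy: assignment to the
# call becomes a STORE on the proxy, and reads become a FETCH.
class Proxy:
    def __init__(self, fetch, store):
        self._fetch = fetch
        self._store = store

    def FETCH(self):
        return self._fetch()

    def STORE(self, value):
        self._store(value)

def make_lvalue_sub():
    cell = {"x": None}          # backing storage for the lvalue
    def foo():
        # An "is rw" sub returns a proxy instead of a plain value.
        return Proxy(lambda: cell["x"],
                     lambda v: cell.__setitem__("x", v))
    return foo

foo = make_lvalue_sub()
foo().STORE(4)                  # what "foo($a) = 4" desugars to
print(foo().FETCH())            # 4
```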
Luke
> Not unless you want to write the Halting engine that determines that 3
> is in fact more specific than 2..10. It's based on definition order,
> so that if you have dependencies in your conditions (which you
> oughtn't), you'd better define the multis together to get well-defined
> semantics.
That seriously sucks.
Multis rock because they let you append to an interface from your
perspective.
If it's just a pretty form of casing, then we aren't gaining
anything, IMHO.
--
() Yuval Kogman <nothi...@woobling.org> 0xEBD27418 perl hacker &
/\ kung foo master: /me groks YAML like the grasshopper: neeyah!!!!!!
http://svn.openfoundry.org/pugs/docs/mmd_match_order.txt
Values may be compiled into where clauses which are eventually just
a big given/when behind the scenes, but the order in which they are
checked must be integrated with type checking, and must be sorted to
make sense.
If you cannot define a more particular case of a method in order to
optimize for:
* speed
* simplicity
then you lose a lot of what makes MMD a useful tool for post-OOP.
In code which was preplanned for MMD, and that was properly ordered, MMD
is useful as a subset of its behavior - it's just pattern matching.
This is nice, but has none of the extensibility that MMD can offer
if done differently.
He meant:
http://svn.openfoundry.org/pugs/docs/notes/mmd_match_order.txt
Luke
> > the one defined LATER in the file wins
That should read
"the one defined in the LATER file wins"
=)
> If we're going to make a choice for the user (something we usually
> avoid), we might as well go with the one that I would pick :-)
Blah blah blah, write a pragma, blah blah blah.
I tend to agree on generic -> specific, but if it is to be read like
given/when in a way, which arguably it is behind the scenes, then
maybe we should make given { } take the statements in a block and
execute them from last to first?
> I like the idea of your tree of match order, I just don't like the
> tree itself too much.
It isn't a tree... see below
> If we're going to reorder things for the user,
> it does need to happen in a predictable way, even if it's not correct
> 100% of the time. I find your tree to be pretty complex (that could
> be because I don't understand the reasoning for the ordering
> decisions). I'd prefer something more like:
>
> 1. Constants
> 2. Junctions / Ranges
> 3. Regexes
> 4. Codeblocks
This is pretty much the same as what I proposed...
The sub-points are usually clarifications, not a tree... Did you
actually read it?
It discusses types, roles, inheritance, and so forth, as well as
measuring the "specificity" of junctions of values and types. It's
long because it goes into detail.
> Where none of them is recursively decended into for matching. That
> particular order has no special significance, it just felt natural.
> I'm just pointing out that it should be simple[1].
I agree with simplicity.
Please read the sequence I proposed - it's trying to define in
great detail the simplest rules I can think of.
> it is only simple and predictable when you have the whole class
> heirarchy in your head.
That's why under the fourth step I detailed that MI confusions are
a fatal error, possibly at compile time.
--
() Yuval Kogman <nothi...@woobling.org> 0xEBD27418 perl hacker &
/\ kung foo master: *shu*rik*en*sh*u*rik*en*s*hur*i*ke*n*: neeyah!!!!
Hmm. I wonder if we should just make the later ones win in all cases.
Generally when I structure code, I find it most natural to go from
general to specific. If we're going to make a choice for the user
(something we usually avoid), we might as well go with the one that I
would pick :-)
I like the idea of your tree of match order, I just don't like the
tree itself too much. If we're going to reorder things for the user,
it does need to happen in a predictable way, even if it's not correct
100% of the time. I find your tree to be pretty complex (that could
be because I don't understand the reasoning for the ordering
decisions). I'd prefer something more like:
1. Constants
2. Junctions / Ranges
3. Regexes
4. Codeblocks
Where none of them is recursively descended into for matching. That
particular order has no special significance, it just felt natural.
I'm just pointing out that it should be simple[1].
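The four-bucket ordering above can be sketched as a flat sort plus first-match dispatch (Python for illustration, since this is all speculative syntax anyway; the bucket names and predicates are invented):

```python
# Sketch of the proposed flat ordering: each multi variant is tagged
# with the kind of pattern it matches on, candidates are tried bucket
# by bucket, and nothing is recursively descended into.
RANK = {"constant": 0, "junction": 1, "range": 1, "regex": 2, "code": 3}

def order_variants(variants):
    """variants: list of (kind, predicate, name); stable sort by bucket."""
    return sorted(variants, key=lambda v: RANK[v[0]])

def dispatch(variants, arg):
    for kind, pred, name in order_variants(variants):
        if pred(arg):
            return name
    raise LookupError("no matching multi")

variants = [
    ("code",     lambda x: x > 0,        "positive"),
    ("range",    lambda x: 2 <= x <= 10, "small"),
    ("constant", lambda x: x == 3,       "three"),
]
print(dispatch(variants, 3))   # constants bucket wins: "three"
print(dispatch(variants, 7))   # "small"
print(dispatch(variants, 99))  # "positive"
```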
Still, I very much agree with your desire to be able to extend someone
else's interface, which we can solve by messing with the tiebreaking
order.
Luke
[1] That is also my complaint about the Manhattan metric for
multimethod resolution: it is only simple and predictable when you
have the whole class hierarchy in your head.
--
() Yuval Kogman <nothi...@woobling.org> 0xEBD27418 perl hacker &
/\ kung foo master: /me tips over a cow: neeyah!!!!!!!!!!!!!!!!!!!!!!
I suppose I was mostly commenting on the junctions part. I'm
proposing that All Junctions Are Created Equal. That is, there is no
specificity measuring on junctions. I also didn't really understand
your right-angle-tree-ratio measure. Does it have a name, and is
there a mathematical reason that you chose it?
Anyway, I think that once we start diving inside expressions to
measure their specificity, we've gotten too complex to be predictable.
Luke
Eek! no.
I think guards (our where closures which I call where clauses) are
enough... =)
If you want to optimize simple where clauses by introspecting their
PIL, that's a different story =)
--
() Yuval Kogman <nothi...@woobling.org> 0xEBD27418 perl hacker &
It's really nothing more than a metaphor.
method bark ((Dog | Sheep) $inv:) {
}
method bark ((Retriever & Shepherd) $inv:) {
}
When I dispatch a 'bark' method on my $uber_dog, I'd like it not to
be a sheep. The way this is weighted is basically that '&' is more
particular than '|'.
The tree dimensions thing deals with nested junctions. Basically you
"draw" out the tree on some 2d space, where the root is (0,0), &
combinations are drawn on the Y axis, and | combinations are drawn
on the X axis.
Parent axes are stretched to fit their children's drawings.
Here are two examples:
((Dog & Retriever) | (Sheep & Stupid))
Dog Sheep
| |
+------x------+
| |
Retriever Stupid
((Dog | Sheep) & (Retriever | Stupid))
Dog-----|-----Sheep
|
x
|
Retriever--|-----Stupid
By graphically spanning the tree you can see if it tends to be wide
(ORish) or deep (ANDish). The score of the junction is the ratio of
the length on the x axis, over the length on the y axis.
The bigger the combinator is (the | in the first example, the
& in the second one), the bigger the line in the picture will be,
because the boxes under the lines must fit in the spaces (hence the
right angles bit) between the combinator's line and its siblings'.
This only shows up clearly in structures of 3 levels and up:
((Dog & (Retriever | Shepherd)) | Sheep)
Dog
|
Sheep------------x------------|
|
|
Retriever--|--Shepherd
((Dog & (Retriever & Shepherd)) | Sheep)
Dog
|
Sheep------x-------|
|
_+_
Retriever
|
|
|
Shepherd
(assume for fairness that strings are a block, 1x1, and each line is
1 block thick)..
Of course, you don't need to do that - you just do depth-first
traversal, calculate the junction's "dimensions", and push upward.
The eventual junction will either be 'ANDish' or 'ORish', and
the tendency of the junctions is their order.
If there are candidates which are too close together for the same
parameter under a single shortname, the user should be warned.
For example, in the stupid sheep/retrieving dog example, you don't
want both definitions, but (Dog & Retriever) is definitely more
specific than (Sheep | Dog).
We discussed this a bit on #perl6, and we concluded that my tree
example was stupid. If you have a better explanation, please commit
it to the pugs repo, even if you don't agree with this - just for
the sake of clarity (because I can't explain it any better).
> Anyway, I think that once we start diving inside expressions to
> measure their specificity, we've gotten too complex to be predictable.
I think that may be right, but just for junctions it's very
tempting.
--
() Yuval Kogman <nothi...@woobling.org> 0xEBD27418 perl hacker &
> http://www.cs.washington.edu/research/projects/cecil/www/Papers/predicate-classes.html
Regardless of MMD, I think this is an interesting concept on its
own.
class Moosish does pred:where {
... # a where clause
} {
# class def
}
Does this mean that conflicting signatures assure that only one
'where' clause passes?
the pred trait accepts a higher-order type as its arg, and just
merges it with its methods' types by hooking the metamodel's
'add_method' method, or whatever it's called (Stevan?).
I'm not sure I know how to opportunistically 'staticize' this,
though.
Interesting paper, although admittedly I only skimmed it.
--
() Yuval Kogman <nothi...@woobling.org> 0xEBD27418 perl hacker &
/\ kung foo master: /me dodges cabbages like macalypse log N: neeyah!
I'll be glad to work on it, yes, and thanks for sending it.
I would definitely appreciate any help that other p6l folks can provide
in putting these into an appropriate form for the Synopses.
Pm
I would like to point out that for mere mortals, *any* MMD is already too
complex to be predictable. Some people can't even predict SMD. :-/
Regardless of the MMD policy (or range of policies) we allow/enforce,
I think we need to consider what the naive user is to do in the face
of the (to them) black box of MMD. I wonder if there's some way to
annotate a call to say exactly which routine you expect it to call, and
then if MMD dispatches elsewhere, you get some kind of a warning that
tells you exactly why it chose the other routine over your routine.
It doesn't have to dump the whole MMD decision tree on them, but
merely say something like "Argument $foo picked a constrained type
over an unconstrained type". Or "Argument $bark picked Dog with
distance 1 over Mammal with distance 2". Or "Argument $bark picked
'Dog where...' with distance 1-ε over Dog with distance 1".
Unless we can feed more specific information to the naive user
in an easily digestible form, I'm still inclined to say that *any*
constraint just subtracts "epsilon" from the distance, and if you don't
write your constraints to be mutually exclusive within a single file,
and you depend on the dispatch order to distinguish, it's erroneous.
(We could still subtract additional values of epsilon for later files
to make Yuval happy--or at least less unhappy...)
Actually, a naive user probably doesn't even want to see the epsilons.
We could go as far as to make it:
Argument $bark picked Dog where... with distance 0.99
over Dog with distance 1
Then Yuval's overriding files can be distance 0.98, 0.97, 0.96, etc.
An epsilon of .01 should be small enough for anyone. (Or at least
any engineer.)
The warner should also detect ambiguities in constraints if we make all
constraints in a file the same epsilon. I just showed the warning for
a single argument, but it should probably tell the distance on all
the arguments that differ, and maybe even calculate the overall distance
for them.
Of course, if we make the MMD rules sufficiently complicated, we'll
just have to make the warning spit out a spreadsheet to show the
calculations. Then we hide all that behind an interview process,
just like all our wonderful tax preparation software...
Larry
Epsilons are a bit like handwaving, in the sense that they're not as
clear cut.
I'd rather they be unexposed for the most part (see below for
an exception), but that in general different sets of constraints are in
a totally different ballpark of weighting.
I think any ambiguity that is not explicitly resolved from the
caller (by means of disambiguation syntax that is like the
annotation syntax, whatever it may be) should be an error
(preferably compile time, if the type of the value we're dispatching
on is inferred or declared).
> (We could still subtract addtional values of epsilon for later files
> to make Yuval happy--or at least less unhappy...)
;-)
I actually think that definition order is now irrelevant, with the
exact semantics I proposed.
Rob Kinyon had a strong argument (in #perl6) that anything that
depends on load order is bound to make someone's head hurt.
He has a point.
> Then Yuval's overriding files can be distance 0.98, 0.97, 0.96, etc.
> An epsilon of .01 should be small enough for anyone. (Or at least
> any engineer.)
Hmm... that's a useful hack, but I don't think it's much more than a
last-resort type of hack. Either way, I will always place line-long
comments on exactly why I'm adding that value. I'd rather have my
code document itself, much like functional pattern matching.
> Of course, if we make the MMD rules sufficiently complicated, we'll
> just have to make the warning spit out a spreadsheet to show the
> calculations. Then we hide all that behind an interview process,
> just like all our wonderful tax preparation software...
I think MMD's weakness in this respect is that for it to be
intuitively useful, it needs just the right amount of complexity.
You don't want to think about numerical ranking, or placement in a
tree of loaded modules which you need to read 10 files to find out
(or maybe that you can't find out at all (perhaps a parrot
disassembler helps in some cases)), and you don't want to think
about rules of a complicated system either.
You'd like, ideally, for 95% of the cases to be intuitive as far as
writing goes (writing either being taking care of functionality or
using it). When writing you usually know what values you're dealing
with (or at least you're resolving to know later). In that case, if
you can make simple rules with some perl that you know, and they are
intuitively ranging from specific to generic in their constraint,
you should be happy. For when you're confused or forgetful and the
system isn't DWIM enough, I guess 4.5% can be dealt with using
errors.
I don't know of a method to take care of the remaining 0.5%
elegantly, but I think that discouraging the use of MMD just to
lower the actual number behind that 0.5% is a bigger mistake.
--
() Yuval Kogman <nothi...@woobling.org> 0xEBD27418 perl hacker &
/\ kung foo master: MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM: neeyah!
It's not clear to me why we shouldn't *also* have a parse-tree level that is
taken to be consistent with whatever the current language is, as long
as we retain enough information to deparse it into the current language,
or to compile down to PIL.
: * The "Hash", "Int", "Str" etc are just roles:
: role Hash[?::returns=Any, ?::shape=Str] {
: }
: implementation classes are known as "PerlHash", "PerlInt" etc.
I thought part of the reason for allowing roles to act as classes via
an anonymous class is to allow both the role and the class in question
to be referred to via name "Hash".
: * Filehandles open chomped by default; you need to add the `:unchomped` flag
:   to turn off chomping.
s/unchomped/newlines/
: my $fh = open 'file'; # autochomp
: my $fh = open 'file', :newlines; # non-autochomp
: $fh.newline = '\r'; # now split on \r
: $str = $fh.readline;
: $str.newline; # \r
:
: for chomp =$fh { ... }
:
though perhaps it would be less confusing with :newline if we renamed
:unchomped to :savenl instead. Or maybe change 'newline' to 'nl'.
In that case, we have to differentiate them a little better than with
just a trailing 's', since open can take either of them. I think I
vote for :newlines and :nl("\r"). (Along with .newlines as boolean
and .nl as string/rule attributes on the handle, and a .nl string/match
attribute on the input string.)
: If .newline is a rule, then its captured variables are made available to the
: calling block as if it has done a matching.
More precisely, If $fh.nl is a rule, then $str = =$fh sets $/. The match
object is also returned as $str.nl.
: * `&prefix:<int>` now always means the same thing as `&int`. In the symbol table
:   it's all stored in the "prefix" category; &int is just a short-name way of
:   looking it up -- it's just sugar, so you can't rebind it differently.
Might end up actually storing the short form of those and inferring the
'prefix' as needed. Dunno.
: * Constrained types in MMD position, as well as value-based MMDs, are _not_
: resolved in the type-distance phase, but compile into a huge given/when
: loop that accepts the first alternative. So this:
:
: multi sub foo (3) { ... }
: multi sub foo (2..10) { ... }
:
: really means:
:
: multi sub foo ($x where { $_ ~~ 3 }) { ... }
: multi sub foo ($x where { $_ ~~ 2..10 }) { ... }
:
: which compiles two different long names:
:
: # use introspection to get the constraints
: &foo<ANONTYPE_1>
: &foo<ANONTYPE_2>
:
: which really means this, which occurs after the type-based MMD tiebreaking
: phase:
:
: given $x {
: when 3 { &foo<ANONTYPE_1>.goto }
: when 2..10 { &foo<ANONTYPE_2>.goto }
: }
:
: in the type-based phase, any duplicates in MMD is rejected as ambiguous; but
: in the value-based phase, the first conforming one wins.
See subsequent discussion, including the part that hasn't happened yet. :-)
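The two-phase scheme in the quoted notes can be sketched like this (Python for illustration; the point is that value-based candidates are kept in definition order and the first conforming one wins, unlike the type-based phase where duplicates are ambiguous):

```python
# Sketch of the value-based phase: where-clauses compile into one big
# given/when, tried in definition order, first match wins.
def make_value_multi():
    clauses = []                        # kept in definition order
    def define(pred, body):
        clauses.append((pred, body))
    def call(x):
        for pred, body in clauses:      # the big given/when
            if pred(x):
                return body(x)
        raise LookupError("no applicable multi")
    return define, call

define, foo = make_value_multi()
define(lambda x: x == 3,       lambda x: "exactly three")
define(lambda x: 2 <= x <= 10, lambda x: "in 2..10")
print(foo(3))    # first clause wins: "exactly three"
print(foo(5))    # "in 2..10"
```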
Larry
>Rob Kinyon had a strong argument (in #perl6) that anything that
>depends on load order is bound to make someone's head hurt.
>
>He has a point.
>
>
Especially if one is working in something like mod_perl, where the order
various modules were actually loaded in can vary greatly from the order
they are listed in the source code.
Unless we have every lexical scope keep track of what order *it* thinks
all the MMD methods *should* have been loaded in, which overall feels
very painful.
A thought I've had is whether there should be a "subname" that can be
defined on a given multi, to identify it as distinct from the others,
without having to type the full signature. Something analogous to
HTTP/HTML # suffixes. One could then use that subname in conjunction with
the short name to refer to a specific method. This could then let a user
easily skip MMD when DWIMmery fails. To be useful, it would need
simple syntax. I'll propose forcing "# as comment" to be "\s+# as
comment" (if it isn't already), and have subnames specified as
shortname#subname.
multi method foo#bar (Num x) {...}
multi method foo#fiz (String x) {...}
$y = 42;
$obj.foo#fiz($y); # even though $y looks like a Num
$obj.foo($z); # let MMD sort it out.
It's unclear if
$obj.foo<String>($y);
even works, or should work, even if it does.
It by no means solves all of Yuval's problems, but it would be a handy
workaround to un-multi your calls.
-- Rod Adams
> multi method foo#bar (Num x) {...}
> multi method foo#fiz (String x) {...}
>
> $y = 42;
> $obj.foo#fiz($y); # even though $y looks like a Num
> $obj.foo($z); # let MMD sort it out.
>
Having additional tags might also give us something to hang priority
traits off: "foo#bar is more_specific_than(foo#baz);" might influence
the order of clauses in the implicit given/when block. It feels like
there should be a generalization of operator precedence here (even
though the two are superficially dissimilar, the looser/tighter concept
appears valid).
I like that =)
--
() Yuval Kogman <nothi...@woobling.org> 0xEBD27418 perl hacker &
/\ kung foo master: /me has realultimatepower.net: neeyah!!!!!!!!!!!!
Intuitively I'd say $obj.foo(String<$y>) or something like that...
$obj.foo<String> reads like MMD on the return value to me, and in
that case I'd prefer
String<$obj.foo($y)>
or maybe a type is a part of the context? Then we can use C casting
syntax, and it'll actually make sense.
(where { ... })$value
;-)
--
() Yuval Kogman <nothi...@woobling.org> 0xEBD27418 perl hacker &
/\ kung foo master: /me supports the ASCII Ribbon Campaign: neeyah!!!
On Jul 8, 2005, at 4:25 PM, Dave Whipp wrote:
> Rod Adams wrote:
>
>
>> multi method foo#bar (Num x) {...}
>> multi method foo#fiz (String x) {...}
>> $y = 42;
>> $obj.foo#fiz($y); # even though $y looks like a Num
>> $obj.foo($z); # let MMD sort it out.
>>
Instead of changing the parse rules for #, why not just use a trait?
multi method foo is short_name('bar') {...}
> Having additional tags might also give us something to hang
> priority traits off: "foo#bar is more_specific_than(foo#baz);"
> might influence the order of clauses in the implicit given/when
> block. It feels like there should be a generalization of operator
> precedence here (even though the two are superficially dissimilar,
> the looser/tighter concept appears valid).
Although I like the idea of reusing this concept, I'm not sure that
it really solves the problem. Fundamentally, we're trying to make
MMD behave intuitively with no programmer effort.
--Dks
> Could we break them out into separate threads so that our poor summarizer doesn't go
> bonkers?
See? That's what specialization/particulation is good for. Thanks
for strengthening my point!
>
> On Jul 8, 2005, at 4:25 PM, Dave Whipp wrote:
>
>> Rod Adams wrote:
>>
>>
>>> multi method foo#bar (Num x) {...}
>>> multi method foo#fiz (String x) {...}
>>> $y = 42;
>>> $obj.foo#fiz($y); # even though $y looks like a Num
>>> $obj.foo($z); # let MMD sort it out.
>>>
>
>
> Instead of changing the parse rules for #, why not just use a trait?
>
> multi method foo is short_name('bar') {...}
I thought about that, but then thought that to become commonplace it was
a bit much to type. I also couldn't come up with a way to call a given
multi that matches on a given attribute, without adding even more
complexity to MMD.
>
>> Having additional tags might also give us something to hang priority
>> traits off: "foo#bar is more_specific_than(foo#baz);" might
>> influence the order of clauses in the implicit given/when block. It
>> feels like there should be a generalization of operator precedence
>> here (even though the two are superficially dissimilar, the
>> looser/tighter concept appears valid).
>
>
> Although I like the idea of reusing this concept, I'm not sure that
> it really solves the problem. Fundamentally, we're trying to make
> MMD behave intuitively with no programmer effort.
Well, if one views MMD as "a list of methods to try, each with its own
requirements on its arguments", then it can completely solve the
problem, along with a method sort function.
1) take all methods the user specified "higher than/lower than/equal to"
out of the mix.
2) sort remaining methods via a standardized function.
3) put all the ones taken out in step 1 back in, where they are requested.
4) scan the methods, in order, for the first that accepts the given
arguments.
5) dispatch to the chosen one in #4
-or-
6) begin AUTOMETHing, etc.
Then all we need is a DWIMish sort function.
Some ideas:
-- longer parameter lists go before shorter ones.
-- if param(n) of one ISA param(n) of another, it goes first.
-- slurpies after non-slurpies
-- a hashkey of the parameter types (for deterministic coin flips)
I'm not committed to what goes into the method sort function, or in what
order, just the concept of it. To me it seems easier to visualize than
distances, etc. If nothing else, it should be easy to explain to users
and programmers.
With the name tagging idea from before, one could then say things like:
multi sub foo#lastresort (*@_) is after(foo#default) {...}
for when the default sort does things incorrectly.
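The five-step pipeline above (pull out the user-pinned methods, sort the rest, reinsert the pins, then first-accepting-wins) can be sketched like so (Python; the `foo#...` names and the `after` pin mechanism stand in for the proposed `is after(...)` trait and are illustrative only):

```python
# Sketch of Rod's "list of methods to try" pipeline.
def default_key(m):
    # default sort: slurpy variants last, longer parameter lists first
    return (m["slurpy"], -m["arity"])

def order_methods(methods, after=()):
    """after: pairs (a, b) meaning method named a goes right after b."""
    pinned = {a for a, _ in after}
    base = sorted((m for m in methods if m["name"] not in pinned),
                  key=default_key)
    for a, b in after:                  # reinsert pins where requested
        m = next(m for m in methods if m["name"] == a)
        base.insert(next(i for i, x in enumerate(base)
                         if x["name"] == b) + 1, m)
    return base

def dispatch(methods, args, after=()):
    for m in order_methods(methods, after):
        if m["accepts"](args):          # scan in order, first acceptor wins
            return m["name"]
    raise LookupError("time to AUTOMETH")

methods = [
    {"name": "foo#generic",  "arity": 1, "slurpy": False,
     "accepts": lambda a: True},
    {"name": "foo#specific", "arity": 1, "slurpy": False,
     "accepts": lambda a: isinstance(a[0], int)},
]
print(dispatch(methods, (42,)))   # default order: foo#generic wins
print(dispatch(methods, (42,), after=[("foo#generic", "foo#specific")]))
# the pin demotes foo#generic, so foo#specific wins
```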
A reasonable extension of this would be to have a coderef attribute
that determines whether a supplied set of arguments is acceptable, rather
than the default check. This is a possible MTOWTDI for the 'where' clauses.
Then again, there are likely several glaring problems with this idea
that I'm just not seeing at the moment.
-- Rod Adams
> Then all we need is a DWIMish sort function.
Think of parameter list shape (slurpiness, arity) as a mold you can
fit stuff into.
Then it becomes a simple matter of reducing the match list to your
compatible subs.
Then see
http://svn.openfoundry.org/pugs/docs/notes/mmd_match_order.txt which
proposes a DWIMish sort function.
--
() Yuval Kogman <nothi...@woobling.org> 0xEBD27418 perl hacker &
/\ kung foo master: /me has realultimatepower.net: neeyah!!!!!!!!!!!!
> I would like to point out that for mere mortals, *any* MMD is already too
> complex to be predictable.
This is the relevant observation here.
This particular mortal's experience is that more than four variants, involving
parameters from more than two hierarchies, make it nearly impossible to
predict all the consequences of MMD.
That's why Class::Multimethods provides coverage and ambiguity-detection
tools, which I expect Perl 6 will need too.
> Regardless of the MMD policy (or range of policies) we allow/enforce,
> I think we need to consider what the naive user is to do in the face
> of the (to them) black box of MMD. I wonder if there's some way to
> annotate a call to say exactly which routine you expect it to call, and
> then if MMD dispatches elsewhere, you get some kind of a warning that
> tells you exactly why it chose the other routine over your routine.
> It doesn't have to dump the whole MMD decision tree on them, but
> merely say something like "Argument $foo picked a constrained type
> over an unconstrained type". Or "Argument $bark picked Dog with
> distance 1 over Mammal with distance 2". Or "Argument $bark picked
> 'Dog where...' with distance 1-ε over Dog with distance 1".
These are exactly the kind of coverage tools I mentioned above. I think it would
suffice to have a module that provides an C<is targeting> trait.
> Unless we can feed more specific information to the naive user
> in an easily digestible form, I'm still inclined to say that *any*
> constraint just subtracts "epsilon" from the distance, and if you don't
> write your constraints to be mutually exclusive within a single file,
> and you depend on the dispatch order to distinguish, it's erroneous.
I very strongly support this approach. Perhaps with the elaboration that each
re-specialization subtracts an additional epsilon. So I could distinguish:
type SingleDigit := Int where [0..9];
type Three := SingleDigit where 3;
multi sub foo(Int n) {...} #1
multi sub foo(SingleDigit n) {...} #2
multi sub foo(Three n) {...} #3
foo(3); # dispatches to #3 (distance = -2ε)
foo(4); # dispatches to #2 (distance = -ε)
foo(43); # dispatches to #1 (distance = 0)
> (We could still subtract addtional values of epsilon for later files
> to make Yuval happy--or at least less unhappy...)
This would make Damian very unhappy as it discriminates against good
development practices like refactoring code into modules.
> We could go as far as to make it:
>
> Argument $bark picked Dog where... with distance 0.99
> over Dog with distance 1
>
> Then Yuval's overriding files can be distance 0.98, 0.97, 0.96, etc.
> An epsilon of .01 should be small enough for anyone. (Or at least
> any engineer.)
Magic numbers are a Really Bad Idea. We managed to avoid them for both
operator precedence and regular MMD. It would be a real shame to introduce
them here.
And I think they're unnecessary. Cumulative infinitesimal epsilons from
cumulative C<where> modifiers do the job just as well, and have the distinct
advantage of not restricting specializations to 99 levels.
> The warner should also detect ambiguities in constraints if we make all
> contraints in a file the same epsilon. I just showed the warning for
> a single argument, but it should probably tell the distance on all
> the arguments that differ, and maybe even calculate the overall distance
> for them.
Again, Class::Multimethods has prior art for this approach. See the
demo.analyse.pl example included in the distribution.
Damian
I guess I have used MMD more than most people in this discussion. Indeed,
having both written Class::Multimethods and supervised a PhD that involved
adding MMD to C++, one might have assumed that I've already "served my
sentence" ;-).
Nevertheless, all that experience has convinced me that the simpler the
dispatch rules, the more usable the resulting MMD mechanism is. That's why
I've consistently advocated uniform Manhattan distance over
"left-most-best-fit". That's why I've always recommended an explicit C<is
default> marker. That's why I was opposed to any particular numerical epsilon
value. That's why I don't favour special treatment for junctions or multiple
inheritance trees.
The goal is always the same: to find a parameter list that most accurately
matches the argument list, taking into account the type generalizations
introduced by inheritance and the type specializations introduced by C<where>
clauses.
So, in my view the MMD mechanism ought to be something like:
1. Gather all visible variants with a compatible number of
parameters (taking into account the requirements of any C<where>
constraints)
2. If there are no such variants, throw a "no such multi" exception
3. Work out the Manhattan distance from the argument list to each
variant's parameter list.
4. If there is a unique minimum, call that variant
5. Otherwise, discard every variant whose Manhattan distance
isn't minimal
6. Work out the degree of specialization of each remaining variant
(i.e. the total number of C<where> specializations on the
variant's complete set of parameters)
7. If there is a unique maximum, call that variant
8. Otherwise, if there is a compatible variant with a C<is default>
trait, call that variant
9. Otherwise, throw an "ambiguous call" exception.
This is a much less dwimmy solution than Yuval's or Luke's, but it has the
advantage that those nine steps reduce to eight words:
Unique least-inherited most-specialized match, or default
which will fit into most people's heads and still DWIM most of the time.
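That rule ("unique least-inherited most-specialized match, or default") can be sketched end to end (Python; the variant representation here, with precomputed distance lists and where-counts, is invented for illustration):

```python
# Sketch of the proposed mechanism: Manhattan distance first, then
# number of 'where' specializations, then 'is default'.
def dispatch(variants, args):
    """Each variant: dict with 'dist' (fn returning per-parameter
    inheritance distances, or None if a where-constraint fails),
    'wheres' (count of where specializations), 'default', 'name'."""
    viable = [v for v in variants if v["dist"](args) is not None]
    if not viable:
        raise LookupError("no such multi")
    scored = [(sum(v["dist"](args)), v) for v in viable]
    best = min(d for d, _ in scored)
    finalists = [v for d, v in scored if d == best]
    if len(finalists) == 1:                  # unique minimum distance
        return finalists[0]["name"]
    most = max(v["wheres"] for v in finalists)
    finalists = [v for v in finalists if v["wheres"] == most]
    if len(finalists) == 1:                  # unique most-specialized
        return finalists[0]["name"]
    for v in finalists:                      # fall back to the default
        if v["default"]:
            return v["name"]
    raise LookupError("ambiguous call")

variants = [
    {"name": "foo(Int)", "wheres": 0, "default": False,
     "dist": lambda a: [1] if isinstance(a[0], int) else None},
    {"name": "foo(Int where <10)", "wheres": 1, "default": False,
     "dist": lambda a: [1] if isinstance(a[0], int) and a[0] < 10 else None},
]
print(dispatch(variants, (5,)))   # foo(Int where <10)
print(dispatch(variants, (42,)))  # foo(Int)
```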
Note that specializations enter into the decision process at two points:
initially, they must be satisfied if the variant is to be considered "viable";
later, they are used as tie-breakers when resolving ambiguities.
Using them to select the initial set of viable candidates is critical. If I have:
multi sub foo (Int $x where { $^x < 10 }) {...}
multi sub foo (Num $x) {...}
then I almost certainly want a call to:
foo(42);
to successfully call the second variant, rather than throwing an exception like:
Can't call multi sub foo (Int $x where { $^x < 10 }) when $x is 42.
Keeping C<where> clauses as the sole "more specialized" marker also gives the
developer *more* control than adding extra rules for junctions would. For
example, someone might prefer to treat all junctions as being equally special:
multi sub foo(Int&Str) {...}
multi sub foo(Int) {...}
or treat junctions as more special:
multi sub foo((Int|Str) where Any) {...}
multi sub foo(Int) {...}
or treat junctions as less special:
multi sub foo(Int|Str) {...}
multi sub foo(Int where Any) {...}
In each case, these variations in "significance" are now explicitly and
consistently marked.
Damian
OK, sorry if I missed this in an earlier discussion. For purposes of
calculating this Manhattan distance, I gather that we're treating lists of N
arguments/parameters as points in N-space. I further assume that the
monoaxial distance between a parameter coördinate and the corresponding
argument coördinate - the distance between two types, where the types are
known to be assignment-compatible - is the number of inheritance steps
between them?
And one more dumb question: why is it that the L[1] metric is superior to
the L[2] metric for this purpose?
The geometric interpretation does bring us into somewhat philosophical
territory. Not that that's anything new on this list. :)
Let me try a concrete example. Suppose that class Answer has subclasses
Animal, Vegetable, and Mineral, with respective subclasses Dog, Potato, and
Diamond. There are two methods named foo in scope, neither overriding the
other. One is declared to take (Animal, Vegetable, Mineral), the other
(Dog, Potato, Answer). Assuming the obvious memberships, which method
should foo(Snoopy, Mr_PotatoHead, HopeDiamond) call? And more importantly,
why do you feel that is the right answer?
According to Damian's metric, we have distances of 0+0+2=2 and 1+1+1=3, so
(Dog, Potato, Answer) is "closer" and would get called.
It doesn't seem to make much practical sense. Multimethods are
generally written to be exclusive of ancestral methods. Ordinary
methods are generally written to be cumulative with ancestral methods.
Larry
I just noticed that our rewrite doesn't quite work unless you rewrite
every "when" clause in the first form to also return $_, since "when"
blocks would escape past the return of the $_. Any form of "leave"
could have the same problem. I think the proper semantics of "but"
are that it ignores any return value however generated and pretends
the topic was returned. In fact, the original closure should probably
be evaluated in void context. So it's doing something more complicated
like:
my $foo = do given Cls.new {
    given $_ {
        .attr = 1;
    }
    $_;
};
But hey, that just makes the monkey-but sugar seem all the sweeter. :P
Larry
> OK, sorry if I missed this in an earlier discussion. For purposes of
> calculating this Manhattan distance, I gather that we're treating lists of N
> arguments/parameters as points in N-space. I further assume that the
> monoaxial distance between a parameter coördinate and the corresponding
> argument coördinate - the distance between two types, where the types are
> known to be assignment-compatible - is the number of inheritance steps
> between them?
Correct. This is the usual underlying metric, regardless of MMD scheme.
> And one more dumb question: why is it that the L[1] metric is superior to
> the L[2] metric for this purpose?
The use of summed lineal distance (L[1]) rather than RMS distance (L[2])
probably *isn't* superior as a closeness measure. But it's computationally
much simpler (and hence likely to be more efficient), it doesn't suffer from
precision issues in "photo finishes", and it is almost certainly easier for
the average programmer to predict correctly.
That said, I'd have no *particular* objection to an MMD implementation that
used RMS inheritance distance as its metric, provided the dispatch performance
was not appreciably worse.
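A quick illustration of how the two metrics can rank candidates differently (editorial sketch, not from the original mail): with per-position distances (0, 2) versus (1, 1), L[1] calls it a dead heat, while L[2] prefers the more balanced candidate.

```python
import math

def l1(dists):
    """Summed lineal distance (Manhattan metric)."""
    return sum(dists)

def l2(dists):
    """Root-mean-square-style Euclidean distance."""
    return math.sqrt(sum(d * d for d in dists))

a, b = (0, 2), (1, 1)
print(l1(a), l1(b))   # tie under L[1]: a "photo finish"
print(l2(a), l2(b))   # b wins under L[2], at the cost of floating-point math
```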
Damian
>> Unique least-inherited most-specialized match, or default
>
>
> Do I read this correctly as dispatching partly in the class hierarchy
> and partly in the type hierarchy?
Err. The class hierarchy *is* the type hierarchy in Perl 6.
> Or do you mean with 'least-inherited'
> most specific non-where type and with 'most-specialized' the strictest
> where clause?
I mean: least cumulative derivation distance summed over the set of parameter
types, with greatest *number* of parameter specializations as a tie-breaker
(if required).
Damian
> Is there also an answer to the ordering versus metric question?
> Why was the former rejected and the latter chosen?
MMD is really just a partitioning of the discrete N-dimensional search space
formed by the class hierarchies of the N parameters of a multimethod/multisub.
A linear summation (or other geometric) metric ensures that the search space
is completely partitioned by whatever variants are defined for a given multi.
A "pure ordering" dispatch scheme generally leaves large regions of the search
space unclaimed by any variant, necessitating the definition of numerous extra
variants solely for the purpose of disambiguation.
This larger number of ambiguous (and therefore undispatchable) argument lists
reduces the utility and forward compatibility of "pure ordering" multiple
dispatch, by forcing the developer to specify a much larger number of
variants, most of which will then have to delegate to a single variant (which
is typically the one a linear metric would have chosen anyway).
There's a strong parallel with allowing shift-reduce ambiguities in LR
parsers. You *could* make the developer explicitly cover all the possible
interpretations of a given sequence of symbols, but only at the cost of
exponentially increasing the number of grammar rules required.
Similarly, since the number of potential variants is the Cartesian product of
the total sizes of the class hierarch(y|ies) for each parameter position,
getting adequate coverage of the MMD search space quickly becomes tedious if
most of the search space is (by default) ambiguous, as in "pure ordering"
dispatch schemes.
One very common problem with pure ordering schemes is that subsequent changes
in one or more class hierarchies can cause previously unambiguous cases to
become ambiguous, by extending a zone of ambiguity in the search space. In
contrast, because a metric approach always fully partitions the entire search
space, hierarchy changes may alter where a particular call dispatches to, but
only ever to a "closer", more appropriate variant.
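The contrast can be shown on the earlier Dog/Potato/Diamond example (hierarchy repeated; the `dominates` test is a hypothetical rendering of "pure ordering", where one candidate must be at least as specific in *every* position):

```python
PARENT = {
    "Animal": "Answer", "Vegetable": "Answer", "Mineral": "Answer",
    "Dog": "Animal", "Potato": "Vegetable", "Diamond": "Mineral",
}

def steps(arg, param):
    d, c = 0, arg
    while c != param:
        c, d = PARENT[c], d + 1
    return d

def dominates(p, q, args):
    """p beats q under pure ordering iff it is at least as close in every
    position and strictly closer in at least one."""
    dp = [steps(a, t) for a, t in zip(args, p)]
    dq = [steps(a, t) for a, t in zip(args, q)]
    return all(x <= y for x, y in zip(dp, dq)) and dp != dq

args = ("Dog", "Potato", "Diamond")
p = ("Dog", "Potato", "Answer")
q = ("Animal", "Vegetable", "Mineral")

# Neither candidate dominates the other: pure ordering declares ambiguity.
print(dominates(p, q, args), dominates(q, p, args))
# The metric still partitions the space: p's sum (2) beats q's (3).
print(sum(steps(a, t) for a, t in zip(args, p)))
```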
Note too that Perl 6 *will* still support a form of "pure ordered" dispatch--a
left-most-closest-match scheme like that used by CLOS--via "invocant groups":
multi sub Foo(A: B: C:) {...}
multi sub Foo(A: D: C:) {...}
multi sub Foo(F: B: G:) {...}
This, of course, is not "global pure ordering", but rather "left-biased pure
ordering".
To summarize: pure ordering renders more of the MMD search space "ambiguous",
which is potentially safer but much less DWIMish. Metric schemes have far
fewer ambiguities and usually dispatch in a predictable way. Metric schemes
can still provide pure-ordering analyses via static analysis tools or special
warning modes, but pure ordering schemes can't avoid ambiguities or DWIM.
Of course, none of this prevents:
use MMD <pure>;
Damian
> My understanding is that a Role is an abstract (i.e. cannot be
> instantiated) blob of methods and, possibly, data. The purpose of a
> Role is to paste it into a class--in other words, a Role is a not a
> type, it is a part of a type.
Nope, it's a type. What is a type besides a named blob of methods and,
possibly, data?
> The point of this is to force the programmer to give a reasonable
> constraint...when they expect something that Wags, does it need to be
> a Dog, or will any Animal that can Wag be sufficient? Or are they
> really looking for absolutely anything that is capable of Wagging?
The latter.
As soon as you require signatures to enforce *how* the arguments fulfill
a type constraint, you've violated a principal design goal of roles,
namely that it's more important that something fulfills the requirements
of type ("RobotDog understands bark() in the context of Doglike, not
Treelike") than that it fulfills the requirements of the type by
inheriting from a base class, delegating to a contained object, applying
a role, or whatever other way you can imagine.
-- c
My understanding is that a Role is an abstract (i.e. cannot be
instantiated) blob of methods and, possibly, data. The purpose of a
Role is to paste it into a class--in other words, a Role is a not a
type, it is a part of a type. Assuming that that understanding is
correct, then I would expect the above multi declarations to be a
compile time error: 'multi (Wag $x)' means "expect some data, call
it $x, which will be of type Wag"...but Wag isn't a type, it's a
partial type.
I would require that declaration to be written as one of the following:
multi (Dog does Bark $x)     # Dog is an actual type that can be instantiated
multi (Animal does Bark $x)  # Animal is an abstract base class
multi (Object does Bark $x)  # Object (or whatever the name is) is the root
                             # of the entire class/type hierarchy
I could see stretching a point and allowing the last one to be
rewritten as:
multi (does Bark $x)         # Still references the root type, but implicitly
The point of this is to force the programmer to give a reasonable
constraint...when they expect something that Wags, does it need to be
a Dog, or will any Animal that can Wag be sufficient? Or are they
really looking for absolutely anything that is capable of Wagging?
Doing it this way would (I think) be both more intuitive and would
resolve a lot of the ambiguity that was discussed farther on.
multi (does Bark $x)
multi (Dog does Bark $x)       # Dog is more specific than "any data type"
---new example
multi (Tree does Bark $x)
multi (Dog does Bark $x)       # Equally specialized classes in separate parts
                               # of the hierarchy. If passed an "Object does
                               # Bark" param, this should be an error.
---new example
multi (Labrador does Bark $x)  # Labrador is a subclass of Dog and therefore
                               # more specific if passed a Labrador param
multi (Dog does Bark $x)       # This catches any other 'Dog' params
multi (does Bark $x)           # This catches anything else that Barks,
                               # including 'Dingo is Dog does not Bark'
> The interesting question is whether N should
> be 0 or 1 (or possibly epsilon). If we have
>
> multi (Bark $x)
> multi (Dog $x)
>
> arguably Dog is more specialized than Bark,
> [...]
> The problem with N == epsilon is in visualizing how Dog is more
> constrained
> than Bark.
Assuming that the declaration of Dog is 'class Dog does Bark', then
yes, I think Dog should be considered more specialized than Bark--
lots of things might Bark, but only a Dog will bark in precisely this
way. Or, to put it a different way, a 'Dog' class has a certain
specificity and a 'Bark' Role has a certain specificity: the
combination of the two produces something more specific than either.
(And, of course, 'Dog does Bark does Wag' (or 'Dog does Bark&Wag' if
you prefer) is yet more specific.)
--Dks
Implicit is that role distance is N + the distance to the nearest
class that incorporates that role, for small values of N.
If class Dog does role Bark and also does role Wag, then passing a
Dog to
multi (Bark $x)
multi (Wag $x)
should result in ambiguity. The interesting question is whether N should
be 0 or 1 (or possibly epsilon). If we have
multi (Bark $x)
multi (Dog $x)
arguably Dog is more specialized than Bark, which says N maybe
shouldn't be 0. Whether it should be epsilon or 1 depends on how
you think of role composition, and whether you think of roles more
like immediate parent classes or generic paste-ins. You can think
of them as immediate parent classes, but in that case they're actually
closer than a real parent class that would have distance 1, since
role methods override parent class methods. That argues for N < 1.
The problem with N == epsilon is in visualizing how Dog is more constrained
than Bark. It is constrained, but its constraint is that a class is
required to be about a real object (or a fake real object in the case
of the Class object), whereas a role is not so constrained. This is
inside out from the view that "a role is a class that isn't allowed
to create an object", but while that's stated as a constraint, it's
a constraint in the service of more generality rather than more
specificity. So it's more accurate to say something like "a class is
a role that is constrained to deal with concrete objects". It's
sort of a "where exists" constraint.
So I could be happy with either 0 or epsilon. 0 is simpler, and fits
better with the role flattening world view. Epsilon has the additional
problem that it's, er, additive rather than subtractive in this case.
If we say Bark where Wag, it's simultaneously more specialized and
less specialized. I think epsilons should be restricted to explicit
"where" counting, though I'm liberal on the subject of whether those
explicit "where" mentions are in the actual signature or in a subtype
declaration somewhere. I suspect these should be equivalent distance
in where-space:
sub ( $x of Int where { $_ % 2 } )
subtype Odd of Int where { $_ % 2 }
sub ( Odd $x )
I suppose a case could be made that a Bark could be considered more
constrained than a Dog, which would solve the addition/subtraction
problem, but it still violates the explicit-where dictum. So I'm still
in favor of 0 distance for role composition.
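The "N == 0" choice can be modelled as a shortest-path problem over a mixed graph, where inheritance edges cost 1 and role-composition edges cost 0 (class and role names are illustrative, not from any spec):

```python
from heapq import heappush, heappop

# Edges: (target, cost). Inheriting from a class costs 1; doing a role costs 0.
EDGES = {
    "Dog":    [("Animal", 1), ("Bark", 0), ("Wag", 0)],
    "Animal": [],
    "Bark":   [],
    "Wag":    [],
}

def distance(frm, to):
    """Cheapest path cost from a class to a type (Dijkstra), or None."""
    seen, queue = set(), [(0, frm)]
    while queue:
        cost, node = heappop(queue)
        if node == to:
            return cost
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in EDGES.get(node, []):
            heappush(queue, (cost + w, nxt))
    return None

print(distance("Dog", "Bark"))    # 0: role composition is free...
print(distance("Dog", "Wag"))     # 0: ...so Bark vs Wag candidates tie (ambiguity)
print(distance("Dog", "Animal"))  # 1: a real parent class is one step away
```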
Note, however, that run-time mixins can increase distance by one if
they have to create an anonymous class. Additional mixins may or
may not increase the distance depending on whether the mixin mechanism
creates additional anonymous classes or just adds more roles to
the existing anonymous class. The former approach is probably cleaner,
and we don't really need the latter mechanism to compose multiple
mixins into an anonymous class simultaneously, assuming we can stack
$mammal does Bark does Wag;
and get some kind of list-associative simultaneous precedence. Or maybe
that should be spelled
$mammal does Bark&Wag;
Larry
> On Wed, 2005-07-13 at 16:07 -0400, David Storrs wrote:
>
>
>> My understanding is that a Role is an abstract (i.e. cannot be
>> instantiated) blob of methods and, possibly, data. The purpose of a
>> Role is to paste it into a class--in other words, a Role is a not a
>> type, it is a part of a type.
>>
>
> Nope, it's a type. What is a type besides a named blob of methods
> and,
> possibly, data?
A label that says how the data is stored internally. For example,
compare "Int" and "int". The former is a type and a blob of methods
and not necessarily abstract. The second is a type, is NOT a blob of
methods, and is definitely NOT abstract.
That was part of why I was careful to say "an ***abstract**** [...]
blob of methods...." (Emphasis added.)
>> The point of this is to force the programmer to give a reasonable
>> constraint...when they expect something that Wags, does it need to be
>> a Dog, or will any Animal that can Wag be sufficient? Or are they
>> really looking for absolutely anything that is capable of Wagging?
>>
>
> The latter.
Fine. Make them say that, then. Making people think about what they
really want is a good idea.
> As soon as you require signatures to enforce *how* the arguments fulfill
> a type constraint, you've violated a principal design goal of roles,
> namely that it's more important that something fulfills the requirements
> of type ("RobotDog understands bark() in the context of Doglike, not
> Treelike") than that it fulfills the requirements of the type by
> inheriting from a base class, delegating to a contained object, applying
> a role, or whatever other way you can imagine.
That's a reasonable point. However, I wasn't deliberately using
'does Bark' to mean 'has been composed with the role Bark'. I was
using it to mean "is able to perform the interface specified by
Bark". So, if you prefer, I'll invent a new term: isableto. (Which
I think is a lousy choice for the term, but I don't care what color
that bike shed is.)
Note that there are two subtly different kinds of type constraints
under discussion: the first is what class something is, and the
second is what behavior it can exhibit. The two are very slightly
different:
multi foo(Dog $x);                         # Must be of class Dog, or a class
                                           # that is a descendant of Dog
multi foo(Dog $x where { not $x.feral });  # Must be of anonymous class "class
                                           # Dog (or descendant) with $.feral
                                           # set to false"
multi foo(Dog isableto Bark);              # Must be of class Dog (or
                                           # descendant), and must be able to
                                           # Bark
multi foo(Object isableto Bark);           # Absolutely any object so long as
                                           # it can Bark...how it does that is
                                           # up for grabs, though.
multi foo( isableto Bark);                 # Same as previous
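The "isableto" distinction the poster is drawing (a coined keyword, not real Perl 6) has a close Python analogue: a structural check on behavior versus a nominal check on class, sketched here with `typing.Protocol`:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Bark(Protocol):
    """Pure interface: anything with a bark() method 'isableto Bark'."""
    def bark(self) -> str: ...

class Dog:
    def bark(self) -> str:
        return "Woof"

class RobotDog:   # unrelated to Dog by class, but it can bark
    def bark(self) -> str:
        return "Beep-woof"

def foo_nominal(x):
    # "multi foo(Dog isableto Bark)": class check AND behavior check
    return isinstance(x, Dog) and isinstance(x, Bark)

def foo_structural(x):
    # "multi foo(isableto Bark)": any object, so long as it can Bark
    return isinstance(x, Bark)

print(foo_nominal(RobotDog()))     # right behavior, wrong class
print(foo_structural(RobotDog()))  # behavior alone suffices
```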
The distinction I'm drawing is between pure behavior and behavior+state.
I'm drawing this based on the idea that Roles are not
allowed to have state--that they are pure virtuals, only used for
defining interface. If that is not true, then (A) I apologize for
the wasted bandwidth and (B) I'd like to have it explained what Roles
offer that justifies their existence, since they won't be anything
but a restricted form of a class.
--Dks
> > What is a type besides a named blob of methods
> > and, possibly, data?
> A label that says how the data is stored internally. For example,
> compare "Int" and "int". The former is a type and a blob of methods
> and not necessarily abstract. The second is a type, is NOT a blob of
> methods, and is definitely NOT abstract.
I contend that that's meaningless to Perl. Why does the internal
representation of data matter if the interface is the same? In Perl 5
terms, should you not be able to pass a tied hash to a module that
expects a hash because the internal implementation may be very
different?
If yes, then we have very different ideas about types and signatures and
I'm not sure we can reconcile our views.
I realize that it could appear that I've shifted the example to
container types away from your argument about value types, but I don't
think it's an important distinction here.
> Making people think about what they really want is a good idea.
I agree, but only as far as what they really want is actually what they
really want. To put it another way, if you have a method that prints
something passed in, do you want a signature that says "This must be a
string, perhaps with a specified encoding and internal representation"
or a signature that says "This must be something that stringifies"?
I want the latter.
> Note that there are two subtly different kinds of type constraints
> under discussion: the first is what class something is, and the
> second is what behavior it can exhibit.
I think the former is silly, almost always useless, and nearly always
wrong, especially in library code.
> The two are very slightly different:
>
> multi foo(Dog $x); # Must be of class Dog, or a class that is a descendant of Dog
No, I think it must conform to the type of Dog, whether it is an
instance of the Dog class, an instance of a subclass of Dog, or
something that performs the Dog role.
Why does its internal implementation matter? What are you doing inside
foo() that its internals matter?
> multi foo(Dog $x where { not $x.feral }); # Must be of anonymous class "class Dog (or descendant) with $.feral set to false"
Again, I think it must conform to the type of Dog, but with the
additional constraint.
> The distinction I'm drawing is between pure behavior and behavior+state.
> I'm drawing this based on the idea that Roles are not
> allowed to have state--that they are pure virtuals, only used for
> defining interface. If that is not true, then (A) I apologize for
> the wasted bandwidth and (B) I'd like to have it explained what Roles
> offer that justifies their existence, since they won't be anything
> but a restricted form of a class.
Roles do have state.
They exist because:
1) hierarchies aren't the only way to express type relationships -- nor
are they often even a good way
2) conformity to an interface is more important than uniformity of
internal representation, especially in a language with late binding
3) non-hierarchical code-reuse is possible -- and very useful
4) making no distinction between class, role, and type names makes
genericity and polymorphism much easier and much more powerful
Aside from optimization and introspection, why is it useful for a method
signature to say "This has to be a Dog or something that inherits from
Dog"?
-- c
Please check your assumptions.
In addition to what chromatic said, I'd like to point out that you've
got the abstraction levels backwards by my lights: these days I
tend to think of the class as a restricted form of role. A class is
restricted to having to provide a working interface to real objects.
A role doesn't have to do that--it's just a semantic slice of interface
and maybe some handy default behavior. It's a bit of generic code
that doesn't have to make complete sense in isolation, as long as some
real class can make some sense of it when combined with other things.
Looking at it another way, roles are attempting to provide the
stable semantic currency that other languages try to provide with
final classes. Because roles are composed rather than inherited,
they avoid many of the problems that final classes engender in such
languages. Class finalization (as instigated by the class itself)
will someday be viewed as a tremendous botch, I think.
Larry
> On Wed, 2005-07-13 at 17:33 -0400, David Storrs wrote:
>
>
>>> What is a type besides a named blob of methods
>>> and, possibly, data?
>>>
>
>
>> A label that says how the data is stored internally. For example,
>> compare "Int" and "int". The former is a type and a blob of methods
>> and not necessarily abstract. The second is a type, is NOT a blob of
>> methods, and is definitely NOT abstract.
>>
>
> I contend that that's meaningless to Perl.
Um...huh? Sorry, I'm not trying to be sarcastic, but I honestly
don't understand how you can think that. There are things that you
can do with an Int that you cannot do with an int, so how can this be
a meaningless distinction? Or did I miss a decision saying that
'int' was just a request, and that 'int' data could freely be
autoboxed into an Int if necessary? That's quite possible...I
haven't reread the AESs in a long time and things change often enough
that I admit to not having a solid grip on the current state.
> Why does the internal
> representation of data matter if the interface is the same?
It shouldn't, I agree. My point was that the interface to Int (the
object) is not the same as the interface to int (the low-level
representation). For example, last I heard this was one of the
differences:
my Int $foo = 0 but true;    # no problem
my int $foo = 0 but true;    # problem! nowhere to store the trait
>> Making people think about what they really want is a good idea.
>>
>
> I agree, but only as far as what they really want is actually what they
> really want. To put it another way, if you have a method that prints
> something passed in, do you want a signature that says "This must be a
> string, perhaps with a specified encoding and internal representation"
> or a signature that says "This must be something that stringifies"?
>
> I want the latter.
Excellent, so do I. Our views are not that far apart after all.
>> The two are very slightly different:
>>
>> multi foo(Dog $x); # Must be of class Dog, or a class that is a descendant of Dog
>>
>
> No, I think it must conform to the type of Dog, whether it is an
> instance of the Dog class, an instance of a subclass of Dog, or
> something that performs the Dog role.
Ok, but you're begging the question. I'm trying to illustrate my
definition of the word "type"; it doesn't help for you to say 'it
must conform to the type...' when 'type' is what I'm trying to define.
In my mind, 'class' is one component that goes into making up a
type. So is 'where clause'. So is 'roles that have been composited
in'. Etc. A 'type', in my mind anyway, is a promise about the
interface and state (*) that something will provide. Here are some
examples (illustrative, not definitive):
int $a                       # 4 bytes (on a 32-bit system), can't store traits
Int $a                       # Can do everything 'int' can do, can also store
                             # traits
Int $a where { 0 < $a < 7 }  # Can do everything Int can do, but value is
                             # between 0 and 7 exclusive
I would contend that that list contains 3 types but only one class
(maybe two if you interpret the where clause as creating a new class).
Does this seem like a reasonable definition of 'type' to you?
Because I think we need to make sure we're using the same terms
before we continue this.
(*) I hesitated on whether to include the 'and state' clause there,
but finally decided to leave it. In that clause, I am only referring
to state that is either publicly visible (and therefore, arguably
part of the interface), or that governs (and therefore constrains)
the behavior of the interface. I am not including internal-only state.
Taking your final question out of order:
> Aside from optimization and introspection, why is it useful for a method
> signature to say "This has to be a Dog or something that inherits from
> Dog"?
Because it is an assertion that 'This' will be something that behaves
like a Dog, and has the exposed state that I expect to see from a
Dog. In other words, I agree with you--it's all about the
interface. The point I'm trying to make is that, from the
recipient's point of view, all of these statements are exactly the same:
implements the interface of Dog
=== is of class Dog
=== has been composited with the Role Dog
Given that, why do we need the second version? (Don't say
'TIMTOWTDI'. Extra WTDI are only useful if they have different
nuances.)
> Roles do have state.
Ok, I didn't realize that. My bad.
> They exist because:
>
> 1) hierarchies aren't the only way to express type relationships -- nor
> are they often even a good way
>
> 2) conformity to an interface is more important than uniformity of
> internal representation, especially in a language with late binding
>
> 3) non-hierarchical code-reuse is possible -- and very useful
>
> 4) making no distinction between class, role, and type names makes
> genericity and polymorphism much easier and much more powerful
I agree with all these points. I still don't understand why Roles
need to be a separate thing, with a separate name and separate
semantics. Their purpose is simply to define an interface so that
you can do two things:
1) Make an object/class/etc that implements that interface and
advertises that it does so
2) Constrain a parameter by requiring its argument to implement
that interface
So, why does it matter whether we call the thing that implements that
interface a "Role" or an "interface" or a "class"? Why not call it a
"class" and get rid of the extra term, free up 'does' to be a
constraining term, etc?
--Dks
> I just want to hint the Perl6 community to the fact that there exists
> a US patent on geometric MMD:
Well, fortunately it's really just a patent on the specific combination of a
mathematical isomorphism and a well-known geometric algorithm to *optimize*
method dispatch, rather than on the Manhattan metric itself. So I very much
doubt there's a problem.
If MMD optimization does become an issue, there are plenty of other
non-patented algorithms available. Besides which, even without a highly
optimized dispatch mechanism you can get asymptotically close to the same
performance by jitting an MMD look-up table on-the-fly (possibly with some
table-compression where large hierarchies or many parameters are involved).
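One way that jitted look-up table could be realized is plain memoization keyed on the tuple of argument classes (a toy sketch; the classes and MRO-based metric are editorial stand-ins):

```python
class Answer: pass
class Animal(Answer): pass
class Dog(Animal): pass

# Candidate signatures for a one-parameter multi.
candidates = [(Animal,), (Dog,)]

def steps(cls, target):
    """Inheritance steps, read straight off Python's MRO."""
    return cls.__mro__.index(target)

dispatch_cache = {}

def dispatch(args):
    key = tuple(type(a) for a in args)
    if key not in dispatch_cache:
        # Slow path: filter viable signatures, then apply the metric once.
        viable = [sig for sig in candidates
                  if all(issubclass(k, t) for k, t in zip(key, sig))]
        dispatch_cache[key] = min(
            viable,
            key=lambda sig: sum(steps(k, t) for k, t in zip(key, sig)))
    return dispatch_cache[key]   # fast path: a single dict hit

print(dispatch((Dog(),)))   # first call: resolved via the metric, then cached
print(dispatch((Dog(),)))   # second call: straight from the table
```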
Damian
On Jul 13, 2005, at 7:32 PM, Larry Wall wrote:
> A class is
> restricted to having to provide a working interface to real objects.
Can't there be pure-abstract, non-instantiable classes? Or are you
still considering those to be interfaces to "real objects"? I
suppose, arguably, the class pseudo-object could make that work, but
it seems a sleight-of-hand.
> In addition to what chromatic said, I'd like to point out that you've
> got the abstraction levels backwards by my lights: these days I
> tend to think of the class as a restricted form of role.
Ok, fine, then let's get rid of classes and just use roles. Right
now, it sounds like the functionality of one of them is a proper
subset of the functionality of the other. If that's the case, why do
we need both? Or is there something that one of them can do that the
other cannot and that absolutely cannot be rolled into the other
one? (I hope that made sense.)
--Dks
Can I present an alternative way of viewing them, which I don't think
contradicts what I've understood of them so far from the
Apocalypses and Synopses documents.
First a couple of definitions;
A "runtime class" is a package name, and a collection of functions
that form a "dispatch table". There are actually two tables - one
for private and one for public methods. The public table links to
"superclasses" where further dispatch may occur.
A "method" on a Role or a Class is a function that takes two
implicit arguments; a dispatch table for private method lookups, and
one for public methods lookups. The private dispatch table is bound
when the function is added to a runtime class, but the public
dispatch table is bound as late as possible (or until the class is
closed in some situations).
Here's the meat:
A Class is then a Role that gets its runtime class created, and all
its methods' private dispatch tables bound at declaration time.
A Role, on the other hand, leaves methods that can't be called,
until they are bound into a "runtime class". So they _look_ like
they are flattened as the runtime classes are composed, unless you
are introspecting to the sufficient level.
The "runtime class" would be the ::Type objects, and the real Class and
Role objects what you get from the .meta objects.
This might be a half empty / half full thing, I just thought you might
find that description interesting. Note that I don't deal with "state",
I'm just treating attributes as if all they are is a set of accessor
functions, which store in an unspecified&unimportant location.
Sam.
Damian writes:
> Similarly, since the number of potential variants is the Cartesian product of
> the total sizes of the class hierarch(y|ies) for each parameter position,
> getting adequate coverage of the MMD search space quickly becomes tedious if
> most of the search space is (by default) ambiguous, as in "pure ordering"
> dispatch schemes.
Indeed, pure MMD will be ambiguous in more cases. If you think
narrowly, then it causes you to write many disambiguating cases which
*usually* end up being what a manhattan metric would give you anyway.
But you can get away from that terrible duplication by being more
precise about your types. You can define type classes using empty
roles[1], where you simply organize your type dag into abstractions
that make sense to your dispatcher. The upshot of all this is that it
forces you to think in a way so that the "*usually*" above becomes an
"always". And it involves defining more (very small) types that make
sense to humans, not gratuitously many MMD variants which are pretty
hard to think about.
> One very common problem with pure ordering schemes is that subsequent changes
> in one or more class hierarchies can cause previously unambiguous cases to
> become ambiguous, by extending a zone of ambiguity in the search space.
> In contrast, because a metric approach always fully partitions the entire search
> space, hierarchy changes may alter where a particular call dispatches to, but
> only ever to a "closer", more appropriate variant.
You just made my primary argument against manhattan distance for me.
If you change something in the middle of the class hierarchy,
manhattan distance causes multimethods on the leaves to change
semantics. You say that pure MMD causes them to break when a change
occurs. Isn't that better than changing? Presumably you ran it
through the ambiguity checker and a test suite and got it right once,
and then you have to do it again when you refactor. This comes with
the meaningless "unit derivation" that a metric scheme defines.
Perhaps I've made this argument before, but let me just ask a
question: if B derives from A, C derives from A, and D derives from
C, is it sensible to say that D is "more derived" from A than B is?
Now consider the following definitions:
class A { }
class B is A {
    method foo () { 1 }
    method bar () { 2 }
    method baz () { 3 }
}
class C is A {
    method foo () { 1 }
}
class D is C {
    method bar () { 2 }
}
Now it looks like B is more derived than D is. But that is, of
course, impossible to tell. Basically I'm saying that you can't tell
the relative relationship of D and B when talking about A. They're
both derived by some "amount" that is impossible for a compiler to
detect. What you *can* say is that D is more derived than C.
In conclusion, the reason that manhattan distance scares me so, and
the reason that I'm not satisfied with "use mmd 'pure'" is that for
the builtins that heavily use MMD, we require *precision rather than
dwimmyness*. A module author who /inserts/ a type in the standard
hierarchy can change the semantics of things that aren't aware that
that type even exists. If you're going to go messing with the
standard types, you'd better be clear about your abstractions, and if
you're not, the program deserves to die, not "dwim" around it.
Oh, and the mmd style should probably look like:
multi foo (...) is mmd<pure> {...}
multi bar (...) is mmd<manhattan> {...}
Rather than a pragma.
> Note too that Perl 6 *will* still support a form of "pure ordered" dispatch--a
> left-most-closest-match scheme like that used by CLOS--via "invocant groups":
>
> multi sub Foo(A: B: C:) {...}
> multi sub Foo(A: D: C:) {...}
> multi sub Foo(F: B: G:) {...}
>
> This, of course, is not "global pure ordering", but rather "left-biased pure
> ordering".
>
> To summarize: pure ordering renders more of the MMD search space "ambiguous",
> which is potentially safer but much less DWIMish. Metric schemes have far
> fewer ambiguities and usually dispatch in a predictable way. Metric schemes
> can still provide pure-ordering analyses via static analysis tools or special
> warning modes, but pure ordering schemes can't avoid ambiguities or DWIM.
>
> Of course, none of this prevents:
>
> use MMD <pure>;
>
>
> Damian
Luke
[1] And I find this to be useful even when using manhattan distance.
I'd like to be able to define such type classes out-of-band, like:
role Foo
    defines Bar    # Bar does Foo now
    defines Baz    # Baz does Foo now
{ }
Private and public methods are probably stored in a single table, but
the private methods just happen to have a name starting with :.
: The public table links to
: "superclasses" where further dispatch may occur.
Er, the concept of superclasses is not hidden in the dispatch table.
Different dispatchers may have different ideas about method ordering,
so the ISA information is really just passive trait data that must
be interpreted by cooperation between dispatchers and meta classes.
As such it's really a separate datum from the list of available methods.
(Certainly, a given dispatcher might want to cache parental links in
the current package's symbol table, but that's an optimization, and
depends on the particular dispatcher's semantics.)
: A "method" on a Role or a Class is a function that takes two
: implicit arguments; a dispatch table for private method lookups, and
: one for public methods lookups. The private dispatch table is bound
: when the function is added to a runtime class, but the public
: dispatch table is bound as late as possible (or until the class is
: closed in some situations).
I think you must be using "private" to mean something different than A12,
but that's okay as long as we realize that. I'd prefer to couch things
in terms of instantiated and uninstantiated generics, but we're probably
just talking about the same things with different terms.
: Here's the meat:
:
: A Class is then a Role that gets its runtime class created, and all
: its methods' private dispatch tables bound at declaration time.
I'd say the generic methods are forced to instantiate with either
the default implementation provided by the role or the overriding
implementation provided by the class, or by some combination of them
if the class's method wants to delegate to the role's method.
: A Role, on the other hand, leaves methods that can't be called,
: until they are bound into a "runtime class". So they _look_ like
: they are flattened as the runtime classes are composed, unless you
: are introspecting to the sufficient level.
:
: The "runtime class" would be the ::Type objects, and the real Class and
: Role objects what you get from the .meta objects.
Which you consider "realer" depends on whether you're a Platonist
or an Aristotelian, but okay. I would say your "runtime class"
is the representative proxy for all objects of its type, while the
meta classes are in charge of the dirty work. So who's "realer", the
Senator who stands in front and smiles, or the Senator's aides who
stand behind and frown? :-)
: This might be a half empty / half full thing, I just thought you might
: find that description interesting. Note that I don't deal with "state",
: I'm just treating attributes as if all they are is a set of accessor
: functions, which store in an unspecified and unimportant location.
We're just basically assuming a role can behave like a parent class to
the extent necessary to manage its state on behalf of the real object,
so for instance it can have its own BUILD submethod that somehow
magically gets incorporated into the object's own BUILD process.
But politically the role is just that of the biological parent, not
the guardian parent. You could almost think of a collection of roles
as sort of a sperm and egg bank.
Well, that's enough metaphors for today, at least till next week.
Larry
> Thanks for your very detailed explanation of your views on the Pure
> MMD scheme, Damian. I finally understand why you're opposed to it. I
> could never really buy your previous argument: "Manhattan distance is
> better".
That was never my *argument*, merely my statement-of-position. ;-)
> Indeed, pure MMD will be ambiguous in more cases. If you think
> narrowly, then it causes you to write many disambiguating cases which
> *usually* end up being what a manhattan metric would give you anyway.
> But you can get away from that terrible duplication by being more
> precise about your types. You can define type classes using empty
> roles[1], where you simply organize your type dag into abstractions
> that make sense to your dispatcher. The upshot of all this is that it
> forces you to think in a way so that the "*usually*" above becomes an
> "always". And it involves defining more (very small) types that make
> sense to humans, not gratuitously many MMD variants which are pretty
> hard to think about.
I don't see how this is a big win. You're still forced to disambiguate
explicitly: either by defining extra variants (which is tedious), or by
defining extra types (which is subtle).
> You just made my primary argument against manhattan distance for me.
> If you change something in the middle of the class hierarchy,
> manhattan distance causes multimethods on the leaves to change
> semantics. You say that pure MMD causes them to break when a change
> occurs. Isn't that better than changing?
No. If the dispatch behaviour changes under a Manhattan metric, then it only
ever changes to a more specific variant. Since MMD is all about choosing the
most specific variant, that's entirely appropriate, in the light of the new
information. If you change type relationships, any semantics based on type
relationships must naturally change.
On the other hand, under a pure ordering scheme, if you change type
relationships, any semantics based on type relationships immediately *break*.
That's not a graceful response to additional information. Under pure ordering,
if you make a change to the type hierarchy, you have to *keep* making changes
until the dispatch semantics stabilize again. For most developers that will
mean a bout of "evolutionary programming", where they try adding extra types
in a semi-random fashion until they seem to get the result they want. :-(
> Perhaps I've made this argument before, but let me just ask a
> question: if B derives from A, C derives from A, and D derives from
> C, is it sensible to say that D is "more derived" from A than B is?
> Now consider the following definitions:
>
> class A { }
> class B is A {
>     method foo () { 1 }
>     method bar () { 2 }
>     method baz () { 3 }
> }
> class C is A {
>     method foo () { 1 }
> }
> class D is C {
>     method bar () { 2 }
> }
>
> Now it looks like B is more derived than D is. But that is, of
> course, impossible to tell. Basically I'm saying that you can't tell
> the relative relationship of D and B when talking about A. They're
> both derived by some "amount" that is impossible for a compiler to
> detect. What you *can* say is that D is more derived than C.
Huh. I don't understand this at all.
In MMD you have an argument of a given type and you're trying to find the most
specifically compatible parameter. That means you only ever look upwards in a
hierarchy. If your argument is of type D, then you can unequivocally say that
C is more compatible than A (because they share more common components), and
you can also say that B is not compatible at all. The relative derivation
distances of B and D *never* matter since they can never be in competition,
when viewed from the perspective of a particular argument.
What we're really talking about here is how do we *combine* the compatibility
measures of two or more arguments to determine the best overall fit. Pure
Ordering does it in a "take my bat and go home" manner, Manhattan distance
does it by weighing all arguments equally.
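To pin down what "weighing all arguments equally" means, here is a minimal sketch of a Manhattan-metric dispatcher (Python for illustration only; the name `mmd_dispatch` and the tuple-based variant table are my inventions, not the Perl 6 design): score each applicable variant by the sum of per-argument inheritance distances and take the minimum, with an exact tie reported as an ambiguity.

```python
def mmd_dispatch(variants, args):
    # variants: list of (parameter-type tuple, implementation) pairs.
    def dist(cls, param):
        # Steps up the MRO from cls to param; None if incompatible.
        mro = cls.__mro__
        return mro.index(param) if param in mro else None

    scored = []
    for params, impl in variants:
        if len(params) != len(args):
            continue
        ds = [dist(type(a), p) for a, p in zip(args, params)]
        if None not in ds:
            scored.append((sum(ds), impl))
    if not scored:
        raise TypeError("no applicable variant")
    scored.sort(key=lambda s: s[0])
    if len(scored) > 1 and scored[0][0] == scored[1][0]:
        raise TypeError("ambiguous dispatch")
    return scored[0][1]

class A: pass
class B(A): pass

variants = [((A, A), "any,any"), ((B, A), "B,any")]
# With two B arguments, (B, A) scores 0+1 = 1 and (A, A) scores 1+1 = 2,
# so the metric picks the more specific (B, A) variant.
picked = mmd_dispatch(variants, (B(), B()))
```

Note that the tie case is where the two camps differ: a metric scheme only gives up on an exact numeric tie, whereas pure ordering gives up whenever no variant dominates position-by-position.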
> In conclusion, the reason that manhattan distance scares me so, and
> the reason that I'm not satisfied with "use mmd 'pure'" is that for
> the builtins that heavily use MMD, we require *precision rather than
> dwimmyness*. A module author who /inserts/ a type in the standard
> hierarchy can change the semantics of things that aren't aware that
> that type even exists. If you're going to go messing with the
> standard types, you'd better be clear about your abstractions, and if
> you're not, the program deserves to die, not "dwim" around it.
That *might* be an argument that builtins ought to do "pure ordering"
dispatch, but it isn't an argument in the more general case. Most people won't
be futzing around with the standard type hierarchy, except at the leaves,
where it doesn't matter. Most people will be using MMD for applications
development, on their own type hierarchies. Someone who's (for example)
building an image processing library wants to be able to add/merge/mask any
two image types using the most specific available method. Adding a new image
type (or supertype) might change which method is most specific, but won't
change the image processor's desire. Indeed, changing the dispatch to a now
more-specific method *is* the image processor's desire.
So, in counter-conclusion, Pure Ordering MMD maximizes ambiguity and thereby
imposes extra development costs to achieve (generally) the same effect as
Manhattan MMD achieves automatically. Both schemes can be made to detect
ambiguities, and under both schemes those ambiguities can be resolved by the
judicious addition of either variants or interim classes. However, by default,
when POMMD detects an ambiguity, it just gives up and makes that ambiguity
fatal, whereas MMMD always automatically resolves ambiguities in a plausible
manner. I know which approach I'd prefer.
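For contrast, the pure-ordering rule can be sketched in a few lines (again Python purely for illustration; `dominates` is my name): one variant beats another only when it is at least as specific in every parameter position and strictly more specific in at least one, so incomparable variants are simply ambiguous.

```python
def dominates(ps, qs):
    # ps beats qs under pure ordering iff ps is at least as specific
    # in every position and strictly more specific in at least one.
    # (Assumes equal arity.)
    return all(issubclass(p, q) for p, q in zip(ps, qs)) and ps != qs

class A: pass
class B(A): pass

# (B, A, A) and (A, B, B) are incomparable: each is more specific in
# some position.  Pure ordering gives up (ambiguous), whereas a
# Manhattan metric would score them 0+1+1 = 2 versus 1+0+0 = 1 for
# three B arguments, and quietly pick the latter.
```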
> Oh, and the mmd style should probably look like:
>
> multi foo (...) is mmd<pure> {...}
> multi bar (...) is mmd<manhattan> {...}
>
> Rather than a pragma.
The pragma would merely set the default, from which traits could cause
particular multis to vary.
> [1] And I find this to be useful even when using manhattan distance.
> I'd like to be able to define such type classes out-of-band, like:
>
> role Foo
>     defines Bar    # Bar does Foo now
>     defines Baz    # Baz does Foo now
> { }
You *really* think action-at-a-distance reverse composition is a good idea?
Seems like a maintenance nightmare to me.
Damian
> I would think that the programmer specifies what type a class
> implements by letting it do a set of roles. This implies that
> by default a class does the very unspecific Any.
Why would a class not also define a type?
-- c
> The use of summed lineal distance (L[1]) rather than RMS distance (L[2])
> probably *isn't* superior as a closeness measure. But it's computationally
> much simpler (and hence likely to be more efficient), it doesn't suffer
> from precision issues in "photo finishes", and it is almost certainly
> easier for the average programmer to predict correctly.
I dispute "much", specifically because the sum of the squares has the same
ordering as the square root of the sum of the squares.
(So we're talking only about one extra multiply per class under
consideration, and a possible overflow or need for floating point, rather
than the more naive assumption that ordering by RMS would require evaluating
a square root.)
But I have no opinion on the underlying merits of L[1] versus L[2], because
I have no experience of MMD.
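A tiny numeric example (the distances are invented, just for illustration) shows both points at once: L[1] and L[2] can rank the same two candidates differently, and the square root is never needed for the ordering because it is monotone.

```python
import math

# Per-argument inheritance distances for two candidate variants:
a = (3, 0)   # three steps on the first argument, exact on the second
b = (2, 2)   # two steps on each argument

l1 = sum(a), sum(b)                                  # (3, 4): L1 prefers a
sq = sum(x * x for x in a), sum(x * x for x in b)    # (9, 8): L2 prefers b

# sqrt is monotone, so comparing sums of squares orders candidates
# exactly as comparing RMS values would -- the square root itself
# never has to be evaluated during dispatch.
same_order = (sq[0] < sq[1]) == (math.sqrt(sq[0]) < math.sqrt(sq[1]))
```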
> That said, I'd have no *particular* objection to an MMD implementation that
> used RMS inheritance distance as its metric, provided the dispatch
> performance was not appreciably worse.
Maybe I should cut that paragraph, to avoid re-opening a debate.
Nicholas Clark