
Type Conversion Matrix (TAKE 3)


Michael Lazzaro

Apr 28, 2003, 1:36:44 PM
to perl6-l...@perl.org

Some changes, and comments re: possible approaches.

+: Automatic conversion, conversion is not lossy
*: undefness and properties will be lost
N: numeric range or precision may be lost [1]
F: numeric (float) conversion -- conversion to int is lossy
S: string conversion -- if string is not *entirely* numeric, is lossy
B: boolean conversion -- loses all but true/false
J: junction type; coercing to non-junction type may be lossy


FROM ->     str  Str  int  Int  num  Num  bit  Bit  bool Bool Scalar
TO: str      -    *    +    *    +    *    +    *    +    *    *J
    Str      +    -    +    +    +    +    +    +    +    +    J
    int      S   *S    -   *N    F  *NF    +    *    +    *    *J
    Int      S    S    +    -    F    F    +    +    +    +    J
    num      S   *S    +    *    -   *N    +    *    +    *    *J
    Num      S    S    +    +    +    -    +    +    +    +    J
    bit      B   *B    B   *B    B   *B    -    *    +    *    *J
    Bit      B    B    B    B    B    B    +    -    +    +    J
    bool     B   *B    B   *B    B   *B    +    *    -    *    *J
    Bool     B    B    B    B    B    B    +    +    +    -    J
    Scalar   +    +    +    +    +    +    +    +    +    +    -

Comments:

I'm not a big fan of having pragmas to deal with every possible
conversion; I'm not sure it's necessary. There are really only a few
kinds of conversion here, and many of them are easy to deal with.
For the sake of argument:

*: (undefness and properties lost)

I'd argue these -- being able to convert an uppercase type to a
lowercase (primitive) type -- should always be allowed. If you're
sending an Int to something that requires an C<int>, you know that the
'something' can't deal with the undef case anyway -- it doesn't
differentiate between undef and zero. Thus, you meant to do that: it's
an "intentionally destructive" narrowing.

J: (scalar junctive to typed scalar)

Same thing here... you're specifically asking for only one aspect of
the scalar, so one could presume you meant to do that too. So always
allowed.

S: (string to numeric)

This is a trickier one, because the string _may_ not be entirely
numeric. In P5, a string like '1234foo' numifies to 1234, but you
don't always want that ... sometimes, you want to throw an error if the
string isn't 'cleanly' a number. So I'd argue this requires a pragma,
for all types of string --> numeric conversion, selecting between
"strict" and "non-strict" string conversions.

B: (conversion to boolean)

For things like int --> bit or num --> bit conversions, one can argue
that it's not a big deal, and those should _always_ be allowed. After
all, if you've specified those types, it's because you only care about
1/0 or true/false. So always allowed.
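
(A Python illustration of that collapse -- any nonzero value becomes 1,
so nothing but truth survives the conversion:)

```python
# B conversion: everything nonzero collapses to 1; only zero maps to 0.
print([int(bool(x)) for x in (0, 1, -7, 42)])   # [0, 1, 1, 1]
```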

F: (float to int)

Again, one could argue that if you were sending a float to something
that required an int, you probably meant to do that. But,
historically, it can be a source of errors. Lots of errors, in fact.
So perhaps it should _not_ be allowed? Or perhaps a single pragma to
enable/disable it?
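
The historical source of errors is that the conversion truncates
rather than rounds. A Python sketch of the behavior in question:

```python
# Float -> int conversion truncates toward zero; 2.99 quietly becomes 2.
samples = [2.99, -2.99, 0.999]
print([int(x) for x in samples])   # [2, -2, 0] -- truncation, not rounding
```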

N: (numeric range)

This one is a giant pain. Converting, say, an Int to an int will,
in fact, fail to do the right thing if you're in BigInt territory, such
that the number would have to be truncated to fit in a standard C<int>.
But 99% of the time, you won't be working with numbers like that, so it
would seem a horrible thing to disallow Int --> int and Num --> num
conversions under the remote chance you *might* be hitting the range
boundary. Then again, it would seem a horrible thing to hit the range
boundary and not be informed of that fact...
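
A sketch of the two policies in Python (the helper names are mine, and
assume a 32-bit native C<int>): silent narrowing keeps the low 32 bits,
while the "be informed" policy refuses to truncate:

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1

def to_native_int(n):
    """Unchecked Int -> int narrowing: keep the low 32 bits (two's complement)."""
    return ((n - INT_MIN) % 2**32) + INT_MIN

def to_native_int_checked(n):
    """The 'be informed' policy: raise instead of silently truncating."""
    if not INT_MIN <= n <= INT_MAX:
        raise OverflowError('%d does not fit in a native int' % n)
    return n

print(to_native_int(2**31))   # -2147483648: BigInt territory wraps silently
print(to_native_int(42))      # 42: 99% of the time, nothing interesting happens
```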

Looking at one potential simplification of the table, then, would yield
this:

FROM ->     str  Str  int  Int  num  Num  bit  Bit  bool Bool Scalar
TO: str      -    +    +    +    +    +    +    +    +    +    +
    Str      +    -    +    +    +    +    +    +    +    +    +
    int      S    S    -    N    F   NF    +    +    +    +    +
    Int      S    S    +    -    F    F    +    +    +    +    +
    num      S    S    +    +    -    N    +    +    +    +    +
    Num      S    S    +    +    +    -    +    +    +    +    +
    bit      +    +    +    +    +    +    -    +    +    +    +
    Bit      +    +    +    +    +    +    +    -    +    +    +
    bool     +    +    +    +    +    +    +    +    -    +    +
    Bool     +    +    +    +    +    +    +    +    +    -    +
    Scalar   +    +    +    +    +    +    +    +    +    +    -

... leaving only a handful of cases to deal with. Three of them, I
believe. Alas, three pragmas is still three pragmas too many, IMHO...
:-/


MikeL

Austin Hastings

Apr 28, 2003, 3:20:20 PM
to Michael Lazzaro, perl6-l...@perl.org

--- Michael Lazzaro <mlaz...@cognitivity.com> wrote:

> *: (undefness and properties lost)
>
> I'd argue these -- being able to convert an uppercase type to a
> lowercase (primitive) type -- should always be allowed. If you're
> sending an Int to something that requires an C<int>, you know that
> the 'something' can't deal with the undef case anyway -- it doesn't
> differentiate between undef and zero. Thus, you meant to do that:
> it's an "intentionally destructive" narrowing.

I'm still not sure how much loss will be suffered in the conversion.

Specifically, Larry has made noises about being able to use the C<.id>
method of small types, via a "featherweight" mechanism where some sort
of out-of-band "management" object handles the delegations.

Since properties are just singleton method overrides, and undef could
be implemented as a property, the whole shebang reduces to a question
of implementation-specifics: how much "first class" behavior will small
types enact?

I can easily see:

- none. (Contrary to some posts from @Larry, but easy to describe.)

- minimal: Implements the methods supposedly inherited from class
Object or Primitive or Scalar or whatever (.id, .serialize, etc).

- small: Above, plus all the Large-type methods: conversion, coercion,
upper/lower case, etc.

- class: All class methods are implemented. (I prefer this as a minimum
because it means I can derive from a small type and rely on getting
method calls done.)

- slow-props: Additionally, properties and/or undef are supported, but
their implementation is probably going to perform really poorly.

- full-props: There's no difference between the small and large types,
performance wise, but at least part of the storage is guaranteed.

I have heard commentary from @Larry regarding the implementation of up
to the slow-props configuration.

=Austin

Michael Lazzaro

Apr 29, 2003, 1:57:36 PM
to perl6-l...@perl.org
This is really, really important stuff:

I thoroughly believe this much -- that lowercase types can do all class
methods, including .id. For example, C<str> can still do C<str.chars>,
C<str.lc>, whatever. This much can be done by the compiler, and pretty
easily.

> - slow-props: Additionally, properties and/or undef are supported, but
> their implementation is probably going to perform really poorly.
>
> - full-props: There's no difference between the small and large types,
> performance wise, but at least part of the storage is guaranteed.
>
> I have heard commentary from @Larry regarding the implementation of up
> to the slow-props configuration.

These are the parts I'm still wary of. _Really_ wary, actually.

The central premise of the lowercase types, at least as I've always
understood it, is to be _small_ and _fast_, compared to the
full-fledged types. A way to perhaps get closer to C speed on
primitive algorithms, so you wouldn't have to escape to C to do it.

For example, simply saying:

my int $i = 10**6;
while ($i--) {...}

Would be faster using an C<int> rather than an C<Int>, because it
doesn't have to check for a 'truth' property; it should know it's a
straight (int != 0) comparison. So using primitive types could
produce something much closer to the equivalent C code.

The most recent mention of lowercase types we've seen from Larry, IIRC,
is where he said he wanted an Array of 10**6 ints to be something durn
close to 10**6 native ints long, storage-wise. That leaves the option
open for some out-of-band storage of properties, perhaps -- but would
that mean that, every time you FETCHed one of those ints, you'd have to
go searching to re-attach its properties?

I'm still not seeing quite what the p6i plan is that would allow
properties to be attached to primitive types, and yet still have any
significant speed/space gains _whatsoever_ from the promoted, uppercase
types. It seems to me that, if you allow things like undef or 'but
true/false' on a primitive, you've introduced the need to perform all
the same checks, when accessing, as the promoted types. So what do you
gain by using primitives?

Or, to put it another way: can we make primitives smaller/faster enough
to justify the complexity of having primitives? And if we can't, why
have them?

Now, mind you, my best-case scenario is one in which we don't have to
make the int/Int and str/Str distinction at all... it would
automatically use the smaller, faster form when it "knows it can." But
I have yet to find much to give me faith that that's knowable. How
difficult would it be to _know_ that a given array of Ints will _never_
contain Ints that contain undef or any properties? Seems like dataflow
analysis that borders on magical, to me.

I realize that I am largely alone in harping about P5/P6 speed issues
(well, on this list), but I cannot emphasize enough how important it
will be to adoption. Speed issues in p5 are the _primary_ battle I
have when defending P5 inside companies, and are frequently a major
factor cited when one of these companies converts from Perl5 to
something else.

MikeL

Austin Hastings

Apr 29, 2003, 5:27:56 PM
to Michael Lazzaro, perl6-l...@perl.org

I was with you up to _fast_. My original encounter with small types was
in the context of being I<small> -- a small type takes just exactly how
many bits you think it does. This was advertised as being really
valuable in arrays, etc.

>
> For example, simply saying:
>
> my int $i = 10**6;
> while ($i--) {...}

In this example, you really don't care how much space $i takes up,
right? I mean, perl could allocate an entire page of VM to store this
value, so long as it ran faster that way. (Yes, I've done that before
on SunOS, where you could pin a VM page... :-)

But in the other case:

my bit @is_profane;
my bit @is_proper;
my bit @is_scrabble;

for %dictionary.kv -> $word, $definition {
    given $definition {
        my $wordnum;
        when /profane/ { $wordnum //= word2num($word);
                         @is_profane[$wordnum] = 1; }
        when /proper/  { $wordnum //= word2num($word);
                         @is_proper[$wordnum] = 1; }
        when ! /proper/ && ! /profane/ && $word.length >= 3 {
                         $wordnum //= word2num($word);
                         @is_scrabble[$wordnum] = 1; }
    }
}

I want to be relatively sure that my 2 million words don't consume
? overhead + 40 bytes/PMC * 3 PMCs/word * 2 million words = 240 million
bytes of storage, but rather that they consume as close as possible to
? overhead + 1 bit/entry * 1/8 byte/bit * 3 entries/word * 2 million
words = ? overhead + 750,000 bytes.

So I'm pretty sure that "small types = small storage".
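
That arithmetic is easy to check with a packed bit array. A Python
sketch (class and method names are mine) of the storage a C<bit> array
ought to get -- 2 million flags in 250,000 bytes:

```python
class BitVector:
    """Packed bit array: one bit per entry, as a small-type array should be."""
    def __init__(self, nbits):
        self.bits = bytearray((nbits + 7) // 8)
    def __setitem__(self, i, v):
        if v:
            self.bits[i >> 3] |= 1 << (i & 7)
        else:
            self.bits[i >> 3] &= ~(1 << (i & 7))
    def __getitem__(self, i):
        return (self.bits[i >> 3] >> (i & 7)) & 1

is_profane = BitVector(2_000_000)
is_profane[1234] = 1
print(is_profane[1234], is_profane[1235])   # 1 0
print(len(is_profane.bits))                 # 250000 bytes for 2 million flags
```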

It's *NICE* if the small types also happen to be fast types -- but I
think it's a side effect. (In fact, Dan has some money and a pie riding
on overall performance...)

I think that you're saying "the small types are going to give away
genericity in favor of storage", and once the generic-object-dispatch
stuff goes away, we'll get better performance as a side effect.

But if "small" is a storage class -- a "variable type" as opposed to a
value type -- in the same way that "array" is a storage class, then it
should be possible to extend the class.

Specifically, we've had examples like:

class PersistentArray is Array {...}
my @a is PersistentArray;

I should be able to say:

class IOport is byte {...}
my $io_01E7 is IOport(phys_addr => 0x01e7);

just as effectively. So I want to make sure that the physical
properties remain constant, but I don't mind if there's management
overhead out-of-band.

> Would be faster using an C<int> rather than an C<Int>, because it
> doesn't have to check for a 'truth' property, it should know it's a
> straight (int != 0) comparison. So using primitive types could
> produce something much closer to the equiv C code.
>
> The most recent mention of lowercase types we've seen from Larry,
> IIRC, is where he said he wanted an Array of 10**6 ints to be
> something durn close to 10**6 native ints long, storage-wise.
> That leaves the option open for some out-of-band storage of
> properties, perhaps -- but would that mean that, every time you
> FETCHed one of those ints, you'd have to go searching to re-attach
> its properties?

I think this goes back to the run-time-dispatch stuff, and to strong
typing. If you declare:

my int $i;
# or
my $i is int;

then the system should be smart enough to know (that which is knowable)
about the object in question -- just as with any other type.

If the data flow indicates that the object isn't being messed with
during your loop, then the run-time dispatch should be cached and only
done once.

But the notion that the storage plan changes will cause problems all
the way down the line: When you pass a small type as a parameter,
everyone will look at you funny if it doesn't autopromote to a big
type. But a reference to int is probably going to have a WAY different
memory layout than a reference to Int -- and all params are reference
by default. Consider:

sub foo($i) { print "$i\n"; }

my int $x = 0xDEADBEEF;
foo($x);

By default, $i is a readonly reference parameter, obviously expecting a
reference to a generic scalar/object type.

If "reference-to-int" is the same thing as C<extern 'C' { int * }> then
the whole autodispatch kit and kaboodle goes out the window.

And if it ISN'T, then how big is an array of 10**6 of them?


> I'm still not seeing quite what the p6i plan is that would allow
> properties to be attached to primitive types, and yet still have any
> significant speed/space gains _whatsoever_ from the promoted,
> uppercase types. It seems to me that, if you allow things like
> undef or 'but true/false' on a primitive, you've introduced the
> need to perform all the same checks, when accessing, as the
> promoted types. So what do you gain by using primitives?

With some hand-waving, the plan is either going to involve
"featherweight" objects (see Damian's book), or dynamic conversion, or
thunks, or something far more sinister.

Let's assume that featherweight objects are involved. If they are, then
you've got a "manager" that is physically separate from the actual
storage of the object. In this mode, the compiler either calls the
manager with the address of the storage, or it calls the manager and
the manager looks back at the caller to determine the address of
storage.

In either case, the size of one int is pretty much the same as the size
of one Int. But you can see big gains when you have 10**6 of them next
to each other, and only one manager object.

OTOH, if we assume that dynamic conversion is the basis of all this,
then the compiler looks at each entry/exit point, and does type
compatibility. If the return value of a function call or expression is
KNOWN to be a compatible type, then great. Otherwise, force a
conversion. If the argument type of a called function is KNOWN to be a
compatible type, great, otherwise generate a temporary Int.

This is pretty comparable to what strong typing will do for any other
type (try constructing a similar exercise with a user-defined "complex"
type, for instance).

But it falls down quite badly when you are trying to take advantage of
the small type WITHOUT rewriting all your code -- once you cross the
"small-to-large" boundary, you've instantiated a temporary (large)
scalar to represent you until you return. So much for plan 'A'.

> Or, to put it another way: can we make primitives smaller/faster
> enough to justify the complexity of having primitives? And if we
> can't, why have them?

I think that on behavioral spec and size alone they can be justified.
When Paul was writing earlier about wanting to convert a 'C' structure
declaration, my first thought was "just code them as small types", viz.:

struct MyStruct {
int i;
char buf[43];
float f;
};

versus:

class MyStruct is packed(align => 'word') {
has int $.i;
has byte @.buf is Array(size => 43);
has float $.f;
}
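
A rough present-day analogue of that C<is packed> idea is Python's
ctypes, where the declaration alone fixes a C-compatible layout (field
types here match the C struct above; exact size depends on the ABI's
alignment rules):

```python
import ctypes

class MyStruct(ctypes.Structure):
    # Same layout as the C struct: a native int, a 43-byte buffer, a float.
    _fields_ = [('i',   ctypes.c_int),
                ('buf', ctypes.c_char * 43),
                ('f',   ctypes.c_float)]

s = MyStruct(i=7, buf=b'hello', f=1.5)
print(ctypes.sizeof(MyStruct))   # fixed C-compatible size (52 on most ABIs)
```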

> Now, mind you, my best-case scenario is one in which we don't have to
> make the int/Int and str/Str distinction at all... it would
> automatically use the smaller, faster form when it "knows it can."

Hmm. I think the correct frequency of that behavior should be really
close to zero -- perl should be optimizing for speed by default, and
the "full blown" types should still be keeping that as their primary
tenet. First, run quick. Then, run quick as a typed object. Then, run
quick as an untyped object. Then, be small.

> But I have yet to find much to give me faith that that's knowable.
> How difficult would it be to _know_ that a given array of Ints will
> _never_ contain Ints that contain undef or any properties? Seems
> like dataflow analysis that borders on magical, to me.

But it doesn't have to be. All you need to know is that the array of
Int's doesn't contain undef _at the moment_, and you're fine.

OTOH, VM is really close to free. So why not store the whole enchilada
each time?

OTGH, perhaps there should be two classes of object, by default:
singular and collective. Things stored as collective objects could have
a slimmer memory profile, perhaps using a pointer lookup slightly
earlier in the process, but not giving back much performance in the
normal case.

> I realize that I am largely alone in harping about P5/P6 speed issues
> (well, on this list), but I cannot emphasize enough how important it
> will be to adoption. Speed issues in p5 are the _primary_ battle I
> have when defending P5 inside companies, and are frequently a major
> factor cited when one of these companies converts from Perl5 to
> something else.

I don't think you're the only person concerned about speed -- I'm an
embedded guy from way back, so "how's that look in metal" is kind of
second nature to me, although I shed that skin years ago.

But I think that confidence is pretty high that Larry's been doing a
lot of housekeeping to clean up the core, which should give us
performance even with P5 at a minimum. Then, Dan's gone into a wagering
frenzy, offering run performance benchmarks against Bill Gates and
dotNet with the title to their respective houses at stake. (Well, maybe
not. But it would be a pretty good bet... :-)

> MikeL

=Austin

Michael Lazzaro

May 1, 2003, 4:39:05 PM
to Austin_...@yahoo.com, perl6-l...@perl.org

On Tuesday, April 29, 2003, at 02:27 PM, Austin Hastings wrote:
<some plausible explanations of how primitives might work>

Sorry for the pseudoWarnocking there; I was contemplating, esp. the
implications of var vs value typing and what that approach would allow.

I guess my sole criterion is this: I'd like to be able to write P6
routines that do, for example, image manipulation -- such as rotating a
gif 90 degrees -- and have those routines not be laughably slow
compared to a native C implementation.[*] That's pretty much my only
measure of "success" -- how far I can go, in my program, before I have
to abandon Perl for C.

Which means that working with individual bytes or integers, and
collections of bytes/integers, needs to JIT down to something as close
as possible to 'native'. My assumption has been that you wouldn't be
able to get that without explicitly declaring, in your code, that you
were willing to strip a var/val of its ancillary information, and
allowing the compiler to _enforce_ that stripped state.

Hmm... So are you saying that you think it will be possible to
"automatically" get primitives through dataflow analysis? I must say,
if that's possible, I don't think we should be making the Int/int,
Str/str, etc., distinctions at all, right?

MikeL

[*] Not that I'm personally going to be doing any image manipulations,
mind you. But it seems a convenient metric for the kind of behavior
I'm thinking about.

Luke Palmer

May 1, 2003, 5:15:29 PM
to mlaz...@cognitivity.com, Austin_...@yahoo.com, perl6-l...@perl.org
> I guess my sole criterion is this: I'd like to be able to write P6
> routines that do, for example, image manipulation -- such as
> rotating a gif 90 degrees -- and have those routines not be
> laughably slow compared to a native C implementation.[*] That's
> pretty much my only measure of "success" -- how far I can go, in my
> program, before I have to abandon Perl for C.

As far as speed goes, I say this is a good metric.

> Which means that working with individual bytes or integers, and
> collections of bytes/integers, needs to JIT down to something as
> close as possible to 'native'. My assumption has been that you
> wouldn't be able to get that without explicitly declaring, in your
> code, that you were willing to strip a var/val of its ancillary
> information, and allowing the compiler to _enforce_ that stripped
> state.

I would agree. C<int> should map to Parrot I* registers, or something
close to them. Likewise with C<num> and C<str> to N* and S*. Austin
says that Perl Int's should be fast anyway, and C<int> should just be
there for the size, but obviously he hasn't played with Parrot very
much.

The thing that wowed me when I checked out the source tree was the
primes example (the first example I ran :). When C<int>s are JIT'd,
Parrot performs darn close to C, and there's no way that's going to
happen with Perl Int's, considering all the magic associated with
them. They'll still be fast, but never C<int> fast, I presume.

> Hmm... So are you saying that you think it will be possible to
> "automatically" get primitives through dataflow analysis?

Not a chance. Some seem to think compiler optimization is the
solution to all of our speed problems. In the real world, there's
only so much it can do. Dataflow is so complex in so many of the
cases, analysis is lucky to find loop invariants, much less know
whether it'll need a "big" or "small" number.

> I must say, if that's possible, I don't think we should be making
> the Int/int, Str/str, etc., distinctions at all, right?

Well, the distinction is good even if it doesn't do anything in early
versions (even though it probably will). It's a psychological thing
that makes people think that code that uses them is going to run
really really fast (like C<register> in C). And if people feel like
their tight loops are running fast, they're free to make the design of
the rest of the program better, because they're done with the stuff
that "matters".

Or that's generally how my mind works when I'm writing something
speed-critical, at least.

> MikeL

Luke

Austin Hastings

May 1, 2003, 6:15:17 PM
to Michael Lazzaro, Austin_...@yahoo.com, perl6-l...@perl.org

--- Michael Lazzaro <mlaz...@cognitivity.com> wrote:
>
> On Tuesday, April 29, 2003, at 02:27 PM, Austin Hastings wrote:
> <some plausible explanations of how primitives might work>
>
> Sorry for the pseudoWarnocking there; I was contemplating, esp. the
> implications of var vs value typing and what that approach would
> allow.

Yes. The distinction between stowage/storage types enables some pretty
wild things, if you start thinking in that direction. (Or it seems to,
at any rate -- I went a little bit in that direction and then decided
to return to "Hmmm...donuts" -- it's kind of scary.)


> I guess my sole criterion is this: I'd like to be able to write P6
> routines that do, for example, image manipulation -- such as rotating
> a gif 90 degrees -- and have those routines not be laughably slow
> compared to a native C implementation.[*] That's pretty much my only
> measure of "success" -- how far I can go, in my program, before I
> have to abandon Perl for C.

A very one-dimensional metric, that.

I'd suggest that there are a whole host of dimensions to measure along,
including expressiveness, intuitiveness, and maintainability.

I don't think that anybody anywhere believes that it would be worth
abandoning Tk (in whatever flavor) in favor of X programming in C.

But "raw performance" is appropriate for huge apps. You've mentioned
you have performance concerns -- what kind of stuff do you do?

> Which means that working with individual bytes or integers, and
> collections of bytes/integers, needs to JIT down to something as
> close as possible to 'native'. My assumption has been that you
> wouldn't be able to get that without explicitly declaring, in
> your code, that you were willing to strip a var/val of its
> ancillary information, and allowing the compiler to _enforce_
> that stripped state.

Actually, I'm wondering just exactly WHAT we'll be able to do, knowing
there's a JIT sitting around.

I'd very much like to be able to say:

macro arch(%defs) is parsed(<sub_block>) {
return %defs{$::CPU_TYPE} // %defs{'Perl'} // %defs{'default'};
}

sub inner_loop
will do arch(
x86 => {...},
mips => {...},
default => {...}
)

And just let it pass through to the JIT.

We'll have to wait for A99 for that one, I guess.

> Hmm... So are you saying that you think it will be possible to
> "automatically" get primitives through dataflow analysis? I must
> say, if that's possible, I don't think we should be making the
> Int/int, Str/str, etc., distinctions at all, right?

No. I'm saying that with the strong typing plus some DFA, it should be
possible to elide a huge fraction of features, especially if we have
pragmata available to "really promise" good type compliance.

If there's *no* type data available, then the data flow would have to
know more and more about the return types of functions, etc. Not
feasible when much of our work relies on autoconverting string to int
and the like.

Instead, I think that if you say "my Int $i = foo();" then either (1)
you get dynamic typechecking on the type of foo(); or (2) you promise
that foo() will return an Int.

The worry has been that at some level, program bugs can cause you to
return something-not-an-Int. Ergo, run-time type checks.

But if we can optimize that somehow (e.g., by using C<as TYPE>) such
that the comparison is a single-word operation, then we get killer
speed.

Consider this from back when:
my Vehicle $v = get_vehicle();

$v.start;
$v.accelerate(target_speed => 10);
$v.turn(direction => left, angle => 90);
$v.decelerate(target_speed => 0);
$v.turn_off;

This, everyone thinks, will polymorphically invoke the C<start> method
for god-knows-what type, derived from Vehicle most likely.

But suppose you KNOW that certain operations always dispatch the same
methods. Or that you would rather force a certain behavior.

You could say:

$v.Vehicle::start;
$v.Vehicle::accelerate(target_speed => 10);
...

and presto! No dynamic method dispatch -- we're forcing dispatch of the
Vehicle:: package's methods.

Or, in the flavor of Javascript's C<with> statement, perhaps we could
say:

my Vehicle $v = get_vehicle() as Vehicle;

$v.start;
$v.accelerate(target_speed => 10);

and get "coerced" dispatch -- the compiler knows that $v is constrained
to be a Vehicle (not some fancy extension) so it knows that it can
replace the dispatch with a direct method call.

But this is PERL! Suppose some numbskull returns some non-vehicle.

module yourmodule;
sub get_vehicle {return Horse.new;}

Then what happens to our fast dispatch?

Well, in keeping with Larry's idea of multiple entry points, and in
keeping with his suggestion of being able to choose between
caller-checks and callee-checks, the compiler decides: "The caller can
do a rudimentary check on this one -- I'll use the callee's <guaranteed
okay> entry point"

*NOW* we need to think about how to optimize the type-storage stuff.
The simplest way, in cases where the type is constrained, is to do a
literal compare of the vtable pointers. (Warning: subtle class
optimization bug possible here.)

If every class has a singular vtable, then the pointer to that vtable
is a unique class-id. So just remember what the vtable pointer for a
Vehicle is supposed to be, and compare $v.vtable against that pointer.
If they're the same, great -- you have a Vehicle. If they're not the
same, then you either have (1) a subclass of Vehicle; or (2) an
imposter.

In the "way fast" case, both of those generate an abort. In any other
case, you would chase some kind of C<$v.implements> list to get all the
interfaces, and all the superclasses of the interfaces, etc.

Since most inheritance trees contain small (<100) classes, you could
probably answer the C<$v.implements> with an lsearch through a
null-terminated list. It might be profitable to build a bitmask,
however, in certain circumstances. Just assign a bitnumber to every
class, and maintain a fixed size bitvector for every class (N-squared
bits, where N = #classes).
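
A Python sketch of that fast path (the class names and helper are mine,
borrowing the thread's Vehicle example): an identity test on the class
object is the analogue of comparing vtable pointers -- one pointer
compare -- with the C<$v.implements> chase as the slow fallback:

```python
class Vehicle:
    def start(self):
        return 'vroom'

class Horse:          # the numbskull's non-vehicle
    pass

def checked_start(v):
    if type(v) is Vehicle:       # fast path: one pointer comparison
        return Vehicle.start(v)  # direct call, no dynamic dispatch
    if isinstance(v, Vehicle):   # slow path: walk the class tree
        return v.start()
    raise TypeError('imposter: not a Vehicle')

print(checked_start(Vehicle()))  # vroom
```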

You can even do the same thing for methods, given that you know the
names of all the defined methods -- for each method name, list the
classes that implement it. Then for each object (classes, mostly - but
you might have some one-offs) keep a list of classes in "search order".

Regardless of all that stuff, the point is that if you can promise the
compiler good behavior, the compiler can reduce your type overhead to a
16 or 32 bit compare (or 64, what the hell!) for "confirmation"
purposes. That's WAY slower than native C, but much faster than
everything else, for generic stuff.

For specific stuff like loop decrements and so forth, I'll bet you a
dollar that they get their own opcodes. As I was saying to Paul Hodges
earlier, the way to reduce the cost of the object interface is to
reduce the number of calls you make. In this case, I suspect that there
might wind up being "decrement a loop counter and branch if not zero
and call the cleanup code if zero and remove the counter from the
current scope and wash the dishes and take out the trash" opcodes, once
the need for them is established -- I know the Parrot guys are planning
for plug-in bits.

(Kind of nice to own the VM, ain'a?)

Michael Lazzaro

May 1, 2003, 8:47:46 PM
to Austin_...@yahoo.com, perl6-l...@perl.org

On Thursday, May 1, 2003, at 03:15 PM, Austin Hastings wrote:
> --- Michael Lazzaro <mlaz...@cognitivity.com> wrote:
>> I guess my sole criterion is this: I'd like to be able to write P6
>> routines that do, for example, image manipulation -- such as rotating
>> a gif 90 degrees -- and have those routines not be laughably slow
>> compared to a native C implementation.[*] That's pretty much my only
>> measure of "success" -- how far I can go, in my program, before I
>> have to abandon Perl for C.
>
> A very one-dimensional metric, that.
>
> I'd suggest that there are a whole host of dimensions to measure along,
> including expressiveness, intuitiveness, and maintainability.

Sorry, I meant "sole speed-related criterion". The other dimensions I'm
already quite cheerful about, so I don't worry about them.

Not to say speed isn't just as important as the other dimensions, of
course; the flexibility and expressiveness of a language will do me
little good if the managers of a company tell me "you can't use that
language for our apps, because this magazine on my desk says it is too
slow." Hoo, been there...


> But "raw performance" is appropriate for huge apps. You've mentioned
> you have performance concerns -- what kind of stuff do you do?

Why, this:
http://www.cognitivity.com/

Short version: distributed apps. Large, heavily OO-based distributed
apps. Focus is primarily server-side web based stuff, so "speed" is
measured in number of hits per second before the machine melts. :-/

MikeL

David Storrs

May 5, 2003, 5:13:31 PM
to perl6-l...@perl.org
On Mon, Apr 28, 2003 at 10:36:44AM -0700, Michael Lazzaro wrote:

> J: (scalar junctive to typed scalar)
>
> Same thing here... you're specifically asking for only one aspect of
> the scalar, so one could presume you meant to do that to. So always
> allowed.

This seems reasonable; I have a hard time imagining a case where this
would produce significant and hard-to-find bugs.

> B: (conversion to boolean)
>
> For things like int --> bit or num --> bit conversions, one can argue
> that it's not a big deal, and those should _always_ be allowed. After
> all, if you've specified those types, it's because you only care about
> 1/0 or true/false. So always allowed.

How does this work? If the int is 0, you get 0, else 1? Or does it
look at MSB? LSB? Does endianness matter?


> S: (string to numeric)
>
> This is a trickier one, because the string _may_ not be entirely
> numeric. In P5, a string like '1234foo' numifies to 1234, but you
> don't always want that ... sometimes, you want to throw an error if the
> string isn't 'cleanly' a number. So I'd argue this requires a pragma,
> for all types of string --> numeric conversion, selecting between
> "strict" and "non-strict" string conversions.

I agree with the idea that there should need to be an explicit marker
saying whether you want strict or non-strict conversions. I'm not
sure a pragma is the right way to go, though...in this case, it smacks
too much of action-at-a-distance. Perhaps a trait?

sub foo(int $x) { ... }

# Lousy name for the trait, but can't think of a better choice
foo('1234bar' is loosely-numed);

OTOH, a pragma is a good tool for establishing a general policy, which
could be overridden locally with traits. *shrug*
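(A minimal sketch of the two conversion policies, using Python as a
stand-in for the Perl 6 under discussion; `to_num` and its `strict`
flag are hypothetical names, not anything proposed in the thread:)

```python
import re

def to_num(s, strict=True):
    """String-to-number conversion: strict raises on trailing junk,
    non-strict takes the leading numeric prefix, Perl 5-style."""
    if strict:
        return int(s)                      # raises ValueError on '1234foo'
    m = re.match(r'\s*([+-]?\d+)', s)
    return int(m.group(1)) if m else 0     # no numeric prefix numifies to 0

assert to_num('1234foo', strict=False) == 1234   # P5-style numification
assert to_num('1234') == 1234                    # strict accepts clean numbers
```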


> *: (undefness and properties lost)
>
> I'd argue these -- being able to convert an uppercase type to a
> lowercase (primitive) type -- should always be allowed. If you're
> sending an Int to something that requires an C<int>, you know that the
> 'something' can't deal with the undef case anyway -- it doesn't
> differentiate between undef and zero. Thus, you meant to do that: it's
> an "intentionally destructive" narrowing.
>

It wouldn't hurt to make a pragma, "use strict 'object_types'", which
would make this a warning/error. I can see doing this accidentally,
and having it be a hard bug to spot, since the only difference is the
capitalization of one letter in the signature.


> F: (float to int)
>
> Again, one could argue that if you were sending a float to something
> that required an int, you probably meant to do that. But,
> historically, it can be a source of errors. Lots of errors, in fact.
> So perhaps it should _not_ be allowed? Or perhaps a single pragma to
> enable/disable it?


Again, I'd go with the pragma route. Give people the freedom to shoot
themselves wherever they like, unless they ask for a safety on the gun.
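(For reference, the loss being argued about -- sketched in Python,
which shares C's truncate-toward-zero behavior for this cast:)

```python
# "F: float to int" -- the classic silent-loss conversion.
assert int(2.99) == 2      # truncation toward zero, as in a C cast
assert int(-2.99) == -2    # truncation surprises on negatives, too
assert round(2.99) == 3    # rounding is a different, explicit choice
```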

>
> N: (numeric range)
>
> This one is a giant pain. Converting, say, an Int to an int will,
> in fact, fail to do the right thing if you're in BigInt territory, such
that the number would have to be truncated to fit in a standard C<int>.
> But 99% of the time, you won't be working with numbers like that, so it
> would seem a horrible thing to disallow Int --> int and Num --> num
> conversions under the remote chance you *might* be hitting the range
> boundary. Then again, it would seem a horrible thing to hit the range
> boundary and not be informed of that fact...

A couple of potential solutions:

1) The conversion is always permitted, but there is a pragma to turn
range-checking on and off. When range checking was on, suppressible
warnings/errors would be thrown at runtime.

2) We could introduce another status variable, similar to $@. I
propose repurposing the $% variable ($FORMAT_PAGE_NUMBER from P5;
formats are coming out of core). The mnemonic is that one circle is
crossing the boundary to become the other.

After any conversion that caused a truncation (or whatever other error
conditions we decided on; several of the ones discussed above would be
well suited here), $% would be set to the appropriate error message.
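(A minimal Python sketch of both ideas at once: narrowing that wraps
like a C int, with a returned "lossy" flag standing in for the proposed
$% status variable. `to_int32` is a hypothetical name:)

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1

def to_int32(n):
    """Narrow a bignum to 32-bit range, wrapping like a C int.
    Returns (value, lossy); the flag plays the role of $%."""
    wrapped = ((n - INT_MIN) % 2**32) + INT_MIN
    return wrapped, wrapped != n

assert to_int32(42) == (42, False)      # the 99% case: no loss, no fuss
value, lossy = to_int32(2**40)          # BigInt territory
assert lossy and value == 0             # truncated, and $% would say so
```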


--Dks

Austin Hastings

May 5, 2003, 6:35:41 PM
to David Storrs, perl6-l...@perl.org

--- David Storrs <dst...@dstorrs.com> wrote:
> On Mon, Apr 28, 2003 at 10:36:44AM -0700, Michael Lazzaro wrote:
>
> > J: (scalar junctive to typed scalar)
> >
> > Same thing here... you're specifically asking for only one
> aspect of
> > the scalar, so one could presume you meant to do that too. So
> always
> > allowed.
>
> This seems reasonable; I have a hard time imagining a case where this
> would produce significant and hard-to-find bugs.
>
>
>
> > B: (conversion to boolean)
> >
> > For things like int --> bit or num --> bit conversions, one can
> argue
> > that it's not a big deal, and those should _always_ be allowed.
> After
> > all, if you've specified those types, it's because you only care
> about
> > 1/0 or true/false. So always allowed.
>
> How does this work? If the int is 0, you get 0, else 1? Or does it
> look at MSB? LSB? Does endianness matter?
>

I think it should probably take the LSB.

Consider that C<$x >> $y> will produce a result that's probably Int or
Scalar, and converting it should DWIM -- take the bit I just selected.

Of course, C<my bit @field = $x & $mask;> should DWIM, too,
automagically extracting the bits that I cared enough to mask...but
which was the mask and which was the variable?

Go with LSB.

> > S: (string to numeric)
> >
> > This is a trickier one, because the string _may_ not be
> entirely
> > numeric. In P5, a string like '1234foo' numifies to 1234, but you
> > don't always want that ... sometimes, you want to throw an error if
> the
> > string isn't 'cleanly' a number. So I'd argue this requires a
> pragma,
> > for all types of string --> numeric conversion, selecting between
> > "strict" and "non-strict" string conversions.
>
> I agree with the idea that there should need to be an explicit marker
> saying whether you want strict or non-strict conversions. I'm not
> sure a pragma is the right way to go, though...in this case, it
> smacks too much of action-at-a-distance. Perhaps a trait?
>

I think this is a case where AAD is the right behavior -- you're
"tuning for style", not specifying behavior on every individual entity.


(I *SURE* don't want to have to code every stringvar as "is
strictly-numed", and I'm sure someone out there equally resents the
reverse.)

> > *: (undefness and properties lost)
> >
> > I'd argue these -- being able to convert an uppercase type to a
>
> > lowercase (primitive) type -- should always be allowed. If you're
> > sending an Int to something that requires an C<int>, you know that
> the
> > 'something' can't deal with the undef case anyway -- it doesn't
> > differentiate between undef and zero. Thus, you meant to do that:
> it's
> > an "intentionally destructive" narrowing.
> >
>
> It wouldn't hurt to make a pragma, "use strict 'object_types'", which
> would make this a warning/error. I can see doing this accidentally,
> and having it be a hard bug to spot, since the only difference is the
> capitalization of one letter in the signature.

For sure!

Moreover, C programmers are going to want to use the small types all
the time, by default.

I think a warning should be the default behavior, and a pragma available
to enable "I'm smarter than you" conversions. (Perhaps this could go in
site.pl?)

>
>
> > F: (float to int)
> >
> > Again, one could argue that if you were sending a float to
> something
> > that required an int, you probably meant to do that. But,
> > historically, it can be a source of errors. Lots of errors, in
> fact.
> > So perhaps it should _not_ be allowed? Or perhaps a single pragma
> to
> > enable/disable it?
>
> Again, I'd go with the pragma route. Give people the freedom to
> shoot themselves whereever they like, unless they ask for a safety
> on the gun.
>

Again, this should be part of some sort of "basic" warnings package.

You know, I remember being able to 'catch' warning exceptions in P5.
Perhaps we should resurrect that as an optional behavior:

- Option 1: Do nothing.
- Option 2: Throw a "warning" exception, the default handler for which
carps the associated message and invokes the continuation thrown with
the exception.
- Option 3: Halt and Catch Fire.
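(Python's warnings machinery happens to model exactly these options, so
here is a sketch of a truncating conversion under options 1 and 3;
`narrow8` is a hypothetical name, and option 2 is just the default
filter, which prints the message and carries on:)

```python
import warnings

def narrow8(n):
    """Narrow to 8 bits, warning on truncation."""
    if n != n & 0xFF:
        warnings.warn("value truncated to 8 bits", RuntimeWarning)
    return n & 0xFF

# Option 1: do nothing.
warnings.simplefilter("ignore")
assert narrow8(300) == 44

# Option 3: halt and catch fire -- promote the warning to an exception.
warnings.simplefilter("error")
try:
    narrow8(300)
    caught = False
except RuntimeWarning:
    caught = True
assert caught
```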

=Austin

Luke Palmer

May 5, 2003, 7:56:02 PM
to Austin_...@yahoo.com, dst...@dstorrs.com, perl6-l...@perl.org
> > > B: (conversion to boolean)
> > >
> > > For things like int --> bit or num --> bit conversions, one can
> > argue
> > > that it's not a big deal, and those should _always_ be allowed.
> > After
> > > all, if you've specified those types, it's because you only care
> > about
> > > 1/0 or true/false. So always allowed.
> >
> > How does this work? If the int is 0, you get 0, else 1? Or does it
> > look at MSB? LSB? Does endianness matter?
> >
>
> I think it should probably take the lsb.
>
> Consider that C<$x >> $y> will produce a result that's probably Int or
> Scalar, and converting it should DWIM -- take the bit I just selected.
>
> Of course, C<my bit @field = $x & $mask;> should DWIM, too,

By which, of course, you mean C<$x +& $mask> :)

> automagically extracting the bits that I cared enough to mask...but
> which was the mask and which was the variable?
>
> Go with LSB.

Gee, that's a great idea. All even integers end up 0.

No, I'd say go with nonzero. That's how C's bitops have been working
for ages, and there's cultural pressure to keep it that way. Going
with the LSB would be surprising and counterintuitive. Masks, as it
seems, are more common than LSB extraction, and it's not as if C<+& 1>
is all that hard to type.
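(The two proposals side by side, sketched in Python with hypothetical
names -- the disagreement only shows up on even, nonzero integers:)

```python
def to_bit_lsb(n):
    return n & 1               # Austin's rule: take the low bit

def to_bit_nonzero(n):
    return 1 if n else 0       # Luke's rule: C-style truthiness

assert to_bit_lsb(2) == 0                  # every even integer ends up 0...
assert to_bit_nonzero(2) == 1              # ...but stays true under nonzero
assert to_bit_lsb(3) == to_bit_nonzero(3) == 1   # odd values agree
assert to_bit_lsb(0) == to_bit_nonzero(0) == 0   # and so does zero
```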

Luke

Dulcimer

May 5, 2003, 9:24:50 PM
to perl6-l...@perl.org
> > > For things like int --> bit or num --> bit conversions, one can
> > > argue that it's not a big deal, and those should _always_ be
> > > allowed. After all, if you've specified those types, it's because
> > > you only care about 1/0 or true/false. So always allowed.
> >
> > How does this work? If the int is 0, you get 0, else 1? Or does
> > it look at MSB? LSB? Does endianness matter?
>
> I think it should probably take the lsb.
>
> Consider that C<$x >> $y> will produce a result that's probably Int
> or Scalar, and converting it should DWIM -- take the bit I just
> selected.
>
> Of course, C<my bit @field = $x & $mask;> should DWIM, too,
> automagically extracting the bits that I cared enough to mask...but
> which was the mask and which was the variable?
>
> Go with LSB.

Time to stick my foot in my mouth again, I guess.

my Int $two = 2; # LSB is now a 0, isn't it?
my bit $t = $two; # *SHOULDn't* this be 1?


Austin Hastings

May 5, 2003, 10:24:35 PM
to Luke Palmer, dst...@dstorrs.com, perl6-l...@perl.org

If I say:

#define WIDTH (somenumber)

struct bits {
    unsigned bit : WIDTH;
} bitstr;

And then:

bitstr.bit = 2;

I don't get "well, it was non-zero". I get the WIDTH lsbs of 2. If WIDTH
happens to be 1, then I get the 1 lsb of two, which you've correctly
computed is 0.
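(The bitfield-assignment semantics being described, sketched in Python;
`bitfield_assign` is a hypothetical name:)

```python
def bitfield_assign(value, width):
    """Emulate assignment to an unsigned C bitfield of `width` bits:
    the value is masked down to its `width` low-order bits."""
    return value & ((1 << width) - 1)

assert bitfield_assign(2, 1) == 0   # the 1 lsb of 2 is 0
assert bitfield_assign(2, 2) == 2   # a 2-bit field holds it intact
```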

I think you meant that's how C's LOGIC ops (&&, ||) work -- which is one
reason we're having that bool/bit thread.

=Austin
