The type system

Michiel

unread,

Mar 13, 2010, 4:29:11 PM3/13/10

to mist-discuss

C++ is the only programming language I've used with any frequency.
Sure, I am familiar enough with C, D, Java(script) and family and have
had some experience with Prolog and Scheme. But my background is
basically C++, so that's what I set out to improve upon when I started
designing Mist.

I haven't been thinking outside the box enough. Let's talk type
systems.

One of the basic goals of Mist has always been to be compiled with a
verifying compiler. To be as flexible/elegant as possible even if it
takes some extra runtime checking, and replace that with compile-time
checking as the compiler become smarter or the programmer provides
more effective redundant annotations.

So why not design the type system according to the same philosophy?
Most choices below will result in dynamic type checking, but a lot of
common cases can still be handled statically.

I'm sure I'm not the first one thinking of these. Are there existing
(imperative) programming languages implementing types as I describe
below? Maybe there are papers exploring some of these issues? For now,
this is just a brainstorming session. I need more information.
Opinions, too.

What do you think?

--- (1) ---

Enums. In C++, enum values are just integer constants. In Java, they
are bound to a class. But I believe they should just be symbols. New
values. Not equal to anything other than themselves and not by default
cast-able to any other type. Example:

val North, East, South, West;

They don't have to be part of an enum either (see (3)). But for enums,
perhaps some sort of set notation:

type CardinalDirection <- {North, East, South, West};

--- (2) ---

A type is an invariant. So why not allow a type to be defined (partly)
by an invariant condition? Example (tentative syntax):

type even_int <- int:(me mod 2 = 0); | the type of all even integers

even_int i <- 4; | ok
even_int j <- 5; | error

type NorthOrSouth <- CardinalDiraction:(me = North or me = South);

NorthOrSouth a <- South; | ok
NorthOrSouth b <- West; | error

--- (3) ---

A type is a set of values. So why not allow unions, intersections,
differences and subsets of types? Examples:

val inf; | infinity value
type inf_int <- int + {inf}; | integer type with infinity

val null; | null value
type nullable?(T) <- T + {null}; | nullable type T (idempotent)

type IntOrString <- int + string;

--- (4) ---

Variables have types. That's useful. The type determines which values
a var is allowed to take. But in C++ and many other languages, a
single value has a type as well.

Integer literals can be suffixed (u, l, ll, etc.) in order to give the
literal the right type. But is 5 not the same as 5u, 5l and 5ull? Is 5
not also the same as 5.0?

Also, can 5 not be part of an infinite number of types (see (2), (3))?
Can't `ClassName()' or `new ClassName'? The same goes for any value,
really.

Another example: is the tuple (1, 2, 3, 4) not the same as the list
(1, 2, 3, 4)? Maybe it's just that int[] (int array or list) allows
sequences of any length whereas (int, int, int, int) allows only lists
of length 4, but the two have a non-empty intersection.

Amos Robinson

unread,

Mar 13, 2010, 7:49:26 PM3/13/10

to mist-discuss

hmm, one comment on the set unions:

I don't think I like the nullable:

type nullable?(T) <- T + {null};

because doesn't that mean that

nullable?(T) == nullable?( nullable?(T) )

which I think kind of breaks abstraction - suppose you had a
list(nullable?(T)) and an operation on list,
find :: (E -> Bool) -> list(E) -> nullable?(E),
then you wouldn't know whether the returned 'null' means that nothing
was found, or that the predicate matched on an element that was null.

which is why I think that you would definitely need a tagged sort of
union like Haskell et al; at least in this case.

dmbarbour

unread,

Mar 13, 2010, 7:51:35 PM3/13/10

to mist-discuss

I would suggest looking up Scala. It might already be what you need...
but even if it is not what you need, you could gain some significant
inspirations from it. For example, you mention Enums below, but I
believe Scala's "case classes" are better than anything Java or C++
provides in that role.

RE: types as invariants. This can be made to work, if you're careful.
The problem is aliasing (doubly a problem in presence of concurrency).
I.e. if a type contains (or is) a reference, and the condition on that
type involves that reference, then you cannot usually assert the
condition checked at time of assignment will still hold later, at time
of use. You will encounter a class of errors sometimes called
'TOCTTOU' (time of check til time of use).

RE: set of values. Given that you embrace dynamic programming, you can
make this work. However, one might wonder what the syntax is for
accessing a type-union. Consider a union of:
(int,bool) + (string,double).
How would you obtain useful values from this?

There is a reason most functional languages (and even Scala's case-
classes) tend to use tags of some sort - tagged unions simplify the
syntax for handling multiple types. Developers are essentially forced
to perform a case analysis.

On Mar 13, 1:29 pm, Michiel <mhelv...@gmail.com> wrote:
> Integer literals can be suffixed (u, l, ll, etc.) in order to give the
> literal the right type. But is 5 not the same as 5u, 5l and 5ull? Is 5
> not also the same as 5.0?

I'd say that '5' is not generally the same value as '5.0'. The two
expressions have distinct levels of precision. (5 is usually an exact
count, whereas 5.0 is often understood as 5.0 err 0.1 in science, or
5.0 err 1e-16 in C++).

Anyhow, it is important to realize that your "types are sets of
values" and C++'s "types are representations for values" are rather
distinct concepts.

R. Matthew Schellhas

unread,

Mar 13, 2010, 9:59:24 PM3/13/10

to mist-discuss

> I'm sure I'm not the first one thinking of these. Are there existing
> (imperative) programming languages implementing types as I describe
> below? Maybe there are papers exploring some of these issues?

There are almost certainly papers, though most (all?) I've seen use
formal notation which is... difficult to consume. As dmbarbour points
out, Scala might be some useful exploration. Some of the type
composition reminds me of haskell as well.

> Enums. In C++, enum values are just integer constants. In Java, they
> are bound to a class. But I believe they should just be symbols. New
> values. Not equal to anything other than themselves and not by default
> cast-able to any other type.

That Java binds them to a class is an implementation detail.
Realistically, it's the same sort of behavior.

>
> --- (2) ---
>
> A type is an invariant. So why not allow a type to be defined (partly)
> by an invariant condition? Example (tentative syntax):
>
> type even_int <- int:(me mod 2 = 0); | the type of all even integers

Because you quickly run into sub-typing issues. Is
int_divisible_by_four assignable to even_int? It should be, but that
sort of thing is difficult to prove programatically, especially for
(near) infinite types like strings. Further, assigning a plain old non-
constant int will require casting or coercion of some sort which adds
to programmer headaches.

>
> --- (3) ---
>
> A type is a set of values. So why not allow unions, intersections,
> differences and subsets of types? Examples:
>
> val inf; | infinity value
> type inf_int <- int + {inf}; | integer type with infinity
>
> val null; | null value
> type nullable?(T) <- T + {null}; | nullable type T (idempotent)
>
> type IntOrString <- int + string;
>

Some languages like haskell kind of allow this (http://www.haskell.org/
all_about_monads/html/maybemonad.html), though the mechanics of it are
not exactly what you're thinking of. Also, looking at a type as merely
a set of values tends to ignore some features you might need. A set of
member definitions is perhaps more applicable.

That said, my recent toy project (http://www.assembla.com/wiki/show/
tangent-lang) has some type operations similar to this under the hood,
and formal texts like Types and Programming Languages cover this sort
of thing a bit more completely.

> --- (4) ---

>
> Also, can 5 not be part of an infinite number of types (see (2), (3))?

Sure, but practically you can only make static guarantees about a
variable based on the declared type. This fits into (2) and some of
the casting/coercion you'd need to do when dealing with the value 5
when it is a non-constant. And then as soon as you do any operations
to the value, the certainty for which you can make static type
guarantees diminishes quickly.

*shrug* I don't really know too much about these things and am just
offering my first instinct based on the little practical work I've
done. I *do* recommend taking a tour of some things outside of the C++
world. That doesn't take too much knowledge or experience to realize...

Michiel

unread,

Mar 14, 2010, 4:19:49 AM3/14/10

to mist-discuss

On Mar 14, 1:49 am, Amos Robinson <amos.robin...@gmail.com> wrote:

> I don't think I like the nullable:
>
> type nullable?(T) <- T + {null};
>
> because doesn't that mean that
>
> nullable?(T) == nullable?( nullable?(T) )
>
> which I think kind of breaks abstraction - suppose you had a
> list(nullable?(T)) and an operation on list,
> find :: (E -> Bool) -> list(E) -> nullable?(E),
> then you wouldn't know whether the returned 'null' means that nothing
> was found, or that the predicate matched on an element that was null.

You're right. In those cases, you might instead use something like
this:

type optional?(T) <- T^ + {null}; | T pointer or null

But I think that in many cases people prefer not to dereference if
they don't have to.

Michiel

unread,

Mar 14, 2010, 7:37:14 AM3/14/10

to mist-discuss

On Mar 14, 1:51 am, dmbarbour <dmbarb...@gmail.com> wrote:

> I would suggest looking up Scala. It might already be what you need...
> but even if it is not what you need, you could gain some significant
> inspirations from it. For example, you mention Enums below, but I
> believe Scala's "case classes" are better than anything Java or C++
> provides in that role.

Case classes look a bit heavyweight for what I had in mind. But I see
how they can be used as enums. The enum type would be a base class and
each enum value would be a subclass. I had looked at Scala before, but
not closely enough, I think.

> RE: types as invariants. This can be made to work, if you're careful.
> The problem is aliasing (doubly a problem in presence of concurrency).
> I.e. if a type contains (or is) a reference, and the condition on that
> type involves that reference, then you cannot usually assert the
> condition checked at time of assignment will still hold later, at time
> of use. You will encounter a class of errors sometimes called
> 'TOCTTOU' (time of check til time of use).

Excellent point. I hadn't considered that. I guess I'm suggesting that
a type is statically defined (though possibly dynamically checked), so
each reference inside the type invariants would have to be immutable.

> RE: set of values. Given that you embrace dynamic programming, you can
> make this work. However, one might wonder what the syntax is for
> accessing a type-union. Consider a union of:
> (int,bool) + (string,double).
> How would you obtain useful values from this?

Basically, I'm suggesting you can treat a variable of that type as a
variable of (int, bool) type if you're sure that its current value is
of (int, bool) type. Any assignment to an (int, bool) variable, for
example, generates an implicit assertion to that effect. The Mist
compiler will often be able to prove that the assertion can never be
violated. In other cases, we keep it as a runtime check.

If you're not sure which of the two types the value is, you should
only pass it to operations that accept the union type (or a supertype
of it). In order to eventually distinguish between the two cases, you
may pass it to an overloaded function. Example:

void print(int, bool) { ... }
void print(string, double) { ... }

(int, bool)+(string, double) var <- fill_var();

print(var);

A form of late binding would automatically pass the value to the right
function at runtime.

I realize overlapping overloads may occur (one function taking values
1 to 10, another 5 to 15). Likely, the programmer should avoid this
situation. However, I would consider it legal if, in the overlapping
case, the two are functionally equivalent (functions have behavioral
contracts in Mist). This would be a lot harder to prove statically.
But I think it's `the right thing'.

> There is a reason most functional languages (and even Scala's case-
> classes) tend to use tags of some sort - tagged unions simplify the
> syntax for handling multiple types. Developers are essentially forced
> to perform a case analysis.

I am aware of that approach, but my approach somehow feels more
elegant. More trouble, sure. But I do like a challenge.

> > Integer literals can be suffixed (u, l, ll, etc.) in order to give the
> > literal the right type. But is 5 not the same as 5u, 5l and 5ull? Is 5
> > not also the same as 5.0?
>
> I'd say that '5' is not generally the same value as '5.0'. The two
> expressions have distinct levels of precision. (5 is usually an exact
> count, whereas 5.0 is often understood as 5.0 err 0.1 in science, or
> 5.0 err 1e-16 in C++).

Think rational numbers instead of floating point. I think I would like
5.0 to be exactly the same value as 5. There will eventually be a type
for real numbers as well. With literals looking perhaps like (5.002 ~
0.0005), or something. (5.002 ~ 0.0) would be the same value as 5.002.

There is certainly something to what you say about significant digits.
Scientific/mathematical notation is much much more important to me
than C++ notation. I'll give it some more thought.

> Anyhow, it is important to realize that your "types are sets of
> values" and C++'s "types are representations for values" are rather
> distinct concepts.

Certainly. Thanks for your interesting reply.

Michiel

unread,

Mar 14, 2010, 8:11:32 AM3/14/10

to mist-discuss

On Mar 14, 3:59 am, "R. Matthew Schellhas" <rms.tang...@gmail.com>
wrote:

> There are almost certainly papers, though most (all?) I've seen use
> formal notation which is... difficult to consume. As dmbarbour points
> out, Scala might be some useful exploration. Some of the type
> composition reminds me of haskell as well.

I think Haskell is certainly worth looking into. I've read much about
it, but I haven't had the time to properly study it. And it feels like
it would take quite some time.

> > Enums. In C++, enum values are just integer constants. In Java, they
> > are bound to a class. But I believe they should just be symbols. New
> > values. Not equal to anything other than themselves and not by default
> > cast-able to any other type.
>
> That Java binds them to a class is an implementation detail.
> Realistically, it's the same sort of behavior.

The difference is that in my suggestion, a value may be part of many
enums. I believe in Java, an enum value is limited to its own class.
(Am I mistaken?)

val SugarCube, Milk, Ghost, Cheese;

type white_things <- {SugarCube, Milk, Ghost};
type food_items <- {SugarCube, Milk, Cheese};

> > A type is an invariant. So why not allow a type to be defined (partly)
> > by an invariant condition? Example (tentative syntax):
>
> > type even_int <- int:(me mod 2 = 0); | the type of all even integers
>
> Because you quickly run into sub-typing issues. Is
> int_divisible_by_four assignable to even_int?

Absolutely.

> It should be, but that
> sort of thing is difficult to prove programatically, especially for
> (near) infinite types like strings. Further, assigning a plain old non-
> constant int will require casting or coercion of some sort which adds
> to programmer headaches.

Please read my reply to dmbarbour, which clarifies my thoughts on this
issue.

> > A type is a set of values. So why not allow unions, intersections,
> > differences and subsets of types? Examples:
>
> > val inf; | infinity value
> > type inf_int <- int + {inf}; | integer type with infinity
>
> > val null; | null value
> > type nullable?(T) <- T + {null}; | nullable type T (idempotent)
>
> > type IntOrString <- int + string;
>
> Some languages like haskell kind of allow this (http://www.haskell.org/
> all_about_monads/html/maybemonad.html), though the mechanics of it are
> not exactly what you're thinking of.

It does look similar. If I'm not mistaken, the Haskel Maybe monad is
not idempotent, like my nullable example. There is such a thing as a
Maybe Maybe Maybe Integer. If I'm not mistaken you can retrieve the
integer using Just Just Just value. This is a difference Amos Robinson
noticed in his reply. In Mist you could accomplish the same using a
pointer. Or perhaps a Maybe class with the same sort of functionality.

> That said, my recent toy project (http://www.assembla.com/wiki/show/
> tangent-lang) has some type operations similar to this under the hood,
> and formal texts like Types and Programming Languages cover this sort
> of thing a bit more completely.

Ehud Lamm also suggested that book. I guess it's certainly worth
looking into.

> > Also, can 5 not be part of an infinite number of types (see (2), (3))?
>
> Sure, but practically you can only make static guarantees about a
> variable based on the declared type. This fits into (2) and some of
> the casting/coercion you'd need to do when dealing with the value 5
> when it is a non-constant. And then as soon as you do any operations
> to the value, the certainty for which you can make static type
> guarantees diminishes quickly.

Very true. I think my suggestions involve a lot of dynamic type
checking. But that's basically the point of Mist. To be as flexible/
mathematical as possible and trust that eventually compilers become
smart enough to bring as many of these checks as possible into compile
time.

Also, Mist will support programmer provided redundant annotations to
help the compiler generate proofs. Loop invariants, for example.

> *shrug* I don't really know too much about these things and am just
> offering my first instinct based on the little practical work I've
> done. I *do* recommend taking a tour of some things outside of the C++
> world. That doesn't take too much knowledge or experience to realize...

The point was that I needed someone to point me in the right
direction. :-)

Thanks for your reply!

R. Matthew Schellhas

unread,

Mar 14, 2010, 9:08:19 AM3/14/10

to mist-discuss

On Mar 14, 6:11 am, Michiel <mhelv...@gmail.com> wrote:
> The difference is that in my suggestion, a value may be part of many
> enums. I believe in Java, an enum value is limited to its own class.
> (Am I mistaken?)
>

No, but that is a side effect of Java not having multiple inheritance
(and probably a design decision too).

> val SugarCube, Milk, Ghost, Cheese;
>
> type white_things <- {SugarCube, Milk, Ghost};
> type food_items <- {SugarCube, Milk, Cheese};

And what happens when you overload a method to take either?

foo( white_things arg ) -> void {...}
foo( food_items arg ) -> void {...}

foo( Milk );

Worse yet, what happens if your language supports dynamic dispatch and
the value isn't a constant?

I mean it's one thing to say you want a feature, and another to
implement it. The whole 'we'll just check it at runtime' seems like a
bit of a cop-out. If you're going to have static typing, there should
be benefits there.

> > Because you quickly run into sub-typing issues. Is
> > int_divisible_by_four assignable to even_int?
>
> Absolutely.
>

And how do you propose doing the type checking for that?

Michiel

unread,

Mar 14, 2010, 9:16:06 AM3/14/10

to mist-discuss

On Mar 14, 2:08 pm, "R. Matthew Schellhas" <rms.tang...@gmail.com>
wrote:

> > val SugarCube, Milk, Ghost, Cheese;

>
> > type white_things <- {SugarCube, Milk, Ghost};
> > type food_items <- {SugarCube, Milk, Cheese};
>
> And what happens when you overload a method to take either?
>
> foo( white_things arg ) -> void {...}
> foo( food_items arg ) -> void {...}
>
> foo( Milk );

Please take a look at my reply to dmbarbour. To summarize, if
something like that happens, the overlapping case should be
behaviorally equivalent in both overloads. That sort of thing is
extremely problematic to prove at compile-time, but not impossible in
all cases.

And when it is impossible, yes, it is possible to check it at runtime.
Just copy the state, run both functions, see if the result is the
same.

Even so, I would suggest the programmer avoid such situations.

> Worse yet, what happens if your language supports dynamic dispatch and
> the value isn't a constant?

I don't understand this question. Dynamic dispatch means that the
value doesn't have to be a constant. By definition, right?

> I mean it's one thing to say you want a feature, and another to
> implement it. The whole 'we'll just check it at runtime' seems like a
> bit of a cop-out. If you're going to have static typing, there should
> be benefits there.

The philosophy here is that I want to be as flexible as possible,
check the easy things at compile time and check the harder things at
runtime. I don't think it's a cop-out if it works.

> > Absolutely.
>
> And how do you propose doing the type checking for that?

Again, my reply to dmbarbour should clarify this issue.

R. Matthew Schellhas

unread,

Mar 14, 2010, 1:45:35 PM3/14/10

to mist-discuss

On Mar 14, 8:16 am, Michiel <mhelv...@gmail.com> wrote:
> On Mar 14, 2:08 pm, "R. Matthew Schellhas" <rms.tang...@gmail.com>
> wrote:
>
> > > val SugarCube, Milk, Ghost, Cheese;
>
> > > type white_things <- {SugarCube, Milk, Ghost};
> > > type food_items <- {SugarCube, Milk, Cheese};
>
> > And what happens when you overload a method to take either?
>
> > foo( white_things arg ) -> void {...}
> > foo( food_items arg ) -> void {...}
>
> > foo( Milk );
>
> Please take a look at my reply to dmbarbour.

I did, and didn't really get what you're looking to do in a general
sense. And I really didn't get any grasp of how you're going to
practically solve some of the implementation problems there-in.

> To summarize, if
> something like that happens, the overlapping case should be
> behaviorally equivalent in both overloads. That sort of thing is
> extremely problematic to prove at compile-time, but not impossible in
> all cases.

Effectively impossible for the general case... And even if it
weren't, (natural) language ambiguity will make the behavioral
equivalence requirement a bit restrictive. Words have different
behaviors based on what context their in; or in our case, what
parameters are sent to them. As soon as you allow multiple inheritance
(or similar behavior with values belonging in multiple types), you run
into ambiguity issues with function overloading.

>
> And when it is impossible, yes, it is possible to check it at runtime.
> Just copy the state, run both functions, see if the result is the
> same.
>

Which is neigh impossible to programatically determine if functions
aren't pure.

>
> > Worse yet, what happens if your language supports dynamic dispatch and
> > the value isn't a constant?
>
> I don't understand this question. Dynamic dispatch means that the
> value doesn't have to be a constant. By definition, right?

This goes back to ambiguity that comes from values being valid for
multiple types.

If you have a function overloaded to take types A and B, what happens
when A union B is passed in? If you allow type operations, A union B
always can exist, and can be assigned into a variable where you expect
A or B. When/if you go to do runtime dispatch on the variables, what
is the 'right thing'?

Not saying it's an intractable problem, but it is a design decision
that arises from this sort of design decision. And any choice made
will curtail some of your options in other parts of the language.

>
> > I mean it's one thing to say you want a feature, and another to
> > implement it. The whole 'we'll just check it at runtime' seems like a
> > bit of a cop-out. If you're going to have static typing, there should
> > be benefits there.
>
> The philosophy here is that I want to be as flexible as possible,
> check the easy things at compile time and check the harder things at
> runtime. I don't think it's a cop-out if it works.
>

Quite true. My instinct though is to worry about the benefits of a
type system if so much is being pushed off to runtime. And I worry
about the actual implementation difficulty (and practical usability)
of such things. Problems don't really become easier at runtime.

But who knows? I tend to be a bit pessimistic about things.

Michiel

unread,

Mar 14, 2010, 2:39:46 PM3/14/10

to mist-discuss

On Mar 14, 6:45 pm, "R. Matthew Schellhas" <rms.tang...@gmail.com>
wrote:

> > Please take a look at my reply to dmbarbour.

>
> I did, and didn't really get what you're looking to do in a general
> sense. And I really didn't get any grasp of how you're going to
> practically solve some of the implementation problems there-in.

I'll try to explain more clearly.

In general, if two (or more) functions with the same name are able to
take the same value as parameter, it shouldn't matter which of these
functions is actually used. The program is legal if each choice yields
the same result, both in return value and in side-effects.

It is always possible to check this at runtime, though in general it
would be insane to do so for production code. You can copy the entire
program state, try out each option, then compare each result. There
would be an implicit assertion that all the results are equal.

Given a sufficiently smart compiler and/or enough redundant
annotation, these sorts of things can be proved statically (and
mechanically) as well. For all possible parameters. Then the compiler
(or dispatcher) would be free to choose nondeterministically between
the implementations.

Ok, that is the easy way explain it. But it's not entirely correct. In
reality Mist functions don't have to be deterministic in their
behavior. They may be defined by a contract (see Eiffel, verified C,
Spec#, JML). So in actuality the implicit assertion would be that each
result satisfy each postcondition.

> > And when it is impossible, yes, it is possible to check it at runtime.
> > Just copy the state, run both functions, see if the result is the
> > same.
>
> Which is neigh impossible to programatically determine if functions
> aren't pure.

Sure, it's computationally complex. But why would it be impossible?

> If you have a function overloaded to take types A and B, what happens
> when A union B is passed in? If you allow type operations, A union B
> always can exist, and can be assigned into a variable where you expect
> A or B.

All true.

> When/if you go to do runtime dispatch on the variables, what
> is the 'right thing'?

If there is only one function that can take the value (which is known
at runtime), use that function. If there are multiple: that's where my
story above fits in. In short: choose arbitrarily, provided that the
choices are semantically equivalent.

> Not saying it's an intractable problem, but it is a design decision
> that arises from this sort of design decision. And any choice made
> will curtail some of your options in other parts of the language.

I'd be very interested to know which options I'm curtailing with this
design decision. Any thoughts on that?

> > The philosophy here is that I want to be as flexible as possible,
> > check the easy things at compile time and check the harder things at
> > runtime. I don't think it's a cop-out if it works.
>
> Quite true. My instinct though is to worry about the benefits of a
> type system if so much is being pushed off to runtime. And I worry
> about the actual implementation difficulty (and practical usability)
> of such things. Problems don't really become easier at runtime.

You're right. But probably everything that can be checked in
statically typed languages can still be statically checked in Mist.
That is, if you don't use any of the 'advanced' features from the TLP.
And just maybe, the Mist compiler will be able to statically check
some of those as well, in the future. That's what the project is
about.

Note that the compiler will always issue a warning if there are still
things to check at runtime, so the programmer is at least aware of the
potential problem.

R. Matthew Schellhas

unread,

Mar 14, 2010, 5:45:09 PM3/14/10

to mist-discuss

> I'll try to explain more clearly.
>

Thanks!

> In general, if two (or more) functions with the same name are able to
> take the same value as parameter, it shouldn't matter which of these
> functions is actually used. The program is legal if each choice yields
> the same result, both in return value and in side-effects.

And if the program isn't legal?

Another question then. If values are essentially symbols in Mist, what
happens when you run into homonyms? They're not really the same value
(or should be treated the same at least), but have the same name.

> Sure, it's computationally complex. But why would it be impossible?

Prohibitively computationally complex then. Also, to be fairly
pedantic, any side effect that leaves your system (eg network traffic)
cannot be fully verified.

>
> If there is only one function that can take the value (which is known
> at runtime), use that function. If there are multiple: that's where my
> story above fits in. In short: choose arbitrarily, provided that the
> choices are semantically equivalent.
>

Fair enough. I'm just not convinced that method overloads will always
(or even often) be semantically equivalent (or be provably so).

> > Not saying it's an intractable problem, but it is a design decision
> > that arises from this sort of design decision. And any choice made
> > will curtail some of your options in other parts of the language.
>
> I'd be very interested to know which options I'm curtailing with this
> design decision. Any thoughts on that?
>

Naively, it seems as though this would require concessions to make
functions relatively provable or constrain overloading or require a
bit of programmer work to make sufficient annotations. There also
might be optimization or static checking limitations because things
may or may not belong to multiple types. And it'll be more
computationally expensive to make sub-classing determinations.

> > Quite true. My instinct though is to worry about the benefits of a
> > type system if so much is being pushed off to runtime. And I worry
> > about the actual implementation difficulty (and practical usability)
> > of such things. Problems don't really become easier at runtime.
>
> You're right. But probably everything that can be checked in
> statically typed languages can still be statically checked in Mist.
> That is, if you don't use any of the 'advanced' features from the TLP.
> And just maybe, the Mist compiler will be able to statically check
> some of those as well, in the future. That's what the project is
> about.
>
> Note that the compiler will always issue a warning if there are still
> things to check at runtime, so the programmer is at least aware of the
> potential problem.

Sure, that makes sense. Tangent does similar thing for the ambiguous
case of A union B into an overloaded method. A low priority warning if
a method overloaded with A and B doesn't also specify a A union B
overload (and runtime error if invoked that way). If I can see
statically that A union B is going into the method it gets an error at
compile time. That is rather why I brought that particular case up.

dmbarbour

unread,

Mar 14, 2010, 6:44:15 PM3/14/10

to mist-discuss

On Mar 14, 11:39 am, Michiel <mhelv...@gmail.com> wrote:
> It is always possible to check this at runtime, though in general it
> would be insane to do so for production code. You can copy the entire
> program state, try out each option, then compare each result

“Four things come not back: the spoken word, the sped arrow, the past
life, and the neglected opportunity.” - proverb

This applies even in computation. If a procedure has effects
(communication, actuation) or consumes conserved resources (time
slots, space, signals), you cannot reverse these and 'try again'.

Distributed transactions are probably the closest thing you can get to
multi-process 'take-backs', and I include them in my language's
design. But those things that 'come not back' are inherently non-
transactional, must be protected by transaction-barriers (i.e. they
become post-commit behaviors). And, in general, you need a lot of
cooperation from external processes to make distributed transactions
work... i.e. to include distributed transactions, your language will
need to influence the environment - the communications protocols, the
browsers, the operating systems - rather than just conforming to it.

>
> Given a sufficiently smart compiler

You shouldn't rely upon a sufficiently smart compiler any more than
you should rely upon your tooth fairy.

A language designer MUST pay careful attention to implementability and
ensure that ALL desired engineering characteristics (such as
performance, parallelism, garbage-collection, safety, security,
packaging, code-distribution, packaging, persistence, runtime upgrade,
runtime extension, robustness, disruption tolerance, graceful
degradation, resilience and self-healing, load-balancing, real-time,
embedded-space, correctness & error analysis, domain expressivity,
etc.) can be achieved - practically, without compromise. For a
designer, compromise is equivalent to giving up. (Compromise is
opportunity for innovative solution - basis of TRIZ.) You should study
this list, prioritize it, decide just where 'correctness analysis'
lies and how you're going to achieve it and the other properties you
want... all without relying upon a sufficiently smart compiler.

>
> I'd be very interested to know which options I'm curtailing with this
> design decision. Any thoughts on that?

From the above list, I would suggest that a decision to support
polymorphic dispatch over a set of functions will generally limit
modularity (whether two modules are compatible will be difficult to
determine), will hurt any sort of runtime code-distribution (upgrade,
extension, plug-in), and hinders security (which requires controlling
and grokking distribution of authority).

>
> Note that the compiler will always issue a warning if there are still
> things to check at runtime, so the programmer is at least aware of the
> potential problem.

If you swamp developers with "warnings", you should know what they're
going to do to make those warnings go away. Developers won't do the
right thing. They'll do the easy thing. They can always fix it
properly later, right? but that's hubris speaking...

As a language designer who is developing a language for humans, you
should be aware of at least basic tendencies and limitations of human
psychology. Laziness, impatience, hubris (see http://c2.com/cgi/wiki?LazinessImpatienceHubris)
are part of human nature. You should avoid issuing a warning or error
unless the easiest way to fix the error will also be the most correct
way of fixing the error. Language design options that contradict this
basic 'law' of human nature should be consider dubious at best, and
"evil" at worst. Java checked exceptions, for example, are oft called
dubious or evil... developers quickly learn catch and drop exceptions
to the floor - it gets their code running. The hacks will then make it
into production code, and meanwhile the developers are learning bad
habits through repetition.

Michiel

unread,

Mar 15, 2010, 3:10:41 AM3/15/10

to mist-discuss

On Mar 14, 10:45 pm, "R. Matthew Schellhas" <rms.tang...@gmail.com>
wrote:

> > In general, if two (or more) functions with the same name are able to

> > take the same value as parameter, it shouldn't matter which of these
> > functions is actually used. The program is legal if each choice yields
> > the same result, both in return value and in side-effects.
>
> And if the program isn't legal?

Then you will get an error. Either at compile time ("there is a
parameter for which the function choice matters") or at runtime ("the
function choice matters for this parameter"). As much as possible at
compile time of course. With a warning if that's not possible.

> Another question then. If values are essentially symbols in Mist, what
> happens when you run into homonyms? They're not really the same value
> (or should be treated the same at least), but have the same name.

I don't think such things exist in Mist. But maybe I misunderstand.
Can you give me an example?

> > Sure, it's computationally complex. But why would it be impossible?
>
> Prohibitively computationally complex then.

In production code, certainly. But in a debug phase of development,
it's not so prohibitive.

> Also, to be fairly
> pedantic, any side effect that leaves your system (eg network traffic)
> cannot be fully verified.

Communication between the Mist process and the file system / network /
operating system cannot be verified by the compiler anyway. To put it
differently, you can specify and verify that the program will send
certain information down some channel, but you can make no guarantees
about the response. You as a programmer may know that your Mist
program is sending "1 + 1" to a third party calculator program and
expect to receive "2" in return, but verification of this is outside
the Mist compilers control. Such a thing would require higher level
verification with formal specifications of all processes involved.

That said, specifications about information leaving the system can
exist in Mist (e.g. postcondition: "1 + 1" sent to stdout). At runtime
we can use buffers to store that information, then compare, and only
then let it leave the system.

> > If there is only one function that can take the value (which is known
> > at runtime), use that function. If there are multiple: that's where my
> > story above fits in. In short: choose arbitrarily, provided that the
> > choices are semantically equivalent.
>
> Fair enough. I'm just not convinced that method overloads will always
> (or even often) be semantically equivalent (or be provably so).

This is up to the designer of those functions. In truth, I believe
overlapping function overloads can and should often be avoided. But I
can imagine situations in which it's the most elegant solution.

> > I'd be very interested to know which options I'm curtailing with this
> > design decision. Any thoughts on that?
>
> Naively, it seems as though this would require concessions to make
> functions relatively provable or constrain overloading or require a
> bit of programmer work to make sufficient annotations.

That's certainly true. If you want static safety or correctness
guarantees, annotations are necessary. Do note that if you do not
provide any such annotations -- and in the worst case the compiler
cannot prove correctness anywhere -- you're still no worse off than
with any other programming language (compiler), which do not even
attempt such static proofs.

> There also
> might be optimization or static checking limitations because things
> may or may not belong to multiple types. And it'll be more
> computationally expensive to make sub-classing determinations.

Only in those cases which are prohibited in mainstream languages. The
features they have in common also allow the same level of optimization
and static checking. Those strange situations allowed by Mist may be
inherently complex. But rather than prohibit them, I would rather give
the programmer the choice to use or avoid them.

Michiel

unread,

Mar 15, 2010, 3:47:46 AM3/15/10

to mist-discuss

On Mar 14, 11:44 pm, dmbarbour <dmbarb...@gmail.com> wrote:

> > It is always possible to check this at runtime, though in general it
> > would be insane to do so for production code. You can copy the entire
> > program state, try out each option, then compare each result
>

> This applies even in computation. If a procedure has effects
> (communication, actuation) or consumes conserved resources (time
> slots, space, signals), you cannot reverse these and 'try again'.

As I replied to R. Matthew Schellhas, Mist cannot -- nor does it
pretend it can -- verify anything about communication with other
processes or the file system or the operating system anyway. That

said, specifications about information leaving the system can exist in

Mist (e.g. postcondition: "1 + 1" sent to stdout). In order to do our
runtime checks we can use buffers to store that information (as part
of the state that we copy), then compare, and only if the assertion is
satisfied let the information leave the system.

> > Given a sufficiently smart compiler
>
> You shouldn't rely upon a sufficiently smart compiler any more than
> you should rely upon your tooth fairy.

I know that the "sufficiently smart compiler" argument has often been
misused.

http://c2.com/cgi/wiki?SufficientlySmartCompiler

Perhaps this is so in my case as well. But I feel justified in using
it, because the latest version of the Mist compiler is already able to
statically verify consistency between a procedure and its contract.
This includes loops and recursion. And there are already tools that
can do even more for other languages (e.g. verified C, Spec#, JML).
This particular "sufficiently smart compiler" is not a dream so far
off.

Besides, "sufficiently smart" in this case refers also to the
compilers ability to use redundant annotations provided by the
programmer (assertions, but more advanced). In theory, any property
can be proved by a computer program as long as the programmer gives
enough hints.

> > Note that the compiler will always issue a warning if there are still
> > things to check at runtime, so the programmer is at least aware of the
> > potential problem.
>
> If you swamp developers with "warnings", you should know what they're
> going to do to make those warnings go away. Developers won't do the
> right thing. They'll do the easy thing. They can always fix it
> properly later, right? but that's hubris speaking...

There is no way to turn the warnings off without fixing the bug, if
there is a bug, or by providing more or better annotation if there was
not. Basically, you cannot shut the compiler up, except by doing the
right thing.

dmbarbour

unread,

Mar 15, 2010, 9:22:51 AM3/15/10

to mist-discuss

On Mar 15, 12:47 am, Michiel <mhelv...@gmail.com> wrote:
> As I replied to R. Matthew Schellhas, Mist cannot -- nor does it
> pretend it can -- verify anything about communication with other
> processes or the file system or the operating system anyway.

Your explanation doesn't seem relevant. Verifying that the behavior of
these two functions is the same is an entirely separate issue from
verifying the communications with the external system.

You suggest that Mist will confirm that two procedures, such as
'foo(white_things)' and 'foo(edible_items)' will have the same
behavior when applied to Milk or SugarCube. If that behavior involves
IO, that doesn't mean you need to verify the IO; instead, it means you
must verify that outputs and reactions to inputs will be common
between the foo procedures.

> In order to do our runtime checks we can use buffers to store that
> information (as part of the state that we copy), then compare, and
> only if the assertion is satisfied let the information leave the
> system.

That strategy may work in a rather specialized subset of cases
(involving output only), but I do not see how it will work if the
behavior involves reacting to an input after an output.

R. Matthew Schellhas

unread,

Mar 15, 2010, 9:29:33 AM3/15/10

to mist-discuss

>
> > Another question then. If values are essentially symbols in Mist, what
> > happens when you run into homonyms? They're not really the same value
> > (or should be treated the same at least), but have the same name.
>
> I don't think such things exist in Mist. But maybe I misunderstand.
> Can you give me an example?
>

Umm, completely arbitrary example. You are writing a game and have an
enum of weapon types which includes Bow, and have a enum to say what
quadrant of a ship an enemy is at with Bow (the front of the ship).
From what you've explained, I'd think these the same symbol that
exists in two distinct types, but has completely different conceptual
meaning.

> > > Sure, it's computationally complex. But why would it be impossible?
>
> > Prohibitively computationally complex then.
>
> In production code, certainly. But in a debug phase of development,
> it's not so prohibitive.
>

No, no, I mean it won't finish for days prohibitive. I wouldn't be
surprised if the entire problem boils down to the Halting Problem.

>
> Only in those cases which are prohibited in mainstream languages. The
> features they have in common also allow the same level of optimization
> and static checking. Those strange situations allowed by Mist may be
> inherently complex. But rather than prohibit them, I would rather give
> the programmer the choice to use or avoid them.

I've not done a ton of code generation, but this seems... not correct.
I can easily see where normal looking C++-ish code can't be generated/
optimized in the normal fashion because it's next to something exotic
which breaks a guarantee/assumption that the optimizer would take
advantage of. Still, that's for something perhaps down the road.

Scott

unread,

Mar 15, 2010, 10:24:27 AM3/15/10

to mist-discuss

I highly recommend taking a look at the type system for the "typed
scheme" dialect in the PLT Scheme system. It doesn't deal with all of
the flexibility issues you describe, but it does seem to deal with a
subset of these issues in an elegant manner, such as untagged union
types, enum types as unions of symbols, etc.

There is good documentation on typed scheme in the (multi-platform,
easily installed) system as well as some nice papers describing the
type system. I've found that the designer(s?) seem to monitor the
#scheme IRC channel and have responded to inquiries in a very friendly
and helpful manner. Best of luck!

Michiel

unread,

Mar 15, 2010, 12:44:38 PM3/15/10

to mist-discuss

On Mar 15, 2:29 pm, "R. Matthew Schellhas" <rms.tang...@gmail.com>
wrote:

> Umm, completely arbitrary example. You are writing a game and have an

> enum of weapon types which includes Bow, and have a enum to say what
> quadrant of a ship an enemy is at with Bow (the front of the ship).
> From what you've explained, I'd think these the same symbol that
> exists in two distinct types, but has completely different conceptual
> meaning.

Ah, that's what you're aiming for. :-)

I suggest that programmers give each symbol a unique meaning. If both
the weapons library and the ships library declare a global value Bow,
the compiler will report an error. To fix this, the programmer should
be able to put them each in their own C++ish namespace. Weapon.Bow,
Ship.Bow.

> No, no, I mean it won't finish for days prohibitive. I wouldn't be
> surprised if the entire problem boils down to the Halting Problem.

I don't think the halting problem is an issue here. Or non-termination
for that matter. If two implementations are able to terminate given a
state satisfying their precondition, then so will the parallel or
sequential execution of both. If one of them does not terminate, then
problem is in that implementation, not in the runtime error checking
mechanism.

I can see the following issues, though. Perhaps this is what you
meant:

* Copying the entire state may take very long, depending on how large
it is. As a compromise, we can leave out the stuff we know cannot be
referenced by the function.

* If an overloaded function is being called recursively, over and over
again, with a parameter that makes the call ambiguous, there will be
an explosion of copied states. I can see this being a "won't finish
for days" scenario.

Those are certainly valid objections.

Michiel

unread,

Mar 15, 2010, 12:49:59 PM3/15/10

to mist-discuss

On Mar 15, 2:22 pm, dmbarbour <dmbarb...@gmail.com> wrote:

> > In order to do our runtime checks we can use buffers to store that
> > information (as part of the state that we copy), then compare, and
> > only if the assertion is satisfied let the information leave the
> > system.
>
> That strategy may work in a rather specialized subset of cases
> (involving output only), but I do not see how it will work if the
> behavior involves reacting to an input after an output.

Heh, brilliant. I didn't think that far ahead. That's exactly why I
wanted to brainstorm with you guys.

However, the problem does not yet seem unsolvable. I imagine both
implementations we are checking for equality will run in parallel and
then halt at each communication to the outside. Case distinction:

* One wants to output, the other wants input. Or, more generally, both
want to do a different kind of communication. We bail out, the two are
not behaviorally equal.
* Both want to output to the same channel. We compare the outputs. If
equal, we let each thread proceed. Otherwise, bail out.
* Both want input from the same channel. We ask the input only once,
feed it to both threads, and let them proceed.

What do you think?

dmbarbour

unread,

Mar 15, 2010, 9:01:48 PM3/15/10

to mist-discuss

On Mar 15, 9:49 am, Michiel <mhelv...@gmail.com> wrote:
> * One wants to output, the other wants input. Or, more generally, both
> want to do a different kind of communication. We bail out, the two are
> not behaviorally equal.
> * Both want to output to the same channel. We compare the outputs. If
> equal, we let each thread proceed. Otherwise, bail out.
> * Both want input from the same channel. We ask the input only once,
> feed it to both threads, and let them proceed.
>
> What do you think?

That looks like it might work, though 'comparing the outputs' might be
hiding more than a little complexity (esp. if those outputs can
include functions such as 'foo').

The whole design seems to violate modularity principles. A function
named 'foo' that works and passes all unit-tests might stop working
and bailing at seemingly arbitrary points for different inputs the
moment another module comes along also possessing a function named
'foo'.

Michiel

unread,

Mar 16, 2010, 12:52:09 AM3/16/10

to mist-discuss

On Mar 16, 2:01 am, dmbarbour <dmbarb...@gmail.com> wrote:

> That looks like it might work, though 'comparing the outputs' might be
> hiding more than a little complexity (esp. if those outputs can
> include functions such as 'foo').

We were talking about runtime checking here, right? Comparing the
outputs is string comparison.

If you're talking about recursion (so there might not be any outputs,
or they will take a long time to calculate), that's basically a
problem of the original foo. Though I admit the explosive branching
because of this runtime comparison does not help matters.

> The whole design seems to violate modularity principles. A function
> named 'foo' that works and passes all unit-tests might stop working
> and bailing at seemingly arbitrary points for different inputs the
> moment another module comes along also possessing a function named
> 'foo'.

Not if the new 'foo' is functionally equivalent for overlapping
inputs. If it's not, there will be warnings after compile-time (so the
problem does not creep up silently). I suppose we could also just give
an error and disallow the entire situation. But what fun would that be?

shap

unread,

Mar 16, 2010, 1:50:07 AM3/16/10

to mist-discuss

Michiel: welcome to the joys of type systems. I was in your position
about six years ago, and we're still not done with BitC. Some general
reactions and thoughts...

First, one of the keys to designing a programming language is to
understand what problems you *aren't* trying to solve. It's more
important to have a coherent set of choices than it is to solve all
problems for all potential users. Type systems are a feature where you
can build a lot of help into a language, but also a subject where
there are many tempting cliffs to go over. In consequence, the feature
set of the type system needs to be balanced against the demands of
usability. As an example, you wrote:

> One of the basic goals of Mist has always been ... to be as

> flexible/elegant as possible even if it takes some extra runtime checking,

Do you mean that? What you are (perhaps unintentionally) saying is
that you prefer for more programs to fail dynamically as a consequence
of stronger typing. Users will find this a real pain -- especially so
when the additional checks are inserted by the compiler. The good news
about compiler-inserted checks is that you (the language designer)
know they happen. The bad news is that the programmer doesn't. Before
having a compiler magically insert them, consider having the compiler
instead check that the programmer has inserted them.

> ... and replace that with compile-time checking as the compiler become smarter

Be careful here as well. You'll discover as you dig into this a bit
that the checks which can be done statically without a full prover
infrastructure are clear-cut, and you might as well do them eagerly.
The problem with checks that a prover can eliminate is that it is very
hard to explain to the programmer when/where they will get eliminated.
A seemingly innocuous change at one end of a program can change the
elimination outcome at the other end.

This brings me back to my point that dynamic checks should happen in
programmer-predictable places.

> Enums. In C++, enum values are just integer constants. In Java, they
> are bound to a class. But I believe they should just be symbols. New
> values. Not equal to anything other than themselves and not by default

> cast-able to any other type...

There are valid use-cases for both types of enumerations. Ultimately,
enum symbols must be assigned values for interop reasons. Nothing
wrong with your idea, but it's not a critical thing one way or the
other. And if you're going to go down this path, how are enums
different from field-less unions?

> But for enums, perhaps some sort of set notation:
>
> type CardinalDirection <- {North, East, South, West};

Clean notation is important, but clean concepts come first. What is
the advantage to the programmer deriving from your proposal?

> A type is an invariant. So why not allow a type to be defined (partly)
> by an invariant condition? Example (tentative syntax):
>
> type even_int <- int:(me mod 2 = 0); | the type of all even integers

In most languages, types can depend on other types: List<int>. This
can be extended to types that depend on literals by "lifting" the
literals into corresponding unit types without extending the type
system much. We looked at that in BitC and ultimately concluded that
it was more trouble than it was worth.

What you are proposing here is something called a "dependent type": a
type that depends on a value. On the one hand, this is *extremely*
powerful. On the other hand, all current realizations of it are
*extremely* cumbersome -- well beyond the point of unusability.
Basically, the feature is too powerful to be exposed in general form.
Part of the problem is that the PL community has been playing with
dependent types as a means of proof embedding, and that tends to
generate horrible examples from a usability perspective. But at a more
practical level, take a look at some of the work on automatically
discharging range checks. The point is that seemingly simple features
carry a surprisingly large dynamic penalty. Enough so that the cost of
the feature may well exceed its value. The problem is that the
programmer won't have a good model of which checks have high value and
which checks have high cost.

> --- (3) ---
>
> A type is a set of values. So why not allow unions, intersections,
> differences and subsets of types?

In most type theories, it's very helpful if leaf types form *disjoint*
sets. Unions tend to make a mess of static typing, because they tend
to eliminate the ability to derive a most general typing.
Intersections are empty, and differences are trivial (none of these
statements is necessarily true in dependent type systems). If you want
something like this, you may want to look at constraints as a way to
get an orthogonal slice across types without making the type analysis
intractible. Constraints are also a

Nearly all of the more recent languages provide subsets in the form of
subtypes. Subtypes introduce a variety of complications into the type
system whose concrete value to the programmer is un-proven, largely
because it has tended to be conflated with subclassing in popular
languages.

> Variables have types. That's useful. The type determines which values
> a var is allowed to take. But in C++ and many other languages, a
> single value has a type as well.
>
> Integer literals can be suffixed (u, l, ll, etc.) in order to give the
> literal the right type. But is 5 not the same as 5u, 5l and 5ull? Is 5
> not also the same as 5.0?

In C++, 5 is definitely not the same as 5u, 5l, and 5ull. The un-
annotated "5" has a language-specified type of "int". The confusion
about whether 5 is the same as the others arises becuase of C/C++'s
transparent integer promotion rules.

> Also, can 5 not be part of an infinite number of types (see (2), (3))?
> Can't `ClassName()' or `new ClassName'? The same goes for any value,
> really.

Yes, it can. In BitC, for example, we have an IntLit type class whose
effect is to give an integer literal a union type consisting of all of
its possible concrete types (if the type would be too small for the
literal, we drop it). This is actually a nuisance, because it has the
effect of causing some function types not to resolve concretely. For
example, the function:

(define (add-one x) 1)
add-one: (forall ((IntLit 'a) (fn 'a -> 'a))

has type alpha where alpha is some type defined under the IntLit
constraint. The tricky bit here is that in a surprising number of use
cases you can't decide from the program text which concrete integer
type to choose.

> Another example: is the tuple (1, 2, 3, 4) not the same as the list
> (1, 2, 3, 4)? Maybe it's just that int[] (int array or list) allows
> sequences of any length whereas (int, int, int, int) allows only lists
> of length 4, but the two have a non-empty intersection.

Tuple is definitely *not* the same as list, and doesn't want to be.
The two have different representations and (in a static type system)
different feasible typings.

Perhaps I might suggest that you take some time playing with Haskell,
and then with Coq. Haskell probably represents the state of the art
for type systems in a semi-practical language. Coq deals with the
dependent types part. I don't suggest that you will see either
language as the be-all and end-all, but you'll begin to get a sense of
what's behind some of the issues you are trying to sort through.

Michiel

unread,

Mar 16, 2010, 10:36:29 AM3/16/10

to mist-discuss

On Mar 16, 6:50 am, shap <s...@eros-os.org> wrote:

> you wrote:
>
> > One of the basic goals of Mist has always been ... to be as
> > flexible/elegant as possible even if it takes some extra runtime checking,
>
> Do you mean that? What you are (perhaps unintentionally) saying is
> that you prefer for more programs to fail dynamically as a consequence
> of stronger typing.

It must have been unintentional.

What I want to say is that I want to allow more things. Things that
are inherently complex. The choice is to either prohibit them (as many
programming languages do), or to allow them and be happy with dynamic
checking now and then.

It's not even a real trade-off. If the programmer refrained from using
the more exotic features (basically using a subset of the language),
type-checking could be strictly static. It's like the C++ motto: you
only pay for what you use.

> Users will find this a real pain -- especially so
> when the additional checks are inserted by the compiler.

If I stuck to features a great number of other languages already
support, what's the point of designing a new one. :-) Basically I'm
designing a language I would personally like to use. Greater
expressiveness allows me to say what I really mean instead of making
me dance around the concept. I would gladly pay for this with some
runtime checking, if static proof is not possible.

> The good news
> about compiler-inserted checks is that you (the language designer)
> know they happen. The bad news is that the programmer doesn't. Before
> having a compiler magically insert them, consider having the compiler
> instead check that the programmer has inserted them.

I guess it comes down to one of two choices here:

* When the programmer performs an operation, implicitly assert that
the operation is valid. If this assertion can be statically proved,
perfect. Otherwise, prepare a runtime check and _warn_ the programmer.
It would be part of a special list of warnings, listing exactly the
situations where a runtime check may potentially fail.

* If there is no static proof of validity, force the programmer to
explicitly state that the operation should be valid. Otherwise,
generate an error. This is what you seem to be suggesting, and it also
sounds like what Haskell monads can do (though I have no real
experience with them, so I may be wrong).

To me, it feels like the latter approach would generate a lot of noise
in the code; make it less readable and less writable. Though I do see
the advantages.

As a middle ground, the compiler could easily be instructed to bail
out with an error when something like this could not be proved
statically. This setting would encourage a programmer not to use the
more advanced type system features of Mist.

> > ... and replace that with compile-time checking as the compiler become smarter
>
> Be careful here as well. You'll discover as you dig into this a bit
> that the checks which can be done statically without a full prover
> infrastructure are clear-cut, and you might as well do them eagerly.

Hm. Not necessarily. Most programming languages would outright reject
an expression of a super-type being assigned to a variable of a sub-
type. But in Mist it's okay if you know for sure that the right-hand
side contains a value that's a member of the sub-type.

> The problem with checks that a prover can eliminate is that it is very
> hard to explain to the programmer when/where they will get eliminated.
> A seemingly innocuous change at one end of a program can change the
> elimination outcome at the other end.

That's certainly a valid concern.

> > Enums. In C++, enum values are just integer constants. In Java, they
> > are bound to a class. But I believe they should just be symbols. New
> > values. Not equal to anything other than themselves and not by default
> > cast-able to any other type...
>
> There are valid use-cases for both types of enumerations. Ultimately,
> enum symbols must be assigned values for interop reasons.

Interop? Do you mean they must have a runtime representation in
memory?

> Nothing
> wrong with your idea, but it's not a critical thing one way or the
> other. And if you're going to go down this path, how are enums
> different from field-less unions?

The point of the enum point was that I wanted self-defined values as
part of the language. inf(inity), null, North, South, East, West, etc.
Enums just seemed like a good point to start off.

> > A type is a set of values. So why not allow unions, intersections,
> > differences and subsets of types?
>
> In most type theories, it's very helpful if leaf types form *disjoint*
> sets. Unions tend to make a mess of static typing, because they tend
> to eliminate the ability to derive a most general typing.

I understand I'm not making things easy here. :-) It's basically more
of the same: let's try it anyway, because it would be cool. And
because it would fit in well with the plans for Mist.

> Nearly all of the more recent languages provide subsets in the form of
> subtypes. Subtypes introduce a variety of complications into the type
> system whose concrete value to the programmer is un-proven, largely
> because it has tended to be conflated with subclassing in popular
> languages.

I intend that subtypes in Mist be actual real subtypes. Behavioral
subtypes even. Classes may be used, but the sub-typing rules have to
be respected.

I suppose Mist is supposed to be as much an intellectual exercise as
it is supposed to be a practical programming language. Let's make
things as mathematically elegant as possible and see where we end up.
Who knows how practical these concepts might prove to be?

> > Variables have types. That's useful. The type determines which values
> > a var is allowed to take. But in C++ and many other languages, a
> > single value has a type as well.
>
> > Integer literals can be suffixed (u, l, ll, etc.) in order to give the
> > literal the right type. But is 5 not the same as 5u, 5l and 5ull? Is 5
> > not also the same as 5.0?
>
> In C++, 5 is definitely not the same as 5u, 5l, and 5ull. The un-
> annotated "5" has a language-specified type of "int". The confusion
> about whether 5 is the same as the others arises becuase of C/C++'s
> transparent integer promotion rules.

I'm aware of how C++ handles these values and types. I guess I was
telling this story in order to introduce the distinction between types
and values I intend to use for Mist.

> > Also, can 5 not be part of an infinite number of types (see (2), (3))?
> > Can't `ClassName()' or `new ClassName'? The same goes for any value,
> > really.
>
> Yes, it can. In BitC, for example, we have an IntLit type class whose
> effect is to give an integer literal a union type consisting of all of
> its possible concrete types (if the type would be too small for the
> literal, we drop it).

I guess this would be pointless in Mist, since every value would have
the type Universe.

I would prefer that values did not have types at all. It is types that
should have values. Is this really as unused a concept as it seems? In
existing programming languages, I mean?

> > Another example: is the tuple (1, 2, 3, 4) not the same as the list
> > (1, 2, 3, 4)? Maybe it's just that int[] (int array or list) allows
> > sequences of any length whereas (int, int, int, int) allows only lists
> > of length 4, but the two have a non-empty intersection.
>
> Tuple is definitely *not* the same as list, and doesn't want to be.
> The two have different representations and (in a static type system)
> different feasible typings.

I've been thinking a lot about this issue, and I'm still not quite
sure about it. Mathematically it would seem that the two concepts are
interchangeable. That is, a tuple containing elements of a common type
could as well be a statically sized list of that type.

Not unimportant is that the two could use the same notation:

(int, int, int, int) a <- (1, 2, 3, 4);
int[] b <- (1, 2, 3, 4);

> Perhaps I might suggest that you take some time playing with Haskell,
> and then with Coq. Haskell probably represents the state of the art
> for type systems in a semi-practical language. Coq deals with the
> dependent types part. I don't suggest that you will see either
> language as the be-all and end-all, but you'll begin to get a sense of
> what's behind some of the issues you are trying to sort through.

Absolutely.

Thanks for your reply!

Michiel

unread,

Mar 16, 2010, 3:50:29 PM3/16/10

to mist-discuss

It's on my reading list. :-) So much to read, so little time.

Thanks for your suggestion!

Reply all

Reply to author

Forward