Test


john skaller

May 30, 2022, 7:09:10 AM
to felix
new group due to forgetting the password for the old owner email.
Probably will forget this one too :-)

John Skaller2

May 30, 2022, 8:08:33 PM
to felix
These diagnostics:

Guess meta type of type function... TYPE -> TYPE
Flx_bind_type_index: fixpoint, meta type calculated by guess_meta_type!

and

Flx_bind_type: TYP_name x found in 'as' trail: fixpoint: metatype hack!
Flx_bind_type: Trail term depth 0
Flx_bind_type: we don't know the term so can't compute it's meta-type! BUG FIXME!

need attention. The two cases are different.

bind_type_index is called when lookup of a name in the *unbound* symbol table finds
the definition of a type term with a particular index, for example

vector[int]

The indices are bound (that’s int), and we discover (in this case) that vector is a binding
to a C++ type (a primitive) with an index i, so we call bind_type_index to construct
a bound term, in this case:

BTYP_inst (`Nominal, i, ts, mt)

This says it’s a nominal type (not necessarily abstract, though it is in this case),
with index i, schema indices ts, and kind mt. The problem is discovering
the meta-type, or kind. It will *usually* be TYPE, meaning the term is an
ordinary GC type. However, if the definition found is a typedef, it may not be.
It could be a TYPESET. It might be a type function TYPE -> TYPE.

The routine “guess_meta_type” is used to have a guess at the kind.
It does this by looking at the unbound term structure, but that may not
be enough. If the guess is wrong all hell can break loose.

However the case of interest here is more difficult again: when we’re binding
a term in the definition of a typedef named “x” and we find the name “x” again
inside the definition. Note this can also occur *indirectly* meaning “x” refers
to “y” which refers to “z” which refers back to “x”.

In this case, we need to avoid an infinite expansion of typedefs, so we must
put a fixpoint term in place. So that we can detect this, there is a general purpose
record of OCaml type “recstop” passed around between ALL recursive binding
routines. In this case the field

rs.type_alias_fixlist

contains a list of pairs (index, depth) recording every index previously visited
on this descent into a term, and the depth at which bind_type_index
encountered it. When an index is seen again, we put in a fixpoint with the depth difference.
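
Here is a minimal sketch of that mechanism in OCaml, using hypothetical toy terms rather than the real TYP_/BTYP_ constructors; the (index, depth) trail and the depth-difference fixpoint follow the description above.

type unbound =
  | UName of int                     (* index of a typedef in the symbol table *)
  | UUnit
  | UInt
  | USum of unbound * unbound
  | UProd of unbound * unbound

type bound =
  | BFix of int                      (* negative count of levels up to the referent *)
  | BUnit
  | BInt
  | BSum of bound * bound
  | BProd of bound * bound

(* defs maps a typedef index to its unbound definition *)
let rec bind defs (fixlist : (int * int) list) depth t =
  match t with
  | UUnit -> BUnit
  | UInt -> BInt
  | USum (a, b) ->
    BSum (bind defs fixlist (depth + 1) a, bind defs fixlist (depth + 1) b)
  | UProd (a, b) ->
    BProd (bind defs fixlist (depth + 1) a, bind defs fixlist (depth + 1) b)
  | UName i ->
    match List.assoc_opt i fixlist with
    | Some d -> BFix (d - depth)     (* seen before: fixpoint, depth difference *)
    | None -> bind defs ((i, depth) :: fixlist) depth (defs i)

(* typedef 0 is list = 1 + int * list; binding UName 0 yields
   BSum (BUnit, BProd (BInt, BFix (-2))) *)
let example =
  let defs = fun _ -> USum (UUnit, UProd (UInt, UName 0)) in
  bind defs [] 0 (UName 0)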

The problem is, we need to specify the kind of a fixpoint. Felix has to know
the kind of every type term. At the moment, unbound kinds are bound very early,
which can be done because the compiler knows all of them (there are no user
defined kinds yet). A primitive type (binding to C++) has to be kind TYPE.
However there are types whose kinds are *subkinds* of TYPE: for example the type 5
has kind UNITSUM. The specified kind of a term MUST be the smallest kind.
And there are terms which are not TYPEs at all, for example TYPESET
and of course type functions.

So the problem is to accurately compute the smallest kind and I don’t
know how to do that in this context.

The SECOND case is similar, only it deals with a different situation

1 + int * x as x

for example is a list of int. The “as” term is an explicit fixpoint operator.
In unbound form it uses a name, “x”; in bound form, a fixpoint term ..
and again we need the meta-type or kind. This one uses

rs.as_fixlist

to track the recursion and is handled by bind_type, not bind_type_index.
I think this one is easier but I’m not sure. There’s an interesting case:

typedef any = any;

which really exists as written in the library and is the return type of
a procedure which does not actually return. An example is

exit: int -> any

which terminates your process. The encoding is currently

BTYP_fix (0, KIND_type)

In this case I don’t think the kind matters but I’m not sure.

Anyhow, the problem for me now is to figure out how to get rid
of those two diagnostics.



John Skaller
ska...@internode.on.net





John Skaller2

May 30, 2022, 9:13:53 PM
to felix
in general, calculating some property of a recursive term is done by a recursive
descent, in which

(a) properties at the same level are merged
(b) recursion is ignored and simply backs the calculation out

For example consider the kind of

(1 + int * x) as x

The minimal kind of 1 is UNITSUM, and of int is TYPE, and of x
is DONTKNOW. The kind of a product is the LUB of the subterms,
where LUB means Least Upper Bound,
and same for a sum (note these are product and sum types NOT
products and sums OF types!).

So int * x has kind TYPE, and 1 + int * x has kind TYPE since

LUB(UNITSUM, TYPE) = TYPE

In this case, DONTKNOW should be kind VOID, meaning
the smallest possible kind, so a LUB involving it
effectively ignores it.
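
As an OCaml sketch (not compiler code): treat the kinds mentioned so far as a chain with VOID at the bottom standing in for DONTKNOW, so the LUB is just the maximum and the fixpoint drops out.

type kind = VOID | UNITSUM | TYPE

let rank = function VOID -> 0 | UNITSUM -> 1 | TYPE -> 2

(* on a chain the Least Upper Bound is just the maximum *)
let lub a b = if rank a >= rank b then a else b

(* kind of (1 + int * x) as x: the fixpoint contributes VOID,
   so the result is TYPE, as computed above *)
let k = lub UNITSUM (lub TYPE VOID)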

This rule is GENERAL, that is, it applies to ALL similar properties
of all kinds of terms (not just type terms in Felix). Now this suggests
a TWO PASS algorithm (or backpatching) since DONTKNOW is in fact
now known! So we have to fix it up when the whole term which the fixpoint
refers to is known.

This can also be done in ONE pass as follows: the first time we hit
the fixpoint, instead of answering DONTKNOW, we actually unfold the term,
so we now have the whole term. The second time, we hit the recursion
again, but this time, we put DONTKNOW. Then we drop back up the
recursive descent calculating the kind, bottom up, until we reach the
unfold point, and delete the unfold, install a fixpoint, and now we have
the kind. The two pass algo seems easier but could be tricky (since you have
to “refind” the fixpoints).

Note this is the same problem as calculating the return type of a recursive
function. It is done by unification of all the terminal branches, ignoring
the recursive ones. If all the branches are recursive, you get no information,
except that the function is bound to fail at run time with an infinite recursion!
[In practice Felix has problems sometimes with the calculation and requires
the user to explicitly specify the type]

The interesting thing out of this analysis is that this is correct:

type any = any;
BTYP_fix (0, VOID)

where VOID is the smallest, empty, kind, meaning, there are no
type terms of that kind: it’s a subkind of all kinds. However type
any, in fact, is the BIGGEST type! Since all types satisfy the equation,
all types are subtypes of type any (which is why I called it “any”).



John Skaller
ska...@internode.on.net





John Skaller2

May 31, 2022, 9:41:58 AM
to felix
The Felix compiler currently has two kinds of typedef records in the bound symbol table, and the binding
routine has a flag: these are nominal and structural type aliases. They are in fact identical.

The constructor used together with the current state of the flag was previously used to
decide whether a type alias name should be bound as written or replaced by its
definition.

The algorithm first binds names as written, then does typeof elimination for each typedef
and replaces it with a structural typedef. Note, this is done first for the library, then separately for
the program: the bound symbol table of the library is cached to avoid rebinding everything every time.

Note you cannot merely replace a typedef name with its definition because of recursion.

However I have changed the rules. Now all names should be bound as references and should
never be replaced by their definition during type binding: binding should just bind, i.e. it
should simply replace names with numerical indices of symbol table entries, removing
the need for any further lookup in subsequent phases.

Instead, replacement is done on the fly by the beta reduction routine.

However to bind an expression, we have to do overloading, which requires
reduction of the types to normal form, otherwise unification won’t work.
[Eg you cannot unify a type function application with a type, you have to actually
evaluate the type function application by beta-reduction]. Similarly, fixpoint
calculation has to be done with normal forms (it actually isn’t .. so sometimes
the fixpoint has the wrong depth and it has to be adjusted to account for the
depth changes reductions like beta-reduction cause).

The problem is, typedefs can contain typeof(expr) terms. The way this is calculated
is fairly brain dead: the expression is bound and then the type is immediately available.
This is because in Felix all expression terms are a pair (x,t) where x is the expression
and t is its type. In many cases, the type could be calculated by recursive descent,
although in some it can’t be. Any which way, the associated type caches the
type so it is immediately available.

However, some expression binding depends on type information, often just for sanity
checks but sometimes the type is required to know how to proceed. For example
to add a field to a record type .. the type you’re adding the field to has to actually
BE a record type in normal form, an application of a type function which produces
a record type is no good. So the type associated with an expression has to be
reduced.

So after making sure all typedefs/typefuns are in the bound symbol table,
with typeofs unprocessed, we do a pass that eliminates them, before doing
any more complex operations .. but calculating typeofs is actually already
the most complex possible calculation. There’s no assurance they will
always work: the compiler cannot handle recursive typeofs. But it HAS
to handle typeofs because this:

var x = 1;

is encoded as

var x : typeof(1) = 1;

I.e. variable type deduction is done by first binding the variable
type to the typeof its initialiser, which can be an (almost) arbitrary
expression. This way the user gets “typeof” for free.
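
As a sketch of this desugaring on a toy AST (hypothetical names, not the actual flx_* types):

type typ = TypeOf of expr | TInt
and expr = Int of int | Var of string

type stmt = VarDecl of string * typ * expr

(* var x = e;  ==>  var x : typeof(e) = e; *)
let desugar_var name init = VarDecl (name, TypeOf init, init)

let s = desugar_var "x" (Int 1)   (* var x : typeof(1) = 1; *)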

Anyhow, I want to eliminate the stupid “two kinds of typedef” thing.
It’s nonsense. So is the flag. Threaded code is EVIL!

The new rule says replacing a typedef
reference with its definition is NOT allowed except in beta-reduction.
Another rule should say beta-reduction is always safe: you can apply
it to anything at any time without ill effects. At worst it simply
fails to reduce anything.

The problem is, the compiler itself often introduces new type terms
which need reduction, possibly later .. for example “T in tset” is
replaced by a typematch which is used as a constraint in overload
resolution and has to be fully reduced at that time (to either TRUE or FALSE).
The compiler “ands” together multiple constraints (by constructing logical
and terms). It can either reduce the terms eagerly, or after the conjunction
is produced (lazily) by this rule (which means at the moment “and”
is not short-circuited, since that demands lazy evaluation).

Anyhow, removing the nominal/structural distinction is proving non-trivial.
I changed the rules, but some elaboration of typedefs is still being done
in the type binding routine, and when I remove it, all hell breaks loose,
probably because some type terms are not beta-reduced when they should be;
however a more difficult problem is a fixpoint ending up in the wrong place.
Here’s an example:

Expected:

Xlib=list(expr ::= [(term "+" expr) | term], term ::= [(factor "*" term) | factor], factor ::= [(atom "^" factor) | atom], atom ::= [("(" expr ")") | "9"])

Actual:

Xlib=list((expr, [(term "+" expr) | term]), (term, [(factor "*" term) | factor]), (factor, [(atom "^" factor) | atom]), (atom, [("(" expr ")") | "9"]))

Notice the ::= has gone. Why? Look closer: the actual output has one extra level of parens.
It smells of a mislocated fixpoint. This used to work (it works in the latest commit but not in
my experimental code).

The code being tested is highly sophisticated: it uses polymorphic variants with
open recursion to define grammars and a coroutine based parser: this code
is actually in the library (the test case of course isn’t).

Another case which failed when I fiddled involved GADTs. Yep, Felix has
generalised algebraic types, which require internally existentially quantified
types; that is, existential type variables.

In all these weird cases the ORDER of doing things matters a lot and
it’s really really hard to get right.

John Skaller
ska...@internode.on.net





John Skaller2

May 31, 2022, 12:08:43 PM
to felix
Ah. The nominal/structural flag is used in bind_type as follows:

If the flag is nominal, a typeof(e) term is bound to BTYP_typeof(parent, e), where
that “e” is the original *unbound* expression e. So in the first phase of binding, typeofs
in typedefs are now bound to a type term containing an unbound expression.

AFTER all the typedefs are bound, the flag is changed to structural and the typeofs
in the typedefs are eliminated.

Later on, bind_type eliminates them when seen outside of typedefs, as in the example

var x : typeof(e) = e;

In other words in any code *except* inside a typedef/typefun. So the flag is essential:
more precisely, it should have been passed as an argument of bind_type, but any which way
the handling of typeof has to be split into two cases. The reason typeofs cannot be handled
whilst binding typedefs is that you can’t expand a typedef name to its definition until
*after* the definition is put in the bound symbol table.

I note, the code actually handles recursion of the expression.

The flag is also used in a wrapper around bind_type_index which caches its result,
but ONLY if typeofs have been eliminated (since a bound type symbol containing a typeof
is not really very good in the cache, since it bypasses typeof elimination permanently).

The recursion with code threaded by a flag is really messy.


John Skaller
ska...@internode.on.net





John Skaller2

Jun 3, 2022, 10:53:46 PM
to felix
When Felix is binding types, there is a term written

T as x

captured by unbound type

`TYP_as(T,"x")

This is a fixpoint operator, example

typedef list = (1 + int * x) as x;

The bound term removes the TYP_as and puts in a fixpoint term:

BTYP_fix (-2, TYPE)

where the inner “x” was. The -2 means to go up two levels:
one level to get past the product * and one to get past the sum +.

To detect a recursion you do a top down analysis, keeping a list called a trail
starting with the whole term, and push each subterm being analysed
in the recursive descent onto the trail, until a fixpoint is hit. You can then
find the term being referred to by a fixpoint -n by looking n terms back in the trail.
For example the trail will be

(int * fix-2), (1 + int * fix -2)

when you hit the fix-2 so the recursion is to the second term there,
which is the list. If you need to unfold, you replace the fix term with
the one in the trail.
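
A sketch of this in OCaml, on toy terms (not the compiler’s types): the trail holds the enclosing terms innermost first, so fix -n resolves to the n-th entry; List.nth raises exactly when the term is incomplete, i.e. the fixpoint goes over the top.

type bty = Fix of int | Sum of bty * bty | Prod of bty * bty | Unit | Int

(* find the term the first fixpoint refers to; trail is innermost first *)
let rec referent trail t =
  match t with
  | Fix n -> Some (List.nth trail (-n - 1))   (* raises if the term is incomplete *)
  | Sum (a, b) | Prod (a, b) ->
    let trail' = t :: trail in
    (match referent trail' a with
     | Some r -> Some r
     | None -> referent trail' b)
  | Unit | Int -> None

(* for (1 + int * fix -2) the referent of fix -2 is the whole sum *)
let whole = Sum (Unit, Prod (Int, Fix (-2)))
let r = referent [] whole                     (* = Some whole *)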

Many operations do not unfold. Beta-reduction and mapping of terms
must generally not unfold. A term is *complete* if no fixpoint “goes
over the top of the term”. If you need always complete terms you have
to unfold the term *before* analysis which pushes the cycle the fixpoint
creates down. Analyses requiring complete terms can ONLY be done
top down.

Some operations screw up the level count. Beta-reduction is one of them.
Suppose a function term contains a fixpoint to some point above an application
of the function to an argument. Reduction removes TWO levels: first,
the application is removed, but the function term is also removed:
this is because a type function term in Felix looks like this:

fun (params, return kind, body)

and the body is a subterm of the type function term.
So beta reduction has to add 2 to any fixpoint in the function body term
which happens to go OVER the application. If it refers to the function itself,
the adjustment would be to add 1: this means the function itself is recursive.
I’m not sure what to do here but I think you have to unfold because it is
now a polymorphic recursion UNLESS the argument of the fixpoint is
the original parameter (in which case the whole application is a fixpoint).
I think this cannot happen, because the fixpoint installation must replace it
with a fixpoint.
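
Here is a sketch (my reconstruction, on toy terms) of the adjustment for the uniform case, ignoring the self-reference subtlety just discussed: shift k adds k to every fixpoint whose level count escapes the subterm we started from, which is how removing the application and function nodes adds 2.

type bty = Fix of int | Sum of bty * bty | Prod of bty * bty | Int

(* add k to every fixpoint that reaches above `depth` levels from here *)
let rec shift k depth t =
  match t with
  | Fix n when n < -depth -> Fix (n + k)   (* escapes: adjust the count *)
  | Fix n -> Fix n
  | Sum (a, b) -> Sum (shift k (depth + 1) a, shift k (depth + 1) b)
  | Prod (a, b) -> Prod (shift k (depth + 1) a, shift k (depth + 1) b)
  | Int -> Int

(* a body whose fixpoint pointed above the application: after the two
   enclosing levels vanish, fix -4 becomes fix -2 *)
let adjusted = shift 2 0 (Prod (Int, Fix (-4)))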

Note that substitution of the argument for a parameter cannot
change the level of such a fixpoint.

The argument, however, presents a different problem: since it can be
substituted anywhere in the body of the function, there is no way
to adjust the fixpoint universally. So, IMHO, the argument MUST be
a complete term to avoid this problem, therefore, it HAS to be unfolded
before substitution (if the fixpoint goes above the application!).

However, if the recursion is interior to the argument, or refers to the whole argument, we don’t
unfold (we only unfold if the term is incomplete). It’s a bit tricky because the
unfold function in OCaml ONLY unfolds a recursion to the whole term.
It obviously cannot unfold an incomplete term (it’s a stand-alone function
so there’s no trail). If the recursion is interior to the term, there’s no need
to unfold. So this function actually does the wrong thing.

Unfortunately, if we do this special argument unfolding, I’m not sure
the beta-reduction will terminate. The standard proof it does is that
it is a tree analysis, and fixpoints are leaves (so it must terminate).
Unfolding a term *above* the application we’re beta-reducing almost
certainly would lead to an infinite recursion (since it reintroduces the
application we're currently reducing) unless a trail based recursion
stopper is found.

I have no examples. I’m not sure this problem can exist.

There’s ANOTHER problem: Felix *guesses* the kind of a fixpoint is TYPE.
It may not be. However the recstop data type actually doesn’t keep a trail:


type recstop = {
  constraint_overload_trail: bid_t list;
  idx_fixlist: bid_t list;
  type_alias_fixlist: (bid_t * int) list;
  as_fixlist: (string * int) list;
  expr_fixlist: (expr_t * int) list;
  depth: int;
  open_excludes: (ivs_list_t * qualified_name_t) list;
  strr_limit: int;
}

The “as_fixlist” is the one the “as” term uses. It is enough to
compute the correct depth for the fixpoint term, but not enough
to figure out what kind the term is. The problem is that the terms
we analyse during binding are *unbound* terms. They can include
references by string name to typedefs, which may not even be
in the bound symbol table yet .. binding is the process that puts them in there.

Note the “as” computation is part of the binding process, before any beta-reduction
is done (beta requires bound terms).

So one option for “as” terms is to calculate the kind of the body and save it
in the trail and use that. This means we have to be able to kind *unbound* terms.
This is tricky because unbound terms contain string names, we can certainly
find these in the unbound symbol table. But can we calculate their kind reliably,
especially if they’re recursive??

NOTE: you may wonder, what kinds are there other than TYPE?
Well of course we have TYPE * TYPE and TYPE -> TYPE.
But we also have COMPACTLINEAR and UNITSUM at the moment.
These are indeed subkinds of TYPE, so TYPE is not wrong, it’s just
not minimal.

There is a routine for guessing the meta-type of an unbound term,
but it fails in many cases.

| `TYP_name _ -> (* print_endline "A type name?"; *) kind_type

(* FIXME: these two cases are OBVIOUSLY WRONG *)
(* but without lookup there's no way to fix it *)
| `TYP_fname _ -> (* print_endline "A type name?"; *) kind_type
| `TYP_flookup _ -> (* print_endline "A type name?"; *) kind_type

See, if it sees a name, it just says the kind is TYPE. The routine does not
do any lookup, so it cannot do any better.

One way to fix this might be to set the kind to UNKNOWN and then patch
it up later. The fun one is:

| `TYP_as _ -> print_endline "A type as (recursion)?"; assert false

which would crash the compilation. Other terms are plain wrong too,
a compactsum has kind COMPACTLINEAR in general although it could
actually be kind UNITSUM. But that one can be fixed.


John Skaller
ska...@internode.on.net





John Skaller2

Jun 5, 2022, 9:05:29 AM
to felix

One of the problems I’m having at the moment is that a fixpoint term requires a kind.
There is a routine which given an unbound term guesses its kind. It does not do any
lookup, so it has no idea of the kind of a name. However that is not the only issue (it could
actually do lookup).

Consider the type t1 + t2. It probably has kind TYPE. But, that may not be the most
specialised kind. If t1, t2 are both unit, it’s a unitsum (in fact, it’s bool!)

Now, t1 could be just `TYP_tuple [] which is an unbound unit. But it
also could be a complicated expression like, for example, a dereference
of a pointer type (if it’s an application, and the type function is manifest
we can get the kind out of the function type).

The kind of a BOUND type is always calculable. The problem is we need the
kind to set in the fixpoint term during the binding process.

Can we just put DONTKNOW and fix it afterwards? Well yes, we could fix it
afterwards, but would that be too late? After all, the point of a kind is to
kind check type expressions.

The reason the kind is in there is that there are routines that calculate the kind
of any bound type, and the fixpoint type is one of them. However, it could be
done by keeping a trail and just finding the kind of the term that would be
unfolded. Of course we have to use the usual trick here that when combining
information to calculate the kind, we have to ignore the fixpoint .. which means
we don’t need to know its kind.

Hmmm. The fact it’s hard to calculate suggests we don’t need to know.
In fact kind ANY (the kind of the type any) is probably good. Any term
requiring refinement will do that with unification/subtyping rules,
which will reduce the ANY kind down to a better approximation
(except for fix 0 = type any!)




John Skaller
ska...@internode.on.net





Keean Schupke

Jun 5, 2022, 9:15:01 AM
to John Skaller2, felix
Hi John,

Maybe I am misunderstanding? I take it the problem is not kind checking, as you can annotate the fixed point with the correct kind, and check it. The problem is kind-inference, which is a syntactic problem. If the syntax for the language does not distinguish types of different kinds, then inference will be undecidable.

In other words you have to design the syntax with kind inference in mind from the beginning.

Cheers,
Keean.




John Skaller2

Jun 5, 2022, 11:52:07 PM
to Keean Schupke, felix


> On 5 Jun 2022, at 23:14, Keean Schupke <ke...@fry-it.com> wrote:
>
> Hi John,
>
> Maybe I am misunderstanding? I take it the problem is not kind checking, as you can annotate the fixed point with the correct kind, and check it. The problem is kind-inference, which is a syntactic problem. If the syntax for the language does not distinguish types of different kinds, then inference will be undecidable.
>
> In other words you have to design the syntax with kind inference in mind from the beginning.

The kind can be determined from bound type terms, i.e. after string names have been replaced by unique
keys into the symbol table and type aliases are removed. So consider analysing an *unbound* term:
we basically replace for example

`TYP_tuple (texpr1, texpr2)

with

BTYP_tuple (btexpr1, btexpr2)

where btexpr1 is the bound version of texpr1. Now, type functions have
their domain and codomain kinds *specified* and so if you see an application,
even an *unbound* application, then if the function term is like

`TYP_function (params, ret, body)

then “ret” is the codomain kind, and the param terms also include a kind for each
param, so the domain kind is the kind-product of the kinds of the params.

Now consider:

`TYP_compactsum [t1, t2, t3, .. ]

This will have kind COMPACTLINEAR and the arguments must have kinds
COMPACTLINEAR also. But here’s the problem: if ALL the types are
*unit* the *minimal* kind is UNITSUM. A UNITSUM is COMPACTLINEAR.
It’s a subkind.

The problem is that to determine this we have to replace any typedefs,
which means replacing string names with keys to the symbol table,
and we may need to beta-reduce the result of binding, replace
typedefs, etc., to get an actual type term.

Once the terms are in normal form, it’s no problem. The problem is
the binding process recognises recursion by comparing *unbound*
terms in the trail and inserts a fixpoint .. but the fixpoint is a bound term
requiring the kind of the term which would be stuck in there by an unfold.
We don’t know the bound term yet .. we are in the middle of binding it!

This is the same problem type inference solves by sticking in a variable,
which we solve for later. A test case using a fixpoint operator:

(1 + int * x) as x

So we bind top down. First we see the “as” term so we know “x”
is a fixpoint name. Now we see a sum of 1 and int * x. 1 is unit,
no problem. Then we see int * x.

Now, “int” is a string name so we look it up to find a unique key
to the symbol entry, in Felix the keys are just integers, and int
happens to be 101.

Int has kind TYPE. 1 has kind UNIT, which is a UNITSUM,
which is COMPACTLINEAR, which is TYPE. So the product
has to have a kind greater than or equal to TYPE,
so the sum of UNIT and TYPE will be TYPE. Therefore the
fixpoint x, which is the first argument of the “as” term, must have
kind TYPE because

(1 + int * ???)

has kind “at least” TYPE. Since we want the smallest kind,
the kind must be TYPE.

What I’m stuck on is that we have to set the kind of x during the recursive
descent of the unbound term we do when we are doing the binding (lookup).

Once we have done that, we have a fully bound subterm and so we can
then determine the minimal kind .. but we need to set the kind of the fixpoint
first to complete the binding.

The reason is I made the fixpoint term include the kind. Maybe I have
to remove it since it can’t be calculated in time.



John Skaller
ska...@internode.on.net





Keean Schupke

Jun 6, 2022, 2:46:21 AM
to John Skaller2, felix
Okay, so it’s an implementation problem? If you were to use a disjoint set instead of a symbol table you would add symbols to the disjoint set, and merge them in the disjoint set, so the 'unbound' symbol would point to the same data structure as the 'bound' symbol. I think this is how the OCaml compiler implements this too.

Cheers,
Keean.

John Skaller2

Jun 6, 2022, 10:04:54 AM
to felix
Actually: in the compiler every expression is represented by a term:

raw_expr, type

So calculating the type is O(1).
This also means I can’t just bind an expression, I have to bind its type also.

So kinds are the types of types … so they should be

raw_type, kind

Note, the subterms of a raw type are types (NOT raw types).
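
A sketch of this pairing in OCaml (hypothetical names, not the actual flx_* types): every node pairs the raw term with its cached type, so reading off the type is a field access.

type ty = TInt | TTuple of ty list

type expr = { e : raw_expr; t : ty }
and raw_expr =
  | Lit_int of int
  | Tuple of expr list

(* O(1): the type is cached in every node *)
let typeof (x : expr) : ty = x.t

let mk_int n = { e = Lit_int n; t = TInt }
let mk_tuple xs = { e = Tuple xs; t = TTuple (List.map typeof xs) }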


John Skaller
ska...@internode.on.net





John Skaller2

Jun 8, 2022, 6:30:44 PM
to felix
So, I am simply deleting the kind component of fixpoints. This gets rid of a lot
of guesses. For a bound type, we can calculate the kind anyhow: the kind of a fixpoint
is the kind of the whole enclosing term which it refers to.

However the routine that already does that needs to be fixed because
it is a hack also.

In that routine, the kind of a fixpoint has to be ANY, and we have
to calculate the Least Upper Bound of possible kinds. For example
consider:

1 * 5 * int

The kinds are UNIT, UNITSUM, TYPE and the LUB is TYPE.
But the kind of

1 * 5

should be calculated as UNITSUM. [Note UNIT is not yet a supported
kind, nor is ANY]

The interesting case is a type function:

typefun f (x: TYPE, y:TYPE):TYPE * TYPE => x,y;

now consider:

f (1,1) = 1,1

The kind of 1,1 is

UNIT * UNIT

but the type function *specifies* the kind as

TYPE * TYPE

so we actually need a new term

KIND_coercion (kin, kout)

which has kind kout, but accepts a kind kin which must be a subkind of kout
and coerces it to the superkind kout.

We need this since the input kind of our example is actually UNIT * UNIT,
but we require TYPE * TYPE. Most of the time the coercion can be implicit I think.
Note that this is NEVER the case for subtyping. Felix mandates a coercion
in bound types so that function arguments always have EXACTLY the same type
as the parameters. Unification finds the compatibility, but the “cal_apply” routine
injects the coercion. The coercion is later replaced by a back end (C++) coercion
if necessary. More to the point, the coercion is moved down into subterms,
for example a coercion of a tuple is reduced to a coercion on its components.

For the kinding system, the computations are at compile time, and generally don’t
need to be actually coerced.
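
A sketch of what that term and its check might look like, in OCaml; the subkind lattice here is my assumption, and UNITSUM stands in for the not-yet-supported UNIT kind in the example above.

type kind =
  | UNITSUM | COMPACTLINEAR | TYPE
  | KProd of kind list
  | KCoerce of kind * kind            (* KIND_coercion (kin, kout) *)

let rec subkind a b =
  match a, b with
  | UNITSUM, (UNITSUM | COMPACTLINEAR | TYPE) -> true
  | COMPACTLINEAR, (COMPACTLINEAR | TYPE) -> true
  | TYPE, TYPE -> true
  | KProd xs, KProd ys ->
    List.length xs = List.length ys && List.for_all2 subkind xs ys
  | _ -> false

(* wrap kin in a coercion to the superkind kout, if legal *)
let coerce kin kout =
  if subkind kin kout then Some (KCoerce (kin, kout)) else None

(* the example above: the argument kind coerces to the specified kind *)
let ok = coerce (KProd [UNITSUM; UNITSUM]) (KProd [TYPE; TYPE])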

However there is a problem. In Felix there are overloads of functions
where one version operates on types, and another on UNIQ types.
For example to map a list you make a new list of mapped elements
and reverse it. However if the list is unique, and the mapping is
T -> T, you can just modify the list in place. The effect is not a side
effect because it cannot be observed except by the sole owner.

So the issue with the kinding system is simple: there’s no overloading.

It is interesting, I just read again an article by Pierce et al on system O,
which is “type classes without class declarations correctly type more
programs”.

It’s the same issue. Systems that use type inference always give the
most general type to a function. If (as in Felix) you’re forced to specify the
type, and you overspecify it, the only escape is to overload on the types
that you missed.

An excellent example of this is with Felix polymorphic variants, compared
to Ocaml ones. A polymorphic variant

`X

has type

[`X]

which is a singleton type. It is a subtype of all type with at least `X in it.
So in a function you can specify the return type

[`X | `Y]

OR you can coerce

`X :>> [`X | `Y]

to get the desired result; however you’re MISSING OUT on

[`X | `Z]

which is also a supertype of [`X]. In Ocaml *sometimes* you can just
infer the most general type. It doesn’t always work because type inference
is incomplete (i.e. it just DOES NOT WORK with higher order type systems).

On the other hand it works sometimes, and in Felix a very limited inference
is allowed: you can do this:

var x : [`X | `Y] = `X;

and Felix will insert the type coercion for you, but the inference is NOT
global as Ocaml’s is so in general polymorphic variants in Felix are
much harder to use because you have to write the coercions out
in many cases and these will usually be what you need at the time,
and not as general as they could be.




John Skaller
ska...@internode.on.net





John Skaller2

Jun 8, 2022, 7:52:26 PM
to felix
>
> In that routine, the kind of a fixpoint has to be ANY, and we have
> to calculate the Least Upper Bound of possible kinds. For example
> consider:
>
> 1 * 5 * int
>
> The kinds are UNIT, UNITSUM, TYPE and the LUB is TYPE.
> But the kind of
>
> 1 * 5
>
> should be calculated as UNITSUM. [Note UNIT is not yet a supported
> kind, nor is ANY]

OOPS!

The LUB of ANY and X is ANY which is too general. Consider again

5 * int

The component kinds are UNITSUM and TYPE. The kind of the product
has to be TYPE, because a type function with domain UNITSUM will
not accept such a type (it isn’t a unitsum!)

So actually the fixpoint has to have kind VOID because the LUB of VOID,X is
of course X. Hmmm. For a sum A+B if A is kind void, the whole sum is just B.
But for a product, A * B, if A is void, the result is void.





John Skaller
ska...@internode.on.net





John Skaller2

Jun 9, 2022, 5:32:41 AM
to felix
So after removing the kind from fixpoints, and hacking their type, I get this:

Processing [107/194]: recursive_types-02.flx
Mt FAIL Flx_exceptions.ClientError(_, "[Flx_btype_kind:61: E241]

Flx_btype_kind:Metatype error: function required for LHS of application:\n

BTYP_type_apply(BTYP_fix(-4),BTYP_type_var(73106:TYPE)),

got metatype:\nTYPE”)

BINDING

typedef arity<71962> = fun(T: TYPE): TYPE=typematch T with
| ?U -> ?V => succ (arity V)
| ANY => void
endmatch

FAILED with File "src/compiler/flx_core/flx_btype_kind.ml", line 10, characters 1-7: Assertion failed


which is saying correctly that the kind of the function term in a type function
application has to be of the form

D -> C

and not TYPE. The term is BTYP_fix(-4), which is hacked to kind TYPE.

You can see the recursion in the unbound term: arity is calling itself.
Note, this is a function recursion (NOT an application recursion), so in
fact it is NOT a recursive type at all: this is polymorphic recursion,
so to speak; it has to be unrolled when executed: note the argument
is V, which is not the parameter T.

Now given a bound type, it is possible to calculate the kind of the fixpoint:
when the analysis starts with a complete type, and we keep a trail,
then the fixpoint kind is the kind of the top level term the fixpoint refers to.
So the fixpoint is given a DONTKNOW kind which doesn’t add information
to the computation. This will be the kind of the fixpoint too. But we cannot
calculate it without the trail.

The problem here is that we are trying to find the kind of the fixpoint just after
it is created, deep down in the recursive analysis of an UNBOUND term;
the bound term is computed bottom up, which means we don’t know
the bound term being referred to until after we’re finished.

This problem occurred before too, only the kind was guessed at

TYPE->TYPE

for some reason, and the guessing is not done now. But we don’t want to guess
anyhow.

So I haven’t actually broken anything, I’ve just replaced a bad guess with a worse one.

The actual problem seems to be this: the binding code is KIND CHECKING as it goes.
In particular for an application f t, the function term must have a kind of the form

D -> C

and the kind of the argument must be a subkind of D. Furthermore, the kind of
the result must be a subkind of C, but this is not checked.

The only possible solution here is to not check at this point because we don’t have
enough information to do so. Which leaves the question .. exactly where do we check?



John Skaller
ska...@internode.on.net





John Skaller2

Jun 9, 2022, 10:41:47 AM
to felix
So, this:

> On 9 Jun 2022, at 19:32, John Skaller2 <ska...@internode.on.net> wrote:
>
> So after removing the kind from fixpoints, and hacking their type, I get this:
>
> Processing [107/194]: recursive_types-02.flx
> Mt FAIL Flx_exceptions.ClientError(_, "[Flx_btype_kind:61: E241]
>
> Flx_btype_kind:Metatype error: function required for LHS of application:\n
>
> BTYP_type_apply(BTYP_fix(-4),BTYP_type_var(73106:TYPE)),
>
> got metatype:\nTYPE”)
>
> BINDING
>
> typedef arity<71962> = fun(T: TYPE): TYPE=typematch T with
> | ?U -> ?V => succ (arity V)
> | ANY => void
> endmatch
>
> FAILED with File "src/compiler/flx_core/flx_btype_kind.ml", line 10, characters 1-7: Assertion failed
>

arises because finst, an instance of a type function, requires the domain and codomain kinds.
This requires beta-reduction, to obtain an actual type function lambda which carries that
information. So I’m going to remove that from the term to get rid of the problem.

There are some other cases where binding does too much work; one weirdo
is unravelling an alias with the structural/nominal typedef crud. When I removed it
before, this broke something, but I’m sure it’s because I failed to adjust a fixpoint.

It simply cannot be wrong for binding to do nothing other than replacing string names
with integer keys, except where an unbound term has kinds in it already (in which case
they’re bound too). Function lambdas have kinds, type variables have kinds.

The ONLY exception to this rule is a bit nasty:

1 + int * x as x

has to have a fixpoint term inserted because the x is only a local type variable,
it’s not in the symbol table.

So it’s clear what Felix does is this:

1. Bind all typedefs and typefunction definitions.
2. Eliminate typeofs in these

Now bind everything else.

Note that binding a var or function requires type binding too.
However now, when we hit a name in a type term, we know
it must be already bound.

Conversely, only terms associated with executable code,
such as instructions, function parameter and return types, etc,
need binding. Beta reduction is only needed when we need to
*compare* types, or, later, generate code.

I need to add another pass before “everything else” that does
kind checking. But for the moment all the test cases are correct
so not doing the checks cannot introduce any errors. It’s not
clear HOW to test for cases that should be errors (this is a universal
issue).




John Skaller
ska...@internode.on.net





John Skaller2

Jun 9, 2022, 7:06:09 PM
to felix
The kinding issue is not going to be easy. So here’s a diversion:
Top and bottom.

The smallest kind at the moment is UNITSUM, the largest is
BORROWED which is probably going to go away. The ordering is:

BORROWED
TYPE
LINEAR
COMPACTLINEAR
UNITSUM

The borrowed kind will probably go away. LINEAR actually means unique
not linear.

Compact linear is well founded: a type is compact linear if it
is 0, 1, a compact sum of compact types, a compact product of compact types,
a compact exponential of compact types, or a compact repeat sum of compact types.

Compact types are represented by a single natural number (uint64_t at the moment,
although it could be made to depend on the size of the type).

Ordinary products are C structs, that is, the components are addressable.
Ordinary sums consist of a 64 bit machine word, and, if any constructor has
an argument (perhaps other than unit), then a second machine word pointing
at the type. Note variants use a slightly different model for efficiency and
C compatibility. Polymorphic variants use the same model as ordinary sums
except that the constructor index is a 64 bit hash of the string name.

With these models, an ordinary sum of compact linear types has the same
representation as a compact sum.

A unitsum is 0, 1, 1+1, 1+1+1, etc., written 0, 1, 2, 3, …; these are
a special subkind of compact linear which is “plain linear”.

An ordinary array type x^y requires y to be compact linear.
It’s a linear array if y is unitsum. A linear array is identical
to a tuple with all y components the base type x. If the type
y is not a unitsum, we have a matrix instead.

A compact array x\^y requires x to be compact linear also, and is
compact linear.

A repeated sum n *+ t is the dual of an array, n must be a compact
linear type. If n is a unitsum, it is just the sum of n components
where all the components have the same type. Note the repeat
operator is isomorphic to but NOT equal to product formation.

There’s also a n \*+ t, where t is also compact linear.
Note at the moment both repeat sums are disabled in the grammar.

It should be noted arrays and repeat sums are NOT fundamental.
However they are REQUIRED in practice because, for example,
a tuple of 10 million terms would wipe out the compiler. There’s
a limit at the moment of around 20 components for a tuple.
Ditto sums.

However arrays and repeat sums offer the ability to index using
expressions, whereas tuples only permit indexing by a constant.

The compiler automatically enforces the optimal representation,
that is, the term bound from the unbound form

int * int

is

BTYP_array (int, 2)

Felix has one more vital structural type: pointer formation

&T

and

_pclt<t,m>

which is a pointer to a compact linear product component, where
m is a machine pointer, and t a compact linear type, a value of which
is stored inside a 64 bit integer pointed at by the machine pointer.
The representation consists of three machine words: the machine
pointer, a divisor and a modulus, which is sufficient to identify the component.
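
A sketch of the arithmetic in OCaml, with an int ref standing in for the machine pointer; the divisor/modulus decoding is my assumption, consistent with the single-number encoding described earlier.

(* pointer to one component of a compact linear value *)
type pclt = { word : int ref; divisor : int; modulus : int }

let get p = (!(p.word) / p.divisor) mod p.modulus

let set p v =
  let old = get p in
  p.word := !(p.word) + (v - old) * p.divisor

(* a value of type 2 * 3 with components (1, 2), encoded as 1 * 3 + 2 = 5 *)
let w = ref 5
let first  = { word = w; divisor = 3; modulus = 2 }   (* get first  = 1 *)
let second = { word = w; divisor = 1; modulus = 3 }   (* get second = 2 *)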

Note there are (at least) 3 kinds of pointers:

&T is a read/write pointer
&<T is a read only pointer
&>T is a write only pointer

and there are equivalent compact forms with ugly names. Felix uses
a polymorphic intrinsic

_storeat: &T * T

to store a value at a pointer (and equivalent forms for compact linear
pointers).

Felix introduces a NOVEL structural operation: pointer products.
Consider a tuple, A * B; it has projections A * B -> A and A * B -> B.
Felix introduces pointer projections:

&(A * B) -> &A
&(A * B) -> &B

which map a pointer to a product to a pointer to its components
(generalised to any number of terms). It works for compact linear
pointers too, and for arrays the index can be an expression.

In fact, pointer projections exist in C and C++, and for an addressable
product are nothing more than the addition of an offset. In fact, all
addressable products use pointer projections to *fetch* component values,
even in a purely functional subsystem. It’s rather amazing that in fact
the construction is novel since internally all addressable components
of products are manipulated this way (with an extra level of indirection
thrown in for boxed types).

NOTE: a type system contains a LOT more terms than described above,
for example type functions. The terms above, however, are all types
of expressions, that is, they all describe entities which exist at run time.
Other terms must be reduced away, for example the application of a type
function to a type must be reduced to a run time type eventually,
since type functions themselves do not have a run time existence.


KINDS
=====

All of the above has been introduced to show that the use of kinds
is MANDATORY. In particular, consider arrays

base ^ index

then base can have kind TYPE, but index must be COMPACTLINEAR.

So now .. what is the TOP and BOTTOM kind? TOP is obviously “any kind
you can think of”, but what is BOTTOM? Here’s the fun thing now:

BOTTOM has to be a kind with NO types, by analogy with type systems.

So what’s UNIT?

Strangely, a UNIT kind must be any kind with ONE type in it,
and that must be void. So unit type = 1 is not in UNIT!

Note that in type systems, with analogy to sets, 1 = type of value ()
which is an empty tuple, is not unique. In fact, any set with one
element is a unit type, so 1 is simply the unit associated with
tuple formation (it is uniquely the empty tuple). Other products
can have different units.

An example: in some languages an empty array of type T,
that is, T ^ 0 is distinct from an empty array of type U.
So we could have T ^ 0 = 1_T meaning the unit of type T
being distinct from U ^ 0 = 1_U. They’re isomorphic of course,
but not equal.

So BOTTOM could be called VOID. But VOID is NOT the kind
containing type 0, that is actually UNIT. It’s disturbing, I’d have
thought UNIT was the kind containing only 1 = unit.

Is this right? Note that VOID must be unique in any distributive
category. But UNIT = set of only type 0, is ALSO unique because
0 is unique. There’s only one void possible. Initial objects in a distributive
category are unique; units are terminal and are not unique.

Since kinds are categories, VOID is the category with no objects
(and hence no arrows). UNIT has to have one object and one arrow,
namely the identity. But

UNIT * TYPE == TYPE

where == means “isomorphic categories” and

VOID * TYPE == VOID

So it seems, UNIT is actually the category containing the type void.
Which is kind of interesting if true. Is there another UNIT? After all
units are not unique. So any category with one arrow would do.
This means UNIT is a specific one arrow category (and it HAS to be
the one containing type void). Just as “unit=1” is a specific unit,
namely the one associated with the standard product of no types.




John Skaller
ska...@internode.on.net





Keean Schupke

Jun 10, 2022, 1:38:09 AM
to John Skaller2, felix
Do you distinguish between mutable and immutable data? 

I consider read and write pointers to be like access permissions, it is possible to pass read only access to mutable data, and write only access to mutable data. So here the data pointed to by a read only pointer could be changed by "someone else". Immutable data can only be accessed via a read only pointer, and cannot change. 

This leads to different semantics, with a read only pointer to immutable data it is safe to copy the data (for example to have a thread local cached copy for performance reasons). With a read only pointer to mutable data you must always refer to the original memory address.

I also wonder about partial write access, pointers to objects where you only have permission to write certain properties, partial read pointers, and execute pointers (only call methods of object).

Really pointers are very primitive 'capabilities'. 
Statically the 'capability' of a pointer can be expressed in its type.

I don't think you should ever be allowed to cast away immutability, nor read/write only.


Cheers,
Keean.



John Skaller2

Jun 10, 2022, 3:23:26 AM
to felix


> On 10 Jun 2022, at 15:37, Keean Schupke <ke...@fry-it.com> wrote:
>
> Do you distinguish between mutable and immutable data?

Not specifically .. yet. The compiler knows some stuff is immutable.
For example, the data captured by a variant constructor is considered immutable.

Pointers have a specific attribute:

type pmode = [
  | `RW (* read/write *)
  | `R  (* read only *)
  | `W  (* write only *)
  | `N  (* pointer to unit : no read or write *)
]

where RW is a submode of R and W, so an RW pointer can be used where an R or W pointer is required.

The encoding in the grammar is:

&T, &<T, &>T

N means no read or write (but can be used for address comparisons perhaps).
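
A minimal sketch of this submode rule in OCaml; the treatment of `N (any pointer serves where only address comparison is needed) is my assumption.

let submode a b =
  a = b ||
  (match a, b with
   | `RW, (`R | `W) -> true       (* read/write serves as either *)
   | (`RW | `R | `W), `N -> true  (* assumption: all support comparison *)
   | _ -> false)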

I’m thinking of adding RMW meaning atomic read/modify/write.

I’m also thinking of getting rid of the Uniq type constructor, and instead using
a pointer attribute: the idea is the ONLY allowed kind of reference to
something is a pointer, and the ONLY allowed way to capture a non-memory
resource is in an object.

There are some other possibilities, such as “immutable” which means the data
cannot be changed, so it can be cached once read.


>
> I consider read and write pointers to be like access permissions, it is possible to pass read only access to mutable data, and write only access to mutable data. So here the data pointed to by a read only pointer could be changed by "someone else". Immutable data can only be accessed via a read only pointer, and cannot change.

Right. But immutability is a property of the object, so we need to put that into the pointer that refers to the object too.
There are more properties, such as “ROM” which is not only immutable, but has the same address
in every process.

>
> This leads to different semantics, with a read only pointer to immutable data it is safe to copy the data (for example to have a thread local cached copy for performance reasons). With a read only pointer to mutable data you must always refer to the original memory address.

Yes.

>
> I also wonder about partial write access, pointers to objects where you only have permission write to certain properties, partial read pointers, and execute pointers (only call methods of object).
>
> Really pointers are very primitive 'capabilities'.
> Statically the 'capability' of a pointer can be expressed in its type.
>
> I don't think you should ever be allowed to cast away immutability, nor read/write only.

It’s not so much “cast away” as subtyping properties. Allowed conversions are subtyping
conversions. If you allow throwing out the immutable property, then you cannot cache the
result of a read, you have to assume the data is volatile. You can have a write only pointer
to the same store as a read only pointer and modify it yourself.

However immutable never allows a write access. But that is not enough.
Volatile registers can still change due to external events, and, also possibly
other threads, even if they’re only accessed by read-only pointers.

I plan to do a more complete analysis of this because I think the linearity/uniqueness
stuff goes in there too. But there is more: we need to handle reference counting.

Ref counted object pointers must be constrained to avoid cycles. One way to do that
is disallow storage of them on the heap. But that is overkill.

There are two things to do here. The first is to construct different kinds of allocators.
I have a few of these. The other thing is to link memory management with
kinding systems.

RW/R/W/N attribute is just the tip of the iceberg. I suspect separation logic is the key
to reasoning here, but ultimately I believe different “logics” are just inspiration
for a more general notion of contract programming. Type systems enforce only a
limited kind of contract … we need richer kinds to enforce memory management rules.

This is the basis of my idea for C@. Unique and linear are all very well.
But I want sharing WITHOUT garbage collection because that is required
for real time distributed processing. You can’t collect across computers, nor
can you do it with strict performance bounds. But real time programs
are finite state machines: if you have recursion it MUST be bounded.
Allocators are initialised with finite memory and it MUST be enough.

This is much harder to do than what Felix does, which is basically
the same as Haskell or Ocaml: implement a CCC (cartesian closed
category). That is too powerful for real time distributed operation,
so we need to find constraints: we want BOTH in the same language,
with static enforcement, and that requires a powerful kinding system
based on category theory.


John Skaller
ska...@internode.on.net





John Skaller2

Jun 12, 2022, 7:59:59 AM
to felix
I am making progress in my quest to make binding algos do nothing except
binding. What this means is that unbound terms will map, fairly closely,
to closely related bound terms, in which nothing much is different other
than the replacement of string names with integer keys into the symbol table.

One thing I found was this: when a typefun with kind indices is applied
to a type, but the kind indices are not fully specified, we have to deduce
them from the argument. Beta-reduction does this, but the bound form
of an application requires the kind of the function term, argument, and result.

The problem is that binding is bottom up, whereas the deduction process
required the analysis to be top down: it is necessary to recognise an
application, find the argument kind, then the function kind, and unify
them to calculate missing results and simultaneously detect any kinding errors.

So to make the binding work, I had to beta-reduce a whole type alias,
just to calculate its kind (and then discard the reduced result,
knowing it will all be done again later by beta-reduction).

The problem is that there is no anonymous kind function term,
abstracting over kinds, as there is an anonymous type function
term, abstracting over types: type functions with kind variables
have to be defined by name. In other words there are no kind
level lambdas. Yet :-)



John Skaller
ska...@internode.on.net





John Skaller2

Jun 12, 2022, 9:01:13 AM
to felix
I’ve reset the group so replies go to the list instead of the poster.
Hoping this works .. test test … :-)


John Skaller
ska...@internode.on.net





John Skaller2

Jun 12, 2022, 9:31:18 AM
to felix


> On 12 Jun 2022, at 23:01, John Skaller2 <ska...@internode.on.net> wrote:
>
> I’ve reset the group so replies go to the list instead of the poster.
> Hoping this works .. test test … :-)

Nope, it’s not working.



John Skaller
ska...@internode.on.net





John Skaller2

Jun 12, 2022, 9:34:48 AM
to felix


> On 12 Jun 2022, at 23:31, John Skaller2 <ska...@internode.on.net> wrote:
>
>
>
>> On 12 Jun 2022, at 23:01, John Skaller2 <ska...@internode.on.net> wrote:
>>
>> I’ve reset the group so replies go to the list instead of the poster.
>> Hoping this works .. test test … :-)
>
> Nope, it’s not working.

Again try ,,


John Skaller
ska...@internode.on.net





John Skaller2

Jun 12, 2022, 9:48:11 AM
to felix
At the moment this fails:

variant Expr[T] =
| EBool of bool => Expr[bool]
| EInt of int => Expr[int]
| EEqual of Expr[int] * Expr[int] => Expr[bool]
;

fun eval(e) => match e with
| EBool a => a
| EInt a => a
| EEqual (a,b) => eval a == eval b
endmatch
;

var expr1 = EEqual (EInt 2, EInt 3);
println$ eval expr1;

~/felix>flx gadt-03.fdoc
ClientError Whilst calculating return type:
[flx_bind/flx_lookup.ml:2225: E114] [cal_apply] Function index_78370<78370>closure
of type unit -> void
applied to argument ()
of type unit
returns void
Flx_lookup:inner_bind_expression: Client Error2 binding expression (eval expr1)
Error at:
/Users/skaller/felix/gadt-03.fdoc: line 18, cols 10 to 20
17: var expr1 = EEqual (EInt 2, EInt 3);
18: println$ eval expr1;
***********

Two other GADT tests work. I need to explain the code: in Felix,
if a parameter has no type it is set to GENERIC.

GENERICs are a hack that works as follows: when a call is made
to a generic function, the function is copied and modified so the
GENERIC parameter is now set to the actual argument type.
Then, the call proceeds as usual.
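
A toy sketch (my reconstruction, not compiler code) of the copy-and-modify step:

type ty = GENERIC | TInt | TBool

type fndef = { name : string; param : ty; ret : ty }

(* copy the definition with GENERIC replaced by the argument's type *)
let instantiate f arg_ty =
  let subst t = if t = GENERIC then arg_ty else t in
  { f with param = subst f.param; ret = subst f.ret }

let eval_generic = { name = "eval"; param = GENERIC; ret = GENERIC }
let eval_int = instantiate eval_generic TInt   (* copy at int -> int *)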

This hack is used by GADTs. The problem in this case is roughly
that the function call is bound to a reference, instead of being elaborated,
because that’s what “binding” should do .. but in this case, since the type
varies because it’s a GADT, that inhibits the copying of the definition.
Something like that.

Unrelated to this, the beta-reduction hack done in binding to find
the kind of a type alias is resulting in attempts to look up a symbol
which has not yet been bound. This doesn’t crash the program,
since beta at the moment never fails (it should though).

This means I need to use special code for calculating the kind of a
type function instance used in an application, the same way as
beta does it, without running beta for all type aliases.
I don’t actually know how to do that due to the bottom vs top down
issue.



John Skaller
ska...@internode.on.net





John Skaller2

Jun 12, 2022, 9:48:53 AM
to felix
>>
>> Nope, it’s not working.
>
> Again try ,,

Now it works :-)


John Skaller
ska...@internode.on.net





Keean Schupke

Jun 12, 2022, 10:19:11 AM
to John Skaller2, felix
Kind level lambdas are pretty much the same as type level lambdas.

Values are different, so typing values is different, but once you have kinding types, you can also have sorting kinds the same way. At some point it seems simpler to me to move to numbered universes, so that types = u0, kinds = u1, and sorts = u2 etc... Then you can abstract the universe level to define things like Boolean in u(n+1) is True and False in u(n).

Otherwise you end up having to define a type level natural number kind, and then a kind level natural number sort... Etc.

Cheers,
Keean.




John Skaller2

Jun 12, 2022, 7:07:29 PM
to felix


> On 13 Jun 2022, at 00:19, Keean Schupke <ke...@fry-it.com> wrote:
>
> Kind level lambdas are pretty much the same as type level lambdas.

In principle.

> Values are different, so typing values is different, but once you have kinding types, you can also have sorting kinds the same way.


> At some point it seems simpler to me to move to numbered universes, so that types = u0, kinds = u1, and sorts = U2 etc... Then you can abstract the universe level to define things like Boolean in u(n+1) is True and False in u(n).
>
> Otherwise you end up having to define a type level natural number kind, and then a kind level natural number sort... Etc.

I tried that but got too confused. I couldn’t figure out which terms had levels and which were
universal. For example, is product formation:

product [k1, k2, k3] // product kind
product [t1, t2, t3] // product type

like that, or,

product 3 [k1,k2,k3] // product kind
product 2 [t1, t2, t3] // product type

The first one uses a universal constructor.
The second one gives the constructor a level number.
Method 1 leaves the empty product ambiguous, or, if you prefer, universal.

Now for C@ the idea is .. there are no levels, just categories.
They’re a set of arrows with constraints on composition.
They’re related with with functors and natural transforms.

However, consider things like dependent typing. In the “lambda cube”
we now have types dependent on values instead of just types.
And if you consider polymorphic functions, you have values dependent
on types.

So the idea is, in a term, there is *locally* a containment relation:

E:T:K:S:….

E-expression, T-type, K-kind, S-sort ….

where the “infinite containment list” is terminated by a default, or, perhaps,
when you can compute the next level up from the previous one.
Another way is meta-recursion: you say for example a meta-Sort
is exactly the same structure as a Sort. So now, much cleaner,
you can terminate at a fixpoint, i.e. when the Sort’s meta-Sort is itself.

If you look at that last statement, you might say “when the Sort’s
meta-Sort is ISOMORPHIC to itself” rather than equal, but the proposal
is that this isomorphism is represented in the language by identity,
to terminate the infinite regress.

So in practice, can we just get rid of global levels entirely?
In other words, E, T, K, S, meta-S …. they’re all just categories
with no level notion at all. The “level” notion is localised instead:
if you want “types of a particular kind” you have to build a mapping
from kinds to types so you can pick out the types of a given kind.

So it’s the mapping, not the categories being mapped, that creates
the notion of a level.

In other words Nat is just a single category. There’s no value Nat,
type Nat, kind Nat distinction.

I think this idea works with the notion of partial evaluation. In other
words, if you have

1 + 2

it can be evaluated by the compiler, if you have

x + 2

we generate the code to evaluate it later, when x is known
at run time. And this notion applies to types and kinds etc as well.
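
A toy sketch of that idea (plain OCaml, all names invented): fold what is
known now, residualise the rest.

type expr = Num of int | Var of string | Add of expr * expr

(* peval: evaluate what we can at compile time, emit residual code otherwise *)
let rec peval e =
  match e with
  | Add (a, b) ->
    (match peval a, peval b with
     | Num x, Num y -> Num (x + y)  (* 1 + 2: both known, fold now *)
     | a', b' -> Add (a', b'))      (* x + 2: residual code for run time *)
  | other -> other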

Of course .. the idea may not pan out. Playing with this stuff is
very confusing.

Consider products. Now, a type level product in Felix
is literally a method for generating a C type from a Felix type.
The canonical product generates

struct blah { T1 mem1; T2 mem2; };

This is an addressable product. Felix also has a *different* product
functor which only applies to types of kind COMPACTLINEAR:

T1 * sizeof(T0) + T0

where T0, T1 are (subranges of) integers and sizeof means
“number of values in subrange”.

So COMPACTLINEAR types are represented by integers,
this is a product on the *subcategory* COMPACTLINEAR
of the main category TYPE.
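
As a value-level sketch of that formula (plain OCaml, names mine): a compact
linear pair packs into one integer, and the projections invert the encoding
with div and mod.

(* pack (v0 : T0, v1 : T1), where T0 has size0 values, as v1 * size0 + v0 *)
let pack ~size0 v0 v1 = v1 * size0 + v0
let proj0 ~size0 x = x mod size0
let proj1 ~size0 x = x / size0

(* e.g. pack ~size0:2 1 3 = 7, proj0 ~size0:2 7 = 1, proj1 ~size0:2 7 = 3 *)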

COMPACTLINEAR has sums, products, and exponentials.
In addition, we have interactions:

string ^ 5

is an array. String is in TYPE, 5 in COMPACTLINEAR.
This is a finite exponential. It’s a real exponential, distinct
from function closures

INDEX -> BASE

So when you read the literature and it talks about, for example,
cartesian categories, it misses the point: a category can
have more than one product functor. The functor is never
specified in category theory, only the constraints on the product
(a product is a collection of projections).

Note COMPACTLINEAR is a subcategory of TYPE, it is
NOT merely isomorphic to a subcategory of TYPE: it’s
not an embedding, it’s an actual subcategory.

Therein lies a fundamental problem with category theory:
it’s only interested in structure, not identity. PLs must be
interested in distinguishing identical and merely isomorphic.

An example of this: in Felix, there is no distinct product of
one value. (5,) = 5. The projection is the identity function.
In other systems:

un (5,) = 5
tup (5) = (5,)

the tuple of one value is distinct from that value. Not in
Felix.




John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jun 13, 2022, 10:02:01 PM6/13/22
to felix
I get this in the gadt-03 regression test:

CLIENT ERROR
Whilst calculating return type:
[flx_bind/flx_lookup.ml:2225: E114] [cal_apply] Function index_78531<78531>closure
of type unit -> void
applied to argument ()
of type unit
returns void
In /Users/skaller/felix/gadt-03.fdoc: line 10 col 1 to line 15 col 2
9:
10: fun eval(e) => match e with
11: | EBool a => a
12: | EInt a => a
13: | EEqual (a,b) => eval a == eval b
14: endmatch
15: ;
16:


The error is actually not with eval, but with the match function:

Ubound type remapper type variable 77343 --> 78531
New exes =

eval_mv_77344<78536> := e;
// begin match
// match case 1:|EBool a_param_gadt-03_77338
begin_match_case
if(not((eval_mc77345<78523> ())))goto _ml77346;
a_param_gadt-03_77338 := ctor_arg EBool(eval_mv_77344<78536>);
return a_param_gadt-03_77338;
_ml77346:>
end_match_case
// match case 2:|EInt a_param_gadt-03_77339
begin_match_case
if(not((eval_mc77346<78527> ())))goto _ml77348;
a_param_gadt-03_77339 := ctor_arg EInt(eval_mv_77344<78536>);
return a_param_gadt-03_77339;
_ml77348:>
end_match_case
// match case 3:|EEqual (a_param_gadt-03_77340, b_param_gadt-03_77341)
begin_match_case
if(not((eval_mc77348<78532> ())))goto _ml77350;
b_param_gadt-03_77341 := get (1, ctor_arg EEqual(eval_mv_77344<78536>));
a_param_gadt-03_77340 := get (0, ctor_arg EEqual(eval_mv_77344<78536>));
return (== ((eval a_param_gadt-03_77340), (eval b_param_gadt-03_77341)));
_ml77350:>
end_match_case
// match failure
noreturn_code c" FLX_MATCH_FAILURE("/Users/skaller/felix/gadt-03.fdoc",10,16,13,37);
"()
++++ Cloned 77343 -> 78531
inline(generated) generated desugar:match fun (Felix function) fun eval_mf_77343: unit -> <typevar 78531>;

Now, the function 78531 returns a type variable 78531. This is a special trick which identifies
a function return type with a type variable.

Felix uses unification to set the value of the type variable, it is stored in a special cache of
function return types.

The way this works is by examining every return statement in the function, and unifying
the stated return type, the types of the function returns, and the type variable, which
sets the type variable in the cache (hopefully). The process fails if there are no return
statements with evaluable argument types and no return type is specified: for a procedure
this is fine, the return type is void. You can see in the function there are multiple return
statements.
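
A minimal sketch of that scheme (plain OCaml; the cache, types and names
here are all invented, not the compiler’s real ones):

type ty = TVoid | TInt | TBool | TVar of int

(* solved return-type variables, keyed by type variable index *)
let return_cache : (int, ty) Hashtbl.t = Hashtbl.create 97

(* unify each return statement’s type against the function’s type variable *)
let note_return tv t =
  match Hashtbl.find_opt return_cache tv with
  | None -> Hashtbl.replace return_cache tv t  (* first evaluable return sets it *)
  | Some t' when t' = t -> ()                  (* consistent with earlier returns *)
  | Some _ -> failwith "inconsistent return types"

(* no typed return statement seen: fall back to void, i.e. a procedure --
   exactly the fallback that produced "unit -> void" above *)
let return_type tv =
  match Hashtbl.find_opt return_cache tv with
  | None -> TVoid
  | Some t -> t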

So it looks like none of the return expressions can be typed, although why the result
is void is beyond me at the moment.

But there’s more. In Felix GADTs are hacked by making functions on them take
a GENERIC argument. The way this works is as follows:

When a parameter type is not specified, Felix adds a type variable to the function,
and sets the parameter type to that type variable. When a function is applied,
Felix checks if it’s generic (rather than polymorphic). This is done with a trick:
the KIND of the type variable is set to GENERIC.

Now overloading has the bound argument and its bound type.
So Felix does unification on the parameter and argument to calculate
the bound type of the type variable.

It then CLONES the whole function, eliminating the type variable,
and replacing it with a reference to a type alias.

Note that overloading has to use UNBOUND function symbols.
It cannot bind the function, because of recursion. It only needs to
pick the correct function and provide a bound reference to it,
which includes its type.

SO when you see this simple case:

fun f(x) => x;
println$ f 1;
println$ f "Hello";

it works by creating two new functions:

fun f(x:int) => x;
fun f(x:string) =>x;

which allows the system to proceed and find an actual function
of the correct type. It’s messy because cloning copies
the function and all its children (variables, etc) and rebrands
everything. It also has to convert the argument’s bound type into
an UNBOUND type, which is an extreme pain, since it is subsequently
rebound.

Felix maintains a cache of generic rebindings so it only creates
a function with a particular type once.
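
A sketch of such a cache (plain OCaml; keying on the symbol index plus a
printed form of the argument type is my assumption, not necessarily what the
compiler does):

(* one clone per (generic function, argument type) pair *)
let generic_cache : (int * string, int) Hashtbl.t = Hashtbl.create 97

let instantiate ~clone fn_index arg_ty =
  let key = (fn_index, arg_ty) in
  match Hashtbl.find_opt generic_cache key with
  | Some clone_index -> clone_index  (* reuse the existing clone *)
  | None ->
    let clone_index = clone fn_index arg_ty in  (* copy function and children *)
    Hashtbl.add generic_cache key clone_index;
    clone_index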

Now the thing is THIS TEST CASE USED TO WORK.
The code that does the cloning is very messy and complex but
it does not contain a BUG as such: instead, *something* is making
assumptions which have been invalidated by the new binding model.

For example: in the cache of generic types, code previously
expanded typedefs but now, they’re left as references to symbols.

Another change: all the typedefs get bound first now. So in any executable
code (and we’re examining unbound executable code here), the typedefs
already refer to BOUND symbols (but not expanded ones).

Note that the cloning ADDS A NEW TYPE ALIAS TO THE UNBOUND
SYMBOL TABLE. As well as adding new functions and other stuff.

So the error must be due to:

(a) failing to expand a cached function return type
(b) failing to expand a cached generic thingo
(c) failing to expand a typedef
(d) heck, failing to BIND the new typedef.

The code is too complex to analyse. You have to remember,
variables set like

var x = expr;

are actually encoded as:

var x : typeof(expr) = expr;

so there are also typeof’s involved: computing a typeof
requires binding the expression, extracting the type,
and throwing the executable part of the expression away.

All of which requires overloading .. which we’re right in the
middle of doing, and not just ordinary overloading, but
GENERIC overloading.

The real problem here is that Ocaml’s type system isn’t strong enough
to simultaneously maintain type normalisation and also represent
invariants of the term rewriting code. Felix uses private types to
protect normalisation (for example an attempt to encode a
tuple of type bool * bool actually generates the term 2^2, an array
of two bools, where bool is a name for 2). There are a LOT of these
kinds of normal forms and the system relies on them.

Consequently even after, say, typedefs are totally removed,
so there are no references to typedefs in any terms, a reference
to a typedef is still a case of the variant and meeting one leads
to a compiler run time error (instead of an Ocaml type error).
The invariants can be represented much better with polymorphic
variants .. but then the normal forms can’t be assured.
To make it harder, type inference doesn’t always work with polymorphic
variants.

Gadt 01 and 02 run fine. Gadt-03 was always the real test.

I need more coffee!


John Skaller
ska...@internode.on.net





Keean Schupke

unread,
Jun 14, 2022, 2:04:32 AM6/14/22
to felix
It would be interesting to understand what OCaml's type system would need to be strong enough to express the invariants necessary to find these kinds of errors statically at compile time. Would something like refinement types work where you can say 'variant<t, t!=typedef>'? You probably need to be able to pass the 'refinement constraint' too, so perhaps first class constraints too?

However, could you do a complete pass of the data and check each variant does not have a typedef before copying to a new variant that does not have that option? If you use generics in OCaml, you can have a slower debug mode that includes this extra pass and copy between any compiler stage that eliminates variant options, but for production just uses the same variant type in every stage.

Cheers,
Keean.



John Skaller2

unread,
Jun 14, 2022, 7:18:16 AM6/14/22
to felix


> On 14 Jun 2022, at 16:04, Keean Schupke <ke...@fry-it.com> wrote:
>
> It would be interesting to understand what OCaml's type system would need to be strong enough to express the invariants necessary to find these kinds of errors statically at compile time. Would something like refinement types work where you can say 'variant<t, t!=typedef>'? You probably need to be able to pass the 'refinement constraint' too, so perhaps first class constraints too?
>
> However, could you do a complete pass of the data and check each variant does not have a typedef before copying to a new variant that does not have that option? If you use generics in OCaml, you can have a slower debug mode that includes this extra pass and copy between any compiler stage that eliminates variant options, but for production just uses the same variant type in every stage.

Ocaml’s standard variants allow a private option:

type t = private
  | BTYP_tuple of t list
  | BTYP_compacttuple of t list
  | BTYP_array of t * t
  | BTYP_type_function of (bid_t * kind) list * kind * t

let btyp_tuple ts =
  match ts with
  | [] -> btyp_unit ()
  | [t] -> t
  | (head :: tail) as ts ->
    (* If all the types are the same, reduce the type to a BTYP_array. *)
    try
      List.iter (fun t -> if t <> head then raise Not_found) tail;
      btyp_array (head, (BTYP_unitsum (List.length ts)))
    with Not_found ->
      BTYP_tuple ts


So you see, the code creates an array if all the components are the same,
otherwise a tuple.

The private keyword on the type prevents construction of a term except
by calling functions in the same module. This ensures the enforcement
of normal forms.

Now you see there is a type function: the first argument is a list of (integer, kind)
pairs for the parameters, then a return kind, and finally a body.

After processing, e.g. beta-reduction, all the type functions should disappear.
To enforce that, you need a type without that constructor. Alas, with standard
variants, you have to create a completely new type, with distinct constructors
for the terms that remain, and some reduction process would have to map
all the terms across.

With polymorphic variants, you can do this using open recursion without
remapping anything. But alas, a polymorphic variant can be constructed
anywhere with any arguments you like, so there’s no way to prevent
construction of say

`TY_tuple (int, int)

which is not in normal form, should be an array

`TY_array (int, 2)

Polymorphic variant type can be made private but the interaction
with open recursion is impossible to understand and probably can’t
be made to work properly.
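
For contrast, a sketch of the open-recursion style with polymorphic variants
(constructor names invented): one set of constructors, closed twice, once
with type functions and once without, so no remapping pass is needed.

(* the shared constructors, open in the recursion variable 'a *)
type 'a core = [ `TY_int | `TY_tuple of 'a list | `TY_array of 'a * 'a ]
type 'a full = [ 'a core | `TY_type_function of 'a * 'a ]

(* close the recursion twice: before reduction type functions may occur,
   afterwards they may not, and that is now a static guarantee *)
type pre_ty = pre_ty full
type post_ty = post_ty core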

The other problem is, since polymorphic variants are *structural*
types not nominal types .. error messages have to list all the
constructors to specify a type ..in other words, they’re unreadable.

So what I do is this:

let rec cpp_type_classname syms bsym_table t = ….
  try match t' with
  | BTYP_instancetype sr -> "void*/*instancetype*/"
  | BBOOL _ -> assert false
  | BTYP_typeop _ -> assert false
  | BTYP_typeof _ -> assert false
  | BTYP_uniq _ -> assert false
  | BTYP_borrowed _ -> assert false
  | BTYP_ptr (_,_,(_::_::_)) -> assert false

I just assert away the terms that shouldn’t be present.
In other words the type check is *dynamic*. If one of
these terms triggers an assert, somehow a bad term
leaked through the rewrite rules. But since the check is
dynamic .. I have no idea where the fault is.

I use standard variants for the critical Ocaml variants eg for types, expressions,
instructions, and polymorphic variants for less important ones like the front end AST.

There are like 100 files which pattern match variants, and adding or removing
one requires editing the lot; it can take a few hours and still isn’t reliable
if any match uses a wildcard.

At least Ocaml tells me, if I do NOT use wildcards, when I missed a case.


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jun 14, 2022, 11:37:46 AM6/14/22
to felix


> On 14 Jun 2022, at 12:01, John Skaller2 <ska...@internode.on.net> wrote:
>
> I get this in the gadt-03 regression test:
>
> CLIENT ERROR

>
> So there error must be due to:
>
> (a) failing to expand a cached function return type
> (b) failing to expand a cached generic thingo
> (c) failing to expand a typedef
> (d) heck, failing to BIND the new typedef.

It was (c). What happened is that a GadtUnificationFailure caused all the
match branches to be skipped so the function had no visible return
statements, which made it a procedure .. returning void.

The unification failure was, trivially, because type 2 is not equal to Alias bool
structurally. Unification only works on types, you cannot have aliases,
lambdas, applications, etc in there. So the inputs required beta-reduction.

Previously, aliases were expanded by binding a type, which I worked
very hard to eliminate.

So we’re down to only 1 error now, which I think is a failure to adjust a
fixpoint term after beta-reduction.

BTW: Because of Felix weird representation of fixpoints, a recursion
in an argument of a type function that goes out of it does not work.
In other words you cannot do this:

(1 + f x) as x


The problem is that inside f, the parameter p can be anywhere in the f term.
When x replaces it in the above form, no problem. For example if

f = fun p -> 1 + p,p

then after replacement we get

(1 + ((1 + x),x)) as x

As you can see, the interior x fixpoints are at different depths in the term.
So there is no single adjustment prior to replacement which preserves
the location of the whole x term. The first x is 3 levels down and the second only 2.

So in Felix, the argument has to be a complete type. If it is not, the term has to be
unfolded to make it so:

(1 + f ((1 + f x) as x))

Now the outer f can be applied to argument (1 + f x) as x which is complete.
This leads to

1 + ((1 + (1 + f x) as x), (1 + f y) as y)

The “as” form of the original is correct, it just has no representation
where the fixpoint has a single level: we can put different levels
in for the two x and it’s fine. But we can only do that AS we do the substitution.
We cannot do it first or afterwards.

Felix doesn’t unfold the argument at the moment, nor does it adjust fixpoints
in the argument before, during, or after substitution.

Note the *function* term does not have to be complete; after beta reduction
we just have to adjust the level down by 2 (one for removing the lambda,
and one for removing the application), provided the level goes over the top
of the term after substitution.
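
For concreteness, a minimal sketch of one-step unfolding with Felix-style
negative fixpoint offsets (this toy term language is mine, not the compiler’s):

type ty =
  | Prim of string
  | Sum of ty list   (* "1 + x" is a Sum in this toy language *)
  | Prod of ty list
  | Fix of int       (* negative offset: how many levels up the binder is *)

(* unfold t: replace every fixpoint that points at the root of t with a
   copy of t itself, so the top level of the result is complete *)
let unfold t =
  let rec aux depth u =
    match u with
    | Fix n when n + depth = 0 -> t  (* points at the root: substitute *)
    | Sum ts -> Sum (List.map (aux (depth + 1)) ts)
    | Prod ts -> Prod (List.map (aux (depth + 1)) ts)
    | other -> other                 (* Prim, or a deeper-bound Fix *)
  in
  aux 0 t

(* e.g. unfold (Sum [Prim "1"; Fix (-1)]), i.e. (1 + x) as x, gives
   Sum [Prim "1"; Sum [Prim "1"; Fix (-1)]], i.e. 1 + ((1 + x) as x) *)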



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jun 14, 2022, 9:31:28 PM6/14/22
to felix
The heavy open recursion check, src/test/regress/rt/recog-11.fdoc, is giving
the correct answer but the output is wrong:

Xlib=list((expr, [(term "+" expr) | term]), (term, [(factor "*" term) | factor]), (factor, [(atom "^" factor) | atom]), (atom, [("(" expr ")") | "9"]))
Closure=list('atom', 'factor', 'term', 'expr')
Got parser
Test1: End pos (should be 11)=@1
Test1: End pos (should be 11)=@3
Test1: End pos (should be 11)=@5
Test1: End pos (should be 11)=@11

should be

Xlib=list(expr ::= [(term "+" expr) | term], term ::= [(factor "*" term) | factor], factor ::= [(atom "^" factor) | atom], atom ::= [("(" expr ")") | "9"])
Closure=list('atom', 'factor', 'term', 'expr')
Got parser
Test1: End pos (should be 11)=@1
Test1: End pos (should be 11)=@3
Test1: End pos (should be 11)=@5
Test1: End pos (should be 11)=@11

The problem is that the Xlib should be a list of productions.
And it must be or the answer wouldn’t be right.

But it is printing as a list with a single element, which is itself
a list of pairs of a nonterminal and a production.

My guess was a maladjusted fixpoint term which hadn’t been adjusted
after beta-reduction. However I just added a check in beta-reduction
on both the function body and the argument, to see if either is incomplete,
and it never happens in any test case.

The type of that thing is this:

typedef gramlib_t = Grammars[recog_t]::gramlib_t;

Now here’s how it’s defined:

class Grammars[Terminal_t] {

  typedef generic_gramentry_t[T] = string * T;
  typedef generic_gramlib_t[T] = list[generic_gramentry_t[T]];
  typedef generic_grammar_t[T] = string * generic_gramlib_t[T];

  typedef open_prod_t[T] =
  (
    | `Terminal of string * Terminal_t
    | `Nonterminal of string
    | `Epsilon
    | `Seq of list[T]
    | `Alt of list[T]
  )
  ;

  instance[T with Str[T]] Str[open_prod_t[T]]
  {
    fun str: open_prod_t[T] -> string =
    | `Terminal (s,r) => '"' + s + '"'
    | `Nonterminal name => name
    | `Epsilon => "Eps"
    | `Seq ss => "(" + catmap " " (str of T) ss + ")"
    | `Alt ss => "[" + catmap " | " (str of T) ss + "]"
    ;
  }

  typedef open_gramentry_t[T] = string * open_prod_t[T];
  typedef open_gramlib_t[T] = list[open_gramentry_t[T]];
  typedef open_grammar_t[T] = string * open_gramlib_t[T];

  // FIXATE
  typedef prod_t = open_prod_t[prod_t];

  // CLOSE RECURSION
  typedef gramentry_t = open_gramentry_t[prod_t];
  typedef gramlib_t = open_gramlib_t[prod_t];
  typedef grammar_t = open_grammar_t[prod_t];

  instance Str[gramentry_t] {
    fun str (nt: string, p: prod_t) => nt + " ::= " + str p;
  }

Now here’s the thing: a gramlib is a list of gramentry, and a gramentry is
a pair of a nonterminal name and a production.

The instance of Str prints a gramentry with a ::= separating the components.
However, a Str of a pair just prints a pair.

So actually BOTH forms are correct because we have a structural type and one
thing I keep saying about type classes is that they should ONLY be used on nominal
types. Instead, generic operations should handle structural types.

The problem here is, why did the system not detect a conflict, or, why
did the more specialised gramentry operation NOT get chosen?

Note the call in the code:

// library
var xlib = ([
("expr",xprod),
("term",tprod),
("factor",fprod),
("atom",atom)
]);

println$ "Xlib=" + xlib.str;

Now, there is a Str for lists, which is used, and prints a comma separated
list of elements inside “list(“ .. “)”, and it’s seeing here a pair, which is
printed by code shown at the bottom of this post. That code uses an alternate
form of a tuple type

head ** tail

in which tail must be a tuple. The implementation is made a bit complex
by too factors: first, we want parens around the whole tuple, but NOT
around the tail, so there are two classes, one for the outermost tuple,
and one for an inner one.

In addition, since Felix does not have a particular one element tuple,
the polymorphic recursion has to stop at a pair instead of a trailing
unit tuple. In addition, we have to handle the fact that a tuple of all the
same elements is an array, so there is a special case for a pair
of the same type. If there are three elements of the same type,
I don’t know what happens .. arrays also have a Str operation.

So it is all quite complicated. The question is why the specialised
type with a pair

gramentry_t = string * prod_t

was not selected by the type class instantiation routine (which is a form
of overloading) instead of the generic printer for all pairs.

Note that the generic printing operations use polymorphic recursion,
meaning, the type class instance functions are NOT recursive,
because they’re applied to different types in each expansion,
and hopefully those expansions terminate.

SO: it’s not a misplaced fixpoint at all. There’s a bug I think, in that
the specialised str routine accepting

string * prod_t

is not preferred over the routine taking

U * V

I have no idea why … but it used to work, possibly because
a *reference* to a typedef would have excluded the second case
and accepted the first one, but after expansion the same thing
should happen but doesn’t: a pair where the first argument
is a string should be preferred.

Note, it’s trivial to fix this problem by using a nominal type somewhere
instead. But that would ignore the fact the typeclass instantiation
is not proceeding as expected. Possibly, a missing beta-reduction
is the reason so the ::= str function is considered not to match at all.
Remember unification only works with actual type terms.


//------------------------------------------------------------------------------
// Class Str: convert to string

// Tuple class for inner tuple listing
class Tuple[U] {
  virtual fun tuple_str (x:U) => str x;
}

instance[U,V with Str[U], Tuple[V]] Tuple[U ** V] {
  fun tuple_str (x: U ** V) =>
    match x with
    | a ,, b => str a +", " + tuple_str b
    endmatch
  ;
}

instance[U,V with Str[U], Str[V]] Tuple[U * V] {
  fun tuple_str (x: U * V) =>
    match x with
    | a , b => str a +", " + str b
    endmatch
  ;
}

// actual Str class impl.
instance [U, V with Tuple[U ** V]] Str[U ** V] {
  fun str (x: U ** V) => "(" + tuple_str x +")";
}

instance[T,U] Str[T*U] {
  fun str (t:T, u:U) => "("+str t + ", " + str u+")";
}

instance[T] Str[T*T] {
  fun str (t1:T, t2:T) => "("+str t1 + ", " + str t2+")";
}

open[U, V with Tuple[U **V]] Str [U**V];
open[U, V with Str[U], Str[V]] Str [U*V];




John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jun 15, 2022, 10:36:08 PM6/15/22
to felix
So I have half the solution. I found the argument was not minimised.
This could be the result of changing the way things got expanded.
So I minimised it. Now we’re closer but it still doesn’t match.
I know why: here’s the argument:


Trying to instantiate virtual str<17759>[Nominal string *
([(`Alt<102340519> of (Nominal list[fix-2_a]:TYPE)|
`Epsilon<547317678> of (unit)|
`Nonterminal<193601615> of (Nominal string)|
`Seq<118406739> of (Nominal list[fix-2_a]:TYPE)|
`Terminal<1003285653> of (Nominal string * (record(inp:(Nominal ischannel[Nominal Buffer]:TYPE),out:(Nominal oschannel[Nominal Buffer]:TYPE)) -> (unit -> void))))]
as fix-2_a )]

And here’s the instance:

index: 79374, parent: 79372, id: str
bbdcl: fun index_0::Grammars::Str::str[Terminal_t<79240>:TYPE]((val nt<79376>: Nominal string, val p<79377>:
([(`Alt<102340519> of (Nominal list[fix-2_a]:TYPE)
| `Epsilon<547317678> of (unit)
| `Nonterminal<193601615> of (Nominal string)
| `Seq<118406739> of (Nominal list[fix-2_a]:TYPE)
| `Terminal<1003285653> of (Nominal string * <T79240>))] as fix-2_a ))): Nominal string{

They match exactly EXCEPT for that pesky type variable T79240,
that is the type of the terminal evaluator which is

record(inp:(Nominal ischannel[Nominal Buffer]:TYPE),out:(Nominal oschannel[Nominal Buffer]:TYPE)) -> (unit -> void)))

in the input argument. It’s the parameter of the Grammars typeclass, which is generalised
over terminals:


class Grammars[Terminal_t] {

  typedef open_prod_t[T] =
  (
    | `Terminal of string * Terminal_t
    | `Nonterminal of string
    | `Epsilon
    | `Seq of list[T]
    | `Alt of list[T]
  )
  ;

  typedef prod_t = open_prod_t[prod_t];

  typedef open_gramentry_t[T] = string * open_prod_t[T];
  typedef open_gramlib_t[T] = list[open_gramentry_t[T]];
  typedef open_grammar_t[T] = string * open_gramlib_t[T];

  typedef gramentry_t = open_gramentry_t[prod_t];
  typedef gramlib_t = open_gramlib_t[prod_t];
  typedef grammar_t = open_grammar_t[prod_t];

  instance Str[gramentry_t] {
    fun str (nt: string, p: prod_t) => nt + " ::= " + str p;
  }




Remember this code USED TO WORK. The instance there is nested inside
the Grammars class, so is parametrised by the Terminal_t type variable.
That’s what we’re seeing. However the unification will only work, if we’re
solving for Terminal_t, but actually we’re trying to find the T in Str[T] from
the T in str:T -> string.

A simple match would work fine if we were solving for Terminal_t.

Note the instance of Str we’re examining is NESTED in the Grammars class.
The question is why this worked before and doesn’t now. Either I missed
a substitution which eliminates the Terminal_t, or we should also be solving
for it.

The question is: how did it work before?? What changed to stop that
working? It’s not a question of how the code accounts for parent classes
in instances; that’s tricky, but again: it worked before. Again: the algorithm
makes assumptions that used to be true, but are not now.

Note: this started failing quite some time ago, I ignored it since it was
just a minor mismatch in the expected result. So it was an early change
made as part of the refactoring to make binding just do binding, and no
substitutions.

Note typeclass stuff is invoked when MONOMORPHISING code,
and the end result completely eliminates type classes.


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jun 16, 2022, 3:15:26 AM6/16/22
to felix


> On 16 Jun 2022, at 12:36, John Skaller2 <ska...@internode.on.net> wrote:
>
> So I have half the solution. I found the argument was not minimised.
> This could be the result of changing the way things got expanded.
> So I minimised it. Now we’re closer but it still doesn’t match.
> I know why: here’s the argument:

Nope. The function type itself wasn’t minimised either.
So I took a brute force approach: beta-reduce now minimises its result.
That fixed it :-)

Interesting: the same error in two places!


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jun 18, 2022, 11:28:31 PM6/18/22
to felix
At the moment, if I run make test, Ctrl-C only kills the child process.
Since the tests run in batch mode, I have to hit Ctrl-C many many times to stop it.
I need to fix this but not sure how to do it. Here are the two core routines for Posix:

gen raw_get_stdout(x:string) = {
  var fout = PosixProcess::popen_in(x+" ");
  if valid fout do
    var output = load fout;
    var result = PosixProcess::pclose fout;
    return PosixProcess::WEXITSTATUS result, output;
  else
    println$ "Unable to run command '" + x + "'";
    return -1,"";
  done
}


Note it’s bugged in that it doesn’t correctly handle the case where the child died from a signal.

//$ System command is ISO C and C++ standard.
gen raw_system: string -> int = "::std::system($1.c_str())"
  requires Cxx_headers::cstdlib
;

//$ basic command with line quoting.
gen basic_system (cmd: string) :int =>
  cmd.quote_line_for_system.raw_system
;

This one is trickier since it calls C’s system command (that is, it’s platform
independent).

On Posix, it returns the same pid_t as the popen stuff above.

I think I need to add this:

if PosixProcess::WIFSIGNALLED(result) perform
  if(PosixProcess::WSIGINT==Signal::SIGINT) perform
    exit(result);


Felix Win32 does not have the W** macros at the moment
although they’re probably available.

Would this work?
How do I do it on Windows?



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jun 19, 2022, 11:00:08 PM6/19/22
to felix
Ok, this is done for Posix. The shell and popen operations (shell, get_stdout) check
if the child was killed by a signal, and exit the parent if so.

Not sure how to do on Windows.

Any signal, not just Ctrl-C (sigint).

Currently no option on this. Probably another level is needed.

Some pain because get_stdout uses popen which is platform dependent,
whereas system uses the C system function which is platform independent
but returns an int whereas Posix systems return a process status type.

Signal handling is basically crap. The changes above only work with
shell and popen, not exec* operations which are lower level.
Also, it doesn’t help with threads (which in general unix systems can’t
handle correctly anyhow).

So basically it’s a bit of hackery, just so flx --batch --nonstop operations
which run test suites can be terminated. Prior to the change, you could
kill a single test but then the next one would start, unless you were
lucky and the Ctrl-C happened to interrupt the flx process rather than a child.



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jun 21, 2022, 2:17:10 AM6/21/22
to felix
I’m trying to work out a framework for reasoning about objects.
My high level thesis is that kinding systems can manage storage.
In particular, things like uniqueness and linear typing calculi do not really
capture the much broader nature of memory management issues.

I think there are three elements:

1. Storage/memory/objects have a few core properties eg readable/writable
but also lifetime properties

2. A variable is a reference to an object. It is universally a unique reference.
It has properties restricting what can be stored in it.

3. Pointers. These are values with rich properties related to
what objects are pointed at, and related closely to where they can be
stored themselves.

The key thing we need to manage is lifetimes. I suspect purely type based
properties like uniqueness do not work.

There is also a connection with allocators. I am going to throw out the
machine stack. There is no stack. However we can have an allocator
that works like a stack instead. One obvious thing we need to model
is this: we take the address of an object in a subroutine based system
where calls allocate a data frame, and returns discard it, then it a safe
stack pointer *variable* is one that can hold a pointer to an object
on an “earlier” or current frame but NOT a later frame (because the frame
will be deallocated before the variable is)

This means the lifetime of the object pointed at exceeds that of the
object containing the pointer. Notice that the constraint is placed
on the *variable* and has nothing to do with the type of object pointed at.

One way to enforce this is quite simple: the variable need only be
immutable once initialised. The point is, it has to be initialised with
an address in the current or an up stack frame during the construction
of the current frame, *before* a subroutine which constructs a new
frame is called .. so as long as it must be the address of an object
in a stack frame, and must be initialised when the frame is constructed,
it cannot ever be invalid.

So we have a sufficient condition to ensure the variable cannot
contain an invalid pointer. Clearly the TYPE of object to which the
pointer refers is not important.

Now we want to relax the conditions. Can we have a constraint
that allows the variable to be mutable? The problem is: how to
express the (obvious) constraint: any pointer can be put there
provided it is “upstack” from the address of the variable itself.

Here’s another one: can we put a heap pointer in there?
The answer is obviously YES, provided the heap object referred
to remains live until after the variable is dead. In particular
if the allocator used to create the object

1. Is itself stored in the frame AND
2. deletes all its memory when it is deleted

then it is basically a “stack respecting heap”.

I have no idea exactly how to express these kinds of constraints
statically (so they can be enforced). However C programmers
enforce these constraints with programming discipline all the time.

We have uniqueness and linear logic, but how do we constrain
reference counting?

Anyhow the point here is there are three entities: objects, variables,
and pointers. Type theory is not enough because it leaves out variables.
in particular value semantics is not good enough. Functional programming
is entirely value semantics. No good!



John Skaller
ska...@internode.on.net





Keean Schupke

unread,
Jun 21, 2022, 3:09:05 AM6/21/22
to felix
I have been thinking about this for a while, as I like the lifetime management in Rust, but dislike the proliferation of so many flavours of pointer.

I don't like mark/sweep garbage collection because of the unpredictable delays, and it also doesn't scale well for multi-processor systems.

I think the best place to start is the semantics of runtime reference counting. Assume every object contains a reference count that triggers a destructor when the last reference goes out of scope.
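
A sketch of that semantics as an explicit cell (plain OCaml, names mine; a
real implementation would put the count in the object header):

type 'a rc = {
  value : 'a;
  mutable count : int;
  destroy : 'a -> unit;  (* the destructor run when the last reference dies *)
}

let make v destroy = { value = v; count = 1; destroy }
let retain r = r.count <- r.count + 1
let release r =
  r.count <- r.count - 1;
  if r.count = 0 then r.destroy r.value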

You can then use linear types to optimise away the reference count in cases where the compiler can prove it is safe.

In this way you get objects with consistent semantics, so that the code does not need to distinguish between reference counted and static objects on the heap.

It also allows the compiler to implement ever more sophisticated proof of static memory allocation, and improve program performance, without having to rewrite code.

Cheers,
Keean.




John Skaller2

unread,
Jun 21, 2022, 8:33:55 PM6/21/22
to felix


> On 21 Jun 2022, at 17:08, Keean Schupke <ke...@fry-it.com> wrote:
>
> I have been thinking about this for a while, as I like the lifetime management in Rust, but dislike the proliferation of so many flavours of pointer.

I don’t know Rust .. explain?

However Rust basically supports only moving i.e. unique ownership PLUS borrowing.
Reference counting covers a LOT more cases, so the idea is to pick

(a) weak pointer (an ordinary pointer safely used)
(b) owned pointer (move no copy)
(c) borrowed pointer (do what you like as long as it is forgotten in time)
(d) ref counted sharing

The problem with ref counting is avoiding cycles. In theory, you can make the uplinks weak pointers
if you need cycles, or just avoid cycles.

The question is: how to statically enforce these different methods??

It isn’t just an attribute of the pointer, but also of the variable holding it.

So for example you might have

weak var p : &T

meaning the pointer stored in p has to be a weak pointer.

It’s a “type system” we need, but it is not the type of an object but a separate attribute.
In fact three augmentations: storage kind, pointer kind, and variable kind.

So there are TWO analyses:

1. Validity of the pointer (is the storage still live)
2. Initialisation of the object to which the pointer points (has the object been initialised yet?)

Actually: 2a: Initialisation of the object which a variable refers to.

For an exclusively owned object, Felix has “unique” stuff and I already have an
algorithm for calculating whether a variable is live or dead, although it cannot handle
pointers. It happens to be linear!!!

The analysis for “definitely initialised” is work in progress but is NOT linear.
Again .. the standard data flow algo can’t handle pointers.


> I don't like mark/sweep garbage collection because of the unpredictable delays, and it also doesn't scale well for multi-processor systems.

Garbage collection must be either

(a) localised
(b) excluded

to meet my requirements (for C@): it has to handle kernel level development all the way
to large scale multi-computer distributed processing.

Ideally a GC language like Felix can still be used as a host for launching and managing
stuff. For example a real application I need is Apple Audio processing which has
to be real time multiprocessor and so excludes GC, and I actually have the design
for a mixer (as a test) with those properties, but it is Felix code so it’s using my
GUI code based on SDL2 which requires GC. Also, in Felix coproducts/sums/variants
are boxed and so require GC.

I need to redesign the system so the GUI can be separated from the signal processing,
and also at least add a new coproduct which doesn’t use boxing. The idea for C@
is a complete redesign and rewrite.

> I think the best place to start is the semantics of runtime reference counting. Assume every object contains a reference count that triggers a destructor when the last reference goes out of scope.

I actually start a bit earlier: I have already coded some allocators. That’s the core: real time
systems cannot use malloc/free and C++ allocator design is rubbish.

One allocator is an arena allocator, that’s the simplest allocator.
Sometimes called a bump allocator. Initialised with a finite amount of store,
it just bumps a pointer to allocate and ignores deallocation. The destructor
deletes the arena.

Allocators themselves are reference counted, and most allocators get storage
by delegation: asking another allocator for store they then manage.

Since real time programs are finite state machines you can always pre-allocate
enough store in one go (calculating how much is not so easy).

>
> You can then use linear types to optimise away the reference count in cases where the compiler can prove it is safe.

Yes, that’s sort of the idea. But the problem is to eliminate ref-count cycles.
So some things cannot be allowed. The question is how to do this statically.


>
> In this way you get objects with consistent semantics, so that the code does not need to distinguish between reference counted and static objects on the heap.
>
> It also allows the compiler to implement ever more sophisticated proof of static memory allocation, and improve program performance, without having to rewrite code.

I think you still need a kinding system. That is, you cannot optimise anything without some reasonable
starting point. So you do need to distinguish at least enough to eliminate strong cycles.

With pointers all over the place this simply isn’t easy; in fact it is probably impossible.
The question is how much you can do with proof hints plus a performant SMT solver.

As an example:

A tree can work if child pointers are owning pointers.
In this case, uplinks to parents can be weak pointers.

If go to a DAG, the child pointers have to be ref counting pointers.
The parent pointers can still be weak pointers.

Theorem: If there are more cycles, it can work without GC if, and only if,
we can identify which pointers are up pointers and make them weak.

I don’t know if that theorem is true or not. In any case, the problem is clearly
to identify uplinks. So it’s design dependent. If you’re building a road network,
then all the links have equal status and GC is required.

In that case we use a new question: can we ISOLATE the cyclic graph?
It will still need GC, but that’s fine if there is a finite bound on the number
of roads, for example, which will always be the case for a real time
program: then you can just calculate the lag and demand enough
processing time to say run the GC every cycle.

But the GC has to be isolated. We need separation logic in some form.



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 10, 2022, 8:56:18 PM7/10/22
to felix
I’m looking at intersection types, for which Felix has some support, but not enough.
You can write this code:

////////
typedef r1 = (a:int, b:string);
typedef r2 = (a:int, c: double);
typedef r = r1 & r2; // intersection should be (a:int, b:string, c:double);

var x : r = (a=1, b="Hello", c=4.2);
println$ x._strr;
//////

but it doesn’t work. However, somewhere, somehow, Objective C interfaces can be intersected,
and they’re just records of methods. The only actual handling of intersection semantics appears
to be in the unification engine:

| BTYP_intersect ts, t ->
  (* print_endline ("Flx_unify: Argument as subtype of intersection parameter, must be subtype of each intersectand"); *)
  List.iter (fun p -> add_ge (p, t)) ts

This says an argument of a function is OK if its type is a subtype of each of the
intersectands of a parameter with an intersection type.

The sample code typechecks, since a variable assignment has the same structure
as binding a parameter to an argument, but _strr doesn’t work because it generates
code based on the unbound type, and can’t handle intersections.

However:

| t, BTYP_intersect ts ->
  (* print_endline ("intersection as subtype of some type "^Flx_btype.st t^ " not implemented yet"); *)
  raise Not_found

so the support for intersections is incomplete. Beta doesn’t do any reductions.
Binding code will insert a coercion in our example which isn’t handled by the coercion module.

There are at least two cases which are reducible. For primitive types, the intersection
is the least supertype of which all the intersectands are subtypes. Since we have a graph
of the required data that can be computed, if one exists. For example if A < X, and B < X,
then A & B < X, so X is a candidate, and should be chosen unless there is a Y < X such
that A < Y and B < Y (in which case Y is chosen). So it’s a “least upper bound”if going
up means going to a supertype.

The other case we can calculate is with record types: the intersection of two record
types is just the record with all the fields of both.

Unfortunately there is a PROBLEM with scoped records, that is, records with
duplicated labels. If you look at the example, the field “a” is in both r1 and r2.
If duplicates were disallowed, the result would have three fields a,b,c.

However if duplicates are allowed, we would have TWO “a” fields, and the question
would arise: which comes first? Any answer to this question destroys the assumption
that intersection is symmetric, i.e. A & B = B & A.

Instead, we have to decide intersections are ordered, and that means there must be
two “a” in the record type r not one. If we trash scoped records, then row polymorphism
is much harder because the row variable must be constrained to exclude duplication
of known fields, which makes row polymorphism/unification much more difficult.

Note with scoped records, you can have two “a” fields with the same type or different
types: there’s no way to discard one of them in the different-type case.

For some context, consider ObjC interfaces: it is quite possible you want
an object with methods from two interfaces, and some methods do indeed
have the same name. They may even have the same parameter types.
Calling a method with that name with scoped records is never ambiguous
because they’re ordered and we always pick the first one (you can still
call the second one with a bit more work to choose it). However if you consder
say “clone()” method, you actually want to call an override that clones the
whole object, neither of the two subobject clones!

Apart from this problem, it’s still unclear how to unify when an argument
has an intersection type. The ideal solution would be for beta to eliminate
intersections.

There are other cases where we can do intersections. The problem is,
there’s no general way to find the least common supertype of the
intersectands. Note in particular, we have to consider *depth* subtyping
as well as width subtyping. A simple case is two records with the same
field name where the argument of that field is itself a record.

I haven’t even thought about function type intersections :-)




John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 10, 2022, 9:48:17 PM7/10/22
to felix
When I run the test case I get this:

~/felix>flx ist
WARNING: multiple overload succeed after expanding intersections
CLIENT ERROR
[flx_frontend/flx_typeclass.ml:752: E367] [Cannotmatch] Cannot instantiate virtual str<17763>[record(a:(Nominal int),c:(Nominal double))]
In /Users/skaller/felix/src/packages/strings.fdoc: line 37, cols 3 to 44
36: class Repr [T with Str[T]] {
37: virtual fun repr (t:T) : string => str t;
******************************************
38: }

Searching the code base I find the top level overload routine handles intersections explicitly.
Basically it says: if we have an intersect A & B as an argument, then either of the overloads

f: A -> …
f: B -> …

will match. That is, since the argument has type A *and* type B, you can apply
*either* a function taking an A, *or* a function taking a type B. Both will work
because the argument type is a subtype of both.

Felix picks the “first” overload arbitrarily and issues a warning. In the test case,
it appears to pick the _strr function generated for B, that is, r2 in:

////
typedef r1 = (a:int, b:string);
typedef r2 = (a:int, c: double);
typedef r = r1 & r2; // intersection should be (a:int, b:string, c:double);

var x : r = (a=1, b="Hello", c=4.2);
println$ x._strr;
///

The problem is it isn’t doing a coercion. Normally, cal_apply should coerce an argument
to exactly match a parameter. Somehow that seems to be bypassed, in any case
the coercion module cannot handle intersections at the moment. Obviously printing
the r2 part of the intersection will miss the “b” field. (r2 is probably “first” because the
list of intersectands is reversed during processing).

The actual problem is that _strr delegates to str when it sees a type it doesn’t understand
as a structural type. It does understand records, after all _strr was originally implemented
to mean “string of record type”, that’s what the extra “r” means. But it doesn’t understand
intersections so it just calls repr which then fails.

However repr doesn’t see the intersection .. because the overload routine unpacks
intersections and tries to find any function that will work on any intersectand,
since that should be enough. However, repr doesn’t then call _strr even though it
has a record type from one of the intersectands, which explains the failure.

repr cannot call _strr because that would be an infinite recursion.

It would work if we had a specific routine for the *reduced* result of the
intersection.



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 10, 2022, 10:04:18 PM7/10/22
to felix
Just FYI, I happen to be doing some data analysis at the moment,
and related to database SQL joins there are two joins: inner and outer.

An intersection is actually a meet not a join, but for records, it ends
up being a join of the fields anyhow. So actually with scoped records
there might be two intersection operators: one treats the records
as if there were no duplicates (ignores duplicates), merges equal
fields, and barfs if the types of a field in both intersectands are
incompatible (it intersects the field types).

The other just keeps all the fields, including duplicates, with the left
operand’s fields going first.

Records are a pain. Even without duplicate fields there are TWO
kinds of record: closed and open.

A closed record A is a subtype of a closed record B only if
they have exactly the same fields, and the field arguments
of A are subtypes of B’s. So depth subtyping but no width subtyping.

Open records have width subtyping as well.

Now a nominally typed record is a struct, and these are
clearly CLOSED record types in C/C++, except that
if you *explicitly* derive A from B, then A is a subtype of B
(provided the type variables unify as required of course).

Note also in C++, a struct method in a derived class hides one
in the base so we have a kind of “scoped” record thing happening there too.

So the argument is, we should use open records, because structs already
give us closed records, as well as a way to allow limited subtyping.
[Note Felix unification allows nominal types to be subtypes but it does
NOT support covariance yet, that is, it doesn’t work for polymorphic
nominal types]

Having two intersection operators will be very confusing though.



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 11, 2022, 12:33:48 AM7/11/22
to felix
Ok what I’ve decided to implement is to beta-reduce intersections of records as
follows: we first concatenate all the fields. Then we MERGE all fields with
the same name into a single field, whose argument type is the intersection
of the argument types.
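
A sketch of that reduction (plain OCaml; the toy type and the pairwise
intersection are stand-ins for the real ones):

type ty = Prim of string | Inter of ty * ty

(* pairwise intersection: equal types collapse, otherwise stay symbolic *)
let inter a b = if a = b then a else Inter (a, b)

(* concatenate all fields, sort by name (stably), then merge each run of
   equal names into one field whose type is the intersection of the run *)
let intersect_records (records : (string * ty) list list) =
  let fields = List.concat records in
  let sorted = List.stable_sort (fun (a, _) (b, _) -> compare a b) fields in
  let rec merge = function
    | (n1, t1) :: (n2, t2) :: rest when n1 = n2 ->
      merge ((n1, inter t1 t2) :: rest)
    | f :: rest -> f :: merge rest
    | [] -> []
  in
  merge sorted

(* e.g. intersecting (a:int, b:string) with (a:int, c:double) yields
   (a:int, b:string, c:double), matching the test case earlier in the thread *)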

This means an intersection never has duplicates in it. It also means
the intersection of a record with duplicates exists and is the record
without duplicates. This is a bit weird!

At present you cannot intersect a record with nothing; that is, you have
to have at least two operands. If you give one, the constructor just returns
it as is. But now that intersection has a new meaning, I will have to fix it.

The definition has this new and very nice property now: since an intersection
always yields a record without duplicates, solo intersections can be used to
prepare scoped records for operations that only make sense when there
is no duplication. Note in particular

intersect (a: (x:int,y:int), a:(x:int,z:int))

yields

(a:(x:int,y:int,z:int))

At least it will when I fix the constructor to allow a single intersectand.

This is what you would expect from

(a: (x:int, y:int)) & (a: (x:int, z:int))

so in some very cool sense it is actually consistent. The implementation uses
a special check for type equality to avoid an infinite recursion of intersections.

It will fail to beta reduce at the moment unless either the common field types
are equal, or everything is a record. It doesn’t work for polyrecords,
and tuples are a degenerate case that may not be correctly handled:
since a tuple is just a record with blank duplicate field names, the intersection
of a tuple is a single type which is the intersection of all the types.



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 11, 2022, 2:01:37 AM7/11/22
to felix
Ooo… this is looking nicer and nicer .. which means it might even be correct!

So the basic notion is you have the intersection of records.
So you gather all the fields into a single list which is sorted by name.

Then, for each sequence of fields in that list with the same name,
you get a single field with type the intersection of all the types.

We define the intersection of any type t with itself as t, which is
the first way to terminate the reduction.

We also specify subtyping rules for nominal types, which terminate
the reduction by finding the smallest supertype in the graph.

If we allow the type “any” to be the supertype of everything, then this
computation on a finite graph always terminates successfully.

We can also decide that “any” is the least supertype of incompatible types.
So now ALL intersections must terminate trivially UNLESS there is nasty old recursion involved.

So now, the intersection computation will ALWAYS terminate with a sane result,
where “any” is an insane result considered sane (modulo recursion again).

Now here is an interesting case: intersection of tuples. Well they’re just records
with blank fields, so the result is a record with ONE blank field of type the intersection
of the components. A record with ONE blank field is a tuple with ONE field which is
just that type.

The point is the computation reduces automatically to a plain type intersection.
In particular

Intersect (A * B) = A & B

is clearly a special case of how record intersections work.

It should be interesting to consider intersection of sums :-)

For polyrecords, the intersection is the same as records with the extra
bit thrown in. In other words

forall U. (a:int | U) & forall V. (b:int | V)

is just

forall W. (a:int, b:int | W)

The V and U don’t need to be intersected. The only hassle here is that
for a function parameter you’d have

fun f[U,V] (x: ((a:int | r:U) & (b:int | s:V))) => …

and now we lose the (convenience) name of the result

(a:int, b:int | W)

Note that the result W is actually going to be any collection of fields
except the initial a and b. It can still have a and b fields in it if the argument
has two of each: the intersection only “unifies” the specified head fields.

Felix row polymorphism only works when the quanitifier is already bound
at a higher level, eg as variables of function f here. There’s no actual type

forall W. (a:int, b:int | W)

because Felix has no actual run time polymorphic types. So in the example,
there’s no “label” for the residual (it’s actually “anything MINUS the two headers”, so to speak).
It’s not clear why you’d do this anyhow … mmm...



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 11, 2022, 2:13:22 AM7/11/22
to felix

typedef r1 = (a:int, b:string);
typedef r2 = (a:int, c: double);
typedef r = r1 & r2; // intersection should be (a:int, b:string, c:double);

var x : r = (a=1, b="Hello", c=4.2);
println$ x._strr;

~/felix>flx ist
(a=1,b='Hello',c=4.2)

:-)


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 11, 2022, 5:40:05 AM7/11/22
to felix
Whoohoo!

typedef r1 = (a:long, b:string);
typedef r2 = (a:int, c: double);
typedef r = r1 & r2; // intersection should be (a:int, b:string, c:double);

var x : r = (a=1, b="Hello", c=4.2);
println$ x._strr;

Result:

(a=1l,b='Hello',c=4.2)


Notice, the type of a is long because int & long = long because long is a supertype of int.
It actually coerced the argument as well!

The actual code is not completely correct but it seems to work for intersection of two types
and probably more than two but not one .. not sure how to even write that!

Now, I’m thinking, what happens if there are TWO least supertypes??
At the moment Felix picks just one of them arbitrarily.

But actually I’m thinking NO .. that’s not beautiful!
Algebra SHOULD be beautiful!

So if there are in fact TWO supertypes .. it should return BOTH!

How can it do that? DUH! A record with two fields of the same name,
with the types for each supertype! The only nasty is that these will be
ordered when there is no reason I can think of for that.

In fact consider the type

(a: u & v)

then the intersection of that type should be

(a: u, a: v)

i.e. it should actually SPLIT irreducible intersections open.
And to be consistent

u & v = u * v

iff u,v cannot be intersected … because that is a record:

(n””: u, n””: v)

where n”” is the way to spell the name of a blank field. HMMMMM ….

:-) :-)



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 12, 2022, 3:57:39 AM7/12/22
to felix


> On 11 Jul 2022, at 19:39, John Skaller2 <ska...@internode.on.net> wrote:
>
> Whoohoo!
>
> typedef r1 = (a:long, b:string);
> typedef r2 = (a:int, c: double);
> typedef r = r1 & r2; // intersection should be (a:int, b:string, c:double);
>
> var x : r = (a=1, b="Hello", c=4.2);
> println$ x._strr;
>
> Result:
>
> (a=1l,b='Hello',c=4.2)
>
>
> Notice, the type of a is long because int & long = long because long is a supertype of int.
> It actually coerced the argument as well!

OOPS! Intersection should find the greatest lower bound, the biggest common subtype.
Thanks to Pierce’s PhD thesis! [I always get up and down confused .. that’s what comes
of being Australian .. :]




John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 12, 2022, 10:37:06 PM7/12/22
to felix
OK so now some basic rules for intersections work for two types: records and monomorphic primitives.
I need to verify some properties and fix the routines to be “more correct” and expand the types handled.
At the moment, there are some issues because Felix records are more general than plain records.

There are several next steps. The first is to repeat everything for unions. It’s the same stuff, dualised,
which means the algos for records become algos for polymorphic variants when dealing with unions.

The second step is to generalise “monomorphic primitives” to support polymorphic ones.
This should be simple enough. Consider a type constructor A of T which is a subtype of
P of T. Note I use A and P to help my brain understand which one is the subtype: A means
“argument” and P means “parameter” so we require the argument to be a subtype of the parameter.

If A[T] is a subtype of P[T] and the parameter is covariant then A[X] is a subtype of P[Y] provided
also X is a subtype of Y. If the parameter is contravariant, we need Y to be a subtype of X instead.
We need to handle multiple parameters, and we need to classify them as one of these:

+1: covariant
-1 contravariant
0: invariant

Covariance is most common, however function spaces go -D -> +C. So consider

struct B[T] { a: T };
struct D[T] : B[T] { b:T; };

Now D is a subtype of B, and the parameter is covariant. But if we had

struct F[T] : B[T] { f: T -> T; }

now the parameter is invariant because the domain of f is contravariant and the codomain
of f is covariant so the whole type F requires T to be invariant. This means

F[T] <= B[T]

however if X < Y

D[X] <= B[Y] provided X <= Y

but not ( F[X] <= B[Y]).
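
Here’s a minimal OCaml sketch of this variance calculation (invented types and names, not
the Felix compiler’s actual code): the variance of a variable in a term is the combination
of the variances of its occurrences, and a function domain flips the sign, so T -> T comes
out invariant, which is exactly the F case above.

type variance = Co | Contra | Inv | Absent

type ty =
  | Var of string
  | Prim of string
  | Tuple of ty list
  | Fn of ty * ty                        (* domain -> codomain *)

(* combine two occurrences of the same variable:
   mixing covariant and contravariant yields invariant *)
let combine a b = match a, b with
  | Absent, v | v, Absent -> v
  | Co, Co -> Co
  | Contra, Contra -> Contra
  | _ -> Inv

(* a function domain reverses the direction of variation *)
let flip = function Co -> Contra | Contra -> Co | v -> v

let rec variance_of v t = match t with
  | Var u -> if u = v then Co else Absent
  | Prim _ -> Absent
  | Tuple ts ->
    List.fold_left (fun acc c -> combine acc (variance_of v c)) Absent ts
  | Fn (d, c) -> combine (flip (variance_of v d)) (variance_of v c)

(* variance_of "T" (Fn (Var "T", Var "T")) = Inv, the F case above *)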

However the rules are non-trivial, as follows: for the type B, we need to specify the variance
of the type parameter, or, if we have more than one parameter, of each of them. This allows
us to decide if B[X] <= B[Y] knowing only how X,Y are related.

However a derived type D even with the same parameter, need not actually be a subtype.
More generally, it can have any number of parameters:

struct D[U,V] : B [ U * V] {…}

More generally again, a polymorphic type D need not be derived from a polymorphic type B
for there to exist a subtyping relation, but that relation has to be parametrised by relations between
the type variables. In particular saying

B[U] <= B[V] if U < V

is a longhand way of saying B is covariant in its parameter. So the shorthand B[+T] can be used,
and in general for any number of parameters like

B[+U,V,-W]

meaning covariant in U, invariant in V, and contravariant in W but this shorthand doesn’t extend
to relating B to D. Suppose

D[+J,K,-L]

we now know the subtype relations between instantiations of D and of B, but if some instances
of D are subtypes of instances of B, how do we state the constraints for that to be the case?
Seems like we have to say this:

D[J,K,L] <= B[U,V,W] when J < f(U,V,W) and ….


There may be more than just three formulas here because we might ALSO have

J > f2(U,V,W)

Note that in the specialisation algorithm handling nominal types, we can solve
these things by recursion. In fact, we already do exactly this kind of thing
for structural types, such as tuple types, etc, except in those cases we
are HARD CODING the known relations. EG for tuples, they’re covariant on
all components so we just push l_j >= r_j for each component index j into
the set of equations to solve. That’s only for a single fixed constructor (tuple
formation). However we can, for example, compare tuples with arrays
because we know an array is a subtype of a tuple:

array[T,N] <= tuple (T1, T2, … TN) provided T <= T_i for all i = 1 to N

So this is an example where the type constructors are distinct but related
and the subtyping condition is explicitly stated above (the compiler is actually
even more general because it can do this for any integer N). We will be
happy if we can find a way to encode the relations between two constructors
each with a fixed number of parameters.

So basically the solution is quite nasty: to do this in general, when we write the
subtyping rule, we have to give a set of inequalities which the algorithm
“adds” to the set to be solved.
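
A sketch of that “adding” step in OCaml (hypothetical names, not the actual Flx_unify code),
hard coding the two relations mentioned above: componentwise covariance for tuples, and
tuple >= array when every component is a supertype of the element.

type bty =
  | BPrim of string
  | BTuple of bty list
  | BArray of bty * int

(* l >= r means "l is a supertype of r" *)
let add_ge worklist (l, r) = worklist := (l, r) :: !worklist

let decompose worklist (l, r) = match l, r with
  (* tuples: covariant in every component *)
  | BTuple ls, BTuple rs when List.length ls = List.length rs ->
    List.iter2 (fun lj rj -> add_ge worklist (lj, rj)) ls rs
  (* tuple(T1..TN) >= array[T,N] when each Ti >= T *)
  | BTuple ls, BArray (t, n) when List.length ls = n ->
    List.iter (fun lj -> add_ge worklist (lj, t)) ls
  | _ -> failwith "no hard coded rule for this pair"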

But here’s the really REALLY nasty thing: this will be manageable, if a bit hard,
to encode for specialisation. But our objective is grander:

WE WANT TO USE JUST THOSE RULES TO FIND INTERSECTIONS


The problem is that it may be X > Y under some rules and X < Y under others.
It’s KNOWN (from Pierce’s thesis) that in general intersection computation
is undecidable.

BTW: simple example:

tuple(X,Y) <= array[T,2] when X <= T and Y <= T

shows a case where a tuple is a subtype of an array rather than the other way
around. The compiler WILL (I hope) handle this for specialisation purposes,
but it’s quite unclear how to handle this kind of thing for intersections,
even when the semantics of tuples and arrays are precisely known
because they’re structural types.

Note that the intersection (and soon union) problem arises inside the unification
algorithm itself since you can get parameters and arguments of these types
containing type variables. In Felix there is ALREADY a problem because principal
typing is lost (meaning .. there is more than one solution). Specialisation has
to produce a single MGU and you cannot do that unless it is unique, or you just
pick one solution. Picking one solution is NO GOOD because it is FINAL, meaning,
you can no longer combine it with further constraints .. because those constraints
might knock out the solution that was picked, and retain the one already discarded.

It is interesting because in parsing this problem is solved by backtracking in a simple
parser, or concurrently maintaining all solutions in a GLR parser.



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 13, 2022, 1:48:41 AM7/13/22
to felix
Back to basics. Felix has the construction used here:

supertype vlong: long = "(long long)$1";
supertype long: int = "(long)$1";
supertype int : short= "(int)$1";
supertype short : tiny = "(short)$1";


which has all the usual forms of a function, including support for polymorphism
and definitions in Felix rather than C. These specify that the LHS type is a supertype
of the RHS and provide a special function RHS -> LHS which can be used as an
implicit coercion. The syntax allows polymorphism, but Felix only implements the subtyping
rule for monomorphic types. These will typically be primitives but can be any nominal types.

This allows, given a function taking a type P, to pass an argument of type A, provided P is a supertype
of A; the computation is performed by wrapping the specified coercion function
around the argument A (it should also work for returned values as well).

There are (unchecked) requirements on the coercions: they have to be semantically transitive.
If there is a relation X < Y and Y < Z the compiler automatically applies X < Z by applying
both coercions in sequence. In addition if there is a *specified* coercion X < Z as well, it will use that
one instead of the composite, assuming it is semantically equivalent and performs better.

If there are two implied coercions from A to B, the compiler finds the set of shortest paths
and picks one of them at random.
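
A sketch of the path search in OCaml, over a hypothetical table of declared coercions:
breadth first search finds a shortest chain first, so a directly specified coercion always
beats a longer composite between the same endpoints.

(* edges are declared coercions: (subtype, supertype) pairs *)
let shortest_coercion edges a b =
  let next t =
    List.filter_map (fun (sub, sup) -> if sub = t then Some sup else None) edges
  in
  let rec bfs frontier seen = match frontier with
    | [] -> None
    | (t, path) :: rest ->
      if t = b then Some (List.rev path)
      else if List.mem t seen then bfs rest seen
      else bfs (rest @ List.map (fun s -> (s, s :: path)) (next t)) (t :: seen)
  in
  bfs [ (a, [a]) ] []

(* shortest_coercion ["short","int"; "int","long"; "long","vlong"] "short" "vlong"
   = Some ["short"; "int"; "long"; "vlong"] *)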

There’s no rule against equivalences:

cartesian-complex <—> polar-complex

is a great example I got from Pierce. Now, intersections are defined in terms of subtyping.
We want to define the intersection of A and B, written A & B, as the greatest lower bound
in the preorder graph given by subtyping. Consider the rules:

X > A, X > B, Y > A, Y > B

then A is a subtype of both X and Y, as is B. So there are two types, A and B, which are
“greatest lower bound” of X and Y; that is, X & Y = A and also X & Y = B. The graph is not
a lattice. The meet is well defined provided you allow the result to be a SET of things.

If you add another type Z with Z > A, NOW we have

X & Y & Z = A

A is a subtype of X, Y and Z, but B is only a subtype of X and Y. This means you cannot
“pick” randomly from two possible results because that would imply that intersection was
not associative.

So there are two options: mandate the graph must be a (bounded) lattice so intersections are
unique, or simply allow a set to be returned. In the latter case, we do not need bounds,
since an empty set would be allowed instead. Otherwise, if we have a TOP element, then
the set is never empty.

But the problem is worse. If we have type variables in there, we cannot actually compute
the intersection until the type variable is instantiated, so we either return TOP (or empty set),
or we refuse to reduce the intersection yet.

Unfortunately the uniqueness (lattice) requirement doesn’t make sense modelling C++
because of multiple inheritance:

class X; class Y; class A: X, Y; class B: X,Y;

There is no unique maximal subtype of both X and Y: A and B are
both subtypes of X and Y.

The union situation is the same with the same diagram: A and B have
no unique minimal supertype.
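
The “meet as a SET” idea is easy to sketch in OCaml (toy relation, not compiler code):
collect the common subtypes and keep only the maximal ones. For the diamond above it
returns both A and B.

let glb_set subtype all x y =
  let lower = List.filter (fun t -> subtype t x && subtype t y) all in
  (* keep only maximal lower bounds; equivalent pairs would knock
     each other out, which is fine for a sketch *)
  List.filter
    (fun t -> not (List.exists (fun u -> u <> t && subtype t u) lower))
    lower

let sub = [ "A","X"; "A","Y"; "B","X"; "B","Y" ]
let subtype a b = a = b || List.mem (a, b) sub

let () = assert (glb_set subtype ["X"; "Y"; "A"; "B"] "X" "Y" = ["A"; "B"])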


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 14, 2022, 8:07:15 AM7/14/22
to felix
Ok I think I have a coherent definition for intersection of records but I’m not completely sure.

Consider record type intersection

(a:A0, a:A1, b:B) & (a:A, c:C)

What we do is assign a natural number sequentially to each label to make the labels unique even
when repeated:

(a_0:A0, a_1:A1, …

Now there’s no duplication. Next we say that a record has an infinite number of components,
where every component except those listed has type “any”, which is the unit of intersection.
Then, for matching label_index pairs, the type is the intersection of the corresponding types.
Finally we throw out any label_index with type “any”. So the result above is

(a:A0 & A, a:A1, b:B, c:C)

The advantage of this formulation is that it works nicely with tuples:

(A * B) & C = (A & C) * B

Ok so let’s check the formula for polyrecords: this is much trickier! There is a coherence
condition: the intersection for polyrecords must be stable for any instantiation of the row variables.

So it’s NOT good enough to just intersect the row variables. Consider

(a:A | r:T) & (b:B | s:U)

So the formula is going to be found by alignment, but we need a way to say “the component
with this name and index from V”, so let’s use V.name_index to mean that, and we need a way
to say “the record V with name_index removed”. We can at the moment remove only a label
with index 0; to remove index 1 we have to remove index 0 twice (the indices get renumbered).
This is messy so I’m just going to write V - label_index for now.

It’s IMPORTANT that the index used is the one which would be assigned after substitution.
So we’ll have to make an adjustment. So the formula is

(a: A & U.a_0, b: B & T.b_0 | (T - b) & (U - a))

Note that this formula cannot fail in the sense that U.a_0 is always defined, although it
may be “any”. Similarly T - b is always defined: there’s always a “b” in T, although it may
have type “any”.

The formula generalises to any number of intersectands: a_0 has a type which is the
intersection of all a_0 terms (including itself), and the residual intersection is the
intersection of all the residuals (the bits after the | ) minus all the used components
from that residual (which excludes the labels in the prefix of that term).

Note that the subtraction removes N copies of a label, where N is the maximum
repeat count of the label in any term. Since we always get the “leading index”
for a label out of a residual, we can drop the index if we’re careful: an index of
1 or more is possible only if one of the intersectands has a repeat, in which case
instead of U.a_1 we would write (U - a).a etc.

To visualise this we just write the intersectands as rows lined up vertically,
with each column a label in alphabetical order (and retaining the order of writing
for duplicates). The residual goes last, and we remove all the used labels
as we use them (the order doesn’t matter here, only the count of removals
for each label).
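
A sketch of the renumbering step in OCaml (field types left abstract): each occurrence of
a label gets a sequential index, so duplicated labels in a scoped record become unique.

let index_labels (flds : (string * 'a) list) : ((string * int) * 'a) list =
  let counts = Hashtbl.create 7 in
  List.map
    (fun (name, ty) ->
      let n = try Hashtbl.find counts name with Not_found -> 0 in
      Hashtbl.replace counts name (n + 1);
      ((name, n), ty))
    flds

(* index_labels ["a", a0; "a", a1; "b", b] =
   [("a",0), a0; ("a",1), a1; ("b",0), b] *)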



John Skaller
ska...@internode.on.net





Keean Schupke

unread,
Jul 14, 2022, 9:23:34 AM7/14/22
to felix...@googlegroups.com
Hi John,

Is "(Int * String)" a subtype of "Int" in Felix?

If not then this:

(A * B) & C = (A & C) * B

Doesn't hold, because in an intersection "(A & B)" the value must be typeable as both "A" and "B" modulo subtyping.

So we can only move the 'C' inside the product if  (A * B) <= A 

Cheers,
Keean.




John Skaller2

unread,
Jul 14, 2022, 10:31:03 AM7/14/22
to felix


> On 14 Jul 2022, at 23:23, Keean Schupke <ke...@fry-it.com> wrote:
>
> Hi John,
>
> Is "(Int * String)" a subtype of "Int" in Felix?
>
> If not then this:
>
> (A * B) & C = (A & C) * B
>
> Doesn't hold, because in an intersection "(A & B)" the value must be typeable as both "A" and "B" modulo subtyping.
>
> So we can only move the 'C' inside the product if (A * B) <= A

Right. According to my calculus the answer seems to be yes.
Basically (A * B) & C is computed by extending it to

(A * B) & ( C * any)

which yields (A & C) * (B & any) which is (A & C) * B since B & any = B.

The result for tuples follows from taking a tuple as a record with repeated fields with a blank name.
I think for unions, the missing component is made to be void, rather than any. Any = Nonsense = Universal
type = Top. Void = Bottom.

In Felix at the moment, the unification algo (actually the specialisation algo) doesn’t allow width subtyping
for tuples even though it is well defined, because it would be confusing to a programmer if

f : int -> int
f (1,"Hello") // fine!

Ouch. I’d want that to be an error. But since a record with blank fields is a tuple .. and for
scoped records width subtyping across duplicates is required .. unfortunately it has to be allowed,
possibly with a warning.

It may be possible to do something about this but I’m not sure what.

For records, there are two kinds I think: closed and open. Closed records do not allow width subtyping.
Felix has closed records in the form of structs, i.e. nominally typed records. If we follow the C++/Java
model, a subtype has to be an explicitly derived struct (i.e. a nominal type also).

You could have an unnamed struct be a closed record type.

The point is, you could also have open and closed tuple types.
For example (a,b,c) could be closed (no width subtyping) and (a,b,c,…) open.

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/scopedlabels.pdf

NOTE: my current algorithm for record intersection does NOT do what it should according
to my analysis.


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 14, 2022, 10:51:10 AM7/14/22
to felix
OK so it suffices to consider a binary intersection of two polyrecords with only one label each and two row
variables:

(a:A | X) & (b:B | Y)

Extension to more fields is by induction (recursion) and more intersectands by iteration.
I would need to prove the operation is associative. The rule is now trivial

Case 1: a, b distinct:

a          b          (residual)
----------------------------------
A          Y.a        X-b
X.b        B          Y-a
==================================
A & X.b    Y.a & B    X-b & Y-a

The result is

(a: A & X.b, b: Y.a & B | (X-b) & (Y-a))

If a,b are the same label then

(a: A & B | X & Y)

If b is missing,

(a:A & Y.a | X & Y-a)

If a is missing

(b:B & X.b | X-b & Y)

if there are no labels left then

X & Y

So the algorithm is a stock standard merge on the stably sorted fields with a trick:
we always output types for matching labels. If a label is missing we steal it from
the row variable of the strand from which it’s missing, and to balance the books,
we have to subtract it from that variable.

Having done this step, we now feed in the next step by recursion. In other words, for
some concrete pairs of polyrecords, we just assign a variable to everything except
the first fields, perform the step, then replace the variable with the next field and
another variable. We can do this, then intersect the final result with a third
intersectand, OR we can just generalise the operation to an arbitrary number
of them.

In Felix, during monomorphisation, all type variables are eliminated,
which means all intersections can be eliminated too.


John Skaller
ska...@internode.on.net





Keean Schupke

unread,
Jul 14, 2022, 2:01:15 PM7/14/22
to felix...@googlegroups.com
Hi John,

Doesn't that lead to errors though for example:

(Int * String) & Int === ((Int ^ Int) * String) === (Int * String)

In other words that would imply that a pair of int and string is isomorphic to int, but an int is generally a registerable value, and a pair a pointer to memory where the values are stored.

So I can see that if we force 'int' to be a pointer to memory, then the above is true, as an int and a unary tuple of int have the same representation (a pointer to memory), and if we truncate any tuple to length one we would get the same.

But doesn't this cost too much? No register argument passing, no function return values in a register, definitely not 'C' call semantics?

Cheers,
Keean.


John Skaller2

unread,
Jul 14, 2022, 5:00:47 PM7/14/22
to felix


> On 15 Jul 2022, at 04:01, Keean Schupke <ke...@fry-it.com> wrote:
>
> Hi John,
>
> Doesn't that lead to errors though for example:
>
> (Int * String) & Int === ((Int ^ Int) * String) === (Int * String)
>
> In other words that would imply that a pair of int and string is isomorphic to int, but an int is generally a registerable value, and a pair a pointer to memory where the values are stored.

In Felix pairs are expanded. There’s no runtime polymorphism. Same as C++ at the moment:
all type variables are removed by monomorphisation. However all coproducts are more or less
boxed, although the representation of sums and polymorphic variants is regular, whereas
nominally typed variants use a complex representation which ensures C compatibility. For
example list = Empty | Snoc of list * int uses NULL for Empty so the representation is

struct node_t { node_t *next; int v; }* list;

and even more importantly, optional T is a pointer to T for the Some case and NULL for the None case.
I think it’s a serious design fault that there is no unboxed coproduct, because
all product types are mutable, whereas coproducts are in theory shared and so immutable,
but this is not actually enforced. Sharing should be an optimisation.

So a pair (int * string) is, by width subtyping, already in theory a subtype of int,
the same way as for a record: by throwing away the extra field. Felix doesn’t allow this at the moment,
that is, the specialisation algorithm, which requires an argument to be a subtype of a parameter, rejects width
subtyping of tuples and arrays.

But that is inconsistent with the intersection rule. In other words, if you had just ONE non-blank label in a
record, you could slice the end off a pair of blank labelled components. And if you then sliced the
labelled component off both, you’d have sliced a tuple. So that would make the type system unsound.

However, if you regard a tuple as a closed type, and a record with blank labels as an open type,
it would be OK, but then the two types would be distinct: the constructors at the moment forcibly
identify them.

>
> So I can see that if we force 'int' to be a pointer to memory, then the above is true, as an int and a unary tuple of int have the same representation (a pointer to memory), and if we truncate any tuple to length one we would get the same.
>
> But doesn't this cost too much? No register argument passing, no function return values in a register, definitely not 'C' call semantics?

Well, at the moment, if you want a boxed product type, you have to use pointers explicitly.

So for example &int and &(int * string) and &int * &string would be the issue.
At the moment the rule is that read only pointers are covariant, write only pointers
are contravariant, and read-write pointers are invariant. So IF

A > A * B

then

&<A > &<(A * B)

by covariance where &<T is a read-only pointer and &>T is a write only pointer.



John Skaller
ska...@internode.on.net





Keean Schupke

unread,
Jul 14, 2022, 5:11:39 PM7/14/22
to felix...@googlegroups.com
Considering a 4-tuple of 8bit bytes, and a single 8bit integer. In a register the integer would be in the low 8 bits, but what order would the tuple be in a 32bit register?

It looks like it could work providing tuples are stored from low to high bit.

Cheers,
Keean.



John Skaller2

unread,
Jul 14, 2022, 5:12:28 PM7/14/22
to felix


> On 15 Jul 2022, at 04:01, Keean Schupke <ke...@fry-it.com> wrote:
>
> Hi John,

There should be unboxed variants, implying copying actual values not just pointers,
which allows variants to be mutable. Let me call it overlay:

This would mean

overlay list = Empty | Cons { v:int; next:list; }

would be an error: recursive expansion. Whereas

variant list = Empty | Cons of int * list;

is currently acceptable. You would fix overlay by

overlay list = Empty | Cons {v:int; next: &<list; }

This would make list components mutable but the length immutable, because
the next pointer is read only. If you wanted to splice lists you’d use a read-write
pointer.

Coproducts matter because intersection and union should apply to at least
polymorphic variants, which are the dual type to scoped records. At present,
repeated variant cases are NOT allowed, however.



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 14, 2022, 5:27:54 PM7/14/22
to felix


> On 15 Jul 2022, at 00:51, John Skaller2 <ska...@internode.on.net> wrote:
>
> OK so it suffices to consider a binary intersection of two polyrecords with only one label each and two row
> variables:
>
> (a:A | X) & (b:B | Y)
>
> Extension to more fields is by induction (recursion) and more intersectands by iteration.
> I would need to prove the operation is associative. The rule is now trivial
>
> Case 1: a, b distinct:
>
> a          b          (residual)
> ----------------------------------
> A          Y.a        X-b
> X.b        B          Y-a
> ==================================
> A & X.b    Y.a & B    X-b & Y-a
>
> The result is
>
> (a: A & X.b, b: Y.a & B | (X-b) & (Y-a))


Actually this is wrong because X might contain another “a” label and of course, it should have
been

A
Y.a
——
A & Y.a

etc anyhow. I will revise the algorithm.

If there are equal labels, we do an intersection, otherwise write the lowest keyed type.
Simple. If we RUN OUT of labels, THEN we have to start stealing from the terminating type variable
to match the strand that didn’t run out of labels. We stop when both strands run out of labels,
and intersect each of the (type variable minus stolen labels).
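
Here’s a sketch of that revised merge in OCaml, over purely symbolic types (TInter, TProj
and TMinus are invented constructors, not Felix terms): equal labels intersect, an unmatched
label is copied while both strands still have labels, and once a strand is exhausted the
remaining labels steal from its row variable, which gets a subtraction to balance the books.

type sty =
  | TPrim of string                 (* a field type like "int" *)
  | TRow of string                  (* a row variable like X *)
  | TInter of sty * sty             (* t1 & t2 *)
  | TProj of sty * string           (* row.label : steal the label's type *)
  | TMinus of sty * string          (* row - label *)

(* ls, rs: (label, type) fields sorted by label; xrow, yrow: the two
   terminating row variables. Returns merged fields and the residual. *)
let rec merge ls rs xrow yrow =
  match ls, rs with
  | (a, ta) :: tl1, (b, tb) :: tl2 ->
    if a = b then
      let flds, res = merge tl1 tl2 xrow yrow in
      (a, TInter (ta, tb)) :: flds, res
    else if a < b then
      let flds, res = merge tl1 rs xrow yrow in
      (a, ta) :: flds, res
    else
      let flds, res = merge ls tl2 xrow yrow in
      (b, tb) :: flds, res
  | (a, ta) :: tl1, [] ->
    (* right strand exhausted: steal an "a" from its row variable,
       then subtract it to balance the books *)
    let flds, res = merge tl1 [] xrow (TMinus (yrow, a)) in
    (a, TInter (ta, TProj (yrow, a))) :: flds, res
  | [], (b, tb) :: tl2 ->
    let flds, res = merge [] tl2 (TMinus (xrow, b)) yrow in
    (b, TInter (TProj (xrow, b), tb)) :: flds, res
  | [], [] -> [], TInter (xrow, yrow)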




John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 14, 2022, 5:44:08 PM7/14/22
to felix


> On 15 Jul 2022, at 07:11, Keean Schupke <ke...@fry-it.com> wrote:
>
> Considering a 4-tuple of 8bit bytes, and a single 8bit integer. In a register the integer would be in the low 8 bits, but what order would the tuple be in a 32bit register?
>
> It looks like it could work providing tuples are stored from low to high bit.


Well in Felix, the standard tuples are addressable. So an array of bool would be an actual array of addressable bools.

However Felix has *compact* products as well. That’s a *different* product functor.
In this case, an array of bool is an array of bits. ALL compact types fit into a machine
register. Products, sums, and exponentials.

Now, the order of the bits is irrelevant to the typing. It could be big or little endian
(it does have to be one or the other I think).

SO the question is, what is the representation of

(`1:2\,`0:2)

is it 2 or is it 1?

I agonised over this. Europeans expect numbers to be BIG ENDIAN.
This is an example of colonial bullshit. We use ARABIC numerals and
Arabs write from right to left. So for an Arab, 1234 puts the units FIRST.
For an Arab, numbers are LITTLE ENDIAN.

Furthermore computer people expect

(1,2).0 —> 1
(1,2).1 —> 2

in other words we expect tuples to be indexed left to right, smallest index first,
that is LITTLE ENDIAN.

So actually if you follow that rule,

(`1:2\,`0:2) == 1 (little endian indexing)

I actually never remember what Felix does, and I can’t read the code I wrote,
so I have to actually test it:

println$ ((true\, false) :>> int);
~/felix>flx xx
2

So it’s BIG ENDIAN. Which means index 0 is the rightmost for compact tuples
and leftmost for addressable ones :-)
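
The packing itself is just mixed radix arithmetic. A sketch in OCaml, assuming the
big endian convention just observed (leftmost component is the most significant digit):

(* each component of a compact product is a digit; its radix is the
   number of cases of the component's sum type *)
let encode radices digits =
  List.fold_left2 (fun acc r d -> acc * r + d) 0 radices digits

let decode radices v =
  (* peel off the least significant digit first, so the list
     comes out most significant first *)
  let rec aux rs v acc = match rs with
    | [] -> acc
    | r :: rest -> aux rest (v / r) ((v mod r) :: acc)
  in
  aux (List.rev radices) v []

let () =
  assert (encode [2; 2] [1; 0] = 2);   (* (`1:2\,`0:2) :>> int = 2 *)
  assert (decode [2; 2] 2 = [1; 0])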




John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 14, 2022, 7:03:29 PM7/14/22
to felix
Discussion with Keean brings up the issue of run time polymorphism.

When you use boxed types in a functional language you can emit binary code
which is run time polymorphic. This is because of a hack (and it is a hack!):
pointers to immutable data are isomorphic to the data AND all pointers
have the same representation. Therefore you can use a void* at run time
for ANY type so for example swapping the components of a tuple:

fun f[U,V] (a:U, b:V) => b,a;

can be implemented as

void *f(void *p) { auto q = static_cast<std::pair<void*,void*>*>(p);
  return new std::pair<void*,void*>(q->second, q->first); }

if you use type erasure, or, if you do not use type erasure you can just wrap
that binary with templates (and do the type erasure by using casts).

This means you have less code, and you get sharing, and you also copy
values around cheaply by just copying pointers. However there is a price:
all data is immutable.

If you use expanded types, you cannot do type erasure easily, so you get
lots of similar code and you copy data around: the advantage, however,
is you can have mutation. And the fact is that copying data is actually quite
cheap and cache friendly whereas indirections are extremely cache unfriendly.
You also have more issues with memory management with boxing; copying
eliminates the need for garbage collection .. however you now need explicit
pointers for recursive types, which re-introduces memory management issues.

HOWEVER run time polymorphism IS possible even with expanded types.
I have a part of Felix dedicated to making this work, although it’s never
been completely implemented. The method is simple: run time type information.

Basically, what boxing gives you is a universal way to initialise, copy, move,
and assign values, because all those operations are done with a single type:
a machine word holding a pointer.

With RTTI based approach, the data is allocated, copied, moved,
and assigned and destroyed using a vector of function pointers instead.
There are a fixed number of operations that are universal. Garbage collection
is still possible.

To make this work properly, structural type operations: product and coproduct
formation: must be possible. So you have to have a run time type record
of operations for primitives, and a way to construct products and coproducts
systematically, which means the RTTI records themselves have to be
dynamically combinable. For example if you have RA and RB as RTTI for
A and B types, you need RA * RB for the A * B type. In particular element wise
copying has to work from the combined type, and so does destruction by the
garbage collector.

Actually ALL of this is easy but laborious to implement: the ONLY hassle
is stack allocation. Most of what I describe is already done. However the GC
cannot handle products of RTTI (it always handled arrays, in Felix everything
is an array). At present, for products, the *compiler* generates a new primitive.
This is trivial: RTTI records used by the GC contain an array of offsets to
parts of an object which contain pointers. So the GC can find all the pointers.
For a product, the arrays are just concatenated (with offset adjustments of course),
but the result is still a primitive. The GC doesn’t have to do copies or anything,
but it does have to do destruction: that is handled by C++ destructors, which automatically
sequence destruction of components of products. We know what C++ does, so it’s
easy enough to just destroy each component, copy each component, etc.
In other words, a vector of RTTI objects can easily represent a product, and such
a vector is trivial to construct at run time.
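
A sketch of the combination in OCaml (a hypothetical shape record, not the actual Felix
RTTI layout): the GC shape of a product concatenates the pointer offset tables of the
components, shifting the second by the size of the first.

type shape = {
  size : int;                    (* bytes occupied by a value *)
  pointer_offsets : int list;    (* where the pointers are inside it *)
}

let product (a : shape) (b : shape) : shape = {
  size = a.size + b.size;
  pointer_offsets =
    a.pointer_offsets @ List.map (fun off -> a.size + off) b.pointer_offsets;
}

(* e.g. product {size=16; pointer_offsets=[0]} {size=8; pointer_offsets=[0]}
   = {size=24; pointer_offsets=[0; 16]} *)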

Coproducts might be a bit harder: I find them harder to think about and compile
time combination is MUCH harder to write the code for (and is made trickier since
Felix boxes coproducts but not products .. a mistake IMHO). But it can still be done.
That really only leaves exponentials (that is, an eval method) which might be quite
hard, I’m not sure. Remember, now we’re talking closures using RTTI.

The hard bit is probably handling coroutines correctly. At the moment the Felix compiler
has no idea what a coroutine is (it’s all done dynamically by the run time system, even
if the programmer uses static circuit statements). This prevents optimisation.



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 14, 2022, 10:27:35 PM7/14/22
to felix
So I know exactly the algorithm for polyrecord intersection now.
But it requires some changes to implement.

1. Felix polyrecords use a type variable not a row variable. For this to work
we have to distinguish record/tuple/array types from others and also
we have an issue that if the residual is a type variable it is a row variable
but if it merely *contains* a type variable, it’s probably a single term OR
a record/tuple/array.

A side note on this: in Felix a product of one thing is the thing itself.
That is, a tuple type with one component is identical to that component,
with the projection function being the identity. This is the standard mathematics.

But it means that a tuple has TWO roles: as a product with N>1 projections,
but also it can be a tuple of ONE component, with one projection, namely
the identity.

From the data type alone there is no way to tell the difference.

Now for a scoped record type, with a non-blank label

(a:int, a:long)

is quite distinguishable from

(a:int * long)

The first one has two projections a_0 and a_1, the second only one
projection, a. However if the labels are blank, it’s a tuple, and now
we CANNOT tell the difference: both become just int * long.

To be consistent, there should only be one “a” projection which returns
a tuple type int * long in both cases.

Unfortunately, if you allow (1,) to be a 1-ary tuple *distinct* from plain 1,
it fixes one lot of inconsistencies but it generates HELL in other places.
For example a function taking a single int would be DISTINCT from
a function taking a tuple with one component. At present, we can
say a function always takes a tuple argument and write

f (…)

where the … are replaced by the list of component expressions, and if there is just
one, you can just write that one. In addition

f() = f(()) = f ((())) ….

at the moment. But if you have distinct tuples of one component then

() not equal ((),) not equal (((),)) not equal ((()),) ……

Anyhow … row variables make it consistent but now we need
at least a kind KIND_row for a row variable. And you’d no longer be able to do this:

fun f[R] (v:(a:int | r:R)) => r;

because r isn’t a type now, it’s a row. You’d have to write this:

fun f[R:ROW] (..) => (|r);

explicitly making R a row type, and the value r, being a row value,
could not be used stand-alone, but would have to be put into a record,
including one without any prefix fields as shown.

2. The algorithm requires two operations on row variables:

R.name     // the type of the first field called name
R - name   // the row with the first field named name removed if there is one,
           // “any” otherwise …

Note that “any” is only right for intersections; I think it would be void for variants.
So we’d also need COLUMN variables for variants, so the “default” for missing
things was different.

In any case, there is no specific term for type projections. So there’s no way to write

(A * B).0 == A

because the projection 0 term is an expression term, not a type; there are no type projections
for extracting components. We’d need them for tuples and records .. and worse,
we’d need them for arrays.

NOTE that we CAN implement all these things with type matches inside type functions.
For example

typefun proj0 (T:TYPE):TYPE => typematch T with | A * B -> A endmatch

which fails if T is not a pair.
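
For comparison, a built in type projection is easy to sketch in OCaml (a simplified
local type, not the real Flx_btype). Note the identity case: a product of one type is
the type itself, so projection 0 of a non-product is the identity.

type bty =
  | BPrim of string
  | BTuple of bty list           (* always two or more components *)
  | BArray of bty * int

let type_proj (i : int) (t : bty) : bty = match t with
  | BTuple ts when i >= 0 && i < List.length ts -> List.nth ts i
  | BArray (elt, n) when i >= 0 && i < n -> elt
  | _ when i = 0 -> t            (* the sole projection of a 1-product *)
  | _ -> failwith "type_proj: no such projection"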





John Skaller
ska...@internode.on.net





Keean Schupke

unread,
Jul 15, 2022, 2:15:17 AM7/15/22
to felix...@googlegroups.com
Does that mean that all the tuple code in Felix always encodes a single element tuple as the element itself? 

That would mean the first element accessor function would need a special case for length 1 tuples?

Cheers,
Keean.



John Skaller2

unread,
Jul 15, 2022, 9:15:04 PM7/15/22
to felix


> On 15 Jul 2022, at 16:15, Keean Schupke <ke...@fry-it.com> wrote:
>
> Does that mean that all the tuple code in Felix always encodes a single element tuple as the element itself?

Yes.

> That would mean the first element accessor function would need a special case for length 1 tuples?

Yes.

You’re not saying the problem the way I would but it is indeed an issue.

A product is not a value: it’s a functor which outputs an ordered collection of projections.
So the product of one type is the identity function: it can be shown to meet the categorical requirements.

Let’s look at Felix projection terms:

| BEXPR_prj of int * Flx_btype.t * Flx_btype.t
| BEXPR_rprj of string * int * Flx_btype.t * Flx_btype.t
| BEXPR_aprj of t * Flx_btype.t * Flx_btype.t

The first one is for tuples, the second one is for records, and the third one is for arrays.

We also have:

| BEXPR_identity_function of Flx_btype.t

A tuple projection consists of an integer selecting the component, the domain type,
and the codomain type. The domain for _prj has to be a tuple type with at least
two components, so the selector is 0 or 1 for two components, or larger for more ..
no issues here.

If we have a tuple with no components, there are no projections.

If we have a tuple with one component, we should use the identity function,
because that is the sole projection. In particular, if that one component
happens to be itself a tuple, all is fine.

The projection term does not specify the number of components,
that is deduced from the domain type, which must be a tuple of two or more components.

Consider:

BEXPR_proj (0, BTYP_tuple [int, long], int)
BEXPR_proj (0, BTYP_tuple [int, long], BTYP_tuple [int,long])

The first projection regards the domain as a pair and selects the first component, int.
The second projection WOULD regard the domain as a tuple of ONE type because
the codomain shows that this function must be the identity projection.

This is really nasty, because it is the ONLY case where the domain type does not
contain enough information to determine which projection you’re talking about,
and the codomain would have to be consulted. In all other cases, the codomain
is not necessary, it can be deduced, and so the constructor if requiring both
domain and codomain could use the codomain as a safety check.

The actual constructor function starts off like this:

let bexpr_prj n d c =
  if (n = 0 && d == c) then bexpr_identity_function c
  else begin begin match d with

This function also has to handle arrays, compact tuples, compact arrays,
and also pointer projections, structs, and other stuff as well but that’s not
relevant here.

The PROBLEM then, is the concrete syntax. If you write

x.0

then x has to have a tuple type with 2 or more components, and you have
to use an identity function if you really mean a one component tuple.
For heterogeneous tuples this is fine.

BUT consider now the really nasty problem: in Felix a tuple with all same
type components is an array. In particular the type you write as

int * int

is encoded as

BTYP_array (int, 2)

It is never encoded with a BTYP_tuple; we require all types to have a unique
encoding (normal form) otherwise the handling of overlapping encodings
would be a mess.

In particular, there are “no arrays of length 1 or 0” by which I mean,
an array of length 0 is just the unit tuple, and has no specific type,
and an array of length 1 is again just the element itself.

The lack of arrays of length 1 and 0 really makes a mess of array
handling functions. [Felix varrays can be length 0 or 1 but the length
is not part of the type of a variable length array].

Note, when I say “lack of arrays of length 1” that doesn’t mean you can’t
construct one. You can. It just means the type will be the type of the element.

Note the syntax for an array is

int ^ 3

an array of 3 ints. And

int ^ 1

is an array of one int .. it just happens that it is equal to int. Similarly

int ^ 0 = 1

An array of length 0 is the unit type.

It is all consistent. It would ALSO be consistent to have tuples and arrays of length
1 and arrays of length 0 which retained the base type, but then you’d need explicit
isomorphisms to construct and deconstruct the length-1 cases, and the length-0 case
for arrays; whereas in Felix, these isomorphisms are identities, which means you never
have to write them .. but it also means you CANNOT write them.

There are other issues:

((),1)

Is this just 1? At the moment Felix allows unit values in tuples. But mathematically
it’s nonsense. Tuples with arbitrary units inserted or removed are all isomorphic,
therefore the number of items in a tuple is not well defined unless we fail
to make these isomorphisms identities. Felix eliminates units where possible
but it doesn’t do so in tuples because that would mess up the indexing
by position.

But then, the “tuple of one value” already messes up the indexing.


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 15, 2022, 9:48:31 PM7/15/22
to felix


> On 16 Jul 2022, at 11:14, John Skaller2 <ska...@internode.on.net> wrote:
>
>
>
>> On 15 Jul 2022, at 16:15, Keean Schupke <ke...@fry-it.com> wrote:
>>

What all this comes down to is something like this: EVERY system of any use
is full of isomorphisms, and EVERY representation of it (i.e. programming language)
has to choose which isomorphisms are identities.

The choice characterises the language. In fact some things are not identities but
appear to be because they’re implicit, and some implicits aren’t even isomorphisms:
for example Felix has subtyping and it is usually implicit, and because the semanics
are extrinsic, these subtyping relations have to involve coercions: these are
inserted implicitly.

Ocaml, on the other hand, doesn’t usually even need explicit types because it
uses type inference, and it doesn’t have any implicit subtyping at all .. it uses
row polymorphism instead. Etc etc etc.

Another one is so common most programmers don’t even know the difference:
there is a difference between a function and a function value; the latter can
involve closure formation which is typically implicit. But inside any compiler
the distinction is manifest: in Felix there are *multiple* applications:


| BEXPR_apply of t * t
| BEXPR_apply_prim of bid_t * Flx_btype.t list * t
| BEXPR_apply_direct of bid_t * Flx_btype.t list * t
| BEXPR_apply_stack of bid_t * Flx_btype.t list * t
| BEXPR_apply_struct of bid_t * Flx_btype.t list * t

The first one is a general apply, that is, it’s the exponential operator.
The others all accept a function as an argument. This term makes a closure:

| BEXPR_closure of bid_t * Flx_btype.t list

which allows a specific function to be used as the first argument of BEXPR_apply.
So when you write

f x

in Felix, there is no function value involved. But if you write

var g = f;
.. g x ..

then g is a closure, and the application uses BEXPR_apply

There’s a HUGE difference. Closures are heap allocated objects.
Functions typically degrade to C functions or at least stack functions
(the closure is put on the stack). Heap closures are POINTERS.

So type systems that have

f : int -> int

and say the type of f is int->int are technically WRONG. No, that’s
the type of a CLOSURE.
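
A sketch of the distinction in OCaml (invented constructors, not the real terms):
binding an application picks a direct call when the function position is a statically
known symbol, and the general apply when it is a computed closure value.

type funpos =
  | KnownFunction of string   (* f x : the symbol is statically known *)
  | ClosureValue of string    (* var g = f; g x : g holds a heap closure *)

type app =
  | ApplyDirect of string * string  (* direct call, no heap closure formed *)
  | Apply of string * string        (* general apply of a closure value *)

let bind_apply f arg = match f with
  | KnownFunction name -> ApplyDirect (name, arg)
  | ClosureValue v -> Apply (v, arg)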

Another example is the distinction between iso-equality and equi-equality.
In iso-equality, the unfolding of a recursive type is isomorphic but not equal
to the type. FPLs (and Felix) usually use equi-equality so infinite trees
have a finite type representation, but in doing so they throw away information.



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 16, 2022, 12:26:28 AM7/16/22
to felix
So here’s the thing now: In C++, you can write x.mem where x has a template type T,
and it will only work or fail after T is instantiated. So one of the core rules of Felix is
that polymorphic stuff has to be fully type checked, which means that if it works polymorphically
it is certain to work after monomorphisation. [And of course, further, the generated C++ will
work and the binary code it generates will work at run time too].


Therefore, consider row polymorphism:

fun f[T] (x:(a:int | T)) => x.a;

then you’re NOT allowed to write x.b because T might or might not
contain a b. Furthermore, the row variable is bound in the function
and the row polymorphic type is only allowed when it is bound by
either a function or type schema: Felix DOES NOT have polymorphic
types it only has schema and the type variables have to be eliminated
during monomorphisation.

Now the intersection of two records is determinate: it’s the fields of
both added into a new record, except that if a field is in both,
then it has the type of the intersection of both. We can generalise this
to infinite records by saying “undef ^ T = T, undef ^ undef = undef”,
so now all records have all fields. For scoped records we just intersect
correspondingly sequenced fields.

However for POLYRECORDS there’s a problem. A polyrecord is basically
a record with a row variable. The row variable is set when it’s a parameter
and you pass a record, it picks up the unspecified fields. But what happens
when you pass a POLYRECORD?

fun f[T] (x:(a:int | T)) =>
let fun g[U] (y:(b:int | U)) => … in
g x

???

The rule is clear: this fails; even though T might have a “b” field in it, it might not, too.
We can work out here, if we changed it so x:(a:int, b:int | T), that U = (a:int | T).

OK so what about intersections?

The answer is we cannot do them at all for polyrecords. Consider

(a:int | T) & (b:int | U)

You might think this is

(a:int, b:int | T & U)

but this is NOT the case because if T contained b:short then we’d have

(a:int, b:short ….

since short is a subtype of int. The only way to write it correctly is this:

(a:int & U.a, b:int & T.b | (U-a) & (T-b))

with the rule that if U.a is undef, the intersection t & undef = t, AND
if a isn’t in U, then U-a = U. We can take undef to be type “any”
and this rule is then entirely consistent because t & any = t AND
all records contain infinite copies of every label, they just might
be undef typed, so it’s always legal to remove one.

If it happens U contains an actual a field of type V then it’s removed
and the result int & U.a becomes int & V by substitution.

In other words, the result is stable and correct for all possible
record types T and U, as required by the Felix mantra.
IN PARTICULAR all “undefs” are removed by monomorphisation.

However it’s not clear how one would decide the best match
given parameters and arguments with intersection types.



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 16, 2022, 9:42:20 AM7/16/22
to felix
So now here’s the issue. Consider type classes.

In Felix, if you bind to a type class virtual, then during monomorphisation it may be
that a required instance is not found. We consider this as follows: the binding is type
safe which means any monomorphisation is sure to be type correct, but if there’s
a missing instance, it’s not considered a type error but an *instantiation* error.
There’s no recourse to run time binding.

For row polymorphism we could do the same. We first assume a row variable
contains a required field, and bug out with an “instantiation error” during monomorphisation
if it doesn’t. So for example our intersection:

(a:int | T) & (b:int | U) = (a:int & U.a, b:int & T.b | (U-a) & (T-b))

is correct polymorphically, but on monomorphisation we bug out if T fails to contain a “b"
field or U doesn’t contain an “a” field. In other words we do the computation assuming
the instantiation will satisfy the requirements, then bug out if it doesn’t on monomorphisation.
Of course the problem here is that taking U.a = any if U doesn’t contain an “a” is
in fact well defined.

The problem is if we write a function with domain an intersection as above, how in the heck
do we determine if an argument record is a subtype of it? If the argument is (a=1, b=2)
it’s a subtype provided U and T don’t define “a” or “b”, or indeed if they’re type int.
But if they have type short, or string for that matter, the argument isn’t a subtype.

Of course the idea is that the type variables are set by the argument:

fun f[T] (x:(a:int, b:int | T)) => …
f(a=1,b=2,c=3)

causes T to be (c:int), but if the function has two row variables, and the domain is the
intersection above, which type variable is it in? What if there are two “a” in the argument:
we have the same problem. It’s nonsense.


John Skaller
ska...@internode.on.net





Keean Schupke

unread,
Jul 16, 2022, 1:26:28 PM7/16/22
to felix...@googlegroups.com
Surely the type of that intersection is: 

 (a:int | T) & (b:int | U) = (a: int, b: int | T & U)

The overall type is 'T & U', and the constraints are 'a: int' and 'b: int'?

Of course we must determine all of this statically.

To do this dynamically is going to require a type guard at some point, and we need path dependent types. If we have a record R at runtime and we don't know what properties it has we require RTTI:

fun f(r: R) {
    if (hasProp(r, 'a', 'int') and hasProp(r, 'b', 'int')) {
        // r has type (a: int, b: int | R)
    }
}


Cheers,
Keean.



John Skaller2

unread,
Jul 16, 2022, 9:33:07 PM7/16/22
to felix
So here is the algorithm for record intersection.

The function “br” is beta-reduction which invokes intersections
when seeing an intersection type. Intersection at the moment is
only defined for types which have a normal form using the
BTYP_record type constructor (which excludes both polyrecords
and tuples), or monomorphic nominal types for which the user
has issued a supertype coercion.

(* merge two sorted field lists, intersecting the types of matching labels *)
let record_pair_intersect counter br ls rs =
  let rec aux ls rs out =
    match ls, rs with
    | ls, [] -> List.rev out @ ls
    | [] , rs -> List.rev out @ rs
    | (s1,t1) :: tl1, (s2,t2) :: tl2 ->
      if s1 = s2 then
        (* same label: the field type is the reduced intersection *)
        let t = br (btyp_intersect [t1;t2]) in
        aux tl1 tl2 ((s1, t)::out)
      else if s1 < s2 then aux tl1 rs ((s1,t1)::out)
      else aux ls tl2 ((s2,t2)::out)
  in aux ls rs []

(* fold the pairwise merge over all the intersectands *)
let record_intersect bsym_table counter br ls =
  let flds = List.fold_left (fun acc r ->
    match r with
    | BTYP_record rs -> record_pair_intersect counter br acc rs
    | t ->
      print_endline ("Flx_intersect: Record intersection requires records, got " ^ Flx_btype.st t);
      failwith ("Flx_intersect: Record intersection requires records, got " ^ Flx_btype.st t)
    ) [] ls
  in
  btyp_record flds
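
For illustration only, a hypothetical trace of the merge on the running example, with the
field lists sorted by label and a beta-reducer br that knows int & long = int (int being
the greatest lower bound):

(* record_pair_intersect counter br
     [("a", long); ("b", string)]
     [("a", int); ("c", double)]
   = [("a", int); ("b", string); ("c", double)] *)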



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 16, 2022, 9:41:41 PM7/16/22
to felix


> On 17 Jul 2022, at 03:26, Keean Schupke <ke...@fry-it.com> wrote:
>
> Surely the type of that intersection is:
>
> (a:int | T) & (b:int | U) = (a: int, b: int | T & U)

Felix uses Daan Leijen’s scoped records. So this would not be correct.
Suppose type U = (a:short) and short < int. Then the result would have

(a:short, …)

because int & short = short. So we have to use

(a= int & U.a, … | (U - a) ...

since we already took an “a” field out of U. Note this is the case even without
scoped records. U might contain an “a” term and the result of doing a legal
substitution for U cannot be allowed to change the result. So we have to account
for the removal in the polymorphic intersection.

This doesn’t arise in Ocaml or Koka because neither has subtyping,
both use row polymorphism instead. Strangely, Ocaml does have intersections,
however the row type doesn’t name the extension variable it’s just … so it
doesn’t have the issue .. (as I understand it anyhow).



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 16, 2022, 10:00:40 PM7/16/22
to felix


> On 17 Jul 2022, at 03:26, Keean Schupke <ke...@fry-it.com> wrote:
>
> Surely the type of that intersection is:
>
> (a:int | T) & (b:int | U) = (a: int, b: int | T & U)
>
> The overall type is 'T & U', and the constraints are 'a: int' and 'b: int'?
>
> Of course we must determine all of this statically.
>
> To do this dynamically is going to require a type guard at some point, and we need path dependent types. If we have a record R at runtime and we don't know what properties it has we require RTTI:

Yes but Felix has no real polymorphism (at the moment). All type variables are
eliminated during monomorphisation. Underneath, there are only type schema,
and the schema must be resolved during monomorphisation.

The only dynamics at run time are variant discriminant tag dispatch, and these
are constrained to a finite set, including for polymorphic variants; that is,
open types are closed by the time we get to execution.

However, the compiler insists that polymorphic binding prior to
monomorphisation obeys all the proper type safety rules so that
it could “potentially” delay final resolution to run time.

For polyrecords and polymorphic variants this is entirely feasible,
indeed, if you do run time type stuff you can end up with full
dependent typing (and then try to optimise some of the run time
errors away by the compiler proving some stuff can’t go wrong,
possibly with the help of proof sketches by the programmer).

So at the moment, say, type class instantiation is entirely monomorphic,
and if an instance is missing you get a compile time error. This also excludes
dynamic loading of instances. (In fact, even linking: at the moment Felix
is a whole program analyser).

If you’re generating for Javascript I guess things are different, since
for example, some lookup can be delayed until run time.


John Skaller
ska...@internode.on.net





Keean Schupke

unread,
Jul 17, 2022, 1:33:52 AM7/17/22
to felix...@googlegroups.com
Okay, I see Daan's records allow multiple keys with the same name. I guess these are useful for modelling scoped variable environments?

Keean.



John Skaller2

unread,
Jul 18, 2022, 10:05:54 AM7/18/22
to felix


> On 17 Jul 2022, at 15:33, Keean Schupke <ke...@fry-it.com> wrote:
>
> Okay, I see Daan's records allow multiple keys with the same name. I guess these are useful for modelling scoped variable environments?

Possibly. I think the idea is that if you have say

fun add_b (x:(a:int | r:T)) => (x with b=2);

the result is not conditional on type T having a “b” field or not: it’s
well defined always. Without duplicates, the result would be conditional
on whether T contained a b or not. So you’d need a term

r:T without b

instead of T. The problem is that the well-typedness of the term
can no longer be determined polymorphically. With duplicate labels
allowed, extension is always sound.

Unfortunately intersection no longer works properly: that is,
it’s like type classes: they’re a cheat. They promise type safety
but bug out on monomorphisation sometimes due to a missing instance.

Similarly we could bug out record operations that require a particular
field in a variable, if it isn’t present. My intersection formula is universally
correct, provided you allow that all record fields which are “undefined”,
so to speak, are treated as Top (btyp_any in Felix). This is because
Top /\ t = t. However, if you do take a field out of a variable
to define a field intersect, you have to make a record of that fact,
and remove it when the type variable is instantiated.

You can always remove a field which is undefined because it’s treated,
when doing intersections, as Top.

With unions, the unit is void, so now you treat all undefined fields as
void, so that void \/ t = t.

In any case I have defined union and intersection for ordinary
records which are open but don’t contain a capturing type variable
for “the rest of the fields”. I should do so also for polymorphic
variants, except Felix doesn’t currently allow duplicate constructors,
which it should, for symmetry.

NOTE: Felix MUST have record intersections because they’re
mandatory for representing Objective C interfaces. In particular
an object which is an instance of interface A and also B is
an instance of A /\ B. I’ve already used this, although I’m stuffed
if I know where I did so: the only support I can find is in
the specialisation algorithm.



John Skaller
ska...@internode.on.net





Keean Schupke

unread,
Jul 18, 2022, 10:53:43 AM7/18/22
to felix...@googlegroups.com
I guess I am missing something :-) When I have defined record types, they are structural so the name and type of all the elements is part of the type of the record. In this way we can statically determine if a field already exists, and what type it has. We effectively treat records as Codd's 'relations', and we can define restriction, projection, union, difference and cross-product. 

With these records nothing is conditional, because the type of a record can statically be determined. Adding a duplicate element (column) would be a type error at compile time, the same for accessing an element that does not exist.

Cheers,
Keean.


John Skaller2

unread,
Jul 18, 2022, 10:26:45 PM7/18/22
to felix


> On 19 Jul 2022, at 00:53, Keean Schupke <ke...@fry-it.com> wrote:
>
> I guess I am missing something :-)

Polymorphism. In particular, your row variables have to keep track of which labels are
disallowed; with scoped records you don’t, because duplicates are allowed.





John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 19, 2022, 12:05:03 AM7/19/22
to felix
Ok so this now works:

///////////
typedef r1 = (a:long, b:string);
typedef r2 = (a:int, c: double);
typedef r = r1 /\ r2; // intersection should be (a:int, b:string, c:double);

var x : r = (a=1, b="Hello", c=4.2);
println$ x._strr;

typedef u = r1 \/ r2; // only field "a", with type int | long = long
var y: u = (a=42L);
println$ y._strr;
////////

with output:

(a=1,b='Hello',c=4.2)
(a=42l)

Intersection and union are now allowed for plain record types
and for primitive monotypes. The operations on monotypes
MAY NOT WORK if the subtyping relation on the monotypes
does not form a lattice. Without the lattice constraint, there can
be more than one least supertype or greatest subtype; if this
happens Felix just picks one. The resulting operations are
then NOT associative.

Polyrecords (with row variables) do not support these operations.

Tuples and arrays do not support these operations either:
the type has to have a record normal form. In principle a tuple
is a record which happens to have duplicate blank field names.
However width subtyping of tuples would be surprising:

int * long > int * long * string

with the coercion chopping off the string. So

(fun (x:int,y:int)=>x+y)(1,2,"Hello")

would be well typed if we allowed it. In particular

fun f(x:int)=>x+1;
println$ f(1,2); // OK, throws out the 2

would be a surprise as would

fun f(x:int^3) …
f(1,2,3,4,5) // OK, throws out 4 and 5

This means we don’t allow width subtyping for tuples
and arrays but we do for “proper” records, meaning, a record
with at least one non-blank field name.

NOTE: at present, the specialisation algorithm ONLY supports one rule:

| BTYP_intersect ts, t ->
  (* print_endline ("Flx_unify: Argument as subtype of intersection parameter, must be subtype of each intersectand"); *)
  List.iter (fun p -> add_ge (p, t)) ts

This allows a parameter to be an intersection, and requires the argument
to be a subtype of each intersectand. The rule for the argument
being an intersection cannot be implemented easily. The JUDGEMENT
is trivial: the parameter must be a supertype of at least one intersectand.
And that’s the problem.

To explain, if the parameter is a supertype of T, then it is automatically
the supertype of T /\ X for any X, because X can only *refine* the type.

This means there is no unique MGU, in other words, the principal
typing property is lost.

The results for unions are inverted, but the same problem exists in reverse:
you can make judgements, but you cannot produce a unique MGU.
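
For concreteness, the shape of that rule in a toy OCaml sketch (the type terms and names here are stand-ins, not the real Flx_unify definitions):

type typ = Prim of string | Intersect of typ list

(* Ge (p, a): parameter p must be a supertype of argument a *)
type constr = Ge of typ * typ

(* a parameter-side intersection fans out into one constraint per
   intersectand; the argument-side case is deliberately missing,
   since "supertype of at least one intersectand" has no unique
   most general solution *)
let rec add_ge acc (p, a) =
  match p with
  | Intersect ts -> List.fold_left (fun acc t -> add_ge acc (t, a)) acc ts
  | _ -> Ge (p, a) :: acc

let () =
  assert (List.length (add_ge [] (Intersect [Prim "r1"; Prim "r2"], Prim "x")) = 2)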




John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 19, 2022, 12:23:09 AM7/19/22
to felix
I just read this in an article ..

"Is there a category of categories?
It is when you ask this kind of questions that you start to understand why some mathematicians end up mad.”

https://antoine-doeraene.medium.com/how-type-variance-fits-into-category-theory-e662d2c7f522



John Skaller
ska...@internode.on.net





Keean Schupke

unread,
Jul 19, 2022, 12:55:48 AM7/19/22
to felix...@googlegroups.com
So you can use row polymorphism, where you say you have a record R (which is a static structural type, you just don't know what one), and you can constrain it so that:

{a: int | R}

So R has at least a property 'a' of type int.

This gives polymorphism with the behaviour I would expect. I am not sure what advantages Daan's records have over this?

Cheers,
Keean.



Keean Schupke

unread,
Jul 19, 2022, 12:58:51 AM7/19/22
to felix...@googlegroups.com
Note because the {a: int | R} is a constraint, duplicates are allowed, and you end up with intersections naturally:

{a: int, a: float | R} === {a: int & float | R}

Keean.



John Skaller2

unread,
Jul 19, 2022, 1:27:46 AM7/19/22
to felix


> On 19 Jul 2022, at 14:58, Keean Schupke <ke...@fry-it.com> wrote:
>
> Note because the {a: int | R} is a constraint, duplicates are allowed, and you end up with intersections naturally:
>
> {a: int, a: float | R} === {a: int & float | R}

The notation (a:int | r:T) means a record with at least field “a” of type int, and ANY subsequent row
of fields. It’s not a constraint.

If duplicates were not allowed, this would be an error. The correct form would be

(a:int | r: (T-a))

meaning T was not allowed to contain a field “a”. Alternatively you might allow
T to have a field “a” and just delete it.



John Skaller
ska...@internode.on.net





Keean Schupke

unread,
Jul 19, 2022, 2:24:37 AM7/19/22
to felix...@googlegroups.com
I'm using the notation for constraints :-) 

My question was more "why not use constraints"? I find the semantics much easier to deal with

K.



John Skaller2

unread,
Jul 19, 2022, 2:35:36 AM7/19/22
to felix
Following Ocaml, you can now write variance indicators on type variables:

struct X[+U,-V] {
x:U;
y:V;
}

At the moment, the grammar simply allows this everywhere in any type schema.
The information is not used yet.

A variable is invariant if not annotated, covariant if a + is used, and
contravariant if a - is used.

For primitives, these annotations are essential if you want variance
for a polymorphic primitive. Note that types like “std::vector” will be
invariant because they support mutation. However a non-mutation
operation set will be covariant.

In Java, you can use a subset of an interface in a particular call,
which is covariant, and another subset in a different one, which
is contravariant, even though the total interface is invariant.

This is equivalent to the idea in C++ that a template class specialisation
only needs to define the methods that are actually used. I think
this is too complicated (it’s known as call-site variance).

The idea is that if A < B, then X[A] < X[B] if the parameter is
covariant, X[B] < X[A] if it is contravariant, and neither if it is
invariant. There’s a possible case for bivariance; the Ocaml example is

type 'a x = int

is bivariant because X[A] < X[B] and X[A] > X[B] no matter what
A and B are or how they’re related, since int < int (and int > int).

In principle, given the variance of primitives, we can *calculate* the
variance for any struct or variant type, so if these have
a specified variance, we should check that it agrees with the computed
variance. The computation is apparently a bit tricky but clearly
possible modulo recursion (how to handle recursion?).
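
The core of the calculation is just a little variance algebra; a toy OCaml sketch (made-up names, recursion and pointers ignored):

type variance = Co | Contra | Inv | Bi

let neg = function Co -> Contra | Contra -> Co | v -> v

(* combine the variances of two occurrences of the same variable *)
let join a b = match a, b with
  | Bi, v | v, Bi -> v       (* Bi = does not occur: no constraint *)
  | a, b when a = b -> a
  | _ -> Inv                 (* mixed occurrences force invariance *)

type typ = Var of string | Int | Pair of typ * typ | Arrow of typ * typ

(* variance of variable v in type t; function domains flip polarity *)
let rec var_of v = function
  | Var u -> if u = v then Co else Bi
  | Int -> Bi
  | Pair (a, b) -> join (var_of v a) (var_of v b)
  | Arrow (d, c) -> join (neg (var_of v d)) (var_of v c)

let () =
  assert (var_of "T" (Arrow (Var "T", Int)) = Contra);
  assert (var_of "T" (Arrow (Var "T", Var "T")) = Inv)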

Felix will have a bit of extra pain, due to pointers.
But the notation is a problem, because if you allow the compiler
to calculate the variance, then the plain T is ambiguous: does it
mean invariant? Or does it mean whatever the compiler calculates?

I guess we could use ? for “whatever the compiler calculates”, but really
one would expect a plain T to mean that. Hmmm.



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 19, 2022, 2:56:38 AM7/19/22
to felix


> On 19 Jul 2022, at 16:24, Keean Schupke <ke...@fry-it.com> wrote:
>
> I'm using the notation for constraints :-)

Yeah I figure that. But it’s overspecified. You really only need

T-a-b

to mean T without an a or b field. The point is that you don’t need to
know the type so your notation is overkill :-)


> My question was more "why not use constraints"? I find the semantics much easier to deal with

Well one reason is that specialisation is a heck of a lot more complicated.

In Felix, the “unification” algorithm has two modifications. The first is that it can ALSO
handle subtyping. So you can submit both equations and inequations of these
forms:

Eq (t1, t2)
Ge (t1, t2)

to the engine. Secondly, the engine takes a set of type variables to solve for.
All other type variables are existential (act as opaque primitives).
So the variables are partitioned into dependent variables we want to solve
for, and independent variables.

An MGU is a list of assignments to dependent variables, the RHS of the
assignments must not mention them. So the algorithm does more than
just judging subtypes, it has to instantiate the LHS of all submissions
so the LHS and RHS are either equal, or the LHS is greater or equal
as required by the submission, AND it has to be the most general such
solution.

In theory a dependent variable does not need an assignment if it is not
used, in that case “any” could be used instead.
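
So the input to the engine looks roughly like this (an OCaml sketch of the shape only; the real compiler’s terms differ):

type typ = TVar of string | TPrim of string

(* equations and inequations; Ge (t1, t2) asserts t1 >= t2 *)
type relation = Eq of typ * typ | Ge of typ * typ

(* solve only for the dependents; all other variables are opaque *)
type problem = { relations : relation list; dependents : string list }

(* an MGU: assignments whose right hand sides mention no dependent *)
type mgu = (string * typ) list

let example : problem =
  { relations = [ Ge (TVar "T", TPrim "int") ]; dependents = ["T"] }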

So now the issue is simply, how to represent constraints?
The algorithm only knows two kinds of constraints: equality and greater-equal.

For a start, it’s nonsense to say “T has no field a” unless T is a record type.
So the constraints cannot be applied to type variables in general.

The use of row variables fixes this problem, since they have a different kind
and consist of field*type pairs PLUS the possibility of there being a constrained
kind of type variable.

With scoped rows, NO constraints are required for subtyping; however,
intersection and union are hard to define consistently WITHOUT adding
the constraint which pops a field. For unscoped rows, you need the constraint
disallowing a field even without intersection and union.

The argument is, if you have scoped rows, then you do not need subtyping
of records at all, since row polymorphism is more general (and ditto
for polymorphic scoped variants). Hence, intersection and union
are not required.

In Ocaml it is object types, not records, that are annotated < a:int; b:int; .. > where the .. is
a row variable. It’s a weak formulation, because you cannot capture
the extra fields directly (you can get them from the whole object
type MINUS the explicitly given fields though).

The advantage of that is that intersection and union don’t have to
compute a row variable for the result (since you cannot get at it).
Ocaml uses non-scoped rows though.
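
For reference, the genuine OCaml feature in action (the row variable is inferred):

(* OCaml infers f : < a : int; .. > -> int, where .. is the row *)
let f o = o#a + 1

let () =
  (* any object with at least a : int is accepted; the extra
     method b is simply not reachable through f's type *)
  print_int (f (object method a = 41 method b = "extra" end))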




John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 19, 2022, 11:35:16 PM7/19/22
to felix
So with variance data, we can try to find an MGU for this inequality

X[t1,t2] > X[u1,u2]

by

ge(t1,u1) and eq(t2,u2), where the variances are covariant, invariant

etc. Now suppose

struct D[t1,t2] : B[u1,u2]

that means

B[u1,u2] > D[t1,t2] …. (1)

where t1,t2 are type variables and u1,u2 type expressions dependent on them.

Now how can we prove this with an MGU as witness:

B[b1,b2] > D[d1,d2] …. (2)

Noting: the dependent variables are in the b1,b2 terms, the independent variables
are in the d1,d2 terms, but in the formula (1) the dependent variables are in the D
term and independent in the B term, the reverse of the position in the formula
to solve.

Now I want to note first a result from real analysis called the Squeeze Theorem.
It says given a continuous function f, and two continuous functions u and l,
such that l(x) < f(x) < u(x) for all x, where u,l are upper and lower bounds for f,
then if u-l converges to zero, that is, u and l converge to a common limit y,
then f converges to y also.

The two bounds “squeeze” the candidate function f, and if they squeeze hard enough,
then we can know f to any precision required.

For our problem, I’m going to invert this theorem and use the following,
which I shall call the Explosion Lemma:

Suppose B1 > B0 and D0 > D1 and B0 > D0 then B1 > D1:

B1 > B0 > D0 > D1 by transitivity

the idea is that we know B0 is the greatest lower bound for B1, and D0 is
the least upper bound for D1 **by specification**.

The dependent variables we have to solve for initially are contained in B1,
that is, b1 and b2 are formulas using only dependent variables v1,v2 ….

The first thing to do is solve D0 > D1. This is just the assignments

t1 = d1, t2 = d2 .. (3)

so here we actually have an equality. Next we have to solve

B1 > B0

The variables of B0 are u1,u2 which are formulas in t1, t2, so
we can just substitute (3). So now we are left with formulas
of B1 using variables v1,v2 .. and formulas in B0 using only the
independent variables of D1. Again we can try to solve this using
variance rules and an MGU may be produced: now the
final proof is obtained by substituting the assignments
to v1,v2 .. into B1 which eliminates the dependent variables v1,v2 ..
and leaves only formulas in independent variables of D1, and
we’re done.

Whew .. err .. did I get this right?

NOTE: this machinery ONLY works when you have this:

struct D[t1,t2] : B[u1,u2]

where t1,t2 are dependent variables and u1,u2 are expressions in them.
It does NOT work for a general relation (I think that is undecidable).
That is, if you have D[ds] < B[bs] where ds,bs are dependent on
some other variables vs.
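
A tiny OCaml sketch of the first two steps under that restriction (toy type terms; the declaration D[T] : B[T * T] is made up, and the final variance-aware matching is elided):

type typ = Var of string | App of string * typ list

let rec subst env t = match t with
  | Var v -> (try List.assoc v env with Not_found -> t)
  | App (n, ts) -> App (n, List.map (subst env) ts)

(* the declared base argument of D[T] : B[T * T] *)
let base_arg = App ("*", [Var "T"; Var "T"])

let () =
  (* step 1: bind T := Dog; step 2: the induced supertype of
     D[Dog] is B[Dog * Dog], ready for the variance rules *)
  let dog = App ("Dog", []) in
  assert (subst ["T", dog] base_arg = App ("*", [dog; dog]))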



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 20, 2022, 3:54:52 AM7/20/22
to felix
Example:

struct List[+T];
struct Team[+T] : List[T * T];
Dog : Animal

(fun (x: List[Animal * Animal])=>…) (Team[Dog])

Require

List[Animal*Animal] > Team[Dog]

List[Animal * Animal] > List[Dog * Dog] covariance of both List and *
List[Dog * Dog] > Team[Dog] specification
List[Animal * Animal] > List[Dog * Dog] > Team[Dog] transitivity

Hmm.



John Skaller
ska...@internode.on.net





Keean Schupke

unread,
Jul 20, 2022, 4:56:21 AM7/20/22
to felix...@googlegroups.com
I don't like this kind of subtyping, as it makes a category error. A dog is a real concrete thing, so is a cat, a sheep etc. An animal is _not_ a thing it is a classification. Therefore animals should not be creatable. This is why I prefer typeclasses, an Animal is an abstract concept that things can either be an instance of or not. But things can also equally be in other classifications like 'hairy things', 'have four legs'. So a table might be a member of 'has four legs' and a cat a member of all three.

But that's just my personal rant :-)

Keean.



John Skaller2

unread,
Jul 20, 2022, 8:50:16 PM7/20/22
to felix


> On 20 Jul 2022, at 18:56, Keean Schupke <ke...@fry-it.com> wrote:
>
> I don't like this kind of subtyping, as it makes a category error.

I agree, it was just an example!

> A dog is a real concrete thing, so is a cat, a sheep etc. An animal is _not_ a thing it is a classification.

But you have made a category error here. “An” animal is a thing. “Dog” (without the “an” prefix) is also a classification.
“The” dog, Snoopy, is a thing, of class “Dog” and also class “Animal”.

But speaking of type classes .. what is the variance of typeclass type variables??

At the moment, I have propagated variance data for these nominal types:

(a) external types (i.e. bindings of C/C++ types)
(b) structs
(c) cstructs
(d) variants

For example

struct X[+T] { x:T; }

is not only parsed correctly but the variance data gets stored in the unbound and fully bound symbol tables.

But what if type variables are inherited???

class Integral[T] {
struct Nat { x:int; }
...
}

The typeclass Integral has a type variable as a parameter and we can annotate it with variance.
In this case we could write:

class Integral [+T] { …}

I want to note here

Integral[int]::Nat

is the correct name of the enclosed struct but you could write:

Integral::Nat[int]

unambiguously if Felix allowed it (I can’t remember!)

Of course the problem is, a type class can contain many data types, and the variance
of each may be different .. but there’s nowhere to put the variance annotations
other than on the type class. In fact we have:

class ArrayValue[t,v]


So I DO put type parameters in classes.



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 20, 2022, 9:29:38 PM7/20/22
to felix
I’ve been trying to understand how to conjoin the notions of functional
and cofunctional programming correctly. Languages like Haskell which
are (supposedly) purely functional are a hack. In fact the declarative
form has to be conjoined with an evaluation strategy to actually get
a programming language.

The key to this I think is like so: start with a category with products.
and a particular N-functor that finds them and produces N projections.

Now take the dual, which reverses all the arrows and we have a category
with coproducts. Now we have an N-functor which produces injections.

But there is something very wrong here. Consider FPL with pattern matching.
For a numerically indexed sum type, we simply provide a sequence of
corresponding handlers which accept the argument of the appropriate
constructor.

The problem is that these handlers ARE NOT FUNCTIONS. This is the
mistake of functional programming. The correct form for the handlers
is CONTINUATIONS.

The reasoning is simple: a switch selects one of several statically defined
paths for control to flow down, and control continues to flow to the end
of the program. The selection is based on a discriminant, but the
correct discriminant is NOT a data type: it is, in fact, the program counter.

When we store a value of a variant type, we are encoding in the stored
discriminant, a mapping which selects a control path, that is, it
is a FUTURE value of the program counter.

So, the category with sums can be seen as a kind of flow diagram:
sum types are objects of the category, out of which there are multiple
arrows.

Anyhow the QUESTION is: how do we integrate these two categories
into a single category? In a way which *preserves* the notion of control
flow the coproduct category defines, as well as the notion of data,
which the product category defines?

A partial answer exists: there’s a relation known as the distributive law.
It is a constraint on how we can join the two categories into one.

My picture is of adjoints where you can map control information
into data, and data into control. Control into data is done by
storing the program counter which is an address of code in memory.
Data into control is done by classifying data: when we partition
a set by some discriminating function, we are actually generating
a sum of the classes, meaning a function applied to a class will
act in a conformal way for all its components; the collection of
these functions is precisely a switch.

Now the problem is that we programmers know programs are EXECUTED
and that means energy is consumed to produce actions which perform
operations. The concept of action is central to programming: an action
is a state transformation that actually happens in a computer.

My picture then is an action is the result of swapping from product
to coproduct categories repeatedly. And what this means is that
the distributive law DEFINES what an action is:

a(b+c) <—> ab + ac

We are continually flipping from a product at the top level on the LHS
to a sum at the top level on the RHS.

How does this happen? The answer is: by continuations. We enter
a continuation with a single argument at one code address, we do some
calculations and exit by invoking one of several different continuations,
each with their own argument types. In other words the continuation
has SPLIT control flow, and also simultaneously SPLIT the data.

Let’s look again: the continuation takes a single PRODUCT as input,
but it does a switch to complete its action, so the output is
a coproduct. The choice is based on partitioning the input argument.
[We assume purity meaning there is no other variant data to access]

So the ACTION is:

a(b+c) —> ab + ac // PRODUCT TO COPRODUCT

The adjoint action is in the other direction:

a(b+c) <— ab + ac // COPRODUCT TO PRODUCT

This is what a *functional* match expression does: it takes multiple input paths
and produces a single output path carrying a product (to feed into the selected
continuation).

The way it does this is to “reify” the control path, that is, store encoding
of a future control split as data. That’s the discriminant of the sum.

Functional matches do NOT take a data type as input: the control
split is done by the coproduct category, and it’s the merging of these
paths into a single path with variant data that the functional part
of the pattern match is doing.
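
A minimal OCaml sketch of the claim that handlers are continuations (standard CPS, nothing Felix-specific):

type ('a, 'b) sum = Inl of 'a | Inr of 'b

(* the coproduct eliminator in continuation passing style: the
   stored discriminant selects which continuation control enters,
   i.e. it encodes a future value of the program counter *)
let case (ka : 'a -> 'r) (kb : 'b -> 'r) : ('a, 'b) sum -> 'r = function
  | Inl a -> ka a
  | Inr b -> kb b

let () =
  case
    (fun n -> print_endline ("int: " ^ string_of_int n))
    (fun s -> print_endline ("string: " ^ s))
    (Inl 42)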




John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 21, 2022, 3:01:20 AM7/21/22
to felix


> On 20 Jul 2022, at 17:54, John Skaller2 <ska...@internode.on.net> wrote:
>
> Example:
>
> struct List[+T];
> struct Team[+T] : List[T * T];


Note to self: We are writing this coercion:

supertype[T] List[T * T]: Team[T] = …

which has type

Team[T] -> List[T*T]

and in Felix

supertype[T] List[T * T] (Team[T]) : List[T * T] =>…

in other words, the “name” of the coercion is the supertype (abstraction) but
the type parameter is driven from the subtype, which is the parameter of
the coercion function.

But this means the notation is WRONG. We cannot allow the [T] on supertype
to be the driver and then have to put the [T] on Team as well, because the user
might think they can put, say [T+T] on the Team parameter and that is not
allowed.

In addition in the supertype coercion first above, the “:” does not mark the start
of the return type, but actually the start of the parameter type. Hmmm...

The problem is that “in theory” we should allow anything. But the fact is,
we cannot resolve subtypes with an arbitrary coercion. On the other hand
this notation is in the “right order”:

struct Derived[T] : Base[T*T]

Curiously, with say struct inheritance, how do we define the coercion?
We know in C++ it’s just a subtyping “slice”. In Felix, we have to construct
a new object of the Base type and copy only inherited fields across.
Hopefully we can do this with a constructor call to the Base .. yes, constructors
are defined which take all the fields in order.


John Skaller
ska...@internode.on.net





Keean Schupke

unread,
Jul 21, 2022, 3:16:50 AM7/21/22
to felix...@googlegroups.com
I agree I made a category error, and that's part of the reason not to like this kind of subtyping, it's too easy to make mistakes :-)

However I would argue the category error was to consider a dog a concrete object. It's not; it's just another categorisation, an abstract concept.

If I have some object in front of me, it might be classified as a Dog, and an Animal. In fact I think relations between abstract concepts are fine, for example saying all Dogs are Animals.

This is modelled by 'interface inheritance' in typeclasses.

But I would point out that these abstract concepts are properties of the observer (the human) and not the object itself.

My conclusion from this is that there should be no subtyping, everything is invariant. Objects are simply themselves. So we just have 'plain old data' (C style structs) and typeclasses. Interestingly this is the approach Rust took as well.

But because typeclasses/interfaces can inherit we can for example define:

typeclass Animal
typeclass Dog extends Animal
typeclass Cat extends Animal

f<forall A : Animal<A>>(x : A) A

We can pass a type that implements Dog or Cat, and we know we get the same type back.

If a function wants to be passed an Animal it can be passed a Dog or Cat, and if it returns an Animal it can return a Dog or Cat.

If we adopt this approach then covariance and contravariance are not necessary...


Cheers,
Keean.




John Skaller2

unread,
Jul 21, 2022, 3:45:12 AM7/21/22
to felix
>
> My conclusion from this is that there should be no subtyping, everything is invariant. Objects are simply themselves. So we just have 'plain old data' (C style structs) and typeclasses. Interestingly this is the approach Rust took as well.
>
> But because typeclasses/interfaces can inherit we can for example define:
>
> typeclass Animal
> typeclass Dog extends Animal
> typeclass Cat extends Animal
>
> f<forall A : Animal<A>>(x : A) A
>
> We can pass a type that implements Dog or Cat, and we know we get the same type back.

Felix already does that AND with parametric polymorphism:

class Animal[T] {fun f:T -> T=….; …}
class Dog[T] { inherit Animal[T]; … }


> If we adopt this approach then covariance and contravariance are not necessary...

Well that is really just record subtyping. For the type class .. well in Felix only invariant makes
sense for the class type variable at the moment.


John Skaller
ska...@internode.on.net





Keean Schupke

unread,
Jul 21, 2022, 4:36:27 AM7/21/22
to felix...@googlegroups.com
> Well that is really just record subtyping. 

That's true if you consider that typeclasses can be implemented by passing a record as an implicit parameter.

However there is a practical difference. With subtyping we have unwanted compile time type-erasure. If we are passed an 'Animal' we have no way of knowing if it is a Dog or a Cat. With the typeclass approach "f<A: Animal>(x: A)" we actually know whether 'A' is a Cat or a Dog, because we must instantiate the generic parameter <A> with a concrete type.

Of course I think Dog and Cat should probably also be typeclasses, and the actual type should probably just be the structure of the data.

Cheers,
Keean.



John Skaller2

unread,
Jul 21, 2022, 11:16:03 AM7/21/22
to felix

>
> However there is a practical difference. With subtyping we have unwanted compile time type-erasure.

It is often wanted. For example a projection “forgets” all about a tuple except one component.

Anyhow Felix has type classes. It also has objects and interfaces: an object is a record of functions
and an interface is a record type.

There is a special construction where a function can have “methods” in it; it terminates
by automatically returning closures of these methods over the function data frame.
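
Roughly, in OCaml terms (a sketch of the idea, not the Felix construction itself):

(* an "object" is just a record of closures over a private frame *)
type counter = { bump : unit -> unit; get : unit -> int }

let make_counter () : counter =
  let n = ref 0 in                  (* the constructor's data frame *)
  { bump = (fun () -> n := !n + 1); (* methods close over the frame *)
    get  = (fun () -> !n) }

let () =
  let c = make_counter () in
  c.bump (); c.bump ();
  print_int (c.get ())              (* prints 2 *)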

Since interfaces are just records, we can do intersections and unions.

In this system the object is 100% encapsulated by functional abstraction.

There’s nothing stopping you from binding these functions to anything at all,
for example two objects.

I use interfaces to describe DLL plugins. Here is the interface for flx toolchain plugins:

interface toolchain_t {
whatami : 1 -> string;
host_os : 1 -> string;
target_os : 1 -> string;
cxx_compiler_vendor : 1 -> string;

dependency_extension: 1 -> string;
executable_extension : 1 -> string;
static_object_extension: 1 -> string;
dynamic_object_extension: 1 -> string;
static_library_extension: 1 -> string;
dynamic_library_extension: 1 -> string;
pathname_separator : 1 -> string;
get_base_c_compile_flags: 1 -> list[string];
get_base_cxx_compile_flags: 1 -> list[string];

cxx_dependency_generator : (src:string) -> int * string;
c_dependency_generator : (src:string) -> int * string;
dependency_parser : string -> list[string];

cxx_static_object_compiler : (dst:string,src: string) -> int;
cxx_static_library_object_compiler : (dst:string,src: string) -> int;
c_static_object_compiler : (dst:string,src: string) -> int;
static_library_linker : (dst:string,srcs:list[string]) -> int;
static_executable_linker : (dst:string,srcs:list[string]) -> int;
dynamic_executable_linker : (dst:string,srcs:list[string]) -> int;

cxx_dynamic_object_compiler : (dst:string,src: string) -> int;
c_dynamic_object_compiler : (dst:string,src: string) -> int;
dynamic_library_linker : (dst:string,srcs: list[string]) -> int;

debug_flags : 1 -> list[string];
}

The flx tool uses toolchain plugins to compile and link C++ stuff.

The Felix dynamic loader uses a registry to check if a DLL is already loaded.
It can also be populated from static link versions of a library: this is called pre-loading:

open Dynlink;

// Now add all the symbols.
proc addsymbols ()
{
static-link-plugin
toolchain_clang_macosx,
toolchain_iphoneos,
toolchain_iphonesimulator,
toolchain_clang_linux,
toolchain_gcc_macosx,
toolchain_gcc_linux,
toolchain_msvc_win
;
// flx
static-link-symbol dflx_create_thread_frame in plugin dflx;
static-link-symbol dflx_flx_start in plugin dflx;

}

The dynamic loader checks the registry first, so the above plugins are
shipped in the executable, but access uses the same code as if the
plugin were in a run time loadable shared lib.




John Skaller
ska...@internode.on.net





Keean Schupke

unread,
Jul 21, 2022, 12:00:00 PM7/21/22
to felix...@googlegroups.com

It is often wanted. For example a projection “forgets” all about a tuple except one component.

You can use existential types to 'forget' the type at compile time, which restricts you to only accessing the value through the bound type-classes. Existential types make 'boxing' explicit in the type system.

So I like that this separates the concerns between interface inheritance and compile time type erasure.

For example in Haskell:

data P = forall t . Printable t => MkP t

Declares a constructor 'MkP' that we can only call on a 'Printable' type, or we get a compile time error. However given a 'P' the type of the contents is erased, all we know is that it is 'Printable', so we can call any function from the Printable interface on it, but no other operations.

This approach works well when you want to add a new type to a collection, but is not good when you want to add a new interface.

But I must admit a lot of people find this weird and difficult to understand. I think the RTTI approach is more easily understood.

Keean.

John Skaller2

unread,
Jul 22, 2022, 12:36:19 AM7/22/22
to felix


> On 22 Jul 2022, at 01:59, Keean Schupke <ke...@fry-it.com> wrote:
>
>
> It is often wanted. For example a projection “forgets” all about a tuple except one component.
>
> You can use existential types to 'forget' the type at compile time, which restricts you to only accessing the value through the bound type-classes. Existential types make 'boxing' explicit in the type system.

In yours. Not in Felix. Coproducts are boxed, products and primitives are not.
The type is not erased either, it’s temporarily hidden.

type myint = new int;

Felix never erases types, it has to know them and their names to generate code.

Systems that use boxing can get run time polymorphism but the performance is disgusting.
Copying is much faster and supports mutation .. and you can still use pointers.

The fact Felix uses boxed coproducts means it can use type signatures like
common ones in FPLs but I consider it a design fault.

I’m moving to a system where there is a universal product and coproduct
constructor with a tag that specifies the representation:

product (“C”, (int,long)) ==> struct { int mem1; long mem2; }
product (“box”, (int,long)) ==> struct { int const *mem1; long const *mem2; } const *

There are many different useful products. The most important is the least understood,
despite needing only high school maths: compact linear types.

product (“compactlinear”, (5,4,3)) ==> e2 + 3 * e1 + 12 * e0
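
Encoding and decoding are just mixed-radix place value; an OCaml sketch (list-based, sizes given most-significant-first):

(* pack digits e0,e1,e2 drawn from sizes 5,4,3 into one integer:
   value = (e0 * 4 + e1) * 3 + e2 = 12*e0 + 3*e1 + e2 *)
let encode sizes digits =
  List.fold_left2 (fun acc size d -> acc * size + d) 0 sizes digits

let decode sizes n =
  snd (List.fold_left
    (fun (n, ds) size -> (n / size, n mod size :: ds))
    (n, []) (List.rev sizes))

let () =
  assert (encode [5; 4; 3] [4; 2; 1] = 55);
  assert (decode [5; 4; 3] 55 = [4; 2; 1])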

If you look at the Ocaml terms used for types in Felix there are .. well just too many.
The semantics are hard coded for each one.




John Skaller
ska...@internode.on.net





Keean Schupke

unread,
Jul 22, 2022, 4:04:53 AM7/22/22
to felix...@googlegroups.com
I agree, I think boxing is to be avoided. That's why I like making it explicit in the type system, then we can be sure that all 'plain' types are unboxed. Any use of boxing has to be approved by the programmer.

My understanding is that boxing is basically wrapping a value in an object oriented wrapper class. Unboxed values are plain-old-data like in 'C'. So in an array every element contains a class pointer even if it's an integer, although this can be optimised to a single class pointer for the whole array if the language supports homogeneous arrays. So for a language like JavaScript where all basic arrays are heterogeneous, every value in the array has to have a class pointer, and we can't store a 'naked' integer.

I would point out that the kind of boxing in Haskell using existentials is very similar. You use a pointer to the data in exactly the same way, except the object pointed to includes one or more typeclass function table pointers instead of a class method table pointer. However non-existential types have no method/function pointers at all, they are just plain-old-data 'C' style structs. You can also be explicit about whether it's a boxed array or an array of boxes:

data BoxedArray = forall t . BoxedArray [t]

data Box = forall t . Box t
type ArrayOfBoxes = [Box]

On the topic of coproducts, I find the 'C' style unions pretty useless as you have to hand code any method of determining what option you have. The Functional Programming style discriminated union, where we have a 'tag' for each option, usually just an integer coding the option order from 1 to N, is not considered boxing, it's just a discriminated union.

These can have internal or external storage. With internal storage we have to have all options having a known length, and then we reserve enough memory for the longest plus the tag integer, so we can store the option number plus the data in place. With external storage this will be the option integer plus a pointer to the data. This is not slow because we simply 'switch' case on the 'tag' integer and then we know what the type of the data is. For example:

data A = I Int | F Float

x = I 1   -- stored [0 1], length 2 words
y = F 1.1 -- stored [1 1.1], length 2 words

A boxed coproduct would contain objects for each option, and there is no 'tag'. You obviously could not put a 'naked' integer in a boxed coproduct, you would have to have an Integer class, and put an object of that class in instead.


Cheers,
Keean.







John Skaller2

unread,
Jul 22, 2022, 8:56:22 AM7/22/22
to felix
FYI: Felix sums have this representation (all of them):

struct X { uint64_t tag; void *pdata; };

So they’re boxed in the sense that the data is found by a pointer.

This means that a type like

functional_list = 1 + int * list

looks like this:

struct sum { tag; pdata = prod }
struct prod { int; sum { tag; pdata = prod } }

The pdata is ignored if the tag is 0 (end of list).

Nominally typed variants use a more advanced encoding
which is (a) more efficient and (b) C compatible

For example

variant list = Empty | Snoc of list * int

uses a NULL pointer for the Empty case, and a pointer to

struct { next; int }

Variants of 4 or fewer constructors use the lower 4 bits of a pointer
as the discriminant.

The problem is that this model

(a) requires a garbage collector
(b) does not allow mutable data to be pointed at in a coproduct



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 22, 2022, 9:56:19 AM7/22/22
to felix
Felix needs OO support in the C++ style. I have questions. Consider

struct D[T] : B1[T*T], B2[T+1] { …} ….. (1)

At the moment there is a subtyping table for monomorphic nominal types.
The user defines the entries with

supertype B : D = coercion from D to B.

for example:

supertype long: int = "(long)$1";

This means you can write

B(d) where d:D
long(int)

The coercion is a function with the name of the supertype.
For the original formula (1) we’d get

supertype B1[T*T] : D[T] = ???

If our structs are actual C++ structs with inheritance, the coercion would be like

fun B1[T*T] (D[T]) : B1[T*T] => “static_cast<?0>($a)”

where ?0 is the return (super)type.

One question is how to do lookup if there are multiple bases. In particular, if
B1 and B2 both have a field x, which one is found?



John Skaller
ska...@internode.on.net





Keean Schupke

unread,
Jul 22, 2022, 6:03:26 PM7/22/22
to felix...@googlegroups.com
So I would call a Felix coproduct an unboxed discriminated union with external storage.

It's unboxed because 'pdata' can point to raw data like an integer not just objects. It's discriminated because there is a 'tag'. It's external storage because there is a pointer to the data rather than the data itself.

Keean.






John Skaller2

unread,
Jul 23, 2022, 8:43:34 PM7/23/22
to felix


> On 23 Jul 2022, at 08:03, Keean Schupke <ke...@fry-it.com> wrote:
>
> So I would call a Felix coproduct as an unboxed discriminated union with external storage.
>
> It's unboxed because 'pdata' can point to raw data like an integer not just objects. It's discriminated because there is a 'tag'. It's external storage because there is a pointer to the data rather than the data itself.

You can say that. I’d say it’s boxed because there’s a pointer and if the object is modified
all hell breaks loose. The tag is part of the box: they’re basically tagged pointers.
But this is just terminology. The model allows you to make a discriminated union
with existing objects by using a pointer, instead of copying. Any which way recursion
of the type works because there’s a pointer in there.

Unboxed discriminated union is somehow more general because it allows
mutation, but you can always use pointers to get boxing and recursion.
That’s how products work. A union can contain itself, a product cannot.

Actually there’s a case using iso-equality for unfolding in place.
The client has to manually specify to unfold. The advantage is
efficient storage. For example a list:

true
int
true
int
false

is the list type unfolded twice. The length 2 is now part of the type. The storage
is contiguous. It’s the same as a list of nodes with pointers except that
the pointer is *implicit* and points at the next free machine word.
You still need the discriminant. The Cons operation has to make a completely
new list, so mutations to the old list after this do not impact the new one.
However all the visitors are the same, for example fold still works.



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 23, 2022, 8:58:46 PM7/23/22
to felix
struct X : Y { .. }

So, I am now propagating base information. However the subtyping algos are not
implemented yet. Also the code generator is ignoring the base so inheritance will
not actually work.

I’m not sure how to represent a struct with a base. There are two ways:

1. Do it my way
2. Make C++ do the heavy lifting

My way would be to include the base as an actual member of the derived type.
Then lookup will search the primary members first, and if not found, search the base.
The advantage of this model is that the coercion from derived to base just returns
the subobject, and a pointer coercion is just a cast.
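
A sketch of the lookup rule in OCaml (toy symbol table, option 1 only; names made up):

type styp = {
  name : string;
  fields : (string * string) list;   (* field name -> type name *)
  base : styp option;                 (* the base as a real member *)
}

(* search primary members first, then the base, recursively *)
let rec find_field st f =
  match List.assoc_opt f st.fields with
  | Some _ as r -> r
  | None -> (match st.base with Some b -> find_field b f | None -> None)

let () =
  let b = { name = "B"; fields = ["x", "int"]; base = None } in
  let d = { name = "D"; fields = ["y", "string"]; base = Some b } in
  assert (find_field d "x" = Some "int");
  assert (find_field d "y" = Some "string")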

With multiple bases, coercions are easy, the coercion of pointers
just has to add the offset of the subobject.

The problem with this method is it doesn’t respect C++.

Making C++ do the work, however, is actually much harder.
Finding the type of a field requires lookup in the derived struct
first and if that fails, in the base, but the returned type from
that has to be specialised twice. First, to respect the derivation
and secondly to respect the derived struct specialisation.

One way to make it a tad easier .. maybe .. would be to use
C++ templates and let C++ do the heavy lifting, but I have no
idea how to do that properly if the specialisation is a Felix type,
for example a variant or sum (or tuple for that matter!)

However, a coercion would now be done in C++. The problem
would be with duplicate indirect bases…..

Don’t ask about methods .. :-)



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jul 25, 2022, 6:55:01 AM7/25/22
to felix
So getting any Felix OO class stuff to be usable from C++ (that is, to generate
fully C++ compliant class definitions) seems almost impossible. Copy constructors
use reference types for example. Whereas I would ensure base class subobjects can
be uniquely identified:

class X : p = Y, q=Y {} .. ok

the code generator will not always be able to do this (because in C++ it is impossible).
Functions have multiple arguments; in Felix there is always one.

I think it is too hard. Also Felix uses a GC.

It’s easier to get close semantics with the right code, but which works
better in Felix. Multiple inheritance is a pain, especially with virtual bases
as well as virtual functions.

You may ask why do we even want OO? Well quite a lot of the C++ run time
I am writing uses virtual dispatch. OO isn’t the answer but the technology is
useful. Felix has coproducts which are basically statically type checked
run time typing: variant arguments are heterogeneous. So another form
of statically checked dynamics cannot hurt.


John Skaller
ska...@internode.on.net




