flx_fbuild_prep


John Skaller2

Sep 9, 2020, 11:27:29 PM9/9/20
to felix-l...@googlegroups.com
The makefile now contains a target:

macosxfpcs:
build/release/host/bin/flx_build_prep --debug --target-dir=build/release --target-bin=host --configure --compiler=clang --os=macosx --bits=64

which updates the *.fpc files in build/release/host/config from the extracted standard ones
AND from

~/.felix/config/host/config/*.fpc

This can be used for persistent overrides of system packages OR for
extras. The point is a clean build of Felix from scratch will not clobber
files here. This replaces having to copy the files manually.

The fbuild bootstrap has not been adjusted yet.


John Skaller
ska...@internode.on.net





John Skaller2

Sep 13, 2020, 9:45:37 PM9/13/20
to felix google
So Felix can now parse some Objective C.

In my test case objc-grammar.flx this parses:

////////
open syntax ObjC;
@interface Sub : Super
{
@public NSString s;
}
@property (weak,readonly) A B;
@property (fred = joe) A B;
@property (maxine = jean:) D E;

+ (TYPE) cmeth;
- (TYPE) imeth;
- (TYPE) xWith: (TYPE)x yWith: (TYPE)y;
@end

println$ "Hello World”;
////////

Note that you can’t put @ in column 1 because flx thinks it’s an fdoc processing command.
The only type recognised at the moment is the word TYPE :-)

There are two options for types and variables I’m considering:
(1) ObjC syntax
(2) Felix syntax

There’s also a question what to do with the code. I am thinking of two statements
for interfaces:

objc-generate @interface …. @end

generates a copy of the code in the header file as well as generating bindings whereas

objc-bind @interface .. @end

assumes the client has #included some ObjC headers including the one for the specified
class, and we just want Felix bindings of it.

Emitted headers would be more or less the same as input EXCEPT that Felix types
get translated into ObjC ones. The machinery to do this already exists of course.
ObjC types get left as written.

For variables we can allow

var name: felixtype;

and also

objctype name;

provided the parser can tell the difference. Not sure of this, something definite is probably
better.

The bindings will generate a constructor with the class name as usual, it will have to
find an “init” method or something. Methods will generate Felix functions like

fun WithA_WithB (obj:ClassName, a:Atype, b:Btype) : RetType =>
cexpr[RetType] "[$1 WithA: $2 WithB: $3]" (obj,a,b) endcexpr
;

in other words, the method name WithA:WithB: is mangled by replacing : in the ObjC
method name with an underscore. An alternative is the quote mark, so it reads

fun WithA'WithB' (….


The idea here is that the ObjC syntax is a DSSL, Domain Specific Sub Language.
The usual rule is that it extends Felix, so types and expressions in this stuff
are FELIX types and expressions.




John Skaller
ska...@internode.on.net





John Skaller2

Sep 17, 2020, 9:38:26 AM9/17/20
to felix google, gordon Childs
OK so version 1 of the Objc DSSL will be like this:


open syntax ObjC;
@objc-bind
@interface Sub : Super

+ (NSString) cmeth;
- (int) imeth;
- (void) xWith: (TYPE)x yWith: (TYPE)y;
@end

println$ "Hello World”;

Note this is NOT Objective C, it's FELIX and it specifies a BINDING to an existing
ObjC interface which must be included in the generated code somehow, possibly by
adding a requires clause (yet to be sorted!)

What will happen is this:

1. Super must already be defined as a Felix type mapped to an ObjC class type.

2. Felix will define

type Sub = "Sub*";

Note the defined type is a pointer.

3. Felix will define a subtyping coercion

supertype Super: Sub = "$1";

which means values of type Sub can be passed as arguments anywhere Super is specified.

4. An instance or class method will be defined with the specified FELIX return type.

5. The method name will be the concatenation of the ObjC name thingos with ":"
replaced by "'". For example xWith'yWith' will be the method name of the last case
in the example.

6. For a class method I don’t know what to do. A call like

[Sub cmeth]

is how it is written in ObjC. It’s easy to make a function that does that,
but the method name will then have to be globally unique. Underneath
ObjC is passing a pointer to the class object so perhaps we need a
Felix type for that, i.e something like

type ClassType = "";
const ClassSub : ClassType = "Sub";

Then “ClassSub” can be passed as an argument.

7. For an instance method, the type will be a function type

OBJ -> ARGS -> RET

where OBJ is Super in the example, ARGS is a tuple of the remaining parameter types,
and RET is the (mandatory) specified return type EXCEPT if RET is void, in which case
the type will be a procedure type and RET will thus be 0.

The critical thing with this type is that you can form object closures. It is possible to consider
currying ALL the arguments instead of taking a tuple of arguments.

Method names overload as usual only on the object type. Within an object, ObjC method
names must be unique, which means closures over the object will always bind uniquely.


There is other stuff in the Obj grammar, including instance variables, and properties.
These will be handled later.

It’s important to note again with this DSSL that the types specified are FELIX types
NOT OBJECTIVE C TYPES. In particular this would NEVER work in Objective C:

+ (NSString) cmeth;

because you cannot return an NSString by value, you have to return a pointer.
But the library already says:

type NSString = "NSString*" requires package "Foundation";

The actual binding of an instance method will be like this:

fun xWith'yWith' (obj: Sub) (a1: TYPE, a2: TYPE) =>
cexpr[NSString] "[$1 xWith: $2 yWith: $3]" (obj, a1, a2) endcexpr
;


EXCEPT if it’s a procedure when the binding will be adjusted accordingly.


John Skaller
ska...@internode.on.net





John Skaller2

Sep 18, 2020, 4:29:41 AM9/18/20
to Gordon Childs, felix google


> On 18 Sep 2020, at 17:27, Gordon Childs <gordon...@gmail.com> wrote:
>
> This is great - so can I now pick an objective-c class (say AVAudioFile) and, after some wrapping with the DSL, use it in felix?

Yes, that's the idea. You can already do this now, it's just more laborious to do the wrapping.

Some further rules:

* A return type of instancetype is not allowed at present. Write the class name instead.

* Class methods do not have an object argument. Instead, the class name is prepended
to the method name. For example in the test case:

/////////
open syntax ObjC;
@objc-bind
@interface Sub : Super

+ (NSString) cmeth;
- (int) imeth;
- (void) xWith: (int) yWith: (NSthing);
- (int) xWith: (int) yWith: (NSthing) zThing:(double);
@end

println$ "Hello World”;
////////

we have

fun Sub'cmeth : unit -> NSString

generated. Note this means you cannot form closures for class methods.
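For example, a sketch of calling it (untested; the # prefix just applies a nullary function to the unit value, as used later in this thread):

var s1 : NSString = Sub'cmeth ();
var s2 : NSString = #Sub'cmeth;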

* Instance methods always accept "exactly two arguments".
* The first argument is the object
* the second argument is the rest of the arguments as a tuple or () if there are none

For example

fun imeth: Sub -> unit -> int // NOTE unit argument!!
proc xWith'yWith' : Sub -> int * NSthing // Note IMPLICITLY returns void
fun xWith'yWith'zThing: Sub -> int * NSthing * double -> int

This means you can always form a closure of an instance method over the object,
and more pointedly you MUST do so. In particular

var j:int = imeth obj ();

Note carefully that to call this one you have to supply the unit value as well as the object.
Also note the reason for this:

var j: int = obj.imeth (); // Recall operator . just means reverse application

This works because application is left associative and so

var j:int = (imeth obj) (); // shows associativity
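So a sketch of forming the closure explicitly and calling it later (untested, it is just partial application):

var get_i = imeth obj;      // closure of type unit -> int over the object
var j2 : int = get_i ();    // supply the unit argument later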

I have no idea if closures will work correctly with ARC. In general closures form
cycles which defeat reference counting. However this is unlikely here because
they’re effectively closures over bindings which are unlikely to be passed as
arguments to other ObjC bindings.

Note this means method names CAN BE OVERLOADED on the object type.

* If the return type of a method is "void" Felix uses a procedure, otherwise a generator.
It HAS TO BE SPELLED "void".

* There's no support yet for ivars or properties. These aren't hard but I'm confused about
things like @private and readonly properties. I also suspect in most Apple interfaces there
are just methods exposed.

* Protocols aren’t supported yet. These are just class templates. So it shouldn’t
be hard to add them.

DEBUGGING WILL BE NASTY. As it always is when the compiler generates stuff
and you get an error, you end up with references to generated code or, quite often,
reference “up in the air” because the generated code has no source location.

CAVEAT. If you have an ObjC method like

- (NSString*) description;

the Felix DSSL has to read:

- (NSString) description;

This is because

type NSString = “NSString*”;

i.e. in Felix the “pointer” is already there.



John Skaller
ska...@internode.on.net





John Skaller2

Sep 18, 2020, 8:25:40 AM9/18/20
to felix google, Gordon Childs
And .. THIS WORKS:

////////////////////////
include "std/apple/Foundation";

open syntax ObjC;

header small_class_interface = c"""
@interface SmallClass: NSObject { }
- (int)get1977;
@end
""";

body small_class_implementation = c"""
@implementation SmallClass
- (instancetype)init {
self = [super init];
return self;
}
- (int)get1977 {
return 1977;
}
- (int)getsum: (int)toadd {
return 1977 + toadd;
}

@end
""";

requires small_class_interface, small_class_implementation, package "foundation", package "ObjC";

@objc-bind @interface SmallClass
-(int)get1977;
-(int)getsum:(int);
@end

var sc : SmallClass = cexpr[SmallClass] "[[SmallClass alloc] init]" endcexpr;
println$ "Get: " + (sc.get1977()) . str ;
println$ "Add: " + (sc.getsum' 42) . str ;
///////////////////



John Skaller
ska...@internode.on.net





John Skaller2

Sep 18, 2020, 9:06:27 AM9/18/20
to felix google, Gordon Childs
Class methods (at least with no arguments) now work too:


@objc-bind @interface SmallClass
+(SmallClass)alloc;
-(SmallClass)init;
-(int)get1977;
-(int)getsum:(int);
@end

var sc : SmallClass = cexpr[SmallClass] "[[SmallClass alloc] init]" endcexpr;
println$ "Get: " + (sc.get1977()) . str ;
println$ "Add: " + (sc.getsum' 42) . str ;


var sc2 : SmallClass = #(#SmallClass'alloc.init);
println$ "Get: " + (sc.get1977()) . str ;
println$ "Add: " + (sc.getsum' 42) . str ;


John Skaller
ska...@internode.on.net





John Skaller2

Sep 19, 2020, 12:35:31 AM9/19/20
to felix google, Gordon Childs
I am changing the API for argumentless methods for compatibility with
(the very few) existing bindings in the Felix apple library.

So for example

-(SmallClass)init;

will be called by

object.init

instead of

object.init ()

The latter is more consistent, but is extra verbose considering almost all methods will
be called rather than object closures formed. You can easily make a closure like this

{ object.init }

anyhow. The triggering issue was that I just implemented this:

fun description : NSObject -> NSString = "[$1 description]";

which follows the latter convention, not the former. It is currently

IMPOSSIBLE

to give curried arguments in bindings. In fact the alternative using
inline bindings never occurred to me before this project:

fun f(x:int) => cexpr[int] "$1" x endcexpr;

is equivalent to

fun f: int -> int = "$1";

but more verbose BUT it allows full Felix typing including currying because,
quite simply, the first f function is actually a full Felix function, not a binding.
It effects the binding via an inline binding, cexpr .. endcexpr, instead.
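For example, a sketch of a curried "binding" done this way (add and add1 are made-up names, untested):

fun add (x:int) (y:int) : int =>
cexpr[int] "$1 + $2" (x,y) endcexpr
;

var add1 = add 1;    // partial application works because add is a real Felix function
println$ add1 2;     // 3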

THERE IS ALWAYS A CLASH BETWEEN ACTUAL SIMPLICITY of the
underlying system and WHAT THE USER PERCEIVES AS SIMPLE based
on minimal syntax of common cases.

Another example of this clash is the existence of "constant type constructors"
copied from Ocaml's design, when in the abstract all type constructors should
be functions, and constant constructors should be constant functions (accepting
a unit argument). But do you REALLY want to write #True everywhere instead of
just True??


John Skaller
ska...@internode.on.net





John Skaller2

Sep 19, 2020, 7:19:24 AM9/19/20
to felix google, Gordon Childs
So what to do with properties?
=========================

In the binding code,

@property T v;

is exactly the same as

-(T)v;
-(void)setv:(T);


and

@property (readonly) T v;

is just the first, so it’s a totally useless wank. The syntax is also bad because it’s C like.
So better would be just

@var v:T; // read write
@val v:T; // read only

none of the other property attributes make any sense in an interface as far as I can see.
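So in terms of the generated Felix bindings, a read/write property v of type T on some class C would expand to roughly this sketch (the exact selector spelling the generator emits may differ; a readonly property just gets the first line):

fun v : C -> T = "[$1 v]";
proc setv' : C * T = "[$1 setV: $2];";   // ObjC capitalises the setter selector to setV: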

Protocols
========

As far as I can see a protocol is just an abstract type. So declaring one is exactly
the same as declaring a class, in other words

@protocol = @interface

do exactly the same thing. Now suppose we have a method fred in a protocol hasFred and

@interface X<hasFred>

then we expect

x.fred

to work, as if the interface of X specified a fred method. Now by my theory
the @protocol generates:

type hasFred = "id";
fun fred: hasFred -> thing = "[$1 fred]";

exactly as if it were a class interface, and,

supertype hasFred : X = "$1";

is generated for class X. This means there is an implicit coercion
from objects of type X to those of the hasFred, and therefore

var x = cexpr[X] "[[X alloc] init]" endcexpr;
.. x.fred ..

will work and return a thing. That’s because it really reads

fred x

and fred requires a hasFred, but X is a subtype of hasFred so it will work.

Protocols ALSO may be SPECIFIED to conform to other protocols.
This does NOT mean they have the extra method injected in, no,
it's exactly as above: it's nothing but a subtyping coercion.

In summary the Felix algebraic type system is so vastly superior it can
already handle protocols exactly the same way as supertypes.
From an interface view, this is just multiple inheritance.




John Skaller
ska...@internode.on.net





John Skaller2

Sep 19, 2020, 9:19:06 AM9/19/20
to felix google, Gordon Childs
ok so protocols are now also available:


@objc-bind @protocol hasDescription
-(NSString)description;
@end

Exactly the same as a class interface except no super class, no ivars,
just methods. Class methods are implemented as well but because of how
I did them, you end up with an attempt to call the method on the protocol
name. This suggests using AnyClass as the type of a class, so that
class objects exist, and then having to pass it. In this case a protocol
class method is called on a dynamically passed value of the type AnyClass,
which is an actual pointer to a class object, and dynamic dispatch should
make it work. Not sure on this so at the moment class methods in protocols
are just an accident :-)

Once a protocol binding is defined you can use it in a class or protocol
interface:


@objc-bind @interface SmallClass<hasDescription>
+(SmallClass)alloc;
-(SmallClass)init;
-(int)get1977;
-(int)getsum:(int);
@end

To prove it works:

var sc : SmallClass = #SmallClass'alloc.init;
println$ "Get: " + (sc2.get1977) . str ;
println$ "Add: " + (sc2.getsum' 42) . str ;

// SmallClass is a subclass of NSObject by default
fun require_subtype (x:NSObject) => (x.description).str;
println$ sc.require_subtype;

// SmallClass obeys protocol hasDescription
fun require_description(x:hasDescription) => (x.description).str;
println$ sc.require_description;

The protocol is an incomplete type in C++, namely this:

struct hasDescription *

I have no idea what ObjC does, but we don’t care at the moment.

As you can see, pointer subtyping is used both for class super types,
class protocols, and protocol protocols too.

Optional methods in protocols are not supported because they don’t make
any sense.

NOTE CAREFULLY: The bindings are TYPE SAFE.
PROVIDED … the bindings accurately reflect what they bind to.
It's the same as for C bindings. If you bind Felix type int to C type float
you can’t expect things to actually work :-)


John Skaller
ska...@internode.on.net





John Skaller2

Sep 19, 2020, 9:12:32 PM9/19/20
to felix google, Gordon Childs
So a small syntax change allowing this:

objc-bind
@protocol hasDescription
-(NSString)description;
@end
requires small_class_interface, small_class_implementation, package "foundation", package "objc"
;

The @ is gone from the top level keyword “objc-bind” and a “;” is required at the end of the
statement. Same for class interfaces. You can now add requirements, they’re simply
added to the type involved, I didn’t bother with methods. I probably SHOULD add them
to class methods (because there’s not necessarily a use of the class instance type!)

===================

Now there’s an issue with protocol class methods. Consider:

objc-bind
@protocol hasDescription
-(NSString)description;
+(NSString)classname;
@end
requires small_class_interface, small_class_implementation, package "foundation", package "objc"
;

This will generate a constant method:

fun hasDescription'classname () = "[hasDescription classname]";

which won’t actually work because AFAIK hasDescription doesn’t exist.
In Felix the binding is:

type hasDescription = "struct hasDescription*";

which is a dummy incomplete type. The Felix type has to exist to allow
coercions, in the specification:

objc-bind
@interface SmallClass<hasDescription>
+(SmallClass)alloc;
-(SmallClass)init;
-(int)get1977;
-(int)getsum:(int);
@end
requires small_class_interface, small_class_implementation, package "foundation", package "objc"
;

Felix will generate this:

supertype hasDescription: SmallClass = "$1";

So that this works:

fun require_description(x:hasDescription) => (x.description).str;
println$ sc.require_description;

The actual function will get a pointer of type SmallClass* and call the description
method on it like so:

cexpr[NSString] "[$1 description]" x endcexpr

The point is the supertype coercion does NOT erase or change the type,
the type WILL NOT be struct hasDescription*, in fact I think I could have left
the type binding blank like so:

type hasDescription = "";

In other words it's a phantom which enables type checking but doesn't impact
the generated code.

So the issue is: what the heck do we do with class methods, in protocols
particularly, but the issue also impacts classes. The alternative for classes again
is that ObjC has an actual type

AnyClass*

which is the type of all pointers to class objects (NOT instances!).

This means you can apply ANY method to any class, and it may or may
not work at run time but there’s no way to type check it. By contrast the
existing mechanism:

SmallClass'alloc ()

is type safe in the sense the argument () is type checked and the return type is known
to be SmallClass (in Felix, in C SmallClass*).

NOTE ALSO: can you call a class method on an instance object? I don’t know.
If you can that won’t work at the moment.


John Skaller
ska...@internode.on.net





John Skaller2

Sep 19, 2020, 11:05:57 PM9/19/20
to felix google, Gordon Childs
I’m now looking into clang blocks. More generally closure technology.
A good article although old:

https://www.mikeash.com/pyblog/friday-qa-2011-06-03-objective-c-blocks-vs-c0x-lambdas-fight.html


1. FELIX CALLBACKS

To do callbacks, Felix has a callback construction. What this does is create
a special type and a thunk. I will report details later because I've forgotten exactly
how it works. The construction gives a C function signature with one modification:
the function type name is repeated in the argument list in one place. This signifies
that the corresponding argument is a void * which is assumed to point at
some object that holds context for the C function, the client data pointer.

Felix generates a thunk which assumes that the client data is a Felix function value,
which underneath is a pointer. It then casts the void * to the appropriate Felix function
type and invokes the apply() method with all the other arguments passed to the
callback. Therefore, you have a full scale normal Felix function operating as a callback.
If it’s a procedure the additional constraints of a function apply: you cannot do SVC requests
and thus cannot do channel I/O. The garbage collector, however, should work, because
it saves the machine stack using a longjmp and restores it after a collection.

Of course for some callbacks this can’t be allowed, the GC does allocations, takes
ages to run, acquires and releases system locks, etc etc. GC can be inhibited for
some types on allocations but this may not be enough. The programmer must ensure
the function will meet the requirements, the compiler will not help (at the moment).
I note in particular calling functions in the general case causes a heap allocation!
It's often optimised away but no promises! The only workaround is to use the cfun
binder, which guarantees to generate a C function with the specified signature
which means any attempt to access the thread frame (and thus GC) will lead to
a C++ compile time error.

To use a Felix callback you call the registration function passing the thunk as the
C function, and the Felix function as the client data.

Therefore, to use this feature BOTH the callback type and registration function
provided by the client API must include an identifiable client data pointer argument.

It’s messy to use this machinery, and last I looked there was a problem that one
of the types was wrong: in one case it’s one thing and in another case another thing,
and there’s no way to cast it to meet both requirements. So the result has a type safety
loophole. I'll investigate further.

PROS: works with all common APIs that model callbacks using client data pointers
and C functions, which is almost all of them. In particular, it's platform independent,
so it works with all compilers on all OSes.

CONS: it's complicated to use and doesn't enforce constraints imposed by Felix
itself, nor those imposed by the client API.

2. CLANG BLOCKS

Now we get some messy issues. These things are well typed and work in C, C++,
and ObjC and ObjC++ in general, but ONLY with clang. They’re safe in the sense
that they copy captured values at block creation time so continue to work even if
the containing scope ends its life. The block object itself starts on the stack and gets
copied to the heap on demand, which means pointers to it are unreliable.

A pointer to a possible heap allocated value is used if the variable being referenced
in the block is designated __block. I think these start on the stack too but also get
copied to the heap.

Block objects are not reentrant. If you copy one, you’re copying a pointer only.
This means local variables of the block are shared. So you have to use
a special Block_copy function to copy it, and this is true if you want to pass the
block outside the lifetime of the stack. Messy! The user has to THINK.

PROS: shared __block variables are managed automatically
The syntax gives blocks a definite type

CONS: platform/compiler dependent
Blocks crash if the containing scope ends, unless explicitly copied onto the heap

3. C++ LAMBDAS

These things provide greatest flexibility about capturing rules. They’re first class
values, so cannot become invalid, ever. However variables captured by reference,
or as usual anything pointed at, can be invalidated when the lifetime of the
referred to thingo ends. The BIG DOWNSIDE and it is a HUGE DOWNSIDE
is that they have no definite type. It's close to saying this stuff is utter crap.
It works when the API requires a generic function-like object but is useless
when a specific callback type is required. Conversion machinery does exist
but is fraught with problems.

PROS: platform independent
CONS: no definite type


SUMMARY
=========

Lots of people on Apple use Clang blocks because APIs like Grand Central Dispatch
are designed for them. But supporting a compiler specific extension in Felix when that
extension is a general function type requires extensive changes: basically I would have
to add a new function type to Felix, which impacts the whole compiler.

A weaker approach may be to use opaque but polymorphic types somehow, and put the
machinery in a DSSL.

Perhaps the BEST method is to use Felix callback machinery AND some kind of DSSL.

However if we have to handle client supplied blocks, or face an API that does not provide
traditional C callback handling, or deprecates it, this could be problematic.

All in all, research must continue. The big problem is that Felix is designed assuming
heap allocations are OK and this just isn’t the case inside most callbacks.
One solution is to provide an alternate allocator but to use two allocators may have a performance
impact .. we need to keep at least a separate Judy array or some extra data to determine which
deallocator to use. The performance issue can be avoided for system objects by marking the type
as exclusively using the fast internal allocator. Also some other method could be used to
distinguish the allocator used, eg by just examining the address. That would support a fixed
pre-allocated block to speed up some early allocations.



John Skaller
ska...@internode.on.net





John Skaller2

Sep 20, 2020, 12:39:38 AM9/20/20
to felix google, Gordon Childs
So to handle ivars (instance variables) I am thinking to use the same machinery as cstruct.
But this leads to an issue. Recall

cstruct fred { int x; y:int; };

defines a binding to an existing C struct (leave out the "c" and it generates the struct too).
It's a bit weird that you cannot put "var x: int" but that's what we have.

So for an Objective C class interface, the bit in {} is exactly a C structure anyhow.
You can put anything in there you can in C or C++.

Consider:

objc-bind
@interface SmallClass<hasDescription>
{
int x;
y: int;
}
+(SmallClass)alloc;
-(SmallClass)init;
-(int)get1977;
-(int)getsum:(int);
@end
requires small_class_interface, small_class_implementation, package "foundation", package "objc"
;


in a cstruct you can write:

var a : struct-name;
… a.x …
… &a.x <- 42

because the name of a struct member is also the name of both a value and pointer projection.

The PROBLEM is that I am currently doing this:

type SmallClass = "SmallClass*";

I will have to replace that with something like:

cstruct _SmallClass { .. }
typedef SmallClass = &_SmallClass;

But now, in ObjC you have to write

sc -> x

to refer to the value x, where sc is type SmallClass*. This is also an lvalue so accepts
an assignment. In Felix on the other hand

sc.x

will be a pointer to int when sc is of type SmallClass so you can write

sc.x <- 42

but for a value you have to write

sc*.x
or
*(sc.x)

But the problem is worse, because the cstruct binding will fail because the underlying
C type is SmallClass NOT _SmallClass.

I have run into this problem before in a different guise that shows a weakness in the
Felix binding system. With Felix structs the rule is that

fieldname value == value.fieldname

is a value projection and

fieldname pointer = pointer.fieldname

is a pointer projection. Fine. But what if we have an abstract type which is a pointer
underneath??? The canonical exemplar is varray! (Is canonical exemplar an oxymoron
or am I just a moron? A nasty aspersion to the good people of the Cuban island of Moron.)

The problem with varray is that it is nonsense to write

&v. 1 <- v . 2

because v is already a pointer, a reference to a C array of known length.

In previous cases I handled this problem for lifted abstract types exactly by

private incomplete type _fred = "fred";
type fred = new &_fred;

or just

type fred = "fred*";

but then you have to hand define the projections. A change to the compiler and parser
to allow

cstruct ftype = "ctype*" { int x; }

may help. The idea is it defines Felix projections

fun x : ftype -> int = "$1 -> x";
fun x : &ftype -> &int = "&($1 -> x)";

and says

type ftype = "ctype*";

Now we can have

cstruct SmallClass = "SmallClass*" { int x; }

and given

var sc : SmallClass …

the expression

sc.x

invokes the value projection passing, at the C level, a pointer
to SmallClass. The problem is then reduced to implementing the
two projections.

At present, these functions are NOT defined in the symbol table,
although they used to be. Instead, the code is synthesised on demand
by the lookup routine. In other words, when you try to apply a function f to
a value of type T, if f cannot be found AND T is a struct or cstruct type,
then we look for a field in T named f.

But the binding is defined in the virtual machine, and only rendered in
the back end. Hmmm...



John Skaller
ska...@internode.on.net





John Skaller2

Sep 20, 2020, 3:21:53 AM9/20/20
to felix google, Gordon Childs
I am starting to get a handle on this.

First observe:

struct A { int x; }

var a = A(1);
var pa = &a;

println$ a.x;
println$ *(pa.x);
println$ pa*.x;
println$ pa->x;

In the grammar the following are enforced:

1. operator . means reverse application
2. pa *. x and pa-> x both mean the same as *(pa . x)
3. when x is applied to a value it is a value projection
4. when x is applied to a pointer it is a pointer projection

Thus a component assignment is done by:

pa . x <- 99;

where the LHS is a pointer to an int as required because

x: &A -> &int

is the pointer projection used. The core idea eliminates lvalues but we have
instead this:

&a

"take the address of a variable". Actually the variable IS an address and &a is
just that address. Contrarily “a” by itself is actually a fetch from that address.
So in principle & is NOT AN OPERATOR, rather &a is just the variable address.
Pointers can also be obtained from bindings and heap allocations with operator new.

Note here pa could be dereferenced returning the whole struct A:

var b = *pa; // a copy of a

NOW having covered the preliminaries we have to see what the problem is
with opaque pointers. These are values that seem like ordinary values to Felix,
but underneath they’re pointers. So when you “pass them by value” syntactically you are
actually “passing by reference” semantically.

The exemplar is varray. But in Objective C it is EVERY TYPE.

The critical feature of these types is typically that the opaque pointer CANNOT
BE DEREFERENCED. It refers to an uncopyable shared value.

So with SmallClass I added some ivars and hand implemented the standard protocol:

@implementation SmallClass
- (instancetype)init {
self = [super init];
self->x = 42;
self->y = 98;
return self;
}
….


objc-bind
@interface SmallClass<hasDescription>
{
int x;
y: int;
}


println$ sc.x, sc.y;
&sc . x <- 88;
println$ sc.x, sc.y;

Result:

(42, 98)
(88, 98)

So this all works fine but it is STUPID. Varray has the same problem.
Why take the address of variable sc, when it is not in fact being modified?????
It is the structure it points at that is being modified.

In fact what we want is to emulate pointers so that:

sc.x

is a POINTER to an int, and,

sc->x

fetches the value of the int x, and

*sc

IS ILLEGAL. But it is more complicated. Suppose x were an NSString * in C, and thus
an NSString in Felix. Then in fact

sc.x

in Felix would accept an opaque pointer of type SmallClass and return another opaque
pointer of type NSString. The problem is we want the machine address of the slot in
object pointed to by sc if we want to assign to it. So actually

sc.x // should return a MACHINE ADDRESS
sc->x // should return an opaque pointer (in the NSString case)
sc->x // should return an int in the int case

and *sc shouldn’t be allowed. So if I implement the projection “x” to return
a machine address then

*(sc.x)

will work IN BOTH CASES. In particular

sc*.x
sc -> x

will both BUG OUT because they mean

(*sc).x

and sc cannot be dereferenced. The problem is that none of these
operators can be overloaded except dereferencing. In fact,

an immediate dereference followed by a component selection

is just fine, as long as the dereferenced value is used up immediately.
In other words, it's OK to return a proxy provided it has to be immediately
used up.

Maybe operator -> should be overloadable. This would avoid the problem.






John Skaller
ska...@internode.on.net





John Skaller2

unread,
Sep 20, 2020, 9:11:18 AM9/20/20
to felix google, Gordon Childs
So half formed plan now. Review: a real pointer has the following behaviours:

*p //deref
p . field = field p // field is a pointer projection
p*. field = (*p) . field // value projection via deref
p->field // defined same as above

The meaning of . *. and -> is enforced by the parser, however *p binds
to an overloadable function deref, which by default binds to the compiler
deref intrinsic _deref. Note, ordinary overload not type class based.

With this model a value of abstract type such as a varray or this in ObjC:

type SomeStruct = "SomeStruct*";

which is supposed to work like a pointer, can overload the projection function
to return a real pointer or another abstract one, but in ObjC deref is not possible
and so the other operators will fail because the interpretation is enforced and depends
on deref.

The solution is to allow -> to be overloaded. Let's look at this one:

@interface SmallClass { NSString *st; } .. @end

This is ObjC code not Felix binding code. Now we define

type SmallClass = "SmallClass*";
fun st : SmallClass -> &NSString = "&($1->st)";

so now if sc is a SmallClass in Felix

sc . st

will be a MACHINE POINTER to the slot named st. This means

sc . st <- @"Fred";

will correctly assign a new string to the st member of the object pointed at
by st. Of course now to get the value:

*(sc.st)

will return an NSString in Felix or a NSString* in C. But this syntax is not what we want!
What we want is:

sc -> st

to return the string, that is, to act like a value projection. The only way to make this
work is to overload operator -> because the current parser definition leads to

(*sc) . st

and *sc is undefined. But this is not so easy to do because whilst sc is in general
an expression, the st there is just a name. It’s that name which can be overloaded
by making it a function, but that is already defined.

In C++ operator->() is an overloaded unary postfix operator but SPECIAL rules apply
to using it: after the function is called on a value, it is called again recursively until
a machine pointer type is found, then the fixed semantics of that are applied.
Felix doesn't currently do that.

There IS a way to make it work: we define deref to return

type proxy[SmallClass] = "SmallClass*"; // roughly!

and then define

fun st: proxy[SmallClass] -> NSString = "$1->st";

so now

sc -> st

is defined as

(deref sc) . st

and deref sc returns a proxy type which st overload accepts as an argument
and returns the required component value.

The PROBLEM with this is that proxy[T] isn’t a real type. So this would type check
in Felix:

var x = *sc;

but would fail at C++ compile time because it's a cheat. Well we could do this:

type _proxy_SmallClass = "void";
fun deref : SmallClass -> _proxy_SmallClass = "$1";

and then it all works provided we don’t try to store a value of *sc.
I will check this.

=========================

But the problem for varray is slightly different because the projection
can be an integer expression. At the moment

v . 42

is defined as

42 v

and we can define that by

fun apply[T] : int * varray[T] -> T = "$2[$1]"

because a varray is actually an ordinary machine pointer to T in C.
Varray differs from C array in that both the max length and used length
are recorded in JudyArrays which allows bounds checking and to fetch
these two associated values. The problem is the same kind of problem
for storing a value though. We need

fun apply[T]: int * _proxy_varray[T] -> &T = "$2+$1";

So then

v->42

will resolve to

(*v) . 42

which is an application of 42 to a proxy again. This invokes the apply method.

The machinery therefore also works for varray. It relies on a phantom type

_proxy_typename

to be defined along with typename. Since it's a phantom, it must be erased
during code generation. If it isn’t, what happens depends on the definition:
we could use “” or “void” or just define it to be the same type.



John Skaller
ska...@internode.on.net





John Skaller2

Sep 20, 2020, 10:22:23 AM9/20/20
to felix google, Gordon Childs
OK it works now:

objc-bind
@interface SmallClass<hasDescription>
{
int x;
y: int;
}
@end
;
...
println$ sc->x, sc->y;
sc . x <- 88;
println$ sc->x, sc->y;

output:

(42, 98)
(88, 98)

if you write


var bad = *sc;
println$ bad . x;

all hell breaks loose at C++ compile time because I bound the deref proxy type to C++ void:

field has incomplete type 'flxusr::objc__hyphen_bind::_a64464t_66332'
(aka 'void')
_a64464t_66332 bad;

Now technically I could make that work by simply binding the proxy type to the SAME type.
If I do this, it looks nice and you can even make a copy but you have to remember it works
like a pointer and *sc is STILL a pointer. You might expect since SmallClass is a pointer
it can be dereferenced to fetch the underlying structure as a value but that isn't what happens
(and cannot, because Objective C somehow prevents it: you cannot dereference an NSString *
even though it's an actual machine pointer, even in Objective C. I guess the copy constructor
is invoked and is disabled, leading to a compile time error.)

So the machinery is not exactly pleasing .. but it does work in practice.


John Skaller
ska...@internode.on.net





John Skaller2

Sep 20, 2020, 6:56:29 PM9/20/20
to felix google, Gordon Childs
Properties will work soon. They're pretty useless.
Take care: do NOT define an ivar and property with the same name.

The only attribute which has any semantics is readonly. If found,
only a getter method is generated. Otherwise a getter and setter.

PROTOCOLS AND SUPERTYPE.

**** instancetype is not implemented ****

instancetype is a kind of generic but what it really represents
is covariant return. In C++, consider

struct B { virtual B* method(); }
struct D : B { D* method(); }

This is type safe and correct because the return type of the overriding method
is covariant. This is what ObjC instancetype means. It is like writing:

struct B { virtual instancetype* method(); }
struct D : B { instancetype* method(); }

where the first use is replaced by B, and the second by D. In Objective C this
is important although it HAS NO SEMANTICS. Why? Because ObjC is dynamically
typed, so we don't care if the type returned is NSObject* or even id, method calls
on the object will work anyhow, based on the actual run time type of the object.

The difference is optimisation. If you know the type statically the method can be called
directly without using the run time method lookup table.

In the Felix binding all hell breaks loose because Felix is rigidly type safe.
Here’s what happens: Consider a protocol

@protocol Copyable -(Copyable)copy; @end

and some class

@interface MyClass<Copyable> -(int)get; @end

So now this will not work:

var my : MyClass;
var cpy = my.copy;
println$ cpy.get;

Why? because this doesn’t work either:

var cpy: MyClass = my.copy; // TYPE ERROR

because the protocol returns a Copyable. Sure, a MyClass isA Copyable
but it also has a get method and that gets lost because the return of
copy is invariant in the static object type which is just Copyable.

If we had instancetype implemented, the return would be covariant,
and the static type would be MyClass and the get method wouldn’t get lost.

This requires SIGNIFICANT CHANGES to the compiler. They're not hard
but it's very boring and it invades the system. The way we do this is we have
to keep track of the supertype and protocol methods in a table. Then when the class
is defined, we have to copy all the methods into it, inheriting the methods, and changing
"instancetype" to the current class type.

Actually the implementation of every one of these inherited methods is done the same
way, we just call the existing method and cast the return type covariantly. This is a downcast
but it is safe (assuming the underlying Objective C method calls work correctly!!).
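A sketch of such an inherited wrapper for the Copyable/MyClass example above (only the idea, not the code the compiler actually generates):

fun copy (x:MyClass) : MyClass =>
C_hack::cast[MyClass] (copy (x :>> Copyable))
;

The inner call resolves to the Copyable binding because the coerced argument has exactly that type, and the cast supplies the covariant return.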

There are two exceptions to this rule:

1. If a covariant return is NOT used in the method we don’t have to bother
2. If the method is overridden in the class we had BETTER NOT also copy it

A prime example of (2) is of course init. The current class init method usually calls
the super type init method so the two must be kept separated.


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Sep 20, 2020, 7:50:58 PM9/20/20
to felix google, Gordon Childs
Properties now work:


objc-bind
@interface SmallClass<hasDescription>
{
int x;
y: int;
}
+(SmallClass)alloc;
-(SmallClass)init;
-(int)get1977;
-(int)getsum:(int);
@property int z;
@property (readonly) q:int;
@end
requires small_class_interface, small_class_implementation, package "foundation", package "objc"
;



println$ "z=" + sc.z.str;
println$ "q=" + sc.q.str;
sc.setz' 44;
// sc.setq' 44; // this should fail
println$ "z=" + sc.z.str;


If I uncomment the setq' call:

SIMPLE NAME setq' NOT FOUND ERROR
In /Users/skaller/felix/objc-bind.flx: line 87, cols 4 to 9
86: sc.setz' 44;
87: sc.setq' 44; // this should fail
******
88: println$ "z=" + sc.z.str;

Routine: inner_lookup_name_in_env



John Skaller
ska...@internode.on.net





John Skaller2

Sep 20, 2020, 9:40:21 PM9/20/20
to felix google, Gordon Childs
OK so now, I think everything required is working EXCEPT

1. instancetype.
2. Class methods in protocols

The latter requires some more research. At present a class method in an interface
is modelled by just using the class name as part of the method name, for example:

@interface MyClass +(int)cmeth; @end

println$ MyClass'cmeth ();

This is tested:

...
+(SmallClass)alloc;
-(SmallClass)init;


var sc : SmallClass = #SmallClass'alloc.init;

But this isn't much use in protocols. The alternative is to require the class object
as an argument AND/OR an instance. So it would work like:

SmallClass . alloc

or even

sc . alloc

if we allowed an instance. But this needs a type AnyClass to be defined which is
the type of class objects, and then we need to define a constant of that type for
the class object:

const MyClassObject : AnyClass = "&MyClass";

assuming the RHS actually works which assumes MyClass is a static
object. Something like that does work but the name is not as shown.
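A form that does exist on the Objective C side (a sketch only, nothing like this is implemented) would bind AnyClass to the ObjC Class type and use a class-object expression for the constant:

type AnyClass = "Class";
const SmallClassObject : AnyClass = "[SmallClass class]";

Then SmallClassObject could be passed around as an ordinary value, at the cost of losing any static knowledge of which class it denotes.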

The downside of this is that all class methods would be global so for example
declaring “alloc” would be a problem, since there would only be one. I need to find
a better way to do this probably with a special function that accepts a selector value.

Anyhow, more research is needed there.

The instancetype problem is daunting. At the moment, the system cheats.
The AST from the parser is mapped into binding code in sex2flx in Flx_desugar,
the bindings are all ordinary Felix bindings so that is the total extent of the
invasion into the compiler. In fact it is only done there because it's easier to decode
things in Ocaml than Scheme .. it all COULD have been done in the parser.

But instancetype requires tables of methods to be kept, and the bindings of
inherited methods deferred until all the tables are generated and partially
populated with explicit methods. Then the tables have to be filled in using
the transitive closure of non-overridden methods returning instancetype.

The actual methods so inherited are just the original ones with a downcast,
although build a function in Felix is laborious it’s really only one routine.
The big problem is that the tables have to be kept, installed in the symbol
table, and then populated by the transitive closure (AFTER installing in
the symbol table, and using a suitable topological sort to find the transitive
closure). Then the extra inherited methods get added to the symbol table
and the actual tables get deleted.

It's probably easier to use an auxiliary table. In that case the job can be
finished earlier, as soon as the separate parse trees of a program are merged
into a single tree.

It remains also to handle optional methods of protocols. This is normally done by
allowing a test to determine if a method is supported, but that’s standard library
stuff I think.






John Skaller
ska...@internode.on.net





John Skaller2

Sep 21, 2020, 7:22:34 AM9/21/20
to felix google, Gordon Childs
Here’s a bad idea. All class instance and protocol instance methods have one
of the shapes

D -> C
or
D-> A -> C
or
D -> 0
or
D-> A -> 0

Suppose we have a special phantom type

instancetype_t

This type is bound as usual but it is a phantom so must be eliminated.

1. The variable C is the only one that can be instancetype_t
2. Only functional methods can have C = instancetype_t

So now it is possible by scanning the symbol table to FIND all these
methods.

Now, we ask, what is D? By construction it is always the type of the class
or protocol in which the method is defined. Therefore, we know the
containing class of all such methods.

Now, we could replace instancetype_t with D, to get a correctly formed
statically type safe method and we shall do so but we shall also do MORE.

Suppose we call the method with object type B. If the call succeeds,
then B must be a subtype of D. In this case the return type should also
be set to B.

We can do this with a trivial wrapper for the first case:

fun wrap1[D,B] (meth:D -> instancetype_t, obj:B) =>
C_hack::cast[B] (meth obj)
;

We’d prefer this:

fun wrap[D,B] (meth:D->instancetype_t) (obj:B) => …

however I’m not completely sure that will set B.

Note that despite the cast, it is typesafe! The function will fail
if meth cannot be applied to obj, in other words B has to be a subtype
of D for it to work.


The only hassle is type variables cannot play with subtypes directly.
So we must defer the correctness check like so:

fun wrap[D,B] (meth:D->instancetype_t, obj:B) =>
(meth (obj :>> D)):>>D
;

Coercions always succeed polymorphically and fail after monomorphisation.
We have to allow instancetype_t to exist now and be coerced to anything.
This one needs to be implemented with a static_cast. To be sure we can
define instancetype_t now to be void* in C, which can always be cast
to any other pointer type.



John Skaller
ska...@internode.on.net





John Skaller2

Sep 21, 2020, 12:29:20 PM9/21/20
to felix google, Gordon Childs
Yeha!

It works!! I hacked the compiler a bit.

The actual Objective C class defines these methods for SmallClass:

-(instancetype)cpy { return self; }
-(instancetype)cpywithmsg:(NSString*)s { NSLog(s); return self; }

the Felix code declares a protocol Cpy:

objc-bind
@protocol Cpy
-(instancetype)cpy;
-(instancetype)cpywithmsg:(NSString);
@end
;

which is used in the SmallClass interface

objc-bind
@interface SmallClass<hasDescription, Cpy>
{
int x;
y: int;
}
+(SmallClass)alloc;
-(instancetype)init;
-(int)get1977;
-(int)getsum:(int);
@property int z;
@property (readonly) q:int;
@end
requires small_class_interface, small_class_implementation, package "foundation", package "objc"
;

You will notice the return type in the protocol is “instancetype”.
The method is not declared in SmallClass directly, and it is NOT INHERITED.

Instead I used a trick to make this work:


// Covariance
var sc2 : SmallClass = sc.cpy;
println$ "sc2.z=" + sc2.z.str;
var sc3 : SmallClass = sc.cpywithmsg' @"Copy with message";
println$ "sc3.z=" + sc3.z.str;

Notice, cpy IS returning a SmallClass; the type of sc2 and sc3 is explicitly specified
so it MUST agree with the return type of cpy and cpywithmsg'. Actually this syntax
now allows the RHS to be a subtype of the specified type, but we have the OPPOSITE
situation here: SmallClass is a subtype of Cpy!!!

It can only work by downcasting the return type of Cpy to SmallClass and that is what
is done. AUTOMATICALLY.

The second case is similar to the first but the first is easier to explain.

When Felix sees an application

method object

which has type instancetype, the expression is coerced to the type of object.
That’s it. We KNOW its an Objective C method because that’s the only place
we should be using instancetype. Also ALL method have one of two signatures:

D -> C
D -> A -> C

are the only two options, because, I generate them like that, C has three cases:

1) 0 if the method is a procedure (not interested)
2) an ordinary type (not interested)
3) instancetype

In case three we simply coerce the result of applying the method to a value
of type OBJECT to type OBJECT.

In fact instancetype is a special compiler known type, but it is actually set
equivalent to void*. Since an instancetype is the type of an instance of a class
and ALL such types are pointers to some structure named after the class,
a simple cast of pointers is done in the backend.

The covariance detection relies on the fact that with ObjC OO the argument
of all methods is an instance type.

NOTE: none of this has ANY impact on the underlying Objective C code.
This is purely to get rigidly type safe Felix bindings. You can of course
still do any kind of Objective C junk, and if your interfaces do not correctly
type the Objective C side, stuff will not work.

This is the last major feature required to make instance methods work.
Class methods of classes should also work. It's not clear what class methods
of protocols would do since there’s no way to specify what the class is.
[We’d need a special type like classtype and there’s no way to deduce that]




John Skaller
ska...@internode.on.net





John Skaller2

Sep 21, 2020, 6:38:16 PM9/21/20
to felix google, Gordon Childs
Here’s how you make a Felix closure into a block:

var cls = fun (x:int, y:double):int { stuff .. return result; };

type myblock_t = "int (^)(int,double)";
…. cexpr[myblock_t] "^int(int x, double y) { return $1->apply(x,y); }" cls endcexpr ...

We can generalise myblock_t a bit:

type block2_t[R,P1,P2] = "?1(^)(?2,?3)" (R,P1,P2);

this is for two arguments, rinse and repeat for N=0,1,3,4, etc. The block cexpr can be generalised:

fun block2[R,P1,P2] : (P1 * P2 -> R) -> block2_t[R,P1,P2] =
"^?1 (?2 p2, ?3 p3){ return $1->apply(p2,p3); }"
;

So the block2 function takes a Felix function as an argument and returns a block.
Again we have to repeat for multiple arguments. This block captures the Felix function,
which is a pointer, from the environment.

block2 (fun (x:int, y:double):int {return x; })

should work. The Felix function has enough type information to let block2 deduce the types.
The actual function will be heap allocated and have some magical C++ class type,
so the expression will resolve to a pointer to that class, and that class will have an apply
method.

The HASSLE is this: Felix garbage collects things. But when you dispatch this block
to some system function or ObjC API thing, Felix can no longer reach it so it might
be collected. To fix that we have to make it a root to prevent collection. That's easy.
The problem is unrooting it when it is no longer required. It would be nice if ARC could do that
but I suspect we would just tolerate leaks :-)


John Skaller
ska...@internode.on.net





John Skaller2

Sep 21, 2020, 10:09:52 PM9/21/20
to felix google, Gordon Childs
So here’s an actual working demo:

/////
header block0_typedef = """
template<class T>
struct block0 {
typedef T (^type) (void);
};
""";
type block0_t [R] = "block0<?1>::type" requires block0_typedef;
fun invoke0[R] : block0_t[R] -> R = "$1()";

var x = 42;
var f = cexpr [block0_t[int]] "^int { return $1; }" (x) endcexpr;

println$ f.invoke0;
x = 99;
println$ f.invoke0;
///////////

The header above is required because Felix does not support C type syntax.
When you say

type T = "U"

Felix just says

typedef U _t8476798;

This works fine for “int” and “int*” but not function types. So you have to first
make a typedef in C that reduces the function type to a syntax Felix accepts.
I used an old trick, a nested typedef, but the more modern

template<class T>
using typedefname = typeexpr_using_T;

from C++ would work too. I should populate the C++ headers with these things.
(They’re needed for plugins already I think .. did I already do it??)

Having done that I can then use the C++ template to make a type binding.
Note that ?1 is replaced by the first type variable’s binding on use.
Finally I can define the invoke function to call a block0.

Having done all that, we can now defined a block using a cexpr, and then
invoke it. Note this block captures x.

Apple/Clang say they capture by value at the time the closure is created.

WRONG. Apple LIES AGAIN.

The value of x is changed from 42 to 99 AFTER the closure is created
and stored in variable f. The second invocation prints 99.

Here is the ACTUAL CODE FELIX GENERATES:

void _init_(thread_frame_t *ptf){
ptf-> x = 42; //init
ptf-> f = ^int { return (ptf-> x); }; //init
{
_a17576t_65778 _tmp65786 = ::flx::rtl::strutil::str<int>(((ptf-> f)())) + ::std::string("\n") ;
::flx::rtl::ioutil::write(stdout,((_tmp65786)));
}
fflush(stdout);
ptf-> x = 99; //assign simple
{
_a17576t_65778 _tmp65787 = ::flx::rtl::strutil::str<int>(((ptf-> f)())) + ::std::string("\n") ;
::flx::rtl::ioutil::write(stdout,((_tmp65787)));
}
fflush(stdout);
}

ptf is a pointer to a struct of type thread_frame_t which contains the global data.
So ptf is on the stack whilst this particular function is executed, but x is not.
So I am guessing it copies ptf when the block is created, which has no effect,
it doesn’t copy ptf->x, the actual expression representing what is called x in Felix.

In particular blocks created inside C++ class non-static members will probably
just copy “this” not the non-static member variables of the instance.

Similarly blocks WILL NOT WORK AS ADVERTISED IN OBJECTIVE C.
Because in Objective C everything is a pointer to a structure, and only
the pointers will ever be copied. The only exception is int etc.
ObjC passes all class typed things by reference like almost all OO languages.
And Felix passes the context of closures the same way: as a pointer to the
lexically enclosing functions frame object, which is precisely a C++ class instance.

Perhaps blocks will work in functional C++ code because C++ uses value semantics
for types like strings, vectors, etc etc. Whole data structures get copied if you pass
by value, using copy constructors. But this will ONLY apply to the machine stack,
not the context of methods implicitly referred to by this (well I haven't tried it but
as a language designer .. it really HAS to work as described). It's actually amazing
the compiler can tell when a variable referred to in a block refers to an upscope
automatic (stack) variable.



John Skaller
ska...@internode.on.net





John Skaller2

Sep 21, 2020, 10:30:19 PM9/21/20
to felix google, Gordon Childs
Hmm, here’s something I didn’t implement:

id<p1, p2>

in ObjC this is an opaque type providing protocols p1 and p2.

In Felix, if some value x provides protocol p1, you can just write

fun f(x:p1) => …

because p1 is a type. You can dispatch all the methods of protocol p1 on x safely.

But there is no type p1_and_p2. A value conforming to both can be coerced to either.
This trick fixes it:

fun f(x1:p1, x2:p2) => ….
.. f (x,x) ...

but is hardly satisfactory. Protocols have record semantics, that is, if one has the same methods
as another and then some extra ones, it's a subtype. Subtyping means adding methods.

Therefore a protocol with all the methods of both p1 and p2 is a subtype of both p1 and p2.
In Felix record type addition can express this I think. Have to check.

But we need a way to express it in the binding to objective C and I don’t have one.
We want to write:

fun f(x:(p1 + p2))

meaning “conforms to both p1 and p2”. I think this is actually an intersection type.
It’s a bit weird that an intersection EXPANDS the methods of a type but that’s the case
for the same reason adding methods to a type makes it a subtype (smaller not bigger).
You can think of adding methods as making the type "more specialised" and then it makes sense.

The problem can be easily solved by creating a new nominal type:

class both<p1,p2>;

Done. But this is really bad. We really do want a combinator. Felix actually has intersection
types but they’re not used because I never figured out the semantics. But this is a case
where we actually need them.


John Skaller
ska...@internode.on.net





John Skaller2

Sep 21, 2020, 11:57:51 PM9/21/20
to felix google, Gordon Childs
Ok so I have introduced some convenience definitions:
In C++ header files:


template<class R,class P1, class P2>
struct _type3 {
typedef R (*cfun)(P1,P2);
typedef void (*cproc)(R,P1,P2);
#ifdef __clang__
typedef R (^bfun)(P1,P2);
typedef void (^bproc)(R,P1,P2);
#endif
};


and in Felix:

type cproc3[R,P1,P2] = "typename _type3<?1,?2,?3>::cproc";
type cfun3[R,P1,P2] = "typename _type3<?1,?2,?3>::cfun";
type bproc3[R,P1,P2] = "typename _type3<?1,?2,?3>::bproc";
type bfun3[R,P1,P2] = "typename _type3<?1,?2,?3>::bfun";

The integer specifies the number of type parameters and ranges from 1-5
at the moment. Note that the Felix bindings are NOT the same type as
for example:

P1 * P2 —> R

which is a C function type in Felix. The underlying C types ARE the same
but Felix doesn’t know that.

The reason for these is that Felix generates C code for a variable or
parameter like this:

T name;

This works for T = int, and T = int *, but it doesn’t work if T is a function type.
Or Clang block type. Or an array type for that matter, or C++ pointer to member,
or any of the other idiotic syntaxes used for types in C. The templates bring the
syntax back to something halfway sane.

The point is when you use a Felix type bound to a C++ type, the compiler cannot
“see into” the quoted text in the definition. So it HAS to be a syntactically “atomic”
type expression.
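A tiny C++ illustration of the point (a sketch with invented names): writing a function
type out literally buries the variable name inside the declarator, but wrapping it in a
struct typedef gives it an atomic spelling, so a declaration of the form "T name;" works again.

#include <iostream>

// Literal spelling: the name sits in the middle of the type syntax.
//   int (*f)(int);
// Wrapped spelling: the type is a single qualified name again.
template<class R, class P1>
struct _type1 {
  typedef R (*cfun)(P1);
};

static int twice(int x) { return 2 * x; }

int main() {
  _type1<int,int>::cfun f = twice;  // "T name;" style declaration
  std::cout << f(21) << "\n";       // prints 42
}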

So here is the old code again:

////////////////
header block0_typedef = """
template<class T>
struct block0 {
typedef T (^type) (void);
};
""";
type block0_t [R] = "block0<?1>::type" requires block0_typedef;
fun invoke0[R] : block0_t[R] -> R = "$1()";

var x = 42;
var f = cexpr [block0_t[int]] "^int { return $1; }" (x) endcexpr;

println$ f.invoke0;
x = 99;
println$ f.invoke0;
/////////////////

and here is the same thing using the new typenames:

///////////
fun invoke0[R] : bfun1[R]->R = "$1 ()";

var x = 42;
var f = cexpr [bfun1[int]] "^int { return $1; }" (x) endcexpr;

println$ f.invoke0;
x = 99;
println$ f.invoke0;
///////////

We got rid of the boilerplate crud by standardising it.



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Sep 22, 2020, 10:21:20 AM9/22/20
to felix google, Gordon Childs
It is necessary due to ObjC protocols to introduce intersection types.
I want to explain what they mean to myself mainly so I know how to implement them.

But first the motivation. Felix implements protocols and Objc instance types with inheritance
in a very clever way using the existing subtyping rules. Felix allows you to say a nominal
type U is a subtype of another type P. For stuctural types there are fixed rules built into
the unification engine.

The common example of subtyping which is at first confusing and then easy to understand
is record types, so we will start there. Suppose you have a function that requires a record
with fields f and g for example:

fun pr(x:(f:int, g:int)) => x.f.str + "," + x.g.str;

Now, it is safe to call

pr (f=1,g=2,h=3)

even though the argument is the wrong type, because it has the fields f and g
which function pr uses, so the extra field can’t hurt. Because of this understanding,
a value of the argument type here isA value of the type of the parameter x; in other words,
it’s a subtype. The way we implement it is as follows:

First, the unification engine says “passing the argument is OK because its type
is a subtype of the parameter”.

Next, when we are binding the application we notice the type of the argument is
wrong, but since unification says its OK anyhow we COERCE the argument to
the type of the parameter so now the coerced argument is exactly the right type:

pr ((f=1, g=2, h=3) :>> (f:int, g:int))

The coercion throws out the h field from the type, but what happens at run time?
In OO systems and FPLs with boxing everything is a pointer and probably nothing
happens. But in Felix, a record is a struct with the fields organised in alphabetical order.
So the coercion actually creates a new record of the correct type by filling in the required
fields from the argument and dropping others. The point is in Felix, coercions often
DO SOMETHING, they are not just casting the type to some other type.
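In C++ terms the coercion looks roughly like this (a sketch with invented struct names):
the argument struct has an extra field h, and coercing it to the parameter type builds a
brand new struct holding only f and g.

#include <string>
#include <iostream>

struct rec_fg  { int f; int g; };          // parameter type (f:int, g:int)
struct rec_fgh { int f; int g; int h; };   // argument type (f:int, g:int, h:int)

static rec_fg coerce(rec_fgh const& x) {   // the coercion DOES SOMETHING:
  return rec_fg{ x.f, x.g };               // copy the wanted fields, drop h
}

static std::string pr(rec_fg const& x) {
  return std::to_string(x.f) + "," + std::to_string(x.g);
}

int main() {
  std::cout << pr(coerce(rec_fgh{1,2,3})) << "\n";   // prints 1,2
}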

For nominal types, the user simultaneously specifies both the subtyping rule
and the coercion operation in one statement:

supertype (x:Super): Sub => ….;

or if they’re primitive bindings

supertype : Super : Sub = "…. what to do here .. ";

So the lesson so far: for records, adding fields creates a subtype.

What about ObjC protocols?

Same lesson. Adding methods creates a subtype.

Now look at this Felix binding:

@interface X<A,B> …

What this says is that class X instances have all the methods of protocol A and
all the methods of protocol B as well as any other method specified.

In other words, X is a subtype of A and also a subtype of B. In Felix A, B are
nominal types generated by the binding but they’re PHANTOM types in the sense
that no value has exactly those types. X values have each of those types as well
as X, so subtypes of A and B but are not exactly A or B usually.

In ObjC class instance values are just pointers, and a protocol underneath
is just some void* which is CAST to a type in the compiler to allow overloading
to find methods defined with that type as an argument.

But now there’s a problem. A function that needs both A and B: what is the
parameter type?? Whatever it is, X has to be a subtype of it!

The answer is an intersection type:

A & B

Remember ADDING methods or fields makes a type SMALLER not bigger.
There are FEWER values that have the additional methods than there are
without them. So intersection is ADDING methods because having a method
is a constraint, restriction, refinement or just plain smallifying operator :-)

The unification engine has the rule:

X < A & B iff X < A and X < B

in other words the argument type must be a subtype of BOTH the types
of a parameter whose type is an intersection of two types.

The problem is, how do we do overloading and what is the coercion required
to force the argument to the right type?

The answer is obvious now. What we do is try to overload on the first type
of the intersection. If that succeeds, we coerce the argument to that type.
Otherwise we try the second type.

I want to note now that Felix is going to do it in order, even though
intuitively & is symmetric:

A & B = B & A

in Felix this is NOT going to be true (although I could implement that;
doing things in sequence is easier than in parallel :-)

It matters. Consider the intersection of two record types with the SAME
field. What happens? Is it ambiguous or what?

In Felix, GIVEN ordering, its no problem because records allow duplicate
fields. The duplicates do NOT merge, the intersection record has BOTH.
But one must be found first so ordering matters.

OK so what if we have a tuple containing TWO intersections each of two
types? Then we must consider all 4 possible combinations. That part is easy,
and the order of the loop through each intersection just got specified,
but which one do we loop through first??

And how do we tell??? It’s easy to find an intersection type, but what if
the argument type is:

P * (Q & Y)

The answer in this case is obvious. This is exactly the same as

(P * Q) & (P * Y)

that is, intersection distributes over product formation. And clearly
over sum types too. The rule for functions is a bit trickier :-)

The point is that we HAVE to expand the types so the intersections
are always at the top otherwise we cannot simply run through the
intersection components trying overloads.



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Sep 22, 2020, 11:10:26 PM9/22/20
to felix google, Gordon Childs
YeHA!

A simple case of intersection types is now working. There’s more to do but
this is really nice:

//////////////////////////////////////
objc-bind
@protocol hasDescription
-(NSString)description;
@end
requires small_class_interface, small_class_implementation, package "foundation", package "objc"
;

objc-bind
@protocol Cpy
-(instancetype)cpy;
-(instancetype)cpywithmsg:(NSString);
@end
;

objc-bind
@interface SmallClass<hasDescription, Cpy>
{
int x;
y: int;
}
+(SmallClass)alloc;
-(instancetype)init;
-(int)get1977;
-(int)getsum:(int);
@property int z;
@property (readonly) q:int;
@end
;

var sc : SmallClass = #SmallClass'alloc.init;

// Covariance
var sc2 : SmallClass = sc.cpy;
println$ "sc2.z=" + sc2.z.str;
var sc3 : SmallClass = sc.cpywithmsg' @"Copy with message";
println$ "sc3.z=" + sc3.z.str;

// Intersection ******** LOOK HERE *************************
fun dc (x: hasDescription & Cpy) => x.cpy.description.str;
println$ sc.dc;
//////////////////////
sc2.z=44
2020-09-23 12:47:17.517 flx_run[22121:5639250] Copy with message
sc3.z=44
SmallClassInstance
//////////////////////

The function dc simply requires its argument to support both hasDescription and
Cpy protocols, which SmallClass does. The test first calls the cpy method
of the Cpy protocol, then calls the description method from the hasDescription
protocol on that.

Now do not be confused: it is clear why it works at the Objective C level.
That’s nothing interesting. The critical thing is that we have two features:

1. covariant instance types
2. anonymous (structural) intersection types

which enable full static type checking with correct method selection overloading
on a value of intersection type. Felix looks for the method in each intersectand
type: it performs TWO overloads and merges the results.

Intersections should work for records too, however protocols are just void* underneath
which is fine because all ObjC class instance types are just machine pointers.
Record values on the other hand are structs and subtyping coercions actually
work by copying fields to construct a new value. Function coercions are even
trickier.

As well as this the implementation is not complete. It relies on the intersection
being at the top level of any type. If an intersection is buried in the type,
for example a component of a tuple, it won’t work.

The fix for this is simple: all nested intersections will be top leveled
by every type constructor to ensure the normal form (intersections
are always at the top level). There are a few other normalisation rules:

* an empty intersection is void
* a one component intersection is just the component

The main normalisation rule is problematic! Consider

(A&B) -> C

Do we really want this type to be

(A->C) & (B->C)

??

in addition the loss of principal typing means types with intersections may not
always work when they contain type variables. The binding of a subtype argument
to an intersection parameter should always work though: that case supports
principal typing.


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Sep 23, 2020, 8:03:57 PM9/23/20
to felix google, Gordon Childs
Just to be clear on the unification issue. In the Ocaml code there is a routine “ge p a” that tests whether
a parameter is a supertype of an argument and provides the most general unifier (MGU) which makes
this so. The MGU is a list of assignments to dependent type variables. Actually the
routine requires a set D as well which lists the dependent type variables we are trying to solve for,
and takes a list of inequalities. If an inequality is not appropriate the routine tries for equality. The code
is found in src/compiler/flx_core/flx_unify.ml.

For example here is part of ge:


| BTYP_array (l, BTYP_unitsum n), BTYP_tuple rs ->
if List.length rs <> n then raise Not_found;
List.iter (fun r -> add_ge(l,r)) rs

This examines if an array parameter type l^n is a supertype of a tuple argument
with component list rs. If the length of the array and tuple don’t agree Not_found
is thrown indicating unification is impossible. Otherwise we add new inequalities
to solve, l must be supertype of each r in the list rs. Note it is MORE than just that,
we are looking for a set of assignments to type variables which make this true
for all such r *simultaneously*.

There is another rule:

| BTYP_tuple ls, BTYP_array (r,BTYP_unitsum n) ->
if List.length ls <> n then raise Not_found;
List.iter (fun l -> add_ge(l,r)) ls

which handles the case where the parameter is a tuple and the argument is an array.
We also have more cases:

| BTYP_tuple ls, BTYP_tuple rs ->
if List.length ls <> List.length rs then raise Not_found;
List.iter2 (fun l r -> add_ge(l,r)) ls rs

| BTYP_array (l, BTYP_unitsum n), BTYP_array (r, BTYP_unitsum m) ->
if m <> n then raise Not_found;
add_ge (l,r)

If you have followed some of the compact linear type stuff you will recognise the array
cases ARE NOT FULLY GENERAL. The algorithm cannot handle array exponents
which are compact linear but not unitsums. IT SHOULD.

Now, I am showing you this to show you how it works. For intersections:

| BTYP_intersect ts, t ->
List.iter (fun p -> add_ge (p, t)) ts

This says that the argument type t must be a subtype of the parameter type p,
for every p which is an intersectand of the parameter. We have to set the type
variables that make this so simultaneously. This is the standard subtyping rule.

Here’s the second case:

| t,BTYP_intersect ts ->
raise Not_found

WOOPS!!! There are some things to note. First, Ocaml processes pattern matches
in order so the parameter type t here CANNOT be an intersection because that
was already covered by the first rule. Also other cases are excluded because
the type constructors enforce normal forms. The ts list cannot be empty or contain
only one intersectand.

Never the less this case is obviously solvable:

int >= int & int

with an MGU containing no type variable assignments. And this one is too:

P >= A & A

where P is an independent (universally quantified) type variable, with the MGU

P <- A // assign A to P

because A & A = A. The rule for a JUDGEMENT here is simple: if ANY of the RHS intersectands
is a subtype of t, the intersection must be too. The intuition is: if one of the intersectands is a subset
of t, then intersecting it by anything can only make it even smaller, the result must remain
a subtype. Notice that this is sufficient but not necessary, it can still be that no intersectand
is a subtype, but the intersection is. This cannot be solved by structural comparisons however,
which is all unification engines can do.

There is a way to use unification to make this judgement. We only need to solve ONE
of the inequalities to be be happy. We cannot add to the list of equations to be solved
simultaneously, what we do instead is call unification RECURSIVELY for each component.
If at least one such call finds a unifier, the subtyping condition is satisfied.

THE PROBLEM IS THERE COULD BE MORE THAN ONE SUCH UNIFIER.
What do we do? Pick one? Unification is supposed to find the MOST GENERAL unifier.
And in this case there isn’t one.

Now I need to explain, why we MUST have the most general unifier. What we are actually
doing is taking the application of a polymorphic function, that is, a template, and specialising
it by fixing all the type variables so it handles a particular argument. Usually, the argument
can only contain existential type variables, that is, unknown types but ones which
cannot vary and are NOT dependent variables the unification is trying to set.
they look the same syntactically but they’re effectively constant types like “int”.
This happens when a polymorphic function is nested in another one, we’re overloading
the inner function and the overloads on the outer one are not relevant so its type
variables, universal when solving for it, are existential inside its body. This is something
type theorists never seem to understand.

But there is one special case: recursion. If a function calls itself we have to simultaneously
set the type variables for the outer (non-recursive) call to the function and the inner
recursion. We use alpha conversion to ensure we do not get confused about which
variables are dependent and which not in the inner call. But a solution requires
the most general unifier otherwise we might end up with polymorphic recursion
which we want to avoid: polymorphic recursion only works for polymorphic functions
and Felix has no polymorphic types (it uses type schema, so only functions can
be polymorphic).

In any case, finding the most general unifier is the only way to allow successive
refinement. If we didn’t use the most general case subsequent refinements
which would otherwise succeed could fail.

But it cannot be done for the case above EVEN THOUGH WE CAN JUDGE
THE SUBTYPING IS SOUND.



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Sep 24, 2020, 1:06:27 PM9/24/20
to felix google, Gordon Childs
I’m still struggling with unification. So let’s start with an example

K > T & U

Now I’ve been thinking about this wrong. We could set T to K, and U to anything bigger than K such as W,
the universal type OR U to K, and T to say W. So there are two unifiers neither more general than the other.

But this case simply cannot happen! The type variables in an inequality are always on the LHS.
We do not need to track which variables are dependent. We do this only to simplify equalities.
For example the cases

T^N = U * U

can be solved for the LHS by T=U, N=2. If we swap the order

U * U = T ^ N

we get the same solution: it’s an equality, PROVIDED we’re hunting for T and N. So by keeping
track of the dependent variables we reduce the number of pattern matches needed by half.

But in Felix inequalities

P > A

are ALWAYS trying to find substitutions for type variables of a function occurring in a parameter.
There are no variables in an argument that need setting (except in the type recursion case where
a variable ends up on both sides and that’s a mu variable defining the recursion which has
nothing to do with unification, it just happens to drop out of solving equations).

Felix does sometimes test for equivalence like this:

A > B and B > A

however. For example if f:A->B and x:C, and we see f x, Felix checks if A = C.
If not Felix inserts a coercion: f (x:>>A) to force the argument to the function domain.
Equality of intersections is really hard. But actually we can do this without caring
about a unifier. We could just insert the coercion anyhow. After monomorphisation
the coercion can be removed if the types are equal and that is easy to judge.

The problem case is still

K > T & U

but now T and U are monomorphic so we’re only trying to find substitutions for type variables
in the K term.

This looks hard but it isn’t! On the RHS there are only a very limited number of possible cases!

If the RHS terms are simply monomorphic nominal types, then:

if T > U then T & U = U, so K can be matched against a single non-intersection
which we can already do. If neither is a subtype of the other, the intersection is
irreducible and so we can just set K = T & U.

Since we use nominal phantoms for method protocols, the answer for records
must be the same. But the thing is we can actually CALCULATE the intersection.
We just collect all the mentioned fields, and intersect the result with any type variables
which are intersectands. These MUST become records on monomorphisation
or the intersection is void.

So actually there are only three kinds of base intersectands:
nominal types related by the subtyping graph the user supplied,
and records. Any other intersectand type has a shape, such as a tuple,
in which some component could be a base type (record or nominal type).
In this case ALL the intersectands must have this shape, since otherwise
the intersection is void.

In this case we perform a contraction, that is, we make a single term
of that shape, with a nested intersectand. Now we apply the rest of the algo.

For example:

(int * A) & (int * B) —> (int * (A & B))

I need to consider function types as well. But other types simply cannot be
intersected structurally, only their components can be.






John Skaller
ska...@internode.on.net





John Skaller2

unread,
Sep 26, 2020, 8:21:24 AM9/26/20
to felix google
This works:

~/felix>cat tmp.flx
struct X { x: int; y: double; };
var x = X (y=2.3,x=1);
println$ x._strr;

~/felix>flx tmp
X {x=1,y=2.3}

It works for cstructs too. The demo shows you can provide a record argument to a struct
constructor, where the record has the same field names and field types as the struct.

But this doesn’t work and I can’t figure out why:

typedef AudioFormatFlags = uint32;
typedef AudioFormatID = uint32;
cstruct AudioStreamBasicDescription {
mBitsPerChannel : uint32;
mBytesPerFrame : uint32;
mBytesPerPacket : uint32;
mChannelsPerFrame : uint32;
mFormatFlags : AudioFormatFlags;
mFormatID : AudioFormatID;
mFramesPerPacket : uint32;
mReserved : uint32;
mSampleRate : double;
};

// FIXME: cstruct should accept a record
var asbd = AudioStreamBasicDescription (
mSampleRate = 44100.0,
mFormatID = kAudioFormatLinearPCM,
mFormatFlags = kAudioFormatFlagIsFloat,
mBytesPerPacket = elementSize,
mFramesPerPacket = 1u32,
mBytesPerFrame = elementSize,
mChannelsPerFrame = 1u32,
mBitsPerChannel = 8u32 * elementSize,
mReserved = 0u32);

If I make an uninitialised var, and assign the components, there are no errors.
However subtyping could interfere so that doesn’t prove it should work.

The compiler says the record has type:

record(
mBitsPerChannel:(uint32),
mBytesPerFrame:(uint32),
mBytesPerPacket:(uint32),
mChannelsPerFrame:(uint32),
mFormatFlags:(uint32),
mFormatID:(uint32),
mFramesPerPacket:(uint32),
mReserved:(uint32),
mSampleRate:(double))

which is correct and the field names are all correct too as far as I can tell.
The constructor argument type is:

uint32 * uint32 * uint32 * uint32 * uint32 * uint32 * uint32 * uint32 * double

however it really should accept a record also.

in fact, the responsible routine is in src/compiler/flx_bind/flx_reorder.ml and it
actually matches parameter names with the record argument.


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Sep 26, 2020, 3:46:13 PM9/26/20
to felix google


> On 26 Sep 2020, at 22:21, John Skaller2 <ska...@internode.on.net> wrote:
>
> This works:
>
> ~/felix>cat tmp.flx
> struct X { x: int; y: double; };
> var x = X (y=2.3,x=1);
> println$ x._strr;
>
> ~/felix>flx tmp
> X {x=1,y=2.3}
>
> It works for cstructs too. The demo shows you can provide a record argument to a struct
> constructor, where the record has the same field names and field types as the struct.
>
> But this doesn’t work and I can’t figure out why:

because it was in an “open class thing { .. }”

and the handling of shadow scopes didn’t check.
It does now. However the initialisation is still wrong.
I think it uses a reinterpret cast which cannot possibly work with a record.



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Sep 28, 2020, 8:55:14 PM9/28/20
to felix google
Subtyping judgements with intersection types and objC protocols.
==================================================

I ran into a surprise here so I want to explain what should happen.
There are some cases that do NOT work but I don’t have a test case yet.
Our domain of discourse here is MONOMORPHIC subtyping and intersections,
polymorphic intersections are harder.

First: the representation of an ObjC protocol (or class) is a machine pointer.
Coercions are all nops: if A isA B, we can just use an A where we need a B.

An ObjC protocol is basically a set of methods. However this is IRRELEVANT
except to understand that an intersection of protocols

typedef AB = A & B;

consists of the methods of A and B. We will assume there are no conflicts
for the moment.

When you define a new protocol:

objc-bind @protocol AnB <A,B> @end;

this is NOT the same as the intersection AB. The reason is that this
is a NOMINAL type with coercions:

AnB -> A
AnB -> B

The intersection AB supports the same coercions. It SHOULD be the
case also that

AnB -> AB
// AB -> AnB NO NO

The reason for the last subtyping rule being not allowed
is to be seen by considering we MIGHT have written:

objc-bind @protocol AnB <A,B> -(int)x; @end;

so that AnB also has an extra method. Even though we didn’t, Felix
still does not allow AB -> AnB; although I could have implemented
that, code which typechecks would break if the extra method were added.

The core subtyping rules are:

A & B -> A
A & B -> B

The logic is: A & B has the methods of both A and B, so we can coerce it to A
by throwing away the B methods; for coercion to B we throw out the A methods.

If a type T is a subtype of A and also a subtype of B, it must also
be a subtype of A & B because it has all the methods of A and of B:

T -> A and T -> B ==> T -> A & B

Its not surprising, this is the definition of intersection!

What is not immediately obvious is this profound observation:

Let S be the category of types with arrows A -> B if A is a subtype of B.
The above rule now says

Let T, A, B be any types. Then there exists a unique h: T -> A & B
and functions piA: A & B -> A and piB: A & B -> B
such that ANY f: T -> A, g: T -> B can be FACTORED THROUGH A & B as follows:

h . piA : T -> A = f
h . piB : T -> B = g

Do you recognise this rule?

YOU DANG WELL SHOULD!!!!! Go back to school if you don’t!!!
This is the DEFINITION OF A PRODUCT.

We consider the arrows of S to be unique, in that there is at most
one arrow from A to B, either A is a subtype of B or not.

In Felix it would otherwise be ambiguous, however it is (or should be) allowed:
Felix should just pick one coercion. We have to do this because suppose A -> B
and B -> C then A -> C using two coercions. However the user
can specify another coercion A -> C as an optimisation.

Note that UNLIKE tuples, the & operator is both symmetric and associative.
For this subtyping lattice to be complete we need a minimal and maximal element.
The minimal element is clearly void or 0, it is a subtype of everything (including itself).
For ObjC that would be a NULL pointer because it clearly has “no methods”.

The top of the lattice is trickier! In ObjC it is “NSObject” if we like, or more extremely
the type id because everything is a subtype of id. This type has ALL the methods!
In particular ANY method can be safely called on id. Since this is clearly nonsense,
such a method call had better fail with an exception, the type represents something
for which a method call cannot return. The only possible exception is the copy operation
since its a nop.

In the literature this type is called w, which is lower case Omega.
[Its weird because w <= w -> w]

Felix has a related type defined by the recursion:

typedef any = any;

or the typecode

typedef any = x as x;

It is a self recursion:

\mu x . x

Every type satisfies the equation. I want to cover the rules for functions
using 0 and any. We note that 0 is not inhabited. We also note functional
programming weenies are WRONG not to allow this type! Some incorrectly
believe that because 0 maps to false via the Curry Howard isomorphism,
void cannot be allowed because the type system would be unsound.
This is WRONG. The theorem says the type system is unsound
if an INHABITED type maps to false, but void isn’t inhabited.

First we have to note 0 is not only useful it is MANDATORY.
A type system without it is NONSENSE. An example:

int ^ 0 = 1

is perfectly good data type, its an empty array which is precisely
the unit tuple. For all types T there is also a unique function

0 -> T

This function characterises T, that is, there’s a 1-1 correspondence
between types and the characteristic function of that type.

Of course this function cannot be applied because there are no
values of type 0, so there is no way to construct an argument
of type 0 to apply the function to, but it definitely exists.

Note that if you have a function with “several parameters” one
of which is type void, the function cannot be called because
the product of 0 and anything is 0.

Ok, so what about a function returning 0?
Well a function you write which accepts an argument of type A,
cannot map any elements of A to elements of 0, because 0
is empty. So it must also be that A = 0. In that case, the
function cannot be applied, so the fact it returns 0 is just fine.
It is in fact the identity for 0.

Let’s now consider “any”. If a parameter is any, the result of evaluating
the argument can be discarded because the parameter cannot be used.
This is true even if a function (such as this one!) taking any is applied to it,
because that function, in turn, cannot use the value. However note such
a function can return a value. If the domain of the function is any, the function
is necessarily constant.

If a function returns any, its result cannot be used. The function

0 -> any

exists and characterises any. The function

any -> 0

does not exist because that would imply 0 = any.

Now the key thing is to understand that any is the unit of intersection:

any & B = B
A & any = A

and therefore is the right choice for the nullary intersection of nothing.
It is the analogue of the unit tuple type being the empty product, for indeed,
in the subtyping category with intersections as products, intersection is
indeed .. a product :)

Representation note: the function 0->T for any type can be represented
in Felix by a NULL function pointer (recall in Felix a function is C++ heap
allocated object passed as a value by passing a pointer to it).

The function any -> K can be optimised away if the constant value in K is known.
Otherwise it has to be a real function. Its argument, however, can be just unit,
provided the argument actually passed is coerced to unit. This is safe, because
the resulting parameter value cannot be used.

To get the typing right, we must distinguish type variables that can be set to
0 and those that can’t. For example 0 is a valid type for variable N in
the array formula X ^ N. However it is not valid in the formula M * N.
Although that product exists, it doesn’t uniquely have two components.



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Sep 29, 2020, 8:30:06 PM9/29/20
to felix google
Felix has a feature designed to support using Felix functions as callbacks.
I’m going to explain how it works with a real example.

First here is the documented C type of the callback:

typedef OSStatus (*AURenderCallback)(
void *inRefCon,
AudioUnitRenderActionFlags *ioActionFlags,
const AudioTimeStamp *inTimeStamp,
UInt32 inBusNumber,
UInt32 inNumberFrames,
AudioBufferList *ioData
);


The following Felix C function almost has the right type:

cfun audioRenderCallback (
inRefCon: address, // void*
ioActionFlags: &AudioUnitRenderActionFlags,
inTimeStamp: &<AudioTimeStamp,
inBusNumber: uint32,
inNumberFrames: uint32,
ioData: AudioBufferList
) : OSStatus
{…}

namely

typedef audioRenderCallback_t =
address * // void*
&AudioUnitRenderActionFlags *
&<AudioTimeStamp *
uint32 *
uint32 *
AudioBufferList
--> OSStatus
;

Notice the --> used, which in Felix denotes a C function, as opposed to a Felix function.
C functions are machine pointers into the machine code. Felix functions are pointers
to C++ class objects which have an apply() method.

There is an issue in Felix at the moment, that read-only pointers like

&<AudioTimeStamp

are translated in the compiler back end to

AudioTimeStamp*

and not

AudioTimeStamp const*

which is required, so for the moment we have to cast the generated C function pointer:

ctypes AURenderCallback;

// Force the Felix cfun to have the type Apple wants
fun castCB : audioRenderCallback_t -> AURenderCallback =
"reinterpret_cast<AURenderCallback>($1)”
;

The pointer is installed in a control block:

cstruct AURenderCallbackStruct {
inputProc: AURenderCallback;
inputProcRefCon: address;
};

which accepts a C callback function pointer and a void* containing arbitrary
client data. The OS calls the callback passing the client data inputProcRefCon
to it as the first argument. Here is the required construction:

AURenderCallbackStruct (castCB(audioRenderCallback), (&sampleNumber).address)

where sampleNumber is an int, a pointer to which is cast to an address, installed in the
control block, and when the callback is invoked pass to it: the callback then casts
it back to an int*. The castCB is required only because of the const* issue.
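For reference, the raw pattern in plain C++ looks like this (a sketch with a stand-in
API, not the real AudioUnit one): a bare function pointer plus a void* of client data,
with the callback casting the void* back to its real type.

#include <iostream>

typedef int (*RenderCallback)(void* refCon, int numberFrames);

struct CallbackStruct {            // plays the role of AURenderCallbackStruct
  RenderCallback inputProc;
  void*          inputProcRefCon;
};

static int myCallback(void* refCon, int numberFrames) {
  int* sampleNumber = static_cast<int*>(refCon);  // cast the client data back
  *sampleNumber += numberFrames;
  return 0;                                       // "OSStatus" success
}

int main() {
  int sampleNumber = 0;
  CallbackStruct cb{ myCallback, &sampleNumber }; // install callback + data
  cb.inputProc(cb.inputProcRefCon, 256);          // what the OS would do
  std::cout << sampleNumber << "\n";              // prints 256
}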

================================================

What I’ve shown is the dumb way. Here’s the smart way:


// Define thunk
callback fun audioRenderCallback_thunk :
audioRenderCallback_thunk * // void*
&AudioUnitRenderActionFlags *
&<AudioTimeStamp *
uint32 *
uint32 *
AudioBufferList
-> OSStatus
;

This defines a C function where the SPECIAL use of the function name in its
own argument list signifies where the underlying C callback type expects
a void*.

Felix generates a C function which can be used as a callback, but which accepts
in the first slot above a Felix function pointer. The thunk calls the Felix function,
returning what it returns.

The thunk is a fixed C function which works for all callbacks of the specified type.
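The idea in plain C++ (a sketch only, with std::function standing in for a Felix
closure): the client data slot carries a pointer to a closure object, and one fixed
thunk casts it back and applies it, so both the code and the state come from the closure.

#include <functional>
#include <iostream>

typedef int (*RenderCallback)(void* refCon, int numberFrames);
using FlxClosure = std::function<int(int)>;        // stand-in for a Felix closure

static int thunk(void* refCon, int numberFrames) { // one thunk serves every
  FlxClosure* f = static_cast<FlxClosure*>(refCon);// closure of this callback type
  return (*f)(numberFrames);
}

int main() {
  int state = 0;                                   // captured state, like ptf->x
  FlxClosure closure = [&state](int n) { state += n; return 0; };
  RenderCallback cb = thunk;                       // goes where the C API wants a
  void* refCon = &closure;                         //   function pointer plus void*
  cb(refCon, 256);
  cb(refCon, 256);
  std::cout << state << "\n";                      // prints 512
}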

So the type of the thunk is in fact messy:

typedef audioRenderCallback_thunk_t =
(
&AudioUnitRenderActionFlags *
&<AudioTimeStamp *
uint32 *
uint32 *
AudioBufferList
-> OSStatus
) *
&AudioUnitRenderActionFlags *
&<AudioTimeStamp *
uint32 *
uint32 *
AudioBufferList
--> OSStatus
;

Notice this is a C function but the first argument to it is a Felix function!
The Felix function has the same type MINUS the client data.

Why remove the client data? Because Felix has closures, and the C function
pointer TOGETHER with the client data pointer is C’s way of representing
a closure. The Felix function points at a C++ object so it has both CODE
in the form of an apply() method and DATA as non-static members.

The type shown above is the exact type required to accept the Felix
function and execute it (passing it the other arguments).

but it is the WRONG type for the C callback, so we have to cast it:


// Force the Felix thunk to have the type Apple wants
fun castCB_thunk : audioRenderCallback_thunk_t -> AURenderCallback =
"reinterpret_cast<AURenderCallback>($1)”
;

It is wrong on TWO counts, the const issue as before AND because its
first parameter accepts a Felix function, BUT the AURenderCallbackStruct
requires a void* in that position.

Here is the actual construction:

AURenderCallbackStruct (
castCB_thunk(audioRenderCallback_thunk),
C_hack::cast[address](felix_callback &sampleNumber.address)
)

You can see the first argument is just the thunk, cast to the right type.
The second argument is the actual Felix function. It’s the client data.
It has to be cast to void* too, and I had to use C_hack here because
there is (as yet) no conversion in the library of function pointers
to an address.

I haven’t shown you the Felix function yet. Since the C callback works
we’re just going to call it instead of writing the code in Felix:


gen felix_callback (inRefCon: address) // the original client data
(
ioActionFlags: &AudioUnitRenderActionFlags,
inTimeStamp: &<AudioTimeStamp,
inBusNumber: uint32,
inNumberFrames: uint32,
ioData: AudioBufferList
) : OSStatus
{
while true do
yield audioRenderCallback (inRefCon, ioActionFlags, inTimeStamp, inBusNumber, inNumberFrames,ioData);
done
}

This is actually a coroutine. It yields the result returned by manually invoking the previously
defined callback. Normally you’d put Felix code in there.

Using a generator is crucial to ensure the code is not reentrant, and thus preserves
state across calls (although this one has none). In addition, when a Felix function is invoked
it is clone()d first, to ensure it has a fresh object. This does a heap allocation and a copy
of the original function, which is in fact never invoked: it is a prototype “clean” function
only, used for cloning the actual function objects which can get dirty. The reason this
is necessary is to make the functions reentrant which is necessary to support both
thread safety and recursion.

In a generator, however, clone() simply returns the C++ this pointer. This means any
dirty variables retain their old values on the next call, which is exactly what we want
for our callback. Iterator functions in Felix are also generators. In our example
an IMPORTANT piece of state (actually the only piece) is implicit: the function
program counter. That too is preserved, and so after yielding the result of the first
call, on the next call it will continue on where it left off. We don’t use that feature
in our example but we could. This is the MAIN reason for using Felix callbacks.
They maintain state automatically.

Notice I used a Higher Order Function,
with the argument being the original client data, and which returns the actual function
to be passed to the thunk.

So this completes the explanation. Here are issues raised:

=========================

1. read only pointers are rendered in the back end as read-write pointers
Already issue on GitHub. Fix should be simple but you never know :-)

2. The callback thunk Felix generates has exactly the correct type
for FELIX to call the callback passing in the function to be called,
but if the thunk is used in a binding to a foreign API the type required in that
slot will be a void*. We could change the type to a void*, and make the thunk
do the cast internally, but then a FELIX use of the thunk would have to
cast a Felix function to a void* (which the thunk would immediately cast back).

So the type required depends on whether the thunk is called by Felix or C.
It’s acceptable to cast the thunk to the C type required, but it is NOT acceptable
that the client has to write it out. The problem is you have to write the correct Felix
type out in full since that is the parameter type of the casting function.
The C type is usually defined in the host C code and the conversion
function itself will just be a reinterpret cast to that type. However the conversion
function return type has to be written so it is lifted from the C code.

In some cases the client C API does not typedef the C function type
and in that case the Felix programmer will have to do it in a floating
header insertion in order to lift it into Felix, and to use it in C cast syntax
easily.

This is all a pain. The callback construction SHOULD PROBABLY also
provide at least the required typedef for the expected C function,
and perhaps generate the cast needed as well.

More use cases needed and more research. Using a reinterpret cast
written by hand is extremely dangerous because it squashes type errors.

ONE solution is to have either

ccallback ….

or

callback cfun …

do the same thing EXCEPT it uses a void* in the client data position
instead of the Felix function type. It then has to do the cast internally.
This is also unsafe because any crap might be provided.

However in the C binding case, we have to cast the Felix function
to a void* anyhow when registering the callback, so that is the
danger point, because the client programmer is never actually
calling the callback, a foreign event loop is doing it.

In fact .. just changing the type might be the best answer since
almost all uses of this feature are for bindings.




John Skaller
ska...@internode.on.net





John Skaller2

unread,
Oct 5, 2020, 2:23:48 AM10/5/20
to felix google, Gordon Childs
I am designing a subtyping category and have hit the following snag (and solved it).

My rules for subtyping coercions are that they MUST be monic: they must be 1-1 and
be embeddings. Note that composition of monic arrows yields a monic arrow.

However Felix accepts the following record subtyping:

fun f(x:( a: int, b:int) ) => …
.. f (a=1, b=2, c=3) ..

which is CLEARLY NOT ONE TO ONE. The problem is in the fact record types
are CLOSED. The definition which works correctly is a thingo which has
*at least* the specified fields. Consider now polyrecords:

fun f[T] (x: (a:int, b:int | T) => …

This is a type of product *correctly* representing a type with a “lower bound
on the number of methods” … to put it crudely.

Note that the function is polymorphic but the type is NOT. This is:

the set of all records with at least fields a:int and b:int

Note this rule has an open interpretation as follows: any nonvoid type
which replaces T is valid. If it’s a record the fields are merged,
tuples and arrays are just records with “empty” field names.
If none of the above, it is still a product of one type, which is identical
in Felix to that type (the projection is of course the identity function).

Claim: (a:int, b:int, c:int | K) is a subtype of (a:int, b:int | T).
Proof: set T <- (c:int | K)



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Oct 13, 2020, 3:05:49 AM10/13/20
to felix google, Gordon Childs
Is Felix channel I/O fast?

Test code below but here is the output:

Done 5000000 in 0.253824 sec or 19.6987M per second

That’s 20 million context switches per second. I think that’s
fast enough to keep up with audio signals. :-)

Zero allocations in the code. Using the new read/write_address primitives
which appear to work flawlessly.

/////// TEST CODE ////////
chip Add1
connector io
pin out: %> uintptr
pin inp: %< uintptr
{
var sample = 0;

while true do
var number: uintptr;
read_address (C_hack::cast[_schannel] io.inp, C_hack::cast[&address] &number);
++number;
write_address(C_hack::cast[_schannel] io.out, C_hack::cast[address] number);
done
}

chip Zeros
connector io
pin out: %> uintptr
{
var start = time();
var i = 0;
while i < nmsgs do
write_address(C_hack::cast[_schannel] io.out, C_hack::cast[address] 0ju);
++i;
done
var finish = time();
var elapsed = finish - start;
var ttrans = nmsgs * ntrans;
var rate = ttrans.double / elapsed / 1.0e6;
println$ "Done " + ttrans.str + " in " + elapsed.str + " sec or " + rate.str + "M per second";
}

chip printer
connector io
pin inp: %< uintptr
{
var number : uintptr;
while true do
read_address (C_hack::cast[_schannel] io.inp, C_hack::cast[&address] &number);
//println$ "Got " + number.str;
done
}

var ntrans = 5;
var nmsgs = 1000000; // 1M
device a1 = Add1;
device a2 = Add1;
device a3 = Add1;
device a4 = Add1;
circuit
connect Zeros.out, a1.inp
connect a1.out, a2.inp
connect a2.out, a3.inp
connect a3.out, a4.inp
connect a4.out, printer.inp
endcircuit
/////////////////////


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Oct 18, 2020, 7:58:11 PM10/18/20
to felix google
This code generates a direct loop (no iterator object):

var k = 10;
for i in 0 ..< k perform println$ i;


So does this:

chip silence
connector io
pin acquire : %< &block256_t
pin out: %>&block256_t
{
var p: &block256_t;
while true do
read_pointer$ io.acquire, &p;
for( var i = 0; i < 256; ++i;) do p . i <- 0.0f; done
write_pointer$ io.out, p;
done
}

but if I change the for loop to read

for i in 0 ..< 256

it uses an iterator which is a heap object.

_generator_async_audio_743 = (FLX_NEWP(iterator)(ptf))
->apply(::flx::rtl::_uctor_(6, new(*ptf->gcp, _at100795_ptr_map, true) _at100795 (_at100795(0, 256)))); //init
continue__ll_async_audio_736_L69974:;
_vI64711_silence__apos_2_mv_64711 = (_generator_async_audio_743)->clone()
->apply() /* general apply */; //init
/*begin match*/
/*match case 1:|Some i_param_tone-player_64710*/
if(!andthen(FLX_VNI(_vI64711_silence__apos_2_mv_64711)==1,(FLX_NEWP(_lam_64714)(ptf, this, ptrsilence))) ) goto _ml64713_L69975;
_vI69972_i_param_tone__hyphen_player_64710 = *((int*)FLX_VNP(_vI64711_silence__apos_2_mv_64711)); //init
{if(FLX_UNLIKELY(!(((static_cast<_a112t_100777>((static_cast<_a112t_100777>(_vI69972_i_param_tone__hyphen_player_64710)/*int.flx: ctor size from _a112t_100777*/))/*int.flx: ctor size from _a112t_100777*/) < 256))))
FLX_ASSERT_FAILURE("/Users/skaller/felix/src/packages/arrays.fdoc",46,5,46,27);}
_vI96451_a = _vI69968_p; //init
*&(_vI96451_a->data)[(static_cast<_a112t_100777>((static_cast<_a112t_100777>(_vI69972_i_param_tone__hyphen_player_64710)/*int.flx: ctor size from _a112t_100777*/))/*int.flx: ctor size from _a112t_100777*/)]=0.0f; // storeat
goto continue__ll_async_audio_736_L69974;
_ml64713_L69975:;
/*match case 2:|None*/
if(!(FLX_VNI(_vI64711_silence__apos_2_mv_64711)==0) ) goto _ml64715_L69976;
goto _em64711_L69977;
_ml64715_L69976:;

This is EXTREMELY bad in this context, and even worse the “andthen” method is used
which is a shortcut with type

bool * (1->bool) -> bool

In other words we get shortcut behaviour by passing the second argument as a closure.

The question is how the parser has decided to use the grammar for a GENERAL iterator
when the specific start/end case could have been used, and WAS used in the micro
test given at the start of this post?

technically, the two element case is a subset of the general case both syntactically
and semantically but the performance difference is extreme. In other words

0 ..< 256

is actually a slice object which has an iterator method. Felix handles this:

var range = 0 ..< 256;
for i in range do .. done

where in general the range object is just any iterator, which is nice,
but the special case is only used some of the time. Hmmm...
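The performance difference is easy to see in a C++ caricature (a sketch, not the
generated code): the direct form is a plain counted loop, while the general form drives
a heap-allocated iterator object through repeated calls, which is essentially what the
generated code above is doing.

#include <iostream>
#include <memory>
#include <optional>

struct SliceIterator {                  // stand-in for the slice's iterator object
  int cur, end;
  std::optional<int> next() {
    if (cur >= end) return std::nullopt;
    return cur++;
  }
};

int main() {
  long sum1 = 0;                        // direct loop: no allocation, no dispatch
  for (int i = 0; i < 256; ++i) sum1 += i;

  long sum2 = 0;                        // general protocol: allocate, call until None
  auto it = std::make_unique<SliceIterator>(SliceIterator{0, 256});
  while (auto v = it->next()) sum2 += *v;

  std::cout << sum1 << " " << sum2 << "\n";   // both print 32640
}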





John Skaller
ska...@internode.on.net





John Skaller2

unread,
Oct 18, 2020, 11:19:47 PM10/18/20
to felix google
Trying to work out constraints on subtyping coercions, I have a good set of adequate rules:

1. If A -> B and B -> A then all arrows A->B are equal, all arrows B-> A are
equal and inverse to arrows A->B, hence equivalent objects must be isomorphic

2. If f: A -> B and g:A->B then f = g

3. All arrows must be monic

Now, we really want subtyping coercions to be embeddings, for sets this is
easy, its harder to know how to specify that. Anyhow one of the seminal papers:

http://reports-archive.adm.cs.cmu.edu/anon/1999/CMU-CS-99-156.pdf

Liskov and Wing. I warn the paper is old, waffly, and fundamentally flawed
although the idea is roughly correct. For example this constraint:

Subtype Requirement: Let φ(x) be a property provable about objects x of type T.
Then φ(y) should be true for objects y of type S where S is a subtype of T.

the "Liskov substtitution principle", is clearly bogus: we think a record with
two fields is a subtype of one with one field, yet by specification the
property “has N fields” is not shared by the two types.

In fact the fault is not properly identified in a lot of literature,
a record of two fields CANNOT EVER be a subtype of one with
one field. The clue is the missing but implied word “precisely”.
to make subtyping valid we have to change the specification
to read “at least N fields”.

In Felix, a monomorphic record is a closed type and the current subtyping
rules are just plain wrong. If we use the open specification “at least N”
we recover but the type of a polyrecord make this clear:

(a:int | T)

means “at least the field a:int” but any extra are subsumed by the type variable T.
Ocaml has explicit open types, in fact, it can say “at least” and “at most” (the latter
being required for coproduct subtyping).


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Oct 24, 2020, 1:39:47 PM10/24/20
to felix-l...@googlegroups.com
I’ve just reworked some of the logic for binding applications so a quick review.
The general case of an application is:

f a

or

a.f


where f is a function expression whose domain is a supertype of the type of a.
There is a “special" case in which f is a function name, in which case
the “best match” of all the matching functions of that name in the same
scope is chosen, if any, or a more outer scope tried: this is called
overload resolution. I put “special” in quotes because this case is far
more common than an expression which is not a simple name.

There is another special case, where f is not a function.
Generally, the user can define an apply function with a pair
of arguments (f,a) which will be tried if f is not a function.
The first argument f must not be a function. For example
the library defines

fun apply (f:string, a:string) => f + a;

meaning you can write

“Hello ” “World"

and mean string concatenation. Mathematicians often write

2x

to mean 2 * x and you can define this by

fun apply (x:int, y:int) => x * y;

There are also some special builtin functions including but not limited to:

\langle f,g \rangle // mediating morphism of product
\lbrack f,g \rbrack // mediating morphism of sum
_tie (f,g) // tuple of pointers
\prod (f,g) // parallel function product
\sum (f,g) // parallel function sum

as well as generics _strr, _map and some others I can’t even remember.

There are more special cases though. These are projection shortcuts

1 x // tuple or array projection (or pointer)
i x // array projection (where i is int type)
clt x // array projection where clt is index type
field r // struct or record field projection

and note the most common use is like x.1 rather than 1 x although
these are the same.

There’s also a very dubious case for

+ (a,b)

usually written

a + b

which is record or tuple addition, that is, concatenation of fields.
This is very dubious because it can only work if + is not actually
defined on the argument types.

Then there’s the special case where f is a type and the user
defined a ctor function. And a related case where f is the name
of a struct or cstruct, in which case it can be constructed from
either a tuple of the correct type, or a record of the correct type.

In addition, a function can accept a record with fields the same name
as its parameters, simulating named arguments, and on top
of all that you can have default arguments as well!
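A loose C++ analogue of the record-as-named-arguments idea (a sketch, requires C++20
designated initialisers): pass an aggregate whose field names mirror the parameters,
and defaults fall out of the member initialisers.

#include <iostream>

struct Args { int width = 80; int height = 24; bool wrap = true; };

static void render(Args const& a) {
  std::cout << a.width << "x" << a.height << (a.wrap ? " wrapped" : "") << "\n";
}

int main() {
  render({ .width = 132 });                 // "named argument" plus defaults
  render({ .height = 50, .wrap = false });  // any subset, in declaration order
}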

It’s even worse because overload resolution not only depends on subtyping
rules, Felix ALSO allows constraints, and on top of that the special kind
of constraint called a typeset (which is going to be replaced by a kinding constraint
eventuall I hope).

The order in which all these things are tried is crucial. A special case can fail two
ways with distinct consequences:

(a) the special case applies, but was wrong
(b) the special case doesn’t apply

In case (a) you get a hard error, for example

(1,2).3

is a hard error of kind (a) because there is no third component of the argument tuple,
however

1 . 3

is a soft error of kind (b) because 1 is not a tuple of two or more components,
in which case other special cases can be applied (in this case the user
“apply” function is used if defined .. otherwise its a hard error).

It’s very tricky to get good error messages here because there are so many
cases and the compiler cannot guess what the user really meant when NONE
of the cases apply. It should assume an overload resolution failure
if the ‘f’ is a function.

The complexity also means it is possible for a wrong interpretation to type check
and compile giving the wrong result, this is most common with plain old overloading
when the user defines things in global scope (JUST DONT!)

There is a TRADEOFF here between succinct notation and good diagnostics
and readability. Using “;” as a grammatic terminator in combination with
allowing “sin x” to be a function application is another example, if you leave
out the “;” by mistake you will often get an overload resolution error,
and the fact Felix does not have keywords makes this even worse.

By contrast this kind of error is rare in Ocaml, because there is no overloading,
and it has keywords which are often leading so that in fact “;” is not needed
as a terminator. However the errors in Ocaml can also be confusing because
of type inference.

All in all, its too much, but its hard to leave ANY of these features out.
For example

(proj 1 of (int * int)) (1,2)

is the full form of projection application, which is a lot more verbose than

(1,2).1




John Skaller
ska...@internode.on.net





John Skaller2

unread,
Oct 27, 2020, 6:44:01 AM10/27/20
to felix google
I hate this a LOT. What Python does is much better.

In Felix if you include a file X, and X includes Y, you get Y as well as X.

The original justification is that there’s no harm in including everything:
separation of APIs is done using classes.

Open directives, for example, are not transitive.

OTOH inherits are weird. Inherits are transitive, but they only impact
inherits and opens, that is, unqualified name lookup. Explicit qualification
ONLY finds symbols actually defined in the specified class.

This means for example:

class X { type A = .. type B .. }
class Y {
inherit X; typedef B = X::B;
}

var a : Y:: A = ERROR
var b : Y:: B = OK

because the typedef defines a new symbol in Y but the inherit does not.

This means there is no systematic and simple way to synthesise a class.

Note that “use” and “redefine” actually define new symbols!
But inherit simply allows chasing definitions when an unqualified
name is found to be inherited.

All this is deliberate .. which doesn’t make it right.




John Skaller
ska...@internode.on.net





John Skaller2

unread,
Nov 3, 2020, 5:38:52 PM11/3/20
to felix-l...@googlegroups.com
https://www.cs.cornell.edu/courses/cs6110/2017sp/lectures/lec30.pdf

Very readable for cognition impaired programmers like me


John Skaller
ska...@internode.on.net





blu...@fi.muni.cz

unread,
Nov 3, 2020, 7:00:34 PM11/3/20
to felix-l...@googlegroups.com
John Skaller2 wrote:
> https://www.cs.cornell.edu/courses/cs6110/2017sp/lectures/lec30.pdf
>
> Very readable for cognition impaired programmers like me

There is also a great lecture series by Frank Pfenning at the OPLSS 2013 workshop.
The videos are available at:

https://www.cs.uoregon.edu/research/summerschool/summer13/curriculum.html

Achim

John Skaller2

unread,
Nov 3, 2020, 8:08:39 PM11/3/20
to felix google
Thanks, I’ll check that out.

I am trying to figure two things out.

1. The usual way is to embed shared variables inside linear logic.
Can this be done the other way? Felix starts with standard logic, sharing by default.
Then it adds a special type constructor Uniq T. Can this work? I actually use it
successfully to handle malloc’d C null terminated strings, but that doesn’t prove
it works in general.

One problem I found was that I made a couple of functions like “new” and
some list operations return uniquely typed results (like map for example).
The problem is I had to explicitly throw out the uniqueness in most code
that actually used these functions so I stopped returning the correct typing
for things because I had to modify the whole library and then find
the idea doesn’t really pan out.


2. How to do borrowing?

A unique variable can be passed to a function that forgets it
quite safely *provided* the variable is NOT killed (consumed).
This is a LOT more user friendly than forcing the function,
which only plays with the referred-to thing but doesn’t pass
it on or dispose of it, to return the value so the original
variable, which was killed calling the function, can be relivened.

At the moment I just CHEAT. The control flow analysis that validates
uniqueness safety cannot handle pointers except in special cases
so it ignores them. Thus the correctness of a program is NOT guaranteed
in all cases, but this is fine. I’m happy with it catching some errors, it’s better
than none, and the constraint (if you don’t use pointers to unique variables
safety is enforced) is reasonable. So I implement borrowing by just
passing a pointer to a function, which relies on the function actually
not breaking the rules (do not pass the pointer on or delete what it
points at).
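The convention, expressed as a C++ sketch (invented names): the caller keeps ownership,
and the callee gets a raw pointer it may use but must not store or free.

#include <memory>
#include <iostream>

static void borrow(int* p) { *p += 1; }   // plays with it, does not keep or delete it

int main() {
  std::unique_ptr<int> owned = std::make_unique<int>(41);
  borrow(owned.get());                    // borrow: ownership is unchanged
  std::cout << *owned << "\n";            // prints 42, owned is still valid
}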

I’d rather not cheat. Maybe I need another type constructor:

Affine T

or something?

There is MAJOR need for linearity in Felix. Felix is fundamentally based
on coroutines (a procedure is a coroutine that doesn’t actually do any channel I/O).
Transfering data along a channel is intrinsically a linear operation.
It would be good to enforce this with the type system.


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Nov 10, 2020, 2:36:22 AM11/10/20
to felix google
It’s not always clear:

var ctl =
let nominal_period = 256.0 / 44100.0 * 0.01 in

Ctl (
clockStopped=false,
period = nominal_period,
nominal_period = nominal_period,
runflag = true,
circularBufferSpeaker = mkcb(),
circularBufferMicrophone= mkcb()
)
;

The mkcb function constructs a circular buffer. I was playing and added a diagnostic print.

Length = 131072, used=0, avail = 0
Length = 131072, used=0, avail = 0
Length = 131072, used=0, avail = 0
Length = 131072, used=0, avail = 0
Length = 131072, used=0, avail = 0
Length = 131072, used=0, avail = 0
Length = 131072, used=0, avail = 0
Length = 131072, used=0, avail = 0
Length = 131072, used=0, avail = 0
Length = 131072, used=0, avail = 0
Length = 131072, used=0, avail = 0
Length = 131072, used=0, avail = 0


Eh what?? 12 calls?

Well yes. It’s not optimal but correct: Ctl is initialised by setting

ctl.field = inrec.field

for each of 6 fields. So it is perfectly valid for inrec to be constructed 6 times,
a component selected, and the other 5 discarded. One would have preferred it to be
optimised so the discarded fields were not needlessly set but there’s no assurance
that the compiler will spot this optimisation. Bad in the above case because resources
(memory) were consumed and this one isn’t garbage collected so it leaked.
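The effect can be mimicked in C++ (a deliberately dumbed-down sketch): if every field
assignment re-evaluates the whole record expression and then selects one component, the
side-effecting constructor runs once per field instead of once in total.

#include <iostream>

static int mkcbCalls = 0;
struct CB { CB() { ++mkcbCalls; } };                 // stands in for mkcb()

struct InRec { double period; CB speaker; CB mic; };
static InRec makeInRec() { return InRec{0.01, CB{}, CB{}}; }

int main() {
  double period = makeInRec().period;   // unoptimised translation: build the
  CB speaker    = makeInRec().speaker;  // record once per field selected
  CB mic        = makeInRec().mic;
  (void)period; (void)speaker; (void)mic;
  std::cout << "mkcb calls = " << mkcbCalls << "\n";  // prints 6, not 2
}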

But far worse, if the compiler DID spot the optimisation, and a generator
you expected to run is not run, e.g.:

val x = 1, someGenerator();
println$ x.0; // x.1 never used

Here someGenerator() might never be called, and indeed for a val we want that to be permitted.
The fix AT THE MOMENT is

var x = 1, someGenerator();

because a variable must be eagerly initialised when control flows through it.
This may change, but at the moment Felix respects that rule. Probably it
shouldn’t; if the whole variable is not used, Felix mandates that the variable
and its initialiser are elided. It’s not clear it shouldn’t also make this
assurance for components of a product: the main obstacle is maintaining
the illusion that it has the full product type, even though some components
are never used.

The code to track this exists already in the uniqueness typing checker.
Components of a variable with a product type are called fairy variables.

So take care. If you use functional constructions make sure your code
is purely functional or you might get a big surprise!


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Nov 18, 2020, 10:25:37 PM11/18/20
to felix google
I am planning to upgrade the RTL so a fibre control object (fthread_t) contains
a pointer to the scheduler model (data) object used by the scheduler that spawned it.

Read and write will change so when a matching I/O operation is serviced,
the operation moves the waiting fibre from the schannel and puts it back
on the scheduler model from whence it came.

For this to work the pointer must not be invalidated. I need to prove
this is the case (I believe it must be so already but if not it must be
enforced by further modifications).

The effect of this change is profound. It means fibres cannot migrate.
Previously, the waiting fibre migrated to the scheduler of the running fibre.

IN PARTICULAR, even though spinlocks ensure svc synch operations
are atomic, a fibre migrating from one PTHREAD to another is not
comprehensible behaviour. With this change, a major new facility is
available: schannel communications across pthreads will now
behave in the expected manner. In particular, pchannels will no
longer be needed.

The next thing to look at is async events. Now that the correct
scheduler is known, the interface can be simplified a bit.

However, to use spinlocks, the current mechanism of rooting
a fibre waiting for an asynchronous event, so it remains reachable,
will no longer suffice. This is because the current rooting machinery
uses an STL map from the address to a counter, and accesses
are protected by a mutex lock. Unfortunately an STL map can allocate,
which is also bad (because malloc() might need to call the OS).
Also unfortunately, a singly linked list will not work in O(1) for random
add/delete orderings.


It must be noted that Felix bound queues use special locks
and condition variables, not standard ones. This is because
a locked up pthread cannot respond to a GC stop world
request in a timely manner. Instead all operations must
use timed waits and implement a polling loop which checks
periodically for a world stop.
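To illustrate the idea only (a plain C++ sketch, not the actual Felix RTL code), a timed wait with a periodic check for a world stop looks something like this:

//////////////////////////////////////////
// Sketch only: a worker blocks on a condition variable, but never for
// longer than a bounded interval, so it can notice a GC world-stop
// request promptly even if nobody signals it.
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <queue>

std::mutex m;
std::condition_variable cv;
std::queue<int> jobs;                  // stand-in for a Felix bound queue
std::atomic<bool> world_stop{false};   // set by the collector

void worker() {
  std::unique_lock<std::mutex> lock(m);
  for (;;) {
    // wait at most 10ms, then re-check even if nobody signalled us
    cv.wait_for(lock, std::chrono::milliseconds(10),
      []{ return !jobs.empty() || world_stop.load(); });
    if (world_stop.load()) {
      // cooperate with the collector here, then carry on
    }
    if (!jobs.empty()) {
      int job = jobs.front(); jobs.pop();
      (void)job;                       // process the job
    }
  }
}
//////////////////////////////////////////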

For external event loops there is another mechanism: when a
pthread yields to an event loop it removes itself from the
registry of active threads. This works because the machine
stack is empty of Felix data at this point, and because this
thread can no longer change data structures, it is effectively
also yielded to the GC.





John Skaller
ska...@internode.on.net





John Skaller2

unread,
Nov 20, 2020, 9:22:01 AM11/20/20
to felix google
Ok it’s implemented and in test. There's a known deadlock possible,
due to using a nested lock guard.

Now fibres cannot migrate, BUT any fibre can read/write to any other.
If you write to a disconnected channel, then run a new scheduler
with a fibre that reads that channel, the writer is released but can’t
run (because another fibre has called run, creating a nested scheduler).
Could still be useful.

Apart from the deadlock to be fixed, schannels should now work
across pthreads as well.

There’s a caveat with cross-scheduler I/O: if a fibre writes and releases
a starving reader on another scheduler, the writer continues. Previously
the reader always went first, so it had a chance to copy the pointed-at item.

I am not worried because I believe write/read should be linear, that is,
the pointer written must be the exclusive (unique) owner of the pointed
at object, the reader then acquires exclusive ownership, and so shouldn’t
need to copy it.

Of course writing a shared object is also allowed: what is NOT allowed is writing
a pointer to an object which could be changed under the reader unexpectedly.

=========

This leads to an interesting idea: within a scheduler list which is shared
by pthreads, so that coroutines can all be concurrent, we can create a nested scheduler
where all the coroutines are sequential. It has always been a vexing problem that, in a set
of concurrent routines, some need to be sequential. The idea is to preserve the fact that
sequential coroutines do not need locks, while concurrent ones do .. will think more on this.



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Nov 23, 2020, 5:13:02 AM11/23/20
to felix-l...@googlegroups.com
So changes: schannel I/O and other sync operations use spinlocks as required.

Previously locks were only used if the active list was shared by two pthreads.
Now the locks are always used even if there is only one pthread.
The reason is that separately spawned pthreads can now communicate
with schannels: these pthreads have their own schedulers and active lists.
There’s no easy way to tell if there is only one pthread running.
This will slow down performance for singly threaded applications,
but allow greater flexibility and speed up applications that do use
multiple threads.

The deadlock mentioned previously has been removed.



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Nov 30, 2020, 9:13:49 PM11/30/20
to felix google
Felix now supports Vasiliy Tereshkov’s statically typed embeddable microinterpreter Umka:

https://github.com/vtereshkov/umka-lang

This should run:

//////////////////////////////////////////
include "std/umka";

open Umka;

var amt = 1024 * 1024;
var umka = umkaAlloc();
var hellostr =
c"""fn main() {
printf("Hello Umka!\\n")
}
""";


var ok = umkaInit (umka, c"Dummy", hellostr, amt, amt,0,C_hack::cast[+(+char)]0);
println$ "init ok = " + ok.str;
ok = umkaCompile umka;
println$ "compile ok = " + ok.str;
ok = umkaRun umka;
println$ "run ok = " + ok.str;
umkaFree umka;
//////////////////////////////////////////

The idea is to create a C API, make an Umka binding to it, and then
allow client side “eval” like behaviour as shown in the example above.

One use would be to program the construction of dynamic metatypes.
At the moment this would have to be done in C, then compiled, linked
into a shared library, and loaded, all at run time without error.




John Skaller
ska...@internode.on.net





John Skaller2

unread,
Dec 7, 2020, 10:52:16 PM12/7/20
to felix google
At present the system structure is like this:

Thread -> scheduler <-> fibres -> coroutine -> global data

I recently added the weak pointer from fibres to scheduler.

If we want a per thread allocator, we’re missing the way for the
coroutine to discover it.

Extra pointers cost. Although schedulers are few, two threads can
share a scheduler, so any allocator pointer would have to be in the
scheduler (so the policy would be per scheduler not per thread).

As a complication, schedulers can nest: a coroutine can create a new
scheduler with the “run” procedure (which is a subroutine!). Coroutines
already contain several pointers:

* program counter
* svc pointer
* chain link
* return pointer

as well as, usually, a pointer to the display, including the global data (thread_frame_t).
These are the most common objects at the system level. ;(

The GC is currently part of the global data (thread frame).




John Skaller
ska...@internode.on.net





John Skaller2

unread,
Dec 10, 2020, 11:56:35 PM12/10/20
to felix google
So, our problem is to create a real time thread handling circuits.
To do this, we may use a non-Felix thread so it doesn’t have to stop on GC.

For this to work correctly, either

(1) none of the components are GC allocated,
and don’t contain strong pointers to GC objects,

OR

(2) All the objects are rooted. The reason for this condition is that
the paths between fibres, coroutines, and channels will be dynamically
changing whilst the GC is running, because the RT thread does not stop.

Although I think (1) is the best solution it is harder to implement and
subject to programmer errors; most chips are created by forming closures
over the pins, so reachability matters a lot.

The requirement for RT signal processing is that all the coroutines run
forever and do not spawn any objects on the GC heap; this is the case
for most signal processing systems which, in order to be real time in
the first place must consist of connected finite state machines only.

So my thought for (2) is to make a modification of create circuit command
which somehow returns a frame object containing pointers to all the
fibres and channels created.

All the fibres will be

(a) pointed at by the scheduler
(b) pointed at by schannels
(c) held by async I/O operations (typically an alarm clock)

The set of fibres is also fixed. However, the fibres held in the scheduler’s
run pointer and active list change with channel I/O, which is protected
by a spinlock invisible to the GC.

So we really need to root the whole set. My thought is to use a procedure
object to spawn the fibres, magically get the fibre pointers into that object
as well, and store the data frame of the procedure in a reachable place
nominated by the user; this ensures all the objects created are always
reachable, even if the paths between objects are changing whilst the
GC is running. Hopefully the GC algo will still terminate .. :-)

When the client wants to kill the signal processor, they will set a flag which
will stop some of the loops in components and cause the processing
framework to collapse, so the scheduler will finally return. However, now
the client must NULL out the pointer to the object holding everything,
so that it is no longer reachable and can be collected.

I had a look at the compiler and it spawns the fibres in a sequence in the
current scheduler. It somehow needs to use “spawn_and_return_fibre”
to get the fibre pointers and save them. The channels seem to be created
in the current space, which seems wrong (they should be created in a procedure
which binds everything then returns, so the channels are forgotten, otherwise
the circuit won’t terminate correctly). Any which way, it should be possible
to collect the channels into a frame.

So basically PART of the issue can be solved by adding an enhancement
which allows collecting the created fibres and channels into an object, a pointer
to which is returned and saved; then, to allow the GC to resume tracing when the
thread is stopped, this pointer is just forgotten (perhaps by NULLing it out or
just letting it go out of scope .. )

This will allow circuits to run in places the GC cannot see directly. This
includes foreign RT threads, but it also means they can be run by foreign
C++ code as well. Specifically, since “run” is just a subroutine, a foreign
API calling it can be passed a Felix created scheduler, so foreign
C++ can just call it like any other subroutine.

None of this solves the problem of creating and managing the RT aspects
of the RT pthread; but it does allow that thread to run continuously
even when the GC is running, without need for any world stop.



John Skaller
ska...@internode.on.net





Garrett Bluma

unread,
Aug 11, 2021, 3:56:29 PM8/11/21
to felix-l...@googlegroups.com
Remind me again (memory is a bit fuzzy on it), the felix channel logic doesn't use the stack, so stack swapping isn't an issue either, right? If we looked at the assembly we wouldn't see push/pop and ret calls for the stack pointer (at the chip level), it would just look like a loop with short jumps between the chip segments, right?


John Skaller2

unread,
Aug 12, 2021, 1:03:57 AM8/12/21
to felix-l...@googlegroups.com


> On 12 Aug 2021, at 05:56, Garrett Bluma <garret...@gmail.com> wrote:
>
> Remind me again (memory is a bit fuzzy on it), the felix channel logic doesn't use the stack, so stack swapping isn't an issue either, right? If we looked at the assembly we wouldn't see push/pop and ret calls for the stack pointer (at the chip level), it would just look like a loop with short jumps between the chip segments, right?

if it were optimised. But in the dynamic form a service call works by:

a) put a pointer to the request in the continuation object in a reserved place
b) return this (pointer to current object)

Then, the scheduler, after the service is done, will resume it by something like:

while(ptr) ptr = ptr->resume();

This is why the machine stack must be empty when doing a service call, more precisely,
saying “return this” will indeed empty it back to the stack the scheduler had. So all the
“local variables” have to be non-static members if they’re to be preserved.
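A stripped down sketch in C++ (not the nurtl source, just the shape of the mechanism): the continuation keeps its “locals” as members, a service call parks a request and returns this, and the scheduler drives everything with the resume loop:

//////////////////////////////////////////
#include <cstdio>

struct continuation {
  int svc = 0;       // nonzero: a service request is pending
  int pc = 0;        // where to resume
  int counter = 0;   // a “local variable” kept as a member, not on the stack
  virtual continuation *resume() = 0;
  virtual ~continuation() {}
};

struct demo : continuation {
  continuation *resume() override {
    switch (pc) {
    case 0:
      counter = 42;
      pc = 1;
      svc = 1;           // issue a service call ..
      return this;       // .. by returning to the scheduler
    case 1:              // resumed after the service is done
      std::printf("counter survived: %d\n", counter);
      return nullptr;    // finished
    }
    return nullptr;
  }
};

int main() {
  demo d;
  continuation *ptr = &d;
  while (ptr) {
    if (ptr->svc) { /* the scheduler services the request here */ ptr->svc = 0; }
    ptr = ptr->resume();
  }
}
//////////////////////////////////////////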

Felix has no idea what a coroutine is. It knows only service calls, they’re
a virtual machine instruction.

If the compiler (which the nurtl does not use yet) were taught about coroutines,
you’re right, one could just jump from one section of one coroutine to another.
It all depends if the place to jump is statically known, in a variable, or whatever.
With the scheduler, it’s dynamic.

It should still be pretty fast.

The new allocators are much faster allocating than the Felix GC, and there’s
no world stop, but there are design issues avoid cycles.

The channel I/O is slower because there are two levels of reference counting
involved to get the termination semantics correct.

So the “real time” system may not be as fast overall but latency should be bounded.


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Aug 12, 2021, 1:38:12 AM8/12/21
to felix google
Some tweaks.

I noticed the “chips” don’t work at all like other continuations. They’re infinite loops so do
not need a _caller variable, which has to be artificially set to nullptr. And they’re
spawned, not called.

So I think callable objects like *subroutines* which can return are distinct
from *coroutines* which do not.

Of course we’re using continuation passing in both cases, but differently.

In principle a *routine* has arguments, but if that includes a continuation
it should be passed explicitly as an argument. A subroutine is then a special
case where the passing of the “caller” current continuation is more “implicit”.

At the moment the caller has to be passed explicitly.



John Skaller
ska...@internode.on.net





Garrett Bluma

unread,
Aug 12, 2021, 11:53:14 AM8/12/21
to felix-l...@googlegroups.com
Ahh. Forgot about fibre scheduler and coroutine pointers. Thanks.


John Skaller2

unread,
Aug 13, 2021, 6:58:51 PM8/13/21
to felix google

> So I think callable objects like *subroutines* which can return are distinct
> from *coroutines* which do not.

However I can’t find a good way to encode this.

I tried a get_caller() method which works. However, when a fibre is deleted its stack
of subroutines is unwound. I can avoid storing a nullptr in a coroutine, but then
a virtual dispatch is needed to fetch it: coroutines return nullptr.

On return, a subroutine is normally deleted. In fact if a coroutine returns
it should be deleted too.

HOWEVER a subroutine returning becomes inactive and shouldn’t run
again, BUT its local variables could be required by a conceptually
nested subroutine which the parent subroutine is returning, as
in the ML/Felix code like:

fun f (x:int) (y:int) => x + y;

which means

fun f(x:int) => fun (y:int) => x + y;

i.e. f is returning a nested lambda which binds
to the variable x. So we’d better NOT delete it in this
case. The program counter and return address SHOULD be lost,
however in my model a single object holds that information AS WELL
AS LOCAL VARIABLES. This is just for optimisation.

In theory the control values should be separated from the local variables
so the latter can persist when the former go out of scope (but that requires
two heap allocations instead of one).


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Aug 14, 2021, 3:48:26 AM8/14/21
to felix google
Ok so there are some simple changes.

There are now two classes:

coroutine_t
subroutine_t

Coroutines optionally have a setup method you write to store initial values in variables;
subroutines use a call method instead.

The subroutine’s call method must set the caller variable. Coroutines do not have a caller,
they must be spawned.

The current kind of subroutine IS AUTOMATICALLY DELETED when it returns
as are coroutines. However, if the subroutine is a function returning an “in principle" nested
subroutine this must not be done. There’s another class

curry_subroutine

for this which returns control without deleting itself so that the nested routine
can refer to it: in this case the nested routine has to delete it when it is itself
deleted perhaps using a unique_ptr<> type of thingo. This is yet to be
implemented.

Subroutines passed as arguments are not a problem since the parent
lifetime exceeds their scope; in this case the parent must delete it.
However the client that called and passed it must NOT delete it,
AND must not pass it on to a routine that hangs onto it too long
(e.g. you cannot add it to a data structure that lives longer than the
original parent).

The rules are obvious but hard to state precisely.
[And getting the Felix type system to manage it is the real goal .. so I need to
establish them precisely and, worse, figure out a way for the type system
to enforce them].

For now, as usual, the C++ programmer has to be disciplined and will
blow their head off if not careful.



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Oct 14, 2021, 12:06:26 AM10/14/21
to felix google
In the new run time, I am struggling to figure out how async channels work.

The basic idea is simple enough. Consider an alarm clock. A fibre wants to
sleep until a certain time so it writes to an async channel connected to
the clock. The clock is managed by a service thread, and will probably
be asleep when the client writes to the channel, so the clock must
either poll regularly to see if there are any requests to service,
or sleep in a way that it can be kicked out of its sleep.

The obvious solution is to use a condition variable with a timed
wait. This means the clock will poll for requests with a certain
latency but it can be woken up with an optional signal to reduce
the latency.

Now when the timeout of the alarm has been reached, the clock
must push the waiting fibre back on the process active list.
This means it has to have access to the list. It gains this from
the client fibre, since all fibres contain a pointer to their owning
process. In addition, one or more threads of the process may be
asleep, again using the process sleep condition variable,
so the clock should signal that to reduce the delay of a polling loop.

To ensure correct termination, we also must increment a counter
of pending async operations when we begin an async request,
and decrement it when the request is satisfied and the fibre
made active again.
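Before going further, here is a rough sketch of that shape in plain C++ (not the Felix RTL; Fibre and Scheduler are stand-ins and the pending-operation counter is omitted). The clock thread sleeps on its own condition variable with a bounded wait, holds requests in a priority queue, and hands each fibre back to its owning scheduler when the deadline passes:

//////////////////////////////////////////
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct Fibre { int id; };

struct Scheduler {
  std::mutex m;
  std::vector<Fibre*> active;          // the process active list
  std::condition_variable wake;        // signalled so a sleeping thread polls sooner
  void make_active(Fibre *f) {
    { std::lock_guard<std::mutex> g(m); active.push_back(f); }
    wake.notify_one();
  }
};

struct Request { std::chrono::steady_clock::time_point when; Fibre *fibre; Scheduler *owner; };
struct Later {
  bool operator()(const Request &a, const Request &b) const { return a.when > b.when; }
};

std::mutex cm;
std::condition_variable ccv;           // the clock's own wakeup signal
std::priority_queue<Request, std::vector<Request>, Later> pending;
std::atomic<bool> finished{false};

void alarm_clock() {
  std::unique_lock<std::mutex> lock(cm);
  while (!finished) {
    auto deadline = pending.empty()
      ? std::chrono::steady_clock::now() + std::chrono::milliseconds(50)
      : pending.top().when;
    ccv.wait_until(lock, deadline);    // a new request can signal us earlier
    auto now = std::chrono::steady_clock::now();
    while (!pending.empty() && pending.top().when <= now) {
      Request r = pending.top(); pending.pop();
      r.owner->make_active(r.fibre);   // reschedule on the owning process
      std::printf("woke fibre %d\n", r.fibre->id);
    }
  }
}

// what the channel write would do on behalf of the client
void request_sleep(Fibre *f, Scheduler *s, std::chrono::milliseconds d) {
  { std::lock_guard<std::mutex> g(cm);
    pending.push(Request{std::chrono::steady_clock::now() + d, f, s}); }
  ccv.notify_one();                    // kick the clock out of its sleep
}

int main() {
  Scheduler sched; Fibre f{1};
  std::thread clock(alarm_clock);
  request_sleep(&f, &sched, std::chrono::milliseconds(10));
  std::this_thread::sleep_for(std::chrono::milliseconds(100));
  finished = true; ccv.notify_one();
  clock.join();
}
//////////////////////////////////////////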

This is how the existing clock code works. The immediate problem
is this: how can the client wake the clock up? It cannot be done
before the request, because that would be a race,
and it cannot be done after the request, because that would be too late:
by then the request has already been serviced, if that
ever actually occurs. Therefore, it must be done BY the channel
operation.

In turn this implies that the channel knows the condition variable
the clock is using to wait for requests, so it can be signalled.
So the first issue could be expressed as:

1. HOW TO TELL THE CHANNEL WHERE THE DEVICE
END CONDITION VARIABLE IS?

However, this turns out to be a problem when we now consider that
channels can be bidirectional, and can support any number of readers
and writers. There could be two clocks, for example, if the load on clocks
was high, to relieve the load, with either clock being able to service
a request.

In this case there will be TWO condition variables to signal, one for
each clock. A flexible technical solution to this is obvious: provide
a single object with a signal method to call, where the method call
can do something arbitrary such as send two signals.

We must also consider: what happens if two clients write
sleep requests to the clock? In fact, this one works out of the box.
What the clock does is remove a request from the channel when
it schedules the sleep. So it holds the fibre in an internal priority queue.
When the time comes to wake the client fibre, it knows which process
to signal because the fibre knows which process it is part of.
Indeed, the clock has to put the fibre back on that process’s active
list. The point is that the channel itself DOES NOT HOLD THE FIBRE
after the clock schedules the wakeup on its priority queue: it is stored
in that queue. So we don’t have to go searching the channel’s
suspension list.

But this still fails to be general. It also raises the question: how
does a signaller object get created and installed? It’s obvious that
only the clock itself can do this. BUT HOW IS IT PROVIDED TO THE
CHANNEL? Remember, the signalling MUST be done by the
I/O operation, it cannot be done by the client separately.

Now, we also have to look at the reverse situation; more specifically
we shall consider a ticker clock. Whereas with an alarm clock
the client writes a wake up time as a request then sleeps until
woken, a ticker wakes up multiple waiting fibres at the same time,
repeatedly. For example any real time operation such as video
sync or say stepping through actions in a video game.

In this case, the direction of the I/O operation is irrelevant but
we will consider it as a read by the client which returns the actual
time stamp of the tick.

Now the ticker will work entirely differently to an alarm clock.
It goes to sleep and wakes up at a fixed time. At this point,
it examines the request channel, and removes ALL the clients
from the channel, rescheduling each one to their respective
process. It can in fact signal each one also. Note that this may
involve repeated signalling of the same process. It’s not clear
this can or should be optimised since the ticker is a standard
pre-emptable operating system thread.

The interesting thing in this case is that the ticker can IGNORE
signals from the clients. This is a second argument for a signalling
object.


So now we have a model in which the channel/endpoints have
a single device signalling object that can handle multiple devices
if they somehow subscribe to the signalling object.

We will now note that the process signalling side could be handled
by a signalling object.
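A tiny sketch of what such a signaller interface could look like (hypothetical C++, just to pin the idea down): the channel holds one abstract signaller, and what a signal actually does is entirely up to whoever installed it:

//////////////////////////////////////////
#include <condition_variable>

struct signaller_t {
  virtual void signal() = 0;
  virtual ~signaller_t() {}
};

// wake one device, e.g. a single alarm clock
struct cv_signaller : signaller_t {
  std::condition_variable &cv;
  explicit cv_signaller(std::condition_variable &c) : cv(c) {}
  void signal() override { cv.notify_one(); }
};

// wake two devices sharing the load, e.g. two clocks
struct pair_signaller : signaller_t {
  signaller_t &a, &b;
  pair_signaller(signaller_t &x, signaller_t &y) : a(x), b(y) {}
  void signal() override { a.signal(); b.signal(); }
};

// a ticker that runs on its own schedule can simply ignore signals
struct null_signaller : signaller_t {
  void signal() override {}
};

int main() {
  std::condition_variable clock_cv;
  cv_signaller one(clock_cv);
  null_signaller none;
  pair_signaller both(one, none);
  both.signal();   // the I/O operation would call this, not the client
}
//////////////////////////////////////////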

But now we have not decided where the signalling object lives:
in the channel or endpoint. Currently endpoints must contain the allocator
used for the channel, which depends on how the channel was created.
The endpoints themselves must be created with the same allocator.

Worse, consider a channel which is used to send messages according
to a complex protocol. A simple example is a client server where
a request is followed by a response. The client does a write with the request,
and then does a read on the same channel to get the response whilst the
server does a read to get the request and then writes the response.

In this case, multiple connections to the channel will not work because
one client may fetch another client’s write request with a read, expecting
a response. Nevertheless this is an architectural design issue: channels
of any kind have limitations.

Another issue is that even concurrent channels acting between
fibres in the same process may need to send a signal to wake
up an idle thread. Technically communication between fibres
of the same process can be asynchronous, if the process is
multi-threaded. The effect of NOT signalling is that a thread will
remain asleep and only wake up at the polling interval to fetch
a waiting job.




John Skaller
ska...@internode.on.net





John Skaller2

unread,
Nov 7, 2021, 9:39:58 PM11/7/21
to felix-l...@googlegroups.com, raf
I have upgraded the uniqueness checker to handle records as well as tuples.

I am also doing a new routine that checks that a variable is assigned before
use. There are two approaches:

1. Aggro: report error if there is ANY control path on which the variable
is read when dead (not assigned yet). Will give some false positives.

2. Conservative: report only if ALL control paths leading to a read
of the variable do not have assignments. Will miss some cases.

In addition, at least in this routine, taking the address of a variable
is ignored unless it is precisely of the form

storeat(&x, v)

which is an assignment (since, ultimately, this is the ONLY way
to do an assignment!)

Note that read/modify/write on variables is an open question here.
In a single thread that’s just a load, computation of a new value,
and a store (but in multi-core applications we have to deal with
atomic read/modify/write like CAS).
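Just to pin down the distinction (a C++ aside, nothing Felix specific): the single threaded form is a plain load, compute, store, while the multi-core form needs a CAS retry loop so a concurrent store can’t be lost:

//////////////////////////////////////////
#include <atomic>

int plain_counter = 0;
std::atomic<int> shared_counter{0};

// single threaded read/modify/write: load, compute, store
void bump_plain() { int v = plain_counter; plain_counter = v + 1; }

// multi-core read/modify/write: retry until nobody raced us
void bump_shared() {
  int expected = shared_counter.load();
  while (!shared_counter.compare_exchange_weak(expected, expected + 1)) {
    // expected has been refreshed with the current value; just retry
  }
}

int main() {
  bump_plain();
  bump_shared();
  return (plain_counter == 1 && shared_counter.load() == 1) ? 0 : 1;
}
//////////////////////////////////////////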

[RAF, you should get this email twice, once from me directly
and once from the google email list]


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Nov 9, 2021, 1:29:39 AM11/9/21
to felix google, raf
Actually, it turns out I think that the control flow liveness analysis can be parametrised
on the required logic, so upgrades to the routine work for both uniqueness analysis
as well as use before initialisation checks.

And looking at the code, it is also possible to bug out on “unused variables”
or at least warn. I wonder if that’s a good thing.

I will note this interesting issue that impacts the analysis:

The analyses are done TWICE. Once on polymorphic code,
and once after monomorphisation.

The problem is if you have an unused unique variable, Felix will
optimise the bug away. So we need to do the analysis first,
before we do that.

This is a general problem with Felix. A lot of type checking etc
is ONLY done on used code. Functions that are in your library
but not used will not be fully checked. This is a really bad feature.




John Skaller
ska...@internode.on.net





John Skaller2

unread,
Nov 9, 2021, 6:10:20 AM11/9/21
to felix google, raf


> On 9 Nov 2021, at 17:29, John Skaller2 <ska...@internode.on.net> wrote:
>
> Actually, it turns out I think that the control flow liveness analysis can be paramtrised
> on the required logic, so upgrades to the routine work for both uniqueness analysis
> as well as use before initialisation checks.

Woops! No! The algorithm is actually different! It is not in fact linear!

Here’s the problem. For uniqueness, paths leading into a label are required to be consistent.
The reason is, along any path, if a variable is live it must eventually be killed, whereas
if it is dead it may be assigned to revivify it, but until that point it cannot be killed since
it is already dead. These two actions are mutually exclusive. It may be that the programmer
keeps an independent record of the state, allowing a valid action, but the analysis
requires that the variable be killed on all paths: static analysis would be impossible otherwise
and the constraint is reasonable.

However this is NOT the case for the requirement that a shared (ordinary old) variable
be initialised before use. Consider:

stuff
if(c)goto lab1;
v = 1; // initialise v
lab1:
v = 2;
use(v); //????

Here, v is live at lab1 on the drop-through of the conditional, but dead if the branch is taken.
However, it is live before use in either case because of the v=2 assignment.

So let’s try the rule: if the variable is live on one path and dead on another,
at the join it has to be treated as if it were dead. The problem is now:

stuff
if(c)goto lab1;
v = 1; // initialise v
lab1:
use(v); //????

If we drop through first, the use of v is passed. TICK.
But now, in the branch case, it is dead. WOOPS!

So, we have to pass through the code AGAIN, this time assuming the
variable is dead, in order to fail the use(v). FAIL!

We have to re-analyse the code if the variable was thought live,
and is subsequently found to be dead on another path. BUT ..

There could be any number of jumps to a label, and each one could
have one variable dead, so we have to pass through the subsequent
code again, up to once per jump to the label. (total, once per entry
to the label including drop throughs).

We only have to do this if a variable thought live previously turns out
to be dead on another path.

The question now is how to terminate the recursion, and what is
the complexity of the algorithm now??


So here is the rule: for all the variables calculate

new_state = current_path_state & label_state

Now, terminate the recursion if new_state = label_state,
otherwise set the label_state to the new_state and continue.

It is clear that an upper bound on the number of times we continue
is the number of variables. The point is, logical and can only
make a live state dead. Once dead, the variable cannot be
revivified. In addition, the number of jumps (plus one drop through case)
is also an upper bound.

The complexity is thus quadratic at worst, but will in most cases
be effectively linear: it’s

O(N * M)

where N is the number of instructions, and M is the number of variables.
This is because we only redo a path if a variable is killed, a variable
cannot come live again, and at worst we kill one variable at a time
until they’re all dead.

OK. SOLVED. ;-)
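A toy version of the merge rule (C++, with liveness held as one bit per variable; purely illustrative, the real checker lives inside the Felix compiler):

//////////////////////////////////////////
#include <bitset>
#include <cstdio>

constexpr int NVARS = 4;
using State = std::bitset<NVARS>;      // bit set = variable is initialised

// Merge an incoming path state into the label state.  Returns true if
// the code after the label has to be walked again.
bool merge_at_label(State &label_state, const State &incoming) {
  State new_state = label_state & incoming;   // dead on any path => dead
  if (new_state == label_state) return false; // fixpoint reached: stop
  label_state = new_state;                    // state shrank: re-walk
  return true;
}

int main() {
  State at_label("1111");      // first arrival: everything initialised
  State other_path("1011");    // another path on which variable 2 is dead
  bool again = merge_at_label(at_label, other_path);
  std::printf("re-walk=%d state=%s\n", again, at_label.to_string().c_str());
  // a second arrival with the same state hits the fixpoint immediately
  again = merge_at_label(at_label, other_path);
  std::printf("re-walk=%d state=%s\n", again, at_label.to_string().c_str());
}
//////////////////////////////////////////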




John Skaller
ska...@internode.on.net





John Skaller2

unread,
Nov 10, 2021, 8:11:16 PM11/10/21
to felix-l...@googlegroups.com
Just a brief summary of how the flow analysis works,
for both the uniqueness and init-before-use (and, later, unused-variable) analyses.

The routines work by simulating execution.
We have an object called a continuation, which has a list of instructions
yet to be executed. It also maintains a set of live variables.

To process an instruction, we first examine the expressions in the instruction.
Since these are all, logically, just function applications, we are mainly interested
in the variables passed as arguments to the functions. Passing a variable
as an argument constitutes a use of the variable, so we chuck an error
if any variable passed as an argument is dead at this time.

Felix does not allow functions to modify non-local variables.
Therefore we don’t have to worry that calling a function will change
a variable from dead to live state. in the uniquness analysis,
However, reading a uniquely typed variable does change the variable
state from live to dead! So we have to do that in that routine.

Now there is one further complication. If the function we are calling is
a CHILD it can access a variable of the caller, even if it is not
passed as an argument. So we have to handle that.

Note: we DO NOT handle accessing variables via pointers, EXCEPT
in the special case of assignment, which is a call to the _storeat primitive,
and then only if the address taking is immediate, like so:

_storeat (&var, val)

which is an assignment of val to var.

Now before I go on, there’s another complication, and, I believe,
academic articles on uniqueness checking GET THIS WRONG.
If a variable has product type, we ADD a set of variables,
one for each component. I call these “fairy variables”. For example if

var x = (a,b);

there are THREE variables here: x, x.0, and x.1. Since products
can contain products, each variable is represented by

(a) the variable index in the symbol table
(b) a path of indices to a component

This means variables can be COUPLED. In other words,
if you assign to x, you are also assigning to x.0 and x.1.
However, you can also assign to just x.1 and not x.0.

To handle this we maintain liveness states of the fairy variables
ONLY. Then, we break down reads or writes to parents into a set
of individual reads or writes.

For example, if we do:

var x = box (1,2);

then x is uniquely typed, and so, we consider the x.0 and x.1
only, and consider them *separately* as uniquely typed.
If we try to do a read on x, we’re reading BOTH and so both
must be live. But we also allow reading just ONE component,
which has to be live, but the other component could be dead
without triggering an error.
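A little sketch of the representation (C++ rather than the compiler’s OCaml, and nothing here is the real code): a fairy variable is a symbol index plus a path of component indices, and a write to a prefix of a path marks everything beneath it live:

//////////////////////////////////////////
#include <algorithm>
#include <cstdio>
#include <map>
#include <vector>

using Path = std::vector<int>;
struct Fairy { int var; Path path; };

bool operator<(const Fairy &a, const Fairy &b) {
  return a.var != b.var ? a.var < b.var : a.path < b.path;
}

std::map<Fairy, bool> live;   // liveness state of every fairy variable

bool is_prefix(const Path &p, const Path &q) {
  return p.size() <= q.size() && std::equal(p.begin(), p.end(), q.begin());
}

// assigning to (var, path) also assigns to every component beneath it
void assign(int var, const Path &path) {
  for (auto &f : live)
    if (f.first.var == var && is_prefix(path, f.first.path))
      f.second = true;
}

int main() {
  // var x = (a,b):  two fairies x.0 and x.1, both dead initially
  live[Fairy{0, {0}}] = false;
  live[Fairy{0, {1}}] = false;
  assign(0, {1});       // assign to x.1 only
  std::printf("x.0=%d x.1=%d\n", (int)live[Fairy{0, {0}}], (int)live[Fairy{0, {1}}]);
  assign(0, {});        // assign to x itself: both components become live
  std::printf("x.0=%d x.1=%d\n", (int)live[Fairy{0, {0}}], (int)live[Fairy{0, {1}}]);
}
//////////////////////////////////////////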

Now the INTERESTING bit. In the first instance our continuation
might hit a subroutine call. If it is a call to a non-child we can
ignore it (after processing the arguments!)

But if it is a child, it can READ a variable of interest.
To handle this we pass around a STACK of continuations.
Which is exactly what a real machine does. So our simulation
is very similar to real execution in this sense.

When we hit a return statement in a procedure, we’re
basically doing the same as a real return: we pop the
stack, and continue processing the new top of stack,
which is the caller’s continuation. Which, remember,
is just a list of instructions to be executed.

Similarly, a goto instruction is processed by simply changing
the list of instructions of the current continuation to the list
following the label. We cheat, by also adding two dummy labels,
one for the first instruction of a procedure, and one for the last.
So on entry to a procedure we create a new continuation,
and just “goto” the entry point after adjusting the current
continuation to bypass the call instruction when we return,
and a return is like a goto, followed by popping the current
continuation.

The interesting case is a CHEAT. If we have a conditional
goto, we have TWO paths to follow. So, we pick one of
them to follow now, and PUSH the continuation of the other
one onto the stack to do later. When the current continuation
finally returns, it pops back to the path we didn’t explore yet.
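A toy driver showing just that shape (again C++ for illustration only, not the actual OCaml in the compiler): a continuation is a position in the instruction list plus a set of initialised variables, and a conditional goto pushes the untaken path to be explored later:

//////////////////////////////////////////
#include <cstdio>
#include <set>
#include <stack>
#include <vector>

struct Instr { enum { USE, ASSIGN, COND_GOTO, HALT } op; int var; int target; };

struct Continuation {
  const std::vector<Instr> *code;
  size_t pc;
  std::set<int> live;              // variables known initialised on this path
};

void simulate(const std::vector<Instr> &code) {
  std::stack<Continuation> todo;
  todo.push({&code, 0, {}});
  while (!todo.empty()) {
    Continuation c = todo.top(); todo.pop();
    for (;;) {
      const Instr &i = (*c.code)[c.pc++];
      if (i.op == Instr::HALT) break;
      if (i.op == Instr::ASSIGN) c.live.insert(i.var);
      else if (i.op == Instr::USE) {
        if (!c.live.count(i.var))
          std::printf("use of possibly uninitialised variable %d\n", i.var);
      } else {  // COND_GOTO: explore the branch later, fall through now
        todo.push({c.code, (size_t)i.target, c.live});
      }
    }
  }
}

int main() {
  // if (c) goto 2;  v0 = ...;  label2: use v0
  std::vector<Instr> code = {
    {Instr::COND_GOTO, 0, 2},
    {Instr::ASSIGN, 0, 0},
    {Instr::USE, 0, 0},
    {Instr::HALT, 0, 0},
  };
  simulate(code);   // reports the use on the branch path, not the fall-through
}
//////////////////////////////////////////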

Now, the interesting stuff is PATH MERGING. When
you do a goto, you might have already processed the code
at that label. To know this, we keep a list of visited labels.
It is a mutable field, and, when we bifurcate on a conditional
goto, the list is SHARED.

So, in the uniqueness typing, when we hit a label a second
time, we require the current liveness state to be the same
as the first time, otherwise we BUG OUT.

In the use before initialisation check, it is harder!
In that case we logically AND the variable liveness states
on first and second entry, where liveness is true if live or
false if dead. Actually the encoding ONLY tracks which
variables are live, as a set, so the logical AND is a set intersection!

If the new state is then different to the old state, we have to
CONTINUE PROCESSING THE INSTRUCTIONS A SECOND TIME.
Or a third or fourth time!

Since the number of variables involved is fixed, and with use before
initialisation checking we’re ONLY interested in whether we use a dead
variable, and the logical AND operation can ONLY kill variables,
the number of times we do this is bounded by the number
of variables, and we MUST reach a fixpoint. Therefore the
analysis MUST terminate.

I think it is interesting that what would appear to be more difficult,
the uniqueness analysis, because the liveness state is required
to flip-flop between live and dead, is linear, whereas the use before
initialisation analysis, in which variable states can only proceed
monotonically at a merge, from live to dead, and can only change
on assignment from dead to live, is in fact more complex!






John Skaller
ska...@internode.on.net





John Skaller2

unread,
Nov 18, 2021, 11:57:20 PM11/18/21
to felix google
Not sure quite what happens here:

chip buffer
connector b
pin backtrack: %<char
pin filein: %<char
pin charout: %>char
{
var gotchar = false;
var ch: char;
C_hack::ignore(&ch);
chip reader
connector con
pin chan: %<char
{
while true do
ch = read con.chan;
gotchar = true;
done
}
….

Without the C_hack, this gives an error when ch is used, because it isn’t
set on all paths. This is because the while loop is equivalent to

start:>
if not cond goto finish;
loop_body;
goto start;
finish:>

so it is possible in general that the branch to finish is immediately
taken, and the loop body assigning to ch is never executed ..
even though in this case the condition is true, so the branch
to finish should have been eliminated. Any which way, this is ALSO a problem
in another setting:

//$ Reverse elements of an array.
fun rev[T, N:COMPACTLINEAR] (x:array[T, N]): array[T, N] = {
var o : array[T, N];
var n = len x;
C_hack::ignore(&o); // to fool use before init
if n > 0uz
for var i:size in 0uz upto n - 1uz
call set(&o,n - 1uz - i, x.i)
;
return o;
}

The problem here is, if n <= 0 we just return the uninitialised variable o.
Actually that’s FINE because the type will be unit, and the variable
should (eventually) be elided. The return should reduce to returning
the empty tuple ().

But, after monomorphisation, the WHOLE FUNCTION should be elided
also. Functions returning unit cannot be any use.

I need to track this down. Both cases should have been eliminated.
Note, the checks are ONLY being done post-monomorphisation.

Unfortunately the debug info doesn’t show the whole function after monomorphisation,
it shows the original source. So I’m not 100% sure what is actually there :-)


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Nov 19, 2021, 10:48:36 AM11/19/21
to felix google
So, the loop problem is bad. Note that in Felix a basic loop is NOT a block.
So you can do this:

while cond do var x = 1; done
return x;

because basic loops just do gotos. You have to use begin/end to make
blocks:

begin
while cond do
begin
var x = 1;
end
done
end


The outer begin/end encloses the whole loop, the inner one just
the loop body. This matters in similar for loops because of where
the control variable is defined.

Note, we HAVE to use such constructions to avoid problems with
service calls inside loops (service calls cannot be done with anything
on the machine stack other than the scheduler return address).

However the basic problem is more fundamental:

var x :&int; // uninitialised
while cond do x = 1; done


return x;


Now, *I* know that the loop always executes, but the compiler does not.

In the serialise case, I fudged it by initialising the naughty address variable to NULL.
Actually that MAY have been a read error! The routine actually cannot handle an
empty string input! Now, it returns a NULL address if it gets one, which is no
worse than before. Still, the counter involved should have a pre-condition that
says it’s not zero.

Problem is that would not help, not even a branch on zero would help, skipping
the loop, because that would require data flow analysis to determine the loop
body would always execute once.

The bottom line here is that if you have a loop and return some value
afterwards that the loop assigns, you need to put a dummy initialisation
in to stop the use before init routine barfing.

I used

C_hack::ignore(&varname);

to do this. It fools the routine into thinking varname is assigned.
But this is really ugly!

NOTE THAT THE ROUTINE IS WORKING AS EXPECTED HERE.
I DESIGNED IT TO ENSURE INITIALISATION STATICALLY.


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Nov 19, 2021, 12:39:31 PM11/19/21
to felix google
Here’s another one.

At present, the algo doesn’t handle structs or unions. In fact BOTH are
represented by cstruct.

But with a union, you only assign ONE of the fields, and that
initalises the whole union. The algo knows nothing about this.

Variable e defined at
/Users/skaller/felix/src/packages/gui.fdoc: line 213, cols 7 to 25
212: var clock = Faio::mk_alarm_clock();
213: var e : SDL_Event;


SDL_Event is a union with an internal tag, represented as a cstruct.
This screws up in ALL the GUI code.

Otherwise all the regression tests now pass.

There is one more case, in the GUI code and probably in the
Felix tools built, involving a record which it thinks isn’t initialised.
It’s weird because the pointer is taken but it seems not to see that.

Variable cfg defined at
/Users/skaller/felix/src/packages/flx_profile.fdoc: line 143, cols 5 to 27
142: var profile = FlxProfile::dflt_profile();
143: var cfg : config_type;
***********************
144: copy_profile &cfg profile;


I mean there it is, the address is taken on line 144.
This thing has over 10 fields, and it thinks the fields are not initialised.
But the address taking should cause all the fields to be considered initialised
as well as the top level variable.


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Nov 19, 2021, 1:01:45 PM11/19/21
to felix google
This one is weird but may be related to the cfg problem:

Share init error: Using uninitialised shared variable
(82869:->fmacs)
Variable fmacs defined at
/Users/skaller/felix/src/packages/buildtools.fdoc: line 778, cols 7 to 48
777: // Felix platform macro
778: var fmacs = ([cmd.os+'/([^/]*\\.flxh)']);
******************************************
779:


In instruction xs<78278> := fmacs<60928>varname;
Detected at/Users/skaller/felix/src/packages/lists.fdoc: line 723 col 5 to line 730 col 6
722: //$ Convert a list to a stream.
723: gen iterator (var xs:list[T]) () = {
724: while true do
725: match xs with
726: | Snoc(t,h) => xs = t; yield Some h;
727: | #Empty => return None[T];
728: endmatch;
729: done
730: }
731: }

Obviously, the variable fmacs IS initialised! It’s a list!

I wonder if this is related to currying …??



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Nov 19, 2021, 1:07:17 PM11/19/21
to felix google


>
> Share init error: Using uninitialised shared variable
> (82869:->fmacs)
> Variable fmacs defined at
> /Users/skaller/felix/src/packages/buildtools.fdoc: line 778, cols 7 to 48
> 777: // Felix platform macro
> 778: var fmacs = ([cmd.os+'/([^/]*\\.flxh)']);
> ******************************************
> 779:
>
>
> In instruction xs<78278> := fmacs<60928>varname;
> Detected at/Users/skaller/felix/src/packages/lists.fdoc: line 723 col 5 to line 730 col 6
> 722: //$ Convert a list to a stream.
> 723: gen iterator (var xs:list[T]) () = {
> 724: while true do
> 725: match xs with
> 726: | Snoc(t,h) => xs = t; yield Some h;
> 727: | #Empty => return None[T];
> 728: endmatch;
> 729: done
> 730: }
> 731: }
>
> Obviously, the variable fmacs IS initialised! It’s a list!
>
> I wonder if this is related to currying …??


NO NO NO! This is a REAL ERROR! Yea!!!!

println$ 'Copying platform macro ..';
for pattern in fmacs perform
CopyFiles::copyfiles(cmd.repo/'src'/'config', pattern,
cmd.target_dir/cmd.target_bin/'lib'/'plat'/'${1}',true,cmd.debug);

…..

// Felix platform macro
var fmacs = ([cmd.os+'/([^/]*\\.flxh)']);


See? The variable really is assigned AFTER use!



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Nov 21, 2021, 12:59:22 PM11/21/21
to felix google
Ah, I have tracked down this one:

var clock = Faio::mk_alarm_clock();
var e : SDL_Event;
// dummy first event
e&.type <- SDL_FIRSTEVENT.uint32;

An SDL_Event is a cstruct, which represents a C union.

But the problem is the encoding of the assignment:


(apply((prj0:RWptr(SDL_Event,[]) -> RWptr(uint32,[])), &e<64016>ref) :>> Wptr(uint32,[]))
<- (_ctor_uint32<65478>primfun SDL_FIRSTEVENT<65350>varname);

Now, there is clearly an address taken of variable e in there.
The problem is, I’m tracking fairy variables. The detector sees that there is
a projection of the pointer, and tries to say there’s a write to that.

This would be a disaster for a union. But that is not the problem!
The problem is that Felix is using a TUPLE PROJECTION on the cstruct.
Presumably it does that for structs as well. But structs are not split
into fairies at the moment (the split is done only for tuples, small arrays, and records,
because I haven’t passed in the symbol table to find the fields of the nominal types).

So, the algo is not finding a fairy for the specified projection.
Unlike records, struct projections use the field NUMBER not its name
so we use a proj projection not an rproj.




John Skaller
ska...@internode.on.net





John Skaller2

unread,
Nov 21, 2021, 4:55:40 PM11/21/21
to felix google
Arggh. So now, the use before init passes all tests and rebuilds Felix without
error so it’s time to turn it into a hard error checker.

In the process I have found (but not corrected) bugs in the uniq usage algorithm
too. The detector needs to be fixed.

Both need test cases. I have yet to figure out how to automate tests that are
supposed to fail. The output error messages are subject to change and there’s
no comprehensive set of error codes. However the compiler does have a -e switch
which causes a 0 return if there’s an error or 1 if there isn’t, designed for an
error/no error detection.


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Nov 21, 2021, 4:59:36 PM11/21/21
to felix google
Except for this weirdo ..

tools/flx_build_prep -> flx_build_prep
Share init error: Using uninitialised shared variable
(82869:->fmacs)
Variable fmacs defined at
/Users/skaller/felix/src/packages/buildtools.fdoc: line 778, cols 7 to 48
777: fpcs += cmd.os/'([^/]*\\.fpc)';
778:
******************************************
779: // setup header files to copy


In instruction xs<78278> := fmacs<60928>varname;
Detected at/Users/skaller/felix/src/packages/lists.fdoc: line 723 col 5 to line 730 col 6
722: //$ Convert a list to a stream.
723: gen iterator (var xs:list[T]) () = {
724: while true do
725: match xs with
726: | Snoc(t,h) => xs = t; yield Some h;
727: | #Empty => return None[T];
728: endmatch;
729: done
730: }
731: }

In function with code
2497 instructions, too many to list (limit 20)


Now, there was an error here before (a real one) I fixed. But the fix doesn’t
seem to have taken. Note the bad variable is fmacs, but the source is showing
fpcs. I did a make extract. This shouldn’t be possible..


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Nov 24, 2021, 9:24:25 AM11/24/21
to felix google
So the use before init routine now runs on both pre-monomorphisation polymorphic code
and also optimised monomorphic code without barfing, however some hacks like this:

C_hack::ignore(&x);

are required in some cases to fool the algorithm into believing the variable
x shown is initialised. Not all the code is hacked so some cases will arise
later probably. In some case, you should just do an actual initialisation even if
one isn’t required.

There are some cases I don’t understand. One of them is

tools/flx_build_prep -> flx_build_prep
Share init error: Using uninitialised shared variable
(82869:->fmacs)
Variable fmacs defined at
/Users/skaller/felix/src/packages/buildtools.fdoc: line 778, cols 7 to 48
777: fpcs += cmd.os/'([^/]*\\.fpc)';
778:
******************************************
779: // setup header files to copy


In instruction xs<78278> := fmacs<60928>varname;
Detected at/Users/skaller/felix/src/packages/lists.fdoc: line 723 col 5 to line 730 col 6
722: //$ Convert a list to a stream.
723: gen iterator (var xs:list[T]) () = {
724: while true do
725: match xs with
726: | Snoc(t,h) => xs = t; yield Some h;
727: | #Empty => return None[T];
728: endmatch;
729: done
730: }
731: }while true do
done

this is just nonsense.

The main problem is when dynamically something is assigned, but statically
there’s a path where it isn’t. For example

for var i in 0..9 perform &x.i <- 1;

initialises an array x, but the algo cannot see that, since i is a variable.


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Nov 24, 2021, 9:27:56 AM11/24/21
to felix google
I am now removing assignment from the compiler completely.
The storeat instruction must now be used.

Currently, assignment allows an expression on the LHS.
It also adds match cases in many places in the compiler, especially
in the use before init algo. This just complicates things.



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Dec 5, 2021, 8:53:11 AM12/5/21
to felix google
Felix currently supports uniqueness types using a combinator:

uniq T

and a coercion

box x: T -> uniq T

to construct one. When a boxed value is stored in a variable, the variable
is said to be live. No actions at all are allowed on a boxed variable,
other than to move the box out of it, leaving the variable in a dead state.

For example

var x = box 1;
var y = x;

At this point, variable x is dead, and variable y is live and contains the
box originally found in x.

A box can be passed to a function, leaving the holding variable, if any,
dead:

var x = box 1;
var n = f x;

The function acquires the box, it now resides in the function’s
parameter, and the variable x is dead.

A variable must be dead when it goes out of scope otherwise
the box will be inaccessible and the resource its contents refer
to will leak. So this function is invalid:

fun f() { var x = box 1; return 1; }

because the box is lost, and so its contents are also.

In order to use the object referred to by a boxed value,
the box has to be removed. This is done by the unbox operator.

var chp = box (malloc[char] 42);
fun clear (q: uniq &char, n: int) = {
var p = unbox q;
memset (p, char 0, n);
return box p;
}
chp = clear (chp, 42);

Here the function clear accepts the box, making chp dead,
then it unboxes it, clears the referent of the previously
wrapped up pointer, reboxes the pointer and returns it,
resulting in the variable being made live again due to the assignment.

Now we must be clear about the contracts involved. Unboxing a reference
is always safe because we OWN the reference uniquely. We can do
anything with it, including modify the referent, deleting it, or sharing it.
We can do this even in a pure function, because any effects cannot be
observed since we have been *promised* we’re the sole owner.

Passing a box to a function is NOT transparent: it changes the state
of the source variable. But the function itself remains pure.

Boxing a value is not automatically safe. A programmer boxing
a value is *asserting* sole ownership. In the function above,
the variable p is sharable. It is in fact shared (aliased) by passing
it to memset. However we know memset forgets it on returning.
So after memset is called, p is the sole alias again, and we are
justified in boxing its contents, because by the time the box
is possessed by another variable, the function has returned
and the shared reference p is forgotten.

Note the proof here demands not only that the variable being
boxed is indeed solely owned, but it will not be used after
the boxing. A counter example:

var p = unbox q;
var r = box p; // OK
println$ p; // This is safe but ..
delete p;
return r; // WOOPS

It’s safe to do the println$ because it only BORROWS p.
So even though r is boxed, asserting its contents are a unique
reference, the original unboxed sharable reference is indeed
shared, violating the contract. However the violation is benign
because println doesn’t reshare it.

Even the delete is perfectly fine, because we have not USED
the variable r yet. But when we return it, we have lost control
and it now happens to contain a dangling pointer..

THE POINT is that boxing is unsafe and the programmer
is responsible for using it correctly.

Note that using shared values is intrinsically unsafe!
We can delete an object even if we are not the sole owner.
This is why we might use refcounting or GC.
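For what it’s worth, a loose C++ analogy (not Felix semantics, just the ownership shape): std::unique_ptr plays the role of the box; moving it out of a variable leaves that variable dead, and the sole owner may freely mutate the referent:

//////////////////////////////////////////
#include <cstdio>
#include <cstring>
#include <memory>

// “unbox”, mutate the referent, “rebox” and hand ownership back
std::unique_ptr<char[]> clear(std::unique_ptr<char[]> q, int n) {
  char *p = q.get();          // we own it uniquely, mutation is unobservable
  std::memset(p, 0, n);
  return q;                   // implicit move back to the caller
}

int main() {
  std::unique_ptr<char[]> chp(new char[42]);   // the “box (malloc ...)”
  chp = clear(std::move(chp), 42);  // chp is dead during the call, live after
  std::printf("%d\n", (int)chp[0]);
}
//////////////////////////////////////////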

============

Now, a unique value can be safely passed to a function expecting a shared value:
the unbox operation is inserted implicitly. This is because uniq T is a subtype of T.
This also impacts nested uniques in tuples, for example.

Furthermore, a nested function can access a unique value and this is tracked
by the uniqueness checker if the nested function is explicitly called.
Note that a function can only make the parent uniq dead, since it cannot
assign to it, since functions aren’t allowed to have side effects.

Indeed this is one reason we need to think about borrowing!
But we’ll come to that in a minute. More of a problem is if a nested
function which accesses a parent unique is

(a) passed as an argument to another function
(b) returned

Note that if we did explicitly call a nested function which uses
the uniq, the uniq becomes dead, so we can only do it once
unless we relived the variable in between.

Felix creates closures by passing a pointer to the parent stack frame
to the child. So the issue is the same now as addressing a uniq.
The current system simply ignores all these cases. We don’t know
how many times a closure is called, and we don’t know what happens
to a pointer to a unique variable: is it used to read or write the variable and
if so how many times and in what order? We don’t know, without global
data flow analysis which tracks aliases.

============

Now the next issue: at present List::map returns a uniq list.
So if we just say

var x = map f lst;

x is unique and cannot be shared. This breaks old code
written before we had unique. I have fixed the library to say

var x = unbox (map f lst);

to get back the old behaviour but only a few functions so far in the
library use uniq. There’s no problem if the variable is typed:

var x : list[int] = …

because the unboxing is implicit due to subtyping rules.
But it’s very inconvenient. At one time you could say

once x = …

and it meant the x was uniq. There was no uniq type constructor then.
We could allow:

shared x = …

to force the unboxing of uniqs, and

uniq x = …

to either require the RHS to be uniq OR to box the RHS if it is
not. in the latter case it is a contract assertion, rather than a type
constraint. So I don’t like that. I prefer it be an error to say

uniq x = 1; // ERROR

you have to say

uniq x = box 1; // OK

The idea is just that you’re expecting a function to return a uniq and should get
an error if it doesn’t.

===============================

Now FINALLY to borrowing. The idea of borrowing is simple enough:
it is a pain to move a uniq to a function, and force the function to move it
back in the return value AND the client to assign it:

var p = box ...
def var n, p = f p;

We have to use “def” here because it unpacks a tuple and then
allows the components to be assigned OR to initialise a new variable.

This is ugly. The idea of borrowing is that a box contents are copied
and put i a transparent wrapper which is passed to a function
WITHOUT making the variable dead. The function can then use
the referent provided it FORGETS it before returning.

in other words this is equivalent to making the variable dead,
then on return relivening it (without an assignment), Since
nothing in a single thread of control can happen to the variable,
unless the called function does it (since the caller is suspended),
this should be fine EXCEPT for pointer aliases (as always).

So let’s think about the

borrowed T

type. If a parameter has this type a uniq or shared value
can be passed safely, if uniq it is NOT made dead.
Now the function has to unwrap the parameter to make
it accessible as usual but is then required to FORGET it
before returning. Mutation is still acceptable.

A borrowed value can be passed to another function.
It is accessible again after that function returns.
The function has to abide by the same rules.
So an unboxed value can NOT be safely passed:
the caller must be sure the function does the right thing.

Borrowed values can NOT be implicitly made shared!
Unlike uniq parameters we don’t own the referent.
On the other hand, a uniq value can always be loaned
safely, provided the borrower abides by the contract.

So uniq T is a subtype of borrowed T.

Again, a shared value can be passed safely where a borrowed
value is expected. So T is a subtype of borrowed T.

SUBTYPE RULES:

uniq T -> T -> borrowed T

Note that by transitivity

uniq T -> borrowed T

Now we need some value operators. To access a borrowed
value, we can try “unwrap”. unlike unbox, this operation
is UNSAFE. This means it imposes a responsibility on the
unwrapper to FORGET the unwrapped value.

unwrap: borrowed T -> T

Wrapping is safe:

wrap: T -> borrowed T
lend: uniq T -> borrowed T

These operations are safe and can be implicit.
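Again as a loose C++ analogy only (the subtyping here is Felix’s, not C++’s): passing a unique_ptr by value is the uniq move, while passing a plain pointer or reference is the borrow, where the callee may use the referent but must forget it before returning:

//////////////////////////////////////////
#include <cstdio>
#include <memory>

// borrow: the caller keeps ownership; we must not store or delete it
int peek(const int *borrowed) { return *borrowed; }

// uniq: ownership moves in and the referent dies when we return
int consume(std::unique_ptr<int> owned) { return *owned; }

int main() {
  auto p = std::make_unique<int>(42);
  std::printf("%d\n", peek(p.get()));          // p is still live afterwards
  std::printf("%d\n", consume(std::move(p)));  // p is dead from here on
}
//////////////////////////////////////////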

==============================

SUMMARY.
==========

The system I propose is based on the contract programming method.
Unlike systems like Rust, UNSAFE operations are intrinsic to the design.
An unsafe operation imposes a responsibility to obey the contract on
the user of the unsafe operation.

The safe operations, on the other hand, assume the provider did
the right thing.

The type system has two roles. First, it provides the safe implicit coercions
based on subtyping rules. Secondly, auxiliary algorithms must track
liveness states.

Borrowed values must be forgotten. This means they cannot be put on the heap.
As usual, pointers to variables, whether obtained by addressing or closure formation,
are not tracked.

A borrowed parameter expresses a contract assertion that the borrowing
function will forget the parameter and its contents. For a local
variable, borrowed type implies that the variable will not be used
after the variable it borrowed from (if uniq) is killed.

There are thus two kinds of contracts: those that are enforced
by the compiler, including those that involve implicit safe
coercions, and those that impose constraints on the programmer
to ensure the resulting system is sound.

UNLIKE other systems, the type system does NOT ensure soundness.
Rather, it partitions the responsibility, so that it is responsible for
enforcement of parts of a contract, and the programmer the other parts.

The resulting system is SIMPLE and the programmer responsibilities
MANIFEST. The programmer should be able to prove their part
of the contract is adhered to, so that the compiler type checks,
which prove the other parts are adhered to, result in a sound system.

All this with a caveat that the compiler type system cannot track
aliases. Explicit addressing is MANIFEST so use of the addresses
is again the programmer's responsibility.

Unfortunately, CLOSURE FORMATION is NOT manifest (explicit)
and is therefore difficult for both the compiler and programmer
to account for.

Still .. something is better than nothing. We do not support
throwing out expressivity just because the resulting system
allows coding which is hard to prove correct. Restrictions just lead
to workarounds which are even harder to prove correct.



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Dec 6, 2021, 12:10:33 AM12/6/21
to felix google
So this now works as expected …

/////////////////
// check borrowing
proc check() {
// explicit wrapping
fun f(x: _borrowed int) : int => unwrap x;
var x = wrap 1;
println$ f x;
println$ f x;

// implicit wrapping of shared
var y = 1;
println$ f y;
println$ f y;

// implicit re-wrapping of uniq
var z = box 1;
println$ f z;
//println$ f z;
}
check();
/////////////////

There are two problems I found. If we uncomment the last line,
the program fails with a uniqueness error, as expected, because
I haven't edited the uniqueness checker yet. So borrowing a uniq
still kills the variable when it shouldn't.

The same code at the top level works: I believe this is a known bug.
Top level code should be included in an _init_ routine and checked
but the *variables* at the top level are NOT locals of the _init_
routine, but instead get put into the thread frame object.

The uniqueness checker can’t handle this at the moment.
It’s not the only routine in the compiler that doesn’t handle globals
correctly. The problem is, the routine checks every function, and looks at
the local variables of that function. But _init_ doesn’t have any.

In fact, ALL functions can get at the thread frame, but it is ignored
in all of them. This is a design issue. The problem is we have no idea
of the liveness of a global variable, except in the _init_ routine, where
it starts off dead, and then is modified by calls to other functions.
These functions are siblings of _init_, not children.

Calls to siblings are ignored because they cannot access a sibling's
locals.

A solution to this is to make everything a local of _init_.
So not only are variables children of _init_ but every function
is also nested in it. But this is really bad and not tenable because
in every program, type class functions would be children of different
parents (rather than being top level).

The best solution is to BAN top level variables that are not shared.
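For concreteness, a sketch of the top level case described above (the same pattern
as the check() example, but with the variable stored in the thread frame):

fun f (x: _borrowed int) : int => unwrap x;
var z = box 1;    // top level: z lives in the thread frame, not as a local of _init_
println$ f z;
println$ f z;     // a double use the current checker cannot diagnose at top level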



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Dec 6, 2021, 2:59:47 AM12/6/21
to felix-l...@googlegroups.com
So now this works:

proc check() {
// explicit wrapping
noinline fun f(x: _borrowed int) : int => unwrap x;
var x = wrap 1;
println$ f x;
println$ f x;

// implicit wrapping of shared
var y = 1;
println$ f y;
println$ f y;

// implicit re-wrapping of uniq
var z = box 1;
println$ f z;
println$ f z;
kill z;
}
check();

In the third case, f borrows z, so z remains live, and has to be killed
by, for example, the kill operator (which does nothing except use up
the variable z, which kills it).

This only works if the function domain is a borrow type; it does not
currently work if it's a tuple with one component being a borrow type.
That is, something like:

fun g(x:int, y: _borrow int) => ….

will not work. It should work if we called

var z = box 1;
g(1, z)
kill z;

The problem is, the argument tuple has STOLEN z. It is then
borrowed by g. Then the tuple evaporates .. and its uniq component
is not used up. So it's an error. The kill z is also an error because
z was stolen, so it is already dead. To make this work we would have to do this:

g (1, loan z)

This is so inconsistent I think you should always have to say loan explicitly:

f (loan x)

should be required. This simplifies the checking because now there’s only
ONE function which accepts a uniq without killing it (namely the loan function).

I think loan should ONLY be applied to an actual variable though.
You can box any expression, because you're boxing a value. However
you can only loan a variable. If the argument is an expression, and loan
returns a borrowed type, the original unique can never be used again,
so it is an error.

This means "loan" is similar to address taking. I just removed a "uniq"
operator that worked like that. Heh. So actually loan should be an intrinsic
operator, not a function. Otherwise we have to check that its argument is a variable.

Hmm. But the issue is that we decided uniq t is a subtype of t, which is a subtype of borrowed t,
which means the coercion should be implicit.

The funny thing is, there IS or should be at least an explicit coercion injected
by the compiler. So actually I don't need to check a call to a function to see
if the parameter type is borrow: if it is, the uniq is already coerced to a borrow.


hmmm .. rethink ..



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Dec 6, 2021, 7:00:46 AM12/6/21
to felix google
Ok so this now works:

var a = box 1;
fun g(x:int, y: _borrowed int) : int => x + unwrap y;
println$ g (1,(loan a));
println$ g (1,(loan a));
kill a;

I had to make loan an operator. You can loan any variable.
If it is uniq T, the loan is borrowed T, if it is borrowed T, it stays borrowed T,
and if it is plain T, it becomes borrowed T.

It translates into a coercion exactly as if you had written one, for example:

loan a —> (a :>> _borrowed int)

In direct cases like

var z = box 1;
println$ f z;
println$ f z;
kill z;

the compiler already inserts a coercion, so you can, but do not need to, loan z
in this case. However, tuple formation preserves type, so using a uniq variable
as a tuple component makes the component uniq, killing the variable.
Since functions always accept a single tuple argument, a loan is always
required if a variable is a component except for a unary tuple which in Felix
is just the original value.

I have to review handling of projections .. another email.


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Dec 6, 2021, 1:21:22 PM12/6/21
to felix google
So the handling of projections is weird. This works:

var b = box (1,2);
println$ b.1;

but this doesn’t:

var b = box (1,2);
println$ b.1, b.0;

because b is used twice. This works:

fun f(x: _borrowed int) : int => unwrap x;
var b = box (1,2);
println$ f b.1;

The point is there’s special handling of projections of boxed products.
This works:

var b = box (1,2);
var c : int = b.0;
println$ c;

The projection is consuming the box, but the result of the projection is NOT boxed.



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Dec 7, 2021, 8:54:16 AM12/7/21
to felix google
Now it’s all wrong.

You cannot do ANYTHING with boxed values except pass them to a function
expecting a boxed value EXCEPT THAT unboxing is implicit, so you can actually
apply a function expecting a shared value too.

However, overloading will see the boxed value. Therefore this should not work:

var x = box (1,2L);
println $ x.1;

because projection formation requires the argument x to be a tuple, but it isn't.
If you explicitly form a projection

var p = proj 0 of (int * long);
println$ x.p;

it doesn't work either because a projection is a special term; it is not a function.
But it can be used like one. BUT

var p = proj 0 of (int * long);
fun apl(x:int * long) => p x;
println$ x.apl;

This one WORKS due to the implicit casting away of boxing.

I REALLY got confused by this working:

var x = box ("Hello");
println$ x.0;

That WORKS! It prints “H”. It took a while to remember that

s.0

means the first char of a string so the x is implicitly unboxed and
passed to the string indexing function! This works too:


var a = box ("Hello");
println$ a.(1..5);

Nasty Gotcha!

Borrowed values, on the other hand, act sanely. There's no implicit conversion
FROM a borrowed value. However they can be lent to a function expecting
a borrowed value, including “unwrap” which returns the value which was wrapped,
which can then be used freely provided the value isn’t saved after function return.

I was confused because the uniqueness tracker drills down into uniq values which
are products. However, there's no way to actually trigger that, at least on a get.
For sets there is weird special handling in the unification engine, so subtyping
of the uniq->shared kind is respected by pointers (normally pointers are invariant).




John Skaller
ska...@internode.on.net





John Skaller2

unread,
Dec 7, 2021, 9:10:39 AM12/7/21
to felix google
So just to clarify the rules:

1. If you box a value you're promising it is a unique reference to something.

2. If you accept a box, you can assume it is unique. You can pass it
on or unbox it. You can do anything with the unboxed value, because you own it.
In particular, a function can modify the referent WITHOUT this being considered
a side-effect (because no one else can observe the modification).
The one thing you CANNOT do with a boxed value is nothing!
You can delete it, but you cannot ignore it because you’re the sole
owner and that would be a leak.

3. You can lend a box to a function expecting a borrowed value.
In this case the state does not change. The function accepting it
can NOT modify it because it is not the unique owner.
So a modification would be a side effect.

It can also NOT store it on the heap. It is required to forget it,
so on return, the original variable remains the unique owner.
Note, you can lend a shared variable too. Lending is always safe.
The constraints apply to the borrower.

Note you can ignore a borrowed value .. that’s certain to ensure
you forget it!

Note, a borrowed value is opaque so you have to unwrap it explicitly.
This is an unsafe operation, because it imposes the constraints.

4. A procedure borrowing a value CAN modify the referent.

So if you have a boxed pointer to a C array, a function like

var up = upcase (lower)

can modify its argument if it is boxed, but NOT if it is borrowed.
On the other hand a procedure

upcase(loan s)

can modify the string.
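A sketch of the two signatures implied by this example (the names and bodies are
illustrative only, not library functions):

// function form: owns its uniq argument, so mutating the referent is not an
// observable effect, and the result has to be handed back
fun upcase_owned (s: uniq +char) : uniq +char => s;

// procedure form: borrows its argument and may mutate the referent,
// but must forget it before returning
proc upcase_loan (s: _borrowed +char) {
  var p = unwrap s;   // p : +char, usable locally; p must be forgotten before return
  // ... mutate the chars via p here ...
}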


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Dec 14, 2021, 6:50:24 AM12/14/21
to felix google
So I find an issue with the uniq/share/borrow system.

Subtyping is required to be transitive and all paths semantically equivalent.
But consider:

U -> S // move
U -> B // copy no move (actually move and put back automatically, or, copy and forget)
S -> B // copy no move (who cares?)

The problem is the composite

U -> S -> B

moves the original U so it is dead. Whereas U -> B notionally kills it, but only temporarily.
So the U dies in the composite but not the direct operation.

I.e. U -> S ->B != U -> B contrary to requirements.

Note U -> U also moves, S -> S copies, and B -> B copies.
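A sketch of the discrepancy in code, spelling each path out with the explicit
coercion syntax from an earlier message:

fun f (b: _borrowed int) : int => unwrap b;
var u = box 1;
println$ f (u :>> _borrowed int);           // U -> B directly: u stays live
println$ f ((unbox u) :>> _borrowed int);   // U -> S -> B: unbox moves u, so u is dead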

The subtyping coercions are all safe of course.
Unbox is safe, loan is safe, box and unwrap are not.

No idea how to solve this.


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Dec 16, 2021, 1:03:30 AM12/16/21
to felix-l...@googlegroups.com
There's also another worrying issue. In the literature a substructural type system, such as
a linear or affine system, is augmented with sharable values by marking them explicitly;
the notation used is usually

!x

This allows the normal intuitionistic logic which allows sharing to be applied to a
subset of variables.

In Felix, we started, historically, with shared variables and added uniqueness types
so that if T was a sharable type, then uniq T was a unique version of it. Recently
I also added _borrowed T as well.

Note that I have chosen to use ordinary type constructors in the compiler, and put
the logic into the type system, so it applies to ANY expression, not just variables.
The usual entity we are interested in is going to be a pointer. Note that the notion
of ownership and uniqueness applies to the reference or pointer, saying that it
owns the object it points to, that is, it is the only pointer to that object.

A key problem with this representation is that no operations can be performed
on the uniq T value, because the pattern matching in the compiler hits a brick
wall “uniq”. For example a product type

A * B * C

will have projections

proj 1: A * B * C -> A
proj 2: A * B * C -> B
proj 3: A * B * C -> C

but none of these functions can be applied to

uniq (A * B * C)

To actually use a uniqly typed value, we have to “cast away” the uniq
from the type. This can be done with a coercion, or with the unbox
operator on an expression, which is just the required coercion:

unbox: uniq T -> T

So for example if

up: uniq (+char)

is a uniquely owned pointer to a C char array, to access the array we have to do

var p : +char = unbox up;

and now p can be used to do things with the char array. The coercion is safe
because we owned the char array, in the sense we’re the only one that
has a copy of a pointer to it. The key point is that even though p is NOT
uniquely typed, it is STILL the only pointer to the char array. In fact,
the pointer `up` is no longer accessible so that claim is very true!

As you know the operation

box: T -> uniq T

is unsafe. It is justified only when we know, by local reasoning,
that we actually have the only pointer to the object. In fact,
writing the conversion is an assertion that this is the case,
on which the owner of the boxed object is entitled to rely.

A key property, somehow related to separation logic, is that if
we have two boxed values, they refer to separate objects.
Therefore modifications of one will not interfere with the other.

There is a similar operator:

unwrap: _borrowed T -> T

which is unsafe in the sense that a borrowed object must be forgotten.
The unwrapping is an assertion we will comply with that requirement.

The machinery therefore splits safety responsibility into two parts:

(a) a part the compiler guarantees
(b) a part the programmer must guarantee

so the type system is "conditionally sound" in the sense that
it is sound only if (a) the compiler algorithm is right in tracking
moves and (b) the programmer meets the contract requirements.

Unlike, say, Rust, where the compiler ensures full compliance.
[in fact in Rust to get anything done you have to use code marked
unsafe to disable the checking, which comes to the same thing,
achieved by a different contract]

I am happy with uniq being opaque, that is, blocking ALL operations
OTHER than moving the object. Note that “unbox” does indeed do
a move.

I am NOT happy that borrowed values have to be unwrapped.
A function like

int strlen(p: _borrowed +char) …

should just be able to use p as if it were shared, because there are only two
ways to use a borrowed value:

(a) loan it to yet another function
(b) unwrap it to do some stuff on the underlying object

The problem is that this code makes a copy of p.

var q = unwrap p;

so that p remains of borrow type, and the value is NOT moved,
so that q is a copy. Of course you can always do this

somefunc ( unwrap p )

and unwrap locally .. but you’d have to do that on every use.


The explicit coercion is there for a reason: the operation is unsafe.
But we already KNOW that from the type of p.


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Dec 23, 2021, 6:26:45 PM12/23/21
to felix google
I’m having lots of problems with _borrowed types. Indeed _uniq types are bad enough.

At the moment, the idea is if you have a function like

f(_borrowed T)

then if u has _uniq T type then

f(u)

will just work and not kill u. However most functions in the library were designed
with shared types in mind (because there was no uniq or borrow stuff).

I fudged println so it takes type _borrowed T. This got rid of a lot of errors.
The library at the moment makes `new T` return a unique pointer and also
some list operations like List::map return a unique list. But now this fails:

var b = darray[int]();
iter println of (int) b;

Here’s the actual type overload is looking for:

iter[] of borrowed((int)) -> void,darray[int]:TYPE

which of course fails, since println of (int) has type int -> void,
not _borrowed int -> void. In fact println has type

_borrowed T -> void

for all T. There’s no println of (int) in the library.

AN ALTERNATIVE:

1. Get rid of borrowed types altogether. This also solves the
inconsistent subtyping problem "in vacuo" :-)

2. Make the user write

u.loan

which now returns a plain T.

3. The onus is on the user writing this to only call a function
accepting a shared type but which actually does forget it.

Note without the “.loan” the same function will be called!
At least most of the time, lol. Because of the implicit coercion
from _uniq T to T.

The difference is when you use “loan” the variable isn’t killed,
and when you don’t, the value is moved and the variable killed.

Note that loan will fail if there is a function accepting only
a uniq, and call the shared variant of it if both uniq and shared
versions are available. So in this case loan does impact the semantics
by changing the overload.

HMMMM…..


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Dec 23, 2021, 8:35:35 PM12/23/21
to felix google
Yea. It isn’t going to work. I overloaded println:

proc println[T:BORROWED with Str[T]] (x:T) { fprintln (cstdout, x); fflush cstdout; };
proc println[T with Str[T]] (x: _borrowed T) { fprintln (cstdout, x.unwrap); fflush cstdout; };

This handles both shared and borrowed types. So if the argument is unique you can
lend it. The problem is, now I have to do the same for fprintln, which calls write.
So I have to do the same for that and so on.

The problem is, borrowed is a supertype of shared but is inaccessible until unwrapped.
The kind BORROWED is universal, it doesn’t mean borrowed, it means borrowed or
shared or unique.

The reason the first T must be BORROWED is that otherwise unification fails:

T1 :TYPE < _borrowed T2

would unify with T1 = _borrowed T2, but it fails because T1 only has kind TYPE,
so it cannot be set to _borrowed T2. It has to be elevated to kind BORROWED to make
this work. But T will never be set to _borrowed anything, because if the argument
is borrowed, the second overload will be selected.

Using a combinator for borrowed unfortunately means that, to use it with any function
defined only for shared values, an explicit coercion (unwrap) is required to use the
borrowed value, since borrowed is a supertype of shared.

In practice a LOT of functions are borrowing. Only a few are actually going to
put the argument into a heap data structure so it persists past the lifetime
of the function. But then if we annotate them as borrowing, we have to
unwrap the parameter every time.

The existing code solution has the same problem: you can already borrow a variable
by taking its address, which the uniqueness analyser ignores. But then you’re passing
a pointer, and have to dereference it to get at the value. Same issue with slightly
different operators. Indeed if you want the function to just use the value and it’s
shared, you’re probably wanting an overload taking shared, since that works
for some expressions.

Rethink required ….


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Dec 23, 2021, 9:13:35 PM12/23/21
to felix google
So my original idea was that if you call a function with a uniq variable,
it is borrowed. The only time it is moved is if the function accepts a uniq type.
This means, the existence of uniquely typed variables will be transparent,
that is, all old code will continue to work.

To actually leverage the uniqueness you need to force a move by adding
an overload to a function that has a uniquely typed parameter. This will
be selected if the argument is unique, because it is more specialised.
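A sketch of that idea (the function and its bodies are illustrative only):

// shared overload: old code keeps working, the argument stays live
fun render (s: string) : string => s + "!";
// uniq overload: more specialised, selected when the argument is unique,
// so the implementation is free to consume (and in principle reuse) the argument
fun render (s: uniq string) : string => unbox s + "!";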

The copying without killing is unsafe because the borrower thinks it’s ok
to put the thing on the heap. But the real problem is that there’s no way to tell
if a variable should be moved or copied on a given call, without examining
the called function type. The problem is that whilst this is easy, it is NOT
so easy if you pass a variable *containing* a unique type.

Throwing away uniqueness is only safe if, in doing so, a unique variable
is actually killed (made to go out of scope, or, at least the kill traced by
the uniqueness analyser).

Hum.

John Skaller
ska...@internode.on.net





John Skaller2

unread,
Dec 24, 2021, 10:41:13 PM12/24/21
to felix google
We know composition of functors is just substitution. Felix has the C++ problem which I need
to fix, demonstrated here:

// forward functor composition (1 argument)
typedef fun \circ (f:TYPE->TYPE, g:TYPE->TYPE):TYPE->TYPE =>
fun (T:TYPE):TYPE => f(g T)
;

Amazingly this actually works:

typedef fun lst(T:TYPE):TYPE => list[T];
var x : lst int = ([1,2,3]);
var y: lst (lst int) = ([x,x,x]);
var z : (lst \circ lst) int = y;

y and z are the same type, a list of lists of int.

We note that in the forward composition, we apply g to the input type,
to produce an intermediate type which is the argument of f. In C++ terminology,
g effects a partial specialisation of f.

In the example, f’s type argument T is specialised to lst int.

Now, here’s the problem: f might have more than one type argument.
if it has 2, it’s called a bifunctor. For example a `map` data structure has
a key and value type, so it’s a bifunctor. Another one is

typedef fun pair(A:TYPE, B:TYPE):TYPE => A * B;

Here’s another type of functor:

typedef fun diag(A:TYPE): TYPE * TYPE => A, A;

Notice the result of this one is a pair of types, aka a type tuple.
Notice `pair` maps a type tuple to a tuple type.

Of course we can easily write a composition routine:


typedef fun \circx (f:TYPE * TYPE->TYPE, g: TYPE->TYPE * TYPE):TYPE->TYPE =>
fun (T:TYPE):TYPE => f(g T)
;

which, BTW, does NOT work (ARGGH) and also, since a type function is a non-function to Felix,
cannot use the same name.

How can we generalise? We obviously need kind variables


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Dec 25, 2021, 6:38:11 AM12/25/21
to felix google

This is now working … need an extra beta-reduction to make a pattern match succeed ..

////
typedef fun pair(A:TYPE, B:TYPE):TYPE => A * B;
typedef fun diag(Y:TYPE): TYPE * TYPE => Y, Y;

typedef fun comp(f:TYPE * TYPE->TYPE, g: TYPE->TYPE * TYPE):TYPE->TYPE =>
fun (T:TYPE):TYPE => f(g T)
;

var a : comp(pair,diag) int = (1,2);
println$ a;
///

Now, back to the point.

We can compose T->T*T with T*T -> T as shown, but the original
\circ won’t do it because the KIND of the original circ’s domain is

(TYPE -> TYPE) * (TYPE -> TYPE)

but the comp function has domain

(TYPE * TYPE -> TYPE) * (TYPE -> TYPE * TYPE)


what we actually require for composition is

(K -> J) * (L -> K)

and the result is a functor

L -> J

where J,K,L can be any categories. And of course we do this with
a kind variable (a type variable one level up that varies over kinds).

The kind variable, like a type variable, needs a constraint; for type variables
it's something like TYPE, here it would be KIND.

KIND is not a kind. It’s a sort. :-) That’s the next level up after kinds.

Now actually, this could go on forever, so we want to actually use TYPE,
and use context to know we really mean KIND.

Otherwise I have to do all the stuff again, unification, etc, and infinitely. :-)

So maybe this syntax:

typedef fun compose[J,K,L] (f: K->J, g :L -> K):L -> J =>
fun (t:T):T => f(g(t))
;

The kind variables could be qualified with a sort, the default is SORT,
and it’s the only sort at the moment ..though we’ll surely have SORT -> SORT popping
up and the whole story again (ad infinitum).


Note that the inner anonymous function binds to its parent context, otherwise
f,g wouldn’t be defined.

A polykinded function (well, that's polymorphism one level up, right) has to be named.
Polymorphic functions have to be named too. Anonymous functions cannot be polymorphic.
This is because there has to be a symbol table entry as the container of the children.

Felix has no way to do this:

var f = fun[T] (x:T) => x;
println$ f 1;

At least one problem here is that f is a variable and variables can’t really be polymorphic.
But more to the point, polymorphic types do not have a lambda form: for that you have
to use a functor. That is you cannot write

[T:TYPE] T * T

you can of course write:

typedef two[T:TYPE] = T * T;

but not

type[T:TYPE] (T * T)

You can, of course, if you use a type function, but otherwise you cannot have standalone
polymorphic types.
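The workaround is a named type function, modeled on the lst example from an earlier message:

typedef fun two_f (T:TYPE):TYPE => T * T;   // a named type lambda
var x : two_f int = 1,2;                    // beta-reduces to int * int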

Now the problem is a typedef like two above:

typedef two[T:TYPE] = T * T;

creates a symbol table entry two, with type parameter T and a body.
Typedef fun makes a symbol WITHOUT parameters, defined to be a lambda:

typedef fun f(…);

is the same as

typedef f = fun (…);

So a typedef entry has a slot for type variables, in the symbol table,
but we really can't steal that for kind variables because Ocaml would barf.




John Skaller
ska...@internode.on.net





John Skaller2

unread,
Dec 28, 2021, 10:13:43 PM12/28/21
to felix google
>
> So maybe this syntax:
>
> typedef fun compose[J,K,L] (f: K->J, g :L -> K):L -> J =>
> fun (t:T):T => f(g(t))
> ;
>
> The kind variables could be qualified with a sort, the default is SORT,
> and it’s the only sort at the moment ..though we’ll surely have SORT -> SORT popping
> up and the whole story again (ad infinitium).


So I now have a fully general concept for this. It is pure category theory, it is trivially simple,
and it is applicable to all high level programming languages including Felix, Haskell and Ocaml.


We will start with a basic purely functional model because it’s easier. There are many extensions.

Initially, we have values of some primitive types. I will use programmer notation rather than
academic notation because it is easier to write in an email. We have some kind of expression:

x + 1

and an environment:

val x: int = 2;

which allows an actual evaluation. We use lambda abstraction to *move* the environment
into the code:

(fun (x:int) => x + 1) (2)

and beta-reduction to evaluate it:

2 + 1

and then builtin operations to actually do the calculation:

3

Apart from the primitives, we will use two combinators:

(x,y,1,4) // tuple value
int * int * int * int // tuple type
int * int -> int // function type
fun (x:int, y:int) : int => x + y // function constructor

The idea is all these forms are LEVEL INDEPENDENT but the primitives are LEVEL DEPENDENT.
In other words:

(int, int)

is a valid term, it’s a type tuple. There’s an issue with

()

What level is it? Let’s leave that for a moment. The rules are that * and -> must involve
terms at the same level. Now, our function has abstracted values but there is a requirement:
the parameters of the abstraction must be typed. And our function is, well, BORING.

It provides reusability .. and please note this is the HOLY GRAIL. It is the entire point
of PL design. It means increased chance of correctness, and increased optimisation
opportunities and most importantly, the programmer has more time to make coffee.

Let's make our function MORE REUSABLE and LESS BORING:

fun (T:TYPE, U:TYPE) (x: T, y:U): U * T => y,x

Now, our function is parametrically polymorphic. It works for any types.
To do this we added another lambda binder, one level up: a binder which
generalises the types.

But what is that “TYPE” thing?? Well that’s a meta type, or kind. The rule is,

ALL VARIABLES AT LEVEL N MUST BE TYPED AT LEVEL N+1

Now note, the notations here are level independent so we can also write
type functions, not just ordinary functions:

fun (T:TYPE, U:TYPE):TYPE => U * T

Note this has the kind

TYPE * TYPE -> TYPE

Our notation is going to get unreadable unless factored so I’m going to allow this:

let name:expr<n+1> = expr<n>

and also allow the classifier on the LHS to be dropped, for example:

define tswap: TYPE * TYPE => TYPE = fun (T:TYPE, U:TYPE): TYPE => U * T

so we can more conveniently write

tswap (int, string)


and perhaps

let .. in ..

This is just sugar (for the moment). We’re putting back the environment we abstracted away
with the lambdas and reified with beta reduction.

Now I ran into this one in Felix. I want to define composition of any (suitable) data functors:

define compose =
fun (J:KIND,K:KIND,L:KIND)
(f: K->J, g :L -> K):L -> J
=> fun (T:L) (t:T):T => f(g(t))



This is saying if you have a data functor g from L to K, and a following functor
f from K to J, you can compose them to get a functor from L to J which we
can define by substitution effected by beta reduction using eta-expansion.

Notice two things. First, KIND is a sort, and what we’re saying is J,K,L can be
ANY category, that is, any type level expression such as

TYPE * TYPE

This is the whole point of the kind variables J,K,L, that we allow say bifunctors like `tswap`
to be composed. Here is something we could compose with:

fun diag (T:TYPE):TYPE * TYPE => T,T

the diagonal functor (aka “copy” or “dup”).

Notice also, in the implementation, the eta-expanded variable t, must not only have
a type T, but, since we introduce type variable T, we have to give it a kind too,
namely L. Of course, we can use compose like this:

val x : compose (TYPE, TYPE * TYPE, TYPE) (tswap, diag) (int) = 1;


I think I got that right :)

Notice, the pair (tswap, diag) is a pair of *functors*.

There is no reason to stop at SORT level, although I can't think of a use for meta-sorts
at the moment. However, one of the critical contentions here is that you can write
general formulas which are level independent. So they apply at whatever level you like.
We can certainly eliminate lower levels, for example, tswap is a pure type function, it
does not mention any concrete types or values. So at the moment I think you can
define a function from any higher level down to any lower level.

The level of a term or sub-term has to be defined by the primitives involved,
for example

1, int, TYPE, KIND

There are issues with levelless terms like ().

In general some annotations can be defaulted away, and some can be deduced. For example


val x: compose (tswap, diag) (int) = 1

should be enough, because the kinds of tswap and diag are recorded in the environment.
This makes overloading possibility.

SUBCATEGORIES
===============

Subtyping is an essential part of some languages. It is generally considered to
be some kind of “is a” notion. In set theoretic terms we want an embedding,
and category theoretic terms probably at least a monic operator.

Felix REQUIRES not only sub-typing but also sub-kinding. Here is the scenario:

In Felix, we have arrays notated:

T ^ N

As you know, an array is an exponential of the same general kind as a function:

N -> T

However they have different implementations. Functions are coded by composition
of other functions. Arrays are define element by element. In some sense they’re
intrinsically functions with a finite domain.

Now, usually we think of arrays as linear and N is just some integer constant,
but NOT in Felix. In Felix, the N there is a type:

int ^ 3

is identical to

int * int * int

in Felix, however 3 is identical to

1 + 1 + 1 = unit + unit + unit

which is a structurally typed standard enumeration with 3 cases. So in the first instance:

T: TYPE, N:UNITSUM

are the kinding constraints on array types, where UNITSUM is the kind with the types

0, 1, 1 + 1, 1 + 1 + 1 …..

in it. I write 2 = 1 + 1, 3 = 1 + 1 + 1, etc. Note

bool = 2

is a UNITSUM. So now we NEED a subkind of TYPE, to specify the constraint:

define array = (T:TYPE, N:UNITSUM) => T^ N


In fact, the actual constraint is

N:COMPACTLINEAR

which is more general. A compact linear type is 0, 1, or any compact sum or compact product
of compact linear types. This formulation allows for polyadic array access and array shapes.
The point is now we have


UNITSUM is a COMPACTLINEAR is a TYPE

So now, when we do deduction or "type checking", which means also kind checking and sort
checking etc etc .. our unification algorithm must also handle subcategories.

Unification is actually simple: we do a standard unification, and recursively, one level
up, also unify the kind annotations on type variables, etc ..

The thing to note is Felix has TWO product functors not one:

U * V // standard tuple, addressable components
U \* V // compact linear product

The compact linear product functor has kind

COMPACTLINEAR * COMPACTLINEAR -> COMPACTLINEAR

Of course this is just the bifunctor. The run time product formed here packs
the compact linear values into a single machine word, and projections
use division and modulus operations, instead of machine addresses.
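A minimal worked example of the assumed encoding: a value (i, j) of the compact
linear product 3 \* 2 is packed into one machine word as i * 2 + j, and the
projections are recovered arithmetically:

// assumed packing of (i, j) : 3 \* 2 into a single integer v
//   v = i * 2 + j            e.g. i = 2, j = 1 gives v = 5
// projections by arithmetic instead of addressing:
//   proj 0 v = v / 2         // = 2, the first component
//   proj 1 v = v % 2         // = 1, the second component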

And now, we get into really high level stuff, because we can certainly
“unpack” any compact linear product to an ordinary one. And vice
versa IF the argument type satisfies suitable constraints ..


BUT THERE IS NO POSSIBLE WAY TO ENCODE THESE
EXTREMELY HIGH LEVEL OPERATIONS WITHOUT AN EXTREMELY
HIGH LEVEL CATEGORICAL KINDING SYSTEM.


As a final word, Haskell is clearly a LOW LEVEL LANGUAGE.
It has a rudimentary and incapable kinding system. It is NOT capable
of expressing high level operations such as those described above.

I contend Haskell can be equipped with BOTH linear and uniqueness
typing, for example, but you can NOT do this without a high level
programmable kinding system. The rules for substructural logics
cannot be applied “generically” if one doesn’t know the kind of
a type variable.



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Dec 29, 2021, 7:59:06 AM12/29/21
to felix google
So .. attempts to actually implement this were a disaster.

The problem is a lot of stuff assumes symbols can be qualified by type variables.
But now we also have kind variables.

At present, there is a huge mess with type aliases. The unbound term is just

SYMDEF_type_alias of type

which is in some symbol table, eg a class or function, with a name and type variables.
To make matters messy there are TWO entities here:

1. A mapping from an integer index to the symbol definition
2. Possibly two mappings from names to integers
1. Private
2. Public

The definition is:

(** The symbol type. *)
type t = {
id:Flx_id.t;
sr:Flx_srcref.t;
vs:Flx_types.ivs_list_t;
pubmap:Flx_name_map.name_map_t;
privmap:Flx_name_map.name_map_t;
dirs:Flx_types.sdir_t list;
symdef:Flx_types.symbol_definition_t;
}


The names inside a function, eg parameters, local variables, etc are only
in the private map. Inside a class, unless marked private, names are
in both the public and private maps.

These maps are not plain name -> int. First, there are two kinds:

1. NonFunctionEntry
2. FunctionEntry

A non-function entry is for non-functions. Duh. They must be unique. You cannot
have two variables in the same scope with the same name.

type entry_set_t =
| FunctionEntry of entry_kind_t list
| NonFunctionEntry of entry_kind_t

type name_map_t = (string, entry_set_t) Hashtbl.t

The function entry maps the name to MULTIPLE functions.

Now here's the really tricky bit:

type entry_kind_t = {
(* the function *)
base_sym: bid_t;

(* the type variables of the specialisation *)
spec_vs: Flx_kind.bvs_t;

(* types to replace the old type variables expressed in terms of the new
* ones *)
sub_ts: Flx_btype.t list
}

So what we have here is that a name maps, not directly to a symbol, but
WITH A SPECIALISATION. For example

class X[T] { fun f: T -> T; }

might be in the symbol table but you might say

open[U] X[U * U];

and now f is available with type

f: T * T -> T * T

So the current scope has an f mapped to entry kind:

base_sym = 9942, spec_vs = U, sub_ts = U * U

where 9942 is the index of the definition of f in class X in the symbol table.

The PROBLEM is that whilst, for non-parametric symbols, we just have an empty
specialisation .. for type functions, we have KIND variables not type variables.

Now, this is just the beginning. In theory, typedefs just define aliases.
So these should be “factored out” by the time we’re binding code.

Alas, no. A HUGE mess is created, because there an alias just adds
name to a structural type, so the name just gets replaced by it.

Except for recursion. Typedefs can be recursive. The problem is
that with type schema like

typedef list[T] = 1 + T * list[T]

the recursion is easy, but what if we specialise T?
I mean we can easily have

vector[list[T]]

and now the recursion is inside the type indexes of the type vector.

So it ends up I have to temporarily make a bound symbol table entry for
nominal type aliases, in order to allow for recursion, because the recursion
requires lookup, if the recursion is not self recursion:

typedef A = … B
typedef B = .. A

Recursions can be resolved, but only self-recursion can be resolved in
one step. It’s a mess.

NOW ADD KIND VARIABLES TO THE MIX.

In principle its simple.

In practice Felix lookup is indeterminate. In other words, some lookups
simply cannot work. Things like

typeof (expr)

seem innocuous. But they’re extremely hard to deal with because, to find
the type of expr, we first have to BIND the expression, which involves overloading
AND it may recursively depend on itself!

var x : typeof(x) = x;

Hmmm…. :-)



--
John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jan 1, 2022, 11:00:46 PM1/1/22
to felix-l...@googlegroups.com
So I am thinking to rebuild Felix from scratch (writing code in Felix).
The main problem is the parser. Dypgen will be almost impossible to implement.

The basic idea is to use pure categories. So for example

type int : TYPE;

declares "int" to be a primitive type which is an object of category TYPE. An arrow would
be a type function:

fun swap = (x:TYPE, y:TYPE): TYPE => y * x;

For sums types:

fun first:TYPE + TYPE -> TYPE =
| case 0 of 2 (U) => U
| case 1 of 2 (V) => V
;

This is the general form. To make this work, we need builtin operators:

product (..,..,..) // infix *
coproduct (..,..,..) // infix +
exponential (A, B) // A -> B
tuple (a,b,c)

So the first two operators take 0 or more arguments. The notations are

1 = product()
0 = coproduct()
product (A,B,..) = A * B * …
coproduct (A,B) = A + B + …


These operations are generic. A uniform product is a functor

TYPE ^ N -> TYPE

where TYPE ^ N is the product of N copies of TYPE = TYPE * TYPE * TYPE …

The N can be deduced from the number of arguments if the functor is applied.
The categories can be deduced from the arguments if it is not a uniform functor.

Obviously, if a functor has a product argument, it can be applied to a corresponding
tuple. For example:

fun swap(A:TYPE, B:TYPE):TYPE => B * A;
val x : swap(int, double) = 4.2, 1; // double * int

Composition and application of functors are directly related (by the Yoneda lemma I think):

(f \circ g) x = f (g x);

In particular

f x = (f \circ 1_x) ()

where 1_x is the functor 1->TYPE which picks out the value x, called the
characteristic function of x. So any application seqence can be converted
to a composition sequence applied to the unit value, and, any composition
can be turned into an application by eta expansion.

This means evaluation can be performed starting at the end of a composition
chain applied to a () and applying it, then applying the next on the left to the result,
until we end up with a value. However optimisations work best on compositions
because composition is associative .. and it's the "native" operation of category
theory.

As well as types, we have kinds. The main kind has sort KIND.

We also need to have subcategories (i.e. subtyping). As an example in Felix:

UNITSUM < COMPACTLINEAR < TYPE

These are embeddings, that is, the subtyping monomorphism is from
a category X which is a subcategory of Y, and maps each object and arrow to
itself. In CT, we would be happy with any monic, that is, an isomorphism
from a category P to a subcategory of the target. However this requires specifying
the isomorphism (an embedding of a subcategory in the category it is sub of does not).

When (in particular) an arrow (fun) is specified, there may be “type variables”.
This adds an extra level of lambda abstraction. For example

fun (K1: SORT, K2:SORT): KIND =>
fun (T1: K1, T2:K2) : TYPE => T1 * T2
;

Here two different kinds (which are values of meta-sort SORT), specify
the kinds of the two type variables T1 and T2. The motivating
example is:


define compose =
fun (J:KIND,K:KIND,L:KIND)
(f: K->J, g :L -> K):L -> J
=> fun (T:L) (t:T):T => f(g(t))


What I wanted was a rule to compose any two data structures, and that’s it.

For example a map of lists of pairs has three type parameters:

compose (TYPE * TYPE, TYPE * TYPE) (map, list, pair) (int, string, double)

is a map keyed by an int, of lists of pairs of string and double. Here, the
kind arguments can be deduced from the specified functors map, list, pair,
and indeed type checked. Er .. well kind checked. The resulting
functor has three arguments, and is applied to create an actual type.




John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jan 2, 2022, 7:03:16 AM1/2/22
to felix google
One of the things I find confusing is that if you write say

type vector[T] = "vector<?1>";

then vector is NOT a functor, it’s an indexed type. Note it’s a nominal type.
Another case:

variant opt[T] = | Some of T | None;

Again, opt is NOT a functor, it's an indexed type. Polymorphic nominal types
are always indexed types.

An indexed type like “opt” represents a collection of types, that is,
it's a type scheme. It has no standalone validity; it always has to
be "applied" to some indices. However you can always define a functor:

typedef fun vector_t(T:TYPE):TYPE => vector[T];

You cannot compose indexed types; however, you can *substitute*:

vector[vector[int]]

which is equivalent to composition (composition and application BOTH
work by substitution).


Now, in the case of C++ types, we cannot monomorphise them in the front end.
This has to be done in the back end code generator. So, in this special case,
we must carry the monomorphised indices along to the back end with the schema.

OTOH, type functions have to be eliminated very early, before overloading in fact.
Whereas, type schema *participate* in overloading. For example

fun f[T] (a: vector[T]) => ..
var x : vector[int];
f x; // T is deduced to be “int”

In some sense, schema are “generative” in that they generate types, whereas
functors are applicative, in the sense they *calculate* types (that already exist).

Still I cannot properly explain this stuff. In particular I don’t have a cool category
theoretic explanation for the distinction.


John Skaller
ska...@internode.on.net





Keean Schupke

unread,
Jan 2, 2022, 8:12:39 AM1/2/22
to felix google
Hi John,


Looking at LambdaPi might help if you haven't already done so. I have an implementation in TypeScript here that runs using Deno:

https://github.com/keean/lambdapi

There are references to the original implementation paper in the code.

LambdaPi is a nice simple dependent type system, which I believe corresponds to a locally Cartesian closed category.

This paper has some more details:

https://mroman42.github.io/ctlc/ctlc.pdf


Cheers,
Keean.



John Skaller2

unread,
Jan 2, 2022, 10:59:54 AM1/2/22
to felix google


> On 3 Jan 2022, at 00:12, Keean Schupke <ke...@fry-it.com> wrote:
>
> Hi John,
>
>
> Looking at LambdaPi might help if you haven't already done so. I have an implementation in TypeScript here that runs using Deno.
>
> https://github.com/keean/lambdapi

But no examples..

>
> There are references to the original implementation paper in the code.
>
> LambdaPi is a nice simple dependent type system, which I believe corresponds to a locally Cartesian closed category.
>
> This paper has some more details:
>
> https://mroman42.github.io/ctlc/ctlc.pdf

Good paper. But as usual CT notation bends the brain :)

I'm trying to find a general syntax for functions. The actual evaluation is trivial: it's just
substitution, possibly with some variables fixed using unification.

The idea is simply that whenever you have a variable you have to give it a type:

fun (x:int) => …

But, we want polymorphism so:

fun (T:K) => fun (x:T) => e

You should read "fun" as "lambda". The type variable T, which types x, must itself be typed,
here with kind K. And if K itself is polymorphic, we have to announce the sort of any kind variables therein.

It’s all head-normal form here. For deduction the type of e can be used to deduce T which can be used to
deduce K, maybe. The general form can start at any level, like meta-meta-sort, and proceed to any lower level.

In the simplest system, the encoding can use the same unification engine at all levels, and the same
constructors (product, tuples, etc), without level annotations, provided the level is homogeneous and
populated. It's not clear what level (), the empty tuple, is, or even if it matters.

Initially, I’d have no deduction. There would be some way to specify opaque entities
like the sort KIND, or the kind TYPE, or the type int. Plus subtyping rules.


Making this work in Felix is going to be very hard. It goes easily to kinded type variables
where the kinds are expressions of constants, but getting kind variables in there is
very tricky.

There’s no dependent typing involved. In general the base level of that, involving
only simple integer relations, is easy enough (see ATS2). But once you go past that,
you have to write difficult proofs as part of your program.

I have a half-belief that instead of levels, you can just do the “typing” with ANY category.
For example the category NAT, one object, an arrow for each n meaning to add n, so a
path is just a summation, can be a run time integer, a compile time integer, a type,
a kind, whatever. If you say

T: NAT

where T is a type, then NAT is a kind, but if you say

V: NAT

where V is a value, then NAT is just “unsigned int”. That’s more general than
a striated (levelled) system. Encoding it will be tricky. I’d probably have to do the
simpler system first, which will get more general by removing the repetition.

Anyhow at the moment, I could probably implement the basic calculus easily
in Ocaml or Felix, but integrating it INTO Felix is another story. The compiler
was designed to be polymorphic from the ground up (unlike other languages),
but that just means type variables not kind variables. The problem is you have
a heterogeneous list of the variables at the head, with the most general, then going downhill
until you reach plain old typed function parameters (or stop earlier).


John Skaller
ska...@internode.on.net





Keean Schupke

unread,
Jan 2, 2022, 3:02:10 PM1/2/22
to felix google
Hi John,


Not sure if I am missing your point. The lambda calculus vending machine shows how simple it is to extend LambdaPi to embed any Pure Type System:

https://crypto.stanford.edu/~blynn/lambda/pts.html
As for general function notation, giving every argument a type (kind etc), _is_ sufficient, but many dependently typed languages prefer type annotation to allow types (kinds etc) to be omitted.

So, simply: value arguments have types, type arguments have kinds, and kind arguments have whatever comes after kinds :-)

The restriction is then all 'sort' names are unique (type names cannot be used for kinds etc).

There is a notation where the 'kind' of 'type' is '*' (star) and the kind of kind is '[]' (box) also written '?' for ASCII keyboards.

We don't need level annotations, and we can be polymorphic over levels when using 'type/kind' variables, although LambdaPi itself is not and we have to write:

id = \(T:*)(x:T) -> x

So the identity on integers is:

(id Int)


Keean.



John Skaller2

unread,
Jan 2, 2022, 7:31:28 PM1/2/22
to felix google


> On 3 Jan 2022, at 07:01, Keean Schupke <ke...@fry-it.com> wrote:
>
> Hi John,
>
>
> Not sure if I am missing your point. The lambda calculus vending machine shows how simple it is to extend LambdaPi to embed any Pure Type System:

I'm sure. The problem is the Felix compiler is written in Ocaml and is statically typed.
The compiler makes provision for type variables. There's a combinator:

type type_t = ….
BTYP_type_variable of index * kind_t

So I need this:

type kind_t =
KIND_kind_variable of index * sort_t

if I want the kind_t term to contain kind variables. This is easy enough. The problem is
that the compiler takes parsed terms and decouples them into

name, typevariables, scopedictionary, symboltableentry,

or something like it. For a function the scope dictionary is the local symbol table
of the function. The PROBLEM is that the typevariables are type variables. ONLY.
I cannot put kind variables in that slot, because they have the wrong static type.




>
> https://crypto.stanford.edu/~blynn/lambda/pts.html
>
> As for general function notation, giving every argument a type (kind etc), _is_ sufficient, but many dependently typed languages prefer type annotation to allow types (kinds etc) to be omitted.

Sure, but the syntax, and indeed, the ability to deduce types, will come after solving the main problem,
which is to actually get kinds, sorts, etc .. into the compiler so the information propagates.

The compiler actually does this:

1. Parse to Scheme
2. Evaluate Scheme
3. Translate resulting S-expressions into raw Ocaml terms
4. Desugar the raw terms into a more systematic presentation
which separates declarations from definitions
5. Build the global symbol table and change the terms so
local name spaces (eg function locals) are presented as maps
to the global symbol table
6. Binding. Make a new symbol table in which names are replaced by
discrete keys (actually just integers). This is an extremely complex phase
because it contains the stuff that does overloading. Felix has “setwise” lookup
meaning all symbols are defined recursively within a scope
7. Monomorphisation. Remove all type variables. Instantiate type classes.
8. Optimisation….blah blah backend C++ generation.



The problem is that there are 8 DISTINCT calculi there for the same terms depending on
the compiler phase.

>
> So simply value arguments have types, type arguments have kinds, and kind arguments have whatever comes after kinds :-)

Sorts come after kinds :-)

>
> The restriction is then all 'sort' names are unique (type names cannot be used for kinds etc).
>
> There is a notation where the 'kind' of 'type' is '*' (star) and the kind of kind is '[]' (box) also written '?' for ASCII keyboards.

Haskell uses * but it is totally stupid. In Felix, these kinds already exist:

UNITSUM < COMPACTLINEAR < TYPE

A unitsum is just a simple enumeration 1 + 1 + 1 + 1 .. etc.
A compact linear type is 0, 1, or any compact product or sum of compact linear types.
These are a subkind of TYPE. The representation is an integer. Projections
are done by division and modulus.

A compact linear type term can contain a type variable of kind COMPACTLINEAR
which means the type variable can only be instantiated with a compact linear type.
Otherwise the calculus would not work.

This is also true for pointers. Compact linear products have pointers to components,
which are just a machine pointer PLUS a divisor and modulus. So three machine words,
rather than one.

So the kinding information is really essential.
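A sketch of the assumed representation behind that claim:

// a pointer into a compact linear product is assumed to carry three machine words:
//   p        : pointer to the word holding the packed value
//   divisor  : strides past the components to the right of the target
//   modulus  : the number of values of the target component
// so a read through it is   (*p / divisor) % modulus
// and a write replaces that digit of the packed word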


>
> We don't need level annotations, and we can be polymorphic over levels when using 'type/kind' variables,

Yes, that’s what I suspected. The primitives are at a fixed level. For example a string is a run time
string. Heck, in Felix it is defined by

type string = "::std::string";

and is not available to the compiler (because the compiler is written in Ocaml).


> although LambdaPi itself is not and we have to write:
>
> id = \(T:*)(x:T) -> x
>
> So the identity on integers is:
>
> (id Int)

Right. The notation is only slightly different to Felix.

I will note .. there is a COMPLICATION in Felix, previously mentioned:
for nominal types, including primitives, these are NOT functors but indexed types.
You can overload on indexed types, and deduce the indices (using unification)
You cannot overfload on functors. They have to be beta-reduced away before
overloading can proceed.

I am tempted to rewrite the compiler from scratch to get the kinding system correct,
but it’s a BIG job so I’m looking for a way to retro-fit kind variables.



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jan 2, 2022, 8:11:44 PM1/2/22
to felix google
In particular the problem starts here in the grammar:

stypeparameter := sname ":" t[sarrow_pri] =># "`(,_1 ,_3)";
stypeparameter := sname =># "`(,_1 typ_none)";
stypeparameter_comma_list := sepsilon =># "()";
stypeparameter_comma_list := stypeparameter ("," stypeparameter)* =># "(cons _1 (map second _2))";

stypefun_arg := sname =># "`((,_1 typ_none))";
stypefun_arg := "(" stypeparameter_comma_list ")" =># "_2";
stypefun_args := stypefun_arg+ =># "_1";

//$ Anonymous type function (lamda).
t[slambda_pri] := "fun" stypefun_args ":" stypeexpr "=>" stype =>#
"""
`(typ_typefun ,_sr ,_2 ,_4 ,_6)
""";

All the terms t[priority] are non-terminals of the type sublanguage. So in the parser, we could use
the same terms exactly as kinds. There’s no kind sublanguage. So in principle, this grammar
already handles everything we need, provided we can interpret it with the right “level” of
terms when the S-expressions are translated to Ocaml.

An example:

fun (T:TYPE) (U:COMPACTLINEAR) =>
fun (K:TYPE) => T * U * K
;

However, this will not descend to actual functions, since the grammar
only allows a type expression after the => symbol.

At present, kind names like COMPACTLINEAR are built into the compiler and
cannot be user defined; there is no lookup table. Types, however, can be named.

So there is a clear issue, even with the grammar, it doesn’t descend to “run time”
functions. Like this one:

fun swap[T:TYPE,U:TYPE] (x:T, y:U):U * T => y,x;

This works, but the type arguments (and they cannot be kind arguments), are indices
to a type schema, so swap is not a “functor” or whatever. We would like:

swap = fun (T:TYPE, U:TYPE) (x:T, y:U) => y,x;

and note, this is an actual polymorphic function headed by a functor. The problem
is, it will not work with overloading. This works:

fun doswap[T:TYPE] (swap: T * T -> T * T, x:T, y:T) => swap (x,y);
doswap (swap, 1,2);

It deduces the type T = int. But if you use a lambda abstraction, it cannot work.
What makes it work is that swap is *indexed*. Type schemas are required for
*nominal* polymorphic types. You cannot beta-reduce into a nominal type.
Structural types use lambda abstraction because you CAN beta-reduce the
abstractions away .. which you must do *before* overload resolution.

Example of nominal type:

variant opt[T:TYPE] = Some of T | None;





John Skaller
ska...@internode.on.net





Keean Schupke

unread,
Jan 3, 2022, 3:53:00 AM1/3/22
to felix google
Hi John,


The thing with LambdaPi, and why it's interesting to look at the implementation, is that it internally collapses everything to one level, the value level. Type checking is simply evaluation followed by type equality.

Because you can use Higher Order Abstract Syntax (even in TypeScript where the type system is static, and definitely in Ocaml), type equality is simply quoting the value you get from evaluation (beta reduction) and comparing it to the other type. Strong normalisation means syntactic comparison is enough.

The value machinery is re-used for types, and there is no difference at all between levels, at the next level types are treated as values, kinds as types, etc.
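
Roughly, the idea in code (a toy de Bruijn sketch of mine, NOT LambdaPi's actual HOAS implementation): type checking normalises both sides by beta-reduction and then compares them syntactically.

data Term = Var Int | Lam Term | App Term Term
  deriving (Eq, Show)

-- shift free variables >= c by d
shift :: Int -> Int -> Term -> Term
shift d c (Var k)   = if k < c then Var k else Var (k + d)
shift d c (Lam b)   = Lam (shift d (c + 1) b)
shift d c (App f a) = App (shift d c f) (shift d c a)

-- substitute s for variable j
subst :: Int -> Term -> Term -> Term
subst j s (Var k)   = if k == j then s else Var k
subst j s (Lam b)   = Lam (subst (j + 1) (shift 1 0 s) b)
subst j s (App f a) = App (subst j s f) (subst j s a)

-- full beta-reduction to normal form (terminates on strongly normalising terms)
normalise :: Term -> Term
normalise (App f a) =
  case normalise f of
    Lam b -> normalise (shift (-1) 0 (subst 0 (shift 1 0 (normalise a)) b))
    f'    -> App f' (normalise a)
normalise (Lam b) = Lam (normalise b)
normalise t       = t

-- type equality is just: evaluate, then compare syntactically
typeEqual :: Term -> Term -> Bool
typeEqual a b = normalise a == normalise b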

I am not sure 'sorts' are the next level. The term 'sort' comes from logic (many sorted logics), and I believe is simply the logic equivalent of 'kind' in a type system. In type systems we have values, types and kinds, but this is probably better thought of in terms of universes, where the type levels are T[n]: U0 has values : T0, U1 has T0 : T1, etc.

I appreciate you have more kinds than '*', remember '*' is the kind of functions _only_. You can quite easily introduce more kinds (of type '[]').

I am not sure why you cannot overload on functors? All evaluation is beta reduction, yes you may optimise evaluation to be machine code instructions, but you are just emulating beta-reduction. 

So function evaluation is beta-reduction, and you can overload on function evaluation, so why can't you overload on functors? Overloading on functors would be 'kind' overloading, not 'type' overloading obviously.


Cheers,
Keean.





John Skaller2

unread,
Jan 3, 2022, 9:14:18 AM1/3/22
to felix google
I want to implement what you have described, I think, however my calculus is a bit richer.
Lambda only has functions A -> B. I have products and sums too (and later will add arrays).
And subtyping. But the model you present appears to be what I want. I’m just not sure how to
get it into Felix. Maybe I just make kinds and sorts the same universe as types.

Actually my thought is when you write

x : T

it actually means x is an object of *category* T. Since categories are abstract, there are no intrinsic levels.
I mean, a category is just structure. Indeed, types and kinds both allow products and sums and exponentials.

The real problem is that you have to get at the value level and generate code and execute it rather than
interpret it. But this can be done by a partial evaluator (basically, instead of interpreting something expensive,
generate and run code, JIT style as an optimisation).

====

I think there’s a misunderstanding about overloading and functors. If you name a functor,
you can have many with the same name, and choose one by overload resolution based
on the argument it is applied to. Obviously you can only do this when you have an application.
In Felix, you can choose WITHOUT an application as well, by specifying the type the argument
would have. It’s at the kind level not the type level for functors, but the tech is the same: we use
a specialisation engine (unification with subtyping and subsumption).

The issue I was referring to is overloading an ordinary function, when the argument is
a functor application. The problem is, a functor is just a lambda term, so you cannot
“see inside”. All you can see is the functor argument kind, usually type. So you’re trying
to overload on

(\lambda x:T => we can’t see in here) (argument)

All lambda applications look the same, you can’t overload on the term above.

Now, if you *beta-reduce* the application, then you can overload on the resulting type.
But that’s useless! Consider a function overloading on vector<T>, we need the T.
But the T is gone after beta-reduction.

By comparison, consider vector as a type scheme, so it is a nominal type,
then we can pick

fun f[T] (x:vector[T])

and find T, given an argument. The reason is vector is not an alias for a type
expression, it’s a nominal type, and the T can’t be eliminated by beta reduction.
The point is, vector is not abstract here. In Felix:

typedef fun vector_f(T:TYPE):TYPE => vector[T];

creates a functor vector_f from the type scheme, but it is just an alias.
You can’t overload on that. I mean

fun f[T] (x: vector_f T) ….
f (vector_f int)

You cannot “match” the vector_f in the function definition with the vector_f
in the function application, because they’re just aliases for a lambda abstraction.

You can match on what is left after beta-reduction, namely, vector[int] and find
that T = int. Because the “vector” part and the T part are separated.
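
A rough analogue in Haskell (just an illustration, the names are made up): a nominal constructor keeps its identity and its argument can be recovered, whereas an alias is expanded away and leaves nothing to match on.

-- nominal: the "Vector" constructor survives in the type
data Vector a = MkVector [a]

-- alias: just rewrites to Vector a and then disappears
type VectorF a = Vector a

-- the variable a is recovered from a nominal application, because
-- "Vector" is still visible in the argument type
len :: Vector a -> Int
len (MkVector xs) = length xs

-- a nominal constructor can also be passed around unapplied;
-- the alias cannot, it only exists fully applied
data Pair f b = MkPair (f b) (f b)

ok :: Pair Vector Int
ok = MkPair (MkVector [1]) (MkVector [2])

-- bad :: Pair VectorF Int   -- rejected: VectorF is an unsaturated synonym,
--                              there is nothing of its own to match on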

It’s really hard to explain. When I worked for Barry Jay, we had an example
like

list of list of int

which is a type. The functor “list” is lost. You cannot fold over this thing
because you cannot fold over a type, you fold over a functor. The question
is *which functor*?? List? Or list of list?

The point is

fold x

cannot work, because the functor has been applied to the type, and is now
lost. You have to do the fold *before* you reduce. There are two cases:

(list . list) (int)
list (list int)

where . means composition. The functor in the first case is list of list,
in the second it is just a list.

The question Barry asked was, “where is the data”? It actually depends
on where you put the parens. There are a pair of isomorphisms which
MOVE the parens, they’re called associators.
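
In Haskell terms (an illustration only, using the standard Compose wrapper), the two bracketings really are different functors over the same data:

import Data.Functor.Compose (Compose(..))

nested :: [[Int]]
nested = [[1, 2], [3]]

-- the same data viewed through the composed functor (list . list)
composed :: Compose [] [] Int
composed = Compose nested

-- folding over the outer "list" functor: the elements are the inner lists
sums :: [Int]
sums = map sum nested        -- [3, 3]

-- folding over "list . list": the elements are the Ints themselves
total :: Int
total = sum composed         -- 6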

Sorry, I’m rambling.





John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jan 6, 2022, 8:42:03 PM1/6/22
to felix-l...@googlegroups.com
So I’m having yet another go at this. The new syntax for type functions is

typefun f[K:SORT] (T:K):K => T

Anonymous type functions cannot have kind variables, just as anonymous
functions cannot have type variables.

This is because these variables are associated with a symbol in the symbol table.
It took a while to figure that out.
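
For comparison, GHC’s PolyKinds gives a rough analogue of a type function indexed by a kind variable (an illustration only, not Felix):

{-# LANGUAGE PolyKinds, TypeFamilies #-}

-- a kind-polymorphic identity type function: k plays the role of the kind variable K
type family Id (t :: k) :: k where
  Id t = t

-- Id Int   reduces to Int     (k instantiated to Type)
-- Id Maybe reduces to Maybe   (k instantiated to Type -> Type)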

Now I know in advance the problem I will have. Ordinary variables have types.
The types in the unbound symbol table are unbound terms. However, type variables
have to already have

1. indexes
2. fully bound kinds


This was possible before, because all the kinds were compiler known
constants or combinations of them. It is no longer the case because we
have kind variables.

So in order for this to work, we must have previously indexed the kind variables
and fully bound their sort. And they must be in the symbol table ALREADY before
type variables are added, otherwise lookup when binding the kinds of type variables
cannot work.

An extra pass is needed for this. We now have lookup in kinds, which we didn’t have before.
It’s only for kind variables (there are no user defined kinds as yet).



John Skaller
ska...@internode.on.net





Keean Schupke

unread,
Jan 7, 2022, 2:40:43 AM1/7/22
to felix google
Hi John,


This makes sense to me, and would be the same for user defined kinds, all the way up (or down depending on how you look at it). In the same way I assume you could have scoped type variables defined in the same way as value variables defined with 'let' or whatever.

I assume you need forward references; if you don't, then you won't actually need a separate compiler pass. For example, in LambdaPi forward references are not allowed, hence the symbol table is simply passed as an argument to the type checking function, which does a recursive descent of the abstract syntax tree, and that also deals with scoping the definitions. There is a single symbol table for all definitions.


Cheers,
Keean.






John Skaller2

unread,
Jan 7, 2022, 8:41:55 AM1/7/22
to felix google


> On 7 Jan 2022, at 18:40, Keean Schupke <ke...@fry-it.com> wrote:
>
> Hi John

Hi. It’s half working now.

>
> This makes sense to me, and would be the same for user defined kinds, all the way up (or down depending on how you look at it). In the same way I assume you could have scoped type variables defined in the same way as value variables defined with 'let' or whatever.

Yes, but there are no user defined kinds yet.

The current code will (eventually) look like:

typefun thing[K1:KIND, K2:KIND] (T1:K1, T2:K2 * K2): K1 => … ;

Here KIND is a sort (and the only one at the moment). So the kind parameters are only used to specify the
kind of the type parameters. The ( .. ) => .. part is a standard type lambda form.

You can have an anonymous type lambda but it can’t have kind variables in the signature.
Just as you can have a functional lambda, but it can’t have type variables in it.

At the moment. The reason is, an arbitrary chain of lambdas couldn’t be parsed, unless Felix
had the “self-kinding” kind stuff, that is, any type would be a kind in the right context.
So the present model is stratified. It will do for the moment.

There’s more to be done, but the hard job was separating type aliases (which can have type indices)
from type functions (which have kind indices). Indices go in the [] brackets. Ordinary functions can be
polymorphic too and have type indices, as do polymorphic data types.

> I assume you need forwards references, if you don't then you won't actually need a separate compiler pass. For example in LambdaPi forwards references are not allowed, hence the symbol table is simply passed as an argument to the type checking function which does a recursive descent of the abstract symbol tree, and that also deals with scoping the definitions too. There is a single symbol table for all definitions.

Felix uses set-wise lookup. There are no declarations, only definitions.
So you can refer to a symbol defined anywhere in the current scope or any enclosing scope:

fun f() => g();
fun g() => 1;

So the RHS of f() calls g, which isn’t defined “yet”. This is done in one pass with recursion.
It’s VERY hard because Felix has overloading. But NOT for type functions yet!

The problem with overloading is that to bind a function you need to know which one
is being called, so you need to know the type of the argument, and you need to know
the types of the candidate functions. Worse, the user doesn’t have to specify the return
type of a function, or the type of a variable.

So the algorithm does temporary partial binding on the fly. The partial binding
to find a function return type tries to calculate the type of the argument
of every return statement (but ignores the rest of the function code).

Of course, since the expression involves function applications .. it has to do
overload resolution again .. recursively.

The real KILLER is this type:

typeof(expr)

It’s a real super bitch. Here we have inverse dependent typing. A type is dependent
on the type of an expression, the evaluation of which in turn requires overload
resolution .. ARRGGHHHH……

Anyhow, recall I want to define a “generic” compose type function.
This was always possible for type functions like

A -> B followed by B -> C

and you could do it for

A -> B1 * B2 followed by B1 * B2 -> C

but you couldn’t use the same name for these because there is no overloading
for type functions.

That’s actually WHY the kind variables were needed. Note you can do it
already universally for ordinary functions because you have type variables.
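
A hedged Haskell sketch of the same point (an illustration, not Felix): with a kind variable, one type-level compose covers every shape, where without it you would need one definition per kind.

{-# LANGUAGE PolyKinds, TypeFamilies #-}

-- one type-level composition for all kinds a, b, c: the kind variables do
-- the work that would otherwise need one overload per shape
type family ComposeF (f :: b -> c) (g :: a -> b) (x :: a) :: c where
  ComposeF f g x = f (g x)

-- e.g. with f and g both at kind Type -> Type:
type ListOfMaybe = ComposeF [] Maybe Int   -- reduces to [Maybe Int]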

I am really curious how KINDS interact with type classes.

KINDS are constraints on types. So are type classes.
Are these interchangeable? Could I have a kind ADDABLE which has
the same impact as a type class Addable?

In Felix at least the machinery is utterly different. KIND constraints,
like types, are checked immediately. On the other hand, type class constraints
are only checked at monomorphisation (well actually they’re not so much checked
as the compiler can or can’t find an instance).




John Skaller
ska...@internode.on.net





Keean Schupke

unread,
Jan 7, 2022, 9:33:09 AM1/7/22
to felix google
Hi John,


My understanding of the difference between type-classes and kinds is that they are similar (overlapping) but different. Kinds like types are closed, in that we must define all the types in a kind in a single declaration, just like we must define all the values in a type in a single declaration. Logically we can use negation (it makes sense to ask if a value is not in a type, or a type not in a kind). Type classes are open. We can at any time declare a new instance of a type-class, so we cannot use negation.
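
A rough Haskell illustration of that contrast (made-up names): a promoted data type acts like a closed kind, while a type class stays open.

{-# LANGUAGE DataKinds, KindSignatures #-}

-- closed: promoting Colour gives a kind whose types are exactly
-- 'Red, 'Green and 'Blue, fixed once and for all
data Colour = Red | Green | Blue

data Swatch (c :: Colour) = Swatch String   -- c ranges over the closed kind

-- open: new Addable instances can appear in any module at any time,
-- so "is not Addable" can never be concluded
class Addable a where
  add :: a -> a -> a

instance Addable Int where
  add = (+)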

Further to this, type-classes control overloading, so that only the type-class instances are considered for overloading. Kinds have nothing to do with overloading in that sense; however, you can have functions that are kind-polymorphic, which is a separate feature from kinds themselves. You can have a kind system with no overloading at all.

So kinds + kind-polymorphism would be like closed type-classes I think. You gain logical soundness, but lose extensibility.

Cheers,
Keean.


John Skaller2

unread,
Jan 7, 2022, 3:26:44 PM1/7/22
to felix google


> On 8 Jan 2022, at 01:32, Keean Schupke <ke...@fry-it.com> wrote:
>
> Hi John,
>
>
> My understanding of the difference between type-classes and kinds is that they are similar (overlapping) but different. Kinds like types are closed, in that we must define all the types in a kind in a single declaration, just like we must define all the values in a type in a single declaration. Logically we can use negation (it makes sense to ask if a value is not in a type, or a type not in a kind). Type classes are open. We can at any time declare a new instance of a type-class, so we cannot use negation.

Good thought.

>
> Further to this type-classes control overloading, so that only the type-class instances are considered for overloading,

That’s not actually what happens in Felix at all. Instances aren’t considered for overloading at all.
All the overloading is done between *specialised* type class functions, completely independently
of instances. Only after a class and specialisation is selected does the system check if there’s
an unambiguously most specialised instance. So actually, there are two stages of overloading,
one on type classes, and then again on instances.

Stage one is done with ordinary overloading. Note, the class is chosen first, then the function
within it. Stage two occurs during monomorphisation.

> kinds have nothing to do with overloading in that sense, however you can have functions that are kind-polymorphic, which is a separate feature from kinds themselves. You can have a kind system with no overloading at all.

Actually I do use kinds for overloading. For example:

fun f[T:COMPACTLINEAR] (x:T) => …
fun f[T:TYPE] (x:T) => …

f(true) // calls the first one because bool = 2, which is compact linear
f(1) // calls the second one because int is not compact linear

COMPACTLINEAR is a subkind of TYPE. This is actually the *reason* Felix needs kinds.
The problem is that the representation of a pointer to a compact linear type requires
three machine words: one word for the machine address, and a divisor and modulus.
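
Roughly, in Haskell notation (a model of the idea only, assuming the usual mixed-radix packing, and ignoring the actual address word):

import Data.Word (Word64)

-- a "pointer" to one component of a compact linear value: the machine address
-- (elided here), plus the divisor and modulus that locate the component
data CLPtr = CLPtr { divisor :: Word64, modulus :: Word64 }

-- read the component out of the packed word
getCL :: CLPtr -> Word64 -> Word64
getCL (CLPtr d m) w = (w `div` d) `mod` m

-- write a new component value (assumed v < modulus) back into the word
setCL :: CLPtr -> Word64 -> Word64 -> Word64
setCL p@(CLPtr d _) w v = w - getCL p w * d + v * d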

I am also now doing some experiments with BORROWED and LINEAR kinds. I think I have not
succeeded in my experiments with uniqueness typing. I am using combinators:

uniq int
borrowed int

and subtyping rules:

uniq < shared < borrowed

but this is unsound because the conversion uniq to shared to borrowed is semantically different
from uniq to borrowed so it isn’t transitive. A uniq value can be shared, which moves the uniq value.
So the original variable is put in “dead” state. A shared variable can be borrowed, which copies it.
But if a uniq value is borrowed, it is copied, but remains in “live” state (and we trust the borrower
not to steal it).

All the presentations I’ve seen start with a substructural type system and mark shared types.
I’m starting with shared types and marking substructural ones.

The reason for the kinding here is that a complex data type may contain a unique one,
for example, and has kind UNIQUE, even though the actual type does not “start with the
uniq combinator”.

I’m thinking of discarding my effort and switching to the uniqueness and linear typing
being attributes of pointers. So only pointers can be considered references, which
are shared or unique or linear.



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Jan 9, 2022, 5:41:53 PM1/9/22
to felix-l...@googlegroups.com
Weirdness. An anonymous type function is just a type. Of course it can only be
applied. However you can typedef one:

typedef f[T] = fun (T:TYPE):TYPE=>T;

which is a serious problem. What should happen is the RHS is alpha converted

fun (U:TYPE):TYPE=>U

so there’s no conflict with the LHS T. Normally the T would be deduced by unification
on application, but it’s not clear if this happens (I’ll have to check). So you’d have to specify it.

The problem of shadowing isn’t new, you can nest type lambdas.

With type functions there shouldn’t be a problem because the scheme parameters
are kinds not types. But that leads to the issue of how to detect a use:

f[k] (T)

The problem is whether k is a kind or a type: we need to know *in the parser*, before
type checking can see it. So it has to be syntax driven. If k is a kind, then T must be too!
If k is a type, T must be too. But there’s no way to tell. So we’re going to need this:

f[<k>] (T)

for kinds … grrrr. In which case this would be sane:

f[<k>][int] (T)

Note,

f[<k>](K)

means to replace kind variables occurring in K with the corresponding
kind in k.

HMMMM..






John Skaller
ska...@internode.on.net




