RTTI/GC

13 views

Skip to first unread message

John Skaller2

unread,

May 12, 2019, 11:16:32 AM5/12/19

to felix google

When Felix compiler see a heap allocation for an object which is a product A x B,
it uses ScanByOffsets technology to construct an array of offsets into the whole
objects of all the pointers, by concatenating the offset array for A with the
offset array for B with the size of A added to each of B’s offsets.

This is done recursively into in the static nesting structure of products to produce
a single linear array of offsets.

Now, for a user defined primitive lifted by C such as a C++ list of T,
we can write a scanner for C++ list which also knows where the points
in a fixed T are.

The first problem here is that there’s no way to *dynamically* pass in the RTTI
of T. The interface of a scanner is:

typedef void *scanner_t(collector_t*, gc_shape_t *, void *, size_t, int);

The void* there is the object to be scanned, with type given by gc_shape_t.

struct GC_EXTERN gc_shape_t
{
char const *cname; ///< C++ typename
::std::size_t count; ///< static array element count
::std::size_t amt; ///< bytes allocated
finaliser_t *finaliser; ///< finalisation function
ValueType *fcops; ///< first class ops
void const *private_data; ///< private data passed to scanner
scanner_t *scanner; ///< scanner function
encoder_t *encoder; ///< encoder function
decoder_t *decoder; ///< encoder function
gc_shape_flags_t flags; ///< flags
size_t allocations;
size_t deallocations;
};

There IS provision there for passing extra data to the scanner: the scanner is included
in the shape object, at it is passed the shape object as an argument, so it can get the
private_data.

So the scanner for a C++ List<T> where T is a Felix struct containing pointers,
can be done by putting the shape of that struct in the private data. The same
scanner then works for all C++ lists (the scanner, trivially, is just a loop from
begin() to end() with the iterator locating the data).

But there is no way for the user to set the private_data, and the user cannot trivially
construct as shape object. The ability to specify the data type for a container
is required. Note that the scanner is then a FUNCTOR in that it works
for all data types, that is, it is dynamically polymorphic. The problem is largely
a matter of syntax for specifying the private data. Note that some data structures
might contain more than one data type so just passing a single shape is not enough,
but an array of shape pointers is enough.

This leads to a second problem. At present in Felix, the compiler CANNOT handle
objects CONTAINING a primitive type with a user specified scanner. There
are two cases only supported: the primitive is atomic and is not examined,
or, the primitive is a pointer to an object in the Felix heap.

Pointers to C objects are ignored. If the pointer is to a C++ allocated List
containing Felix objects we’re screwed. We need to be able to scan such
containers even if Felix didn’t allocate them. There’s no “scannable but
not allocated by us” category.

But the main point is that to include such “foreign” objects in Felix data
structure you HAVE to use a pointer, and the Felix representation is then
marked “gc_pointer” so the compiler knows it is a pointer. And you HAVE
to allocate the object with Felix GC otherwise, the pointer will be thrown
out and not scanned. The point is you cannot put the expanded object
into the Felix object, only a pointer.

To FIX this, a new RTTI form is required. I’ve done a lot of work trying
to get dynamic information on types available. The “fcops” pointer
in the shape object points to this:

class ValueType
{
virtual size_t object_size_impl()=0;
virtual size_t object_alignment_impl()=0;
virtual void dflt_init_impl (void *)=0;
virtual void destroy_impl (void *)=0;
virtual void copy_init_impl(void *, void *)=0;
virtual void move_init_impl(void *, void *)=0;
virtual void copy_assign_impl(void *, void *)=0;
virtual void move_assign_impl(void *, void *)=0;
public:
size_t object_size() { return object_size_impl(); }
size_t object_alignment() { return object_size_impl(); }
void dflt_init(void *dst) { dflt_init_impl(dst); }
void destroy(void *dst) { destroy_impl (dst); }

void copy_init (void *dst, void *src) { copy_init_impl(dst,src); }
void move_init (void *dst, void *src) { move_init_impl(dst,src); }
void copy_assign(void *dst, void *src) { copy_assign_impl(dst,src); }
void move_assign(void *dst, void *src) { move_assign_impl(dst,src); }
};

For C++ objects here’s an implementation:

template<typename T>
class CxxValueType : public virtual ValueType
{
size_t object_size_impl() { return sizeof(T); }
size_t object_alignment_impl() { return alignof(T); }
void dflt_init_impl(void *dst) { ::dflt_init<T>((T*)dst); }
void destroy_impl(void *dst) { ::dflt_init<T>((T*)dst); }
void copy_init_impl(void *dst, void *src) { ::copy_init<T>((T*)dst,(T*)src); }
void move_init_impl(void *dst, void *src) { ::move_init<T>((T*)dst,(T*)src); }
void copy_assign_impl(void *dst, void *src) { ::copy_assign<T>((T*)dst,(T*)src); }
void move_assign_impl(void *dst, void *src) { ::move_assign<T>((T*)dst,(T*)src); }
};

Now here’s the point:

// object does NOT own the product description array
// should use a shared pointer thing I guess
class ProductType : public virtual ValueType
{
size_t n;
ValueType **cp;
public:
ProductType (ValueType **p, size_t m) : cp(p), n(n) {}
~ProductType();
size_t object_size_impl() override;
size_t object_alignment_impl() override;
void dflt_init_impl (void *) override;
void destroy_impl (void *) override;
void copy_init_impl(void *, void *) override;
void move_init_impl(void *, void *) override;
void copy_assign_impl(void *, void *) override;
void move_assign_impl(void *, void *) override;
};

This ValueType allows a product to be constructed dynamically.
This is dynamic typing! The constructed object can be faithfully
copied etc, the same way C++ does it: elementwise.

NOTE: these products are expanded! That is, they’re actually descriptions
of a single contiguous object, there’s no “boxing” involved. The point of
this work is that FPL’s are WRONG to use boxing, its just a cheat.
One can still have code calling a function with more than one
data type, and still copy the data, even if the data type is not
known until run time, so in particular its size isn’t known.
We just use RTTI to allocate and copy the object.

Now here’s the POINT: the GC needs to be able to handle products too.
In particular you might have A x B, with A using ScanByOffsets like a normal
Felix objects, and B being a C++ list. Not a pointer to a C++ list, an actual list.
Using a *product* form of the RTTI similar to the ValueType, we can scan
the object using TWO scanners, first the one for A, and then the one for B,
passing B the address of the object plus the size of A.

In particular the “stuff” in the shape object really belongs in the ValueType
object, and the shape object can be eliminated.

Note that we still want the compiler to GENERATE a composite RTTI object
NOT use a product if possible because it’s faster.

—
John Skaller
ska...@internode.on.net

John Skaller2

unread,

May 17, 2019, 3:14:23 AM5/17/19

to felix google

OK so I’m doing the STL container stuff.

Step 1 is to generate the array of shapes required. It looks like this:
Test code:

////////////////
type vector[T] = "::std::vector<?1>"
requires Cxx_headers::vector,
property "functor"
;

ctor[T] vector[T] : 1 = "::std::vector<?1>()";
var pd = new 12.3;
var v = vector[double]();
var pv = new v;

C_hack::ignore pv;
C_hack::ignore pd;
/////////////////

Notice the property “functor” for the type. This actually means the primitive
is an STL container.

//OFFSETS for complete abstract finalisable type _poly_65300t_65365=vector[double]:TYPE instance

//FUNCTOR, arg types = double

extern ::flx::gc::generic::gc_shape_t *_poly_65300t_65365_value_type_shapes_index_65300_instance_65365[1];

FLX_FINALISER(_poly_65300t_65365)

static ::flx::gc::generic::gc_shape_t _poly_65300t_65365_ptr_map = {
"tvec::_poly_65300t_65365",
1,sizeof(_poly_65300t_65365),
_poly_65300t_65365_finaliser,
&_poly_65300t_65365_fcops,
_poly_65300t_65365_value_type_shapes_index_65300_instance_65365,
0, // scanner
_poly_65300t_65365_encoder, // encoder
_poly_65300t_65365_decoder, //decoder
::flx::gc::generic::gc_flags_default,0ul,0ul
};

Notice the array declaration for _poly_65300t_65365_value_type_shapes_index_65300_instance_65365

The name is currently long winded. The actual definition is given later:

extern ::flx::gc::generic::gc_shape_t *
_poly_65300t_65365_value_type_shapes_index_65300_instance_65365[1]=
{&double_ptr_map};

vector of double has one type argument, double, so the array has a pointer
to the shape object for double. Notice in the test code I heap allocated
a double to make sure the shape object was generated. This shouldn’t be necessary
and is to be fixed soon.

Notice also the array had to be declared extern because C is a BULLSHIT piece of CRAP.
I need to forward declare either the array of its contents OR do a topological sort assuming
the dependencies are acyclic and I’m too lazy. The actual shape objects have internal
linkage. I don’t actually need the array, just its address.

Step 2 is to ensure the relevant shapes are actually generated without having
to trick the compiler by doing a dummy heap allocation.

Step 3 is easy but depends on step 4 and 5. We need to set the scanner
to a specific instance of a generic STL container scanner. This is just
a loop using an iterator from begin() to end(), which finds a reference
to the value type of the container. We take the address and use that
to scan the object. This looks up the shape for that type in the array
at index position 0 for a normal sequential container. I used an array
for associative stores might need to scan the key and value separately
or something. Note a COPY of the key or value is just fine for most
data types because we’re not garbage collecting the key or value,
we’re only examining it for pointers. (what they point at is collected perhaps).

As an optimisation, we can perhaps check at the start if the value type
actually has any pointers in it. If not, we can skip the whole scan.

Step 4 is to write the scanner. It’s a template function and its trivial.
The compiler, in step 3, has to generate a pointer to an instance
of the template function, it depends on the container type.

We may need to “force instantiate” the template to avoid C++
bullshit with “up in the air” templates that get instantiated at link time.
Not sure how that would work with dynamic loading!

Step 5 is the UNKNOWN. Its the only problematic and hard bit.
Currently the GC has a function to scan an object at an address.
But it checks to see if there’s an associated shape first, which it needs
to do the scan. If there isn’t, it’s a foreign pointer an ignored.

This is no good. In an STL container the Felix objects are embedded
in some object the container allocates so they’re not allocated by Felix,
and must not be collected .. but they DO have to be scanned.

This is the same issue when we synthesise, at run time, a product type.
In that case we have two expanded objects in adjacent store and we
have to scan both, even though the second one is not in fact separately
heap allocated.

So I need a function “scan_only” which is given the address and shape,
it will scan the object (or subobject) unconditionally. In theory this
should be trivial, I just need to make sure I have a suitable function.
Note there’s no way to avoid duplication scanning such subobjects.
I don’t know how it can happen, but to stop it if it does requires
an extra Judy array to track it. Felix objects are marked as processed
by setting (or clearing) the low bit in the Judy array that maps the address
to its shape (the bit in the shape pointer).

—
John Skaller
ska...@internode.on.net

John Skaller2

unread,

May 17, 2019, 8:28:39 PM5/17/19

to felix google

Well! It’s done .. except for proper testing :-)

Here’s the test code:

First, we’ll use vector for our tests.

///////////////////////

type vector[T] = "::std::vector<?1>"
requires Cxx_headers::vector,
property "functor"
;

ctor[T] vector[T] : 1 = "::std::vector<?1>()";

proc push_back[T] : &vector[T] * T = "$1->push_back($2);”;

///////////////////////

We don’t need many operations! We’ll make a vector of double:
//////////

var v = vector[double]();
var pv = new v;

push_back (pv,1.2);
push_back (pv,2.3);
push_back (pv,7.97);
//////

and a vector of int

/////////
var v2 = vector[int]();
var pv2 = new v2;
C_hack::ignore pv2;
///////////

and a vector of string

///////////
var vs = vector[string]();
var pvs = new vs;
push_back(pvs, "Hello");
push_back(pvs, "World");
////////////

and a vector of pointers to vectors of string:

/////////////
var v3 = vector[&vector[string]]();
var pv3 = new v3;
push_back(pv3, pvs);
push_back(pv3, pvs);
push_back(pv3, pvs);
push_back(pv3, pvs);
push_back(pv3, pvs);
//////////

and we’ll just run the collector.

/////////
collect();
/////////

The compiler debugging says:
/////////
functor primitive: abstract type vector[RWptr(vector[string]:TYPE,[])]
functor primitive: abstract type vector[string]
functor primitive: abstract type vector[int]
functor primitive: abstract type vector[double]
////////

The generic STL container scanner (a template in the compiler header files)
has some debugging prints and says:

////////
stl_container_scanner,
loc=0x7f915d603190 shape=tvec::_poly_65318t_65479@0x10a693ad8, size=5,
value shape=rtl::address@0x10a54ef38 value_scanner=0x10a5e5710
.. invoking element scanner for address 0x7f915d603230
.. invoking element scanner for address 0x7f915d603238
.. invoking element scanner for address 0x7f915d603240
.. invoking element scanner for address 0x7f915d603248
.. invoking element scanner for address 0x7f915d603250
stl_container_scanner,
loc=0x7f915d603150 shape=tvec::_poly_65312t_65473@0x10a693a18, size=0,
value shape=rtl::int@0x10a54eda0 value_scanner=0x0
stl_container_scanner,
loc=0x7f915d602c70 shape=tvec::_poly_65309t_65469@0x10a6939b8, size=3,
value shape=tvec::double@0x10a693888 value_scanner=0x0
stl_container_scanner,
loc=0x7f915d603170 shape=tvec::_poly_65315t_65476@0x10a693a78, size=2,
value shape=tvec::_a17385t_65475@0x10a6938f0 value_scanner=0x0
////////

The cnames are understood from the header file:

//PRIMITIVE 65309 INSTANCE 65469: vector[double]:TYPE
typedef ::std::vector<double> _poly_65309t_65469;

//PRIMITIVE 65312 INSTANCE 65473: vector[int]:TYPE
typedef ::std::vector<int> _poly_65312t_65473;

//PRIMITIVE 65318 INSTANCE 65479: vector[RWptr(vector[string]:TYPE,[])]:TYPE
typedef ::std::vector<_poly_65315t_65476*> _poly_65318t_65479;

//PRIMITIVE 65315 INSTANCE 65476: vector[string]:TYPE
typedef ::std::vector<_a17385t_65475> _poly_65315t_65476;

Notice the C++ vector is scanned! You can see the ascending addresses.l
The actual scanner, minus debugging, is:

// STL CONTAINER SCANNER
template<typename C>
void *stl_container_scanner(
::flx::gc::generic::collector_t *gc,
::flx::gc::generic::gc_shape_t *container_shape,
void *location,
size_t nobjects,
int recdepth)
{
auto object_shape = ((::flx::gc::generic::gc_shape_t * *)(container_shape->private_data))[0];
auto object_scanner = object_shape->scanner;

if (object_scanner) {
auto & container = *(C*)location;
for (auto & v : container)
object_scanner (gc,object_shape,&v,1,recdepth+1);
}
return nullptr;
}

Pretty cool, that works for ALL STL sequences. I left the template specialisation up to C++.
I probably should manually instantiate it, I have no idea what hell breaks loose with dynamic
loading.

NOTE: I was worried I didn’t have a function to scan a non-felix allocated object,
including subobjects of objects (including of Felix ones). Turns out ..
the RTTI’s scanner function is exactly such a function!

—
John Skaller
ska...@internode.on.net

John Skaller2

unread,

May 18, 2019, 11:34:42 PM5/18/19

to felix google

> On 18 May 2019, at 10:24, John Skaller2 <ska...@internode.on.net> wrote:
>
> Well! It’s done .. except for proper testing :-)

So now, more serious work.

1. Consider:

type stl_vector = "::std::vector<?1>”;
proc push_back : stl_vector * t = "$1.push_back($2);";

This raises two issues. First, Felix at the moment requires the vector to be boxed.
This is because is you put a vector in product type, eg struct, a pointer, provided
declared gc_pointer, will be traced. If a vector got put directly into the struct,
the compiler, noting its a functor, is screwed, because AT PRESENT it can only
generated RTTI using scan_by_offsets scanner; the compiler generates the
offset table using its knowledge of the C++ representation of the components,
and allows them to be sequenced AND nested. So it works fine for tuples
which have, as components, other tuples: the compiler knows the representation
is effectively flat, and just uses offsetof() macro to calculate the offsets of all
pointers contained, which it knows from the type.

However for this to work with an expanded vector as a component, the
compiler has to instead use the product scanner which is given an array
of pairs consisting of the component offset, plus the RTTI pointer for its type.
Then it can scan the object, by scanning each subobject.

The compiler CANNOT do this at the moment. Apart from actually writing the
code there’s an issue deciding if all the components are scan_by_offsets in
which case we want to use a *single* scanner with concatenated offset tables
with that one scanner per element. It’s an optimisation but it’s crucial for performance.
I’m not sure how to detect if “scan_by_offsets” is enough.

So it can be done, but it could be a bit messy. One way forward on this is to
move the scanner from a C function to a C++ virtual in fcops because
its type, ValueType, already has a product instance. However the type doesn’t
provide the information “this can be optimised”. Virtual dispatch to trivial
components (that contain no pointers) is a pain, whereas a NULL pointer
for a C function is detectable.

Anyhow that’s the first issue. The “functor” attribute applies to the unboxed type.
TO use it, you have make a heap copy with operator new. If you stick one in the
middle of a structure, or even use one has a local variable, Felix will screw up.

The second issue is that I want to write:

v.push_back a;

but i have to write

push_back(v,a);

I can easily fix this:

proc push_back[T] (v:vector[T]) (a:T) => push_back(v,a);

where overload resolution should pick the right procedure. The advantage
is also that I can form a closure over v and this closure has a polyadic type:

poly_push: T -> 0

that is, it is independent of the data functor .. there’s no mention we’re
pushing onto a vector, it could be a list of something.

I once tried to do this:

proc push_back : stl_vector -> t = "$1.push_back($2);";

This meant, to implement the above, i.e. replace the -> with *,
but then generate a wrapper so we got the more OO like notation.
This could work for functions too, although its more risky eg

fun vget: vector[T] -> int -> T = …

we just fudge all the arrows except the last one. It’s messy though,
consider:

ctor[T] vector[T] : int -> vector[T] = “ .. “

actually that’s a bug! The idea was to make a vector of some length,
but for a “ctor” the return type is automatic, so the above actually means:

fun _ctor_vector[T]: int -> vector[T] -> vector[T]

Note that C/C++ functions like

fun f: A * B -> C

do not actually accept a value of type A * B, they are C functions, so the first
argument is type A, and the second B. It’s a hack already .. so another hack
using -> isn’t so bad in itself. :-)

The main problem of this hackery is .. what if you REALLY wanted a C function
to take a Felix tuple value as a single argument? Well you can do that with
annotation $t (I think it is) in the C definition. What if you REALLY wanted to
pass a Felix function as an argument to C? After all it’s a pointer and C code
can use it fine (in fact callbacks ..).

But now the SERIOUS problem: these two issues are coupled.

push_back(v,a)

requires the first argument be a vector, not a pointer to a vector.
And that means the argument has type

vector[T] * T

which as just explained Felix cannot handle yet. But the problem is worse
because the general rule is you have to use a pointer to support mutation,
but you should pass by value for read access.

But if I make vector a pointer, i have the same problem I have with varray,
which is a pointer underneath. To be clear, the objective of pass by value
is to ensure a uniquely owned local copy which can safely be modified.
However we really do NOT want to copy a big object, if, despite the fact
we can, we actually don’t. In C++ const reference handles this well.

—
John Skaller
ska...@internode.on.net

John Skaller2

unread,

May 19, 2019, 5:17:54 AM5/19/19

to felix google

So this code now fails as it should for the moment:

type vector[T] = "::std::vector<?1>"
requires Cxx_headers::vector,
property "functor"
;

ctor[T] vector[T] : 1 = "::std::vector<?1>()";

proc push_back[T] : &vector[T] * T = "$1->push_back($2);";

var v = vector[double]();
var pv = new v;
push_back (pv,1.2);

Fatal error: exception Flx_cal_type_offsets.NestedFunctor(_, "vector[double]:TYPE”)

However

var pv = new (vector[double])());

works fine. The reason is the variable v in the original code is a subobject
of the thread frame, and the compiler cannot generated offsets for the thread
frame now, because it contains a subobject which requires a custom scanner.

So the next job is make it work! I will note in this case .. the custom scanner
would be a noop anyhow, since “double” doesn’t contain any pointers.
That’s an additional optimisation.

Originally, the plan was to throw a wobbly as above, instead of skipping
over the subobject (which is what the code did before!).

But actually on thinking about it there’s another solution: generate the
scan_by_offsets exactly as before and run it and then ALSO run an array
of scanners on the offending subobjects. This is much faster than running
a scanner on each subobject .. but it’s also harder to organise!

—
John Skaller
ska...@internode.on.net

John Skaller2

unread,

May 21, 2019, 6:35:19 AM5/21/19

to felix google

UPGRADE
=========

So the scanner is done now. Serialisation is not complete.
This is a MAJOR upgrade to the GC. So I need to explain it.

By default, primitive types. i.e. those lifted from C/C++ are considered
to not contain pointers. They DO have destructors, unless marked “pod”.

You can mark a primitive “gc_pointer” which says the type is actually
a pointer.

For other cases, you can supply a custom scanner.

PREVIOUSLY the custom scanner would only be invoked if the type was
Felix heap allocated. In that case, the type gets an RTTI (shape) object which
specifies the scanner, and when the *Felix* pointer to the object is seen,
we know its shape and can invoke the custom scanner.

That stiill works but what you could NOT do was nest a value of that
type in a larger object. This now works for

(a) a variable in the thread frame (global store)
(b) a variable in any heap allocated object including
a local variable of a function or procedure, the argument
of a variant constructor, or a component of a new allocated
object

CAVEAT
======

The one thing that does NOT work is if the variable is on
the machine stack. This is because the type of the machine stack
is not known. It is scanned conservatively for pointers, but there’s
no way to find a non-pointer C++ object with non-felix pointers
contained in it that point to an object that contain Felix objects.
For example, an STL container.

If such a STL container has a value which contains Felix pointers, the
Felix GC will NOT find those pointers and so it make garbage collect
the object they point to. There is no possible way around this in a C/C++
compatible environment.

FUNCTORS
==========

There is also a special property “functor":

type vector[T] = "::std::vector<?1>"
requires Cxx_headers::vector,
property "functor"
;

which says the type is an STL container with value type the
type parameter. The effect is to automatically ensure instances
of type T have shape objects generated AND to automatically
provide a scanner for the data type which uses begin() end() iterators
to scan the type for values, it then scans each subobject of type T
found with the scanner for T.

The scan_by_offsets scanner will notice a vector primitive
BY THE PREVIOULY MENTIONED UPGRADE and invoke it.

In other words you can now use STL containers freely in Felix
code as first class values, you don’t have to use pointers to
them (although of course you can).

NOTE ON CAVEAT
===============

The caveat above seems a real killer. It’s NOT!
It’s unpleasant, but you have to remember Felix rarely uses
the machine stack because the machine stack must be empty
in order to perform a service call.

Felix WILL allocate objects on the machine stack is possible.
It’s always prefered over heap allocation when inlining isn’t possible.
In some cases you can register pointers as roots to protect them.

But there is a universal guarantee. Just heap allocate the STL container
OR any object that contains it. That works, even if the pointer is on the
machine stack.

I may look at ways to improve this, eg an annotation on functions that forces
heap allocation.

—
John Skaller
ska...@internode.on.net

John Skaller2

unread,

Jul 9, 2019, 10:34:35 PM7/9/19

to felix google

I’ve been thinking how to properly construct polymorphism. The way it is done in
Type Theory (especially with the Lambda Calculus background) seems bogus.
Category Theory is used in all modern accounts as the description language,
but the construction isn’t done from first principles.

Lets start with M, the category of monomorphic types. It consists of a set of primitive types,
and three primary constructors: finite products, finite sums, and an exponential. Its Cartesian
Closed. Some authors throw in a natural numbers object.

So now, a one variable polymorphic type is just a functor, for example:

list: M -> M

says “give me some type ‘a, and I will make you a list of ‘a”. We can collect all these
functors, and treat them as arrows in another category. However, some polymorphic types
have two variables, eg an associative store has key and value. These are bifunctors like:

map: M * M -> M

We want them as well, so in general we have to add objects M^2, M^3 … to our category:
all finite products of M. This includes U=M^0, the unit category with a single object and no arrows
other than the identity.

Lets call this category of functors P. It’s important to observe the rule of U. We want M,
the category of monomorphic types and functions, to be embedded in P because in all
sane high level programming languages like Haskell and Ocaml momomorphic and polymorphic
types are jumbled up. Also we, as category theorists HATE application!! We want composition!!

So instead of ‘int’, we are going to use the functor Int: U -> P but call it “int” anyhow.
Mathematicians call this “abuse of terminology” and programmers call it “implicit conversion”
or “syntactic sugar”.

This means to make a list, we no long apply list to int. Instead we *compose* them.
I’m writing this in reverse polish with whitespace operator for composition here:

int list = list_of_ints

Note that list_of_ints is a functor U -> P but again we think of it as a monomorphic type.

There are four enhancements required to P. In Felix, you can bind C++ templates to
polymorphic types, so we have to be able to add new primitive functors, for example:

vector: U -> P

This is similar to adding new primitive functions in M by using bindings to C++.
In other languages, the primitives are part of the run time library or a bindings to C.
Or they may also be defined in the language but their definition is hidden by some
abstraction mechanism.

The second enhancement is obvious but I’m not aware of any language that can do it:
since we added all the finite products of M, we should add finite sums as well!!
I mean, why not?

The third enhancement is a real killer. No language I know of implements this.
Haskell is the weakest. Ocaml is trying. C++ is actually the most advanced.
Note Felix can’t do it. We should add exponential objects.

In M, an exponential object D -> C is roughly the set of all functions from D -> C
treated as values. In the value calculus a special operator is required to use
a function value:

eval (f, a)

applies f to a. Syntactic sugar is common so you just write f(a), the same as if
f were a function. However the exponent is NOT a set, its a type, which has no
elements in category theory. Instead, we use:

f’ : 1 -> (D->C)

This operator is an arrow from the unit type to then exponential (D->C), so it picks
out an “element” of D->C, namely the one corresponding to function f. Do not confuse
the first -> there with the second one, the second one is just part of the name of the
object (NOT an operator). you can look in Wikipedia for exponential objects.

Now here is the thing: in P, if we add exponentials, it implies FIRST CLASS FUNCTORS.
No language I know of has this. It means you can have a variable which can hold
either a list, a set, a map, or some other functor as a value an you can do things with it.
AT RUN TIME.

You can do this in most dynamic languages including Python and Scheme.
In Felix you can *deconstruct* an arbitrary container with an iterator, which can
be used by a fixed function, but you cannot construct one dynamically, only statically.

You CAN have compile time functor variables though. In fact you cannot define
a Monad without them, since a Monad accepts a functor as an argument.

But there’s MORE. The final enhancement required is to add new objects
to the category P. So far P only contains combinations of M. But in M, we can
add new primitive types, so we have to be able to do the same thing in P.

When we add an object to P we have to remember ITS A CATEGORY
ON THE SAME LEVEL AS M.

One category we can add is N, a subcategory of M. Another one may
be distinct: BOOL. That’s the category that models bool, and has
object, staticbool, and functions operating on it, and its products and sums etc.

Felix has staticbool. In C++ this would be called constexpr bool. Its the compile time
only boolean type used for assertions. It is NOT a type. BOOL is a new kind.
Of course you could embed BOOL into M (and constexpr does this, its extremely
powerful operator in C++ transcending all other languages because it makes
automatic embeddings of kinds from compile time to run time!)

So P is beginning to look like, rather than a category of polymorphic types,
to be a catgeory or kinds.

the two critical things here are

(a) first class functors
(b) a *unified* category of types, data structures, and kinds

This can be done with a partial evaluator. In other words, every program
is just an extension of the compiler. There IS a language which can already
do this although it is not statically types: SCHEME.

—
John Skaller
ska...@internode.on.net

Reply all

Reply to author

Forward

0 new messages