John Skaller2

unread,

Dec 7, 2018, 9:13:09 AM12/7/18

to felix google

Thread termination logic:

—
John Skaller
ska...@internode.on.net

Shayne Fletcher

unread,

Dec 7, 2018, 10:09:35 AM12/7/18

to felix-l...@googlegroups.com

Nice!

--
You received this message because you are subscribed to the Google Groups "Felix Language" group.
To unsubscribe from this group and stop receiving emails from it, send an email to felix-languag...@googlegroups.com.
To post to this group, send email to felix-l...@googlegroups.com.
Visit this group at https://groups.google.com/group/felix-language.
For more options, visit https://groups.google.com/d/optout.

ThreadTermination.jpg

John Skaller2

unread,

Dec 7, 2018, 9:48:03 PM12/7/18

to felix google

> On 8 Dec 2018, at 02:09, Shayne Fletcher <shayne.fl...@gmail.com> wrote:
>
> Nice!
>

But not so simple. At a reasonably high level, Felix has two subroutines:

run_until_blocked
run_until_complete

The second one keeps going until there is no more work to do (for this thread).
That includes no pending async requests.

The first one runs until there is no work that can be done *immediately*.

Why the dichotomy? Well, as you know, Felix is all about control inversion.
You write a thread .. Felix translates it into callbacks. (The callback is
a continuations resume() method).

Now, running to completion is what you want your mainline program to do.
However Felix is designed to be *embedded* in C, as well as being able to
embed C into Felix.

In particular, it is common to have a C program, such as an X-Windows
application using Xlib, that is unfortunately forced into being written as
callbacks driven by Xlib’s framework. Almost all these kinds of frameworks
are loops that call client callbacks with messages, the framework captures
the events and sends the message.

So we need to be able to run Felix for a while, and when there’s nothing
*immediate* to do, suspend it and return control to the master framework’s
event handling loop. When other stuff is idle, it will call Felix, provided
the programmer registers a suitable callback.

So Felix own framework itself hase to be turned into a callback.
The machinery is a bit messy, there are actually 4 callbacks,
three for setup and one to do the work.

Note that if you spawn a new pthread inside Felix THAT thread runs
all the time. Only the external event loop’s callback has to suspend
and resume.

In addition, whilst suspended, the thread cannot respond to a world_stop
request, so it has to actually tell the collector it is stopped, and un-tell (sic)
the collector when it resumes.

Furthermore, the client C/C++ code into which Felix is embedded needs to be
able to communicate with Felix code. For this reason there’s a special
service call “multi-write” which writes data to *multiple* channels at once.
These will be connected to coroutines that are suspended when the
write is done: the write is done from *outside* Felix. It reschedules the
readers so next time Felix is resumed, there’s work to do that wasn’t
there before. Its a kind of async I/O. (multi-write is available inside
Felix as well, its very useful for a clock telling lots of agents in a game
that its time for the next action).

So now, the thread termination condition is more difficult: not to implement,
yes, that could be tricky .. the difficulty now is much worse: its a specification
problem. Not even a design issue: the question is:

If multiple threads are servicing a queue .. what does it
actually MEAN to run until blocked? Should the other threads keep running?

The actual impact is that when querying the async read queue, run until
complete blocks until there is something to get out of the queue,
assuming there is a pending job (otherwise we’re done). However when
running run until blocked, it calls a different instruction on the queue
that always returns, whether or not there’s something to fetch.
If something got fatched .. processing continues, otherwise we’re
blocked and we have to yield to the master framework.

The logic was designed assuming one pthread. Note if we terminate
all the threads .. we have to decide how after resuming following
a suspension how to restart them.

When you spawn a pthread normally it uses run until complete.
Its only the main thread that uses run until blocked.
I have to check but, the driver is the thing responsible for
controlling this. The driver could run until blocked, in which
case it would just loop around and resume immediately.
So the flag to say, run until complete is an optimisation.

—
John Skaller
ska...@internode.on.net

Keean Schupke

unread,

Dec 8, 2018, 3:23:21 AM12/8/18

to felix google

Hi John,

If you are using a fixed sized hardware thread pool (sized for the number of CPU cores) then threads never terminate.

If you are considering the suspended async tasks, you don't care about termination because they consume no resources, except preventing the GC of resources they may reference if they ever resume. There is not really a point at which we can say an asynchronous task will never resume automatically, as this is determined by semantics outside the scope of the programmer. As such it is impossible to automatically do this, but if the programmer is sure an asynchronous task is no longer needed they should be able to 'cancel' it which means it's result is never going to be observed, and any referenced resources can be freed.

Keean.

John Skaller2

unread,

Dec 8, 2018, 9:01:18 AM12/8/18

to felix google

> On 8 Dec 2018, at 19:23, 'Keean Schupke' via Felix Language <felix-l...@googlegroups.com> wrote:
>
> Hi John,
>
> If you are using a fixed sized hardware thread pool (sized for the number of CPU cores) then threads never terminate.

Sure they do. Felix threads are detached. The process terminates when
all the threads do.

At present if you use a thread pool, its implicitly created, but unfortunately
you have to manually stop it or the process won’t terminate.

However the thread pool used by the coroutine scheduler to elevate them
to microprocesses is unrelated to the thread pool.

The design at the moment is that te thread pool defaults to 1 thread per CPU
I think, or to 8. I can’t remember. But you can set it to whatever you want.
There’s a system thread pool, but you can create one if you want.

For the coroutines, at the moment, the client has to spawn the extra threads.

In other words, Felix doesn’t use a fixed set of threads, one per CPU.
Its up to the programmer to make choices. I may change that after
experiments and measurements.

>
> If you are considering the suspended async tasks, you don't care about termination because they consume no resources, except preventing the GC of resources they may reference if they ever resume. There is not really a point at which we can say an asynchronous task will never resume automatically, as this is determined by semantics outside the scope of the programmer. As such it is impossible to automatically do this, but if the programmer is sure an asynchronous task is no longer needed they should be able to ‘cancel' it which means it's result is never going to be observed, and any referenced resources can be freed.

I’m not sure if you can cancel an async task at the moment.
Worse, if you do socket I/O it will not return until the job
is done or it gets an error. There’s no timeout. That needs to be fixed.

—
John Skaller
ska...@internode.on.net

John Skaller2

unread,

Dec 8, 2018, 9:20:06 AM12/8/18

to felix google

Ok, I think I have figured out the some of the semantics for top level threads.

There are three adjectives for Felix threads:

(a) mainline thread
(b) primary thread
(c) secondary thread

The mainline thread is a primary thread. It is the thread that was running when
your program starts.

The non-mainline primary threads are those made by spawn_pthread.
The secondary threads are those made by spawn_process.

Every thread has an async scheduler and sync scheduler. However
secondary threads shared fibre list with one primary thread.

Now, we need secondary threads to die when there’s no work to do,
and none coming, however primary threads must return.

So, the mainline thread can be run two ways:

(a) blocking mode
(b) returning mode

In blocking mode, if there are pending async jobs, and no active threads,
a thread hangs on the async queue whereas in returning model,
it returns. This means it gets suspended and Felix yields to an external
event loop. On re-entry, the thread is resumed.

If there are active jobs but no work in the job queue, the mainline
thread must return. However no secondary thread may return
until there are no running threads, and no pending async.
Instead they pauses and retry, until this condition is met.

So things to note: by these rules, the mainline thread ONLY
may return, but leave secondary threads waiting on async.
So actually, the mainline thread can return even when there
are running jobs. Its not at all clear why the mainline thread
wouldn’t return at ANY time if there are other threads to carry on
the work.

To preseve the existing semantics, the mainline thread shouldn’t
return until there are no active threads. At this point, if there
is no pending async, all threads return, if there is pending async,
the secondary thread wait. This design has merit too, but the previous
one allows the event loop to run more quickly.

There is still a problem. If the mainline thread returns, the secondary
threads can still run out of all work and terminate.

This sounds OK but it isn’t because in the external event loop,
a multi-write can put jobs back into the queue.

Note that for nested schedulers at present there’s a new queue,
and a new sync scheduler but no new async scheduler.
At present a nested job cannot spawn pthreads, processes,
or async jobs.

—
John Skaller
ska...@internode.on.net

Keean Schupke

unread,

Dec 8, 2018, 12:31:31 PM12/8/18

to felix google

Hi John,

I would simplify that, and have the runtime entry code fire up a thread pool of 1 thread per core (overridable by a command line parameter) and have that pool exist all the time the program is running. In that way the programmer doesn't have to think about hardware threads at all. The programmer simply schedules asynchronous tasks, which either can be in parallel or are sequential, and should not have to worry whether there are one, eight, or sixty-four hardware threads (there would be 64 on an AMD Threadripper 2990 for example).

Keean.

John Skaller2

unread,

Dec 8, 2018, 5:15:27 PM12/8/18

to felix google

> On 9 Dec 2018, at 04:31, 'Keean Schupke' via Felix Language <felix-l...@googlegroups.com> wrote:
>
> Hi John,
>
> I would simplify that, and have the runtime entry code fire up a thread pool of 1 thread per core (overridable by a command line parameter) and have that pool exist all the time the program is running. In that way the programmer doesn't have to think about hardware threads at all. The programmer simply schedules asynchronous tasks, which either can be in parallel or are sequential, and should not have to worry whether there are one, eight, or sixty-four hardware threads (there would be 64 on an AMD Threadripper 2990 for example).

Three issues with that.

First as noted previously, Felix as a systems programming language should not
pre-empt decisions of the developer using it. Its a toolbox. If you want to do
imperative programming you can. If you want to do functional programming you can.
If you want to program with multiple pthreads, you can.

Instead, there *is* a system thread pool, defaulting to a number of threads
related to the number of CPUs. But its not fixed for a good reason: the Felix
program might not be the only process running.

Instead, the system should be *capable* of roughly what you describe,
but not enforce that design.

Second, there are heavy constraints on synchronisation when using a thread pool.
A job in a pool is cannot be allowed to block. So instead, every synchronisation
device has to be replaced by an equivalent one that works in a pool. As noted
previously, in the extreme form of this, we’re effectively rewriting the operating
system. Maybe I can do it better than the real OS and maybe not :-)

Third, Felix is designed to be embedded in C, as well as embedding C.
More particularly, it is designed to be embedded in an external event loop.
This is absolutely necessary in some contexts, for example, you might
want to do Windows graphics programming inside an event management
library, or X windows within Xlib. Or use SDL2 under iOS and Android.

To make this work there are two considerations. First, Felix mainline
has to suspend regularly, because its running as an “idle-task” in the
external event loop.

Secondly it has to synchronise with the event loop. If you’re doing
crap like OpenGL, for example, and you want to do graphics on
the current display, it HAS to be done from the main thread.
Similarly in Windows, the message queues are “per thread”.
X is thread safe and better designed in some ways but if you’re
writing SDL2 code, you have to make it work on all platforms.
SO even though you can let non-mainline threads keep running
whilst the main thread has yielded to the event looop, there
are constraints on what they can do: no graphics, for example.

In other words as an application language, you cannot just run
“the program” in a thread pool. In fact Felix doesn’t generate
programs. It generates libraries.

So the bottom line is, even if your design is optimal for a dedicated server
program, that is only one use case. The hard thing is to make that
use case work fast but still support the others.

—
John Skaller
ska...@internode.on.net

John Skaller2

unread,

Dec 9, 2018, 7:08:12 PM12/9/18

to felix google

OK so for top level (non-nested) scheduler:

There are 4 kinds of Felix theads (note: there are other threads such as demux event
polling threads and also in parallel mode, the GC can run extra threads).

1. mainline

This is the Felix thread created out of the standard driver’s mainline thread,
usually the one the OS started the process with. It constructs the Felix
environment, runs until there is nothing to do, cleans up, and returns
control to the OS. The mainline thread is not usually permitted to terminate
until all other threads have terminated (because it would destroy their
environment, including the garbage collector!)

2. embedded

This is not currently used. It is the kind of thread the client specifies
when embedding Felix in an event loop. It differs from the mainline
thread in that it the associated Felix thread can suspend and resume.
When it suspends the Felix thread object terminates but the environment
is preserved. Whilst suspended, the physical thread is running the
external event loop. Other Felix threads can continue to run .

3. pthread

The kind of thread made by spawn_pthread. It gets its own scheduler
and scheduler list. Async I/O constructs a dediced demux thread.
It operates the same as a mainline thread except that termination
does not tear down the whole Felix environment, it just dies.

4. process

The kind of thread mad by spawn_process. It gets its own scheduler
but SHARES the scheduler list with its caller. The sharing is transitive
and accumulative. This is how you run multiple coroutine concurrently.

TERMINATION RULES

The general rule is set of related Felix threads must terminate together.
Threads are related if, and only if, they share the same coroutine queue.
In a set of related threads all but one must be process threads, thesse are SLAVES.
The other thread is the MASTER.

To make this work is tricky. So I am trying this rule:
if there is

(a) no work being done
(b) no synchronous work to do
(c) no pending async jobs

then all process threads terminate and the other thread returns.
If its a pthread it also terminates but the mainline thread has
to do cleanup first.

Its quite tricky to meausre when there is no work being done and
no synchronous work to do efficiently.

When there are pending async jobs then

1. IF the MASTER thread is not embedded,
it BLOCKS waiting to fetch one when it completes

2. IF a thread is embedded or process
if work could turn up then
it spins with a delay waiting for work
otherwise it terminates

Work is possible if:

(a) a thread is workikng
(b) the sync queue is populated
(c) there are async jobs pending
(d) the MASTER is embedded

The really hard thing is deciding wnen a thread is working.
The rule is simple: a thread is working if, and ony if, the “ft” variable of the
sync scheduler is non-null. When a fibre finishes and it becomes NULL
the sync scheduler tries to fetch another fibre to run. The thread is
considered to continue to be working whilst it tries to do this.

So basically all that’s required is to start the count at zero,
and increment it when ft is set to non-NULL, and decrement it
when it becomes NULL.

The hard but is the optimisation. If it is NULL only transiently
we don’t want to decarement and then increment the count.
We only want the count to be decremented when a scheduler
finds there is NO work to do other than maybe async work.

If the working count is zero AND the asyc pending count is zero,
tha set of related threads is finished IF and ONLY IF it is
a mainline or pthread set.

Embedded .. I don’t know for sure yet.
In this case threads should just keep running until explicitly
killed. The reason is, the client external event loop is
allowed to *inject* work into Felix from outside. There’s a
special operation for this. As a motivating example,
a bunch of fibres is running a game, shooting each other,
making noises and drawing. Then they all complete,
and the embedded system suspends and yields to
the event loop. Whan 4 ms has elapsed, the master
framework injects a clock pulse with a multi-write that
wakes up all the fibres, and then resumes running Felix.

4 ms is 25 FPS, about the right frame rate for a low end game.

Anyhow the difficulty is I have functions like “get fibre from queue”
that can return either a fibre or NULL if there aren’t any.
So I can’t really tell with an explicit test, in each case.
SImilarly routines exit. The problem is, this stuff happens
concurrently.

The current code has a race.- A thread says its not working
and delays a bit, then goes back to trying to work, where it
says its working. All the process threads do this. So most
of the time they’re spinning saying, “I’m not working” and
occasionally, even when there’s no work to dom they’re
saying “I’m working” before they actually find some work.
Looking for a job is a job in itself :-)

The problem is, we need some random luck for all
the threads to be saying they’re not working, including
the master .. before any of them can terminate.
And of course they can’t terminate when they’re sleeping.
And they’re working when they’re not sleeping … :-)

—
John Skaller
ska...@internode.on.net

John Skaller2

unread,

Dec 10, 2018, 12:05:53 AM12/10/18

to felix google

Well finally .. all tests pass. And:

///
~/felix>flx concurrent_coroutines.flx
Spawned
Spawned
New thread
New thread
New thread
New thread
New thread
New thread
Elapsed 7.49796
Elapsed 0.685071
~/felix>flx mul
Starting
Single thread Done 21.6451
Thread pool Done 3.21698
~/felix>flx build/release/share/demo/threadpoolex1.flx
Naive mul elapsed 2.72407 seconds
Naive Parallel mul elapsed 0.398374 seconds

Verified
Using thread pool's pforloop

Smart Parallel mul elapsed 0.41521 seconds
Verified
pfor mul elapsed 0.4002 seconds
Verified
//////

i have not checked the logic for processes (concurrent coroutines).
I am NOT sure that the atomic updates actually include barriers.
I’m not sure there are no races.

I am sure nested schedulers still only allow synchronous ops.

But the numbers look good, finally. Concurrent coroutines, 8 times faster than single threaded.
Thread pool based multiply: mul, anout 7 times faster with multiple threads.
Using system thread pool, same: around 7 times faster.

So .. its time to commit. :-)

—
John Skaller
ska...@internode.on.net

John Skaller2

unread,

Dec 10, 2018, 1:49:07 PM12/10/18

to felix google

> On 10 Dec 2018, at 16:05, John Skaller2 <ska...@internode.on.net> wrote:
>
> Well finally .. all tests pass. And:
>
> ///
> ~/felix>flx concurrent_coroutines.flx

> Elapsed 7.49796
> Elapsed 0.685071

So I checked the results, got a bug, got different timings .. its not clear what’s
happening. The problem is the timing and result found depend on exactly
when the result printer runs which is not determinate with the test.
It would be good if I could “run” the tests but “run” doesn’t allow spawning processes
at the moment.

I tried a second version in which the helper processes are spawned FIRST
not afterwards. This test ALSO included the time to create an ftread, which uses
malloc.

This version of the test got the right results.

But the SHOCKER was the timing. Ithought it would be slower because malloc()
was included in the timing. Here’s the result:

Single thread: Elapsed 0.000173

Huh?

With helper threads:

Elapsed 2.3e-05

That’s SO FAST the measurement is effectively zero.

Hum. The problem appears to be that STL list is SLOW. In the concurrent
version of the test, the fibres get eaten up by the helper threads faster than
they can be created. So the STL list is always almost empty, and malloc()
is simply recycling the memory from STL list. (The Felix objects are
garbage collected and the collector is not running).

I need a more robust test. With the collector running, the numbers don’t
change but they must be lying because the collector reports program
time of 5 seconds (44% in the GC which is pretty good).

—
John Skaller
ska...@internode.on.net

John Skaller2

unread,

Dec 15, 2018, 4:01:03 PM12/15/18

to felix google

Well … this code:

proc atest() {
println$ "A test";
spawn_pthread { println$ "I'm a pthread!"; sleep 2.0; println$ "Pthread awakes"; };
sleep 1.0;
println$ "Hey, I woke up";
}

println$ "Running async test";
async_run atest;
println$ "After running async test";

println$ "Running async test";
async_run atest;
println$ “After running async test";

WORKS! Instead of “run” you say “async_run”. This is a MAJOR upgrade.
I haven’t tried it with concurrent coroutines yet. There’s a still a problem
with them (performance sucks).

There is a bug: the async scheduler is currenly leaked. In the RTL it
is ref counted. I could fix that with an explicit delete, but I think I’ll
just fix the RTL to garbage collect it.

—
John Skaller
ska...@internode.on.net

John Skaller2

unread,

Dec 19, 2018, 8:39:08 AM12/19/18

to felix google

So its not yet working but I am close to a design. So here’s what happens:

1. In a main program, the environment is set up, the mainline thread executes,
it returns, and it tears down the environment. Process complete. The thread
must suspend but remain registered when async events are pending.

2. In an event loop, the thread exits without cleanup and suspends
as soon as there is no work to do, leaving pending async events.
The driver must resume the thread repeatedly until there are
no pending async events, in this case the thread exists
and cleans up.

3. A thread can spawn another thread with spawn_pthread.
That thread has its own scheduler and async queue.
It exits only when there is no work and no pending events.
It doesn’t cleanup (only the main thread is allowed to do that).

4. The main thread must wait until all other threads are completed
before cleanup.

5. Any thread can create and run a nested scheduler.
Doing this is a subroutine call. The thread cannot return until
there is no work to do on that scheduler.

6. Any thread can create a helper process thread.
It runs with the same async scheduler and queue
as its creator. Processes cannot return until there
is no work to do and no async pending.T

7. The thread that creates them also cannot return until all the
child processes it created have returned. In effect,
the master thread must join all te slaves.

8 Note that if a slave process creates another one it
is owned by the mastrer not the slave so the creator slave
does not wait for it, the master does.

9. See again (5), slaves can run nested schedulers too.

—
John Skaller
ska...@internode.on.net

John Skaller2

unread,

Dec 20, 2018, 10:15:05 PM12/20/18

to felix google

~/felix>flx concurrent_coroutines; flx concurrent_coroutines2; flx concurrent_coroutines3
Schedule before launch
Done Serial kk = 200, elapsed=12.4906
Done Concurrent kk = 200, elapsed=3.78415
Schedule after launch
Done Serial kk = 200, elapsed=12.4911
Done Concurrent kk = 200, elapsed=4.57675
Spawn after launch
Done Serial kk = 200, elapsed=12.5106
Done Concurrent kk = 200, elapsed=5.71336

Each test spawns 6 helper processes, so for my corei7 with hyperthreading,
that’s 7 threads. Each coroutine is eating CPU by calculating ack(3,40),
where ack is Ackerman’s function. 200 runs are done.

Schedule uses spawn_fthread which puts the jobs at the END of the queue.

Spawn uses spawn_fibre which starts running the job immediately,
pushing *itself* onto the head of the queue.

Exactly why the results differ is beyond me! In these tests, almost all the time
is spent doing Ackerman’s function, so the context switching time is
irrelevant. The GC time will be insignificant.

Note that if you spawn_fibre first then launch the helper processes
it should be the same as for serial (or slower).

—
John Skaller
ska...@internode.on.net

unread,

Dec 30, 2018, 4:34:22 PM12/30/18

to felix google

ok so my next trick is to look at the cost of service calls.

at present you write

_svc variabe

where variable contains the service call. The address of the variable is put into
the current continuation which then returns control to the scheduler.

The type of a service call is a variant with the standard representation,
a _uctor_ with an integer tag for the service call number and a pointer
to the argument. Here’s read and write on an schannel:

/*7*/ | svc_sread of _schannel * &address
/*8*/ | svc_swrite of _schannel * &address

The second argument is a pointer to the location containing the address
value to be read or the place to put the address value being written.

The argument is heap allocated according to the rule for variant formation.

It works like this:

noinline proc svc(svc_x:svc_req_t) {
var svc_y=svc_x;
_svc svc_y;
}

proc read[T] (chan:schannel[T], loc: &&T) {
svc$ svc_sread$ C_hack::cast[_schannel] chan, C_hack::reinterpret[&root::address] (loc);
}

proc write[T] (chan:schannel[T], v:T) {
var ps = C_hack::cast[root::address]$ new v;
svc$ svc_swrite$ C_hack::cast[_schannel] chan, &ps;
}

The write operation make a copy of the argument to be written on the heap,
since schannels can only read and write addresses.

It shouldn’t need to do that if:

(a) the value fits into an address
(b) its a unique pointer already

So basically at present doing a write requires 2 allocations and doing a read requires 1.
Using the new allocation free schannel operations and queue service I got this
with a null test sending 50K integers along a pipeline length 50.

~/felix>FLX_MIN_MEM=1000 flx concurrent_pipeline3
List length = 49152
Serial Elapsed 10.288
Concurrent Elapsed 37.12

If I watch the mac CPU HIstory monitor running this the concurrent version CPU usage bars are
filled with red. The monitor uses green for user and red for system use. Only 4 CPUs work.
There’s no red in the serial version. The red has to be thread pre-emptions related to
contention of locks on allocation.

The GC uses a system mutex to serialise allocations. Allocation use malloc which itself
is thread safe and probebly uses a system mutex for serialisation as well (although
modern mallocs have thread local storage to allocate from, using that pool there would
be no serialisation required).

Note that in Felix GC locks are acquired even if the program is entirely serial.
There is a serial GC, in fact the GC is a serial object which is made thread
safe in a derived class with wrapper methods that do the locking.

Unfortunatly wrapping malloc in a spinlock is not acceptable because
malloc is an unknown. On the other hand the management data structure
is a JudyLArray with guarranteed bounded performance so a spinlock
for it would be appropriate on access .. but of course insertions still
have to use malloc.

—
John Skaller
ska...@internode.on.net

John Skaller2

unread,

Dec 30, 2018, 4:57:07 PM12/30/18

to felix google

>
> ~/felix>FLX_MIN_MEM=1000 flx concurrent_pipeline3
> List length = 49152
> Serial Elapsed 10.288
> Concurrent Elapsed 37.12

Deleting collector total time = 54.79076 seconds, gc time = 123.09077 = 224.66%

Interesting :-)

Without GC:

~/felix>FLX_MIN_MEM=2000 FLX_REPORT_COLLECTIONS=1 flx concurrent_pipeline3
[FLX_REPORT_COLLECTIONS] Collection report enabled
[FLX_REPORT_GCSTATS] GC statistics report enabled
[FLX_REPORT_COLLECTIONS] Collection report enabled
[FLX_REPORT_GCSTATS] GC statistics report enabled
List length = 49152
Serial Elapsed 10.3565
Concurrent Elapsed 23.7794
Deleting collector total time = 34.93044 seconds, gc time = 0.00000 = 0.00%

The concurrent version is still slower.

Note that in this test I *EXPECT* the concurrent version to be slower.
This is because its a null test. The routines in the pipeline do nothing
except read and write an int. No calculations. So ALL the time is spent
in the system. So most of the time is going to be spent contending locks
and doing allocations.

I added a forced collection between the serial and concurrent tests:

~/felix>FLX_MIN_MEM=2000 FLX_REPORT_COLLECTIONS=1 flx concurrent_pipeline3
[FLX_REPORT_COLLECTIONS] Collection report enabled
[FLX_REPORT_GCSTATS] GC statistics report enabled
[FLX_REPORT_COLLECTIONS] Collection report enabled
[FLX_REPORT_GCSTATS] GC statistics report enabled
List length = 49152
Serial Elapsed 10.18
[Gc::collect] Program requests collection
[flx_gc:gc_profile_t] actually_collect
actually collected 14893302 objects, still allocated: 2 roots, 49896 objects, 816276 bytes
[Gc::collect] Collector collected 14893302 objects
Concurrent Elapsed 15.9588
Deleting collector total time = 34.12926 seconds, gc time = 7.60181 = 22.27%

So 15 million objects are being created. In this test, the service call object only
need to be created once, and I can fudge that in the test code: at present the service
call is inside the procedure’s read/write loop. But the addresses of the data to be read/written
are the same each iteration so the service call object can be lifted out of the loop (the call itself
can’t be of course).

Also I can bypass the “new” operation on the data to write with some hackery.

It actually quite good above: only 50% overhead for the concurrent version.
however the overall time of 10 seconds is absurd, that’s at least 1000 times
too slow. We’re only using 50K values, and a pipeline of length 50.
It should only take a few microseconds to send a value down the pipe.

—
John Skaller
ska...@internode.on.net

John Skaller2

unread,

Dec 30, 2018, 10:50:41 PM12/30/18

to felix google

> On 31 Dec 2018, at 08:57, John Skaller2 <ska...@internode.on.net> wrote:
>
>
>>
>> ~/felix>FLX_MIN_MEM=1000 flx concurrent_pipeline3
>> List length = 49152
>> Serial Elapsed 10.288
>> Concurrent Elapsed 37.12
>
> Deleting collector total time = 54.79076 seconds, gc time = 123.09077 = 224.66%
>
> Interesting :-)
>
> Without GC:
>
> ~/felix>FLX_MIN_MEM=2000 FLX_REPORT_COLLECTIONS=1 flx concurrent_pipeline3
> [FLX_REPORT_COLLECTIONS] Collection report enabled
> [FLX_REPORT_GCSTATS] GC statistics report enabled
> [FLX_REPORT_COLLECTIONS] Collection report enabled
> [FLX_REPORT_GCSTATS] GC statistics report enabled
> List length = 49152
> Serial Elapsed 10.3565
> Concurrent Elapsed 23.7794
> Deleting collector total time = 34.93044 seconds, gc time = 0.00000 = 0.00%

As promised I hacked it to be functionally equivalenty but do no allocations.
Actually the service request object constructors do an allocation but it is
only done once, not inside the loop.

The hack transfers the integers directly instead of a pointer to a heap allocated
value. So we can see how fast the RTL machinery is.

Are you ready???

~/felix>FLX_MIN_MEM=1000 FLX_REPORT_COLLECTIONS=1 flx concurrent_pipeline3

[FLX_REPORT_COLLECTIONS] Collection report enabled
[FLX_REPORT_GCSTATS] GC statistics report enabled
[FLX_REPORT_COLLECTIONS] Collection report enabled
[FLX_REPORT_GCSTATS] GC statistics report enabled
List length = 49152

Serial Elapsed 0.249465

[Gc::collect] Program requests collection
[flx_gc:gc_profile_t] actually_collect

actually collected 295204 objects, still allocated: 2 roots, 49746 objects, 813200 bytes
[Gc::collect] Collector collected 295204 objects
Concurrent Elapsed 0.248457
Deleting collector total time = 1.58544 seconds, gc time = 0.16338 = 10.30%

The serial code is 50 times faster, the concurrent code 100 times faster,
and, faster than the serial code too. So its not the 1000 times I hoped for ..
but its still pretty awesome.

The code is below. This kind of transfer (zero allocations) is good
for ANY pointer and any integer small enough to fit in a pointer.

Observe the svc_sread/svc_swrite type constructors that make service
request objects and the _svc primitive that does the actual operation.
The Felix compiler translates _svc into a service call by setting
the pointer to the request into the continuation object, saving
the program counter, and returning. The scheduler does the request
then calls the resume() method which jumps to the point after
the request and continues.

It’s easy to overload read/write so they transfer pointers directly.
Its not clear how to avoid repeadedly allocating the request object tho.
The user shouldn’t need to see that.

//////////////////////////////
proc makem (var lst:list[address]) (r: (out: %>address)) () {
var x : address;
var wreq = svc_swrite (C_hack::cast[_schannel]r.out, &x);
proc dowrite() { _svc wreq; }
next:>
match lst with
| Empty => ;
| head ! tail =>
x = C_hack::cast[address]head;
lst = tail;
//println$ "elt=" + x.str;
dowrite();
goto next;
endmatch;
}

proc slow (seq:int) (r: (inp: %<address, out: %>address)) () {
var x : address;
var rreq = svc_sread (C_hack::cast[_schannel]r.inp, &x);
var wreq = svc_swrite (C_hack::cast[_schannel]r.out, &x);
proc doread() { _svc rreq; }
proc dowrite() { _svc wreq; }
while true do
doread();
//println$ "rout " + seq.str + " read " + x.str;
// assert(C_hack::cast[uintptr]x == C_hack::cast[uintptr]1);
dowrite();
done
}

proc countem (r: (inp: %<address)) () {
var x : address;
var rreq = svc_sread (C_hack::cast[_schannel]r.inp, &x);
proc doread() { _svc rreq; }
while true do
doread();
//println$ "result=" + x.str;
done
}

var lst = ([C_hack::cast[address]5,C_hack::cast[address]2,C_hack::cast[address]3]);
for(var i = 1; i<15; ++i;) lst = lst + lst;
println$ "List length = " + lst.len.str;

proc check(concurrent:bool)
{
async_run {
if concurrent do
spawn_process {; };
spawn_process {; };
spawn_process {; };
spawn_process {; };
spawn_process {; };
spawn_process {; };
spawn_process {; };
done
/*for(i = 1; i<1000; ++i;)*/ {
//if i%100 == 0 perform
//println$ "i=" + i.str;
#(makem lst |->
slow 1 |->
slow 2 |->
slow 3 |->
slow 4 |->
slow 5 |->
slow 6 |->
slow 7 |->
slow 8 |->
slow 9 |->
slow 10 |->
slow 11 |->
slow 12 |->
slow 13 |->
slow 14 |->
slow 15 |->
slow 16 |->
slow 17 |->
slow 18 |->
slow 19 |->
slow 20 |->
slow 21 |->
slow 22 |->
slow 23 |->
slow 24 |->
slow 25 |->
slow 26 |->
slow 27 |->
slow 28 |->
slow 29 |->
slow 30 |->
slow 31 |->
slow 32 |->
slow 33 |->
slow 34 |->
slow 35 |->
slow 36 |->
slow 37 |->
slow 38 |->
slow 39 |->
slow 40 |->
slow 41 |->
slow 42 |->
slow 43 |->
slow 44 |->
slow 45 |->
slow 46 |->
slow 47 |->
slow 48 |->
countem);
};
};
}

var start = time();
check(false);

var elapsed = time() - start;

println$ "Serial Elapsed " + elapsed.str;
collect();

start = time();
check(true);

elapsed = time() - start;

println$ "Concurrent Elapsed " + elapsed.str;

—
John Skaller
ska...@internode.on.net

John Skaller2

unread,

Jan 1, 2019, 6:59:50 PM1/1/19

unread,

John Skaller2

unread,

Jan 25, 2019, 10:43:30 PM1/25/19

to felix google

I am tired of trying to get Appveyor to build Felix. The stupdi system is too slow
for a standard build, forcing all sort of rubbish like saving the built version
on Github .. something contrary to the purpose of Github ..just to save time
on the build. Similarly, there’s a win64 binary version of Ocaml on GitHub.

In general, CI servers are all a load of rubbish. YAML is a heap of drivel.
Ci servers need

(a) a real control language
(b) transient storage so they can work like a real build system
(c) much better interactive interfaces
(d) inter-project synchronisation

Felix is a moderately large, moderately complicated system.
CI servers as they are at the moment are good for builing small
libraries written in C and not much else.

Felix builds on my core i7 Windows box in about 30 minutes.
It times out on Appveyor. Felix builds on my Mac in about 20 minutes.
It times out on Travis sometimes.

Appveyor doesn’t bomb a project if the log gets too long but
Travis does. This is ridiculously moronic crap.

~/felix>ls -R build/release/* | wc
5926 5624 94821

The log limit on Travis is 1000 lines. Felix built, as shown,
consists of around 6000 files.

In principle Felix can be built in pieces, but there are dependencies
and CI servers have no sane way to manage or refer to them.
Instead they use the brain dead idea of just building everything
from scratch every time.

As it is, I want to add a LOT more tests to the build.
And I also want to start adding performance tests which necessarily
must take enough time to get reasonable meaures of the time take:
IMHO about 1 second is necessary. I’m adding uniqness support to
some existing libraries, starting with list (on top of the test case
libraries ustring and ucstring) and it’s necessary to check the
performance upgrades this enables are actually happening.

There are also a significant number of optimisations in the compiler
missing or commented out. Again, there’s not much point to an optimisation
if you cannot check it is working. In some cases this requires historical
data to at least prevent performance regressions. This is hard to organise.
Tests change. New ones are added. The right way to do this is to
run the current tests on OLD versions of Felix as well as the current
one to show the improvements (if any :)

Uniqueness typing should allow Felix to clobber both Ocaml and Haskell
on a significant class of operations. Haskell is going to get linear typing
at some point.

—
John Skaller
ska...@internode.on.net

John Skaller2

unread,

Jan 26, 2019, 10:17:51 AM1/26/19

to felix google

Its tricky to dream these up. I hope the answer is correct:

/////////////////////////////
// List performance test
var y = ([1,2,3,4]);
for (var j=0; j<14; ++j;) perform y = y + y;
println$ "Length " + y.len.str;
collect();

begin
var x = y;
var start = time();
for (var i = 1; i < 16 ; ++i;) perform
x = rev x;

var elapsed = time() - start;

println$ "Len = " + x.len.str + ", time = " + elapsed.str;
collect();
end
begin
var x = box y;
var start = time();
for (var i = 1; i < 16 ; ++i;) perform
x = rev x;

var elapsed = time() - start;

println$ "Len = " + x.len.str + ", time = " + elapsed.str;
collect();
end
///////////////////////

So we make a list of 64K elements, then reverse it 16 times, and call the GC.
Each rev should copy the list.

Then we make the list unique, and repeat.
It should reverse the list in place.

Here are the numbers:

~/felix>flx lst
Length 65536
Len = 65536, time = 3.91266
Len = 65536, time = 0.0106959

Here’s the reason its 400 times faster:

~/felix>FLX_REPORT_COLLECTIONS=1 flx lst

[FLX_REPORT_COLLECTIONS] Collection report enabled
[FLX_REPORT_GCSTATS] GC statistics report enabled
[FLX_REPORT_COLLECTIONS] Collection report enabled
[FLX_REPORT_GCSTATS] GC statistics report enabled

Length 65536

[Gc::collect] Program requests collection
[flx_gc:gc_profile_t] actually_collect

actually collected 262136 objects, still allocated: 2 roots, 65539 objects, 1048808 bytes
[Gc::collect] Collector collected 262136 objects

^^^^^^^^^^^ after initial list construction

vvvvvvvvvv start first test timer now

[flx_gc:gc_profile_t] Threshhold 10000000 would be exceeded, collecting
[flx_gc:gc_profile_t] actually_collect
actually collected 345258 objects, still allocated: 2 roots, 139868 objects, 2238096 bytes
[flx_gc:gc_profile_t] New Threshhold 10000000
[flx_gc:gc_profile_t] Threshhold 10000000 would be exceeded, collecting
[flx_gc:gc_profile_t] actually_collect
actually collected 308095 objects, still allocated: 2 roots, 195612 objects, 3130000 bytes
[flx_gc:gc_profile_t] New Threshhold 10000000
[flx_gc:gc_profile_t] Threshhold 10000000 would be exceeded, collecting
[flx_gc:gc_profile_t] actually_collect
actually collected 345760 objects, still allocated: 2 roots, 171883 objects, 2750336 bytes
[flx_gc:gc_profile_t] New Threshhold 10000000
[flx_gc:gc_profile_t] Threshhold 10000000 would be exceeded, collecting
[flx_gc:gc_profile_t] actually_collect
actually collected 357624 objects, still allocated: 2 roots, 154087 objects, 2465600 bytes
[flx_gc:gc_profile_t] New Threshhold 10000000
[flx_gc:gc_profile_t] Threshhold 10000000 would be exceeded, collecting
[flx_gc:gc_profile_t] actually_collect
actually collected 366522 objects, still allocated: 2 roots, 140740 objects, 2252048 bytes
[flx_gc:gc_profile_t] New Threshhold 10000000
[flx_gc:gc_profile_t] Threshhold 10000000 would be exceeded, collecting
[flx_gc:gc_profile_t] actually_collect
actually collected 307658 objects, still allocated: 2 roots, 196267 objects, 3140488 bytes
[flx_gc:gc_profile_t] New Threshhold 10000000
[flx_gc:gc_profile_t] Threshhold 10000000 would be exceeded, collecting
[flx_gc:gc_profile_t] actually_collect
actually collected 345432 objects, still allocated: 2 roots, 172374 objects, 2758192 bytes
[flx_gc:gc_profile_t] New Threshhold 10000000
[flx_gc:gc_profile_t] Threshhold 10000000 would be exceeded, collecting
[flx_gc:gc_profile_t] actually_collect
actually collected 348674 objects, still allocated: 2 roots, 163159 objects, 2610752 bytes
[flx_gc:gc_profile_t] New Threshhold 10000000
Len = 65536, time = 4.0285

^^^^^^^^^^^^^^^ first test finished

vvvvvvvvvvvv final collection

[Gc::collect] Program requests collection
[flx_gc:gc_profile_t] actually_collect

actually collected 289631 objects, still allocated: 2 roots, 131077 objects, 2097432 bytes
[Gc::collect] Collector collected 289631 objects
Len = 65536, time = 0.0105109

^^^^^^^^^^^ second test finished

vvvvvvvvvvv final collection

[Gc::collect] Program requests collection
[flx_gc:gc_profile_t] actually_collect

actually collected 131071 objects, still allocated: 2 roots, 131078 objects, 2097456 bytes
[Gc::collect] Collector collected 131071 objects
Deleting collector total time = 4.89595 seconds, gc time = 2.29151 = 46.80%

Now again with more memory allocated:

~/felix>FLX_REPORT_COLLECTIONS=1 FLX_MIN_MEM=1000 flx lst

[FLX_REPORT_COLLECTIONS] Collection report enabled
[FLX_REPORT_GCSTATS] GC statistics report enabled
[FLX_REPORT_COLLECTIONS] Collection report enabled
[FLX_REPORT_GCSTATS] GC statistics report enabled

Length 65536

[Gc::collect] Program requests collection
[flx_gc:gc_profile_t] actually_collect

actually collected 262135 objects, still allocated: 2 roots, 65540 objects, 1048832 bytes
[Gc::collect] Collector collected 262135 objects
Len = 65536, time = 2.00956

unread,

Mar 7, 2019, 7:18:12 AM3/7/19

to felix google

Hi John,

I have thought for a long time that rvalues and lvalues are confusing, but they serve a purpose in C/C++.

One simple way to resolve this is to make all variables immutable, and only allow mutation of "objects" that is things pointed to. This makes sense because to be mutable, and to have an identity (as in aliasing) something must have an address.

So if you want a mutable counter you would need to do:

var *x = 0

(*x)++

Using a 'C' like notation. You then have pointer to linear(unique) mutable, pointer to immutable and pointer to aliased mutable as different types.

You could reduce the restriction on variables from immutability to linearity, and this would still work I think.

Keean.

--
You received this message because you are subscribed to the Google Groups "Felix Language" group.
To unsubscribe from this group and stop receiving emails from it, send an email to felix-languag...@googlegroups.com.
To post to this group, send email to felix-l...@googlegroups.com.
Visit this group at https://groups.google.com/group/felix-language.
For more options, visit https://groups.google.com/d/optout.

John Skaller2

unread,

Mar 7, 2019, 10:33:49 AM3/7/19

to felix google

> On 7 Mar 2019, at 23:18, 'Keean Schupke' via Felix Language <felix-l...@googlegroups.com> wrote:
>
> Hi John,
>
> I have thought for a long time that rvalues and lvalues are confusing, but they serve a purpose in C/C++.
>
> One simple way to resolve this is to make all variables immutable, and only allow mutation of "objects" that is things pointed to. This makes sense because to be mutable, and to have an identity (as in aliasing) something must have an address.
>
> So if you want a mutable counter you would need to do:
>
> var *x = 0
> (*x)++

Actually this IS what Felix does logically, though not with that notation.

In Felix you store a value in a variable using the procedure

_storeat ( &x, v);

which is notated more conveniently by

&x <- v;

or by

x = v;

So the interpretation is that “x is an immutable address” but it is spelled “&x”.
When you write “x” in an expression you really mean “*&x”.

> Using a 'C' like notation. You then have pointer to linear(unique) mutable, pointer to immutable and pointer to aliased mutable as different types.

However Felix does not do that. If you write:

var x = box 42;

then x has the type uniq int, and &x has the type &(uniq int). That’s actually a read-write pointer.
The notation &<x has type &<int which is a read-only pointer (and &>x has type &>int which
is a write only pointer).

Now, this is NOT quite the same as what you said. But I think probably what you said
is correct.

I think the difference is that in Felix you can have a read, read-write, and write-only
pointer to the same variable, and the variable can be of type uniq T, or some other T.

If I understand correctly, Rust variables are immutable unless you say

“mut”

Lots I don’t like about Rust but the way it does unique typing does work.
I actually don’t want full enforcement because of the compilcations.
The idea is that uniqueness is just a device to *allow* the programmer to represent
the two sides of an exclusive ownership contract, that is, a contract where
a functions says “I want to own that location exclusively so i can modify it”
which is currently written

fun f (x: uniq list[int]) => …

for example a uniquely owned list, and the caller says

f (box x)

to specify that the list x is exclusively owned and the ownership is being transfered.

In C++ expressions are (usually) rvalues, and you can use ::std::move(x) to specify
a variable x is to be moved. However I don’t think T&& is actually a type in C++,
that is, if you call a template

template<class T>
void f(T x)

with an rvalue “int” than T is stil int. You have to write T &&x to capture ownership.

Anyhow the problem in Felix is that the “basic” language is like C++ or Algol in that
by default variables are addressable and the pointers can be shared and anyone with
a pointer can read or write. This is true at the moment EVEN IF the type is uniq.
Only the actual variables are tracked by the control flow analysis.

Actually I’m quite happy with how it works for monomorphic types.
in particular a unique value can be passed to a function accepting a non-unique
type, for example

fun f(x:int) => …
f (box 42)

works fine because uniq T is a subtype of T. The problem is with

fun f[T] (x:T) =>

sets T to “uniq int”.

Roughly I think Felix has the same bug as C++ “const”. In other words, it should
be impossible to decide T is uniq anything. The polymorphic function above
should specify that x is not unique. If you want uniq, you should write it:

fun f[T[ (x: uniq T) => …

But I can’t do that because “uniq” is a type combinator (constructor).

Your solution has to be correct: “uniq” should be a property of a pointer,
not the value pointed at. The same as “const”.

> You could reduce the restriction on variables from immutability to linearity, and this would still work I think.

The problem is how to “do it right” without breaking too much code. I changed operator new in Felix
to return uniq. Don’t use it much so no drama. But in principle with the current model ALL
constructors should return uniq.

Felix already has

val x = 1;

Note “val” not “var”. Vals are not immutable, they’re “single assignment”.
But its close enough:

for(i in 1..20) do
val x = 1; // OK, x is reset each iteration
….

At some stage I also had

once x = 1;

to mean x had to be used exactly once. But that doesn’t work.

Felix handles *components* of product types that are unique.
For example

var x = (42, box 42);

this works fine. The first component is non-unique, the second uniq.
Felix tracks the uniq second component by pretending its a variable.
But it can only do that *because* values can be unique.

Sub-components of “variable as address” are also addresses.
Felix does that by overloading projections so there are both
value projections (fetch component) and pointer projections.

—
John Skaller
ska...@internode.on.net

Keean Schupke

unread,

Mar 7, 2019, 10:52:28 AM3/7/19

to felix google

Hi John,

I wonder, 'val' would be 'variable' and 'var' would be a pointer to a mutable 'object' with automatic dereferencing. Val's because they are immutable would never need to be unique (linear). If you make a 'var' unique it would be a property on the pointer because every access (read or write) is a dereference.

The key point is that something mutable must have an address, so therefore it semantically behaves like a pointer to a mutable object, not the mutable object itself. We can see this if we think about object identity and passing by reference.

I guess this is going to break anything with type annotations, but you could provide an automatic type upgrade with a new language version release that rewrites all the current types into the new equivalent, as Apple do. I think this would leave the value level syntax alone?

Keean.

John Skaller2

unread,

Mar 7, 2019, 10:56:41 AM3/7/19

to felix google

Just to be clear: the *problem* with variables being pointers, so they can be

* pointer to movable objects
* pointer to shared objects
* pointer to immutable object

is products. If a variable is a tuple or record or struct type, that the whole product
has one of the above properties. Whereas with uniq type, its component by component.
The value typing splits the variable into sub-variables with distinct types.
This can’t work if the pointer type has the unique/shared/immutable property instead
of the value type.

Indeed, as I understand it, Rust has no products. It has functions with multiple parameters.
Each function has is also a specific kind of product functor. This is a disaster.
Products are fundamental.

—
John Skaller
ska...@internode.on.net

Keean Schupke

unread,

Mar 7, 2019, 11:03:32 AM3/7/19

to felix google

I don't think that's the case, it depends on whether you have deep immutability or not.

Marking pointers is always shallow. So the uniqueness would only apply to the object directly pointed to.

As a product must contain 'objects' not values, then that means the pointers in the product are immutable (you cannot swap objects in and out of the product) but you can mutate them, unless that pointer in turn is immutable/unique etc.

With values there is no such thing as a pointer or a reference, a value must be exactly that, so a product of values is itself an immutable value.

In theory there is no reason why you can't have in-place immutable values in an object, and whether you can change those values depends on the mutability of the object containing them, but of course the values them selves are immutable.

I may not have explained very well, does that make sense?

Keean.

John Skaller2

unread,

Mar 7, 2019, 11:10:05 AM3/7/19

to felix google

> On 8 Mar 2019, at 02:52, 'Keean Schupke' via Felix Language <felix-l...@googlegroups.com> wrote:
>
> Hi John,
>

> I wonder, 'val' would be 'variable' and 'var' would be a pointer to a mutable 'object' with automatic dereferencing. Val's because they are immutable would never need to be unique (linear). If you make a 'var' unique it would be a property on the pointer because every access (read or write) is a dereference.
>
> The key point is that something mutable must have an address, so therefore it semantically behaves like a pointer to a mutable object, not the mutable object itself. We can see this if we think about object identity and passing by reference.
>
> I guess this is going to break anything with type annotations, but you could provide an automatic type upgrade with a new language version release that rewrites all the current types into the new equivalent, as Apple do. I think this would leave the value level syntax alone?

Well Felix itself, and me, are the only users. And I could always invent a new language.

I certainly agree with the idea mutable means addressable, at least underneath.
In Felix vals cannot be addressed. They can change, but they’re not really mutable.
Its more like, in a loop, the val is “reset* to a new value, rather than modified.

in fact Felix has a loop:

rfor i in ….

The ‘rfor” loop is actually a tail recursive procedure, so any vals in the loop body
really are immutable, that is, they’re like “let” variables in Ocaml. The “reset”
is actually assigning to a different address in a different frame because of the
recursion, unless the compiler self-tail-rec optimises the loop.

Sigh. Its actually an accident that I discovered vals are not immutable,
they were meant to be.

The key property of vals is lazy evaluation. That is, the initialiser of a val
is allowed to replace any use of the val. Whereas vars are guarranteed
to be eagerly evaluated, i.e. when control flows through them.

the problem in any programming language is most “things” have multiple
factors each with multiple options, but you cannot give the programmer
all the combinations or their brains would explode. So you have to pick
the most useful ones. So in Felix we have “val” and “var”.

There is also a “const” but it is a C binding .. to an expression which need not
be const. Again, it was meant to be const:

const x = “42”;
const y = “::std::time()”; // woops, not const at all :-)

“cexpr” would be a better name for it.

—
John Skaller
ska...@internode.on.net

John Skaller2

unread,

Mar 7, 2019, 11:25:29 AM3/7/19

to felix google

> On 8 Mar 2019, at 03:03, 'Keean Schupke' via Felix Language <felix-l...@googlegroups.com> wrote:
>
> I don't think that's the case, it depends on whether you have deep immutability or not.
>
> Marking pointers is always shallow. So the uniqueness would only apply to the object directly pointed to.
>
> As a product must contain 'objects' not values, then that means the pointers in the product are immutable (you cannot swap objects in and out of the product) but you can mutate them, unless that pointer in turn is immutable/unique etc.

But products are values. For example

(10, 20)

is a coordinate on a screen. Its a single value.

>
> With values there is no such thing as a pointer or a reference, a value must be exactly that, so a product of values is itself an immutable value.

In Felix, it works really well: the type above is int * int. Its a value. Its immutable.
However if you put it in a variable its mutable:

var z = (10,20);
&z.1 = 21; // z is now 10,21

That’s because the projection 1 is overloaded so it applies to values AND pointers.

Now consider:

var z = (10, box 20);

Here the type of the value is

int * uniq int

If you do this:

var u = z;

its fine! And now you can do this:

var one = z.0;

That’s fine! But this is an ERROR:

var two = z.0;

and Felix catches it. The point is, its NOT the variable that’s uniq or not,
and that’s why the above works. you can do this:

&z.0 <- 99;

and that’s NOT an error. The component was dead because we moved its
value to “one”. So now you can assign it again.

Felix keeps track of the state of every component of every variable
but it ignores address taking because that’s too hard (except
for the LHS of &x <- v which it knows is an assignement).

The point is, values are “immutable” because they don’t HAVE an address.
But if you store a value into a variable OR if you construct a value
on thje heap, you get an address, and then you can (potentially) modify
them, unless the type system disallows it.

uniq is not handled by the type checker, its a completely separate control
flow analysis done long after type checking, overload resolution,
and monomorphsation, in fact it is done after inlining and optrimisations
as well (which means the error diagnostic are useless .. but that’s another issue :)

—
John Skaller
ska...@internode.on.net

Keean Schupke

unread,

Mar 7, 2019, 11:28:16 AM3/7/19

to felix google

My simple system is this (you may notice it works pretty much like arrays)

Write an immutable value like this:

let x = 3

We write an object like this:

let x = [3]

print x[0]

You can mutate the contents (change the 3 for another value).

We write a 'value' product like this:

let x = (3, 4, [5])

So 3 and 4 are constant, you can swap the contents of the third element for a different value, but not change the identity of the object (the address pointer is not changeable)

You write an object product like this:

let x = [3, 4, (5, 6)]

So you can mutate 3 and 4, but not 5 or 6, but you can swap out the pair (5, 6).

You only need one declaration:

let x = 3

let y = [4]

All the information is carried in the type of the variable.

Only drawback is that you must de-reference an object to get the value:

print(x, y[0])

Perhaps there should be a short-handed for the first element of an object?

My view is that if you get this right, it should be simple, but that's one of the hardest things in language design to get right. Lvalues/rvalues are a mess. The above is my best effort so far.

Keean.

Keean Schupke

unread,

Mar 7, 2019, 11:40:16 AM3/7/19

to felix google

Regarding:

> &z.0 <- 99;

Surely this should be an error because 'z' is a value, swapping an object for the value 10 should be illegal as it is not boxed.

If we think of it like this:

A value is a sequence of bytes, like an int32 is 4 bytes. Let's say we have the product of two integers, that's a sequence of 8 bytes. If this is immutable is may not have an address, it could be in a register, there is any no way to address part of it and change its contents.

If we have a pointer to those 8 bytes that says they are immutable there should be no way to change any of those bytes.

If instead we have a product of two integers and an object, then on 64 bit system the object will be an 8byte pointer. The product would be 16 bytes and the whole thing would be immutable, you cannot change the pointer in the product to point to another object, but we can mutate that object (if this pointer is not also immutable).

I am not sure what Felix is doing differently that would allow the number 10 to be replaced in that product, does it store all values externally from the product?

Keean.

John Skaller2

unread,

Mar 7, 2019, 12:03:04 PM3/7/19

to felix google

> On 8 Mar 2019, at 03:28, 'Keean Schupke' via Felix Language <felix-l...@googlegroups.com> wrote:
>
> My simple system is this (you may notice it works pretty much like arrays)

Yes, I like it, and its pretty similar to Felix actually.

>
> Write an immutable value like this:
>
> let x = 3
>
> We write an object like this:
>
> let x = [3]
> print x[0]
>

In Felix roughly you would use a pointer:

val x = 3;
val x = new 3;
print *x;

For the same 3 statements.

> You can mutate the contents (change the 3 for another value).
>
> We write a 'value' product like this:
>
> let x = (3, 4, [5])

val x = (3,4,new 5)

The type is int * int * &int

> Only drawback is that you must de-reference an object to get the value:
>
> print(x, y[0])
>
> Perhaps there should be a short-handed for the first element of an object?

In Felix, for the x above, to get the third component it would be

*(x.2)

or

x*.2

and to assign to it

x.2 <- 42

No short hand for first component. It looks like our solutions are isomorphic.
However Felix has vars as well. With a var ALL the components are assignable.
The translation is simple, so vars don’t add anything:

var v = 3,4, new 5;
val x = &v;

however none of this handles move/unique .. :-)

We have mutable and immutable and a way to get from immutable to mutable.
Also when you use a var in an expression .. it is now a value and immutable.

> My view is that if you get this right, it should be simple, but that's one of the hardest things in language design to get right. Lvalues/rvalues are a mess. The above is my best effort so far.

Yes, its nice, and AFAICS the same as Felix except for notation. The problem is how to extend it
to handle moves.

BTW: Barry Jay once said to me, that the way a language handles variables
*characterises* the language.

In Ocaml, all variables are immutable. but records can have mutable fields.
References ML style are derived from that:

type ‘a ref = { mutable contents: ‘a }

i.e. a reference is just a record containing a mutable field. However, Ocaml BOXES
(almost) all values, and they’re immutable. So “ref” is immutable, it is of course
a pointer due to boxing. Clever. In the core language, mutable fields are the only
departure from purely functional semantics.

The problem with “move” semantics is that languages don’t have the syntax which allows
the variable whose contents are moved out to “go out of scope”. In fact, its worse:
if you have no product functor, the only thing you can do with functions is compose
them linearly, and, all values are then unique and thus mutable. Its actually products
that allow “expressions” in the form of trees, and variables exist ONLY to allow
sharing. A variable that only gets used once isn’t necessary, just replace the one
use with its initialiser.

So making “uniq” variables is in some sense oxymoronic.

—
John Skaller
ska...@internode.on.net

Keean Schupke

unread,

Mar 7, 2019, 12:19:45 PM3/7/19

to felix google

I am not sure I see the problem. If a unqiue pointer is stored in a product, you can only mutate the object as much as you like, but you can only copy/move the pointer itself once. The type system would have to enforce that nothing ever tries to copy/move the pointer again.

With a mutable product you could swap the pointer with a different pointer as an additional operation not possible with an immutable product.

Keean.

John Skaller2

unread,

Mar 7, 2019, 12:20:51 PM3/7/19

to felix google

> On 8 Mar 2019, at 03:40, 'Keean Schupke' via Felix Language <felix-l...@googlegroups.com> wrote:
>
> Regarding:
>
> > &z.0 <- 99;
>
> Surely this should be an error because 'z' is a value, swapping an object for the value 10 should be illegal as it is not boxed.

In the example, z is a var:

var z = …

This means, the name of the variable is spelled &z, and plain z actually means the variable
dereferenced.

The notation is hard to explain. The variable is an address. Its a sop to C programmers
that the spelling ”z” means what is stored at the address whilst “&z” actually means
the address itself.

The “&” there isn’t an operator. Its part of the name. Of course its called
“the addres of operator” but you can only use it with a variable.

I know its confusing. Your notation reflects the reality better but mine is closer
to conventions used in C and Algol.

In Ocaml, if x is a reference than

!x

is the contents, and

x := v

is how you assign to it.

> I am not sure what Felix is doing differently that would allow the number 10 to be replaced in that product, does it store all values externally from the product?

Felix does exactly the obvious thing. When you write:

var z = (10,20);

then its two 32 bit values on an x86_64 at consecutive addresses and both are mutable.
If you write:

val v = (10,20);

then v is just a name for the pair (10,20). There may or may not be addressable store.
Typically there will be store somewhere, but it could be both values are in registers.
v can act like a C macro.

If you say

new 42

then what you get is a pointer to heap allocated store containing 42. So there
are TWO ways to get mutable store: using a named variable or using the heap.
Actually there are other ways, using library functions, but “new” is a system intrinsic.

Hope that makes sense. There’s no magic. No lvalues, or, perhaps more
precisely only variables are lvalues, there are no lvalue expressions. ****

**** well actually Felix does allow

&*&x —> &x

but its really an optimisation.

—
John Skaller
ska...@internode.on.net

John Skaller2

unread,

Mar 7, 2019, 12:40:41 PM3/7/19

to felix google

unread,

Mar 8, 2019, 2:49:56 AM3/8/19

to felix google

That makes sense. I still don't agree about the subtyping because unique should be a property of the container. A value cannot be 'unique' because it's immutable.

let t = Uniq 3

y = t

z = t // error 't' is empty

t = 4

t = 5 // error if linear type 't' already full, okay if affine type.

Note: linear types must be used exactly once, affine types can be used zero or one times.

I don't like this syntax, because uniqueness is a property of the pointer (or container) and not the value. It does not make sense on the RHS. Better would be:

let t : uniq int = 3

t = 4

It's clearer what is going on. Even better in my opinion would be:

let t : uniq [int] = [3]

t[0] = 4

As now the container is explicit, we could make unique an operator again to get rid of the type annotation:

let t = uniq [3]

t[0] = 4

This is probably my preferred option for clarity.

Cheers,

Keean.

John Skaller2

unread,

Mar 8, 2019, 10:33:49 AM3/8/19

to felix google

> On 8 Mar 2019, at 18:49, 'Keean Schupke' via Felix Language <felix-l...@googlegroups.com> wrote:
>
> That makes sense. I still don't agree about the subtyping because unique should be a property of the container. A value cannot be 'unique' because it's immutable.

I don’t disagree, I was just telling what the Felix implementation does.
In functional programming there are no containers. Even a list is a value.
Although a unique list doesn’t make sense, it does make sense
for the representation, which clearly does consist of objects.

It’s the representation that matters, uniqueness is only of any interest
because it supports optimisations.

—
John Skaller
ska...@internode.on.net

Keean Schupke

unread,

Mar 8, 2019, 10:59:57 AM3/8/19

to felix google

If something is a "functional" language variables should be immutable, and hence there would be no linear types.

We can introduce mutable state to a functional language using the state monad. This works in a similar way to my examples, the variable is still immutable and contains a reference type:

f = do

x <- newRef 13

x has type "Ref Int" not int. newRef would be in the state monad having type "a -> State (Ref a)"

The problem with the state monad is that monads don't compose (with other state spaces). We can solve this with "Freer Monads" as in "Eff" which provides a row-polymorphic effects monad.

So I think you end up at a similar place for n functional languages too.

Keean.

John Skaller2

unread,

Mar 8, 2019, 11:08:15 AM3/8/19

to felix google

So this is the situiation.

First, Felix started off as a C++/Algol style language with a simplifying rule:
all types must be first class (regular/copyable etc etc).

Of course, uniquness typing breaks the rule.

What I’m still puzzling over is that it works well for monomorphic types.
This is the definition in the library:

https://github.com/felix-lang/felix/blob/master/src/packages/unique.fdoc

There is also support in the unification algorithm which makes Uniq T
a subtype of T, and *special weird hacks* for pointers.
The uniqueness checking is done late by control flow analysis.

With this, we can do this:

https://github.com/felix-lang/felix/blob/master/src/packages/ucstring.fdoc

The synopsis is:

ctor : string -> ucstr
ctor : +char -> ucstr (unsafe)
proc delete : ucstr
fun len : ucstr -> size
fun set : ucstr * int * char -> ucstr
fun reserve : ucstr * size -> ucstr
fun append : ucstr * ucstr -> ucstr
fun append : ucstr * &ucstr -> ucstr doesn't consume second arg
fun + : ucstr * ucstr -> ucstr
fun + : ucstr * &ucstr -> ucstr doesn't consume second arg
proc += : &ucstr * &ucstr -> ucstr modifies first arg, doesn't consume second
fun erase : ucstr -> slice[int] -> ucstr
fun insert : ucstr -> int * ucstr -> ucstr inserts second arg into first at pos
fun dup : ucstr -> ucstr * ucstr destructive dup
fun dup : &ucstr -> ucstr * ucstr nondestructive dup

Now what makes this work is

// abstract representation
private type _ucstr = new +char;

// make it uniq
typedef ucstr = uniq _ucstr;

The ucstr is a standard C string, i.e. a NTBS, wrapped with uniq,
however the fact it is a NTBS is hidden, the user cannot get at
the char pointer: the type +char is a C char array, but the new operator
on types provides abstraction: ONLY the code inside the UniqueCString
class knows its a char pointer.

So, a ucstr is a unique VALUE. However the *representation* is a pointer.

The key point is some operations like appending can use mutators due
to the uniqueness. The ustr is ALWAYS unique, its impossible to share
the pointer (except by cheating).

This type was an experiment, its not an efficient string representation.
But the point is IT WORKS.

So Keean is both right an wrong. Unique does make sense for values.
The class above proves it. But also, it doesn’t make sense except that
the representation is a pointer, and uniqueness is used to allows
mutations on the representation which are referentailly transparent
IN OTHER WORDS the public type acts like a value in that the
mutations are hidden.

The core problem is that for *arbitrary* types a uniq type combinator
is nonsense. But its the only way to actually track which pieces
of a structured value like a product move instead of copy.

—
John Skaller
ska...@internode.on.net

John Skaller2

unread,

Mar 9, 2019, 6:42:00 AM3/9/19

to felix google

A possible solution is to simply disallow the replacement of a type variable with
a uniq type.

That fixes

fun dup[T] (x:T) => x,x;

So if you want to call a poly fun on a uniq type you have to unbox it first.
if you don’t like that, you can do a specialisation:

fun dup[T] (x: uniq T) => x.copy, x;

This has no impact on monormophic code. Also, this would still work:

fun f(x:T) => ….
fun g(y: uniq T) => f x;

because uniq T is a subtype of T. What that means is that
the unboxing operation will be done automatically with a coercion.

The HARD bit: products.

BTW: at present, pattern matches cause an issue. Variants can’t
have uniq types as arguments. This is not because of logic but
because I haven’t figure it out. :-) Basically variant arguments
are immutable *unlike products* where you can write to a
product that is stored in a variable or on the heap.

Perhaps more interesting .. pattern variables are, I think,
automatically uniq, because they’re not allowed to recur
in patterns, i.e. patterns are intrinsically linear. But match
*handlers* can use a pattern variable more than once.
Hmmm.

—
John Skaller
ska...@internode.on.net

John Skaller2

unread,

Mar 9, 2019, 6:49:09 PM3/9/19

to felix google

> On 9 Mar 2019, at 22:41, John Skaller2 <ska...@internode.on.net> wrote:
>
> A possible solution is to simply disallow the replacement of a type variable with
> a uniq type.
>
> That fixes
>
> fun dup[T] (x:T) => x,x;

So actually .. the rules I propose are already implemented.

fun f[T](var x:T) => x,x;

begin
var y = box 42;
var z = f y;
println$ z;
end

This works. But remove the var, and it fails. I checked the generated
code. The coercion is being inserted. But the function is inlined and
the val lazily evaluated:

z = unbox y, unbox y;

which of course uses y twice. Actually this is safe but only in this case
because there’s no mutation.

flow: normal= processing for y<69465> := (box<69347>primfun[int] 42);
1@3 apply((prj0:RWptr((int^2),[]) -> RWptr(int,[])), &z<69466>ref) <- (y<69465>varname :>> int);
1@3 ()
flow: normal= processing for apply((prj0:RWptr((int^2),[]) -> RWptr(int,[])), &z<69466>ref) <- (y<69465>varname :>> int);
Once error: Using uninitialised or already used once variable
(69475:->y)
Variable y defined at
/Users/skaller/felix/z.flx: line 4, cols 3 to 18
3: begin
4: var y = box 42;
****************
5: var z = f y;

In instruction apply((prj1:RWptr((int^2),[]) -> RWptr(int,[])), &z<69466>ref) <- (y<69465>varname :>> int);

Notice, the code reads:

&z.0 <- y.int
&z.1 <- y.int

so there are two uses of y.

—
John Skaller
ska...@internode.on.net

John Skaller2

unread,

Mar 9, 2019, 7:23:47 PM3/9/19

to felix google

Here’s one I do not understand:

This passes:

fun erase (var x:ucstr) (sl:slice[int]) : ucstr =>

This fails:

fun erase (var x:ustr) (sl:slice[int]) : ustr =>

Its exactly the same string type in both cases.

// abstract representation
private type _ucstr = new +char;

// make it uniq
typedef ucstr = uniq _ucstr;

// abstract representation

private type _ustr = new +char;

// make it uniq
typedef ustr = uniq _ustr;

The second class is supposed to become a counted string but so far
I have just copied the code and changed the type name.

0@427 return erase'2<18206>closure;
0@427 (69789:->x)
flow: function return
Once error: Once variables unused!
(69789:->x)

The compiler is actually correct. Its a HOF, so the code is equivalent to

fun erase (x) => return (fun (sl) => …);

In other words, x is in fact not used. There’s no control flow using it.
The use is inside the returned closure. The flow analyser cannot handle this
case at the moment. There is an implicit address taken in the closure to
the frame of erase containing x and thus indirectly a pointer to x itself.

The question is .. why does this *correctly* fail .. when the previous identical
function does not. Everything works if I comment out the counter string
class.

The actual function is ALSO problematic:

fun erase (var x:ustr) (sl:slice[int]) : ustr =>
match sl with
| Slice_all => set (x,0,char "")
| Slice_from idx => set (x,idx, char "")
| Slice_from_counted (first,len) => strmov x (first,first+len)
| Slice_to_incl incl => strmov x (0,incl)
| Slice_to_excl excl => strmov x (0, excl - 1)
| Slice_range_incl (first, last) => strmov x (first, last+1)
| Slice_range_excl (first, last) => strmov x (first, last)
| Slice_one pos => strmov x (pos, pos+1)
;

Here x is correctly used in all branches. However Felix doesn’t
know that the matching is exhaustive and adds an MATCH_FAILURE
branch at the end .. which doesn’t use x. You can prevent this I think
with a final wildcard branch .. which also wouldn’t use x unless you wrote

| _ -> C_hack::ignore(x)

However this isn’t the issue. Note that with a HOF there REALLY is a problem.

var closure = erase x;
closure (1..20);
closure (1..10);

The two calls cause x to be used twice.

—
John Skaller
ska...@internode.on.net

John Skaller2

unread,

Mar 10, 2019, 9:36:51 AM3/10/19

to felix google

> On 10 Mar 2019, at 10:49, John Skaller2 <ska...@internode.on.net> wrote:
>
>
>
>> On 9 Mar 2019, at 22:41, John Skaller2 <ska...@internode.on.net> wrote:
>>
>> A possible solution is to simply disallow the replacement of a type variable with
>> a uniq type.
>>
>> That fixes
>>
>> fun dup[T] (x:T) => x,x;
>
> So actually .. the rules I propose are already implemented.

>
> fun f[T](var x:T) => x,x;
>
> begin
> var y = box 42;
> var z = f y;
> println$ z;
> end
>
> This works. But remove the var, and it fails. I checked the generated
> code. The coercion is being inserted. But the function is inlined and
> the val lazily evaluated:
>
> z = unbox y, unbox y;
>
> which of course uses y twice.

Fixed. I was checking the argument type for uniquness (not just the whole
type but all its components as well eg int * uniq int is not uniq but contains
a uniq type).

But this is wrong:

coerce[uniq int -> int] (box 42)

has type int. I needed to check the types of all sub-expressions.

Fixed. The above works now without the var. Uniq part forces
eager evaluation. There may be other cases i haven’t found.

—
John Skaller
ska...@internode.on.net

John Skaller2

unread,

Mar 15, 2019, 10:57:51 AM3/15/19

to felix google

Felix now chains together nominal type coercions:

//////
struct XX { a : int; };
struct YY { b: double; }
struct ZZ { c: string; }

// XX -> YY
supertype YY (x:XX) => YY (x.a.double + 0.76);

// YY -> ZZ
supertype ZZ (x:YY) => ZZ (x.b.str + "!!");

proc showYY (x:YY) { println$ "YY.b = " + x.b.str; }
proc showZZ (x: ZZ) { println$ "ZZ.c = " + x.c; }

var xx = XX (23);
showYY xx;

showZZ xx;
//////////

Here XX is a subtype of YY which is a subtype of ZZ.
To pass the XX value xx to showZZ, which has parameter of type XX,
we have to coerce XX to YY and then that to ZZ.

If there is more than one coercion chain, Felix picks one of the
shortest ones. The program must ensure all coercion chains
are semantically equivalent.

The purpose is that, if you say decide

tiny < short < int < long < vlong

then a function accepting a long will accept a tiny, short,
int, or long. If you want you can add extra coercions
to bridge multi-step coercions for performance reasons but
the semantics must be the same.

Overloading in Felix at the moment DOES NOT CARE ABOUT THE LENGTH
OF THE COERCIONS CHAINS. For example, if you have an argument
of type short, and functions of type int and long, it will pick the function taking
the int, but not because it is closer, but because int is a subtype of long,
and so is more specialised. It picked the shortest chain.

But consider A is a subtype of B is a subtype of C is a subtype of D,
and A is also a subtype of X, then if you have an X and a D function
and call with an A argument it is ambiguous because neither of X and D
is a subtype of the other. It won’t pick X, just because less coercions
are required to get there.

Not that still, at the moment, only *monomorphic* nominal types can be
handled. Polymorphic nominal types should be trivial provided
they have the same number of type variables in the same order eg

supertype P[T] (x:B[T]) => …

HOWEVER consider the question: is B[int] a subtype of P[long]??

Before we can answer that we have to know if B[int] is a subtype of B[long],
in other words, B is *covariant*. Here is a counter-example:

struct B[T] { f:T -> string; }

In this case B is contravariant because -> is contravariant in its first argument,
the function domain (and covariant in the second). And, if that were instead:

struct B[T] { f: T -> T; }

then B is invariant, since it has to be both co- and contra-variant. Most P/L’s that
handle variance require the compiler to calculate it if the data structure is visible,
or specify it if it is abstracted, eg Ocaml.

At least to start I think this is too hard, so we will have to insist in invariance.
In this case, we could actual just write the supertype rule without any
type paraeters at all. The existing algorithms would then “just work”.

—
John Skaller
ska...@internode.on.net

unread,

Mar 17, 2019, 5:59:27 AM3/17/19

to felix google

So now it seems to work, and now:

supertype vlong: long = "(long long)$1";
supertype long: int = "(long)$1";
supertype int : short= "(int)$1";
supertype short : tiny = "(short)$1";

is in the library. This means mixed mode *signed* arithmetic now works
using integral promotion. Note, Felix still ties to add tiny’s together
so its not quite the same as C (but the calculation is done in C anyhow).

It’s not clear if unsigned integers are subtypes.

More precisely, unsigned integers are nonsense in the first place.
They only exist because of limitations on early computers when C was designed.

For floats, the rule is counter intuitive:

ldouble < double < float

The biggest float with the biggest range and most precision is a subtype of smaller floats.
The reason is: if you have an iteration with a convergence critreria using floats,
then you can safely use a double, throwing away precision .. and get the same result.
It doesn’t work the other way.

All this really leads to a difficulty question: what the heck IS a subtype anyhow??

You are not going to like the answer!! The correct answer is: its any dang thing you please,
provided the subtypes chosen are consistent. [The technical term is “coherent”].

One coherence rule is transitivity.

So now consider: could we have float < double < ldouble?

I think probably, yes. The integer rules is based on the “isa” notion,
that is, the denotational semantics of small integers say the values are
a subset of bigger ones, so they’re subtypes.

On the other hand, a record with extra fields over another is a subtype,
so the coercion throws away information. So we have two quite different
rules, one throwing away information, and one based on preserving it
via an embedding.

Rules have consequences. For example

&<int is a subtype of &<long

because read-only pointers are covariant and int is a subtype of long.

Is this sustainable?????

Certainly for Abstract pointers it’s no problem, but for machine pointers,
I suspect it cant work. For abstract pointers, you just fetch the int
and promote it to long. There’s no way to actually implement that
with machine pointers, because static casts just change the type,
which is *always* wrong unless the types are layout compatible,
which int and long are not. The pointers ARE, but the deref on
a non-matching pointer is undefined.

So Felix says machine pointers are invariant (all kinds).

—
John Skaller
ska...@internode.on.net

John Skaller2

unread,

Mar 24, 2019, 7:26:49 PM3/24/19

to felix google

> On 17 Mar 2019, at 20:59, John Skaller2 <ska...@internode.on.net> wrote:
>
> So now it seems to work, and now:
>
> supertype vlong: long = "(long long)$1";
> supertype long: int = "(long)$1";
> supertype int : short= "(int)$1";

> supertype short : tiny = "(short)$1”;

> So now consider: could we have float < double < ldouble?

But the real question is: should we have ldouble < double < float.

Because that is the “correct” subtyping rule. The C rule is actually wrong.

First, lets review the integer rule: given functions f(T * T) where T is integral,
a mixed call like f (int, long) selects the smallest type T such that T * T matches.
That’s long * long. So the calculation is carried out with the LEAST precision possible
because subtypes have less precision that their supertypes.

Now consider f (T * T) with the correct subtyping rule for floats. Here, the subtyping
coercion *demotes*, that is, throws away precision. A call goes to largest type T
such that T * T matches, so that the calculation is carried out with the MOST
possible precision. This isn’t what you think:

f ( 1.0, 1.0L ) // doule, ldouble

is carried out at double precision, by demoting the ldouble to double. In C, that would
be done at ldouble precision. So basically if you have a high speed single precision
float calculation and feed in a double value, it is demoted to single precision.

The justification is that, promotion introduces precision that doesn’t exist in the
data and this can lead to false results. Typically we have numerical algorithms
which are repeated to refine a result until a convergence condition is met.
A loop that converges in float domain may run forever in double domain because you’re
trying to get precision that simply doesn’t exist.

—
John Skaller
ska...@internode.on.net

John Skaller2

unread,

Mar 25, 2019, 11:54:39 PM3/25/19

to felix google

Felix has too many confusing integer types. I’m tutorialing compact linear types.

My first observation is a long standing one: with compact linear types
the exact unsigned integers uint8,16,32,64 are not required,
the types 2^(8^n) for n=1,2,3 will do instead, i.e. uint8 and 2^8
have the same representation anyhow, and since uint8 is abstract,
we could just define maths on 2^n instead.

However more nastier is the fact Felix allows compact linear types
to be coerced to int. But not other integer types. But cl_t is uint64_t
in the RTL.

I’m thinking to do this:

uintptr, size are synonyms
intptr, ssize, ptrdiff are synonyms

address is distinct but the same size as both

Actually it should be called baddress because derefs return bytes.
Which is yet another integer type. :-)

We also should have waddress which returns waddresses, i.e. an aligned
pointer to a machine word.

These assumptions are safe on all modern processors. The distinction between
pointers and machine word sizes was necessary on the x80286 family
due to segmentation, on MS DOS and with Windows 3.1. AFAIK, all modern
processors are either 32 bit linear addressing or 64 bit linear addressing.
The x86_64 is actually a biatch because int is 32 bits even though pointers
and registers are 64 bits.

Compact linear types also have a problem: they’re universally 64 bits (even on
32 bit platforms). But we should use shorter lengths if we can for space efficiency,
in particular bool is 64 bits on a 64 bit platform in Felix, but only a single byte for
most C++ implementations, forcing me to introduce cbool as well. For values
it makes no difference but for pointers it does.

Felix is designed for desktops, laptops and servers .. not microcontrollers,
but it should work on most modern phones and tablets, and game consoles,
since these generally have either x86 or ARM processors which are standard
linear addressing machines. Code pointers don’t matter because Felix doesn’t
have them anyhow.

—
John Skaller
ska...@internode.on.net

John Skaller2

unread,

Mar 27, 2019, 7:53:10 AM3/27/19

to felix google

This now works:

/////////////
typedef point = (x:int, y:int);
typedef coloured_point = (x:int, y:int, colour: int);
typedef elevated_point = (x:int, y:int, z:int);

fun step_right[T] (a : (x:int, y:int | r2:T)) => //NOTE label r2 here
//let r = (a without x y) in
(x=a.x+1, y=a.y | a.r2)
;

var cp : coloured_point = (x=0, y=0, colour=42);
var ep : elevated_point = (x=0, y=0, z=100);

println$ cp.step_right._strr;
println$ ep.step_right._strr;
////////////

The change is that you can now name the type variable in the polyrecord
parameter to get all the fields (as an opaque value) other than the ones
listed so that a.r2 above is equivalent to the commented out variable r.

The extension ONLY works for fetching values, that is, you cannot
assign to or take the address of a.r2 even if the parameter ‘a’ is
declared var. The reason is that after monomorphisation, the fields
actually in r2 need not be contiguous, they will be sorted along
with the fields x and y. Although assignment could be implemented,
it is hard and not worth the effort. What is implemented is just syntactic
sugar for something you could already do.

Some bugs may be found. The implementation required adding the identifier
to the bound type, however it is actually an alias and not part of the type.
So code doing unification or type comparisons has to ignore it.
The field is stripped out during monomorphisation, indeed, polyrecords
are eliminated during monomorphisation.

The primary hassle is that the polyrecord constructor function is used to eliminate
polyrecords, by shifting known fields to the left of the vertical bar | but it cannot
do this if the RHS of the | has a label.

—
John Skaller
ska...@internode.on.net

John Skaller2

unread,

Apr 8, 2019, 12:28:49 AM4/8/19

to felix google

I just turned on the unsigned 128 and 256 bit ints.
I have split the library up a bit, separating float, complex, int, quaternion, and random.

I want to try adding 128 and 256 bit signed ints .. using Felix to do it, instead of C++.
Addition is trivial .. the operations are identical (signed 2’s complement and unsigned
use exactly the same machine instruction to do addition).

Multiplication can be done by testing the sign, multiplying the absolute values,
and then negating if necessary. Signed ops can give the wrong answer if
the positive values are too big or the negatives too small. The nasty cases
are things like

1 * MINVAL
2 * (half-minal)

The result should be MINVAL. MINVAL is a 1 bit followed by all 0s. The two’s
complement is the complement + 1. The complement is a 0 bit followed by all 1’s.
Add 1, and we get a 1 bit followed by all 0’s again. Multiple by 1, value unchanged.
Now 2’s complement again, and we get the right answer.

So I should prove a theorm that

s1 * s2 = sgn(s1) * sgn(s2) * abs(s1) * abs (s2)

IF

abs(s1) x abs(s2) is representable

The * ops in the first line are computer operations, the x in the second line
is the mathematical multiplication.

The idea is roughly that if the absolute values considered as unsigned values
are representable as a *signed* value, then the formula produces the
correct result. The NASTY case is when the result is MINVAL, which is
representable, but there is no corresponding representable absolute
value.

Division is harder. We have to use the same trick: do the division of absolute
values, then adjust based on the sign. C rounds the dividend towards zero.
The remainder has to be adjusted accordingly.

The funny thing is exhaustive testing is possible by applying the algorithm
generically to unsigned char. The number of cases is small enough to check
every case. If it works for 8 bits, it has to work for any number of bits 8 or above.

—
John Skaller
ska...@internode.on.net

John Skaller2

unread,

Apr 10, 2019, 12:48:27 AM4/10/19

to felix google

So this has exposed some interesting issues.

The subtype coercion registryis expressed with abstract types if necessary.
When the abstract types are downgraded to their representations, there’s
a problem: the representation need not be a nominal type. The registry
can only handly monomorphic nominal type subtyping. So the downgrade
can fail.

But there’s worse. Coercions are only implemented late, after downgrading.
If the downgrade succeeds, we end up coercing the representation types
instead of the abstract types. This will give the wrong result if the downgrade
is not structure preserving. And it isn’t, in the case of

type int128 = new uint128;
type int256 = new uint256;
supertype int128: int64 = “uint128_t($1)”;
supertype int256: int128 = “uint256_t($1)”;

The problem is that the promotion of unsigned integers just adds
zero bits at the high end, but to promote signed integers,
we have to do sign extension. The problem is the second case,
because int128 downgrades to uint128, and thus the promotion
is then not sign extending.

The problem is hard to solve because the coercion solver sees
only downgraded types, i.e. unsigned integers for sizes 128 and 256,
and the coercion rule for them is wrong for signed types.

To be explicit .. the coercion is represented as

coerce (x, t)

and is not replaced by the function which would do the right job,
but one that does the wrong job, because there really is a coecion
for the unsigned types as well and its different.

The solution is to remove the coercions *before* removing abstract
types. Also care is needed because C++ is going to see the unsigned
types here, so we have to make sure we don’t just use a cast as
I did (we actually have to call a sign-extension promotion function
on the unsigned int).

I have no idea what happens if I expand the coercions first,
it might break something else.

HOWEVER I’m not getting the wrong result! I’m getting a CRASH:

flxg: frontend ........................... : 0m33.9
flxg: bind ............................... : 0m0.1
[flx_opt]; Polymorphic Uniqueness Verification .............0s
[flx_opt]; Pre-monomorphisation user reductions ............0s
[flx_opt]; Finding roots ...................................0s
[flx_opt]; Monomorphising ..................................0s
[flx_opt]; Verifying typeclass elimination .................0s
[flx_opt]; Simplify requirements ...........................0s
[flx_opt]; Downgrading abstract types to representations ...0s
[flx_opt]; Verifying abstract type elimination .............0s
[flx_opt]; Removing unused symbols .........................0s
[flx_opt]; Uncurrying curried function .....................0s
[flx_opt]; Generating wrappers (new) .......................0s
[flx_opt]; Set SVC funs inline ............................0s
[flx_opt]; Inlining ........................................0s
[flx_opt]; Post-monomorphisation user reductions ...........0s
[flx_opt]; Remove unused symbols ...........................0s
[flx_opt]; Expanding Coercions (new) .......................0s
[flx_opt]; Stripping Lambdas (new) .........................0s
[flx_opt]; Eliminate dead code .............................0s
[flx_opt]; Do stack call optimisation ......................0s
[flx_opt]; Mark heap closures ..............................0s
[flx_opt]; Very Late Uniquness Verification ................0s
[flx_opt]; optimisation pass complete ......................0s
flxg: optimse ............................ : 0m0.1
flxg: lower .............................. : 0m0.0
flxg: codegen ............................ : 0m0.0
55
-55
110
false
true
-3025
Calculating dividend
Shell terminated by signal SIGILL

and I know why: there’s an infinite loop due to infinite tail recursion.
Hmmm.

—
John Skaller
ska...@internode.on.net

Reply all

Reply to author

Forward