Web site


John Skaller2

Dec 19, 2020, 12:59:13 PM
to felix google
It’s not a pretty sight, but the old web site is running again on a new server,
thanks to Async P/L for sponsoring the AWS instance!

http://felix-lang.org



John Skaller
ska...@internode.on.net





John Skaller2

May 6, 2021, 11:11:51 AM
to felix google
The nurtl repo now contains a hopefully working real time RTL.
Closures aren’t handled just yet but I know how to handle two core cases.
New pthreads can be spawned with separate active sets. Concurrent access
to a single active set is not implemented because I have not decided on termination
conditions yet.

Before doing that I want to get async working. The current Felix machinery sucks.
It’s messy and not real time.

Here’s an initial thought on how to make it work, thinking of an alarm clock.

In order to set an alarm, a clock must first be constructed.
It is an object which is running in a separate pthread.
Then, a client calls a method on the clock to obtain an asynchronous
channel endpoint.

Now, a write on the channel will stall the fibre making the request until the time
written on the channel has been reached. Then the write is satisfied in the usual way.

However we cannot use a normal write or normal channel, because the clock thread
is asleep on a condition variable. So we need a special async write SVC which
does a normal write on the channel and ALSO sends the clock’s condition variable
a signal, waking it up.

When the clock wakes up, it first checks a run flag. If the flag is false,
the clock object is destroyed, including all channel endpoints, and
the thread exits. It’s not clear how to set the flag. Perhaps it is a reference
count of those holding access to the clock. Not sure. Needs more thought!

If it is all OK, the clock thread then puts requests into a priority queue.
To do this, it reads from the request channel until it is empty.
This requires a special read operation (a normal read will stall if a channel
has no writers).

Once empty, it checks the time and pops the top of the priority queue
until it reaches the end or finds a request that hasn’t timed out yet.
All the fibres that are ready to run then get put back on their
owning active sets.

Finally the thread goes to sleep again, setting the timeout to wake it
up when the next alarm is due, or, forever if there is none.

An asynchronous channel is actually two objects then: an ordinary channel,
plus a routine which can send a signal to the condition variable.
An async write does an ordinary write and sends a signal. That’s it!
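The alarm write protocol above can be sketched in a few lines of C++. This is only an illustration of the idea, not the nurtl API: alarm_channel_t, async_write and try_read are hypothetical names, and a plain queue of times stands in for the real channel and fibre machinery.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

// Sketch: an async write is an ordinary write PLUS a signal on the
// condition variable the clock thread sleeps on.
struct alarm_channel_t {
    std::mutex lock;
    std::condition_variable wakeup;   // the clock thread sleeps on this
    std::queue<long> requests;        // alarm times written by clients

    // async write: ordinary write, then wake the clock
    void async_write(long alarm_time) {
        {
            std::lock_guard<std::mutex> g(lock);
            requests.push(alarm_time);
        }
        wakeup.notify_one();
    }

    // special non-stalling read: the clock drains whatever is present
    bool try_read(long &out) {
        std::lock_guard<std::mutex> g(lock);
        if (requests.empty()) return false;
        out = requests.front();
        requests.pop();
        return true;
    }
};
```

The non-stalling try_read is the "special read operation" mentioned above: a normal channel read would stall the clock when the queue is empty, which is exactly what the drain loop must avoid.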

IMPLEMENTATION DETAIL

It’s possible to set things up so all channels have a signal method.
Then the handling is totally transparent: an ordinary write will send
a signal too, but on an ordinary channel the virtual signal method will simply
do nothing. This costs a dummy method call. It’s not clear if that
is so bad. The cost of a data transfer at the moment is a triple
indirection, which is really bad.

Another method is to install an extra pointer in a channel.
If nullptr it’s an ordinary channel, otherwise it points at a signalling routine.
This is faster but makes channels bigger .. but then a virtual dispatch
also costs a vtable pointer, and the nullptr check takes time too.

A separate instruction, and a separate channel type, would be fastest.
However that has a fatal flaw: it removes transparency.
In other words, a fibre that requires a set of channels to connect
to the world must know in advance which ones might be async.

I think the transparent option is the way to go initially.
There is a similar issue with inter and intra thread channels:
I/O in the same thread doesn’t require spinlocking but it is
there for all I/O (and currently can’t be optimised away).
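The transparent option can be sketched directly: one write path for all channels, with a virtual signal() that is a no-op except on the async channel. The class and member names here are illustrative, not the nurtl ones, and a plain int buffer stands in for the real data transfer.

```cpp
// Sketch of the transparent option: every channel has a virtual signal()
// which an ordinary write always calls; on ordinary channels it does
// nothing, only the async channel's override acts.
struct channel_t {
    int buffer = 0;
    virtual void signal() {}              // dummy call on ordinary channels
    virtual ~channel_t() {}
    void write(int v) {                   // same write path for ALL channels
        buffer = v;
        signal();                         // transparency costs one virtual call
    }
};

struct async_channel_t : channel_t {
    bool signalled = false;
    void signal() override { signalled = true; }  // would notify a condvar
};
```

The writer cannot tell which kind of channel it holds, which is the whole point: the dummy virtual call is the price of that transparency.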

Async reads will work similarly.



John Skaller
ska...@internode.on.net





John Skaller2

May 13, 2021, 12:41:24 AM
to felix google
Ok so, I now believe channels should be abstract. Here’s why:

There are currently two distinct types of channels: synchronous and asynchronous.

The difference is this:

An asynchronous channel contains a condition variable and
associated mutex. A client will use the channel to write a pointer to a service
request. This is the old “svc_general”. The service provider is a “hardware device” of
some kind.

Let’s consider a clock because that’s what we’re working on.
A clock is an *object* which runs a separate pre-emptive thread (pthread)
to manage timeouts. It maintains a priority queue of pairs:

alarm_time, fibre

It has a method “get channel endpoint” which returns an endpoint of its
service request thread the client can write on. The client must:

(a) find or construct the clock object (and start it if constructed)
(b) fetch an endpoint to the service request channel (alarm set)
(c) write the alarm time down the channel

Writing the alarm time has to do two things on top of a normal
synchronous write:

1. Increment an async count in the active set which owns the fibre
so the pthreads running that active set do not terminate whilst an
async request is pending

2. signal the channel condition variable to wake up the clock

The clock then adjusts the priority queue of alarms and goes back to
sleep until the next alarm is due to go off. Termination conditions
are yet to be decided: if there are no pending alarms the clock
can NOT just terminate (because the initialisation method starts the clock
before an alarm can be set). This is an open issue.

When the alarm goes off, the clock removes the record from
the priority queue, puts the fibre back on its owning active set,
decrements the async count of that active set, and signals
the active set. Active sets also have a condition variable which pthreads
running it wait on.
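The priority queue of (alarm_time, fibre) pairs maps straight onto std::priority_queue with a min-heap comparator. In this sketch fibre_id and expire are hypothetical stand-ins for the real fibre pointer and for the clock's step that moves due fibres back to their active sets:

```cpp
#include <queue>
#include <utility>
#include <vector>

using fibre_id = int;                        // stand-in for a fibre pointer
using alarm = std::pair<long, fibre_id>;     // (alarm_time, fibre)

// min-heap: the earliest alarm is on top
using alarm_queue =
    std::priority_queue<alarm, std::vector<alarm>, std::greater<alarm>>;

// pop every alarm due at or before `now`, returning the woken fibres;
// the real clock would push each one back on its owning active set
std::vector<fibre_id> expire(alarm_queue &q, long now) {
    std::vector<fibre_id> ready;
    while (!q.empty() && q.top().first <= now) {
        ready.push_back(q.top().second);
        q.pop();
    }
    return ready;
}
```

After expiring, the clock would sleep again with a timeout of q.top().first minus the current time, or forever if the queue is empty.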

=========================

So now: the fundamental principle of our system is that by using channel I/O
continuations can be made MODULAR. It is the same principle exactly
as functional abstraction. With functional abstraction, the function parameters
abstract away (a) the implementation details of the function from the client
AND (b) the use of the result by the client from the function.

With coroutines, replace the word “parameter” with channel.

The idea is you write code which reads and writes on channels
and has NO IDEA what is connected at the other end of the channel.
You don’t know where the input data you read comes from, nor where
the output data goes to. The coroutine interfaces similarly abstract
away the implementation details of the coroutine from the clients.

In a global setting this means you can “freely” connect any components
with whatever topology you need to get the job done, selecting the components
themselves semantically.

What does this mean? Well just suppose we have a routine that needs
to “read data”. We connect it to a routine that gets the data from a file.
We use a synchronous channel to parcel out the data line by line.


Ok now the stupid boss wants to get the data from a REMOTE file.
OOPS! We need a socket now.

So here’s the point:

NO PROBLEM. Just replace the source from file with a source
from network device. Make sure to use an ASYNCHRONOUS CHANNEL
for the connection now!

Abstraction DEMANDS that the client reading the lines has no idea this has happened.

**** THEREFORE ALL CHANNELS MUST HAVE THE SAME TYPE ****

On the other hand, the async channel must put the reader to sleep until
there is actually data written: the old channel was synchronous but the
new one is asynchronous.

*** THEREFORE CHANNELS MUST HAVE DIFFERENT TYPES ****


Ouch! How can we resolve this???

There are two solutions.

TWO: the hard one first: use parametric (template) polymorphism.
ONE: easy: use object orientation.

TWO is much harder to implement but will result in faster code.
But there’s another rule:

*** THE BEST PROGRAMMER IS THE LAZY PROGRAMMER ***

So I’m going to try ONE first:

struct channel_t … // abstract
struct st_channel_t : channel_t // single threaded synchronous channel
struct mt_channel_t : channel_t // thread safe channel
struct async_channel_t : channel_t // async channel (always thread safe)

The abstract channel can contain the fibre pointer.
The st_channel does push and pop without locks.
The mt_channel does push and pop with locks.

And the async_channel does the signalling on I/O which the others
do not.

Note the effect of this is that the scheduler only DISPATCHES read/write operations,
and only two SVCs are available. The channels have to do the actual work because
the work is channel dependent.
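The OO resolution above can be sketched like this. The push/pop interface is simplified to a single waiting fibre (represented by a void pointer) and the async variant is omitted; the names mirror the sketch declarations earlier but the bodies are illustrative, not the nurtl implementation:

```cpp
#include <mutex>

// One abstract channel type, so connecting code stays transparent,
// but each concrete channel does its own kind of work.
struct channel_t {
    virtual void push(void *fibre) = 0;   // the scheduler only dispatches
    virtual void *pop() = 0;
    virtual ~channel_t() {}
protected:
    void *waiting = nullptr;              // one waiting fibre, for brevity
};

struct st_channel_t : channel_t {         // single threaded: no locks
    void push(void *f) override { waiting = f; }
    void *pop() override { void *f = waiting; waiting = nullptr; return f; }
};

struct mt_channel_t : channel_t {         // thread safe: locked access
    std::mutex lock;
    void push(void *f) override {
        std::lock_guard<std::mutex> g(lock);
        waiting = f;
    }
    void *pop() override {
        std::lock_guard<std::mutex> g(lock);
        void *f = waiting;
        waiting = nullptr;
        return f;
    }
};
```

Client code holds a channel_t* and never knows which concrete kind it has, so a synchronous channel can be swapped for a thread safe or async one without touching the fibre.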





John Skaller
ska...@internode.on.net





John Skaller2

Jul 4, 2021, 3:31:16 PM
to felix google
The C++ allocator system is a woeful mess. In the nurtl repository I have a new system
for allocators. It is intended for general and real time use.

The base abstraction is trivial, there are two methods:

virtual void *allocate(size_t)=0;
virtual void deallocate(void *, size_t)=0;

To construct an object, call

void *operator new(size_t amt, allocator_t& a) { return a.allocate(amt); }

Now things get a bit messy. To delete an object there are two functions:

template<class T>
void delete_concrete_object(T *object, allocator_t *a) {
    object->~T();
    a->deallocate(object, sizeof(T));
}


template<class T>
void delete_csp_polymorphic_object(T *object, allocator_t *a) {
    size_t amt = object->size();
    object->~T();
    a->deallocate(object, amt);
}

The first function uses the type of the pointer to calculate the
object size for you. The second function requires the object
have a method size() that returns the size of the complete type.
Note that because C++ templates are duck typed, it suffices
that the base class define

virtual size_t size()const=0;

that is, there is no need for a universal base. In C++20 there might
be a workaround: C++ is stupid in not providing this method automatically
for all polymorphic objects (i.e. ones with a virtual destructor) since the RTTI
necessarily contains that information (otherwise destructors wouldn’t work).

The onus is on the client to call the right function. Correct deallocation of
non-polymorphic heap allocated derived class types is impossible in general.
Luckily such types are also totally useless unless referenced by their
concrete type (since there is no virtual dispatch).
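A self-contained worked example of the scheme above: the allocator base, a malloc/free allocator, the placement new, and polymorphic deletion via the hand-written size() method. base_t and derived_t are illustrative stand-ins, not types from the repository.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

struct allocator_t {
    virtual void *allocate(size_t) = 0;
    virtual void deallocate(void *, size_t) = 0;
    virtual ~allocator_t() {}
};

struct malloc_free_allocator_t : allocator_t {
    void *allocate(size_t n) override { return std::malloc(n); }
    void deallocate(void *p, size_t) override { std::free(p); }
};

// construct an object in store obtained from an allocator
void *operator new(size_t amt, allocator_t &a) { return a.allocate(amt); }

struct base_t {
    virtual size_t size() const = 0;   // size of the COMPLETE type
    virtual ~base_t() {}
};
struct derived_t : base_t {
    int payload[8] = {0};
    size_t size() const override { return sizeof(*this); }
};

template<class T>
void delete_csp_polymorphic_object(T *object, allocator_t *a) {
    size_t amt = object->size();       // ask the object, not the pointer type
    object->~T();                      // virtual destructor runs derived_t's
    a->deallocate(object, amt);
}
```

Deleting through a base_t* still deallocates sizeof(derived_t) bytes, because size() is virtual; that is the whole reason the method is required.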

=======================================================

So what I have shown above is only the FIRST APPROXIMATION.

In the second approximation, I have made allocators reference counted
and provided a dedicated smart pointer type:

struct alloc_ref_t {
….

The new and delete methods above also support this type.
Note, it can have a nullptr.

========================================

But that is ALSO an approximation.

Here is the actual design. I hope you read this far because the stuff above
is boring and pointless. The next bit is novel.

Allocators can have a PARENT, which is another allocator. It can be null.
We use, of course, the smart pointer alloc_ref_t. This ensures the parent
allocator exists as long as the child does.

So what’s the point??

There are several applications. The main one is ordinary delegation:

1. A thread safe allocator uses spinlocks around calls to its parent
which is not thread safe.

2. A debugging allocator wrapping any other allocator.

3. A statistics allocator. This is VITAL. It keeps track
of the maximum number of blocks of each size allocated
at any one time, but delegates to malloc_free or the C++ allocator.
The destructor saves the data to a file. The data is REQUIRED
to help set up the real time allocator, which has a fixed number
of pre-allocated blocks of each size and crashes if you ask
for too many (the crash is essential for performance: the allocator
checks nothing).

But there is another type of allocator which is very important:
a SUBALLOCATOR.

A suballocator grabs large blocks from its parent, and suballocates them.
The current nurtl system_allocator suballocates in a radical
way: it calls the allocate method of its parent exactly once, during construction,
and deallocate on destruction. It then suballocates that block according to
a map of block size/count pairs fed to it on construction.

However this is not the only possible kind of suballocator.

Most importantly, if a sequential (single threaded) computation
needs temporary storage for, say, a list, we can dynamically
construct a non-thread safe (read: blindingly fast) suballocator
parented on our current thread safe allocator. This avoids the locks.

Indeed in some cases an ARENA allocator can be used.
This is a brain dead allocator that simply bumps a pointer to do
an allocation and totally ignores deallocation requests.

return p += n; // (really: save p, bump it, return the saved value)

is pretty dang fast! When finished, the destructor returns the arena
to the parent.
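The arena idea fits in a dozen lines. This is a minimal sketch with hypothetical names (arena_t is not the repository's class); note the allocate method saves the old pointer before bumping, and deallocate is deliberately a no-op:

```cpp
#include <cstddef>

// Arena allocator sketch: bump a pointer, ignore deallocation.
struct arena_t {
    char *base;   // start of the arena (a real version returns it to the parent)
    char *p;      // bump pointer
    char *end;    // one past the arena
    arena_t(char *mem, size_t n) : base(mem), p(mem), end(mem + n) {}

    void *allocate(size_t n) {
        if (p + n > end) return nullptr;  // a real-time version would just crash
        void *r = p;                      // save the current pointer
        p += n;                           // the whole allocation is this bump
        return r;
    }
    void deallocate(void *, size_t) {}    // deliberately ignored
};
```

When the computation is done, the destructor of a real arena would hand the single block back to the parent allocator in one call, regardless of how many allocations were served from it.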

=============================================

AND FINALLY. The real crux of the problem!!!!!!!!!!!!

HOW DO WE DELETE AN ALLOCATOR?????

There are two answers.

1. Store the allocator required IN the allocator.
2. Use the parent.

The second option is simpler in principle, though less general;
however it may complicate usage.

The problem is that if we use the system allocator, for example,
we have to reserve space for the child allocator AND everything
that child allocator needs; this is THREE kinds of object:

(a) the top level allocator object
(b) infrastructure required to implement the operation
(c) the actual memory required

The stuff in (b) and (c) is handled by the allocator itself
in its constructor, destructor, and operation, as required,
but for (a) someone else has made the allocator object.
But the deletion is typically by suicide or, more or less
equivalently, by a smart pointer. So the alloc_ref_t smart
pointer has to know how to delete an allocator.

Of course it will call

delete_csp_polymorphic_object(
    allocator_to_delete,
    allocator_to_delete_with // which is what?
);


The sane answer is

delete_csp_polymorphic_object(a, a->parent)

More precisely: if the allocator is not null AND its
parent is not null, use the parent; otherwise use
the C++ system allocator (i.e. just call delete).
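The deletion rule can be made concrete with minimal stand-ins. destroy_allocator and counting_allocator_t are hypothetical names for illustration; counting_allocator_t exists only so the delete-through-parent path is observable:

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Minimal stand-in for the allocator base, with the parent link.
struct allocator_t {
    allocator_t *parent;               // the real code uses alloc_ref_t
    explicit allocator_t(allocator_t *p) : parent(p) {}
    virtual void *allocate(size_t) = 0;
    virtual void deallocate(void *, size_t) = 0;
    virtual size_t size() const = 0;   // complete object size
    virtual ~allocator_t() {}
};

// Delegates to malloc/free and counts deallocations.
struct counting_allocator_t : allocator_t {
    int deallocs = 0;
    explicit counting_allocator_t(allocator_t *p = nullptr) : allocator_t(p) {}
    void *allocate(size_t n) override { return std::malloc(n); }
    void deallocate(void *q, size_t) override { std::free(q); ++deallocs; }
    size_t size() const override { return sizeof(*this); }
};

// The rule: delete through the parent if there is one, else plain delete.
void destroy_allocator(allocator_t *a) {
    if (!a) return;
    if (allocator_t *p = a->parent) {
        size_t amt = a->size();        // complete size, before destruction
        a->~allocator_t();             // virtual destructor
        p->deallocate(a, amt);         // parent reclaims the store
    } else {
        delete a;                      // constructed with plain new
    }
}
```

In the real design the smart pointer alloc_ref_t would apply this rule when the reference count hits zero; the sketch just shows the decision it has to make.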







John Skaller
ska...@internode.on.net





John Skaller2

Jul 7, 2021, 9:48:26 AM
to felix google, Gordon Childs
So now it seems to go. Here’s the design:

An allocator has two main methods:

virtual void *allocate(size_t)=0;
virtual void deallocate(void *, size_t)=0;

and contains two members:

alloc_ref_t parent;
::std::atomic<size_t> refcnt;

where an alloc_ref_t is a reference counting smart pointer to allocators.
The parent is a reference to the allocator that constructed this allocator object;
it is used by the smart pointer to this allocator to delete it.
The parent can be nullptr, which means the C++ standard allocator, called
by a new expression, was used to construct it, and a delete statement is used
to delete it.

The REQUIRED way to define and construct allocators is like this:

alloc_ref_t malloc_free_debugger = new(malloc_free) debugging_allocator_t("Malloc", malloc_free, malloc_free);

That is, the argument of the new operator MUST be passed to the constructor as the parent.
Some allocators delegate to others, in this case the SECOND allocator is the one used for delegation:

struct debugging_allocator_t : allocator_t {
    alloc_ref_t delegate;
    char const *tag;
    debugging_allocator_t(char const *tg, alloc_ref_t p, alloc_ref_t a)
        : allocator_t(p), delegate(a), tag(tg) {}
    ~debugging_allocator_t() override {}
    size_t size() const override { return sizeof(*this); }

    void *allocate(size_t n) override {
        auto p = delegate->allocate(n);
        ::std::cerr << "  " << tag << "++Alloc   " << p << "[" << n << "]" << ::std::endl;
        return p;
    }
    void deallocate(void *p, size_t n) override {
        ::std::cerr << "  " << tag << "--Dealloc " << p << "[" << n << "]" << ::std::endl;
        delegate->deallocate(p, n);
    }
};

There is another delegating allocator I wrote:

statistics_allocator_t(char const *tg, alloc_ref_t p, alloc_ref_t a)
    : allocator_t(p), delegate(a), filename(tg) {}

which produces statistics:

~/nurtl>cat Sysalloc.stats.txt
24: 6
32: 8
40: 1
80: 1
88: 1
104: 1
144: 1
160: 1
168: 1

which is useful for calculating the block sizes and number of blocks needed for some
allocators. The way these are used is like this:

// bootstrap allocator
alloc_ref_t malloc_free = new malloc_free_allocator_t; // parent C++ allocator
alloc_ref_t malloc_free_debugger = new(malloc_free) debugging_allocator_t("Malloc", malloc_free, malloc_free);


// system allocator
alloc_ref_t system_allocator_delegate = new(malloc_free_debugger) system_allocator_t(malloc_free_debugger,malloc_free_debugger, reqs);
alloc_ref_t system_allocator_debugger = new(malloc_free_debugger) debugging_allocator_t("Sys", malloc_free_debugger, system_allocator_delegate);
alloc_ref_t system_allocator = new(malloc_free_debugger) statistics_allocator_t("Sysalloc.stats.txt", malloc_free_debugger, system_allocator_debugger);


// initial process will use the system allocator
alloc_ref_t process_allocator = system_allocator;

// creates the clock too
system_t *system = new system_t(system_allocator);

csp_run(system, process_allocator, (new(process_allocator) init(nullptr))-> call(nullptr, &inlst, &outlst));
::std::cerr << "RUN COMPLETE" << ::std::endl;
delete system;


What I’ve done here is make a malloc_free allocator and wrap it with a debugging allocator.
Then I used that to make a system allocator, and wrapped THAT in a debugger as well
and then wrapped THAT in a stats allocator.

The reference counting ensures an allocator remains live while it is required by another
allocator. Note delegating allocators need TWO allocators: one is the parent,
and one is the allocator this allocator delegates to. The system allocator above is using
malloc/free to create a single arena which it suballocates.

Because we’re using reference counting, the graph of allocators must form a
Directed Acyclic Graph (DAG).

In addition, the usual rule may apply: everything allocated should be deallocated,
BUT it may not. For example an arena allocator allocates from a fixed block by
just bumping a pointer, and ignores deallocation requests. When none of the
allocated store is required, the allocator is deleted, and the single block it acquired
is returned to the allocator that provided it.

However programs should be written with little knowledge of the allocator
used: that should usually be a performance tuning issue. Hard to make
hard and fast rules.

Some more allocators may be written. One experiment is: write a single threaded
allocator then provide two wrappers:

1. Spinlock protected
2. Mutex protected

Another interesting use of an allocator is a program timer. This isn’t
timing the allocator. It’s timing the whole life of the allocator, which is a cute
way of measuring the lifetime of a system.



John Skaller
ska...@internode.on.net





John Skaller2

Aug 1, 2021, 11:52:45 PM
to felix google, Gordon Childs
Ok so I have now added some chips: a source, sink, and transducer.

To combine these into a pipeline as in Felix we might want this:

source >> transducer >> sink;

No need for manual construction of channels. To make this work, however, we need
higher order functions. In particular curried functions. For example if we have a source with arguments
like

source (func, outchan)

we have to actually make a device like:

oursource = source(func)

first before binding the channel, so that the arguments to the >> operator are uniform.

To make this work, I have to change the calling protocols. A chip is usually going to become
a fibre as above, so it doesn’t need a caller argument. Then the idea is to return a temporary
object from an operator() method call, which has a pointer to the object, and then also
an operator() method call, so we can do this:

source (func) (outchan)

and in particular:

auto device = source(func);

so that device now accepts a channel. For transducers, two channels. In C++ we could use
the ordering of the channels to decide which is input and which output, or follow Felix and use
a struct so we have names (which is much safer if we have untyped channels as we do
at the moment).

The main hassle is memory management. If we use something like:

struct source_setter {
    source *chip;
    source *operator()(F func) { chip->func = func; return chip; }
};

everything works, but the source_setter object had better not hang around, and should
only be used as a temporary object. So a weak pointer as shown is right, but the programmer
must not abuse it.

Using templates .. well it doesn’t matter about the typing. The only requirement on sources would
be that the final call method accept a single channel endpoint as an argument, which it assigns
into the object.
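The currying idea can be sketched end to end. Here make_source plays the role of the first call source(func), and the returned temporary binds the channel on the second call; channel, source, step and make_source are all hypothetical illustration names, and std::function stands in for the template parameter F:

```cpp
#include <functional>

// Stand-ins for the real chip and channel types.
struct channel { int last = 0; };

struct source {
    std::function<int()> func;
    channel *outchan = nullptr;
    void step() { outchan->last = func(); }   // one production step
};

// Temporary returned by the first call; holds a weak pointer to the chip,
// so it must not outlive the expression it appears in.
struct source_setter {
    source *chip;
    source *operator()(channel *out) {        // second call binds the channel
        chip->outchan = out;
        return chip;
    }
};

source_setter make_source(source &s, std::function<int()> f) {
    s.func = f;                               // first call binds the function
    return source_setter{&s};
}
```

The two-step call make_source(s, f)(&chan) mirrors source(func)(outchan) above: after the first call the arguments to a >> operator are uniform, because every partially-applied device only wants channels.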



John Skaller
ska...@internode.on.net





John Skaller2

Aug 10, 2021, 9:00:01 AM
to felix google
The latest commit is a trivial but utterly radical advance.

// connect in pipeline
{
    auto system = fibre->process->system;
    system->connect_sequential(&prod->outchan, &boun->inchan);
    system->connect_sequential(&boun->outchan, &tran->inchan);
    system->connect_sequential(&tran->outchan, &cons->inchan);
}

The system method connect_sequential (concurrent, async) connects two
“pins” of type chan_epref_t with a sequential (concurrent, async) channel.

That’s half of what’s needed. The other half is to connect a channel to a pin.
That’s required if several pins connect to a single channel. That would complete
the C++ equivalent of the Felix circuit statement (but for untyped channels and
without error checking).
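Both halves can be sketched with shared_ptr standing in for chan_epref_t. The names connect_sequential and connect_pin follow the text, but the bodies are illustrative, not the repository code:

```cpp
#include <memory>

struct channel {};
using chan_epref_t = std::shared_ptr<channel>;  // stand-in endpoint reference

// half one: make a fresh channel and point both pins at it
void connect_sequential(chan_epref_t *out, chan_epref_t *in) {
    auto ch = std::make_shared<channel>();
    *out = ch;                    // producer's pin
    *in = ch;                     // consumer's pin
}

// half two: attach one more pin to an already-made channel,
// as needed when several pins share a single channel
void connect_pin(chan_epref_t const &ch, chan_epref_t *pin) {
    *pin = ch;
}
```

With both operations, an arbitrary wiring topology can be built: connect_sequential for simple point-to-point links, connect_pin to fan extra readers or writers onto an existing channel.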


John Skaller
ska...@internode.on.net




