borrowing unique value

7 views
Skip to first unread message

John Skaller2

unread,
Feb 10, 2020, 9:40:57 PM2/10/20
to felix google
This appears to work:

//////////////
fun borrow[T] (p: &<(uniq T)):T => p*.unbox;

proc test() {
var s = ucstr "hello";
println$ s&.borrow;
s = set (s, 0, char "e");
println$ borrow (&s);
delete s;
}
test();
////////////

Note the two borrows are the same, &. is just a convenience infix operator.

The Felix uniqueness system ignores taking the address of a variable
of unique type. Of course this is unsafe, but tracking the pointer is hard.
The alternative would be to ban it.

So the function then cheats, by dereferencing the pointer, and casting
away the uniqueness. Note, it takes a read only pointer, not that this
matter here. It may be quite sane to allow writing to a borrowed pointer!
In fact, you can write to it anyhow.

So the advantage of the borrow function above is merely a convenience,
and, in particular, documenting that we’re borrowing.

Note, in Felix, no claim is made for guarranteed correctness of unique handling.
In fact, the methods working at the primitive level with a boxed value cannot
do anything without unboxing it. The idea is simply to hide these methods
in a class and abstract the type, so the abstraction prevents writing further
unsafe methods.

The ucstr type is not abstract. The _ucstr type is abstract, and ucstr
is just a boxed version of it:

// abstract representation
private type _ucstr = new +char;

// make it uniq
typedef ucstr = uniq _ucstr;

The user cannot get at the underlying char pointer. However the type _ucstr,
despite being private escapes. Private types cannot be named, but they
can escape and be used. Otherwise “borrow” wouldn’t work.

=====

NOW the idea is .. can we make a safe borrow? By safe I mean, the compiler
barfs if you borrow a dead value. This would be a special kind of pointer,
a borrow pointer. When a borrow address is taken, the uniqueness checker
now *does* take notice. Taking a borrow address, like an ordinary address,
does NOT “consume” the variable. But unlike an ordinary address, it DOES
check for liveness.

========

What about writing? Well, really we should have read only, write only, and read write
borrows.

Write only borrows are interesting. The condition seems to be that the variable
being address must be dead. Since we’re borrowing and can only write,
we must write, and then we can make the variable live again. This only
makes sense if the pointer is passed to a procedure.

Now, just consier, assignment to a dead variables livens it.
And assignment is a procedure so

a = b;

actually means (in theory):

storeat(&>a, b);

Although storeat is in the library the prinitive _storeat it maps to is a compiler
intrinsic. Now at present the uniquness checker recognises

* assignment
* the storeat operator, when the first argument is the address of a variable

and the LHS of the assignment variable of the variable in the special form
has a unique type. In that case, the variable must be dead to start and
ends up live afterwards.

If we had a write only borrow pointer type, it could process it similarly.
Now we can defer operating by saving the pointer and writing later BUT
the variable must be dead when the address is taken and becomes
live afterwards. The onus is on the programmer to use it correctly,
that is, to do the actual write because the variable is used.

just to be clear the difference is this: at present, if you do:

var x : uniq thing; // dead ..
var px : &x; // x is still dead
px <- something; // x is still dead

The system cannot track which variable px points at. So the address
taking is ignored. But compare:

var x : uniq thing; // dead
var px = writeborrow x; // x is now live
px <- something; // x is still live, system doesn’t know px points at x

=========

Read write borrows? Well one has to assume the read is first, then the wriet.
So the start condition is live, and the final condition is live too.

=========

Subtyping. A distint pointer type? Or can we just examine of the pointers
point at a uniq thing???

The latter has the same problem we have already with type variables
that LINEAR kind was introduced to fix. If the base type variable
has kind TYPE, it cannot be uniq, and so a pointer to uniq T,
where T is a type, ensure decoding. If T is LINEAR, we don’t know.
So in this case we have to ban operation. In other words you can
only borrow something explicitly uniq. [But you can still take
an ordinary address .. and pay later .. :]




John Skaller
ska...@internode.on.net





John Skaller2

unread,
Feb 10, 2020, 9:45:11 PM2/10/20
to felix google


> On 11 Feb 2020, at 13:40, John Skaller2 <ska...@internode.on.net> wrote:
>
> This appears to work:
>
> //////////////
> fun borrow[T] (p: &<(uniq T)):T => p*.unbox;

Of course .. I already have this:

// peek inside the box without changing livenes state
fun peek[T] : &<(uniq T) -> T = "*($t)”;

The implementation is different but the effect is the same.
Hmm.. :-)


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Feb 10, 2020, 10:20:07 PM2/10/20
to felix google
So we have a problem as follows: a match statement is desugared directly
into a assigning the match argument to a variable with a secret name,
then a conditional goto chain (testing the pattern and jumping to the next
case if it fails, or executing the statements otherwise, followed by a jump
to the end of the match body.

This means a uniquely typed match value will be hopefully correctly
checked for usage in each branch.

Match expressions, however, work by wrapping all the above in a
function with unit argument, replacing each branch expression with a return statement,
and then just calling the function. This kind of thing is necessary because you cannot
have gotos and so forth in the middle of an expression (well, we could use ?: from
C but the result would be unreadable).

The PROBLEM is that calling the function is ignored by the uniqness checker,
EVEN THOUGH it is a nested function. Its a direct call. Direct calls to nested procedures
are not ignored.

The thing about procedure calls is the control flow is explicit. In an expression,
we can’t be sure. I recently added a check, that an expression didn’t read a unique
variable twice. There’s no need for a write check (modulo generators??) because
expressions can’t have side effects.

The problem is reading a unique variable DOES have an effect.

I think the problem might go away if I make the function with a parameter,
and then apply it to the match argument. The application then correctly
constitues a use.

But this only fixes match. Let/in also fails, probably because it desugars
to a match. Actually AFTER optimisation, the inline function is probably
removed, and then uniqueness checking passes. Problem is its already failed.

A temporary fix might be for the polymorphic checker to only issue a warning.
At the moment, I have to record all functions of the form

fun f (...) => …

that can operate on uniques.



John Skaller
ska...@internode.on.net





John Skaller2

unread,
Feb 11, 2020, 1:29:23 AM2/11/20
to felix google
Ok, i changed the impl to pass the match variable to the match function,
in the hope that this means it will be detected as used.

Maybe, maybe not. It didn’t help because the match function now
reports variable not used. This is because prior to any optimisation,
the match checker if a closure call, and the match handler also.

The problem is pervasive: functions referencing variables from
containing scopes.

Actually its not clear how to cope with the fact the match checker will
access the match variable, making it dead in the scope of the match
handler.

In the statement match it works by luck: because the match checker is
a closure it doesn’t notice that the match variable is accessed. This leaves
checking the handler statements.

However the library function that doesn’t work goes:

fun erase (var x:ucstr, sl:slice[int]) : ucstr =>
match sl with
| Slice_all => set (x,0,char "")
| Slice_from idx => set (x,idx, char “”)


The problem is that the “x” is not the match variable here anyhow. It’s an upscope variable.
It works fine as a statement kind of function, just not with the => form.


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Feb 12, 2020, 7:53:41 PM2/12/20
to felix google


> On 11 Feb 2020, at 17:29, John Skaller2 <ska...@internode.on.net> wrote:
>
> Ok, i changed the impl to pass the match variable to the match function,
> in the hope that this means it will be detected as used.
>
> Maybe, maybe not. It didn’t help because the match function now
> reports variable not used.

There’s no choice now. I am extending the flow control, so that for any
instruction containing an expression, if the expression contains an
application of a child function, we assume it is called after entry to
the instruction, but before any other control flow occurs.

I’m just doing random ordering of the applications. What should
really happen is that the expression is unravelled to fix an order
so that in particular, inner expressions are evaluated before containing
ones, for example in:

f (g (x))

g is applied before f, due to eager evaluation. To be even more precise,
it is not enough to do a topological sort, as unravelling does. For example
in the expression

f (x), g (y)

it is not reasonable to assume f is called first, then g. It could be the other way
around, or execution might even be intertwined. HOWEVER, any accesses to
unique variables in a child function can ONLY be reads. These require the
variable to be live, and after the read the variable becomes dead. Functions
aren’t allowed to have side effects so, there is no way to liven a dead variable
in an expression. So we actually ONLY need to check there is at most one
read access to each variable, and, if there is one, the variable is live,
and if so set the variable to dead. This check is order independent.

Now, we have another two problems. The first is that a child function
can call other functions, including siblings and its children, which may
also cause a variable to be read. We cannot just check how many
times, because the function could contain branches. So, any function
call has to have full scale flow analysis. If the function is valid it
can, at most, be equivalent to no state change, or a single read.
But we have to validate it, with respect to each unique variable
of the parent, just like a procedure call.

So its tricky to get right! We just check and there are four
cases:

(a) invalid function found (abort flow)
(b) no mention of variable: no state change
(c) a single read:
if the state is dead, abort
if the state is live, set to dead
(d) multiple reads (abort)

But we are STILL not finished due to recursion. Luckily recursion
is easy: recursions must not change the state of a variable,
more precisely, the state must be changed an even number
of times so that what ever the entry state is, that state must
hold at the point of recursion also. After this check, we do not
need to flow into the recursion.

Problem is, we don’t know if some calls are recursive. Technically,
if we call the parent of the original function from a child, we have to
assume that calls the original function, recursively. The parent
can’t see the unique variable, so it is equivalent to a self-recursion.

So it’s all a bit tricky. But lets consider now the current show stopper
is that the COMPILER generates functions that read a parent’s uniq
variable and the parent calls it, without the accessing it, so the variable
appear unused. At least that error should be avoided by an oversimplified
analysis.






John Skaller
ska...@internode.on.net





John Skaller2

unread,
Feb 12, 2020, 8:26:15 PM2/12/20
to felix google
>
> There’s no choice now. I am extending the flow control,


It would actually be nice to abstract the flow control algorithm so it can be used
for other data flow tasks.


John Skaller
ska...@internode.on.net





John Skaller2

unread,
Feb 14, 2020, 1:59:18 AM2/14/20
to felix google


> On 13 Feb 2020, at 12:10, John Skaller2 <ska...@internode.on.net> wrote:
>
>>
>> There’s no choice now. I am extending the flow control,
>

On special case implemented by unravelling return instructions.
Then an assignment of the form

v = f a

is detected, and f is called like a procedure. The problem I had was the flow control
has no concept of “where it is up to” inside an expression. Unravelling fixes this.
The unravelling is transient, the variable v at the moment is a dummy.
Recursion isn’t supported yet. ONLY return arguments are unravelled.


This fixes functions of the form:

fun f(x: uniq T) => match ….

because this form is changed to

fun f(x: uniq T) {
fun body () { match … }
return body();
}

so the x is not used in f, but it is used in body.

there’s some danger getting too eager. At present matches desugar
to an if/then/elif/else chain, each of which uses a conditional goto
examining the match variable to see if it matches the pattern.

If the match variable is uniq this will mean it is used more than once.

Actually, I don’t think you can match on uniq values anyhow.
This is because there is only ONE operation available: unbox.
(Actually there is also kill). You can’t do anything with a box
except move it. If you want to operate on it you have to open
it to get the stuff inside.




John Skaller
ska...@internode.on.net





John Skaller2

unread,
Feb 19, 2020, 9:26:22 AM2/19/20
to felix google
Felix is designed to support muliple targets. There’s a pain with configuration though.
If you rebuild Felix, the config directory will get clobbered.

For the main target, host, the default config is first constructed,
but then any *.fpc files in $HOME/.felix/config are copied over the top.
This way your local setup is preserved across builds.

But it only works for the host target.

SImilarly if you’re compiling for host and switch to another target,
the cache will get cleaned out entirely to ensure the dependency
checker doesn’t get confused. This is uncool, considering parse caches
are typically platform independent (though not always!).

So I’m thinking of changing the cache so there’s a separate cache
for each target, more or less allowing the cache to reflect the install
structure. This should make multi-target building faster and more
reliable and also allow separate overrides of the config database.

However it isn’t a small job and will break things for a while.
in theory if the system were well constructed it wouldn’t be too hard,
but it isn’t: i will have to find EVERY reference to the cache, and
add an extra directory into the path. Worse, I will have to make sure
the current target is known at every such point.

BTW: Felix currently has config data for common targets
so that in principle, fbuild is not need anymore. We still need
a build script, but there’s no longer any need to actually
compute a configuration. The problem is that this computation
only occurs when bootstrapping the host and doesn’t impact
secondary targets anyhow.




John Skaller
ska...@internode.on.net





Reply all
Reply to author
Forward
0 new messages