Heap allocating all activation records


Neal Alexander

Sep 24, 2019, 7:39:53 AM
to Pragmatic functional programming research
Why has this fallen so far out of favor? Certain compilers have 6+ different ways of constructing a closure depending on the situation, yet still struggle with CPS-heavy code.

If a language can't do TCO, CPS, currying and Church encoding without much penalty, how functional is it really?
If you have such a large performance-based disincentive against using all these lambda-calculesque features, why even bother with the language in the first place?

Chained heap-allocated activation records don't really penalize any of the features mentioned above (maybe it could be argued that they penalize everything haha).

Currying and Church encoding fall out naturally from chained closures, since closures and cons become more or less the same datatype.
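
For concreteness, here's roughly the layout I have in mind, as a C sketch (the names are mine, not any particular runtime's): a record that binds exactly one variable and links to its parent environment is structurally a cons cell, so currying and Church-encoded lists build the same shape.

/* One possible layout: a single-binding activation record is just a cons
 * cell whose cdr is the enclosing environment. */
typedef struct Record {
    struct Record *parent;  /* enclosing environment ("cdr") */
    void          *value;   /* the one bound variable ("car") */
} Record;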

Generational GC helps but isn't good enough for this usage pattern because too much global GC pressure is created. We need an extra local generation where traces
are limited to cleaning up the thread-local graph of the bound environment, ignoring all references to global data.

Why not:
- Bump-pointer-allocate activation records from separate thread-local pools.
- Enforce that activation records never escape the thread-local execution context.
- Create a barrier on writes and data constructors that deep-copies and type-casts activation records from the local pool into one of the generations of the global pool (maybe skipping the first generation).
- Make all activation records the same size, each binding a single variable.
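
Here's a minimal C sketch of those four points. Pool, alloc_record and the pool size are made-up names and numbers, and the local collection that would normally run when the pool fills is only stubbed out (it's sketched further down).

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

enum { POOL_BYTES = 256 * 1024 };          /* arbitrary per-thread pool size */

typedef struct Record {
    struct Record *parent;                 /* enclosing environment          */
    void          *value;                  /* exactly one bound variable     */
} Record;

typedef struct Pool {
    uint8_t *next;                         /* bump pointer                   */
    uint8_t *limit;                        /* end of the buffer              */
    uint8_t  buf[POOL_BYTES];
} Pool;

static _Thread_local Pool tl_pool;         /* one pool per thread            */

/* Bump-pointer allocation; records never escape the thread, so there is no
 * synchronisation and no per-object header beyond the two fields above.    */
static Record *alloc_record(Record *parent, void *value) {
    if (tl_pool.next == NULL) {            /* lazy init of the thread pool   */
        tl_pool.next  = tl_pool.buf;
        tl_pool.limit = tl_pool.buf + POOL_BYTES;
    }
    if (tl_pool.next + sizeof(Record) > tl_pool.limit) {
        /* A real runtime would run the local trace described below and
         * retry; this sketch just gives up.                                 */
        fprintf(stderr, "local pool exhausted\n");
        exit(1);
    }
    Record *r = (Record *)tl_pool.next;
    tl_pool.next += sizeof(Record);
    r->parent = parent;
    r->value  = value;
    return r;
}

int main(void) {
    int a = 1, b = 2, c = 3;
    /* Applying a curried three-argument function one argument at a time
     * just pushes three records onto the chain.                            */
    Record *env = alloc_record(NULL, &a);
    env = alloc_record(env, &b);
    env = alloc_record(env, &c);
    printf("innermost binding: %d\n", *(int *)env->value);
    return 0;
}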

This should not be much slower than a regular stack model. It probably isn't so different from Cheney on the MTA.

When a pool runs out of memory, the partial GC trace can be limited to the current thread-local execution context, following chain links and ignoring the global data bound by each record.
The majority of the pool would be garbage, satisfying the generational hypothesis.
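
Sketching that partial trace in C (again my own naming; a real collector would also need forwarding pointers so chains shared between several roots aren't copied twice): starting from a thread root, follow only the parent links, evacuate the reachable records into a fresh to-space, and reset the bump pointer. Bound values are treated as opaque, so nothing global is visited.

#include <stddef.h>

typedef struct Record { struct Record *parent; void *value; } Record;

/* Copy the chain reachable from one root into to_space, following only
 * parent links; the bound values themselves are not traced.                */
static Record *evacuate_chain(Record *root, Record *to_space,
                              size_t *used, size_t capacity) {
    Record *head = NULL;
    Record **link = &head;
    for (Record *r = root; r != NULL && *used < capacity; r = r->parent) {
        Record *copy = &to_space[(*used)++];   /* bump into the to-space    */
        copy->value  = r->value;               /* data stays where it is    */
        copy->parent = NULL;
        *link = copy;                          /* relink the copied chain   */
        link  = &copy->parent;
    }
    return head;                               /* new location of the root  */
}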

Naïve linked activation records retain too much memory: unreferenced data is kept alive simply because something further up the chain is still needed. With this system, however, only the inter-record links are
visited during a local trace, not the actual data bound in the environment. The deep-copy mentioned above could discard unreferenced links in a chain during the copy.
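
The promotion copy might look something like this (a hedged sketch with made-up names; how a link is judged unreferenced is left to a callback here, and the bitmask idea below is one way to drive it):

#include <stdlib.h>

typedef struct Record { struct Record *parent; void *value; } Record;

/* Write-barrier path: deep-copy a chain out of the thread-local pool into
 * the shared heap, dropping links whose binding is no longer referenced.
 * malloc stands in for allocation into a global-pool generation.           */
static Record *promote_chain(Record *root, int (*is_dead)(const Record *)) {
    Record *head = NULL;
    Record **link = &head;
    for (Record *r = root; r != NULL; r = r->parent) {
        if (is_dead(r))
            continue;                          /* discard the dead link     */
        Record *copy = malloc(sizeof *copy);
        if (copy == NULL)
            abort();
        copy->value  = r->value;
        copy->parent = NULL;
        *link = copy;
        link  = &copy->parent;
    }
    return head;   /* global copy, now safe to store in a global structure  */
}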

A full-scale GC cycle would still need to inspect these local activation records, though, causing the previously mentioned live-data issue to resurface.
Maybe a bitmask could be added to each activation record describing which elements are unreferenced at the point of capture. This gives multiple context-dependent perspectives on what is live.
Unfortunately this requires multiple passes over the same chain, since liveness is relative to the node that roots the sub-trace. At least these extra passes are limited to the local context, which is probably smallish at any given point.
If that doesn't work well, maybe an activation-record conversion phase could be added to the start of every full collection cycle. This would convert all the local activation records into global closures using the deep-copy method above,
thus eliminating unreferenced intermediate links. This sounds slow, but a full collection cycle would bump the current live data up a generation anyway.
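
As a rough C sketch of the bitmask idea (field and function names are mine): bit i of a record's mask set means the value bound i links up the chain is unreferenced from the point of view of a closure captured at that record, so a sub-trace rooted there can skip it.

#include <stdint.h>
#include <stddef.h>

typedef struct Record {
    struct Record *parent;
    void          *value;
    uint64_t       dead_mask;   /* liveness relative to a capture at this record */
} Record;

/* Full-collection helper: trace one chain as seen from the record rooting
 * this sub-trace.  Bindings masked out as dead never reach the marker, so
 * they don't keep global data alive through this particular root.          */
static void trace_chain_from(Record *root, void (*mark_value)(void *)) {
    size_t depth = 0;
    for (Record *r = root; r != NULL; r = r->parent, depth++) {
        int dead = depth < 64 && ((root->dead_mask >> depth) & 1u);
        if (!dead)
            mark_value(r->value);
    }
}

The multiple-passes cost mentioned above shows up as one such call per capturing root over the same chain.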

The system penalizes storing a closure in a global data structure, but only the first time, since after that it has been promoted to normal global data. Usage patterns where closures are recorded in an object
seem like they would usually involve a longer lifetime than regular trash. If a standard generational GC would end up copying that data up to another generation anyway, you aren't losing much in the long run.

It also penalizes people who use functions with lots of parameters or capture too many outer variables, due to all the chain indirection. However, Forth and J programmers seem to do fine creating functions with fewer than 3 parameters most of the time.
If you need to pass 7 parameters, odds are you have accumulated enough environment context to warrant consolidating it into a tuple or whatever. It basically encourages you to split code up into smaller functions that hold on to less
context. Being strong-armed into certain code-factoring disciplines by irrelevant runtime implementation details is less than ideal, though.