Function calling for unconventional architectures.


William Pearson

Jul 14, 2010, 6:05:19 PM
to Future Computer Architecture - Technology
Hi all,

I don't have much time to do research at the moment, so you will have
to bear with my ideas unsullied (or uninformed) by reading.

The thing I have been thinking about recently is the function call
issue. In the von Neumann architecture we treat functions as
essentially non-corporeal objects: we can instantiate them (by calling
them) as many times as we like, and all that gets used up is stack
space for the arguments and local variables. This has its problems
(non-tail-recursive functions can exhaust the stack), but they are
generally surmountable in von Neumann architectures.

Things get more interesting when trying to program something like a
network of processing elements, each with local memory. Think
brain-like systems, where processing and memory are pretty much
colocated. While a portion of the network may represent a function, it
cannot be used in the same fashion as a conventional function.
Different bits of code using the same "function" would interfere with
each other (either directly, or by slowing down the response time for
other users). Using a centralised stack or memory for messages would
be possible, but would have significant performance problems.
Distributed stacks raise the problem of what to do if a stack is too
small, or too big and wasted on an infrequently used function. So you
might want to be able to make copies of your "function" in different
parts of the network. However, the network might be heterogeneous in
its language (e.g. GPGPU and many-core), so direct copying may not be
available or feasible.

There are also functions that are called only once, and it would be
nice to be able to reuse some of their resources after they are no
longer needed.

So we would want a language that allows code/programmatic elements to
be referenced, altered and called in different ways. What I have
thought of is a language that stores an intermediate representation of
each function (in an assembler-level language). I'll use a C-style
syntax to start with (although I'd prefer a nicer one), as I expect
everyone is familiar with it.

I have the intermediate assembler-level language for a number of reasons:

1) Self-modifying code
2) Portability between systems (I don't want to have to rewrite lots
of code, especially code that writes code)

//Definition of function
int func (int y);

void main()
{
    int n = 0;

    func (n);  // Attempts to set up and use a form of stack to call func.

    func !(n); // Instantiates func at compile time in such a way that it
               // can only be used once in the lifetime of the code, and its
               // resources are reclaimed after this. Cannot be used for a
               // non-tail-recursive function.

    ResDesc res;  // Resource description, platform specific.

    Func func2;   // Unsure whether this should be typed. Probably it should be.

    res = getResDesc (source!(func)); // Get the resources needed for func by
                                      // analysing the source. The source is
                                      // stored somewhere, in assembler-ish form
                                      // for simple and quick compilation.

    func2 = ralloc!(res, importance); // ralloc would be a user-definable
                                      // function that takes the importance of
                                      // the resource allocation call and sees
                                      // whether it can rearrange the resources
                                      // it has access to.

    writefunc!(func2, compile!(source!(func)));
    // This writes the function to the location specified. compile takes the
    // source and makes a description to be written. This allows for
    // self-modifying code, or updates that come from outside; otherwise we
    // would have to shut down our massively parallel machines in order to
    // change them from the outside.

    if (func2 != NULL)
    {
        func2>(n); // Uses the internal resources created by ralloc. Not thread
                   // safe, although if only one place has a reference to func2
                   // this shouldn't be a problem.
    }
}
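
For comparison, roughly the same pipeline can be sketched in ordinary
Python, with the "intermediate representation" as plain source text
and the "locations" as dictionary slots. All names below are invented
for illustration:

# Hypothetical sketch of the source!/compile!/writefunc! idea.
FUNC_SOURCE = {}   # name -> stored intermediate form (source text)
FUNC_SLOTS = {}    # slot id -> compiled, callable function

def source(name):
    """source!(func): fetch the stored intermediate representation."""
    return FUNC_SOURCE[name]

def compile_func(src, name):
    """compile!(...): turn the stored form into something executable."""
    namespace = {}
    exec(compile(src, "<stored-func>", "exec"), namespace)
    return namespace[name]

def writefunc(slot, compiled):
    """writefunc!(...): install the compiled function at a location."""
    FUNC_SLOTS[slot] = compiled

# Store func in its intermediate form, then instantiate it at slot "func2".
FUNC_SOURCE["func"] = "def func(y):\n    return y + 1\n"
writefunc("func2", compile_func(source("func"), "func"))

n = 0
if FUNC_SLOTS.get("func2") is not None:
    print(FUNC_SLOTS["func2"](n))   # the func2>(n) call: prints 1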

Does this strike people as the right way to go for language
constructs, or is there a different approach that I am not seeing?

There are also a number of other things I would like to do. Part of
the problem with potentially self-modifying code is that it cannot be
optimised or parallelised well. So perhaps the code could be made
lockable, so that while locked it can be optimised in some fashion.
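
As a minimal sketch of the lockable idea (Python again, all names
invented): a function slot that may be rewritten freely until it is
locked, after which callers and optimisers may assume it never changes.

import threading

class FuncSlot:
    """Hypothetical rewritable function slot, lockable for optimisation."""
    def __init__(self, fn):
        self._fn = fn
        self._locked = False
        self._mutex = threading.Lock()

    def rewrite(self, new_fn):
        with self._mutex:
            if self._locked:
                raise RuntimeError("slot is locked; cannot self-modify")
            self._fn = new_fn

    def lock(self):
        # Once locked, an optimiser may inline/cache calls through the slot.
        with self._mutex:
            self._locked = True

    def __call__(self, *args):
        return self._fn(*args)

slot = FuncSlot(lambda x: x + 1)
slot.rewrite(lambda x: x * 2)   # fine: still modifiable
slot.lock()
print(slot(21))                 # 42; from here on the code never changes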

Will

Russell Wallace

Jul 15, 2010, 3:34:50 AM
to fc...@googlegroups.com
Might be worth your while to take a look at Erlang. It tries to solve
some of the same problems you are tackling, reportedly with some
success.
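
For a flavour of the Erlang shape: a program is many cheap processes,
each owning its own state and a mailbox, communicating only by message
passing. A crude Python imitation, purely illustrative (a real Erlang
process is far cheaper than a thread):

import queue, threading

def spawn(handler):
    """Spawn an Erlang-style 'process': a thread owning a private mailbox."""
    mailbox = queue.Queue()
    def loop():
        while True:
            msg = mailbox.get()
            if msg is None:          # poison pill: shut the process down
                return
            handler(msg)
    t = threading.Thread(target=loop)
    t.start()
    return mailbox, t                # the mailbox is the process's only address

def make_counter():
    state = {"sum": 0}               # local state, reachable only via messages
    def handle(n):
        state["sum"] += n
        print("running total:", state["sum"])
    return handle

mailbox, proc = spawn(make_counter())
for n in (1, 2, 3):
    mailbox.put(n)
mailbox.put(None)
proc.join()                          # prints 1, then 3, then 6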

Samantha Atkins

Jul 15, 2010, 4:03:53 PM
to Future Computer Architecture - Technology


On Jul 14, 3:05 pm, William Pearson <wil.pear...@gmail.com> wrote:

> The thing I have been thinking about recently is the function call
> issue. In the von Neumann architecture we treat functions as
> essentially non-corporeal objects: we can instantiate them (by
> calling them) as many times as we like, and all that gets used up is
> stack space for the arguments and local variables. This has its
> problems (non-tail-recursive functions can exhaust the stack), but
> they are generally surmountable in von Neumann architectures.
>

So, dropping "stack" as that is implementation: you need a chunk of
memory of some kind and some compute resources. You need a bit more
than this, as the function may invoke other functions through, for the
sake of discussion, some kind of messaging. It may also be some kind
of generator that needs to keep local state and compute and hand back
the "next" value when asked.

> Things get more interesting when trying to program something like a
> network of processing elements, each with local memory. Think
> brain-like systems, where processing and memory are pretty much
> colocated. While a portion of the network may represent a function,
> it cannot be used in the same fashion as a conventional function.
> Different bits of code using the same "function" would interfere
> with each other (either directly, or by slowing down the response
> time for other users).

Perhaps I am missing something, but you seem to have created part of
the problem by your assumptions. A network of compute/memory nodes or
actors does not necessarily need to represent a function as some
sub-network at all. Are you conflating brain specifics with general
networks? In a network of semi-autonomous agents there is no need for
such interference. In an Actor model the system can be quite
functional with no shared or interfering state.

> Using a centralised
> stack or memory for messages would be possible, but would have
> significant performance problems.

See Actor model.

> Distributed stacks raise the problem of what to do if a stack is too
> small, or too big and wasted on an infrequently used function.

Chopping memory up dynamically for use by different Actors is one way
to handle this.

>  So you might want to be able
> to make copies of your "function" to different parts of the network.


Functions and data are just streams of information; you can have as
many or as few copies as convenient.

> However, the network might be heterogeneous in its language (e.g.
> GPGPU and many core) so direct copying may not be available or
> feasible.

Making it more or less feasible could be the job of some executive
layer. That, or dividing different tasks up into the pieces most fit
for different resources and gathering up the results when available.

>
> There are also functions that are called only once and it would
> be nice to be able to reuse some of their resources after they are no
> longer needed.

See general Actor model and dynamic assignment of task/memory.

>
> So we would want a language that allowed code/programmatic elements to
> be referenced, altered and called in different ways. What

First-class functions and good metaprogramming abilities. Code and
data being identical helps a lot.
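
As a toy illustration of code-as-data (not any particular Lisp): an
expression held as a plain list is something a program can inspect and
rewrite before evaluating it.

import operator

OPS = {"+": operator.add, "*": operator.mul}

def evaluate(expr):
    """Evaluate a Lisp-style expression stored as plain nested lists."""
    if not isinstance(expr, list):
        return expr
    op, args = expr[0], [evaluate(a) for a in expr[1:]]
    return OPS[op](*args)

expr = ["+", 1, ["*", 2, 3]]   # code is just data...
expr[2][0] = "+"               # ...so a program can rewrite it before running
print(evaluate(expr))          # (+ 1 (+ 2 3)) => 6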

>I have
> thought of is a language that stores an intermediate representation of
> each function (in an assembler-level language). I'll use a C-style
> syntax to start with (although I'd prefer a nicer one), as I expect
> everyone is familiar with it.
>

Seems to be way down at the bit level. Why not start with Scheme or
Lisp or Clojure or Scala? All would be nicer and not significantly
slower for a good implementation. Besides, this is experimental work,
not full production, at this stage.

> I have the intermediate assembler level language for a number of reasons.
>
> 1) Self-modifying code
> 2) Portability between systems (I don't want to have to rewrite lots
> of code, especially code that writes code)
>

Assembler is not needed for either of these.


> //Definition of function
> int func (int y);
>
> void main()
> {
>     int n = 0;
>
>     func (n);  // Attempts to set up and use a form of stack to call func.
>
>     func !(n); // Instantiates func at compile time in such a way that it
>                // can only be used once in the lifetime of the code, and its
>                // resources are reclaimed after this. Cannot be used for a
>                // non-tail-recursive function.

I don't see why the last is at all necessary. If you can call an
arbitrary function and have resources allocated from some pool and not
released until function completion, that seems to be enough.

>
>   ResDesc res; //Resource description, platform specific
>
>   Func func2; // Unsure whether this should be typed. Probably it should be
>
>   res = getResDesc (source!(func)); // Get the resources needed for
> func by analysing the source. The source is stored somewhere, although
> in assembler-ish form for simple and quick compilation
>

I think you would like the Actor book by Agha if you are not familiar
with it. Also, this is a common task in many OSes without any need to
analyze sources at all; in the Lisp world it is a set of read-evals on
streams of code and data. I don't see why on-the-fly analysis is
needed much. Code can be pre-analysed as to its resource needs, and
the needed resources can, to some degree, auto-assemble around a pool
of tasks to be done, without the need for a centralized executive
except for mostly low-level housekeeping. At least that is how it
looks to me.

I would make the compute nodes very plentiful and able to sense/notice
tasks and to request whatever resources and on-the-fly networking with
other nodes they need, to the maximal extent possible. The networking
could possibly even be removed from the node repertoire: instead a
node puts further tasks on the pile for still other nodes to volunteer
to pick up, as in the sketch below. See where I am going?
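
Something like this task-pool shape, sketched with Python threads
standing in for nodes (all of it illustrative):

import queue, threading

tasks = queue.Queue()
results = queue.Queue()

def node():
    """A node volunteers for tasks and may push subtasks back on the pile."""
    while True:
        n = tasks.get()
        if n is None:
            return
        if n > 1:                 # too big: divide and re-enqueue the halves
            tasks.put(n // 2)
            tasks.put(n - n // 2)
        else:
            results.put(n)        # small enough: do the unit of work

workers = [threading.Thread(target=node) for _ in range(4)]
for w in workers:
    w.start()
tasks.put(8)                      # one big task; it ends up as 8 unit tasks

total = sum(results.get() for _ in range(8))
for _ in workers:
    tasks.put(None)
for w in workers:
    w.join()
print(total)                      # 8: the work got done, no executive needed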

- samantha

J. Andrew Rogers

Jul 17, 2010, 2:35:55 PM
to fc...@googlegroups.com
On Fri, Jul 16, 2010 at 2:24 PM, William Pearson <wil.p...@gmail.com> wrote:
> 1) I dislike the actor model of concurrency because the way messages
> are passed is not ownable or controllable. If a malicious bit of code
> got on the system, what is to stop it denial-of-servicing the other
> actors, either by filling their message queues or just by overusing
> the bandwidth? What I want to explore is systems where every computer
> resource (memory, memory bandwidth, compute power) is ownable, so I
> do not want it abstracted away too much. Or at least not from the
> lowest level of coding. I expect the high-level, day-to-day coding
> will abstract it away to some extent though.


What does "ownable" even mean in this context? There are computational models that guarantee a very high degree of pervasive resource disjoint-ness (e.g. Morton topologies) but ownership is still just a convention in real systems.

 
> 2) It is not simply that I am copying brains. I'm also looking at
> things like FPGAs and the Fleet architecture and asking myself how we
> can create computer languages that can program themselves, where we
> just do the right thing at the right place and route the signals
> around.


Signal routers are a shared (read: contended) resource. It is the reason many data parallel algorithms do not scale in practice. To create a computer language that plays well with signal routers you first need a computational model that is efficiently and pervasively access parallel.

--
J. Andrew Rogers

William Pearson

Jul 17, 2010, 4:18:50 PM
to fc...@googlegroups.com
On 17 July 2010 19:35, J. Andrew Rogers <jar.m...@gmail.com> wrote:
> On Fri, Jul 16, 2010 at 2:24 PM, William Pearson <wil.p...@gmail.com>
> wrote:
>>
>> 1) I dislike the actor model of concurrency because the way messages
>> are passed is not ownable or controllable. If a malicious bit of code
>> got on the system, what is to stop it denial-of-servicing the other
>> actors, either by filling their message queues or just by overusing
>> the bandwidth? What I want to explore is systems where every computer
>> resource (memory, memory bandwidth, compute power) is ownable, so I
>> do not want it abstracted away too much. Or at least not from the
>> lowest level of coding. I expect the high-level, day-to-day coding
>> will abstract it away to some extent though.
>
>
> What does "ownable" even mean in this context? There are computational
> models that guarantee a very high degree of pervasive resource disjoint-ness
> (e.g. Morton topologies) but ownership is still just a convention in real
> systems.

Yes, I am talking about logical ownership rather than physical
separation. We already have some security model in hardware at the
moment (rings etc.), so we need to ask whether this needs improving
and how to make its enforcement distributed.

The following link provides a brief intro to what I am thinking,
although it contains some factual inaccuracies (you can share memory
as read-only if you muck around with the page tables, fault on both
reads and writes, and then perform the reads but not the writes in the
fault handler).

http://groups.google.com/group/fca-t/browse_thread/thread/d67fab6ea60b4c00?hl=en_US
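
The simplest flavour of the trick can even be shown from user space:
once a page is mapped read-only, the hardware lets reads through and
traps writes. A Python sketch (Unix only, and only the page-protection
half of the idea; the fault-handler part would need native code):

import mmap, tempfile

with tempfile.TemporaryFile() as f:
    f.write(b"shared state".ljust(4096, b"\0"))
    f.flush()
    # Map the page read-only: reads succeed, writes are refused.
    view = mmap.mmap(f.fileno(), 4096, prot=mmap.PROT_READ)
    print(view[:12])           # b'shared state'
    try:
        view[0:1] = b"X"       # a write through this mapping is rejected
    except TypeError as e:
        print("write blocked:", e)
    view.close()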

>>
>> 2) It is not simply that I am copying brains. I'm also looking at
>> things like FPGAs and the Fleet architecture and asking myself how
>> we can create computer languages that can program themselves, where
>> we just do the right thing at the right place and route the signals
>> around.
>
>
> Signal routers are a shared (read: contended) resource. It is the reason
> many data parallel algorithms do not scale in practice. To create a computer
> language that plays well with signal routers you first need a computational
> model that is efficiently and pervasively access parallel.

Do you have any thoughts along these lines?

Will

J. Andrew Rogers

Jul 17, 2010, 6:23:27 PM
to fc...@googlegroups.com
On Sat, Jul 17, 2010 at 1:18 PM, William Pearson <wil.p...@gmail.com> wrote:
> Yes, I am talking about logical ownership rather than physical
> separation. We already have some security model in hardware at the
> moment (rings etc.), so we need to ask whether this needs improving
> and how to make its enforcement distributed.
>
> The following link provides a brief intro to what I am thinking,
> although it contains some factual inaccuracies (you can share memory
> as read-only if you muck around with the page tables, fault on both
> reads and writes, and then perform the reads but not the writes in
> the fault handler).


There are current processor architectures that allow a process to lock individual words in distributed memory without spinning cores. However, the latency-hiding non-local memory assumption used for these architectures requires designing software with vast quantities of fine-grained parallelism. Shared/Exclusive resource locking does not have to slow things down if a core is aware of it, can switch between tasks on a clock-by-clock basis, and the code is sufficiently access parallel to generate many more executable tasks than there are cores. These are fun architectures to work with but you have to design data structures that are suited for them.

However, you still have to abide by conventions e.g. it is possible to force a read past a word lock. Also, these systems are designed for distribution on the scale of a data center, not geographical if that matters.
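
The latency-hiding idea can be imitated at a much coarser grain with cooperative tasks: give each "core" far more tasks than it can run at once, so a task blocked on a lock simply yields to another. A toy asyncio sketch, not modelled on any specific architecture:

import asyncio

async def worker(word_locks, memory, i):
    # Each task locks one word, waits (standing in for remote-memory
    # latency), then writes; a blocked task just yields its core.
    lock = word_locks[i % len(word_locks)]
    async with lock:
        await asyncio.sleep(0)
        memory[i % len(memory)] += 1

async def main():
    memory = [0] * 8                 # 8 words, each with its own lock
    word_locks = [asyncio.Lock() for _ in memory]
    # 10,000 fine-grained tasks over 8 words: contention never idles the core.
    await asyncio.gather(*(worker(word_locks, memory, i) for i in range(10_000)))
    print(memory)                    # each word incremented 1250 times

asyncio.run(main())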


>> Signal routers are a shared (read: contended) resource. It is the reason
>> many data parallel algorithms do not scale in practice. To create a computer
>> language that plays well with signal routers you first need a computational
>> model that is efficiently and pervasively access parallel.
>
> Do you have any thoughts along these lines?


Succinctly, you cannot build a massively parallel software model that plays well with 'signal routers' using conventional tree structures or distributed hash tables. For example, most things expressible as relational algebras would be right out since join operators are not access parallel (they are data parallel if you implement a relational algebra using DHTs). It is not a problem of syntactic sugar, it requires a whole new set of foundational data structures for doing computation. 

A pervasively access parallel computational model is an active area of theoretical research and in the last year or two there have been some breakthroughs. There is an elegant computational model based on the bastard child of differential geometry and information theory that will work for this. For example, IBM Research has demonstrated join algorithms that empirically parallelize linearly across 50,000 compute nodes using a prototype implementation. The caveat is that you have to rebuild your software libraries because the primitives are very odd by conventional standards. I know there are products based on this computational model in the works.
 
--
J. Andrew Rogers

Samantha Atkins

Jul 19, 2010, 6:09:36 AM
to fc...@googlegroups.com
On Jul 17, 2010, at 3:23 PM, J. Andrew Rogers wrote:

> On Sat, Jul 17, 2010 at 1:18 PM, William Pearson <wil.p...@gmail.com> wrote:
>> Yes, I am talking about logical ownership rather than physical
>> separation. We already have some security model in hardware at the
>> moment (rings etc.), so we need to ask whether this needs improving
>> and how to make its enforcement distributed.
>>
>> The following link provides a brief intro to what I am thinking,
>> although it contains some factual inaccuracies (you can share memory
>> as read-only if you muck around with the page tables, fault on both
>> reads and writes, and then perform the reads but not the writes in
>> the fault handler).


> There are current processor architectures that allow a process to lock individual words in distributed memory without spinning cores. However, the latency-hiding non-local memory assumption used for these architectures requires designing software with vast quantities of fine-grained parallelism. Shared/Exclusive resource locking does not have to slow things down if a core is aware of it, can switch between tasks on a clock-by-clock basis, and the code is sufficiently access parallel to generate many more executable tasks than there are cores. These are fun architectures to work with but you have to design data structures that are suited for them.

> However, you still have to abide by conventions e.g. it is possible to force a read past a word lock. Also, these systems are designed for distribution on the scale of a data center, not geographical if that matters.


>>> Signal routers are a shared (read: contended) resource. It is the reason
>>> many data parallel algorithms do not scale in practice. To create a computer
>>> language that plays well with signal routers you first need a computational
>>> model that is efficiently and pervasively access parallel.
>>
>> Do you have any thoughts along these lines?


> Succinctly, you cannot build a massively parallel software model that plays well with 'signal routers' using conventional tree structures or distributed hash tables. For example, most things expressible as relational algebras would be right out since join operators are not access parallel (they are data parallel if you implement a relational algebra using DHTs). It is not a problem of syntactic sugar, it requires a whole new set of foundational data structures for doing computation.

To me most joins are simply a primitive means of making up for the lack of dependable uniform object/struct identity, of efficient pointer/object-id navigation/lookup, and of first-class dynamic relations between objects. It seems more a weakness of the relational model than an inherent data problem. But then I have a strong ODBMS bias.

- samantha

J. Andrew Rogers

Jul 19, 2010, 2:46:46 PM
to fc...@googlegroups.com
On Mon, Jul 19, 2010 at 3:09 AM, Samantha Atkins <sjat...@gmail.com> wrote:
> To me most joins are simply a primitive means of making up for the lack of dependable uniform object/struct identity, of efficient pointer/object-id navigation/lookup, and of first-class dynamic relations between objects. It seems more a weakness of the relational model than an inherent data problem. But then I have a strong ODBMS bias.


Join operators are fundamental; they do not have much to do with databases except that databases implement them. In the abstract, a join operator defines a relationship between two sets. A relationship between two objects can be described as a join on two sets each containing a single element. Data structures and algorithms are built on relational algebras even though we rarely think of them as such.

There are important ideas that are not generally expressible in relational algebras, such as transitive closures (used in graph theory). Not surprisingly, bounded approximations of the general case *are* expressible in relational algebras since we do implement them in software. Transitive closures, to use that example, are approximated with a finite sequence of self-joins.
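
Concretely, the "finite sequence of self-joins" looks like this: reachability over an edge set, joined with itself until a fixed point, with plain Python sets standing in for the relational engine:

def join_paths(paths, edges):
    """One self-join: extend every known path (a, b) by every edge (b, c)."""
    return {(a, c) for (a, b) in paths for (b2, c) in edges if b == b2}

def transitive_closure(edges):
    closure = set(edges)
    while True:
        bigger = closure | join_paths(closure, edges)
        if bigger == closure:      # fixed point: no self-join adds anything
            return closure
        closure = bigger

print(sorted(transitive_closure({(1, 2), (2, 3), (3, 4)})))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]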

In a distributed environment, join algorithms that involve more than a single pair of objects in aggregate do not decompose nicely for parallel computation. Selection operators parallelize just fine (see: MapReduce). For massively parallel software design, a complete relational algebra implementation model where every operator has an access-parallel decomposition is a major Holy Grail.
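
The selection side, by contrast, is embarrassingly parallel: each partition is filtered with no communication at all. A quick sketch with a process pool:

from multiprocessing import Pool

def select_evens(partition):
    """A selection operator: touches only its own partition, no communication."""
    return [x for x in partition if x % 2 == 0]

if __name__ == "__main__":
    partitions = [list(range(i, i + 10)) for i in range(0, 40, 10)]
    with Pool(4) as pool:
        # Each partition is filtered independently; results just concatenate.
        parts = pool.map(select_evens, partitions)
    print([x for part in parts for x in part])   # [0, 2, 4, ..., 38]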

--
J. Andrew Rogers

Samantha Atkins

Jul 19, 2010, 3:11:55 PM
to fc...@googlegroups.com
On Jul 19, 2010, at 11:46 AM, J. Andrew Rogers wrote:

> On Mon, Jul 19, 2010 at 3:09 AM, Samantha Atkins <sjat...@gmail.com> wrote:
>> To me most joins are simply a primitive means of making up for the lack of dependable uniform object/struct identity, of efficient pointer/object-id navigation/lookup, and of first-class dynamic relations between objects. It seems more a weakness of the relational model than an inherent data problem. But then I have a strong ODBMS bias.


> Join operators are fundamental; they do not have much to do with databases except that databases implement them. In the abstract, a join operator defines a relationship between two sets.

So why not make the relationship itself first-order and persistent, instead of computing it on the fly based on silliness like cross-table column matches? Why not simply record explicitly what is related to what, in what manner? Persistently memoize the relationship and forget about embedded foreign keys, with their implicit limits on relationships and their pinning of valid pseudo-identifiers in the data space of these sets.

> A relationship between two objects can be described as a join on two sets each containing a single element. Data structures and algorithms are built on relational algebras even though we rarely think of them as such.

You can decompose it that way, sure.


> There are important ideas that are not generally expressible in relational algebras, such as transitive closures (used in graph theory). Not surprisingly, bounded approximations of the general case *are* expressible in relational algebras since we do implement them in software. Transitive closures, to use that example, are approximated with a finite sequence of self-joins.

> In a distributed environment, join algorithms that involve more than a single pair of objects in aggregate do not decompose nicely for parallel computation. Selection operators parallelize just fine (see: MapReduce). For massively parallel software design, a complete relational algebra implementation model where every operator has an access-parallel decomposition is a major Holy Grail.


I would think that in a distributed, persistent situation anything that was the equivalent of a table/set scan would be relatively inefficient. You would have to broadcast the query to all the distributed stores and accumulate the results.
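
That is the scatter-gather pattern; something like the following, with plain dictionaries standing in for the stores:

from concurrent.futures import ThreadPoolExecutor

# Three "distributed stores"; a scan-like query must visit all of them.
stores = [
    {"a": 1, "b": 7},
    {"c": 9, "d": 2},
    {"e": 5, "f": 8},
]

def scan(store, predicate):
    """The local scan on one store: the part that would run remotely."""
    return [(k, v) for k, v in store.items() if predicate(v)]

with ThreadPoolExecutor() as pool:
    # Broadcast the query to every store, then accumulate the partial results.
    parts = list(pool.map(lambda s: scan(s, lambda v: v > 4), stores))
print([kv for part in parts for kv in part])
# [('b', 7), ('c', 9), ('e', 5), ('f', 8)]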

- samantha

J. Andrew Rogers

Jul 19, 2010, 4:36:17 PM
to fc...@googlegroups.com
On Mon, Jul 19, 2010 at 12:11 PM, Samantha Atkins <sjat...@gmail.com> wrote:
> So why not make the relationship itself first-order and persistent, instead of computing it on the fly based on silliness like cross-table column matches? Why not simply record explicitly what is related to what, in what manner? Persistently memoize the relationship and forget about embedded foreign keys, with their implicit limits on relationships and their pinning of valid pseudo-identifiers in the data space of these sets.


This is what graph databases do -- they use a relationship-centric representation. As you might imagine, this is *much* more efficient than a relation-centric model. Unfortunately, it does nothing at all to address the issue of join parallelization. You haven't made the problem go away, you are just using a more efficient representation as a starting point.

The rough upper-bound for practical join scalability on efficient edge-centric data representations is around 1-billion edges, primarily because it doesn't parallelize worth a damn. That is two or three orders of magnitude too small for many people currently.

 
> I would think that in a distributed, persistent situation anything that was the equivalent of a table/set scan would be relatively inefficient. You would have to broadcast the query to all the distributed stores and accumulate the results.


Yes, the selectivity of the operations needs to be kept high, particularly when decomposed, i.e. you do not want one node talking to all of the other nodes. The real trick from a theoretical standpoint is preserving locality and disjointness across a long sequence of distributed selection and join operations. Entropy minimization scales joins and entropy maximization scales lookup tables, hence why hashing creates problems for joins.
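
A toy way to see the hashing point: place 100 keys on four nodes either by range or by hash, and count how many adjacent key pairs (a stand-in for one step of a band or self join) stay on a single node. Illustrative numbers only:

NODES = 4
by_range = lambda k: k // 25                # locality-preserving placement
by_hash = lambda k: hash(str(k)) % NODES    # entropy-maximising placement

def local_neighbour_pairs(key_to_node, n=100):
    """Count adjacent key pairs (k, k+1) that land on the same node."""
    return sum(key_to_node(k) == key_to_node(k + 1) for k in range(n - 1))

print("range placement:", local_neighbour_pairs(by_range))  # 96 of 99 stay local
print("hash placement: ", local_neighbour_pairs(by_hash))   # ~25 of 99, varies per run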


--
J. Andrew Rogers
