I don't have much time to do much research at the moment so you will
have to bear with my ideas unsullied (or uninformed) by reading.
The thing I have been thinking about recently is the function call
issue. In the von Neumann architecture we treat functions as generally
non-corporeal objects: we can instantiate them (by calling them) as
many times as we like. All that gets used up is stack space for the
arguments and local variables. This can cause problems (e.g. deep
non-tail-recursive calls exhausting the stack), but they are generally
not insurmountable in von Neumann architectures.
Things get more interesting when trying to program something like a
network of processing elements with local memory storage. Think
brain-like systems, where processing and memory are pretty much
together. While a portion of the network may represent a function, it
cannot be used in the same fashion as a conventional function:
different bits of code using the same "function" would interfere
(either directly or by slowing down the response time for other
users). Using a centralised stack or memory for messages would be
possible, but would have significant performance problems. Distributed
stacks would have the problem of what to do if a stack got too small,
or was too big and wasted on an infrequently used function. So you
might want to be able to make copies of your "function" in different
parts of the network. However the network might be heterogeneous in
its language (e.g. GPGPU and many-core), so direct copying may not be
available or feasible.
There are also functions that are called only once, and it would be
nice to be able to reuse some of their resources after they are no
longer needed.
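To make the idea concrete, here is a minimal sketch in ordinary C of a
"function instance" that carries its own local storage, so several
copies can live in different parts of a machine without sharing a
stack, and a one-shot copy can release its resources immediately after
use. All of the names (func_inst, inst_call, inst_call_once) are
illustrative, not from any real API.

```c
#include <stdlib.h>

/* A function instance: behaviour plus this copy's private memory. */
typedef struct {
    int (*code)(int *locals, int arg); /* the behaviour */
    int *locals;                       /* this copy's local storage */
} func_inst;

/* Example behaviour: accumulate into the instance's own storage. */
static int accumulate(int *locals, int arg) {
    *locals += arg;
    return *locals;
}

/* Make a fresh copy with its own zeroed local memory. */
func_inst inst_new(void) {
    func_inst f = { accumulate, calloc(1, sizeof(int)) };
    return f;
}

int inst_call(func_inst *f, int arg) { return f->code(f->locals, arg); }

/* A call-once instance: its resources are reclaimed right after use. */
int inst_call_once(func_inst *f, int arg) {
    int r = f->code(f->locals, arg);
    free(f->locals);
    f->locals = NULL;
    return r;
}
```

Two copies of the same "function" do not interfere, because each owns
its locals; that is the property the network placement above needs.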
So we would want a language that allows code/programmatic elements to
be referenced, altered and called in different ways. What I have
thought of is a language that stores an intermediate representation of
each function (in an assembler-level language). I'll use a C-style
syntax to start with (although I'd prefer nicer ones), as I expect
everyone is familiar with it.
I have the intermediate assembler-level language for a number of reasons:
1) Self-modifying code
2) Portability between systems (I don't want to have to rewrite lots
of code, especially code that writes code)
// Definition of the function to be manipulated
int func (int y)
{
    return y + 1;
}

void main()
{
    int n = 0;

    func (n);  // Attempts to set up and use a form of stack to call func.

    func !(n); // Instantiates func at compile time in such a way that it
               // can only be used once in the lifetime of the code, and
               // its resources can be reused after this. Cannot be used
               // for a non-tail-recursive function.

    ResDesc res;  // Resource description, platform specific
    Func func2;   // Unsure whether this should be typed. Probably it should be.

    res = getResDesc (source!(func));
    // Get the resources needed for func by analysing the source. The
    // source is stored somewhere, although in assembler-ish form for
    // simple and quick compilation.

    func2 = ralloc!(res, importance);
    // ralloc would be a user-defined/definable function that would take
    // the importance of the resource allocation call and see if it could
    // rearrange the resources it had access to.

    writefunc!(func2, compile!(source!(func)));
    // This writes the function to the location specified. compile takes
    // the source and makes a description to be written. This will allow
    // for self-modifying code, or updates that come from outside.
    // Otherwise we shall have to shut down our massively parallel
    // machines in order to change them from the outside.

    if (func2 != NULL)
    {
        func2>(n);
        // Uses the internal resources created by ralloc. Not thread
        // safe, although if only one place has a reference to func2
        // this shouldn't be a problem.
    }
}
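The source!/compile!/writefunc! idea can be approximated today in plain
C by storing functions in an interpreted intermediate form: "compiling"
is copying the representation into an allocated slot, and calling goes
through an interpreter. This is only a sketch under that assumption;
the opcodes and the slot_* helper names are made up for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* A tiny assembler-ish intermediate representation. */
enum { OP_ADDI, OP_MULI, OP_RET };
typedef struct { unsigned char op; int operand; } instr;

/* "source!(func)": the stored representation of a function that
 * doubles its argument and adds one. */
static const instr func_source[] = {
    { OP_MULI, 2 }, { OP_ADDI, 1 }, { OP_RET, 0 }
};

typedef struct { instr *code; size_t len; } func_slot;

/* "writefunc!": place a compiled (here: copied) body into a slot. */
func_slot slot_write(const instr *src, size_t len) {
    func_slot s = { malloc(len * sizeof(instr)), len };
    memcpy(s.code, src, len * sizeof(instr));
    return s;
}

/* Calling through the slot interprets the stored representation. */
int slot_call(const func_slot *s, int x) {
    for (size_t i = 0; i < s->len; i++) {
        switch (s->code[i].op) {
        case OP_ADDI: x += s->code[i].operand; break;
        case OP_MULI: x *= s->code[i].operand; break;
        case OP_RET:  return x;
        }
    }
    return x;
}
```

Because the slot holds data rather than native code, the program can
patch it at run time, which is the self-modification property wanted.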
Does this strike people as the right way to go for language
constructs, or is there a different way to go that I am not seeing?
However, there are a number of other things I would like to do. Part
of the problem with potentially self-modifying code is that it cannot
be optimised or parallelised well. So perhaps make the code lockable,
so that it can be optimised in some fashion.
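A minimal sketch of what "lockable" code could mean: while a slot is
locked for optimisation, attempts to rewrite it are refused. The
struct and function names here are hypothetical.

```c
/* A code slot that can be locked against self-modification while an
 * optimiser is working on it. */
typedef struct {
    int body;        /* stand-in for the function's representation */
    int opt_locked;  /* nonzero while locked for optimisation */
} code_slot;

/* Returns 1 if the rewrite happened, 0 if the slot was locked. */
int slot_try_rewrite(code_slot *s, int new_body) {
    if (s->opt_locked)
        return 0;    /* optimiser holds it; refuse to self-modify */
    s->body = new_body;
    return 1;
}
```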
Will
1) I dislike the actor model of concurrency due to the fact that the
way messages are passed is not ownable or controllable. If a malicious
bit of code got on the system, what is to stop it denial-of-servicing
the other actors, either by filling the message queues of other actors
or just by over-using the bandwidth? What I want to explore is systems
where every computer resource (memory, memory bandwidth, compute
power) is ownable, so I do not want it abstracted away too much. Or at
least not from the lowest level of coding. I expect the high-level,
day-to-day coding will abstract it away to some extent, though.
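One way to picture an "ownable" message path: each sender owns a fixed
share of the receiver's queue, so a flooder only exhausts its own
share. This is just an illustrative sketch; the mailbox layout and
names are invented, and a real system would enforce this in hardware
or the runtime rather than by convention.

```c
#define SENDERS 4
#define QUOTA   2   /* queue slots each sender owns */

/* A mailbox partitioned by sender: no actor can consume another
 * actor's slots, so flooding only blocks the flooder. */
typedef struct {
    int used[SENDERS];  /* slots consumed per sender */
    int queued;         /* total messages queued */
} mailbox;

/* Returns 1 on success, 0 if this sender exhausted its own quota. */
int mailbox_send(mailbox *m, int sender) {
    if (m->used[sender] >= QUOTA)
        return 0;       /* only this sender is throttled */
    m->used[sender]++;
    m->queued++;
    return 1;
}
```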
2) It is not simply that I am copying brains. I'm also looking at
things like FPGAs and the Fleet architecture, and asking myself how we
can create computer languages that can program themselves, where we
just do the right thing at the right place and route the signals
around.
Yes, I am talking about logical ownership rather than physical
separation. We already have some security model in hardware at the
moment (rings etc.), so we need to ask whether this needs improving
and how to take its enforcement distributed.
The following link provides a brief intro into what I am thinking,
although there are some factual inaccuracies (you can share memory as
read-only if you muck around with the page tables, fault on both reads
and writes, and then perform the reads but not the writes in the fault
handler).
http://groups.google.com/group/fca-t/browse_thread/thread/d67fab6ea60b4c00?hl=en_US
>>
>> 2) It is not just simply that I am copying brains. I'm also looking at
>> things like FPGAs and the fleet architecture and asking myself how we
>> can create computer languages that can program themselves where we
>> just do the right thing at the right place and route the signals
>> around.
>
>
> Signal routers are a shared (read: contended) resource. It is the reason
> many data parallel algorithms do not scale in practice. To create a computer
> language that plays well with signal routers you first need a computational
> model that is efficiently and pervasively access parallel.
Do you have any thoughts along these lines?
Will
On Sat, Jul 17, 2010 at 1:18 PM, William Pearson <wil.p...@gmail.com> wrote:
> Yes I am talking about logical ownership rather than physical
> separation. We already have some security model at the moment in
> hardware (rings etc) so we need to ask whether this needs improving
> and how to take its enforcement distributed.
>
> The following link provides a brief intro into what I am thinking,
> although there are some factual inaccuracies (you can share memory as
> read only if you muck around with the page tables and fault on both
> and then perform reads but not writes in the fault handler).

There are current processor architectures that allow a process to lock
individual words in distributed memory without spinning cores.
However, the latency-hiding non-local memory assumption used for these
architectures requires designing software with vast quantities of
fine-grained parallelism. Shared/exclusive resource locking does not
have to slow things down if a core is aware of it, can switch between
tasks on a clock-by-clock basis, and the code is sufficiently access
parallel to generate many more executable tasks than there are cores.
These are fun architectures to work with, but you have to design data
structures that are suited to them.

However, you still have to abide by conventions, e.g. it is possible
to force a read past a word lock. Also, these systems are designed for
distribution on the scale of a data center, not geographical, if that
matters.
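The per-word locking described above can be sketched with C11 atomics:
each word carries a lock flag taken by compare-and-swap, a task that
finds the word locked would yield to another task rather than spin,
and (per the convention mentioned) a read can be forced past the lock.
Single-threaded demonstration only; the types and names are invented.

```c
#include <stdatomic.h>

/* A memory word with its own lock bit. */
typedef struct {
    atomic_int locked;  /* 0 = free, 1 = held */
    int value;
} lockable_word;

/* Try to take the word's lock; on failure the caller should switch to
 * another task instead of spinning. Returns 1 on success, 0 if held. */
int word_try_lock(lockable_word *w) {
    int expected = 0;
    return atomic_compare_exchange_strong(&w->locked, &expected, 1);
}

void word_unlock(lockable_word *w) { atomic_store(&w->locked, 0); }

/* A "forced read" past the word lock, as the conventions allow. */
int word_force_read(const lockable_word *w) { return w->value; }
```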
Succinctly, you cannot build a massively parallel software model that plays well with 'signal routers' using conventional tree structures or distributed hash tables. For example, most things expressible as relational algebras would be right out since join operators are not access parallel (they are data parallel if you implement a relational algebra using DHTs). It is not a problem of syntactic sugar, it requires a whole new set of foundational data structures for doing computation.
To me most joins are simply a primitive means of making do for the lack of dependable uniform object/struct identity, efficient pointer/object id navigation/lookup and lack of first class dynamic relations between objects. It seems more a weakness of the relational model than an inherent data problem. But then I have a strong ODBMS bias.
On Mon, Jul 19, 2010 at 3:09 AM, Samantha Atkins <sjat...@gmail.com> wrote:
> To me most joins are simply a primitive means of making do for the
> lack of dependable uniform object/struct identity, efficient
> pointer/object id navigation/lookup and lack of first class dynamic
> relations between objects. It seems more a weakness of the relational
> model than an inherent data problem. But then I have a strong ODBMS
> bias.

Join operators are fundamental; they do not have much to do with
databases except that databases implement them. In the abstract, a
join operator defines a relationship between two sets. A relationship
between two objects can be described as a join on two sets each
containing a single element. Data structures and algorithms are built
on relational algebras even though we rarely think of them as such.

There are important ideas that are not generally expressible in
relational algebras, such as transitive closures (used in graph
theory). Not surprisingly, bounded approximations of the general case
*are* expressible in relational algebras, since we do implement them
in software. Transitive closures, to use that example, are
approximated with a finite sequence of self-joins.

In a distributed environment, join algorithms that involve more than a
single pair of objects in aggregate do not decompose nicely for
parallel computation. Selection operators parallelize just fine (see:
MapReduce). For massively parallel software design, a complete
relational algebra implementation model where every operator has an
access-parallel decomposition is a major Holy Grail.
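The transitive-closure-by-self-joins point can be made concrete with a
binary relation as an N x N boolean matrix: one "join" step adds every
pair (i,k) for which some j gives both (i,j) and (j,k), and iterating
to a fixpoint yields the closure. A small sketch:

```c
#define N 4

/* One self-join step over relation r; returns 1 if any pair was added. */
int self_join_step(int r[N][N]) {
    int changed = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                if (r[i][j] && r[j][k] && !r[i][k]) {
                    r[i][k] = 1;   /* (i,j) joined with (j,k) */
                    changed = 1;
                }
    return changed;
}

/* A finite sequence of self-joins reaches the closure (fixpoint). */
void transitive_closure(int r[N][N]) {
    while (self_join_step(r))
        ;
}
```

Note how the inner join step touches the whole relation, which is
exactly the part that decomposes badly when the relation is spread
across many processing elements.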
So why not make the relationship itself first order and persistent instead of on the fly computing it based on silliness like cross-table column matches? Why not simply record explicitly what is related to what in what manner? Persistently memoize the relationship and forget about embedded foreign keys and their implicit limits on relationships and pinning of valid pseudo-identifiers in the data space of these sets.
I would think that in a distributed persistent situation anything that
was the equivalent of a table/set scan would be relatively
inefficient: you would have to broadcast the query to all distributed
stores and accumulate the results.