Full Metal Jacket


Harsha

2016-07-21 07:30:44
to: Flow Based Programming
Hi All,

I recently came across Full Metal Jacket.


I believe this to be an implementation of Pure / Dynamic Data Flow which is Turing-complete.
It differs from FBP in the way it implements concurrency; however, it is asynchronous like FBP.

As far as I know, FBP, KPN, and Dynamic Data Flow are the only known models of Asynchronous Data Flow.

Cheers,
Harsha


Humberto Madeira

2016-07-23 14:43:35
to: Flow Based Programming
Hi Harsha,


On Thursday, 21 July 2016 07:30:44 UTC-4, Harsha wrote:

Thanks for the info.  I am always looking to compare models with my own implementation.

I believe this to be an implementation of Pure / Dynamic Data Flow which is Turing-complete.
It differs from FBP in the way it implements concurrency; however, it is asynchronous like FBP.

My own implementation also differs from FBP in the way it handles concurrency. 

I would also like to compare it to the "Dynamic Data Flow" you mentioned, but for that I need a sufficiently authoritative and detailed description of how it is supposed to work.
Would you happen to know where to find such?
 

As far as I know, FBP, KPN, and Dynamic Data Flow are the only known models of Asynchronous Data Flow.

I believe SEDA was also mentioned before in this forum.

And my own implementation can likely be added to the list (once it is ready for the big reveal).

Also, judging by my own explorations, there are quite a few possible variations on "Asynchronous Data Flow" out there.
The implementation model really depends on the requirements you are trying to satisfy.

Best Regards,
--Bert 

Harsha

2016-07-29 12:09:13
to: Flow Based Programming
Hi Bert,

The author of Full Metal Jacket provides pseudo-code for the implementation of Pure Data Flow / Dynamic Data Flow in
http://web.onetel.net.uk/~hibou/fmj/interpreter.pdf
Let us consider a Node with m input ports and n output ports.
In Pure / Dynamic Data Flow, the Node gets executed when data is available at all of the m input ports.
In this model the Node is executed only when all the data is available, so the order of arrival of data at each individual port
can be asynchronous.
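
As a rough Python sketch (my own illustration with invented names, not the author's code), the firing rule might look like:

from collections import deque

class DataflowNode:
    def __init__(self, name, num_inputs, body):
        self.name = name
        self.inputs = [deque() for _ in range(num_inputs)]  # one queue per input port
        self.body = body  # function taking one value per input port

    def deliver(self, port_index, value):
        # Data may arrive at the ports in any order, at any time.
        self.inputs[port_index].append(value)
        self.try_fire()

    def try_fire(self):
        # Fire only when *all* input ports hold at least one value.
        if all(self.inputs):
            args = [queue.popleft() for queue in self.inputs]
            self.body(*args)

# An adder fires only once both operands have arrived, in either order.
adder = DataflowNode("add", 2, lambda a, b: print(a + b))
adder.deliver(1, 2)   # nothing happens yet
adder.deliver(0, 1)   # prints 3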

In KPN / FBP what happens is more like this: when new data is available, the scheduler asks the following questions.

Is the Node in Running State?
    If so, push the data into a waiting queue.
Is the Node in Ready State?
    If so, push the data into the Node.
Is the Node in Waiting State?
    Which port is the Node waiting for?
        Is the new data for the waiting port?
            If so, push the data into the Node and change the Node to Running State.
        Is the new data not for the waiting port?
            If so, push the data into a waiting queue.

Inside the Node, every time `port.read()` is called it changes its state from Running to Waiting.
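
In rough Python, that scheduling logic looks something like this (again my own sketch with invented names, not anyone's actual implementation):

from collections import deque
from enum import Enum, auto

class State(Enum):
    READY = auto()      # idle, able to accept data
    RUNNING = auto()    # currently executing
    WAITING = auto()    # blocked in port.read() on one specific port

class Node:
    def __init__(self):
        self.state = State.READY
        self.queued = {}           # port name -> deque of data waiting outside
        self.waiting_port = None   # the port a WAITING node is blocked on

    def push(self, port, data):
        # Stand-in for handing data to the node / resuming its code.
        print(f"node consumes {data!r} from port {port!r}")

def on_new_data(node, port, data):
    # What the scheduler asks when new data arrives, per the steps above.
    if node.state is State.RUNNING:
        node.queued.setdefault(port, deque()).append(data)
    elif node.state is State.READY:
        node.push(port, data)
        node.state = State.RUNNING
    elif node.state is State.WAITING:
        if port == node.waiting_port:
            node.push(port, data)
            node.state = State.RUNNING
        else:
            node.queued.setdefault(port, deque()).append(data)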

As you can see, FBP is very generic, and this allows for asynchronous, out-of-order arrival at the input ports too.

I have no experience with SEDA so I can't speak for it. From the brief description it does seem like KPN with
explicit queues and all.

The beauty of FBP / Dynamic Data Flow is that the concurrency synchronisation is built into the semantics of the program, so you don't need to worry about explicit techniques like queues, barriers, mutexes, futures and so on.
 
Would be happy to know more about your implementation!

Cheers,
Harsha
---
www.foobar.systems

Humberto Madeira

2016-07-30 16:36:53
to: Flow Based Programming
Hi Harsha,


On Friday, 29 July 2016 12:09:13 UTC-4, Harsha wrote:

The author of Full Metal Jacket provides pseudo-code for the implementation of Pure Data Flow / Dynamic Data Flow in
http://web.onetel.net.uk/~hibou/fmj/interpreter.pdf

Thanks for the link.

A couple of comments on the linked article:

1) It's a pretty good description of how it works at runtime, but it doesn't really cover 
how the dataflow program is generated from the diagram.

Based on the abstract looking very academic, and on mentions in the article about being interpreted, 
I am going to assume that the diagram editor first generates Emblem code which is then interpreted. 

Programmatic code generation seems to be very popular with academics, but I am really not a fan of it.

In the commercial world, having to ship a compiler along with your code is generally a licensing/distribution problem.
Getting the customer to install it separately, especially if you are version dependent, is a bigger problem.

On top of all that, just having to run generated code also tends to be a great big security hole 
that will exclude you from certain types of customers (like banks).

2) Every graph they show is a DAG (directed acyclic graph) - is looping not available?
IMHO, if you don't handle looping in the network, you will miss some very important code constructs.

Maybe this is because it is still incomplete?

Let us consider a Node with m input ports and n output ports.
In Pure / Dynamic Data Flow, the Node gets executed when data is available at all of the m input ports.

I am also not a big fan of this "waiting for all inputs to be ready" approach.

In this model the Node is executed only when all the data is available, so the order of arrival of data at each individual port
can be asynchronous.

Arrival may be asynchronous, but will the data be processed in sequence or in parallel?

In KPN / FBP what happens is more like this: when new data is available, the scheduler asks the following questions.

Is the Node in Running State?
    If so, push the data into a waiting queue.
Is the Node in Ready State?
    If so, push the data into the Node.
Is the Node in Waiting State?
    Which port is the Node waiting for?
        Is the new data for the waiting port?
            If so, push the data into the Node and change the Node to Running State.
        Is the new data not for the waiting port?
            If so, push the data into a waiting queue.

Inside the Node, every time `port.read()` is called it changes its state from Running to Waiting.

Actually AFAIK, Classic FBP uses selectors to pull the data from the queue, and waits for each selected item,
which guarantees order of availability (if not arrival).

As you can see, FBP is very generic, and this allows for asynchronous, out-of-order arrival at the input ports too.

My understanding with Classic FBP is that order within a dataflow is guaranteed to be preserved (implicit ordering).
This is because of the FIFO queues in front of each component (actually each inport).

I have no experience with SEDA so I can't speak for it. From the brief description it does seem like KPN with
explicit queues and all.

It has been my experience that all of these techniques seem to be vaguely the same.  

Until you look more closely. :-)

The beauty of FBP / Dynamic Data Flow is that the concurrency synchronisation is built into the semantics of the program, so you don't need to worry about explicit techniques like queues, barriers, mutexes, futures and so on.

+1 Agree - it just needs to get more traction.
 
Would be happy to know more about your implementation!

I suppose I can let out some tidbits...

For discussion purposes, the current working codename of my implementation is "Chicle".  
(Real name TBD)

It is my intention for it to be commercial, so some of the design details will be omitted or hand-waved over.

Chicle has a 3-layered node model.

1) Tasks are mostly stateless - they are enumerable so you can select them in the editor/designer.

2) Operations wrap Tasks - they also include configuration state such as parameters and connection info

3) Steps wrap Operations - they also include run-time state (as distinct from configuration state)
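
In very rough Python, purely to illustrate the layering (simplified stand-ins, not the real classes):

class Task:
    # Mostly stateless, enumerable unit of code.
    def enter(self, context):
        raise NotImplementedError

class Operation:
    # Wraps a Task; adds configuration state (parameters, connection info).
    def __init__(self, task, parameters, connections):
        self.task = task
        self.parameters = parameters
        self.connections = connections

class Step:
    # Wraps an Operation; adds run-time state, distinct from configuration.
    def __init__(self, operation):
        self.operation = operation
        self.runtime_state = {}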

Simple Tasks are essentially just code in the underlying platform wrapped in an enter method. They need to be provided either in the framework or as plugin/expansion modules.
Sequences are collections of Operations - they may also include Exports/Imports and can be used directly as a complex Task.

Jobs wrap Sequences - they associate a collection of Steps to the Operations within their contained Sequence.
Multiple Jobs can simultaneously wrap the same Sequence.

Jobs are what you run.

An Operation connects to a Flow name by associating it with an Inlet name or an Outlet name.
The actual Flow/Inlet/Outlet objects don't get created until the Sequence is wrapped into a Job

Flows act as broadcast manifolds. Whatever data is fed in by each associated Inlet is broadcast to every associated Outlet.
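
A minimal sketch of that broadcast behaviour (illustrative names only, and again not the real code):

class Flow:
    # Broadcast manifold: data fed in by any Inlet goes to every Outlet.
    def __init__(self, name):
        self.name = name
        self.outlets = []

    def attach(self, outlet):
        self.outlets.append(outlet)

    def feed(self, datum):
        # Called by an associated Inlet.
        for outlet in self.outlets:
            outlet.enter(datum)   # safe because the datum is immutable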

When a data item enters a Task (via the enter method), the item is wrapped in a TaskContext that also identifies the arrival Inlet, 
and provides access to the working context (sort of like a closure)

If a Task needs to store or retrieve its own run-time data, or read its configuration, these too are accessible via the TaskContext.

When a Task needs to emit something, it asks the TaskContext for a named Outlet and calls its enter method with the output data.
Output data is enforced to be immutable (so it can be broadcast without problems)
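
To illustrate the enter/emit handshake in the same sketchy Python (invented names; the real API differs):

class TaskContext:
    # Wraps an arriving datum; identifies the arrival Inlet and gives the
    # Task access to its run-time state, configuration, and named Outlets.
    def __init__(self, datum, inlet, outlets, state, config):
        self.datum = datum
        self.inlet = inlet
        self.state = state      # the Step's run-time data
        self.config = config    # the Operation's configuration
        self._outlets = outlets

    def outlet(self, name):
        return self._outlets[name]

class UppercaseTask:
    def enter(self, ctx):
        result = ctx.datum.upper()        # output must be immutable
        ctx.outlet("out").enter(result)   # emit via a named Outlet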

The emitted Datum is wrapped into a WorkItem along with the next Inlet/Step to enter.
The WorkItem may be placed into a queue for scheduling purposes (more handwaving goes here).

The dataflow in Chicle is non-blocking by design, so instead of blocking, 
each Step will maintain a set of emission Counters which will track how many emitted WorkItems are extant/unconsumed.
If a Counter gets too large, the Step can suspend itself until the extant Count gets back below the threshold.

When all the Counters of all the Steps go to zero, the Job is complete and may be deleted.
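
Roughly, the counter mechanism works like this (the threshold number is made up for illustration):

class StepCounters:
    THRESHOLD = 100   # illustrative limit on unconsumed WorkItems

    def __init__(self):
        self.extant = {}        # outlet name -> count of unconsumed WorkItems
        self.suspended = False

    def on_emit(self, outlet):
        self.extant[outlet] = self.extant.get(outlet, 0) + 1
        if self.extant[outlet] > self.THRESHOLD:
            self.suspended = True    # stop taking work instead of blocking

    def on_consumed(self, outlet):
        self.extant[outlet] -= 1
        if self.suspended and all(n <= self.THRESHOLD for n in self.extant.values()):
            self.suspended = False   # resume once back below the threshold

    def complete(self):
        # All counters at zero => the Job is finished.
        return all(n == 0 for n in self.extant.values())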

The original Scheduler was Parallel/Asynchronous using a pool of Worker threads, 
but recent work has shown that other Schedulers may also be possible via configuration (both globally and locally).

In any case, the dataflow approach in Chicle makes no attempt to maintain (implicit) ordering 
(this is so that invocations of the same Step can go implicitly in parallel).

As such, sorting can be accomplished at the endpoints (such as in the display) 
or inside a Task by sorting on an explicit key.

The way to express a Sequence is to describe it in XML as a collection of operations.
This can be done manually, programmatically, or using a graphical designer/editor.
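
Purely for flavour, a hand-written sequence description might look something like this (illustrative only; the actual element names are still subject to change):

<sequence name="example">
  <operation name="reader" task="FileReader">
    <param name="path" value="input.txt"/>
    <outlet name="out" flow="lines"/>
  </operation>
  <operation name="upper" task="Uppercase">
    <inlet name="in" flow="lines"/>
    <outlet name="out" flow="results"/>
  </operation>
</sequence>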

From the editor, you pick from a list of Tasks to create Operations.  
Operations may also require an associated model (another pick list) and you may need to configure them (via a drop panel).
Connect Operations by linking Outlets to Inlets (or Exports).

Note that, unlike in a lot of other dataflow editors, Chicle Flows are not represented as lines (they are broadcast manifolds),
but as diagram components with a label and a droplist.

Once available, the sequence XML is then used to configure/instantiate a Sequence, which can then be retained in memory as a pre-configured Sequence,
or immediately executed as a Job.  Pre-configured Sequences can be used as Tasks by other Sequences (providing they are accessible).

---

So I have glossed over and omitted quite a bit, but as I previously stated, it is not yet quite ready for prime-time.

Hopefully this limited description is enough to enable comparison with other approaches.

Best Regards,
--Bert
 