Feedback sought on yet another cells implementation!

7 views
Skip to first unread message

Anand Patil

unread,
Feb 19, 2009, 1:46:17 PM2/19/09
to Clojure
Hi all,

I would really appreciate some comments and criticism on my cells
implementation, which I've uploaded to this group's files as
dataflow.clj . My goal was to fully exploit the available concurrency,
but not do any unnecessary cell evaluations.


My solution is lazy rather than push-based; if you change a ref that
cells depend on, nothing happens to the cells until you actually ask
for their values. If you call

(evaluate [cell1 cell2 cell3 ...])

first cell1, cell2, cell3, ... and all their ancestors have their
values set to {:updating}, and the root cells of the computation (all
ancestors of cell1 etc. whose parents are non-cell identities like
refs) are identified. Then, messages are sent to the root cells
telling them to recompute their values.

Each time a cell p is able to compute its value, it sends a message to
each of its children c that needs to update. The message extends c's
value, which is a map, with the key and value {p (p's value)}. If c
has received reports from all its cell parents, it retrieves values of
its non-cell parents, computes its value, and reports to its children.
When cell1, cell2, cell3 ... are all able to compute, the computation
is over. You can optionally call

(sync-evaluate [cell1 cell2 cell3 ...])

which blocks until the computation finishes using a Java countdown
latch.


This approach (I think) guarantees that each cell recomputes at most
once per computation. This helps certain models perform much better
than with push-based cells. For example, think about m generations of
n cells each, where every cell in a generation is a parent of every
cell in the next generation. Say all the root cells depend on a single
ref. If you change the value of the ref, all nm cells need to update.
If you do it with pushes, you first have to update all the root cells,
for n updates. Then each root cell pushes to all of its children, for
n^2 updates. You end up doing waaay more than nm updates.

You could update the generations sequentially to get back to good
performance... but I feel like my approach will incur less overhead
than finding all of the generations.


The example in the testing section of dataflow.clj is less dramatic.
The dependency graph, with -> pointing from parents to child, is:

a b -> c
a c -> d
a -> e
c e -> f

The roots are a and b, and each update function takes 1s.

With my implementation, it runs in 3s with 6 updates:
- a and b update together
- c and e update together
- d and f update together

whereas if you were using a push-based scheme I think it would take 4s
with 9 updates. It would take 4s because:
- a and b update together
- c has to update twice in sequence, once in response to a and once in
response to b
- d and f have to update together in response to c's final update.


My implementation runs the test, but:
- It's pretty ugly,
- I'd eventually like it to be fast & am sure it's nowhere near its
potential,
- I feel like I'm not taking full advantage of the language.

After the basic idea gets some scrutiny, I'd like to include a system
for handling exceptions. I'll also make the cells maintain caches
keyed by their non-cell ancestors. This will save a lot of
coordination overhead in addition to actual computation because if you
evaluate a cell twice in a row it won't have to bother its cell
ancestors at all the second time.

I haven't given much thought to circular dependencies, because I don't
currently need them.


Thanks very much!

Anand

Anand Patil

unread,
Feb 21, 2009, 4:26:41 AM2/21/09
to Clojure
Hi all,

My second attempt is lazy-cells.clj in this group's files. The cells
are actual agents rather than ugly maps, and all the information about
parents is hidden in watchers.

If you change a cell, the cells that depend on it all change their
values to {:needs-update true}. You can subsequently tell cells a b c
d e to update by calling (update a b d c e), which will change their
values to {:needs-update true :updating true} and eventually an actual
value. To wait for and get the results, do (evaluate a b c d e).

The updates should (hopefully) be as economical as in the first
version.

Thanks,
Anand


On Feb 19, 6:46 pm, Anand Patil <anand.prabhakar.pa...@gmail.com>
wrote:
> Hi all,
>
> I would really appreciate some comments and criticism on mycellsimplementation, which I've uploaded to this group's files as
> dataflow.clj . My goal was to fully exploit the available concurrency,
> but not do any unnecessary cell evaluations.
>
> My solution is lazy rather than push-based; if you change a ref thatcellsdepend on, nothing happens to thecellsuntil you actually ask
> for their values. If you call
>
>     (evaluate [cell1 cell2 cell3 ...])
>
> first cell1, cell2, cell3, ... and all their ancestors have their
> values set to {:updating}, and the rootcellsof the computation (all
> ancestors of cell1 etc. whose parents are non-cell identities like
> refs) are identified. Then, messages are sent to the rootcells
> telling them to recompute their values.
>
> Each time a cell p is able to compute its value, it sends a message to
> each of its children c that needs to update. The message extends c's
> value, which is a map, with the key and value {p (p's value)}. If c
> has received reports from all its cell parents, it retrieves values of
> its non-cell parents, computes its value, and reports to its children.
> When cell1, cell2, cell3 ... are all able to compute, the computation
> is over. You can optionally call
>
>     (sync-evaluate [cell1 cell2 cell3 ...])
>
> which blocks until the computation finishes using a Java countdown
> latch.
>
> This approach (I think) guarantees that each cell recomputes at most
> once per computation. This helps certain models perform much better
> than with push-basedcells. For example, think about m generations of
> ncellseach, where every cell in a generation is a parent of every
> cell in the next generation. Say all the rootcellsdepend on a single
> ref. If you change the value of the ref, all nmcellsneed to update.
> If you do it with pushes, you first have to update all the rootcells,
> for n updates. Then each root cell pushes to all of its children, for
> n^2 updates. You end up doing waaay more than nm updates.
>
> You could update the generations sequentially to get back to good
> performance... but I feel like my approach will incur less overhead
> than finding all of the generations.
>
> The example in the testing section of dataflow.clj is less dramatic.
> The dependency graph, with -> pointing from parents to child, is:
>
> a b -> c
> a c -> d
> a -> e
> c e -> f
>
> The roots are a and b, and each update function takes 1s.
>
> With myimplementation, it runs in 3s with 6 updates:
> - a and b update together
> - c and e update together
> - d and f update together
>
> whereas if you were using a push-based scheme I think it would take 4s
> with 9 updates. It would take 4s because:
> - a and b update together
> - c has to update twice in sequence, once in response to a and once in
> response to b
> - d and f have to update together in response to c's final update.
>
> Myimplementationruns the test, but:
> - It's pretty ugly,
> - I'd eventually like it to be fast & am sure it's nowhere near its
> potential,
> - I feel like I'm not taking full advantage of the language.
>
> After the basic idea gets some scrutiny, I'd like to include a system
> for handling exceptions. I'll also make thecellsmaintain caches

Timothy Pratley

unread,
Feb 21, 2009, 6:16:22 AM2/21/09
to Clojure
Hi Anand,

Very interesting! Great to see so much cells activity, and yes for
long chains lazy cells are great.

My impressions:
(1) auto-agents by SS in contrib has a more convenient syntax [maybe
you can mimic it]
(2) I can add a watcher to one of your cells - it wont do anything
unless evaluate is called. I can see the advantage of not calculating
needlessly, but if it is not going to be updated dynamically it is a
bit misleading to allow watchers.
(3) I don't see the need for agents and watcher for lazy cells, after
all can't they just be formula which are evaluated/updated in the
correct order? I have a feeling this would be more efficient (seeing
that is your goal).
(4) Both lazy and watchable cells are useful.
(5) id? could be just: instance? IDeref x unless I'm missing
something
(6) case macro - not used in the code, and doesn't cond suffice?

Regards,
Tim.

Anand Patil

unread,
Feb 22, 2009, 5:46:47 AM2/22/09
to Clojure
Hi Tim, thanks for the feedback!


On Feb 21, 11:16 am, Timothy Pratley <timothyprat...@gmail.com> wrote:
> (1) auto-agents by SS in contrib has a more convenient syntax [maybe
> you can mimic it]

Agreed, it is nicer. At the moment 'def-cell' is already a stretch for
me, but maybe someday I'll get there. :)

> (2) I can add a watcher to one of your cells - it wont do anything
> unless evaluate is called. I can see the advantage of not calculating
> needlessly, but if it is not going to be updated dynamically it is a
> bit misleading to allow watchers.

The cell changes to {:needs-update true} as soon as one of its
ancestors changes (plus the time it takes for the messages to
propagate). Watchers could make use of that information, no?

> (3) I don't see the need for agents and watcher for lazy cells, after
> all can't they just be formula which are evaluated/updated in the
> correct order? I have a feeling this would be more efficient (seeing
> that is your goal).

It's the concurrency. If you have a cell setup like

(def a (agent 0))
(def-cell b fn [a])
(def-cell c fn [a])

the agent-based implementation allows b and c can update concurrently,
whereas if they were just formulas they would of course have to update
sequentially. Or do you have some clever use of pmap in mind?

Either way, I'm planning on using this for cells whose update
functions involve reasonably heavy number-crunching, so I'm hoping
that the concurrency will be worth the overhead it requires.

However the overhead in my implementation is admittedly substantial...
I couldn't figure out a way to keep the number of updates to a minimum
yet keep the updates concurrent that required less than two watchers
per cell.

> (4) Both lazy and watchable cells are useful.

Good point... and come to think of it, there's no reason why these
cells shouldn't be able to mix with Stuart Sierra's auto-agents, given
that they're both just agents with watchers.

> (5) id? could be just:  instance? IDeref x   unless I'm missing
> something

Thanks for the tip!

> (6) case macro - not used in the code, and doesn't cond suffice?

You're right, as of version two case is officially cruft, and anyway
cond does the same job. No problem, it was good macro practice. :-)

That was very helpful, I appreciate it! Please pass along any other
thoughts on this.

Anand

Timothy Pratley

unread,
Feb 22, 2009, 7:46:37 AM2/22/09
to Clojure
Thanks for the details Anand, I understand better now.


Reply all
Reply to author
Forward
0 new messages