Clojure, Parallel programming and Leslie Lamport

Tim Daly

Dec 22, 2010, 3:28:49 AM

to Clojure

Clojure works well for concurrency but does not really address
the parallel question well. For that I've turned to MPI.
I am working on using MPI from Clojure.
These are some links others might find interesting.

The video interview with Leslie Lamport
http://channel9.msdn.com/Shows/Going+Deep/E2E-Erik-Meijer-and-Leslie-Lamport-Mathematical-Reasoning-and-Distributed-Systems

Time, Clocks and the Ordering of Events in a Distributed System
http://research.microsoft.com/en-us/um/people/lamport/pubs/time-clocks.pdf

In particular, I can highly recommend Leslie Lamport's site:
http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html

Tim Daly

Konrad Hinsen

Dec 22, 2010, 3:50:58 AM

to clo...@googlegroups.com

On 22 Dec 2010, at 09:28, Tim Daly wrote:

> Clojure works well for concurrency but does not really address
> the parallel question well. For that I've turned to MPI.
> I am working on using MPI from Clojure.

That's a topic I am very interested in as well, although unfortunately
I never find the time to really do something. Some random thoughts
based on what I did look at in the past:

1) Parallel computing vs. distributed computing: these are two
different levels of complexity in my opinion. Parallel computing in a
shared-memory environment (e.g. fork/join style) is a much simpler
problem than parallel computing on distributed-memory systems, where
you have to take care of distributing data among the machines and try
to minimize data exchange in addition to balancing CPU load. There are
some interesting approaches in Clojure's par branch for the first
problem. The second one deserves to be tackled as well, but we should
use another label than "parallel" to reduce confusion.

2) MPI via Java - which one do you plan to use?

3) Exchanging data between nodes: as far as I know many Clojure data
types, in particular closures, are not serializable yet.

4) Efficient data exchange between nodes: it would be nice to be able to
profit from MPI's efficiency for large homogeneous data sets (read:
arrays) in Clojure as well. Java arrays should be easy to handle
efficiently, but Clojure code tends to avoid them. Perhaps primitive-
type vectors could be transferred as arrays as well?

5) High-level layer: MPI is much too low-level for daily use. For
distributed programming in Clojure, I'd like to have a higher-level
model which abstracts away the synchronization issues that lead to
deadlocks, race conditions, and ultimately a miserable life for
programmers. There are some good ideas in the PGAS languages that
would perhaps work fine in a Clojure context as well.
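
To make point 4 concrete, here is a minimal sketch of why homogeneous numeric data is cheap to move between nodes: it can be packed into one contiguous buffer of fixed-size elements and rebuilt on the other side without any per-element object overhead. It is written in Python rather than Clojure purely for illustration, it does not call MPI, and the helper names pack_doubles and unpack_doubles are hypothetical.

```python
from array import array


def pack_doubles(values):
    """Pack a sequence of floats into one contiguous byte buffer."""
    return array("d", values).tobytes()


def unpack_doubles(buffer):
    """Rebuild the list of floats from a buffer made by pack_doubles."""
    out = array("d")
    out.frombytes(buffer)
    return out.tolist()


# A small homogeneous vector standing in for a large data set.
vector = [0.5, 1.25, -3.0, 42.0]

# The bytes that would be handed to the transport layer.
wire = pack_doubles(vector)

# What the receiving node would reconstruct.
received = unpack_doubles(wire)
```

A buffer like this is the kind of payload a message-passing layer can send in a single call, which is the efficiency argument behind transferring primitive-type vectors as arrays.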

> These are some links others might find interesting.

At first glance these look promising - they are on my "to watch" list.
Thanks!

Konrad.

Sunil S Nandihalli

Dec 22, 2010, 8:02:42 AM

to clo...@googlegroups.com

Hello Tim and Konrad,

I am interested in distributed parallel computing too. I have prior experience coding with MPI and C, but that is beside the point. While looking at options with Clojure, I recently came across swarmiji (https://github.com/amitrathore/swarmiji). I don't know enough to offer a technical analysis of the tool; I just thought I would throw what little I know into the pot. I would love to hear a technical analysis from either of you or anybody else.

I come from the scientific computing community: Computational Fluid Dynamics and related topics, large matrix operations and the like.

Sunil.

Konrad Hinsen

Dec 22, 2010, 8:15:53 AM

to clo...@googlegroups.com

On 22 Dec 2010, at 14:02, Sunil S Nandihalli wrote:

> I am interested in distributed parallel computing too. I have
> prior experience coding with MPI and C, but that is beside the
> point. While looking at options with Clojure, I recently
> came across swarmiji. https://github.com/amitrathore/swarmiji

Thanks for the link! Judging from the example in the README, it's a
library for task farming in Clojure. While that's a limited form of
parallelism, there are still lots of applications where it is useful,
so I'd say this library is definitely worth a closer look. However, it
doesn't seem to deal with distributed data.
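
For readers unfamiliar with the term, task farming can be sketched roughly as follows: a pool of workers takes independent tasks, and the results are collected in order. This is an illustration in Python, not swarmiji's API; the names farm and simulate are hypothetical, and a thread pool stands in for a set of remote nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def farm(worker, tasks, n_workers=4):
    """Run worker on every task using a pool; return results in task order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(worker, tasks))


def simulate(seed):
    """Stand-in for an expensive computation that depends only on its input."""
    return sum(i * seed for i in range(1000))


# Each task is independent, so no data is shared or exchanged between workers.
results = farm(simulate, range(8))
```

The pattern works because the tasks never need each other's data, which is exactly why it does not help once the data itself has to be distributed.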

> I come from the scientific computing community: Computational
> Fluid Dynamics and related topics, large matrix operations and
> the like.

My background is somewhat similar: molecular simulations and analysis
of large data sets.

Konrad.

Johann Hibschman

Dec 22, 2010, 8:59:14 AM

to clo...@googlegroups.com

Konrad Hinsen <konrad...@fastmail.net> writes:

> Thanks for the link! Judging from the example in the README, it's a
> library for task farming in Clojure. While that's a limited form of
> parallelism, there are still lots of applications where it is useful,
> so I'd say this library is definitely worth a closer look. However, it
> doesn't seem to deal with distributed data.

Distributed data is hard, though, partly because the kind of distribution
you need depends on your calculation. Every time I've had to do a
distributed calculation, I've just used the filesystem for data.

I see a lot of frameworks that assume the data is small and can be
entirely contained in the "message," while I need some kind of data
affinity. (I do model estimation on large data sets, so I'd like to send
a lump of data to different nodes, leave it there, then exchange
parameter vectors and error scores with a controller.)
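
A rough sketch of the data-affinity pattern described above, in Python: each worker keeps its lump of data for its whole lifetime, and only a small parameter and an error score cross the boundary on each round. Everything here is hypothetical and runs in one process; the Worker class, the one-parameter model y = slope * x, and the grid of candidate slopes are stand-ins for real nodes and a real estimation procedure.

```python
class Worker:
    """Holds one lump of data; only parameters and scores are exchanged."""

    def __init__(self, xs, ys):
        # The data is placed on the worker once and stays there.
        self.xs = xs
        self.ys = ys

    def score(self, slope):
        """Sum of squared errors of y = slope * x on the local data."""
        return sum((y - slope * x) ** 2 for x, y in zip(self.xs, self.ys))


def controller(workers, candidates):
    """Return the candidate slope with the lowest total error, and that error."""
    best_slope, best_error = None, float("inf")
    for slope in candidates:
        # One small message out (slope) and one back (score) per worker.
        total = sum(w.score(slope) for w in workers)
        if total < best_error:
            best_slope, best_error = slope, total
    return best_slope, best_error


# Synthetic data generated with slope 2.0, split across two workers.
xs = list(range(10))
ys = [2.0 * x for x in xs]
workers = [Worker(xs[:5], ys[:5]), Worker(xs[5:], ys[5:])]

best_slope, best_error = controller(workers, [1.0, 1.5, 2.0, 2.5])
```

In a real deployment each Worker would live on its own node, so the per-round traffic stays a few numbers regardless of how large the resident data is.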

In today's world, I've found I get more done faster with a single 8-core
machine with a lot of RAM (96 GB now; at a previous employer I had
access to a 512 GB monster) than I would with a farm of machines with
only 4 GB or 8 GB, so I'm back to concurrency. Of course, that's just
because my data is large, but not too large.

>> I come from the scientific computing community: Computational
>> Fluid Dynamics and related topics, large matrix operations and
>> the like.
>
> My background is somewhat similar: molecular simulations and analysis
> of large data sets.

I did astronomy, but mostly small-scale stuff. Integration, cascade
calculations, the like. These days, though, I'm doing finance,
mortgages in particular. That's a field that's been fun for the past
few years.

-Johann
