Re: Is there a Go-native approach to MPI (Message Passing Interface)?


adam willis

Jun 10, 2015, 10:23:09 AM
to golan...@googlegroups.com
Try http://gocircuit.github.io/circuit/. It might provide the functionality you need.

Serge Hulne

Jun 9, 2015, 4:40:39 PM
to golan...@googlegroups.com
Hi, I would like to know whether there is a built-in mechanism (or a typical Go paradigm) for message passing interfaces.

Go solves the problem of message passing between goroutines with the goroutines/channels construct, and it also solves the problem of spreading the load over multiple processors on a single, isolated computer.

Does Go also provide something similar (to goroutines/channels or to MPI) for distributing computing load across computers on a network?

(I am aware that MPI is more targeted at parallel processing and goroutines/channels are more targeted at making concurrency easy to formulate, but still.)

Serge.

Roberto Zanotto

Jun 9, 2015, 7:37:54 PM
to golan...@googlegroups.com
There's no sign of MPI in the Go specification or in the standard library, so there's nothing built-in. But you can google for bindings or create them yourself with cgo.

Regarding channels over the network, there was once a package for that (netchan), but it was discontinued because of the difficulty of implementing channel semantics over the network. Check out docker/libchan.
Alternatively, you could just use TCP connections; Go is hands down the best language for doing networking stuff ;)
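
For instance, here is a minimal sketch of passing typed messages over TCP with encoding/gob (the Message type and the port are invented for illustration):

package main

// Minimal sketch: typed messages over TCP using encoding/gob.
// The Message type and the port are invented for illustration.

import (
	"encoding/gob"
	"fmt"
	"log"
	"net"
)

type Message struct {
	From string
	Data []float64
}

func main() {
	ln, err := net.Listen("tcp", "localhost:9000")
	if err != nil {
		log.Fatal(err)
	}
	done := make(chan struct{})
	go func() { // receiver: accept one connection, decode one message
		defer close(done)
		conn, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		defer conn.Close()
		var m Message
		if err := gob.NewDecoder(conn).Decode(&m); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("received %+v\n", m)
	}()
	// sender: in real code this would run on a different machine
	conn, err := net.Dial("tcp", "localhost:9000")
	if err != nil {
		log.Fatal(err)
	}
	if err := gob.NewEncoder(conn).Encode(Message{From: "node0", Data: []float64{1, 2, 3}}); err != nil {
		log.Fatal(err)
	}
	conn.Close()
	<-done
}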

Cheers.

Brendan Tracey

Jun 10, 2015, 2:48:52 PM
to golan...@googlegroups.com
If you need "real" MPI, in theory you could use cgo to link in the library calls. I'm not sure whether MPI implementations do tricky things that would make them hard to link with Go code.
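
Something like this sketch might work (untested; it assumes an MPI implementation such as Open MPI is installed and mpi.h is on the include path; the C helpers are there because MPI_COMM_WORLD is a macro that is easier to touch from C than from cgo):

package main

/*
#cgo LDFLAGS: -lmpi
#include <mpi.h>

// Helpers: MPI_COMM_WORLD is a macro in some implementations, so we
// call the rank/size functions from C rather than from Go.
static int worldRank(void) { int r; MPI_Comm_rank(MPI_COMM_WORLD, &r); return r; }
static int worldSize(void) { int s; MPI_Comm_size(MPI_COMM_WORLD, &s); return s; }
*/
import "C"

import "fmt"

func main() {
	C.MPI_Init(nil, nil) // MPI-2 allows NULL for argc/argv
	defer C.MPI_Finalize()
	fmt.Printf("process %d of %d\n", int(C.worldRank()), int(C.worldSize()))
}

You'd launch it with mpirun as usual (e.g. mpirun -np 4 ./prog), though I don't know what surprises the Go runtime's threads might cause under an MPI launcher.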

If you need MPI-like functionality, a simple version is not that hard to implement. I made github.com/btracey/mpi, which I put together for educational purposes. There are problems with it (the message creation isn't as efficient as it could be, and it doesn't shut down properly), but in broad strokes it does work. I'd like to make it better but haven't found the time, so if you want to use it for real code you should fork and/or rewrite it. There are also InfiniBand messaging libraries that you could hypothetically use for fast communication if you have that hardware, though I have not attempted to link them in with my code.

In many cases, MPI is not the right abstraction. See my code at the bottom of https://groups.google.com/d/msg/golang-nuts/Q7lwBDPmQh4/60Y3OGDe1mEJ for how one may make a server for evaluating a function. Again, that code is simplistic, but should be enough to get you started.
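
For flavor, here is a minimal sketch along those lines using net/rpc (this is not the code from the linked thread; the Evaluator type and the port are invented):

package main

// Minimal sketch of a function-evaluation server using net/rpc.
// A worker registers a function; clients on other machines call it.

import (
	"fmt"
	"log"
	"net"
	"net/rpc"
)

type Evaluator struct{}

// Square stands in for an expensive objective function.
func (Evaluator) Square(x float64, result *float64) error {
	*result = x * x
	return nil
}

func main() {
	if err := rpc.Register(Evaluator{}); err != nil {
		log.Fatal(err)
	}
	ln, err := net.Listen("tcp", "localhost:9001")
	if err != nil {
		log.Fatal(err)
	}
	go rpc.Accept(ln) // serve evaluation requests

	// A client (normally on another machine) asks for an evaluation.
	client, err := rpc.Dial("tcp", "localhost:9001")
	if err != nil {
		log.Fatal(err)
	}
	var y float64
	if err := client.Call("Evaluator.Square", 3.0, &y); err != nil {
		log.Fatal(err)
	}
	fmt.Println(y) // 9
}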

Dan Kortschak

Jun 10, 2015, 6:06:21 PM
to Serge Hulne, golan...@googlegroups.com
Probably not what you want, but for broader scale distributed computing, you could try http://iris.karalabe.com/

Egon

Jun 11, 2015, 2:12:48 AM
to golan...@googlegroups.com
On Tuesday, 9 June 2015 23:40:39 UTC+3, Serge Hulne wrote:
Hi, I would like to know whether there is a built-in mechanism (or a typical Go paradigm) for message passing interfaces.

Instead of asking about a technology, ask how to solve a practical, real-world problem. That usually gives you more and better options to choose from.

What is the problem you are trying to solve? E.g. traveling salesman, PageRank, optimization, etc. What is the thing that makes the system valuable?
What is the scale: are you solving for 10 cities or for 1,000,000?
What are the limits? E.g. it should finish in under 1 s.
What computing resources can you access? I.e. can you use a single high-performance computer instead of a cluster of computers? How many computers are in the cluster?

That way you'll get better advice....

+ Egon

Serge Hulne

Jun 11, 2015, 7:22:26 AM
to golan...@googlegroups.com
Good question.

I was wondering about the following issues:

1. In which cases can a cluster of, say, 4 (or 10, or 100) Raspberry Pi mini-computers be more cost-effective than a single computer with the same number of cores? (Does the cost of communicating data between the computers over the network not outweigh the fact that they can run tasks simultaneously?)

2. More generally, if N is the number of computers in a network, how does the comparison hold up as N increases?

3. Are clusters of computers only good for improving availability (in network-bound, as opposed to CPU-bound, applications), or can they actually add raw computing power when used in parallel with an adequate data-communication (or data-sharing) method?

A concrete example of a subject that puzzles me is the following.

Some people claim (e.g. in videos on YouTube) that they have built a "supercomputer" out of a network of computers (typically Raspberry Pis) containing from 4 to sometimes 64 or more boards. Does that make any sense? Would a single computer with 64 cores not perform much faster (and have more RAM, etc.), in particular for number-crunching or large text-processing problems (as opposed to IO-bound problems)?


Or stated otherwise:

The rise of mini-computers (Raspberry Pi, etc.) allows one to build relatively cheap clusters of computers, but what areas of technology could theoretically benefit from the use of such clusters:

1. Only IO-bound applications (high-availability servers)?

2. Also purely CPU-bound applications, and if so, why and how? (Hence my initial question: "can Go help in that respect?") Last but not least: if so, does Go have built-in libraries (or a methodology) that might prove useful in this context (trying to improve the crunching power of a system by using a cluster of cheap computers)?

What makes these questions rather delicate is that there seems to be a great deal of confusion in the literature and on the web about related but different concepts: concurrency, availability, parallelism, network computing, clusters, multi-core, multi-threading, parallelism of processes on a single computer, and parallelizing a task across a cluster. Even in recent books (mostly books about Python...) concurrency is used as a synonym for parallelism, and threads, processes, and coroutines are treated as synonyms, whereas they are in fact totally different things.

Serge.

Konstantin Khomoutov

Jun 11, 2015, 8:51:10 AM
to Serge Hulne, golan...@googlegroups.com
On Thu, 11 Jun 2015 04:22:26 -0700 (PDT)
Serge Hulne <serge...@gmail.com> wrote:

[...]
> 1. In which cases a cluster of say 4 (or 10 or 100 for instance)
> Raspberry Pi mini computers can be more cost-effective than a single
> computer with the same amount of cores (does the cost of
> communicating the data between the computers via the network not
> outweigh the fact that they can run tasks simultaneously)?
[...]
> A concrete example of a subject that puzzles me is the following.
>
> Some people claim (e.g. in videos on YouTube) that they have built a
> "super-computer" using networks of computers (typically "Raspberry
> Pi") containing from 4 to sometimes 64 or more Raspberry Pi. Does
> that make any sense ? Would a single computer with 64 cores not
> perform much faster (and have more RAM, etc ...), in particular to
> solve number crunching problems or large text processing problems (as
> opposed to IO bound problems).
[...]

Well, I'm not an expert, but let's look at the question this way: if
clusters of Raspberries were really that effective, we'd surely have
seen this happening on a larger scale -- I mean, companies trying to
occupy this market niche. But we don't see that happening. The only
thing I see happening with the ARM architecture in the server market is
attempts at pushing energy-efficient solutions (as opposed to the
x86-based ones that are ubiquitous in data centers). What this means to
me is that the attempts to build "supercomputers" out of Raspberries
are just for the fun of the enthusiasts who make them. And sure, the
word "supercomputer" is not wrong here; it's just probably not what
everyone has learned to imagine when seeing it ;-)

Egon

Jun 11, 2015, 8:53:36 AM
to golan...@googlegroups.com
Note: I don't work with clusters, and the following is based on opinion. Please correct me if I say something wrong or particularly stupid... :)

On Thursday, 11 June 2015 14:22:26 UTC+3, Serge Hulne wrote:
Good question.

I was wondering about the following issues :

1. In which cases can a cluster of, say, 4 (or 10, or 100) Raspberry Pi mini-computers be more cost-effective than a single computer with the same number of cores? (Does the cost of communicating data between the computers over the network not outweigh the fact that they can run tasks simultaneously?)

The general answer is Amdahl's Law (http://en.wikipedia.org/wiki/Amdahl%27s_law), though of course it's not always applicable (http://www.futurechips.org/thoughts-for-researchers/parallel-programming-gene-amdahl-said.html). When moving things to multiple computers you get a larger communication overhead than on a single computer, but at the same time you may reduce resource contention for disk, RAM, or other resources. So depending on where your bottlenecks are, it could go either way...
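
To make the limit concrete: Amdahl's Law says the speedup on N processors is 1/((1-p) + p/N), where p is the fraction of the work that can be parallelized. A tiny sketch (the 95% figure is just an example):

package main

import "fmt"

// speedup returns the Amdahl's Law speedup on n processors when a
// fraction p of the work is parallelizable.
func speedup(p float64, n int) float64 {
	return 1 / ((1 - p) + p/float64(n))
}

func main() {
	for _, n := range []int{4, 64, 1024} {
		fmt.Printf("N=%4d: %.1fx\n", n, speedup(0.95, n))
	}
	// Even with 95% of the work parallelizable, the speedup never
	// exceeds 1/(1-0.95) = 20x, no matter how many machines you add.
}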


2. More generally, if N is the number of computers in a network, how does the comparison hold up as N increases?

Amdahl's Law... :) There are probably revised versions of it.
 

3. Are clusters of computers only good for improving availability (in network-bound, as opposed to CPU-bound, applications), or can they actually add raw computing power when used in parallel with an adequate data-communication (or data-sharing) method?

Yes, they can. Basically, at some point you hit a limit on how many cores a single computer can have, or the cost of adding another core becomes significantly larger than that of an equivalent additional computer.
 

A concrete example of a subject that puzzles me is the following.

Some people claim (e.g. in videos on YouTube) that they have built a "supercomputer" out of a network of computers (typically Raspberry Pis) containing from 4 to sometimes 64 or more boards. Does that make any sense? Would a single computer with 64 cores not perform much faster (and have more RAM, etc.), in particular for number-crunching or large text-processing problems (as opposed to IO-bound problems)?

Probably... although I haven't looked at any stats or done any research on that particular problem. Of course, there might be other factors at play: the cost of the system and computing power per watt. ARMs generally use less power but also have fewer capabilities, so if you can program your algorithm efficiently for the ARM instruction set you might win on power usage.


Or stated otherwise:

The rise of mini-computers (Raspberry Pi, etc.) allows one to build relatively cheap clusters of computers, but what areas of technology could theoretically benefit from the use of such clusters:

1. Only IO-bound applications (high-availability servers)?

2. Also purely CPU-bound applications, and if so, why and how? (Hence my initial question: "can Go help in that respect?") Last but not least: if so, does Go have built-in libraries (or a methodology) that might prove useful in this context (trying to improve the crunching power of a system by using a cluster of cheap computers)?

The approaches would be pretty much the same as the ones you'd use in other languages. There's no magic solution for balancing distribution overhead and performance, although there are languages that try to make distributed code easier to write -- e.g. http://halide-lang.org/.


What makes these questions rather delicate is that there seems to be a great deal of confusion in the literature and on the web about related but different concepts: concurrency, availability, parallelism, network computing, clusters, multi-core, multi-threading, parallelism of processes on a single computer, and parallelizing a task across a cluster. Even in recent books (mostly books about Python...) concurrency is used as a synonym for parallelism, and threads, processes, and coroutines are treated as synonyms, whereas they are in fact totally different things.


At some point I looked into the terminology of "parallelism" and "concurrency", and there simply is no clear precedent for the usage of those terms, hence the usage varies (I personally use them in this way: http://blog.golang.org/concurrency-is-not-parallelism). I expect the case is the same with threads/processes/coroutines.

+ Egon

adam willis

Jun 11, 2015, 9:52:24 AM
to golan...@googlegroups.com
Parallel computing is a means by which multiple tasks are executed simultaneously; concurrent computing is a means by which those tasks coordinate their execution. Here is a video about Go, parallelism, and concurrency: https://www.youtube.com/watch?v=cN_DpYBzKso
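
A tiny sketch of the distinction in Go: the workers below are concurrent by construction (they coordinate through a channel), while whether they actually run in parallel depends on GOMAXPROCS and the number of cores:

package main

import (
	"fmt"
	"sync"
)

func main() {
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < 4; w++ { // four concurrent workers
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for j := range jobs { // coordinate via the channel
				fmt.Printf("worker %d did job %d\n", id, j)
			}
		}(w)
	}
	for j := 0; j < 8; j++ {
		jobs <- j
	}
	close(jobs)
	wg.Wait()
}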

Brendan Tracey

Jun 11, 2015, 10:08:14 AM
to Serge Hulne, golan...@googlegroups.com


2. Also purely CPU-bound applications, and if so, why and how? (Hence my initial question: "can Go help in that respect?") Last but not least: if so, does Go have built-in libraries (or a methodology) that might prove useful in this context (trying to improve the crunching power of a system by using a cluster of cheap computers)?

Do MPI implementations deal with reliability? I assume ones intended for huge supercomputers must, but I haven’t seen any articles on it. I’d be curious to know how they do it.

As Egon implied, even with your CPU-bound constraint there are still a lot of ways to do distributed-memory processing. The right style will depend on the problem you wish to solve. SETI@home is a distributed-memory computation that does not use MPI.

Egon

Jun 11, 2015, 10:12:45 AM
to golan...@googlegroups.com, serge...@gmail.com


On Thursday, 11 June 2015 17:08:14 UTC+3, Brendan Tracey wrote:


2. Also purely CPU-bound applications, and if so, why and how? (Hence my initial question: "can Go help in that respect?") Last but not least: if so, does Go have built-in libraries (or a methodology) that might prove useful in this context (trying to improve the crunching power of a system by using a cluster of cheap computers)?

Do MPI implementations deal with reliability? I assume ones intended for huge supercomputers must, but I haven’t seen any articles on it. I’d be curious to know how they do it.

Joubin Houshyar

Jun 12, 2015, 9:04:42 AM
to golan...@googlegroups.com


On Thursday, June 11, 2015 at 7:22:26 AM UTC-4, Serge Hulne wrote:
Good question.

I was wondering about the following issues:

You should not expect definitive answers to these very open-ended questions, Serge.

In general, scaling up is almost always the most efficient approach.

Semantics:

Parallelism: go stand in a queue at McDonald's. (Don't eat the food...)
Concurrency: try a team sport.

Samuel Lampa

Sep 14, 2020, 6:25:58 AM
to golang-nuts
On Thursday, June 11, 2015 at 2:53:36 PM UTC+2 Egon wrote:
1. In which cases can a cluster of, say, 4 (or 10, or 100) Raspberry Pi mini-computers be more cost-effective than a single computer with the same number of cores? (Does the cost of communicating data between the computers over the network not outweigh the fact that they can run tasks simultaneously?)

The general answer is Amdahl's Law (http://en.wikipedia.org/wiki/Amdahl%27s_law), though of course it's not always applicable (http://www.futurechips.org/thoughts-for-researchers/parallel-programming-gene-amdahl-said.html). When moving things to multiple computers you get a larger communication overhead than on a single computer, but at the same time you may reduce resource contention for disk, RAM, or other resources. So depending on where your bottlenecks are, it could go either way...

Yes, and also note that supercomputers often use special network protocols/technologies that support so-called "Remote Direct Memory Access" (RDMA) [1], such as InfiniBand [2], to get acceptable performance for high-performance multi-core computations across compute nodes. InfiniBand cards are pretty expensive as far as I know, so their cost will probably outweigh the benefit of buying a lot of cheap RPis.

I'd still be interested to hear whether anybody knows about new developments on MPI for Go (for HPC use cases, if nothing else). :)


Best
Samuel

Randall O'Reilly

Sep 14, 2020, 7:07:25 AM
to Samuel Lampa, golang-nuts
I just wrote a wrapper around Open MPI in Go: https://github.com/emer/empi

Also, here's a set of random Go bindings I found:
https://github.com/yoo/go-mpi
https://github.com/marcusthierfelder/mpi
https://github.com/JohannWeging/go-mpi

Even a from-scratch implementation:
https://github.com/btracey/mpi

Also a few libraries for using InfiniBand directly:
https://github.com/Mellanox/rdmamap
https://github.com/jsgilmore/ib
https://github.com/Mellanox/libvma

Some discussion: https://groups.google.com/forum/#!topic/golang-nuts/t7Vjpfu0sjQ

- Randy

Samuel Lampa

Sep 14, 2020, 8:18:47 AM
to golang-nuts
Nice, thanks a lot for sharing!

Best
Samuel

Vitaly

Sep 15, 2020, 12:46:44 AM
to golang-nuts
https://nanomsg.org/ (a successor to https://zeromq.org/)

Wednesday, June 10, 2015 at 01:40:39 UTC+5, Serge Hulne:

Serge Hulne

Sep 15, 2020, 2:03:56 AM
to golang-nuts
Thank you for the info.