Go, CPU and GPU


Russel Winder

unread,
Jun 20, 2011, 2:40:53 AM6/20/11
to GoLang Nuts
It is becoming more and more obvious that mixed CPU/GPU processors are
soon to be coming down the Intel production line. We all knew this was
going to happen two or three years ago, but all the corporate
positioning is beginning to happen, so I guess we can expect more
announcements fairly soon.

Apple's position is clearly Objective-C or Objective-C++ and OpenCL.
NVIDIA will no doubt continue to push CUDA whilst accepting OpenCL.
CUDA and OpenCL are, though, C-level technologies, so only just
above the level of assembly language.

Microsoft have now staked out their position with C++ AMP (which has
come as a surprise to many given its dissociation from .NET).

The Java Platform doesn't really have a position per se, but the GPars
project really should be looking at how to support this just as soon as
it can -- sadly it might involve JNI :-(

Intel only backs C and C++ technology, so they will not have anything to
say about Java.

So an obvious question is: where is Go in all this? I believe there is
no "Go Manifesto" that covers this situation. Are people working on it?
Is there any interest in doing so?

Thanks.

--
Russel.
=============================================================================
Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel...@ekiga.net
41 Buckmaster Road m: +44 7770 465 077 xmpp: rus...@russel.org.uk
London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder


a...@google.com

unread,
Jun 20, 2011, 5:08:30 AM6/20/11
to golan...@googlegroups.com
The Go team itself isn't working on this, and we have no plans to do so.

But there's nothing in the language itself that prevents us from targeting GPUs. In fact, Go's concurrency model could be well-suited to heterogeneous processor environments. (Wouldn't it be cool to start a goroutine on a GPU and communicate with it via channels?)

Andrew

fango

unread,
Jun 20, 2011, 5:50:14 AM6/20/11
to golang-nuts
The formula is actually available today: GPGPU = Go + (GPU + WebGL +
WebSocket)i

Of course it is complex, and the Go language is orthogonal to whatever
Web or chip technology has to offer: CUDA and OpenCL are vendor driver
technologies, and Go is not.

Cheers,
fango

Paulo Pinto

unread,
Jun 20, 2011, 6:44:54 AM6/20/11
to golang-nuts
I fail to see this solution catching on in the HPC world.

André Moraes

unread,
Jun 20, 2011, 7:39:19 AM6/20/11
to golang-nuts
On Mon, Jun 20, 2011 at 6:50 AM, fango <fan.h...@gmail.com> wrote:
> The formula is actually available today: GPGPU = Go + (GPU + WebGL +
> WebSocket)i

I think this is overly complex.

Maybe a small C program that communicates with the GPU (does the hard
part) and reports back to Go is a better approach.

I don't think it is possible to run cgo libraries on the GPU and the Go
program on the CPU, but making a protocol to handle that kind of
communication isn't very hard at first thought.


--
André Moraes
http://andredevchannel.blogspot.com/

ivorget

unread,
Jun 20, 2011, 10:09:55 AM6/20/11
to golang-nuts
Maybe Go should borrow/steal from this:

http://www.ateji.com/px/index.html

It seems to be a very simple means of writing parallel code with some
minimal constructs added on top of Java.

Colm


xavier...@gmail.com

unread,
Jun 21, 2011, 6:01:13 AM6/21/11
to golang-nuts
On the Java side, JavaFX 2.0 also targets the GPU through Prism, its
native graphics stack. It is promising and already quite efficient.

regards

patric...@ateji.com

unread,
Jun 21, 2011, 11:17:53 AM6/21/11
to golang-nuts

I'd be happy to help ;-)

I am wondering, however, to what extent the coroutine paradigm can be
extended in this direction. Parallelism in Ateji PX is based on an
algebraic composition operator, and this is precisely what makes it at
the same time "minimal" and expressive (data + task parallelism,
multi-GPU, NUMA, clusters, etc.).

Patrick, Ateji.

Kyle Lemons

unread,
Jun 21, 2011, 1:09:25 PM6/21/11
to patric...@ateji.com, golang-nuts
To me, it seems the first step (not necessarily the easiest) might be to do some static analysis of the goroutines a program creates: figure out whether a goroutine spends much of its time in a loop, and if it does, whether that loop (or loops) can be parallelized. The process of trying to hack out this sort of analysis might lead you to discover a good way to write Go code that can be parallelized, which in turn might suggest syntactic sugar that makes this easy and more expressive.

~K

Fabio Kaminski

unread,
Jun 21, 2011, 10:48:59 PM6/21/11
to Kyle Lemons, patric...@ateji.com, golang-nuts
Oh, a good Gallium3D state tracker for Go is another solution; I think
the least painful one.


Fabio Kaminski

unread,
Jun 21, 2011, 10:46:01 PM6/21/11
to Kyle Lemons, patric...@ateji.com, golang-nuts
I think the most successful hardware formula is the one produced by
Intel and AMD now, with CPU and GPU "integrated". In the end it just
means a CPU with good SIMD and vectorization support.

With Intel, for instance, when you use their OpenCL solution you are in
fact just using the special AVX registers.

With the processor industry creating these new CPUs, the best thing to
do is just wait until they become the norm.

And then a lot of work will have to be done in the compilers, and I
think some syntax sugar would be helpful, like this odd construct:

v1, v2, v3, v4 += 3.7, 7.8, 9.9, 8.3 ---> all in one "shot" on the processor

But of course the compiler could do this hard work on our behalf with
the existing language constructs.

So in my opinion the best thing to do is just wait until things calm down.

It is just too much work to be done, because you have to support all the
GPUs (ATI, NVIDIA, and others), each with its own proprietary and
complex bytecode.

If somebody really hates C99 CL, maybe the shortest path is to create
some glue to LLVM bytecode right now; otherwise it's too much of a
headache.

Better to wait for more and bigger vector support from the new CPUs.



Oleku Konko

unread,
Aug 6, 2013, 10:38:27 AM8/6/13
to golan...@googlegroups.com
GPU support would be a real plus. It wasn't in their plans two years ago; I wonder if it is in their plans now.




Michael Jones

unread,
Aug 6, 2013, 11:15:17 AM8/6/13
to Oleku Konko, golang-nuts
One very general and natural step could be extending the language from scalar expressions by adding vector, matrix, and mixed {S,V,M} op {S, V, M} expressions. These are readily implemented (same generated code as now) by just presuming the implied loops and scalar code, but can be accelerated on CPUs where that makes sense with vector instructions (AVX), cache-aware optimization libraries like BLAS, or attached heterogeneous processors.

var a, b [30]int
:
a = 0
a = b

a = b-3

a += b
a = a-b

all have a natural meaning that a "vector-to-vector" or "scalar to vector" understanding in the compiler could readily implement:

for i := 0; i < 30; i++ {
    a[i] = 0
}

and just the same, a SIMD machine could do this 4 or 8 times faster (or more given overlapped vector load, op, and store instructions with cache-aware hints).

I've thought that slices could make this very natural too:

var x,y []float64
:

x[0:16] -= y[16:32]

That's the possibility side. Here's the reality side: such efforts have historically been unsuccessful. There is not enough vector/matrix code in ordinary applications to make much overall performance difference; yet those applications where there is enough, and the benefit is extraordinary, need to do many subtle things that are not captured by the ordinary semantics above (multiply two band-symmetric matrices, say, or set a matrix to the identity matrix for its size). These applications gravitate toward math libraries with comprehensive support. The result is that language designers generally say no.




--
Michael T. Jones | Chief Technology Advocate  | m...@google.com |  +1 650-335-5765

Dmitry Vyukov

unread,
Aug 6, 2013, 11:58:22 AM8/6/13
to Michael Jones, Oleku Konko, golang-nuts
Another option is to create a specialized Go-like language that
supports only vector/matrix operations and implement compiler/runtime
as a wrapper around OpenCL or something else. Then one would be able
to do:

// a.pgo
func Foo(a, b, c []float64) {
    a = a*b + c
}

// main.go
func main() {
    var a, b, c []float64
    ...
    Foo(a, b, c)
    ...
}

This does not require any Go support, and can be implemented
completely aside. When/if it is successful and mature enough, go tool
can be extended to handle pgo files.

Kevin Gillette

unread,
Aug 6, 2013, 12:59:40 PM8/6/13
to golan...@googlegroups.com, Oleku Konko
If this were ever added to the language proper, I'd expect we'd need to invent new operators to make vector ops readily distinguishable from scalar ops. Strings already support + to mean concatenation, so it couldn't use plus to mean "value of each byte is added to X". Additionally, some way would need to be devised to mitigate the cost of redundant copies to get past type restrictions. For example:

type X int
x := make([]X, 10000)
y := make([]int, 10000)
// change some data in x and y, then...
y *= x

Even if vector op support were implemented, the last line would not work under existing semantics, because []X is not convertible to []int (although the element types are convertible to each other), and making runtime copies as an indulgence to language semantics would bleed off some of the value of SIMD support.

Additionally, it'd be desirable to allow arbitrary expressions containing vector ops, which the compiler would arrange to perform in as many passes as necessary. This may, however, be infeasible when working with slices.

Sebastien Binet

unread,
Aug 6, 2013, 1:23:31 PM8/6/13
to Dmitry Vyukov, Michael Jones, Oleku Konko, golang-nuts

sounds like vgo:
https://github.com/remyoudompheng/go-vectops

-s

Dmitry Vyukov

unread,
Aug 6, 2013, 1:36:39 PM8/6/13
to Sebastien Binet, Michael Jones, Oleku Konko, golang-nuts
Yeah, it looks very similar. However, full support should include GPUs,
coprocessors like MIC, etc., and load balancing across CPU/GPU/MIC. At
that point it's easier to wrap OpenCL, Intel Ct, or something similar.

Archos

unread,
Aug 6, 2013, 3:01:31 PM8/6/13
to golan...@googlegroups.com
This news is closely related to this press release: http://www-03.ibm.com/press/us/en/pressrelease/41684.wss

"The Consortium intends to build advanced server, networking, storage and GPU-acceleration technology aimed at delivering more choice, control and flexibility to developers of next-generation, hyperscale and cloud data centers."
"As part of their initial collaboration within the consortium, NVIDIA and IBM will work together to integrate the CUDA GPU and POWER ecosystems."



Tharaneedharan Vilwanathan

unread,
Aug 6, 2013, 9:57:39 PM8/6/13
to golang-nuts
Hi,

Sorry if I am missing something obvious, but does this package work? Is
it actively maintained? I just tried to use it but couldn't get it
working. Are there any steps on how to install and use it?

Appreciate your help.

Thanks
dharani

Ian Wetherbee

unread,
Aug 7, 2013, 12:32:21 PM8/7/13
to mgo...@google.com, golang-nuts
That was the goal of https://github.com/wetherbeei/gopar. It currently identifies loops and, where safe, runs them in parallel as goroutines, but the goal was to generate OpenCL from the inner loop and launch that.


Ian Wetherbee


On Fri, Feb 1, 2013 at 4:53 AM, <mgo...@google.com> wrote:
Go seems to be a really good fit for GPU work, since it has lightweight, runtime-managed threads (goroutines) built in.
The suggestion to compile goroutines into GPU code by unrolling loops is good, but we might not yet have a technology that does that reliably.
A more conservative approach could be to use the SIMD principle, in this case Single Goroutine Multiple Data: the goroutine would define the processing, and the runtime would apply it to a specified vector of data, executing the processing of each vector component on a GPU core.

The tricky part here is to extend the goroutine syntax to include the input vector and the output vector.

A possible limitation is that people might want to run parallel calculations that do not fit the SIMD principle but are, for example, map-reduce based.

Any opinions ?

Martin


