Go and GPUs


Nikolay Dubina

Jun 25, 2021, 11:52:30 AM
to golang-nuts
I tried to write my own CUDA / NVIDIA GPU driver in native Go last weekend. To my great surprise, CUDA and pretty much all high-performance software/hardware from NVIDIA is proprietary closed-source C/C++ code. Meaning, you can't write a native Go driver even if you wanted to. Forget Go, people are trying to reverse engineer it in C/C++ with limited success. From what I heard, OpenCV is not a priority for NVIDIA either. Then I looked up what Apple is doing with the Neural Engine in its latest chips. It too is closed-source Objective-C and Swift. I suspect the situation with other powerful hardware is the same. Moore's law seems to be about GPUs lately, and everyone is locking them in. Kind of not in the spirit of open source and Linux. That's quite a bad state of affairs for Go and computing in general. Yikes!

Just want to see what others are thinking.

Marcin Romaszewicz

Jun 25, 2021, 1:12:11 PM
to Nikolay Dubina, golang-nuts
Graphics chips have a lot of proprietary IP, some of which the manufacturers would like to keep secret. If you saw the source for one of these drivers, you would have a good idea of the hardware organization, so they keep everything secret. It stinks for those of us who want to write cross-platform open source. The best bet right now, in my opinion, is to write CGo wrappers around platform-native libraries, and sadly, they'll only work on some OS/hardware combinations.
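
For the CUDA case, a minimal sketch of what such a wrapper can look like, assuming the NVIDIA driver API header (cuda.h) and libcuda are installed. cuInit and cuDeviceGetCount are the actual driver-API calls; the rest is cgo plumbing:

package main

/*
#cgo LDFLAGS: -lcuda
#include <cuda.h>
*/
import "C"

import "fmt"

func main() {
	// The driver API requires cuInit before any other call.
	if rc := C.cuInit(0); rc != C.CUDA_SUCCESS {
		fmt.Println("cuInit failed:", rc)
		return
	}
	var n C.int
	if rc := C.cuDeviceGetCount(&n); rc != C.CUDA_SUCCESS {
		fmt.Println("cuDeviceGetCount failed:", rc)
		return
	}
	fmt.Println("CUDA devices:", int(n))
}

The normal Go toolchain builds this (cgo drives the C compiler), but the binary only runs on machines with the NVIDIA driver present, which is exactly the OS/hardware coupling mentioned above.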



Robert Engels

Jun 25, 2021, 1:32:51 PM
to Marcin Romaszewicz, Nikolay Dubina, golang-nuts
Why not develop a Go <> CUDA binding using CGo?


Tom Mitchell

Jun 25, 2021, 2:11:56 PM
to Nikolay Dubina, golang-nuts
As others have said, there is a lot of secret sauce, which includes the instruction sets for the function blocks in silicon.
Thus there is no public assembler for a compiler to generate code against. Other chunks of the necessary tool chain are also absent or homegrown (no documentation other than source).

The best advice is to look at the installation scripts that bind the secret sauce to the exposed API wrapper.
Look at the wrappers and install scripts for the same GPU on multiple systems if you can.

Decades ago I had to be careful when debugging customer problems because, depending on the graphics engine, you could walk down from customer code to the GL library, to system C and ASM code, to off-the-shelf digital signal processor code, and then to custom processors in VHDL. The C&ASM->hardware transition was almost seamless to a reader with a good tag file and the full source. Functional symbols inside the graphics library drivers were not intended to be used except by GL, but some were too tempting, and those symbols had to be edited out of shipped binary objects to clarify the ABI and keep the system stable.

Ask nicely for help from the vendor.
I would assert() that Go could be more useful in large clusters of GPU-rich systems if the vendors facilitated it.
Beware bogus asserts().





--

          T o m    M i t c h e l l  ( o n   N i f t y E g g )

Michael Poole

Jun 25, 2021, 9:03:10 PM
to Nikolay Dubina, golang-nuts

That would be a very nice thing to have. I see four basic areas where
this becomes tricky or requires a lot of work.

1) Getting memory management right for sharing with the accelerator device. Shared memory buffers need to stay locked for longer than just the call into a library or the OS kernel; in the worst case you could use something like the manual memory management Manish Rai Jain of Dgraph described at https://dgraph.io/blog/post/manual-memory-management-golang-jemalloc/ . A lot of Nvidia's recent CUDA programmer-productivity improvements have focused on transparent data movement, and Go's garbage collector probably breaks the assumptions behind those.
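
One sketch of a workaround, under the simplest possible policy: allocate device-visible buffers on the C heap, where Go's GC never moves or collects them. This is plain malloc, not page-locked memory; real pinning would go through cudaHostAlloc or a platform equivalent (and unsafe.Slice needs Go 1.17+):

package main

/*
#include <stdlib.h>
*/
import "C"

import "unsafe"

// offHeapBuffer lives on the C heap, so the Go GC never moves or
// frees it while the device holds a pointer into it.
type offHeapBuffer struct {
	ptr unsafe.Pointer
	n   int
}

func newOffHeapBuffer(nFloats int) offHeapBuffer {
	p := C.malloc(C.size_t(nFloats) * C.size_t(unsafe.Sizeof(float32(0))))
	return offHeapBuffer{ptr: p, n: nFloats}
}

// Floats views the buffer as an ordinary Go slice (Go 1.17+).
func (b offHeapBuffer) Floats() []float32 {
	return unsafe.Slice((*float32)(b.ptr), b.n)
}

// Free must be called explicitly; that is the price of leaving the GC.
func (b offHeapBuffer) Free() { C.free(b.ptr) }

func main() {
	buf := newOffHeapBuffer(1024)
	defer buf.Free()
	buf.Floats()[0] = 42 // memory never moves under the device's feet
}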

2) "Friendly" source development (like CUDA C) integrates the host and
target code into a single code base, with some kind of markup or
compiler hint about which functions should be compiled for the
accelerator, and which function calls need to be treated specially.
Those "special" function calls must be translated to dispatch the call
(usually with additional arguments like the grid/NDRange parameters)
to the accelerator rather than as a normal function call. This is a
compiler front-end problem, which is almost certainly a lot easier
with Go than with C or C++, but still requires attention and perhaps
considerable effort because it requires multiple back-ends to run for
some functions. In the worst case, do like OpenCL and require the
programmer to provide or build strings containing program text, along
with verbose code to set up calls to the accelerator.
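
To make that worst case concrete, a hypothetical sketch of the shape such a Go API could take. The kernel is carried as a plain string, and Device, Program, Kernel and Buffer below are invented names, not a real library:

package accel

// Kernel source carried as plain program text, OpenCL-style.
const vecAddSrc = `
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *out) {
    int i = get_global_id(0);
    out[i] = a[i] + b[i];
}`

// The types below are hypothetical; they exist only to show the
// shape of the verbose, explicit-dispatch path.

type Buffer interface {
	Free()
}

type Kernel interface {
	SetArg(i int, v interface{}) error // bind the i-th argument before launch
}

type Program interface {
	Kernel(name string) (Kernel, error)
}

type Device interface {
	BuildProgram(src string) (Program, error) // compile at run time
	Alloc(bytes int) (Buffer, error)
	Launch(k Kernel, globalSize int) error // enqueue over a 1-D range
}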

3) Generating code for the accelerator. SPIR-V is an obvious
candidate for portable uses; it integrates reasonably with current
OpenCL as well as Vulkan. Nvidia makes a reasonable compromise for
this with their PTX assembly pseudo-language: They have good
documentation about which PTX instructions are only supported on some
GPU generations, they document when the translation to actual machine
code varies in major ways, and they even have a decent API for
compiling application-provided PTX code. This is a compiler back-end
problem, conditional on not accepting the "worst case" in #2 above.
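
For a taste of that PTX route, a sketch against the real driver API (cuModuleLoadData and cuModuleGetFunction). The hand-written PTX is a no-op kernel, and most error handling is omitted:

package main

/*
#cgo LDFLAGS: -lcuda
#include <cuda.h>
#include <stdlib.h>
*/
import "C"

import (
	"fmt"
	"unsafe"
)

// Hand-written no-op kernel; a real tool would emit PTX from a
// compiler back-end instead.
const ptx = `
.version 6.0
.target sm_50
.address_size 64
.visible .entry noop()
{
	ret;
}
`

func main() {
	C.cuInit(0)
	var dev C.CUdevice
	C.cuDeviceGet(&dev, 0)
	var ctx C.CUcontext
	C.cuCtxCreate(&ctx, 0, dev)

	// cuModuleLoadData wants a NUL-terminated image; CString provides one.
	img := C.CString(ptx)
	defer C.free(unsafe.Pointer(img))

	var mod C.CUmodule
	if rc := C.cuModuleLoadData(&mod, unsafe.Pointer(img)); rc != C.CUDA_SUCCESS {
		fmt.Println("PTX load failed:", rc)
		return
	}
	name := C.CString("noop")
	defer C.free(unsafe.Pointer(name))
	var fn C.CUfunction
	C.cuModuleGetFunction(&fn, mod, name)
	fmt.Println("PTX kernel loaded; a launch would go through cuLaunchKernel")
}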

4) Integrating target libraries with the rest of an application. A
large part of the value proposition for CUDA is that it has a lot of
highly optimized libraries available out of the box: cuFFT,
cuBLAS/NVBLAS, and more. These are a hybrid between GPU elements and
host elements, and a lot of the host elements end up being black boxes
with respect to other languages. The most general fix is to call out
to C, which is not satisfying for portability.
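
And a sketch of that general fix: cuBLAS called straight through its C ABI with cgo. cudaMalloc, cudaMemcpy and cublasSaxpy are the real entry points; the Go-side wrapper shape is just one possibility, and error checking is elided:

package main

/*
#cgo LDFLAGS: -lcudart -lcublas
#include <cuda_runtime.h>
#include <cublas_v2.h>
*/
import "C"

import (
	"fmt"
	"unsafe"
)

// saxpy computes y = alpha*x + y on the GPU via cuBLAS.
func saxpy(alpha float32, x, y []float32) {
	n := len(x)
	size := C.size_t(n) * C.size_t(unsafe.Sizeof(float32(0)))

	var dx, dy unsafe.Pointer
	C.cudaMalloc(&dx, size)
	C.cudaMalloc(&dy, size)
	defer C.cudaFree(dx)
	defer C.cudaFree(dy)

	C.cudaMemcpy(dx, unsafe.Pointer(&x[0]), size, C.cudaMemcpyHostToDevice)
	C.cudaMemcpy(dy, unsafe.Pointer(&y[0]), size, C.cudaMemcpyHostToDevice)

	var h C.cublasHandle_t
	C.cublasCreate(&h)
	defer C.cublasDestroy(h)

	a := C.float(alpha)
	C.cublasSaxpy(h, C.int(n), &a, (*C.float)(dx), 1, (*C.float)(dy), 1)

	// Copying back on the default stream also synchronizes with the kernel.
	C.cudaMemcpy(unsafe.Pointer(&y[0]), dy, size, C.cudaMemcpyDeviceToHost)
}

func main() {
	x := []float32{1, 2, 3}
	y := []float32{10, 20, 30}
	saxpy(2, x, y)
	fmt.Println(y) // [12 24 36]
}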

If I were going to spend the time on this, I would probably target
SPIR-V with Vulkan or OpenCL for portability rather than Nvidia's PTX
or Apple's Metal. AMD, Nvidia and Google (Android) all have good
support for SPIR-V through their Vulkan stacks, and there are
third-party Vulkan layers that run on macOS.

- Michael Poole

David Riley

Jun 25, 2021, 10:20:58 PM
to Robert Engels, Marcin Romaszewicz, Nikolay Dubina, golang-nuts
On Jun 25, 2021, at 1:32 PM, Robert Engels <ren...@ix.netcom.com> wrote:
>
> Why not develop a Go <> CUDA binding using CGo?

This (ditto for OpenCL, Vulkan, etc.) is more likely the path you'll have to go down. Generally, all of these interfaces rely on pretty massive libraries from NVIDIA, AMD, Intel, etc., which are only going to have a C ABI, because basically every other language on the planet uses the C ABI (with exceptions for managed languages like Java or Python, which have adaptations such as JNI that bridge to the C ABI; CGo fills that role here).

You're not going to get a non-CGo interface to any mainstream GPU programming interface unless you happen to have enough money to convince Intel, AMD and NVIDIA to go to the trouble of writing them, at which point you will also be involving massive industry committees. You don't want that.

CGo is an acceptable tradeoff here, IMO, because the overhead of bus transactions to the GPU is likely to be much worse than any overhead CGo adds unless you're doing something terribly wrong. You want large, infrequent transactions that you can queue up, otherwise you'll eat up all your time waiting for transactions anyway.

Consider that plenty of high-performance GPU computation is done using plain Python interfaced to CUDA. The interface to the library isn't what matters; it's what you do with it.

These interfaces already exist in various states of maintenance, FWIW:

https://github.com/gorgonia/cu (last updated a year or so ago)
https://github.com/rafaelescrich/cuda-go (last updated about half a year ago)


- Dave

Robert Engels

Jun 25, 2021, 10:24:43 PM
to Michael Poole, Nikolay Dubina, golang-nuts
There is also a LOT of support for Java with CUDA/OpenCL. You could essentially reimplement the Java portion in Go. There are multiple open-source projects in this area.

Might be a lot easier than starting from scratch.


David Riley

Jun 25, 2021, 10:25:41 PM
to Robert Engels, Michael Poole, Nikolay Dubina, golang-nuts

Yes, CGo is going to be similar to JNI in this context.


- Dave

Robert Engels

Jun 25, 2021, 10:31:47 PM
to David Riley, Michael Poole, Nikolay Dubina, golang-nuts
Agreed. (Didn’t see your previous message before I sent that).


Kevin Chadwick

Jun 27, 2021, 11:37:48 AM
to golang-nuts
Didn't AMD and Intel open-source their drivers? Or are you talking about firmware here?

I thought that's why OpenBSD can run well with them but not with Nvidia hardware?