CoreOS and nVidia


Сергей Дьяченко

Jun 6, 2014, 6:48:03 AM
to coreo...@googlegroups.com
Hello CoreOS team,

I want to run nVidia CUDA apps in Docker, so I need to install the nVidia driver, which includes a kernel module. How can I compile the kernel module and add it to CoreOS?

Brandon Philips

Jun 6, 2014, 2:56:18 PM
to Сергей Дьяченко, coreos-user
On Fri, Jun 6, 2014 at 3:48 AM, Сергей Дьяченко <diache...@gmail.com> wrote:
> I want to run nVidia CUDA apps in Docker, so I need to install the nVidia
> driver, which includes a kernel module. How can I compile the kernel module
> and add it to CoreOS?

Do you know which kernel driver is needed to support CUDA? We could
add it to our kernel if you want to test it.

Thanks,

Brandon

Seán C. McCord

Jun 6, 2014, 3:39:32 PM
to coreos-user
Unfortunately, CUDA is nVidia's proprietary interface.  It requires an external kernel module from nVidia.






--
Seán C. McCord
ule...@gmail.com
CyCore Systems

Сергей Дьяченко

Jun 7, 2014, 3:42:57 AM
to coreo...@googlegroups.com, diache...@gmail.com
CUDA support requires nvidia's proprietary GPU driver, which consists of a kernel module and shared libraries. I think I can place the shared libraries inside the Docker container, but I don't understand what to do with the kernel module, which depends on the kernel version.

On Friday, June 6, 2014 at 22:56:18 UTC+4, Brandon Philips wrote:

Michael Marineau

Jun 7, 2014, 11:49:54 AM
to Сергей Дьяченко, coreos-user
I am slowly working out the kinks in a CoreOS dev container that will
provide the kernel source and a full toolchain so it is possible to
build external kernel modules. Not quite ready yet though.

On Sat, Jun 7, 2014 at 12:42 AM, Сергей Дьяченко
<diache...@gmail.com> wrote:
> CUDA support requires nvidia's proprietary GPU driver, which consists of a
> kernel module and shared libraries. I think I can place the shared libraries
> inside the Docker container, but I don't understand what to do with the
> kernel module, which depends on the kernel version.
>
> On Friday, June 6, 2014 at 22:56:18 UTC+4, Brandon Philips
> wrote:
>>
>> On Fri, Jun 6, 2014 at 3:48 AM, Сергей Дьяченко <diache...@gmail.com>
>> wrote:
>> > I want to run nVidia CUDA apps in Docker, so I need to install the nVidia
>> > driver, which includes a kernel module. How can I compile the kernel
>> > module and add it to CoreOS?
>>
>> Do you know which kernel driver is needed to support CUDA? We could
>> add it to our Kernel if you want to test it.
>>
>> Thanks,
>>
>> Brandon
>

Сергей Дьяченко

Jun 10, 2014, 7:44:25 AM
to coreo...@googlegroups.com, diache...@gmail.com
So, I was able to build the kernel module and load it into CoreOS. But when CoreOS updates and the kernel version changes, the nvidia module stops working. What should I do to prevent this?

On Saturday, June 7, 2014 at 19:49:54 UTC+4, Michael Marineau wrote:

Brandon Philips

Jun 10, 2014, 11:50:07 AM
to Сергей Дьяченко, coreos-user

For now you can stop update-engine. The best bet moving forward is to automate the build on reboot. When we have an SDK container it should be easier.
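(For reference, update-engine runs as a regular systemd unit, so a minimal sketch of "stop it for now" would be:)

$ sudo systemctl stop update-engine.service
$ sudo systemctl mask update-engine.service   # optional: keep it from starting again after a reboot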

Xiaoyun Wu

Aug 10, 2014, 12:20:00 AM
to coreo...@googlegroups.com, diache...@gmail.com
Glad you made it work. Can you share a detailed howto so that others can follow?

Thanks. 

Xiaoyun

Сергей Дьяченко

Aug 14, 2014, 9:32:46 AM
to coreo...@googlegroups.com, diache...@gmail.com
You may try this:

1. Create a Docker container from the ubuntu:12.04 image. The container must be run in privileged mode (see the sketch below).
2. Get the kernel source for your CoreOS version from https://github.com/coreos/linux
3. Install the necessary tools (make, gcc...)
4. Download the nvidia driver from the nvidia site.
5. Run the driver install file with the --kernel-source-path parameter.
6. And you may try running your CUDA app :)
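
A rough sketch of steps 1 and 3 (the container name here is just an example):

$ docker run -it --privileged --name nvidia-build ubuntu:12.04 /bin/bash
# inside the container:
$ apt-get update && apt-get install -y make gcc git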

On Sunday, August 10, 2014 at 8:20:00 UTC+4, Xiaoyun Wu wrote:

Andrey Yankin

Sep 27, 2014, 7:54:05 AM
to coreo...@googlegroups.com, diache...@gmail.com
Hi!

I am trying to install the Nvidia driver on CoreOS 444.2.0 (kernel 3.16.2), following Сергей's instructions.
My problem is that the `nvidia` module depends on `drm`, which in turn depends on `agp` and `dma shared buffer` (and `hdmi`).
Am I right to assume that to get the DMA Buffer Sharing Framework working, I need a new (recompiled) kernel?

For example I get these messages in my log:
server kernel: drm: Unknown symbol dma_buf_get (err 0)
server kernel: drm: Unknown symbol dma_buf_put (err 0)
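
(For reference, one way to check whether the running kernel was built with DMA-BUF support and exports these symbols, using the same /proc interfaces mentioned elsewhere in this thread:)

$ zcat /proc/config.gz | grep DMA_SHARED_BUFFER
$ grep -w dma_buf_get /proc/kallsyms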

How can I install the nvidia driver on CoreOS?

Сергей Дьяченко

Oct 1, 2014, 3:29:46 AM
to coreo...@googlegroups.com, diache...@gmail.com
I don't have any problem with the latest stable version of CoreOS (410.1.0).
Could you try it?

On Saturday, September 27, 2014 at 15:54:05 UTC+4, Andrey Yankin wrote:

Traun Leyden

Oct 26, 2014, 5:17:21 PM
to coreo...@googlegroups.com, diache...@gmail.com

I was able to get a Docker container talking to the GPU with Ubuntu as the host OS, but I still need to figure it out with CoreOS.

If it helps anyone, I wrote a blog post with the details here:

Traun Leyden

Nov 4, 2014, 10:32:52 AM
to coreo...@googlegroups.com, diache...@gmail.com


On Thursday, August 14, 2014 6:32:46 AM UTC-7, Сергей Дьяченко wrote:
You may try this:

1. Create a Docker container from the ubuntu:12.04 image. The container must be run in privileged mode.
2. Get the kernel source for your CoreOS version from https://github.com/coreos/linux
3. Install the necessary tools (make, gcc...)
4. Download the nvidia driver from the nvidia site.
5. Run the driver install file with the --kernel-source-path parameter.
6. And you may try running your CUDA app :)

Are you saying that steps 2-6 should be done inside the container, or on the CoreOS host?

I've gotten nvidia cuda drivers working from within a Docker container, but my approach was to install the drivers on the host OS (ubuntu in my case), then fire up a docker container with:

--device /dev/nvidia0:/dev/nvidia0

which then made the device available from within the container. Inside the container, it wasn't necessary to install the nvidia kernel module, because that had already been done on the host OS.

It sounds like you are suggesting:

CoreOS: no kernel module or nvidia cuda drivers
Docker container (Ubuntu): nvidia kernel module + cuda drivers

Is that correct?

Traun Leyden

Nov 4, 2014, 12:07:31 PM
to coreo...@googlegroups.com, diache...@gmail.com

Сергей,

So I gave this a shot, going off the assumption that steps 2-6 are supposed to happen inside the container.

However, when I tried to run the driver install file, I got an error about not being able to find the /usr/src/kernels/linux/include/linux/version.h file.

So to fix that error, I tried compiling the kernel with the CoreOS kernel config file here.

That didn't work, so then I tried compiling the kernel by running "make menuconfig" and leaving everything as the default.

But when I tried to run the driver install, I got the error:

ERROR: Unable to load the kernel module 'nvidia.ko'. This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if a driver such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA graphics device(s), or no NVIDIA GPU installed in this system is supported by this NVIDIA Linux graphics driver release. Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.

Here is a step-by-step of exactly what I did, in case anyone knows how to get past this error.

Сергей Дьяченко

Nov 5, 2014, 6:12:14 AM
to coreo...@googlegroups.com, diache...@gmail.com
You are right, steps 2-6 should be done inside the container.

Docker shares the kernel (and modules) between CoreOS and its containers. So you can load modules from CoreOS or from any container (in privileged mode); I believe it's the same.
I prefer to install the nvidia driver and kernel module inside the container.

Some notes about your article: I chose ubuntu 12.04 because it ships gcc 4.6. The CoreOS kernel is built with gcc 4.6, and modules for that kernel should be compiled with gcc 4.6 as well. If you really need gcc 4.7, you should upgrade it after the driver installation.
I think that's the main reason for your error.

I use the following steps for preparing kernel sources:

Clone the CoreOS kernel repository (as in your article):
$ mkdir -p /usr/src/kernels
$ cd /usr/src/kernels
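(The clone command itself is not shown above; using the repository URL mentioned earlier in the thread, it would be something like:)
$ git clone https://github.com/coreos/linux.git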

Get CoreOS kernel version:
$ uname -a

Switch to the branch with this version:
$ git checkout remotes/origin/coreos/<kernel_version>

Create kernel configuration file:
$ zcat /proc/config.gz > /usr/src/kernels/linux/.config

Prepare kernel source for building modules:
$ cd /usr/src/kernels/linux
$ make modules_prepare

After that you can extract the nvidia driver and install it with the --kernel-source-path parameter.
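
For example (the driver version below is only illustrative):

$ ./NVIDIA-Linux-x86_64-340.29.run --extract-only
$ cd NVIDIA-Linux-x86_64-340.29
$ ./nvidia-installer --kernel-source-path=/usr/src/kernels/linux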

On Tuesday, November 4, 2014 at 20:07:31 UTC+3, Traun Leyden wrote:

Traun Leyden

Nov 5, 2014, 10:10:38 AM
to coreo...@googlegroups.com, diache...@gmail.com

I think we might be getting different results because we are using different versions.

I was using this AMI: 

CoreOS-stable-444.5.0-hvm - ami-d878c3b0
CoreOS stable 444.5.0 (HVM)

But have recently found this one:

CoreOS-stable-410.1.0-hvm - ami-7c8b3f14
CoreOS stable 410.1.0 (HVM)
 
Which I think is the same version that you used.

The only reason I upgraded to gcc 4.7 was that at some point, when trying to run the Nvidia installer, it gave me an error telling me that the kernel was compiled with gcc 4.7 (so I take it CoreOS 444.5.0 was built with gcc 4.7).

I'm going to try again with CoreOS stable 410.1.0 + your instructions.

Thanks!

Сергей Дьяченко

Nov 5, 2014, 10:20:16 AM
to coreo...@googlegroups.com, diache...@gmail.com
Hmmm... I use CoreOS 410.2.0. Probably the kernel in the latest stable version was compiled with gcc 4.7. I'll try to install the nvidia driver on the latest stable version of CoreOS. Please write about your results!

On Wednesday, November 5, 2014 at 18:10:38 UTC+3, Traun Leyden wrote:

Traun Leyden

Nov 5, 2014, 11:18:10 AM
to coreo...@googlegroups.com, diache...@gmail.com

I got it working and wrote up a blog post with all the steps:

CoreOS With Nvidia CUDA GPU Drivers

http://tleyden.github.io/blog/2014/11/04/coreos-with-nvidia-cuda-gpu-drivers/

Thanks again for your help, Сергей. If you get it working on a more recent version of CoreOS, please post it.

Сергей Дьяченко

Nov 5, 2014, 2:04:23 PM
to coreo...@googlegroups.com, diache...@gmail.com
So, the kernel in recent CoreOS is built with gcc 4.7. The previous instructions worked fine, but I used a container based on ubuntu 14.04 instead of ubuntu 12.04 and installed gcc 4.7 in it. All other steps are the same.
I believe a container based on ubuntu 12.04 with gcc 4.7 would work too, but I haven't checked it.
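
A sketch of the gcc part, assuming Ubuntu 14.04's gcc-4.7 package:

$ apt-get update && apt-get install -y gcc-4.7
$ update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.7 60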

On Wednesday, November 5, 2014 at 19:18:10 UTC+3, Traun Leyden wrote:

Traun Leyden

Nov 5, 2014, 2:33:06 PM
to coreo...@googlegroups.com, diache...@gmail.com

Yeah, I got it working with a more recent CoreOS too:

* CoreOS-alpha-490.0.0-hvm
* Ubuntu 14.04
* Gcc 4.7

Also I updated my blog post to show the different steps needed for CoreOS 490.0.0.

The nice thing about this is that since CoreOS alpha is using Docker 1.3, you can create device nodes in the CoreOS instance, and expose them to other docker containers running under the same instance:

1. In CoreOS, run this shell script (a sketch of what such a script does appears below).
2. Kick off a docker container and map the nvidia devices, e.g.:

$ sudo docker run -ti --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia-uvm:/dev/nvidia-uvm tleyden5iwx/ubuntu-cuda /bin/bash

Now this container will have access to the nvidia devices.
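
(The linked script isn't reproduced here; scripts like it typically create the nvidia device nodes by hand. A sketch — the nvidia module registers character major 195, while nvidia-uvm gets a dynamic major that can be read from /proc/devices:)

$ sudo mknod -m 666 /dev/nvidia0 c 195 0
$ sudo mknod -m 666 /dev/nvidiactl c 195 255
$ sudo mknod -m 666 /dev/nvidia-uvm c $(grep nvidia-uvm /proc/devices | awk '{print $1}') 0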

I think there might even be a way to automate this a bit more with a script, rather than following all these instructions. I believe the nvidia installers support a "-y" flag or something that lets you answer all the questions with the default values.

Traun Leyden

Dec 9, 2014, 12:53:15 PM
to coreo...@googlegroups.com, diache...@gmail.com

I just tried it on CoreOS stable 494.4.0 (HVM) and ran into this error when trying to build the nvidia module:


It seemed to stem from the fact that uname -a returns "3.17.2+", but the version in /usr/src/kernels/linux/include/generated/utsrelease.h is "3.17.2".

As a workaround, I updated utsrelease.h to have "3.17.2+", which allowed the nvidia module to build.
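
(i.e., something along these lines:)

$ sed -i 's/3\.17\.2/3.17.2+/' /usr/src/kernels/linux/include/generated/utsrelease.h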

Anyone know why the version in utsrelease.h doesn't match what's returned from uname -a?

The full set of steps of what I'm doing is here.

Alex Crawford

Dec 10, 2014, 6:00:15 PM
to Traun Leyden, coreos-user, diache...@gmail.com
The '+' is an artifact of CoreOS's build system. Basically, the Linux build system sees that the source has been cloned with git rather than unpacked from a source tarball, and tosses the '+' on the end of the version to indicate that the sources may have been modified. In CoreOS's case, the sources haven't been modified; we just haven't gotten around to addressing this issue yet.
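
(For anyone who wants to avoid patching utsrelease.h by hand: the suffix comes from scripts/setlocalversion, which skips its git check when a .scmversion file exists in the source tree, so dropping an empty one there before running make modules_prepare should also work:)

$ touch /usr/src/kernels/linux/.scmversion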

-Alex

Luke Benson

Jan 9, 2016, 1:15:14 AM
to CoreOS User
Might be a bit late to the party, but we published some code (and a blog post) yesterday addressing this deployment scenario. The setup includes loading the Nvidia drivers and registering the devices, as well as running a docker container to perform a simple deep-learning task via TensorFlow.

Hope this helps:

Brian Harrington

Jan 13, 2016, 3:02:39 AM
to Luke Benson, CoreOS User
Luke,

This is a great write-up, and I especially like the checkout of the kernel source and the pivot based on the running version. This is squarely categorized under "I've been meaning to do that", but you've nailed it.

I'm sure a lot of people will find this design pattern helpful!

--redbeard

Luke Benson

Jan 14, 2016, 7:12:04 PM
to CoreOS User, vik...@gmail.com
Hi Brian,

Thanks for the positive endorsement. Glad you like it. I can't take much of the credit - mike....@emergingstack.com is the man who worked this out. 

We're still experimenting and, if it's OK with you, will reach out to you when we need some expert guidance. 

Thanks again,

Luke

Gopinath Taget

Feb 29, 2016, 7:27:12 PM
to CoreOS User, vik...@gmail.com
Hi Luke,

Is there any way to build and install the nvidia CUDA drivers natively on CoreOS, or do they have to be in a Ubuntu docker container inside CoreOS? I am trying to install the CUDA drivers on CoreOS on AWS.

Thanks
Gopinath

Joshua Kolden

Feb 29, 2016, 8:39:49 PM
to Gopinath Taget, CoreOS User, vik...@gmail.com
Well, it definitely doesn't have to be in an Ubuntu container. I had stuck with that early on as a convenience for quick builds and deploys, but you should be able to get away with as little as modprobe and the compiled module in a container.

I don't see any issue with it being in a container. If anything, it makes it more pluggable and easier to manage in different hardware contexts with the same everyday docker/systemd tools. Perhaps someone at CoreOS can weigh in on other driver deployment strategies, but my understanding was that modprobe from privileged docker containers was the way.
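
A minimal sketch of that approach (the host path, image, and module location are all illustrative, and the image is assumed to ship insmod):

$ sudo docker run --rm --privileged -v /opt/nvidia-driver:/drv ubuntu:14.04 insmod /drv/nvidia.ko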

sha...@gmail.com

Feb 29, 2016, 9:21:11 PM
to Joshua Kolden, Gopinath Taget, CoreOS User, vik...@gmail.com
I think the sysdig/sysdig docker container is a good model: it uses DKMS to keep the module up to date for new kernels while providing any other tools needed to use it. You could do a lot worse than following their example.

-Blake