Jira (PUP-11588) dnfmodule issues with Nvidia CUDA repository


Konrad Bucheli (Jira)

Jul 6, 2022, 11:32:04 AM
to puppe...@googlegroups.com
Konrad Bucheli created an issue
 
Puppet / Bug PUP-11588
dnfmodule issues with Nvidia CUDA repository
Issue Type: Bug
Assignee: Unassigned
Created: 2022/07/06 8:31 AM
Priority: Normal
Reporter: Konrad Bucheli

Puppet Version: 7.17.0
Puppet Server Version: 7.2.1
OS Name/Version: RedHat 8

From the Nvidia CUDA package repository I changed the nvidia-driver module between different streams and found two issues:

  1. Stream named "latest" is not installed with "ensure => 'latest'"
  2. Cannot deal with conflicting packages when changing from one stream to another

Desired Behavior:

  1. If there is a stream "latest" and it is demanded with "ensure => 'latest'", it shall be installed
  2. Automatically deal with or allow to deal (e.g. using install_options) with conflicting packages when changing module stream

Actual Behavior:

Check that no modules are installed:

[root@pt86test ~]# dnf module list --installed
Last metadata expiration check: 2:31:22 ago on Mi 06 Jul 2022 13:40:09 CEST.
[root@pt86test ~]# 

install stream '470':

package { 'nvidia-driver':
    ensure   => '470',
    provider => 'dnfmodule',
}

puppet agent run successful:

Notice: /Stage[main]/Profile::Nvidia::Cuda/Package[nvidia-driver]/ensure: created (corrective)
...
[root@pt86test ~]# dnf module list --installed
Last metadata expiration check: 2:48:12 ago on Mi 06 Jul 2022 13:40:09 CEST.
CUDA and drivers from Nvidia
Name                                                  Stream                                         Profiles                                                            Summary                                                            
nvidia-driver                                         470 [e]                                        default [d] [i], fm, ks, src                                        Nvidia driver for 470 branch                                       
Hint: [d]efault, [e]nabled, [x]disabled, [i]nstalled
[root@pt86test ~]# 

 
Now let's bump it to stream '510':

package { 'nvidia-driver':
    ensure   => '510',
    provider => 'dnfmodule',
}

perfect again:

Notice: /Stage[main]/Profile::Nvidia::Cuda/Package[nvidia-driver]/ensure: ensure changed '470' to '510'
...
[root@pt86test ~]# dnf module list --installed
Last metadata expiration check: 2:53:26 ago on Mi 06 Jul 2022 13:40:09 CEST.
CUDA and drivers from Nvidia
Name                                                  Stream                                         Profiles                                                            Summary                                                            
nvidia-driver                                         510 [e]                                        default [d] [i], fm, ks, src                                        Nvidia driver for 510 branch                                       
Hint: [d]efault, [e]nabled, [x]disabled, [i]nstalled
[root@pt86test ~]# 

 
now let's try stream 'latest'

package { 'nvidia-driver':
    ensure   => 'latest',
    provider => 'dnfmodule',
}

it does not do anything!

[root@pt86test ~]# dnf module list --installed
Last metadata expiration check: 2:56:21 ago on Mi 06 Jul 2022 13:40:09 CEST.
CUDA and drivers from Nvidia
Name                                                  Stream                                         Profiles                                                            Summary                                                            
nvidia-driver                                         510 [e]                                        default [d] [i], fm, ks, src                                        Nvidia driver for 510 branch                                       
Hint: [d]efault, [e]nabled, [x]disabled, [i]nstalled
[root@pt86test ~]# 

 
I guess "latest" has a special meaning to Puppet and is not treated as a stream name.

OK, let's try now stream '510-dkms':

package { 'nvidia-driver':
    ensure   => '510-dkms',
    provider => 'dnfmodule',
}

but here it is not happy at all:

Error: /Stage[main]/Profile::Nvidia::Cuda/Package[nvidia-driver]/ensure: change from 'purged' to '510-dkms' failed: Could not update: Execution of '/usr/bin/dnf module install -d 0 -e 1 -y nvidia-driver:510-dkms' returned 1: Error: 
 Problem: problem with installed package kmod-nvidia-510.73.08-4.18.0-372.9.1-3:510.73.08-3.el8.x86_64
  - package kmod-nvidia-510.73.08-4.18.0-372.9.1-3:510.73.08-3.el8.x86_64 conflicts with kmod-nvidia-latest-dkms provided by kmod-nvidia-latest-dkms-3:510.73.08-1.el8.x86_64
  - conflicting requests (corrective)

A manual dnf run gives me the following hint:

(try to add '--allowerasing' to command line to replace conflicting packages or '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)

So '--allowerasing' it shall be:

package { 'nvidia-driver':
    ensure          => '510-dkms',
    provider        => 'dnfmodule',
    install_options => ['--allowerasing'],
}

Exactly the same error, so install_options seems not to be supported:

Error: /Stage[main]/Profile::Nvidia::Cuda/Package[nvidia-driver]/ensure: change from 'purged' to '510-dkms' failed: Could not update: Execution of '/usr/bin/dnf module install -d 0 -e 1 -y nvidia-driver:510-dkms' returned 1: Error: 
 Problem: problem with installed package kmod-nvidia-510.73.08-4.18.0-372.9.1-3:510.73.08-3.el8.x86_64
  - package kmod-nvidia-510.73.08-4.18.0-372.9.1-3:510.73.08-3.el8.x86_64 conflicts with kmod-nvidia-latest-dkms provided by kmod-nvidia-latest-dkms-3:510.73.08-1.el8.x86_64
  - conflicting requests (corrective)

Manual install with it works fine:

[root@pt86test ~]# dnf module install --allowerasing nvidia-driver:510-dkms
...
Installed:
  dkms-3.0.4-1.el8.noarch                                         elfutils-libelf-devel-0.186-1.el8.x86_64                                         kmod-nvidia-latest-dkms-3:510.73.08-1.el8.x86_64                                        
Removed:
  kmod-nvidia-510.73.08-4.18.0-372.9.1-3:510.73.08-3.el8.x86_64                                                                                                                                                                             
 
Complete!
[root@pt86test ~]# 
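
To make the second expectation concrete, here is a minimal sketch of how install_options could be spliced into the command line seen in the error above. This is not the actual dnfmodule provider code: the helper name build_module_install_command is hypothetical, and the sketch assumes install_options is a plain array of strings.

```ruby
# Hypothetical helper, NOT Puppet's real provider code.
# The base arguments are copied from the command shown in the error message.
DNF = '/usr/bin/dnf'.freeze

# Build the argument list for installing a module stream.
# install_options (assumed to be an array of strings) is placed
# before the module spec, as in the manual run that worked.
def build_module_install_command(name, stream, install_options = [])
  spec = stream ? "#{name}:#{stream}" : name
  [DNF, 'module', 'install', '-d', '0', '-e', '1', '-y',
   *Array(install_options), spec]
end

puts build_module_install_command(
  'nvidia-driver', '510-dkms', ['--allowerasing']
).join(' ')
```

With an empty install_options the result is the same command that Puppet runs today, so the change would only affect resources that set the attribute.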

And for completeness to show that there is actually a 'latest' stream:

[root@pt86test ]# dnf module install --allowerasing nvidia-driver:latest
...
[root@pt86test ~]# dnf module list --installed
Last metadata expiration check: 0:01:02 ago on Mi 06 Jul 2022 17:28:41 CEST.
CUDA and drivers from Nvidia
Name                                                Stream                                           Profiles                                                           Summary                                                             
nvidia-driver                                       latest [e]                                       default [d] [i], fm, ks, src                                       Nvidia driver for latest branch                                     
 
Hint: [d]efault, [e]nabled, [x]disabled, [i]nstalled
[root@pt86test ~]# 

 

 
This message was sent by Atlassian Jira (v8.20.2#820002-sha1:829506d)

Konrad Bucheli (Jira)

Jul 6, 2022, 11:34:01 AM
to puppe...@googlegroups.com
Konrad Bucheli updated an issue
Change By: Konrad Bucheli
*Puppet Version: 7.17.0*
*Puppet Server Version: 7.2.1*
*OS Name/Version: RedHat 8.6*

From the [Nvidia CUDA package repository|http://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/] I changed the {{nvidia-driver}} module between different streams and found two issues:
# Stream named "{{latest}}" is not installed with "{{ensure => 'latest'}}"
# Cannot deal with conflicting packages when changing from one stream to another

*Desired Behavior:*
# If there is a stream "{{latest}}" and it is demanded with "{{ensure => 'latest'}}", it shall be installed
# Automatically deal with or allow to deal (e.g. using {{install_options}}) with conflicting packages when changing module stream

*Actual Behavior:*


Check that no modules are installed:
{noformat}

[root@pt86test ~]# dnf module list --installed
Last metadata expiration check: 2:31:22 ago on Mi 06 Jul 2022 13:40:09 CEST.
[root@pt86test ~]#
{noformat}
install stream '{{470}}':
{noformat}

package { 'nvidia-driver':
    ensure   => '470',
    provider => 'dnfmodule',
}
{noformat}

puppet agent run successful:
{noformat}

Notice: /Stage[main]/Profile::Nvidia::Cuda/Package[nvidia-driver]/ensure: created (corrective)
...
[root@pt86test ~]# dnf module list --installed
Last metadata expiration check: 2:48:12 ago on Mi 06 Jul 2022 13:40:09 CEST.
CUDA and drivers from Nvidia
Name                                                  Stream                                         Profiles                                                            Summary                                                            
nvidia-driver                                         470 [e]                                        default [d] [i], fm, ks, src                                        Nvidia driver for 470 branch                                       
Hint: [d]efault, [e]nabled, [x]disabled, [i]nstalled
[root@pt86test ~]# 
{noformat}
 
Now let's bump it to stream '{{510}}':
{noformat}

package { 'nvidia-driver':
    ensure   => '510',
    provider => 'dnfmodule',
}
{noformat}

perfect again:
{noformat}

Notice: /Stage[main]/Profile::Nvidia::Cuda/Package[nvidia-driver]/ensure: ensure changed '470' to '510'
...
[root@pt86test ~]# dnf module list --installed
Last metadata expiration check: 2:53:26 ago on Mi 06 Jul 2022 13:40:09 CEST.
CUDA and drivers from Nvidia
Name                                                  Stream                                         Profiles                                                            Summary                                                            
nvidia-driver                                         510 [e]                                        default [d] [i], fm, ks, src                                        Nvidia driver for 510 branch                                       
Hint: [d]efault, [e]nabled, [x]disabled, [i]nstalled
[root@pt86test ~]# 
{noformat}
 
now let's try stream '{{latest}}'
{noformat}

package { 'nvidia-driver':
    ensure   => 'latest',
    provider => 'dnfmodule',
}
{noformat}


it does not do anything!
{noformat}

[root@pt86test ~]# dnf module list --installed
Last metadata expiration check: 2:56:21 ago on Mi 06 Jul 2022 13:40:09 CEST.
CUDA and drivers from Nvidia
Name                                                  Stream                                         Profiles                                                            Summary                                                            
nvidia-driver                                         510 [e]                                        default [d] [i], fm, ks, src                                        Nvidia driver for 510 branch                                       
Hint: [d]efault, [e]nabled, [x]disabled, [i]nstalled
[root@pt86test ~]# 
{noformat}

 
I guess "latest" has a special meaning to Puppet and is not treated as a stream name.

OK, let's try now stream '{{510-dkms}}':
{noformat}

package { 'nvidia-driver':
    ensure   => '510-dkms',
    provider => 'dnfmodule',
}
{noformat}


but here it is not happy at all:
{noformat}

Error: /Stage[main]/Profile::Nvidia::Cuda/Package[nvidia-driver]/ensure: change from 'purged' to '510-dkms' failed: Could not update: Execution of '/usr/bin/dnf module install -d 0 -e 1 -y nvidia-driver:510-dkms' returned 1: Error: 
 Problem: problem with installed package kmod-nvidia-510.73.08-4.18.0-372.9.1-3:510.73.08-3.el8.x86_64
  - package kmod-nvidia-510.73.08-4.18.0-372.9.1-3:510.73.08-3.el8.x86_64 conflicts with kmod-nvidia-latest-dkms provided by kmod-nvidia-latest-dkms-3:510.73.08-1.el8.x86_64
  - conflicting requests (corrective)
{noformat}


A manual {{dnf}} run gives me the following hint:
{noformat}

(try to add '--allowerasing' to command line to replace conflicting packages or '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
{noformat}


So '{{--allowerasing}}' it shall be:
{noformat}

package { 'nvidia-driver':
    ensure          => '510-dkms',
    provider        => 'dnfmodule',
    install_options => ['--allowerasing'],
}
{noformat}


Exactly the same error, so {{install_options}} seems not to be supported:
{noformat}

Error: /Stage[main]/Profile::Nvidia::Cuda/Package[nvidia-driver]/ensure: change from 'purged' to '510-dkms' failed: Could not update: Execution of '/usr/bin/dnf module install -d 0 -e 1 -y nvidia-driver:510-dkms' returned 1: Error: 
 Problem: problem with installed package kmod-nvidia-510.73.08-4.18.0-372.9.1-3:510.73.08-3.el8.x86_64
  - package kmod-nvidia-510.73.08-4.18.0-372.9.1-3:510.73.08-3.el8.x86_64 conflicts with kmod-nvidia-latest-dkms provided by kmod-nvidia-latest-dkms-3:510.73.08-1.el8.x86_64
  - conflicting requests (corrective)
{noformat}


Manual install with it works fine:
{noformat}

[root@pt86test ~]# dnf module install --allowerasing nvidia-driver:510-dkms
...
Installed:
  dkms-3.0.4-1.el8.noarch                     elfutils-libelf-devel-0.186-1.el8.x86_64                     kmod-nvidia-latest-dkms-3:510.73.08-1.el8.x86_64                    
Removed:
  kmod-nvidia-510.73.08-4.18.0-372.9.1-3:510.73.08-3.el8.x86_64                    

Complete!
[root@pt86test ~]#
{noformat}


And for completeness to show that there is actually a '{{latest}}' stream:
{noformat}

[root@pt86test ]# dnf module install --allowerasing nvidia-driver:latest
...
[root@pt86test ~]# dnf module list --installed
Last metadata expiration check: 0:01:02 ago on Mi 06 Jul 2022 17:28:41 CEST.
CUDA and drivers from Nvidia
Name                     Stream                     Profiles                     Summary                    
nvidia-driver                     latest [e]                     default [d] [i], fm, ks, src                     Nvidia driver for latest branch                    

Hint: [d]efault, [e]nabled, [x]disabled, [i]nstalled
[root@pt86test ~]#
{noformat}
 

Lisa Ross (Jira)

Jul 28, 2022, 10:07:01 AM
to puppe...@googlegroups.com

Lisa Ross (Jira)

Jul 28, 2022, 10:07:03 AM
to puppe...@googlegroups.com

Lisa Ross (Jira)

Jul 28, 2022, 10:08:02 AM
to puppe...@googlegroups.com
Lisa Ross updated an issue
Change By: Lisa Ross
Sprint: Phoenix 2022-08-17

Nirupama Mantha (Jira)

Jul 28, 2022, 11:25:03 AM
to puppe...@googlegroups.com
Nirupama Mantha updated an issue
Change By: Nirupama Mantha
Acceptance Criteria: It should be possible to install from a stream named "latest".

It should be possible to install a new version of a package from a different stream that conflicts with an installed package.

Nirupama Mantha (Jira)

Jul 28, 2022, 11:26:03 AM
to puppe...@googlegroups.com

Christopher Thorn (Jira)

Aug 15, 2022, 6:12:03 PM
to puppe...@googlegroups.com

Christopher Thorn (Jira)

Aug 17, 2022, 12:13:02 PM
to puppe...@googlegroups.com
Christopher Thorn updated an issue
Change By: Christopher Thorn
Sprint: Phoenix 2022-08-17, Phoenix 2022-08-31

Christopher Thorn (Jira)

Aug 17, 2022, 1:19:01 PM
to puppe...@googlegroups.com
Christopher Thorn assigned an issue to Unassigned
Change By: Christopher Thorn
Assignee: Christopher Thorn

Aria Li (Jira)

Aug 17, 2022, 6:57:02 PM
to puppe...@googlegroups.com
Aria Li assigned an issue to Aria Li
Change By: Aria Li
Assignee: Aria Li

Josh Cooper (Jira)

Aug 22, 2022, 7:30:03 PM
to puppe...@googlegroups.com
Josh Cooper commented on Bug PUP-11588
 
Re: dnfmodule issues with Nvidia CUDA repository

Adding some debug notes.

The Puppet::Parameter#newvalues method adds possible values to a ValueCollection. For example, the package type defines :latest as a newvalue https://github.com/puppetlabs/puppet/blob/832424710993567a0bdc942cddae9b32d66d7e4d/lib/puppet/type/package.rb#L116

Puppet handles "ensure => latest" (bare word) the same as "ensure => 'latest'" (string) because the parameter value is munged (canonicalized) https://github.com/puppetlabs/puppet/blob/832424710993567a0bdc942cddae9b32d66d7e4d/lib/puppet/parameter/value_collection.rb#L102 which ends up calling https://github.com/puppetlabs/puppet/blob/832424710993567a0bdc942cddae9b32d66d7e4d/lib/puppet/parameter/value.rb#L79-L82

So due to the canonicalization, it's not possible to differentiate between wanting to upgrade to the latest version versus installing a version whose value is the literal string "latest".
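
The effect can be illustrated with a small stand-alone sketch. It does not use Puppet's real ValueCollection; TinyValueCollection below is a hypothetical stand-in that only mimics the behavior described above, where a string that matches a registered value is turned into the same symbol as the bare word.

```ruby
# Hypothetical stand-in for a parameter value collection.
# It is NOT Puppet's real class; it only mimics the described munging.
class TinyValueCollection
  def initialize
    @known = {}
  end

  # Register allowed values under their symbol form.
  def newvalues(*names)
    names.each { |n| @known[n.to_s.to_sym] = true }
  end

  # Canonicalize: a value whose string form matches a registered
  # name becomes that symbol; anything else passes through unchanged.
  def munge(value)
    key = value.to_s.to_sym
    @known.key?(key) ? key : value
  end
end

values = TinyValueCollection.new
values.newvalues(:present, :absent, :latest)

p values.munge('latest')  # quoted string from the manifest
p values.munge(:latest)   # bare word from the manifest
p values.munge('510')     # an ordinary stream name is left alone
```

After munging, the first two calls return the identical symbol, so nothing downstream can tell which spelling the manifest used.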

Aria Li (Jira)

Aug 22, 2022, 8:08:02 PM
to puppe...@googlegroups.com
Aria Li commented on Bug PUP-11588

We are unable to resolve this issue: the dnfmodule provider does not support the :upgradeable feature, since there is no way to determine a latest version of a module stream. Therefore, ensure => latest cannot be used with dnfmodule. For more information, please see this comment.

We were able to reproduce this issue on a RedHat 8 VM with Puppet 7.18.0. 
We began by checking if the latest stream of nvidia-driver could be manually installed:

[root@faulty-believer ~]# sudo dnf module install nvidia-driver:latest
...
Complete!
[root@faulty-believer ~]# dnf module list --installed
cuda-rhel8-x86_64
Name                         Stream                   Profiles                                  
nvidia-driver                latest [e]               fm, src, ks, default [d] [i]              
Hint: [d]efault, [e]nabled, [x]disabled, [i]nstalled
[root@faulty-believer ~]# puppet resource package nvidia-driver
package { 'nvidia-driver':
  ensure   => '3:515.65.01-1.el8',
  provider => 'dnf',
}

Then we checked whether streams 470 and 510 could be installed.

manifest.pp:

package { 'nvidia-driver':
  ensure   => '470',
  provider => 'dnfmodule',
} 

 

[root@faulty-believer ~]# puppet apply manifest.pp 
Notice: /Stage[main]/Main/Package[nvidia-driver]/ensure: ensure changed 'latest' to '470' 
Notice: Applied catalog in 90.60 seconds

manifest.pp:

package { 'nvidia-driver':
  ensure   => '510',
  provider => 'dnfmodule',
}

[root@faulty-believer ~]# puppet apply manifest.pp
Notice: Compiled catalog for faulty-believer.delivery.puppetlabs.net in environment production in 0.31 seconds
Notice: /Stage[main]/Main/Package[nvidia-driver]/ensure: ensure changed '470' to '510'
Notice: Applied catalog in 107.34 seconds

Then we used this manifest (manifest.pp) to attempt to install the latest stream:

package { 'nvidia-driver':
  ensure   => 'latest',
  provider => 'dnfmodule',
}

And got this:

[root@faulty-believer ~]# puppet apply manifest.pp
Notice: Compiled catalog for faulty-believer.delivery.puppetlabs.net in environment production in 0.32 seconds
Notice: Applied catalog in 7.24 seconds
[root@faulty-believer ~]# dnf module list --installed
cuda-rhel8-x86_64
Name                          Stream                  Profiles                                    
nvidia-driver                 510 [e]                 fm, src, ks, default [d] [i]

We tried applying the same manifest with --debug to see what's happening:

[root@faulty-believer ~]# puppet apply --debug manifest.pp
...
Debug: Prefetching dnfmodule resources for package
Debug: Executing: '/usr/bin/dnf --version'
Debug: Executing: '/usr/bin/dnf module list -d 0 -e 1'
Debug: Executing: '/usr/bin/dnf check-update'
Debug: Package[nvidia-driver](provider=dnfmodule): Yum didn't find updates, current version (510) is the latest
Debug: Finishing transaction 12040
Debug: Storing state
Debug: Pruned old state cache entries in 0.00 seconds

Since dnfmodule does not define its own latest method, Puppet calls the one inherited from a superclass of dnfmodule, which finds no updates and reports the current version (510) as the latest.

Puppet treats ensure => 'latest' (string) the same as ensure => latest (bare word). More information can be found here

Konrad Bucheli (Jira)

Aug 23, 2022, 3:11:02 AM
to puppe...@googlegroups.com

Option: dnfmodule could implement its own 'latest' method that installs the stream named 'latest' and fails if there is none, since module streams have no notion of "latest".
There is the "default" stream, but that tends to be the earliest rather than the latest...
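
A rough sketch of that option, just to show the intended decision logic. Nothing here exists in Puppet: stream_for_latest and NoLatestStreamError are hypothetical names, the list of available streams is assumed to have been parsed from dnf already, and the second module name is made up.

```ruby
# Hypothetical sketch of the proposed behavior, NOT existing Puppet code.
class NoLatestStreamError < StandardError; end

# Decide which stream `ensure => latest` should map to.
# available_streams is assumed to be an array of stream names
# already obtained from dnf for the given module.
def stream_for_latest(module_name, available_streams)
  return 'latest' if available_streams.include?('latest')

  raise NoLatestStreamError,
        "module #{module_name} has no stream named 'latest'"
end

puts stream_for_latest('nvidia-driver', ['470', '510', '510-dkms', 'latest'])

begin
  # 'example-module' and its streams are invented for illustration.
  stream_for_latest('example-module', ['1', '2'])
rescue NoLatestStreamError => e
  puts "error: #{e.message}"
end
```

Failing loudly when no such stream exists avoids the current situation where the run reports success but changes nothing.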

Morgan Rhodes (Jira)

Aug 31, 2022, 1:35:03 PM
to puppe...@googlegroups.com
Morgan Rhodes updated an issue
 
Change By: Morgan Rhodes
Sprint: Phoenix 2022-08-17, Phoenix 2022-08-31, Phoenix 2022-09-14

Morgan Rhodes (Jira)

Aug 31, 2022, 1:38:03 PM
to puppe...@googlegroups.com

Pat Riehecky (Jira)

Feb 8, 2023, 3:10:02 PM
to puppe...@googlegroups.com
Pat Riehecky commented on Bug PUP-11588
 
Re: dnfmodule issues with Nvidia CUDA repository

Is there a way to help get a resolution for this prioritized?
