AMD CPU and GPU setting conflict, need BIOS change


Jaeyoung Oh

Jan 11, 2023, 2:00:45 PM
to cloudlab-users
Hello,

I am using one CloudLab node and need to change its BIOS settings for my experiments. Can you help me change them?
My node is Clemson clgpu020/r7525 (jyoh250-144001). My experiments use NVIDIA NCCL, which conflicts with AMD virtualization technology (VT) and the IOMMU, so I need to change the BIOS settings.
In System Setup:
- Processor Settings
-- Virtualization Technology: disable
-- IOMMU Support: disable
Please change these settings.

Thank you

Mike Hibler

Jan 11, 2023, 2:28:54 PM
to cloudla...@googlegroups.com
Can you point us to the location in the NCCL documentation where it says
it is necessary to disable these features? Is it necessary to disable all
virtualization technology or just IOMMU (VT-d)?

Jaeyoung Oh

Jan 11, 2023, 4:27:14 PM
to cloudlab-users
I got that from the discussion section of the NCCL tests GitHub repository (https://github.com/NVIDIA/nccl-tests/issues/18#issuecomment-714240156). According to that post, both VT-d and the IOMMU need to be turned off, and I am seeing the same NCCL failure.

Jaeyoung Oh

Jan 12, 2023, 4:46:34 AM
to cloudlab-users
I solved this problem by disabling the IOMMU in the GRUB configuration file. The solution is described in AMD's "IOMMU Advisory for Multi-GPU Environments" (amd.com).
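For anyone trying the same fix, here is a minimal Python sketch of a GRUB edit like this, plus a check of the result. It is not the exact change made above: the message does not say which kernel parameter was used, so `amd_iommu=off` is only an assumption (it is a documented Linux kernel parameter, as is `iommu=pt`). The helper names `iommu_params` and `add_kernel_param` are hypothetical, and the `GRUB_CMDLINE_LINUX_DEFAULT` variable assumes a Debian/Ubuntu-style `/etc/default/grub`.

```python
import re

# Prefixes of the IOMMU-related kernel parameters this sketch looks for.
IOMMU_PREFIXES = ("iommu=", "amd_iommu=", "intel_iommu=")


def iommu_params(cmdline: str) -> list[str]:
    """Return the IOMMU-related parameters found on a kernel command line."""
    return [tok for tok in cmdline.split() if tok.startswith(IOMMU_PREFIXES)]


def add_kernel_param(grub_text: str, param: str) -> str:
    """Append `param` to GRUB_CMDLINE_LINUX_DEFAULT in the given file text.

    Assumes the value is written on one line inside double quotes, as in a
    stock Debian/Ubuntu /etc/default/grub. The parameter is not added twice.
    """
    pattern = re.compile(
        r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE
    )

    def repl(match: re.Match) -> str:
        params = match.group(2).split()
        if param not in params:
            params.append(param)
        return match.group(1) + " ".join(params) + match.group(3)

    new_text, count = pattern.subn(repl, grub_text)
    if count == 0:
        raise ValueError("GRUB_CMDLINE_LINUX_DEFAULT not found")
    return new_text


# Example on an in-memory copy of the file; nothing is written to disk.
sample = 'GRUB_TIMEOUT=5\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
print(add_kernel_param(sample, "amd_iommu=off"))  # assumed parameter
```

On Ubuntu the edited file only takes effect after running `update-grub` and rebooting. After the reboot, passing the contents of `/proc/cmdline` to `iommu_params` shows which IOMMU parameters the running kernel was actually booted with.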

Thank you

Mike Hibler

Jan 12, 2023, 12:46:25 PM
to cloudla...@googlegroups.com
Glad you found a solution, and thanks for the pointer.

The IOMMU is crucial for people doing virtualization experiments, so turning
it off in the BIOS on all machines would not have been a good solution. You
would have needed to have us do it for every experiment you did.
