access to /dev/infiniband from user space


Adrian Jackson

Apr 9, 2019, 5:32:14 AM
to singu...@lbl.gov
Hi,

I'm trying to get a Singularity container to run using the InfiniBand network on a cluster I have access to. I can get it to run with MPI fine, but it is only using TCP/IP, and hence the MPI performance is 10x slower than it should be.

Tracing through where things go wrong, it looks like it is failing when it tries to write to /dev/infiniband/uverbs0. It appears not to have permission to write there, although the same call works fine for applications run outside Singularity (for debugging, all I'm running is ibv_devinfo inside and outside Singularity and stracing what happens).

Does anyone have any ideas why this would happen, or what I should do to get around it?
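
For reference, the sort of check I'm running looks roughly like this (the image name is just a placeholder, and strace has to be installed in the image):

# On the host:
ls -l /dev/infiniband/
strace -f -e trace=open,openat,write ibv_devinfo 2>&1 | grep uverbs

# Inside the container:
singularity exec mycontainer.sif ls -l /dev/infiniband/
singularity exec mycontainer.sif \
    strace -f -e trace=open,openat,write ibv_devinfo 2>&1 | grep uverbs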

thanks

adrianj


victor sv

Apr 9, 2019, 6:46:07 AM
to singu...@lbl.gov
Hi Adrian,

I don't have many details on what is happening in your particular case. Which Singularity version are you running? Are the InfiniBand-related libraries installed inside the container? Has MPI been compiled/linked with InfiniBand support?

The Singularity recommendations explicitly say "To support infiniband, the container must support it". That means you have to install the InfiniBand libraries inside the container and link MPI against them.

Here is a Singularity recipe I have for installing the InfiniBand libraries. It's old stuff and there are probably more up-to-date recipes elsewhere:
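
As a rough illustration (not the exact recipe I'm referring to, and assuming an Ubuntu/Debian base image; package names differ on CentOS), the %post section of such a recipe installs something like:

# user-space verbs libraries, provider drivers and diagnostic tools
apt-get update
apt-get install -y libibverbs1 libibverbs-dev ibverbs-providers \
    ibverbs-utils rdma-core infiniband-diags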

Here is a solution in one of the singularity issues:

Hope it helps!
Víctor

Adrian Jackson

Apr 9, 2019, 6:52:20 AM
to singu...@lbl.gov
Hi Victor,

Singularity 3.0.3. I've installed the InfiniBand drivers inside the container, and strace shows they are being found, although the InfiniBand libraries inside the container are probably not exactly the same version as those on the system. At the point where I get this error I've not yet touched MPI; I'm still just trying to get the InfiniBand tools working (i.e. ibv_devinfo, which should just print detailed information about the InfiniBand devices in the system). ibstat does work, so the container can see that the InfiniBand device is there, but it cannot access it to get detailed information (which I can do outside the container).

Thanks, I'll try your recipe and see if it works better than the container I've built.

I have been through that GitHub issue but it didn't seem to help with my problem.

Thanks for the reply.

cheers

adrianj

victor sv

Apr 9, 2019, 7:00:39 AM
to singu...@lbl.gov
Hi Adrian,

I'm not aware of any kind of incompatibility between different InfiniBand driver versions, but I would like to know if you find one. Please let me know in this thread.

Thanks :)

Víctor

Shenglong Wang

Apr 9, 2019, 12:10:23 PM
to singu...@lbl.gov, Shenglong Wang
Not sure if this helps. On our HPC cluster I bind the IB-related libraries and folders from the host into the container, and I'm able to run ibv_devinfo correctly.

LD_LIBRARY_PATH is set as

export LD_LIBRARY_PATH=$MY_LD_LIBRARY_PATH:.:/host/lib:$LD_LIBRARY_PATH

inside the Singularity container.

[wang@c17-04 osu-bench]$ cat run-test2.sh 
#!/bin/bash

img=/beegfs/work/public/singularity/ubuntu-18.10.simg

ib=/etc/libibverbs.d
for lib in /opt/slurm/lib64/lib*.so* /usr/lib64/libosmcomp.so.3* /usr/lib64/libmlx*.so* /usr/lib64/libi40iw-rdmav2.so* /lib64/libib*.so* /usr/lib64/libnl.so*; do
    ib="$lib:/host/lib/$(basename $lib),$ib"
done

singularity exec --bind /opt/slurm,/usr/bin/ibv_devinfo,$ib $img ibv_devinfo

ibv_devinfo

exit
[wang@c17-04 osu-bench]$ sh run-test2.sh 
hca_id: mlx5_0
        transport:                      InfiniBand (0)
        fw_ver:                         12.16.1020
        node_guid:                      7cfe:9003:0026:9360
        sys_image_guid:                 7cfe:9003:0026:9360
        vendor_id:                      0x02c9
        vendor_part_id:                 4115
        hw_ver:                         0x0
        board_id:                       DEL2180110032
        phys_port_cnt:                  1
        Device ports:
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 194
                        port_lid:               102
                        port_lmc:               0x00
                        link_layer:             InfiniBand

hca_id: mlx5_0
        transport:                      InfiniBand (0)
        fw_ver:                         12.16.1020
        node_guid:                      7cfe:9003:0026:9360
        sys_image_guid:                 7cfe:9003:0026:9360
        vendor_id:                      0x02c9
        vendor_part_id:                 4115
        hw_ver:                         0x0
        board_id:                       DEL2180110032
        phys_port_cnt:                  1
        Device ports:
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 194
                        port_lid:               102
                        port_lmc:               0x00
                        link_layer:             InfiniBand

[wang@c17-04 osu-bench]$ 


Best,
Shenglong

Adrian Jackson

Apr 16, 2019, 12:59:28 PM
to singu...@lbl.gov, Shenglong Wang
Thanks, in the end the magic I needed was this:

export SINGULARITY_CONTAINLIBS=/lib64/libmlx5-rdmav2.so,/lib64/libibverbs.so,/lib64/libibverbs.so.1,/lib64/libmlx4-rdmav2.so
mpirun -x SINGULARITY_CONTAINLIBS   --prefix /lustre/home/z04/adrianj/openmpi/2.1.0 --mca btl openib  --hostfile $PBS_NODEFILE ...

This was after installing the InfiniBand libraries in the container and building OpenMPI correctly on both the host and in the container.
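
As a sanity check (the in-container path below is my assumption of where Singularity 3.x stages the libraries passed via SINGULARITY_CONTAINLIBS, the same mechanism --nv uses, and the image name is a placeholder), you can confirm the libraries actually show up inside the container:

# with SINGULARITY_CONTAINLIBS exported as above
singularity exec mycontainer.sif ls /.singularity.d/libs
singularity exec mycontainer.sif ibv_devinfo | head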

cheers

adrianj

Shenglong Wang

Apr 16, 2019, 1:31:50 PM
to Adrian Jackson, Shenglong Wang, singu...@lbl.gov
Very nice.

Do we need to install the IB libraries inside the container? Can we just bind the IB drivers from the host into the container, the way the NVIDIA drivers are handled for GPUs?

Best,
Shenglong

Adrian Jackson

Apr 16, 2019, 1:37:02 PM
to Shenglong Wang, singu...@lbl.gov
For my setup I installed the InfiniBand drivers in the container; I've not tried without doing that.

cheers

adrianj

Shenglong Wang

Apr 16, 2019, 4:57:57 PM
to Adrian Jackson, Shenglong Wang, singu...@lbl.gov
On our HPC cluster I built a Singularity image file for Ubuntu 19.04 with no IB drivers installed inside the container image.

After setting up

export SINGULARITY_BINDPATH=/opt/slurm,/etc/libibverbs.d,/usr/include/infiniband,/usr/include/rdma
export SINGULARITY_BINDPATH=$SINGULARITY_BINDPATH,$(echo /usr/bin/ib*_* | sed -e 's/ /,/g')

export SINGULARITY_CONTAINLIBS=$(echo /usr/lib64/libmlx*.so* /usr/lib64/libi40iw-rdmav2.so* /lib64/libib*.so* /usr/lib64/libnl.so* | xargs | sed -e 's/ /,/g')

I can build OpenMPI 3.1.3 successfully with IB and SLURM enabled inside the container.
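
A configure line along these lines should reproduce that build (the prefix and Slurm path are placeholders, and the exact flags depend on the local Slurm and verbs installation; this is a sketch, not the exact line used):

./configure --prefix=$HOME/openmpi-3.1.3 \
    --with-verbs --with-slurm --with-pmi=/opt/slurm
make -j8 && make install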

For the OSU bandwidth test, pt2pt/osu_bw, I get similar IB bandwidth on the host and inside the Singularity container.


[wang@c10-01 mpi-singularity]$ cat run-benchmarks.sh
#!/bin/bash

module purge

export LD_LIBRARY_PATH=/opt/slurm/lib64

img=/beegfs/work/public/singularity/ubuntu-19.04.sif

export SINGULARITY_BINDPATH=/opt/slurm,/etc/libibverbs.d,/usr/include/infiniband,/usr/include/rdma
export SINGULARITY_BINDPATH=$SINGULARITY_BINDPATH,$(echo /usr/bin/ib*_* | sed -e 's/ /,/g')

export SINGULARITY_CONTAINLIBS=$(echo /usr/lib64/libmlx*.so* /usr/lib64/libi40iw-rdmav2.so* /lib64/libib*.so* /usr/lib64/libnl.so* | xargs | sed -e 's/ /,/g')

exe=pt2pt/osu_bw

srun --mpi=pmi2 \
/home/wang/mpi-singularity/host/osu-local/libexec/osu-micro-benchmarks/mpi/$exe

srun --mpi=pmi2 \
singularity exec $img \
/home/wang/mpi-singularity/singularity/osu-local/libexec/osu-micro-benchmarks/mpi/$exe
[wang@c10-01 mpi-singularity]$ sh run-benchmarks.sh
srun: error: spank: x11.so: Plugin file not found
# OSU MPI Bandwidth Test v5.6.1
# Size Bandwidth (MB/s)
1 3.79
2 7.55
4 15.09
8 30.09
16 59.88
32 117.54
64 235.17
128 410.05
256 792.83
512 1296.26
1024 2240.31
2048 3941.70
4096 5834.29
8192 7806.05
16384 10099.20
32768 11436.05
65536 11781.49
131072 11968.60
262144 12065.43
524288 12077.23
1048576 12133.04
2097152 12115.81
4194304 12114.35
srun: error: spank: x11.so: Plugin file not found
# OSU MPI Bandwidth Test v5.6.1
# Size Bandwidth (MB/s)
1 4.22
2 8.44
4 16.83
8 33.49
16 66.87
32 131.68
64 259.69
128 446.83
256 880.66
512 1449.26
1024 2675.45
2048 4752.75
4096 7268.75
8192 9895.56
16384 9653.57
32768 11418.04
65536 11785.02
131072 11969.93
262144 12064.00
524288 12114.16
1048576 12134.41
2097152 12116.23
4194304 12114.46
[wang@c10-01 mpi-singularity]$


Without the SINGULARITY_CONTAINLIBS setup, OpenMPI inside the container runs with much lower bandwidth:


[wang@c10-01 mpi-singularity]$ sh run-benchmarks.sh
srun: error: spank: x11.so: Plugin file not found
# OSU MPI Bandwidth Test v5.6.1
# Size Bandwidth (MB/s)
1 3.83
2 7.60
4 15.16
8 30.11
16 60.26
32 118.11
64 235.40
128 411.47
256 788.84
512 1271.36
1024 2295.44
2048 3850.66
4096 5665.37
8192 7812.25
16384 10185.56
32768 11438.84
65536 11787.29
131072 11968.12
262144 12066.93
524288 12114.29
1048576 12128.00
2097152 12114.70
4194304 12113.99
srun: error: spank: x11.so: Plugin file not found
# OSU MPI Bandwidth Test v5.6.1
# Size Bandwidth (MB/s)
1 0.47
2 0.94
4 2.01
8 4.00
16 7.78
32 12.53
64 27.81
128 46.53
256 100.01
512 138.82
1024 391.96
2048 489.30
4096 628.20
8192 787.51
16384 937.60
32768 1078.95
65536 2351.52
131072 2926.45
262144 3178.66
524288 3411.78
1048576 3640.92
2097152 3908.76
4194304 3741.52
[wang@c10-01 mpi-singularity]$

Best,
Shenglong

Adrian Jackson

Apr 16, 2019, 5:02:33 PM
to Shenglong Wang, singu...@lbl.gov
Good to know. Note that for my case the performance was only a problem between nodes, not within a single node, so I have to explicitly force MPI to benchmark across nodes to see whether I'm using InfiniBand properly or not.
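
One way to force the ranks onto separate nodes, for example (the flags assume Open MPI and Slurm, and the image and benchmark paths are placeholders):

# Open MPI: place consecutive ranks on different nodes
mpirun -np 2 --map-by node singularity exec $img ./pt2pt/osu_bw
# or with Slurm:
srun -N 2 -n 2 --mpi=pmi2 singularity exec $img ./pt2pt/osu_bw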

cheers

adrianj

v

Apr 16, 2019, 5:10:56 PM
to singu...@lbl.gov, Shenglong Wang
If I created a question for this on AskCI, would everyone in this conversation be able to copy/paste or contribute their responses? It's a really good discussion and a likely use case, and I'd like to preserve that knowledge. We could feature it as a question of the week to share with others and showcase your solutions.



--
Vanessa Villamia Sochat
Stanford University '16

Adrian Jackson

Apr 16, 2019, 5:22:05 PM
to singu...@lbl.gov
Sure, at least I'm happy to do that

v

Apr 16, 2019, 5:23:12 PM
to singu...@lbl.gov
Shweet!!

Do you want to post the question, or do you want me to take a stab?

Adrian Jackson

Apr 16, 2019, 5:36:00 PM
to singu...@lbl.gov
Depends on what you want: something that looks exactly like how it unfolded on the mailing list, or something more curated?

João Ferreira

Apr 16, 2019, 5:39:01 PM
to singu...@lbl.gov
Hey,

Not sure if it's relevant at this point, but I managed to use Omni-Path/PSM as the interconnect with the same approach.
Either binding the paths or installing the PSM libraries when building the container works; the interface seems to be available inside the container if it exists on the host.
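
Illustratively, the PSM2/Omni-Path analogue of the verbs setup above looks something like this (the library paths are just examples and vary between distributions; libfabric may also be needed):

export SINGULARITY_CONTAINLIBS=$(echo /usr/lib64/libpsm2.so* /usr/lib64/libfabric.so* | xargs | sed -e 's/ /,/g')
# /dev (and hence the hfi1 device nodes) is available inside the container by default.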

Best regards,
João Ferreira

v

Apr 16, 2019, 5:41:23 PM
to singu...@lbl.gov
You could probably copy and paste your original question for the most part, and then others can do the same for their answers. It shouldn't be a lot of additional work (apart from any reformatting you want to do). The goal is that when someone goes looking for this in the future, there is a record on a general discussion board for HPC.

Adrian Jackson

Apr 16, 2019, 5:52:23 PM
to singu...@lbl.gov
OK, I've started it there. I can finish it as well, but shall I leave it for others to add their answers for now?


v

Apr 16, 2019, 6:16:22 PM
to singu...@lbl.gov
Perfecto! Here is the link for others to add responses:


Victor, Shenglong, and João, let me know if you need any help posting. I think this is a good question, and I'd like to feature it next week.