[query] RDMA nodes fail to map links despite available capacity


xwt1

Aug 28, 2025, 1:51:10 PM
to cloudlab-users
Hello CloudLab Team,

I’m evaluating and adapting algorithms for RDMA performance and need a small cluster of RDMA-capable bare-metal nodes. However, my reservations keep failing with:

"Could not map all requested links to physical resources. Not enough free resources currently. Please try again later."

At utah.cloudlab.us, I attempted to reserve 3 RDMA-capable nodes (three xl170, three m400, or three c6525-25g), and at the time of each submission the cluster status indicated ≥ 3 idle nodes for the selected type. I did not choose APT’s r320 because I need to attach a dataset, and I could not configure a dataset on APT for this experiment.

My questions are:
1. Why would I see the “map links” failure even when the idle node count is ≥ 3? Is this likely due to topology constraints (same-rack placement, switch/VLAN availability), or something in my profile?
2. Could you recommend specific node types and site(s) suitable for RDMA experiments that also support datasets/RemoteBlockstore? If particular constraints or parameters are required in the request, I’d appreciate guidance.

Thank you very much for providing these bare-metal resources and for any guidance you can share. I’m happy to provide the Experiment/Reservation ID, timestamps, or screenshots if helpful.
Best regards,
xwt1

Mike Hibler

Aug 28, 2025, 10:30:36 PM
to cloudla...@googlegroups.com
There are several things going on here.

First, it is your experiment instantiation that is failing, not a reservation.
I do not see that you have ever created a reservation. A reservation, created
at https://www.cloudlab.us/resgroup.php, is a request for resources at a
future time. When a reservation start time arrives, then the resources you
requested should be available and you can instantiate (start) an experiment
to use those resources. You do not need a reservation to start an experiment,
but without one you are limited to whatever resources are currently free.

That is most likely what happened with the c6525-25g node experiments
you tried to start: there just were not any available. The "idle count" is
the number of nodes free at that moment, but it may include nodes that will
shortly be needed to fulfill a reservation that overlaps with your
experiment duration.
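
To make the "idle count" point concrete, here is a toy sketch in Python. It is not how CloudLab actually computes availability; the function name, the (start time, node count) representation of reservations, and all the numbers are made up for illustration. It assumes the worst case, where every reserved node comes out of the currently idle pool.

```python
from datetime import datetime, timedelta

def usable_nodes(idle_now, start, duration, upcoming):
    """Estimate how many idle nodes stay free for the whole experiment.

    upcoming is a list of (reservation_start, node_count) pairs
    (a hypothetical representation, not a CloudLab data structure).
    """
    end = start + duration
    # Nodes claimed by reservations that begin while the experiment is running.
    claimed = sum(count for res_start, count in upcoming
                  if start <= res_start < end)
    return max(idle_now - claimed, 0)

# Hypothetical numbers: 3 idle nodes, one reservation for 2 nodes at 18:00.
start = datetime(2025, 1, 1, 12, 0)
upcoming = [(datetime(2025, 1, 1, 18, 0), 2)]

print(usable_nodes(3, start, timedelta(hours=16), upcoming))  # 1
print(usable_nodes(3, start, timedelta(hours=4), upcoming))   # 3
```

In this toy model a 16-hour, 3-node experiment cannot be satisfied even though 3 nodes are idle right now, while a 4-hour one that ends before the reservation begins can.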

Your attempted experiments with the xl170 and m510 nodes failed because they
have only a single physical network interface available for experimentation.
While you marked the links to the datasets as "best effort" and "use vlans",
you also need to mark the inter-node link the same way. Otherwise the system
will attempt to use a dedicated interface for that link.
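
The interface arithmetic can be sketched as a simplified model in Python. This is only an illustration of the reasoning above, not the real resource mapper: the `Link` class, the `multiplexed` flag (standing in for a link marked "best effort" and "use vlans"), and the link names are all hypothetical. The model assumes each unmarked link needs its own physical interface, while all marked links on a node can share one.

```python
from dataclasses import dataclass

@dataclass
class Link:
    name: str
    multiplexed: bool  # True when marked "best effort" / "use vlans"

def interfaces_needed(links):
    """Physical interfaces one node needs under this simplified model."""
    # Every unmarked link wants a dedicated physical interface.
    dedicated = sum(1 for link in links if not link.multiplexed)
    # All marked links can share a single interface via VLANs.
    shared = 1 if any(link.multiplexed for link in links) else 0
    return dedicated + shared

def can_map(links, physical_interfaces=1):
    """True when the node's links fit on its experimental interfaces."""
    return interfaces_needed(links) <= physical_interfaces

# Only the dataset link is marked: two interfaces needed, one available.
as_submitted = [Link("dataset-link", True), Link("inter-node-link", False)]
# Both links marked: they share the single interface.
as_suggested = [Link("dataset-link", True), Link("inter-node-link", True)]

print(can_map(as_submitted))  # False
print(can_map(as_suggested))  # True
```

With a single experimental interface per node, the first topology cannot be mapped no matter how many nodes are idle, which is consistent with the "Could not map all requested links" error.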

Hope this helps.


xwt1

Aug 29, 2025, 12:59:00 AM
to cloudlab-users
      
Thanks a lot! I am new to CloudLab and had overlooked the reservation page. I will try to make a reservation later.
xwt1