Profile for 2 xl170 nodes


Nurlan Nazaraliyev

Nov 9, 2024, 8:13:45 PM
to cloudlab-users
Hello,

I have reserved 2 xl170 nodes. I want to create a profile with these two nodes connected over a 25 Gbps link.

The profile code I am running:

<rspec xmlns="http://www.geni.net/resources/rspec/3"
       xmlns:emulab="http://www.protogeni.net/resources/rspec/ext/emulab/1"
       xmlns:tour="http://www.protogeni.net/resources/rspec/ext/apt-tour/1"
       xmlns:jacks="http://www.protogeni.net/resources/rspec/ext/jacks/1"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.geni.net/resources/rspec/3 http://www.geni.net/resources/rspec/3/request.xsd"
       type="request">
 
  <rspec_tour xmlns="http://www.protogeni.net/resources/rspec/ext/apt-tour/1">
    <description xmlns="" type="markdown">RDMA Project on CloudLab XL170 Nodes with 10Gb and 25Gb Links</description>
  </rspec_tour>
 
  <!-- Node 1 -->
  <node client_id="xl1" exclusive="true">
    <sliver_type name="raw">
      <disk_image name="urn:publicid:IDN+emulab.net+image+emulab-ops//UBUNTU22-64-STD"/>
    </sliver_type>
    <hardware_type name="xl170"/>
    <interface client_id="xl1:interface-0"/>
    <interface client_id="xl1:interface-2"/>
  </node>
 
  <!-- Node 2 -->
  <node client_id="xl2" exclusive="true">
    <sliver_type name="raw">
      <disk_image name="urn:publicid:IDN+emulab.net+image+emulab-ops//UBUNTU22-64-STD"/>
    </sliver_type>
    <hardware_type name="xl170"/>
    <interface client_id="xl2:interface-1"/>
    <interface client_id="xl2:interface-3"/>
  </node>
 
  <!-- 25 Gbps Link -->
  <link client_id="link-25Gb">
    <interface_ref client_id="xl1:interface-0"/>
    <interface_ref client_id="xl2:interface-1"/>
    <property source_id="xl1:interface-0" dest_id="xl2:interface-1" capacity="25000000"/>
    <property source_id="xl2:interface-1" dest_id="xl1:interface-0" capacity="25000000"/>
  </link>
 
  <!-- 10 Gbps Link -->
  <link client_id="link-10Gb">
    <interface_ref client_id="xl1:interface-2"/>
    <interface_ref client_id="xl2:interface-3"/>
    <property source_id="xl1:interface-2" dest_id="xl2:interface-3" capacity="10000000"/>
    <property source_id="xl2:interface-3" dest_id="xl1:interface-2" capacity="10000000"/>
  </link>
 
</rspec>

However, when I instantiate it I get the following error:
*** No possible mapping for xl1 Too many links of type ethernet! (2 requested, 1 found)
*** No possible mapping for xl2 Too many links of type ethernet! (2 requested, 1 found)

Thanks in advance
Nurlan Nazaraliyev 


Mike Hibler

Nov 10, 2024, 10:44:01 AM
to 'Nurlan Nazaraliyev' via cloudlab-users
The 10Gb interface on the xl170 nodes is not a "normal" interface. Those
interfaces are connected to layer1 (physical layer) switches and are used
to combine nodes and actual layer2 switches in an experiment (see:
https://docs.cloudlab.us/advanced-topics.html#%28part._user-controlled-switches%29).
The methods used to create links involving those interfaces are different.

Depending on what you are trying to do, there are likely other methods, or
other node types, to accomplish your goals. Why do you need two interfaces,
and do they need to be different speeds?
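
To illustrate the mapping error above, here is a minimal Python sketch (not part of the original reply) that counts how many links each node in a request RSpec takes part in. The limit of one ordinary experimental link per xl170 node is an assumption read off the "(2 requested, 1 found)" message. The names `links_per_node`, `over_limit`, and `SAMPLE` are hypothetical, and `SAMPLE` is a trimmed copy of the request posted earlier in the thread.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Default namespace used by GENI v3 request RSpecs (taken from the posted profile).
RSPEC_NS = "{http://www.geni.net/resources/rspec/3}"

# Trimmed copy of the failing request: two nodes, two links between them.
SAMPLE = """
<rspec xmlns="http://www.geni.net/resources/rspec/3" type="request">
  <node client_id="xl1">
    <interface client_id="xl1:interface-0"/>
    <interface client_id="xl1:interface-2"/>
  </node>
  <node client_id="xl2">
    <interface client_id="xl2:interface-1"/>
    <interface client_id="xl2:interface-3"/>
  </node>
  <link client_id="link-25Gb">
    <interface_ref client_id="xl1:interface-0"/>
    <interface_ref client_id="xl2:interface-1"/>
  </link>
  <link client_id="link-10Gb">
    <interface_ref client_id="xl1:interface-2"/>
    <interface_ref client_id="xl2:interface-3"/>
  </link>
</rspec>
""".strip()


def links_per_node(rspec_text):
    """Return {node client_id: number of link endpoints on that node}."""
    root = ET.fromstring(rspec_text)

    # Map every declared interface to the node that owns it.
    owner = {}
    for node in root.iter(RSPEC_NS + "node"):
        for iface in node.iter(RSPEC_NS + "interface"):
            owner[iface.get("client_id")] = node.get("client_id")

    # Count one endpoint per interface_ref that belongs to a known node.
    counts = Counter()
    for link in root.iter(RSPEC_NS + "link"):
        for ref in link.iter(RSPEC_NS + "interface_ref"):
            node_id = owner.get(ref.get("client_id"))
            if node_id is not None:
                counts[node_id] += 1
    return dict(counts)


def over_limit(rspec_text, max_links=1):
    """List nodes requesting more links than max_links (assumed limit)."""
    counts = links_per_node(rspec_text)
    return sorted(n for n, c in counts.items() if c > max_links)


if __name__ == "__main__":
    print(links_per_node(SAMPLE))
    print(over_limit(SAMPLE))
```

Running a check like this before instantiating only flags the count mismatch; it does not know which interfaces a given node type actually exposes, so the limit has to be supplied by hand.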
> --
> You received this message because you are subscribed to the Google Groups
> "cloudlab-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email
> to cloudlab-user...@googlegroups.com.
> To view this discussion visit https://groups.google.com/d/msgid/cloudlab-users/
> f1111a78-e592-479f-961b-218a7cb28eean%40googlegroups.com.

Nurlan Nazaraliyev

Nov 11, 2024, 12:47:19 AM
to cloudla...@googlegroups.com
Thanks for your reply. I think I have solved this problem with the following profile:

<rspec xmlns="http://www.geni.net/resources/rspec/3"
       xmlns:emulab="http://www.protogeni.net/resources/rspec/ext/emulab/1"
       xmlns:tour="http://www.protogeni.net/resources/rspec/ext/apt-tour/1"
       xmlns:jacks="http://www.protogeni.net/resources/rspec/ext/jacks/1"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.geni.net/resources/rspec/3 http://www.geni.net/resources/rspec/3/request.xsd"
       type="request">
 
  <rspec_tour xmlns="http://www.protogeni.net/resources/rspec/ext/apt-tour/1">
    <description xmlns="" type="markdown">RDMA Project on CloudLab XL170 Nodes with a 25Gb Link</description>
    <instructions xmlns="" type="markdown">check paper</instructions>
  </rspec_tour>

 
  <!-- Node 1 -->
  <node client_id="xl1" exclusive="true">
    <sliver_type name="raw">
      <disk_image name="urn:publicid:IDN+emulab.net+image+emulab-ops//UBUNTU22-64-STD"/>

    </sliver_type>
    <hardware_type name="xl170"/>
    <interface client_id="xl1:interface-0"/>
  </node>
 
  <!-- Node 2 -->
  <node client_id="xl2" exclusive="true">
    <sliver_type name="raw">
      <disk_image name="urn:publicid:IDN+emulab.net+image+emulab-ops//UBUNTU22-64-STD"/>

    </sliver_type>
    <hardware_type name="xl170"/>
    <interface client_id="xl2:interface-1"/>
  </node>
 
  <!-- 25 Gbps Link -->
  <link client_id="link-25Gb">
    <interface_ref client_id="xl1:interface-0"/>
    <interface_ref client_id="xl2:interface-1"/>
    <property source_id="xl1:interface-0" dest_id="xl2:interface-1" capacity="25000000"/>
    <property source_id="xl2:interface-1" dest_id="xl1:interface-0" capacity="25000000"/>
  </link>
 
</rspec>

However, after I installed the Mellanox (mlnx) driver, the connection to the node breaks whenever I run 'sudo /etc/init.d/openibd start' or 'sudo /etc/init.d/openibd restart'. Is there any solution to this?

Best,
Nurlan
 

Mike Hibler

Nov 11, 2024, 10:47:31 AM
to cloudla...@googlegroups.com
The Internet-facing control network interface is also Mellanox, so installing
a new driver will affect it. You should get on the console and see what is
happening when you reboot the machine. If you used apt to install new packages
for those drivers, it is possible other packages got pulled in as well. In
particular, NetworkManager does not play nicely with our control network
configuration.
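
One way to see which interface is at risk before touching the drivers is to look up the interface that carries the default route. The Python sketch below is not part of the original reply; it assumes the Internet-facing control interface is the one holding the default route, and it parses the output of `ip route show default`. The function names, the sample line, the interface name `eno49`, and the 192.0.2.x addresses are all hypothetical.

```python
import subprocess

# Hypothetical sample of `ip route show default` output (documentation addresses).
SAMPLE_ROUTE = "default via 192.0.2.1 dev eno49 proto dhcp src 192.0.2.10 metric 100"


def default_route_interface(ip_route_output):
    """Return the interface named after 'dev' on the first default route, or None."""
    for line in ip_route_output.splitlines():
        tokens = line.split()
        if tokens and tokens[0] == "default" and "dev" in tokens:
            idx = tokens.index("dev")
            if idx + 1 < len(tokens):
                return tokens[idx + 1]
    return None


def read_default_route():
    """Run `ip route show default` on the local machine and return its text."""
    result = subprocess.run(
        ["ip", "route", "show", "default"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Parse the canned sample; on a node, pass read_default_route() instead.
    print(default_route_interface(SAMPLE_ROUTE))
```

Knowing the interface name makes it easier to tell, from the console, whether the control network or only the experimental link went away after a driver change.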

Nurlan Nazaraliyev

Nov 11, 2024, 8:53:28 PM
to cloudla...@googlegroups.com
Hi Mike. Thanks for your replies. 
I created a fresh experiment with the same profile. I installed the mlnx driver but did not restart the network; I just rebooted the machine. However, after the reboot the node got stuck again. I am unsure what to do (I don't understand the console messages either).
Is there any way to install a new driver without causing network issues?


Thanks in advance
Best,
Nurlan

Mike Hibler

Nov 11, 2024, 10:58:11 PM
to cloudla...@googlegroups.com
I logged into the console and it shows that no NICs were found. The kernel
log shows:

...
mlx5e_tc_post_act_init:40:(pid 225): firmware level support is missing
...

for each of the four interfaces. It seems likely that the driver is
incompatible with the NIC hardware (MT27710 Family [ConnectX-4 Lx]) or the
firmware revision (14.18.2030).

Is there a reason you need special drivers?

Nurlan Nazaraliyev

Nov 11, 2024, 11:07:14 PM
to cloudla...@googlegroups.com
I just downloaded and installed the driver release with the LTS option (5.8).
There is no specific reason for this, but I would like to know how I can check compatibility before installation.

Thanks in advance,
Best regards,
Nurlan
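
As a starting point for such a check, the driver and firmware versions currently in use can be recorded and compared by hand against the release notes of the driver being considered. The Python sketch below parses the `key: value` output of `ethtool -i <interface>`. It is only an illustration: the sample text is hypothetical apart from the firmware version 14.18.2030 quoted earlier in the thread, and `parse_ethtool_info` is a made-up helper name. It does not decide compatibility by itself.

```python
# Illustrative `ethtool -i` output; only the firmware version comes from the thread.
SAMPLE_ETHTOOL = """\
driver: mlx5_core
version: 5.15.0-generic
firmware-version: 14.18.2030
bus-info: 0000:03:00.0
"""


def parse_ethtool_info(text):
    """Turn `ethtool -i` output into a dict, splitting on the first colon only."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            info[key.strip()] = value.strip()
    return info


if __name__ == "__main__":
    info = parse_ethtool_info(SAMPLE_ETHTOOL)
    print(info["driver"], info["firmware-version"])
```

Splitting on the first colon only keeps values such as the PCI address in `bus-info` intact.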

Nurlan Nazaraliyev

Nov 12, 2024, 4:17:41 PM
to Nurlan Nazaraliyev, cloudla...@googlegroups.com
I noticed that the error

  ...
  mlx5e_tc_post_act_init:40:(pid 225): firmware level support is missing
  ...

shows up on a fresh (unmodified) node as well. I think the reason the link goes down after I install a new driver is that I don't have a control network to the nodes.

Is that correct? If so, how can I add a control network properly?

Best, 
Nurlan.