The current problem is not the Mellanox L2 switch; it is how our infrastructure
wires the L2 switch and the nodes together. That is done by programming the
Netscout L1 switch to which all xl170 nodes and the ualloc-* switches are
connected, creating layer 1 paths between a node NIC port and the L2 switch
port. The Netscout switch appears to have died after your experiment was set up,
so the dataplane should still work, since those L1 paths are already in place.
We just cannot tear down those paths now since we cannot talk to the management
interface on the Netscout to do so.
There may well be issues with the Mellanox switch too. The management port for
that switch is a serial port that we give you proxied access to.
Anyway, if we cannot get the Netscout switch working again with a power cycle
and/or reseating cards, then there will be no way to do experiments with the
ualloc switches going forward. Parts for the Netscout switches are just too
expensive, and at that point both Netscout switches will have died.
On Mon, May 18, 2026 at 11:33:00AM -0700, Christopher Canel wrote:
> Hi Mike,
>
> Thanks for your help.
>
> I observed that ualloc-mlnx1 was experiencing problems with its management interface right from the start. It appeared as if the switch was unable to check in after provisioning, and the experiment was marked as failed to start. Rebooting the switch didn't fix the management issue. But the data plane was working fine, so I bypassed the error and used the experiment anyway.
>
> I'd like to use ualloc-mlnx1 for future experiments, so it would be awesome to fix this. Perhaps I'm setting the switch up incorrectly in my profile? I used the examples in the manual as a reference.