Experimental network down in APT Cluster

Jinghua Wang

Aug 27, 2025, 11:46:28 AM
to cloudlab-users
Hello CloudLab Admins,

In my current experiment at https://www.cloudlab.us/status.php?uuid=5b020358-82fd-11f0-bc80-e4434b2381fc, the experimental network interface "vlan282@enp8s0d1" reports "no-carrier" on every node. The shell output below was taken from an r320 node. Here is example output of "ip addr":
-----------------------------------------------------------------------------------------------------------------------------------
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether f0:1f:af:e2:04:7e brd ff:ff:ff:ff:ff:ff
    altname enp2s0f0
    inet 128.110.96.190/22 metric 1024 brd 128.110.99.255 scope global eno1
       valid_lft forever preferred_lft forever
    inet6 fe80::f21f:afff:fee2:47e/64 scope link
       valid_lft forever preferred_lft forever
3: eno2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether f0:1f:af:e2:04:7f brd ff:ff:ff:ff:ff:ff
    altname enp2s0f1
4: enp8s0d1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether f4:52:14:15:51:22 brd ff:ff:ff:ff:ff:ff
5: vlan282@enp8s0d1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN group default qlen 1000
    link/ether 02:9d:76:84:96:c1 brd ff:ff:ff:ff:ff:ff
    inet 10.10.1.3/24 brd 10.10.1.255 scope global vlan282
       valid_lft forever preferred_lft forever
-----------------------------------------------------------------------------------------------------------------------------------
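
For anyone triaging the same symptom across many nodes, the interface flags in output like the above can be checked programmatically. The following is only a minimal sketch, not a CloudLab tool: the function name `no_carrier_interfaces` and the regular expression are illustrative assumptions, and the sample text is trimmed from the output above. It scans the header line of each interface and reports those whose flag list includes NO-CARRIER, together with the reported state.

```python
import re

# Matches the header line of each interface in `ip addr` output, for example:
# "4: enp8s0d1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 ... state DOWN ..."
# This pattern is an assumption based on the output format shown in this post.
HEADER = re.compile(
    r"^\d+:\s+(?P<name>[^:\s]+):\s+<(?P<flags>[^>]*)>.*?\bstate\s+(?P<state>\S+)"
)


def no_carrier_interfaces(ip_addr_output):
    """Return (name, state) for every interface whose flags include NO-CARRIER."""
    found = []
    for line in ip_addr_output.splitlines():
        m = HEADER.match(line)
        if m and "NO-CARRIER" in m.group("flags").split(","):
            found.append((m.group("name"), m.group("state")))
    return found


# Header lines copied from the `ip addr` output above.
sample = """\
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
4: enp8s0d1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
5: vlan282@enp8s0d1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN group default qlen 1000
"""

for name, state in no_carrier_interfaces(sample):
    print(f"{name}: no carrier (state {state})")
```

On the sample, both the physical port enp8s0d1 and the VLAN interface stacked on it are reported. The VLAN's LOWERLAYERDOWN state indicates that it is down because the underlying port has no link, so the physical port is the one to investigate.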

And here is the example output of "lshw -c network": 
-----------------------------------------------------------------------------------------------------------------------------------
  *-network                
       description: Ethernet interface
       product: MT27500 Family [ConnectX-3]
       vendor: Mellanox Technologies
       physical id: 0
       bus info: pci@0000:08:00.0
       logical name: enp8s0d1
       version: 00
       serial: f4:52:14:15:51:22
       capacity: 56Gbit/s
       width: 64 bits
       clock: 33MHz
       capabilities: pm vpd msix pciexpress bus_master cap_list rom ethernet physical fibre 10000bt-fd 40000bt-fd 56000bt-fd autonegotiation
       configuration: autonegotiation=off broadcast=yes driver=mlx4_en driverversion=4.0-0 firmware=2.42.5000 latency=0 link=no multicast=yes port=fibre
       resources: irq:34 memory:d9f00000-d9ffffff memory:d5000000-d57fffff memory:d9000000-d90fffff
  *-network:0
       description: Ethernet interface
       product: NetXtreme BCM5720 Gigabit Ethernet PCIe
       vendor: Broadcom Inc. and subsidiaries
       physical id: 0
       bus info: pci@0000:02:00.0
       logical name: eno1
       version: 00
       serial: f0:1f:af:e2:04:7e
       size: 1Gbit/s
       capacity: 1Gbit/s
       width: 64 bits
       clock: 33MHz
       capabilities: pm vpd msi msix pciexpress bus_master cap_list rom ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt 1000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=tg3 driverversion=5.15.0-151-generic duplex=full firmware=FFV7.8.16 bc 5720-v1.32 ip=128.110.96.190 latency=0 link=yes multicast=yes port=twisted pair speed=1Gbit/s
       resources: irq:16 memory:d58a0000-d58affff memory:d58b0000-d58bffff memory:d58c0000-d58cffff memory:da000000-da03ffff
  *-network:1 DISABLED
       description: Ethernet interface
       product: NetXtreme BCM5720 Gigabit Ethernet PCIe
       vendor: Broadcom Inc. and subsidiaries
       physical id: 0.1
       bus info: pci@0000:02:00.1
       logical name: eno2
       version: 00
       serial: f0:1f:af:e2:04:7f
       capacity: 1Gbit/s
       width: 64 bits
       clock: 33MHz
       capabilities: pm vpd msi msix pciexpress bus_master cap_list rom ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt 1000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=tg3 driverversion=5.15.0-151-generic firmware=FFV7.8.16 bc 5720-v1.32 latency=0 link=no multicast=yes port=twisted pair
       resources: irq:17 memory:d58d0000-d58dffff memory:d58e0000-d58effff memory:d58f0000-d58fffff memory:da040000-da07ffff
-----------------------------------------------------------------------------------------------------------------------------------

Could you please take a look and help me bring up the interface "enp8s0d1"?

Thank you very much!

Best,
Jinghua Wang

Jinghua Wang

Aug 28, 2025, 2:27:29 PM
to cloudlab-users
Hi CloudLab admins,

I just checked, and the problem persists: the experimental network in my experiment is still down. Because this is a fairly large experiment with many nodes, I have already consumed a lot of node hours over the last few days without being able to run my experiments. Should I terminate the experiment, wait for you to resolve the problem, and then start a new experiment later?

Thank you,
Jinghua

Mike Hibler

Aug 28, 2025, 6:53:53 PM
to cloudla...@googlegroups.com
Looks like a probable switch problem. We are investigating.

Mike Hibler

Aug 28, 2025, 10:47:05 PM
to cloudla...@googlegroups.com
The switch has been fixed. You also had a node (apt046) where the experiment
interface had gone missing. Rebooting the node fixed that one. You should be
good to go now!

Jinghua Wang

Aug 28, 2025, 10:50:13 PM
to cloudlab-users
Got it, thank you so much!

Jinghua
