Unable to See Mountpoint

Md Hasanur Rashid

Feb 17, 2025, 11:52:30 AM
to cloudlab-users
I have two machines on which I cannot see the mountpoint for the dataset that should be mounted during image setup.

Following is the information for the current experiment that I am running:
Name: mrashid2-241880
State: ready
Profile: lustre_2_15_5_cluster
Creator: mrashid2
Project: DIRR

Following are the two machines that I am having the issue with:
node0 amd173 c6525-25g Utah amd173.utah.cloudlab.us
node7 amd113 c6525-25g Utah amd113.utah.cloudlab.us 

Following are the errors I observed from the console log:
[    7.743029] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    7.751805] mgag200 0000:c2:00.0: [drm] fb0: mgag200drmfb frame buffer device
[    7.760887] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    7.780636] mlx5_core 0000:01:00.0: Supported tc offload range - chains: 4294967294, prios: 4294967295
[    7.784312] ata3.00: failed to read native max address (err_mask=0x1)
[    7.800897] mlx5_core 0000:01:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 basic)
[    7.805148] ata3.00: HPA support seems broken, skipping HPA handling
[    7.806651] ata3.00: ATA-11: MZ7KH480HAHQ0D3,     HF58, max UDMA/133
[    7.812117] mlx5_core 0000:01:00.1: firmware version: 16.28.4512
[    7.816970] ata3.00: 937703088 sectors, multi 16: LBA48 NCQ (depth 32), AA
[    7.823198] mlx5_core 0000:01:00.1: 126.016 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x16 link)
[    7.830334] ata4.00: failed to read native max address (err_mask=0x1)
[    7.901827] ata4.00: HPA support seems broken, skipping HPA handling
[    7.903342] ata4.00: ATA-11: MZ7KH480HAHQ0D3,     HF58, max UDMA/133
[    7.914567] ata4.00: 937703088 sectors, multi 16: LBA48 NCQ (depth 32), AA

[   20.740603] sh[1893]: Checking Testbed local storage configuration ...
[   20.957699] sh[1893]: Checking 'disk0'...
[   20.957827] sh[1893]: disk0: local disk at /dev/sda, exists
[   20.957897] sh[1893]: Checking 'disk1'...
[   20.957965] sh[1893]: disk1: local disk at /dev/sdb, exists
[   20.961396] sh[1893]: Checking 'bs_client0'...
[   20.968734] sh[1893]: bs_client0: mount of /dev/emulab/bs_client0 missing from fstab; sanity checking and re-adding...
[   20.975261] sh[1893]: *** bs_client0: unknown or no FS on /dev/emulab/bs_client0
[   20.975316] sh[1893]: *** Storage device 'bs_client0' incorrectly configured, doing nothing
[   20.975369] sh[1893]: *** /usr/local/etc/emulab/rc/rc.storage:
[   20.975408] sh[1893]: Could not process storage commands!
[   20.977600] sh[1893]: Failed running rc.storagelocal (512)! at /usr/local/etc/emulab/libsetup.pm line 1296.

A quick search on the error points me to this probable scenario: "A faulty SATA cable or controller issue might be interfering with proper disk detection."
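
To distinguish a hardware problem from a half-configured blockstore, a few quick checks on the affected node might help (the device name is taken from the rc.storage log above; exact output will vary):

sudo lsblk
sudo blkid /dev/emulab/bs_client0
grep bs_client0 /etc/fstab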

Let me know if you need further information.

Mike Hibler

Feb 17, 2025, 5:25:54 PM
to cloudla...@googlegroups.com
This was caused by loading a large "image-backed" dataset on each node.
The dataset load took long enough on a couple of machines to cause a timeout
in our initial setup process. As a result, those nodes were rebooted (the setup
process assumed they were hung), which left the dataset code in an unexpected
state after the reboot. That is something we will have to address.

In the meantime I got the two nodes in question operational by clearing
out the dataset state and starting them from scratch:

sudo /usr/local/etc/emulab/rc/rc.storage shutdown
sudo /usr/local/etc/emulab/rc/rc.storage fullreset
sudo /usr/local/etc/emulab/rc/rc.storage boot

Since the nodes are no longer under the initial setup watchdog, they don't
time out and reboot, and the dataset load finishes.
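
Once the final "boot" step finishes, the dataset mount should reappear; something along these lines (device name from the earlier log; the mountpoint depends on the profile) can confirm it:

mount | grep bs_client0
df -h | grep bs_client0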