Unable to See Mountpoint for Specified Blockstore on c220g5

Md Hasanur Rashid

Sep 10, 2024, 1:12:53 PM

to cloudlab-users

Hi,

I have created a profile that allocates a 100 GB blockstore on each node, with the intention of saving a future installation as a dataset. The following is the relevant part of the profile, which creates the blockstore with a custom size:

# Lustre Cluster
for i in range(params.n):
    node = pg.RawPC("server"+str(i))
    node.hardware_type = params.t
    node.disk_image = disk_image_centOS_8
    bs = node.Blockstore("bs_server" + str(i), "/custom-install")
    bs.size = "100GB"

    iface = node.addInterface("if"+str(i))
    link.addInterface(iface)
    rspec.addResource(node)

for i in range(params.m):
    node = pg.RawPC("node"+str(i))
    node.hardware_type = params.t
    node.disk_image = disk_image_centOS_8
    bs = node.Blockstore("bs_client" + str(i), "/custom-install")
    bs.size = "100GB"

    network_index = i + params.n
    iface = node.addInterface("if"+str(network_index))
    link.addInterface(iface)
    rspec.addResource(node)

However, when I instantiate this profile on c220g5 machines, I don't see the mountpoint. Here is the relevant output:

[root@server0 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 447.1G 0 disk
├─sda1 8:1 0 16G 0 part /
├─sda2 8:2 0 3G 0 part
├─sda3 8:3 0 3G 0 part [SWAP]
└─sda4 8:4 0 425.1G 0 part
sdb 8:16 0 1.1T 0 disk
└─emulab-bs_server0 253:0 0 93.1G 0 lvm
[root@server0 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 94G 0 94G 0% /dev
tmpfs 94G 0 94G 0% /dev/shm
tmpfs 94G 18M 94G 1% /run
tmpfs 94G 0 94G 0% /sys/fs/cgroup
/dev/sda1 16G 3.2G 12G 22% /
ops.wisc.cloudlab.us:/proj/dirr-PG0 100G 25G 75G 25% /proj/dirr-PG0
ops.wisc.cloudlab.us:/share 50G 2.1G 48G 5% /share
tmpfs 19G 0 19G 0% /run/user/20006
[root@server0 ~]#
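The symptom in the output above (an LVM volume that exists but has no mountpoint) can also be checked programmatically. The following is a minimal sketch, assuming the node's `lsblk` supports `--json` output; the function name `unmounted_lvm_volumes` and the hand-written sample are illustrative and not part of the original profile or of any CloudLab tooling.

```python
import json

def unmounted_lvm_volumes(lsblk_json):
    """Return the names of LVM volumes that have no mountpoint.

    `lsblk_json` is the text produced by `lsblk --json`.
    """
    found = []

    def walk(devices):
        for dev in devices:
            # Older util-linux reports a single "mountpoint" field;
            # newer releases report a "mountpoints" list instead.
            mounts = dev.get("mountpoints") or [dev.get("mountpoint")]
            if dev.get("type") == "lvm" and not any(mounts):
                found.append(dev["name"])
            walk(dev.get("children", []))

    walk(json.loads(lsblk_json)["blockdevices"])
    return found

# Hand-written sample mirroring part of the lsblk output above (an assumption,
# not captured from the node).
sample = json.dumps({
    "blockdevices": [
        {"name": "sda", "type": "disk", "mountpoint": None, "children": [
            {"name": "sda1", "type": "part", "mountpoint": "/"},
        ]},
        {"name": "sdb", "type": "disk", "mountpoint": None, "children": [
            {"name": "emulab-bs_server0", "type": "lvm", "mountpoint": None},
        ]},
    ]
})
print(unmounted_lvm_volumes(sample))
```

On a live node, the input would come from running `lsblk --json` rather than from the sample.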

The following is the information for the experiment that I am currently running:

Name: mrashid2-216751
State: ready
Profile: lustre_image_creation_CentOS_8
Creator: mrashid2
Project: DIRR

When I instantiated the profile on c220g2 machines before, I could see the mountpoint, but not on the c220g5. Kindly let me know how I can resolve this issue. Looking forward to hearing back from you.

Best regards,
Hasan

Mike Hibler

Sep 11, 2024, 2:25:40 PM

to cloudla...@googlegroups.com

Both nodes now seem to have a /custom-install filesystem mounted.
Though I see that both machines were rebooted within the last hour, did
you do something?

> --
> You received this message because you are subscribed to the Google Groups
> "cloudlab-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email
> to cloudlab-user...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/
> cloudlab-users/dd0af47f-b3f4-429b-9f52-73882428016an%40googlegroups.com.

Md Hasanur Rashid

Sep 13, 2024, 10:58:12 AM

to cloudlab-users

Hi Mike,

I have aborted that experiment. The mount point was not created because of a faulty HDD. Since hardware issues are beyond what I can fix, I abandoned the experiment. The following is a summary of what led to that conclusion.

The console log shows that the bs_server0 blockstore was not mounted properly:

[ 23.396170] sh[1800]: disk0: local disk at /dev/sda, exists
[ 23.396223] sh[1800]: Checking 'disk1'...
[ 23.396268] sh[1800]: disk1: local disk at /dev/sdb, exists
[ 23.396318] sh[1800]: Checking 'bs_server0'...
[ 23.403198] sh[1800]: bs_server0: mount of /dev/emulab/bs_server0 missing from fstab; sanity checking and re-adding...
[ 23.410173] sh[1800]: *** bs_server0: unknown or no FS on /dev/emulab/bs_server0
[ 23.410228] sh[1800]: *** Storage device 'bs_server0' incorrectly configured, doing nothing
[ 23.410283] sh[1800]: *** /usr/local/etc/emulab/rc/rc.storage:
[ 23.412560] sh[1800]: Failed running rc.storagelocal (512)! at /usr/local/etc/emulab/libsetup.pm line 1296.
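The error lines in the console log above are the ones marked with `***`. The following is a rough sketch of how they could be pulled out of a saved console log; the regular expression and the function name `storage_errors` are assumptions based only on the lines quoted here, not on any documented Emulab log format.

```python
import re

# Matches lines such as "[ 23.410173] sh[1800]: *** <message>"
ERROR_LINE = re.compile(r"^\[\s*\d+\.\d+\] \S+: \*\*\* (.*)$")

def storage_errors(console_log):
    """Return the messages of all '***'-marked lines in a console log."""
    errors = []
    for raw in console_log.splitlines():
        line = raw.strip()
        # Console captures may show a carriage return as a literal "^M" suffix.
        if line.endswith("^M"):
            line = line[:-2]
        match = ERROR_LINE.match(line)
        if match:
            errors.append(match.group(1).rstrip())
    return errors

# Lines copied from the console log quoted above.
log = """\
[ 23.396318] sh[1800]: Checking 'bs_server0'...^M
[ 23.410173] sh[1800]: *** bs_server0: unknown or no FS on /dev/emulab/bs_server0^M
[ 23.410228] sh[1800]: *** Storage device 'bs_server0' incorrectly configured, doing nothing^M
"""
for message in storage_errors(log):
    print(message)
```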

Checking dmesg for errors on sdb, the disk that backs /dev/emulab/bs_server0, shows the following:

[root@server0 ~]# dmesg | grep -i sdb
[ 11.407577] sd 6:0:1:0: [sdb] 2344225968 512-byte logical blocks: (1.20 TB/1.09 TiB)
[ 11.423959] sd 6:0:1:0: [sdb] Write Protect is off
[ 11.443191] sd 6:0:1:0: [sdb] Mode Sense: d3 00 10 08
[ 11.467568] sd 6:0:1:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 11.493131] sd 6:0:1:0: [sdb] Attached SCSI disk
...
[40279.520950] sd 6:0:1:0: [sdb] tag#2310 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=1s
[40279.531544] sd 6:0:1:0: [sdb] tag#2310 Sense Key : Medium Error [current]
[40279.539210] sd 6:0:1:0: [sdb] tag#2310 Add. Sense: Read retries exhausted
[40279.546786] sd 6:0:1:0: [sdb] tag#2310 CDB: Read(10) 28 00 00 0c 08 00 00 00 08 00
[40279.555234] blk_update_request: critical medium error, dev sdb, sector 788483 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[40280.898098] sd 6:0:1:0: [sdb] tag#2312 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=1s
[40280.908688] sd 6:0:1:0: [sdb] tag#2312 Sense Key : Medium Error [current]
[40280.916360] sd 6:0:1:0: [sdb] tag#2312 Add. Sense: Read retries exhausted
[40280.923933] sd 6:0:1:0: [sdb] tag#2312 CDB: Read(10) 28 00 00 0c 08 00 00 00 08 00
[40280.932379] blk_update_request: critical medium error, dev sdb, sector 788483 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[root@server0 ~]#

The dmesg output shows repeated "Medium Error" read failures on sdb, each at the same sector (788483), which indicates that the drive has bad sectors. This points to a hardware issue: the drive is likely failing or already faulty.

I hope this provides a useful update on the issue.
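To cross-check this reading of the dmesg output, the failing command can be decoded by hand. The following is a minimal sketch, assuming the standard SCSI READ(10) layout (opcode 0x28, a 4-byte big-endian LBA in bytes 2 to 5, a 2-byte transfer length in bytes 7 to 8) and that the kernel's sector number equals the LBA because sdb reports 512-byte logical blocks. The helper names `decode_read10` and `medium_errors` are illustrative.

```python
import re
from collections import Counter

def decode_read10(cdb_hex):
    """Decode a READ(10) CDB given as space-separated hex bytes.

    Returns (starting LBA, number of blocks).
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")
    blocks = int.from_bytes(cdb[7:9], "big")
    return lba, blocks

MEDIUM_ERROR = re.compile(r"critical medium error, dev (\w+), sector (\d+)")

def medium_errors(dmesg_text):
    """Count critical medium errors per (device, sector)."""
    return Counter(
        (m.group(1), int(m.group(2))) for m in MEDIUM_ERROR.finditer(dmesg_text)
    )

# Values copied from the dmesg output quoted above.
lba, blocks = decode_read10("28 00 00 0c 08 00 00 00 08 00")
dmesg = """\
[40279.555234] blk_update_request: critical medium error, dev sdb, sector 788483 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[40280.932379] blk_update_request: critical medium error, dev sdb, sector 788483 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
"""
print(lba, blocks)           # 788480 8
print(medium_errors(dmesg))  # Counter({('sdb', 788483): 2})
```

The decoded request covers blocks 788480 through 788487, so the reported sector 788483 falls inside the read that failed, and both failures hit the same sector.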

Best,

Hasan

Mike Hibler

Sep 13, 2024, 11:21:01 AM

to cloudla...@googlegroups.com

What node was this? We will take it out of service til the disk is fixed.

Md Hasanur Rashid

Sep 16, 2024, 10:25:05 AM

to cloudla...@googlegroups.com

Hi Mike,

Based on the above analysis, the node that I suspect has a faulty HDD is c220g5-111225.

Let me know if you need further information.

Best regards,
Hasan
