Unable to See Mountpoint for Specified Blockstore on c220g5


Md Hasanur Rashid

Sep 10, 2024, 1:12:53 PM
to cloudlab-users
Hi,

I have created a profile that allocates a 100 GB blockstore on each node, with the intention of saving the future installation as a dataset. The following is the relevant part of the profile that I used to create the blockstore with a custom size:


# Lustre Cluster
# (params, pg, link, rspec, and disk_image_centOS_8 are imported or defined earlier in the profile)

# Servers: each gets a node-local 100 GB blockstore mounted at /custom-install
for i in range(params.n):
    node = pg.RawPC("server" + str(i))
    node.hardware_type = params.t
    node.disk_image = disk_image_centOS_8
    bs = node.Blockstore("bs_server" + str(i), "/custom-install")
    bs.size = "100GB"

    iface = node.addInterface("if" + str(i))
    link.addInterface(iface)
    rspec.addResource(node)

# Clients: same blockstore layout; interface indices continue after the servers
for i in range(params.m):
    node = pg.RawPC("node" + str(i))
    node.hardware_type = params.t
    node.disk_image = disk_image_centOS_8
    bs = node.Blockstore("bs_client" + str(i), "/custom-install")
    bs.size = "100GB"

    network_index = i + params.n
    iface = node.addInterface("if" + str(network_index))
    link.addInterface(iface)
    rspec.addResource(node)

However, when I instantiate this profile on c220g5 machines, /custom-install is not mounted: the logical volume for the blockstore exists, but it has no mount point. Here is the relevant output:

[root@server0 ~]# lsblk
NAME                MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                   8:0    0 447.1G  0 disk
├─sda1                8:1    0    16G  0 part /
├─sda2                8:2    0     3G  0 part
├─sda3                8:3    0     3G  0 part [SWAP]
└─sda4                8:4    0 425.1G  0 part
sdb                   8:16   0   1.1T  0 disk
└─emulab-bs_server0 253:0    0  93.1G  0 lvm  
[root@server0 ~]# df -h
Filesystem                           Size  Used Avail Use% Mounted on
devtmpfs                              94G     0   94G   0% /dev
tmpfs                                 94G     0   94G   0% /dev/shm
tmpfs                                 94G   18M   94G   1% /run
tmpfs                                 94G     0   94G   0% /sys/fs/cgroup
/dev/sda1                             16G  3.2G   12G  22% /
ops.wisc.cloudlab.us:/proj/dirr-PG0  100G   25G   75G  25% /proj/dirr-PG0
ops.wisc.cloudlab.us:/share           50G  2.1G   48G   5% /share
tmpfs                                 19G     0   19G   0% /run/user/20006
[root@server0 ~]#
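The 93.1G that lsblk reports for emulab-bs_server0 is consistent with the requested size, which suggests the logical volume itself was created correctly and only the mount is missing. A minimal sketch of the conversion, assuming the profile's "100GB" means 100 × 10^9 bytes and that lsblk's "G" suffix is binary (1024-based), as it is by default:

```python
# Convert the requested blockstore size (assumed decimal GB) to the
# binary GiB units that lsblk prints by default.
REQUESTED_BYTES = 100 * 10**9  # "100GB", assumed to mean 10^9-byte gigabytes
GIB = 2**30                    # lsblk's "G" is a power of 1024


def to_gib(num_bytes: int) -> float:
    """Return a byte count expressed in GiB."""
    return num_bytes / GIB


if __name__ == "__main__":
    print(f"{to_gib(REQUESTED_BYTES):.1f}G")  # prints 93.1G
```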

Here is the information for the experiment I am currently running:

Name: mrashid2-216751
State: ready
Profile: lustre_image_creation_CentOS_8
Creator: mrashid2
Project: DIRR

When I previously instantiated the profile on c220g2 machines, the mount point was created, but it is not on c220g5. Please let me know how I can resolve this issue. I look forward to hearing from you.

Best regards,
Hasan

Mike Hibler

Sep 11, 2024, 2:25:40 PM
to cloudla...@googlegroups.com
Both nodes now seem to have a /custom-install filesystem mounted, though I see
that both machines were rebooted within the last hour. Did you do something?
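Whether /custom-install is a real mount point (rather than a plain directory on /) can also be checked from a script, which is handy when comparing nodes before and after a reboot. Interactively, `findmnt /custom-install` or `df -h /custom-install` answer the same question. A minimal sketch, assuming a Linux node where /proc/mounts is available; the device name in the usage note below is illustrative only:

```python
import os
from typing import Optional


def mount_source(mountpoint: str, mounts_text: str) -> Optional[str]:
    """Return the device mounted at `mountpoint` according to text in
    /proc/mounts format, or None if nothing is mounted there."""
    source = None
    for line in mounts_text.splitlines():
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mountpoint:
            source = fields[0]  # keep the last match: later mounts shadow earlier ones
    return source


if __name__ == "__main__":
    target = "/custom-install"
    print("is a mount point:", os.path.ismount(target))
    if os.path.exists("/proc/mounts"):  # Linux only
        with open("/proc/mounts") as f:
            print("mounted from:", mount_source(target, f.read()))
```

On a node where the blockstore came up, the second line would name the LVM device (for example something like /dev/mapper/emulab-bs_server0); on the failing boot it would print None.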


Md Hasanur Rashid

Sep 13, 2024, 10:58:12 AM
to cloudlab-users
Hi Mike,

I have aborted that experiment. The mount point was not created because of a faulty HDD, and since hardware issues are beyond what I can fix myself, I abandoned the experiment. Here is a summary of what led to that conclusion:

The following console log shows that the bs_server0 blockstore was not mounted:


[ 23.396170] sh[1800]: disk0: local disk at /dev/sda, exists
[ 23.396223] sh[1800]: Checking 'disk1'...
[ 23.396268] sh[1800]: disk1: local disk at /dev/sdb, exists
[ 23.396318] sh[1800]: Checking 'bs_server0'...
[ 23.403198] sh[1800]: bs_server0: mount of /dev/emulab/bs_server0 missing from fstab; sanity checking and re-adding...
[ 23.410173] sh[1800]: *** bs_server0: unknown or no FS on /dev/emulab/bs_server0
[ 23.410228] sh[1800]: *** Storage device 'bs_server0' incorrectly configured, doing nothing
[ 23.410283] sh[1800]: *** /usr/local/etc/emulab/rc/rc.storage:
[ 23.410322] sh[1800]: Could not process storage commands!
[ 23.412560] sh[1800]: Failed running rc.storagelocal (512)! at /usr/local/etc/emulab/libsetup.pm line 1296.
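The key line is "unknown or no FS on /dev/emulab/bs_server0": the boot script found no filesystem it recognized on the logical volume, so it refused to mount it. Interactively, `blkid /dev/emulab/bs_server0` prints a TYPE if it recognizes a filesystem signature and nothing otherwise. As a rough illustration of what such a check looks at, here is a minimal sketch that only detects ext2/3/4 (an assumption; rc.storage's actual logic and expected filesystem type are not shown here) by reading the superblock magic number. The script name in the comment is hypothetical, and opening the device needs root:

```python
import struct
import sys

# ext2/3/4 keep their superblock at byte offset 1024; the 16-bit magic
# number s_magic sits 56 bytes into it and is stored little-endian.
EXT_MAGIC_OFFSET = 1024 + 56
EXT_MAGIC = 0xEF53


def has_ext_signature(path: str) -> bool:
    """Return True if `path` (a block device or image file) carries an
    ext2/3/4 superblock magic. Other filesystem types are not detected."""
    with open(path, "rb") as dev:
        dev.seek(EXT_MAGIC_OFFSET)
        raw = dev.read(2)
    if len(raw) < 2:
        return False  # too small to hold a superblock
    (magic,) = struct.unpack("<H", raw)
    return magic == EXT_MAGIC


if __name__ == "__main__":
    # Hypothetical usage: sudo python3 check_ext.py /dev/emulab/bs_server0
    if len(sys.argv) > 1:
        found = has_ext_signature(sys.argv[1])
        print(sys.argv[1], "ext signature found" if found else "no ext signature")
```

A read that fails with an I/O error here, rather than returning False, would itself point at the underlying disk.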

Checking dmesg for errors on sdb, the disk that holds /dev/emulab/bs_server0 (see the lsblk output above), shows the following:

[root@server0 ~]# dmesg | grep -i sdb
[ 11.407577] sd 6:0:1:0: [sdb] 2344225968 512-byte logical blocks: (1.20 TB/1.09 TiB)
[ 11.423959] sd 6:0:1:0: [sdb] Write Protect is off
[ 11.443191] sd 6:0:1:0: [sdb] Mode Sense: d3 00 10 08
[ 11.467568] sd 6:0:1:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 11.493131] sd 6:0:1:0: [sdb] Attached SCSI disk
...
[40279.520950] sd 6:0:1:0: [sdb] tag#2310 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=1s
[40279.531544] sd 6:0:1:0: [sdb] tag#2310 Sense Key : Medium Error [current]
[40279.539210] sd 6:0:1:0: [sdb] tag#2310 Add. Sense: Read retries exhausted
[40279.546786] sd 6:0:1:0: [sdb] tag#2310 CDB: Read(10) 28 00 00 0c 08 00 00 00 08 00
[40279.555234] blk_update_request: critical medium error, dev sdb, sector 788483 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[40280.898098] sd 6:0:1:0: [sdb] tag#2312 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=1s
[40280.908688] sd 6:0:1:0: [sdb] tag#2312 Sense Key : Medium Error [current]
[40280.916360] sd 6:0:1:0: [sdb] tag#2312 Add. Sense: Read retries exhausted
[40280.923933] sd 6:0:1:0: [sdb] tag#2312 CDB: Read(10) 28 00 00 0c 08 00 00 00 08 00
[40280.932379] blk_update_request: critical medium error, dev sdb, sector 788483 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[root@server0 ~]#
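Both failing commands carry the same CDB, `28 00 00 0c 08 00 00 00 08 00`, so they are the same read being retried. Opcode 0x28 is SCSI READ(10), whose bytes 2 to 5 hold the starting LBA (big-endian) and bytes 7 and 8 the number of blocks. Decoding it shows the command read 8 blocks starting at LBA 788480, and the reported bad sector 788483 falls inside that range (the disk uses 512-byte logical blocks, so LBAs and kernel sector numbers line up). A minimal sketch of the decoding:

```python
# READ(10) layout (SCSI block commands): byte 0 = opcode 0x28,
# bytes 2-5 = starting LBA (big-endian), bytes 7-8 = transfer length in blocks.
READ_10 = 0x28


def decode_read10(cdb_hex: str) -> tuple[int, int]:
    """Return (first_lba, block_count) for a READ(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between bytes is allowed
    if len(cdb) != 10 or cdb[0] != READ_10:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")
    count = int.from_bytes(cdb[7:9], "big")
    return lba, count


if __name__ == "__main__":
    lba, count = decode_read10("28 00 00 0c 08 00 00 00 08 00")
    print(lba, count)  # 788480 8
    failed_sector = 788483  # from the blk_update_request lines
    print(lba <= failed_sector < lba + count)  # True
```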


The dmesg output shows repeated "Medium Error" read failures on sdb, both at sector 788483 and both after the drive exhausted its read retries. This indicates unreadable sectors on the disk, that is, a hardware problem: the drive is likely failing or already faulty.
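To see at a glance which sectors are failing and how often, the medium-error lines can be tallied per device and sector. A minimal sketch, assuming the message format seen in this log ("critical medium error, dev sdb, sector N"); other kernel versions may word it differently, and the script and file names in the comment are hypothetical:

```python
import re
import sys
from collections import Counter

# Matches kernel lines such as:
#   blk_update_request: critical medium error, dev sdb, sector 788483 op 0x0:(READ) ...
MEDIUM_ERROR = re.compile(r"critical medium error, dev (\w+), sector (\d+)")


def medium_errors(lines):
    """Count medium errors per (device, sector) in dmesg-style lines."""
    hits = Counter()
    for line in lines:
        m = MEDIUM_ERROR.search(line)
        if m:
            hits[(m.group(1), int(m.group(2)))] += 1
    return hits


if __name__ == "__main__":
    # Hypothetical usage: dmesg > dmesg.txt; python3 medium_errors.py dmesg.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            for (dev, sector), n in sorted(medium_errors(f).items()):
                print(f"{dev} sector {sector}: {n} failure(s)")
```

For the excerpt above this reports a single sector on sdb that failed twice; a growing list of distinct sectors over time would be a stronger sign that the drive is deteriorating.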

I hope this clarifies the issue.


Best,
Hasan

Mike Hibler

Sep 13, 2024, 11:21:01 AM
to cloudla...@googlegroups.com
What node was this? We will take it out of service until the disk is fixed.

Md Hasanur Rashid

Sep 16, 2024, 10:25:05 AM
to cloudla...@googlegroups.com
Hi Mike,

Based on the analysis above, the node that I suspect has a faulty HDD is c220g5-111225.


Let me know if you need further information.

Best regards,
Hasan

You received this message because you are subscribed to a topic in the Google Groups "cloudlab-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/cloudlab-users/x6Z3C8nQmXg/unsubscribe.
To unsubscribe from this group and all its topics, send an email to cloudlab-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cloudlab-users/20240913152056.GA59218%40flux.utah.edu.