Damaged disk, umount, fsck problems

Konrád Lőrinczi

Sep 18, 2017, 5:14:55 AM
to Alt-F
Using RC4, DNS-320.

It seems that I have some errors on my disk, as rsync shows this kind of error for one file:
rsync: read errors mapping ...
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1178) [sender=3.1.2]

So I tried to check the disk from the admin GUI, but I lost the admin GUI connection and the SSH connection too (connection refused).
I had to pull the power and restart.

So I tried from the command line:
umount /dev/sdb2
umount: can't umount /mnt/sdb2: Device or resource busy

[root@dns320]# fsck.ext4 /dev/sdb2
e2fsck 1.41.14 (22-Dec-2010)
/dev/sdb2 is mounted.
WARNING!!!  The filesystem is mounted.   If you continue you ***WILL***
cause ***SEVERE*** filesystem damage.
Do you really want to continue (y/n)? no
check aborted.

But if I stop all services from the admin GUI, I lose all connections to the box, including SSH and the admin GUI.


Is there any command to stop all services except ssh?
How can I unmount the sdb2 filesystem correctly, without having to force the umount?


Thanks,
Konrad

João Cardoso

Sep 19, 2017, 11:26:42 AM
to Alt-F


On Monday, 18 September 2017 10:14:55 UTC+1, Konrád Lőrinczi wrote:
> So I tried to check the disk from the admin GUI, but I lost the admin GUI connection and the SSH connection too (connection refused).
> I had to pull the power and restart.

Doesn't pressing the power button for ~7 seconds power off the box?
 

> But if I stop all services from the admin GUI, I lose all connections to the box, including SSH and the admin GUI.
> Is there any command to stop all services except ssh?

Not sure (RC4 is old), but if you are already connected with ssh, issuing 'rcall stop' should not drop the connection (when using dropbear, not openssh).
In fact, 'rcall' should stop neither 'inetd' nor the webUI http server, unless you have set them to "server mode" instead of "inetd mode" (Services->Network: inetd, dropbear, http).
It is not advisable to have the http server running in server mode when flashing.
 
> How can I unmount the sdb2 filesystem correctly, without having to force the umount?

Besides services that might be using the filesystem and preventing it from being unmounted, if Alt-F packages are installed the filesystem might be in use under the hood.
On later Alt-F versions, under Packages->Alt-F, it is possible to deactivate that (or disable it for the next boot).
Also on later versions, under System->Utilities, there is a RebootAndCheck button to force a fsck at power up, even if one is not scheduled.
In any Alt-F version a fsck is performed at boot, and if the fs is not clean a full fsck is performed.
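
When umount reports "Device or resource busy", it can help to see which processes still have their working directory or an open file under the mount point. The sketch below is one way to do that by reading /proc only; it is not an Alt-F tool. The function name `holders` and its optional second argument (an alternative /proc root, handy for trying it out) are made up for illustration, and it assumes the shell on the box has `readlink` available.

```sh
#!/bin/sh
# holders MOUNTPOINT [PROCROOT]
# Print the PID of every process whose current directory or an open file
# lies under MOUNTPOINT. PROCROOT defaults to /proc (hypothetical parameter,
# only there so the function can be exercised against a fake tree).
holders() {
    mnt=${1%/}                 # strip a trailing slash
    proc=${2:-/proc}
    for p in "$proc"/[0-9]*; do
        pid=${p##*/}
        for link in "$p/cwd" "$p"/fd/*; do
            # Unreadable or missing links are skipped silently.
            target=$(readlink "$link" 2>/dev/null) || continue
            case $target in
                "$mnt"|"$mnt"/*) echo "$pid"; break ;;
            esac
        done
    done
}
```

For example, `holders /mnt/sdb2` (run as root) prints one PID per line; a login shell that did `cd /mnt/sdb2` would show up here. It cannot see kernel-level users of a partition, such as an md array that has claimed it.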

Ah, and RC4.1 is just RC4 with some bug fixes, and RC5 with the "network fix" applied is perfectly safe; I had it running on my home backup box for months (it is now running the 1.0S(napshot)).
 


Thanks,
Konrad

Konrád Lőrinczi

Sep 19, 2017, 11:55:59 AM
to al...@googlegroups.com
Well, running 'rcall stop' dropped the SSH connection.
Maybe I am using sshd, I don't remember.


Konrad



Konrád Lőrinczi

Sep 19, 2017, 12:30:19 PM
to al...@googlegroups.com
I was able to log in to ssh again.

Got this result:
[root@dns320]# umount /dev/sdb2
[root@dns320]# fsck.ext4 /dev/sdb2
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext4: Device or resource busy while trying to open /dev/sdb2
Filesystem mounted or opened exclusively by another program?
[root@dns320]# mount
rootfs on / type rootfs (rw)
/dev/root on / type squashfs (ro,relatime)
tmpfs on /rootmnt type tmpfs (rw,relatime)
/dev/root on /rootmnt/ro type squashfs (ro,relatime)
aufs on / type aufs (rw,relatime,si=860d644c)
proc on /proc type proc (rw,relatime)
sysfs on /sys type sysfs (rw,relatime)
tmpfs on /tmp type tmpfs (rw,relatime,size=160768k)
devpts on /dev/pts type devpts (rw,relatime,mode=600)
/dev/loop0 on /rootmnt/sqimage type squashfs (ro,relatime)
/dev/md0 on /mnt/md0 type ext4 (rw,relatime,data=ordered)
[root@dns320]# fsck.ext4 /dev/sdb2
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext4: Device or resource busy while trying to open /dev/sdb2
Filesystem mounted or opened exclusively by another program?


So it seems I was able to unmount sdb2, but fsck still says it is busy.

Any idea? 


Thanks, 
Konrad

João Cardoso

Sep 19, 2017, 2:48:25 PM
to Alt-F
What are the components of your RAID /dev/md0? sda2/sdb2? If so, they are claimed by the RAID and can't (and shouldn't) be checked.
You can only fsck filesystems, not devices. You certainly want to check /dev/md0.
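
A quick way to check whether a partition is claimed by an md array is to look for it in /proc/mdstat. The helper below is a minimal sketch: `claimed_by` is a hypothetical name, and it only parses mdstat-style text from stdin, printing the array that lists the partition as a component.

```sh
#!/bin/sh
# claimed_by PARTITION   (e.g. "sdb2", without the /dev/ prefix)
# Reads /proc/mdstat-style text on stdin. Prints the md device that lists
# PARTITION as a component and returns 0; prints nothing and returns 1 if
# no array lists it.
claimed_by() {
    awk -v part="$1" '
        $1 ~ /^md/ && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                name = $i
                sub(/\[.*/, "", name)      # "sdb2[1](S)" -> "sdb2"
                if (name == part) { print "/dev/" $1; found = 1; exit }
            }
        }
        END { exit !found }'
}
```

Typical use would be `claimed_by sdb2 < /proc/mdstat`. If it prints an array, the "Device or resource busy" from fsck is most likely that array holding the partition, whether the array is active or inactive; if it prints nothing, something else is keeping the device open.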



Konrád Lőrinczi

Sep 19, 2017, 4:10:56 PM
to al...@googlegroups.com

Trying to explain the history:
1) I had a hdd with errors /dev/sda (old hdd).

2) So I added a new hdd, which became /dev/sdb (new hdd1).

3) Mirrored partition tables (sda1 & sda2) of /dev/sda to /dev/sdb.

4) Now I removed old hdd /dev/sda.

5) Moved new hdd1 to left bay, so it became /dev/sda.

6) Made a raid1 from /dev/sda2 with wizard.

7) Added new hdd2 to the right bay as /dev/sdb. 

8) Added new hdd2 /dev/sdb2 as raid array member. 
This was the point when fsck complained about the raid size.

9) So I created a new clean raid1 with /dev/sda2 and /dev/sdb2. 

10) I removed new hdd2 from  /dev/sdb

11) Added old hdd as /dev/sdb. 

12) unmounted old hdd filesystem /dev/sdb2

13) fsck.ext4 /dev/sdb2
Got an error that the device is busy.


Does this help to understand what happened?



Best regards, 
Konrad Lorinczi 

Konrád Lőrinczi

Sep 20, 2017, 12:44:44 AM
to al...@googlegroups.com
Trying to explain the history (v2.0 as I modified something):
1) I had a hdd with errors /dev/sda (old hdd, left bay).

2) So I added a new hdd, which became /dev/sdb (new hdd1, right bay).

3) Mirrored partition tables (sda1 & sda2) of /dev/sda to /dev/sdb.

4) Now I removed old hdd /dev/sda from left bay.

5) Moved new hdd1 from right to left bay, so it became /dev/sda.

6) Made a raid1 array member from /dev/sda2 with wizard, so I had a degraded /dev/md0 raid1.

7) Added new hdd2 to the right bay as /dev/sdb. 

8) Added new hdd2 /dev/sdb2 as raid array member. 
This was the point when fsck complained about the raid partition size.

9) So I created a new clean raid1 array /dev/md0 with /dev/sda2 and /dev/sdb2 partition members.

10) I removed new hdd2 from /dev/sdb (right bay). 
Now I have a degraded raid1 array with only one member, /dev/sda2 on left. 

11) Added old hdd as /dev/sdb to the right bay. 

12) copied files with rsync from /dev/sdb2 (old hdd) to /dev/md0 (degraded raid1 on left bay). 
One file could not be rsync-ed because it is not readable on the old hdd. This is why I would like to run fsck on /dev/sdb2, to fix that anomaly.

13) unmounted old hdd filesystem /dev/sdb2 (right bay) to execute fsck on it. Usually I was not able to unmount it. 

14) after your suggestion to use 'rcall stop', I was able to unmount it. 

15) fsck.ext4 /dev/sdb2 (right bay). 
Got an error that the device is busy.


Does this help to understand what happened?


/dev/md0 is a degraded raid1 array on left bay, /dev/sdb2 (old hdd) is on the right bay. 

Is Alt-F locking /dev/sdb2 because the left bay holds a degraded raid1 array member and the array is missing its /dev/sdb2?
But /dev/sdb2 is not even a raid partition, just a plain Linux partition.



Best regards, 
Konrad Lorinczi 

João Cardoso

Sep 20, 2017, 2:26:32 PM
to Alt-F



I'm sorry I can't provide a recipe. Nobody likes to play with other people's data.

There are too many steps, and what you describe is what you think you have done. Mixing the webUI and command-line commands only makes things worse.
E.g., when you say you created a RAID, you don't say whether you put a filesystem on it. Disk->Wizard repartitions the disk, doesn't allow you to select the RAID components, and puts a fs on the created RAID; Disk->RAID allows you to create a RAID specifying its components, but does not put a fs on the RAID. And if you used the command line, well, that is much more complex and requires the verbatim commands used, their output and their sequence.
I don't understand, e.g., why you created a RAID with two components in step 9 (did you put a fs on it afterwards?) only to remove one of the disks in step 10. OK, it might have been an intermediate step, but did you Fail and Remove that disk from the RAID before physically removing it? (Disk->RAID, Component Operations, Fail/Remove/Clear.) If you didn't, the RAID info is still on it and it might be automounted the next time you plug it in.

The only reason I can see for you being able to unmount sdb2 in step 14 but not fsck it in step 15 is that it belongs to a RAID. Alt-F plays no role in that.

You can see the current RAID situation by executing the commands:
cat /proc/mdstat # lists all md devices, either active or not. Use them in the following command
mdadm --detail /dev/md<your-md-device> # or using the webUI Disk->RAID, RAID Operation, Details
mdadm --examine /dev/sd<your-disk-name-and-partition> # or using the webUI Disk->RAID, Component Operations, Partition

You can create a RAID even on a non-RAID partition (as the D-Link fw does). Doing it the right way is just a matter of discipline, and the Alt-F webUI enforces that; the command line, however, allows it. Also, don't log in to the box while using the webUI, or at least don't change directory using 'cd', as that might prevent disks from being unmounted.

From your description, sdb2 contains your data. Does it have an Alt-F folder at its base directory? Use 'mdadm --examine' on it to determine whether it has RAID info on it. You can try the 'eject sdb' Alt-F command, which will try to do some magic to release it (stopping RAIDs, unmounting, etc.).
Once you are sure it has no Alt-F folder at its base (if it has, rename it to something else) and no RAID info on it (use 'mdadm --zero-superblock /dev/sdb2' to remove it), reboot the box.
And remember that a RAID component can seem to have a filesystem (or part of one) on it, but you should avoid mounting it. The 'blkid' command lists what *might* be a filesystem on every device in the box. E.g., on my box the RAID1 md1 and its sda2/sdb2 components all appear as ext3 (with the same UUID):

/dev/md1: UUID="c0266379-f8a4-42b9-a9a2-c1ad32fd8e6a" TYPE="ext3" 
/dev/sda2: UUID="c0266379-f8a4-42b9-a9a2-c1ad32fd8e6a" TYPE="ext3" 
/dev/sdb2: UUID="c0266379-f8a4-42b9-a9a2-c1ad32fd8e6a" TYPE="ext3" 
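
To spot this situation in blkid output, one option is to group devices by UUID and print only the UUIDs that appear more than once. This is a rough sketch, assuming blkid prints one device per line in the `/dev/name: UUID="..." TYPE="..."` form shown above; `shared_uuids` is a made-up name, not an existing command.

```sh
#!/bin/sh
# shared_uuids: read blkid-style lines on stdin and print every UUID that is
# reported by more than one device, followed by those devices.
shared_uuids() {
    awk '
        {
            dev = $1
            sub(/:$/, "", dev)                       # "/dev/sda2:" -> "/dev/sda2"
            # Leading space so that PARTUUID="..." is not picked up by mistake.
            if (match($0, / UUID="[^"]*"/)) {
                uuid = substr($0, RSTART + 7, RLENGTH - 8)
                count[uuid]++
                devs[uuid] = devs[uuid] (devs[uuid] == "" ? "" : " ") dev
            }
        }
        END {
            for (u in count)
                if (count[u] > 1) print u ": " devs[u]
        }'
}
```

Running `blkid | shared_uuids` on a box like the one above would list the md device together with its components on one line. A partition that shares its UUID with an md device is probably a RAID component and should not be mounted or checked on its own. If several UUIDs are shared, the order of the output lines is unspecified.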

Sorry I can't help more.



