Shrinking Filesystem - Stuck at 75% for days.....


Steve

Nov 13, 2017, 7:36:56 AM
to Alt-F
Hi;

I am in the process of hopefully shrinking my DNS-320L's disk sizes from 2.0TB to 1.5TB.  I first shrank the partition down to 1.4TB successfully, but I needed to shrink it more (to 1.36TB, to be exact), so I deleted more unnecessary files and tried to shrink it again.  This time it works for a couple of hours, then gets stuck at "shrinking... step 2: 75%".  Running tasks shows this process (even when it's stuck): "root     R N   178m  72%  91% resize2fs -p /dev/md0 -M".
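
For readers wondering where a figure like 1.36 might come from, here is a minimal sketch of the unit arithmetic. It assumes the drive label uses decimal terabytes and the resize tool reports binary tebibytes; that reading is an assumption and is not stated in the post. Partition table and RAID metadata overhead are ignored.

```python
TB = 10**12   # decimal terabyte, as printed on drive labels
TIB = 2**40   # binary tebibyte, as many disk tools report sizes


def tb_to_tib(tb):
    """Convert decimal terabytes to binary tebibytes."""
    return tb * TB / TIB


for label_tb in (2.0, 1.5):
    print(f"{label_tb} TB = {tb_to_tib(label_tb):.3f} TiB")
# 2.0 TB = 1.819 TiB
# 1.5 TB = 1.364 TiB
```

Under that assumption, a filesystem has to end up at or below roughly 1.36 TiB before it can fit on a 1.5 TB drive.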

I have left this for two days and it never moves forward.  I ran a check / repair on both the RAID and filesystem and it completes without finding any errors.
Anyone have an idea what's happening here?

Steve

Nov 13, 2017, 9:52:07 AM
to Alt-F
Also neglected to mention, when it's stuck at 75% there is no disk activity, but the process still shows as active - "root R N   178m  72%  91% resize2fs -p /dev/md0 -M"

João Cardoso

Nov 15, 2017, 1:46:49 PM
to Alt-F
I can't help, and it is probably too late now, but you should not check a filesystem while it is being resized. Allowing any filesystem operation or partition manipulation in those circumstances is a webUI bug, because some webUI actions might trigger a filesystem mount, which should not be done.
The first step in the resize operation is a filesystem check; the resize is done only if that check succeeds.

Your 320L should have enough memory (256MB) to handle the resize, and it did the first time you did it. The 'resize' program was using 178MB of memory and was working hard enough, with 91% of CPU being used.
As there was no disk activity, though, it looks like the resize program was stuck in a loop. That might be a bug in the resize program; I can't tell.
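
The reasoning above (high CPU use plus no disk activity suggests a loop) can be sketched as a small check. This is only an illustration: it assumes a Linux kernel that exposes per-process I/O counters in /proc/<pid>/io, which may not be the case on the Alt-F firmware, and the function names and the 50% CPU threshold are hypothetical.

```python
def parse_proc_io(text):
    """Parse the 'key: value' lines of a Linux /proc/<pid>/io file."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value)
    return counters


def looks_like_busy_loop(io_before, io_after, cpu_percent, cpu_threshold=50.0):
    """True when the process used a lot of CPU but moved no bytes to or from disk.

    io_before and io_after are two parse_proc_io() snapshots taken some
    time apart; cpu_threshold is an arbitrary, hypothetical cut-off.
    """
    moved = sum(io_after[k] - io_before[k] for k in ("read_bytes", "write_bytes"))
    return cpu_percent >= cpu_threshold and moved == 0


# Two made-up snapshots with identical disk counters, and the 91% CPU
# figure quoted in the thread.
first = parse_proc_io("read_bytes: 4096\nwrite_bytes: 8192\n")
second = parse_proc_io("read_bytes: 4096\nwrite_bytes: 8192\n")
print(looks_like_busy_loop(first, second, cpu_percent=91.0))
```

On a real system the two snapshots would be read from /proc/<pid>/io a minute or so apart, with the pid of the resize2fs process.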

Did you shrink the partition after the first resize succeeded? If you did, then the resize program had less free space to reorganize files the second time, and that might be the reason why it was not able to finish, or appeared to get stuck.

I believe that the resize program is robust enough to be killed. What filesystem (ext2/3/4) was it running on?

Sorry, but I can't help more.

Steve

Nov 18, 2017, 3:33:21 PM
to Alt-F
Hi João;

Thanks for the reply.  My problem turned out to be a faulty DNS-320L.  It was running really hot, and when I rebooted it I got the flashing blue boot LED and nothing else.  I've tried to revive it but the web UI never starts.  I have gone back to my DNS-323.  Not sure if you have any suggestions (links) on how I can possibly fix my DNS-320L.  It was nice while it worked.

Steve

Steve

Nov 19, 2017, 2:48:45 PM
to Alt-F
João;

Just updating you on what's going on:
I moved my drives from my failed DNS-320L and put them in an old DNS-323 I had.  I tried (on Version 0.1RC5) to shrink my filesystem and once again it stuck at 75%.  When it's in this loop the drives start to heat up.  I think that's what killed my DNS-320L, as I had left it for a day and a half to see if it would get past the 75% mark.

I've tried testing the disks, rebuilding the RAID array, repairing both the FS and the RAID array and nothing seems to change this loop.  Next I guess I'll have to try different drives?


João Cardoso

Nov 20, 2017, 4:37:49 PM
to Alt-F
Thanks for the update Steve.

You have two separate issues:
1-the overheating
2-the stuck shrink

1-Is your 323 a rev-C1 box? Did you notice at what speed the fan was turning when the drives got too hot?

Did you tune the 'sysctrl' settings (Services->System, sysctrl, Configure) for your environment?
On the 323, the difference between the system temperature and the drive temperature is small, while on the 320L it can be up to 10 ºC. As the fan speed is controlled by the system temperature, which is read at the main board at the bottom of the box, where it is cooler, on the 320L it is possible for the fan to turn too slowly or even to stop while the disks are hot and need a higher fan speed.
That is the reason why the sysctrl webUI has an "also take hdd temperature into account" checkbox (if the 'hddtemp' service is running, Services->System, hddtemp; the catch is that the disks will not spin down).

In the event that you used "stop all services" (System->Utilities), sysctrl is no longer controlling the fan speed, but for safety it leaves the fan turning at half speed. That might, however, be insufficient to cool the hdds when they are working for a long time.

So, I can understand why the boxes become too hot if you didn't anticipate and tune the fan speed control accordingly, e.g., setting it to always high, or adjusting the fan speed trip points for your local environment.
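
To illustrate why the hdd temperature option matters, here is a rough sketch of trip-point fan control. It is not the actual sysctrl code: the function name, the three speed levels, and the 40/50 ºC trip points are all hypothetical values chosen for the example.

```python
def fan_speed(system_temp_c, hdd_temps_c=(), use_hdd_temp=True,
              low_trip_c=40.0, high_trip_c=50.0):
    """Pick a fan level ('off', 'low' or 'high') from trip points.

    When use_hdd_temp is set, the hottest of the system and disk
    temperatures is used instead of the system temperature alone.
    The trip points are hypothetical, not Alt-F defaults.
    """
    reference = system_temp_c
    if use_hdd_temp and hdd_temps_c:
        reference = max(system_temp_c, *hdd_temps_c)
    if reference >= high_trip_c:
        return "high"
    if reference >= low_trip_c:
        return "low"
    return "off"


# Main board at 38 ºC while the disks are 10 ºC hotter, as can happen on a 320L.
print(fan_speed(38.0, (48.0, 47.0), use_hdd_temp=False))  # off
print(fan_speed(38.0, (48.0, 47.0), use_hdd_temp=True))   # low
```

With the system temperature alone, the fan in this example stays off even though the disks are close to the high trip point.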

I have checked the shrink/resize code and there is nothing in it that might by itself cause the overheating, other than the disks generating more heat than usual for a long time. The fan control or speed is not changed.

2-Regarding the stuck shrink operation:
Have you shrunk the partition (Disk->Partitioner) after the first successful shrink (*NOTE*)? If you have, undo it by enlarging the partition first (it doesn't need to be set exactly to the original size), then enlarge the filesystem (enlarging the filesystem is a fast operation). Then shrink the filesystem again. As there is now enough space in the filesystem and you have deleted files, perhaps the shrink will succeed.

If you have not shrunk the partition (*NOTE*), I can only advise you to try to do it on a Linux computer with a more recent version of the resize program (e2fsprogs). It might be a bug triggered by some particular condition.

NOTE: your posts are a bit ambiguous regarding RAID and partitions. If you have the filesystem on a RAID, operation sequences are a bit different: shrink filesystem -> shrink RAID -> shrink partitions. The enlarge sequence is: enlarge partitions -> enlarge RAID -> enlarge filesystem. 
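
The two sequences in the NOTE follow one rule: each layer must always fit inside the layer that contains it. A minimal sketch of that ordering follows; the function names are made up for illustration and it runs no actual resize commands.

```python
# Innermost layer first: the filesystem lives on the RAID, which lives on the partitions.
LAYERS = ("filesystem", "RAID", "partitions")


def resize_order(direction):
    """Return the layers in the order they should be resized."""
    if direction == "shrink":
        return list(LAYERS)            # inside out
    if direction == "enlarge":
        return list(reversed(LAYERS))  # outside in
    raise ValueError("direction must be 'shrink' or 'enlarge'")


def layers_fit(fs_size, raid_size, partition_size):
    """True when every layer fits inside the one that contains it."""
    return fs_size <= raid_size <= partition_size


print(resize_order("shrink"))   # ['filesystem', 'RAID', 'partitions']
print(resize_order("enlarge"))  # ['partitions', 'RAID', 'filesystem']
```

Checking `layers_fit` with the planned sizes before each step is a cheap way to catch an out-of-order resize on paper before touching the disks.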

If you have not resized the RAID nor partitions, there should be no need to play with the raid, there should be nothing wrong with it, but only logs/configuration could tell.

Steve

Nov 20, 2017, 9:11:12 PM
to Alt-F
Hi João;

I guess I should have stated that my ultimate goal was to downgrade my two 2TB drives to 1.5TB drives.  That was the reason for me attempting to shrink the FS, then the RAID, then swap disks.  I suspect the failure was on the DNS-320L's side, possibly a faulty temp sensor or fan, as the whole thing was very hot and the fan was not turning when I found it like that.  When I rebooted it after that I got the "blinking blue light of death" on it, so I believe it's a write-off.  I bought it used so no big deal.  My DNS-323 has been rock solid since I started using ALT-F on it. I put my 2TB drives in and converted to non-RAID, and now I am in the process of copying the 2TB contents to the new disks.

Anyway, thanks for your help and this awesome ALT-F solution to an otherwise crappy NAS offering from D-Link.

Cheers!