DNS 323 B1 becomes inaccessible shortly after booting


Csaba Mogyorosi

Jan 31, 2018, 6:22:38 PM
to Alt-F

Dear fellow Alt-F users,
I am new to Alt-F and started my journey just a few days ago. Let me try to summarize what happened, what I have tried so far, and what the current status is:
- Installed Alt-F 1.0 on a DNS-323 B1 (fw 1.10)
- I have a 600GB RAID1 and a JBOD on the rest of the 2x1TB disks
- after the initial fsck, /dev/sdb4 was mounted RO
- all of a sudden I lose every connection (WebUI, telnet, SSH); I do not use SMB, so I am not sure about that
- the only connection that remains is the already opened ssh session; it stays up, but I cannot open a new one
- if I try to shut down, the box does not shut down; it hangs with two steady purple/pink lights
- Alt-F is running, as the LEDs indicate as described in the wiki; however, it neither restarts nor shuts down the machine after 3-6 seconds
- if I try the 10<>20 sec reset, the LEDs are fine, but I cannot telnet on port 26 (connection closed immediately). I am using a static IP.
- the only choice I have at this stage is to unplug the cable
- if I reboot with the disks in the box, it boots but closes all connections from the beginning, so I have no chance to log in at all (WebUI, ssh, telnet). However, if I remove the disks before I reboot, I can access the WebUI and do a normal shutdown. After the shutdown I can insert the disks and boot. Until the fsck runs, I can use the WebUI and ssh to access the machine. Then it blocks all connections except for the running ssh session.
- I checked that inetd is running (ps -ef|grep inetd or top) and tried to execute rcinetd start, but the command does not execute and gives me an error. I tried "sh S41inetd restart", but it did not finish, and I had to press Ctrl-C to get the prompt back. At this stage I issued a shutdown and saw the following:
cleanboot:stop
Warning sshd: Not running
Warning telnetd: Not running
- lost connection again
- I actually cannot do a proper shutdown or reboot, which I don't understand
Any suggestions would be appreciated. Thank you in advance.
Csaba

João Cardoso

Feb 1, 2018, 12:33:10 PM
to Alt-F
Those are not Alt-F messages. I guess you have an ffp install or leftover (perhaps in sdb4) that is causing conflicts. Logs would provide more info.
Do what you usually do to get a running ssh session with the disks plugged in, then execute the commands

rcffp stop
rcffp disable
loadsave_settings -sf
reboot


Did it work? If yes, disable all ffp enabled services (Services->User, ffp configure), or simply uninstall ffp (Packages->ffp) and, if you really need it, reinstall it under Alt-F.

Didn't work? We need logs. The 'logread > /mnt/<somedevice>/my.log' command will generate a log, which you then have to retrieve from the box. The <somedevice> depends on your system; it might be 'md0', 'md1' or 'sda2'... The 'ls /mnt/' command will tell you what is available.

Csaba Mogyorosi

Feb 1, 2018, 2:17:07 PM
to Alt-F
Hi Joao, thank you so much for your reply! You are right: back when I ran the D-Link firmware I had ffp and fun_plug installed, but before the upgrade I renamed them to ffp.orig and fun_plug.orig. I cannot remove them from the WebUI. I have executed the commands you suggested, and so far the box has not blocked my connection :), I am already happy!!! Here is the screenshot:

As for the log, I was able to create it on md0 and have attached it.

Now I have degraded and RO disks, so I will manually run fsck.ext2 -fy /mnt/md1 and also on /mnt/sdb4. Next I am planning to reboot. If all goes well, the next thing I would like to do is convert ext2 to ext4, destroy the RAID1 and JBOD, and just have two standard ext4 disks. In what order do you recommend doing this?

Current disks status 



Many thanks in advance, I really appreciate your help.


Csaba

my.log

João Cardoso

Feb 2, 2018, 1:40:32 PM
to Alt-F


On Thursday, 1 February 2018 19:17:07 UTC, Csaba Mogyorosi wrote:
Hi Joao, thank you so much for your reply! You are right: back when I ran the D-Link firmware I had ffp and fun_plug installed, but before the upgrade I renamed them to ffp.orig and fun_plug.orig.

Well, the logs say that an ffp folder was found:
Feb  1 19:03:22 DNS-323-EAB29D user.notice hot_aux: ffp directory found in md0
 
But at this point I don't think that is the (only?) issue. From the logs, smartd (SMART) says that your sdb (left) disk is at serious risk:

Feb  1 18:37:22 DNS-323-EAB29D daemon.crit smartd[3186]: Device: /dev/sdb [SAT], 1464 Currently unreadable (pending) sectors
Feb  1 18:37:22 DNS-323-EAB29D daemon.crit smartd[3186]: Device: /dev/sdb [SAT], 142 Offline uncorrectable sectors
 
And later on, there are several attempts to read from it that fail.
That is the reason why sdb4 is mounted RO and md0 (which uses both sda and sdb) is degraded.
md1, which is JBOD (probably concatenating partitions of sda and sdb), is also mounted RO.
The swap setup is shown neither in the log nor in the screenshots, but it usually uses both sda and sdb, so the "swap" errors in the log must also come from sdb.

So I think that your sdb (left) disk has failed and is causing all the issues. Power off, remove it and power on again. You should see no more issues, and your md0 should be degraded, as it is now.
Back it up, as you have no redundancy; and since both disks are of the same brand and type and were probably made, bought and used similarly, the probability of an error also occurring in sda has increased.
Unfortunately your md1 will not appear anymore: JBOD (linear RAID) is almost like a RAID0, so if one disk fails, the array fails.

However, you might be able to recover the portion of md1 data that resides in sda, so you might want to do that also.

In order to be able to work with both disks (to try to recover part of the data in md1), you have to disable swap on sdb, or errors in swap will make the system unusable.
You can do that from the command line. After verifying that sdb1 is the sdb partition used for swap, by inspecting the output of the 'cat /proc/swaps' command, you can turn sdb1 swap off with the 'swapoff /dev/sdb1' command. After that the system should become stable enough for you to try to do the backups.
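
A minimal sketch of that check, assuming a POSIX shell with awk (as BusyBox provides). The helper name swap_parts_on_disk is hypothetical, not an Alt-F command; it only lists the swap entries that live on the given disk, and the swapoff itself is left as a commented manual step:

```sh
# Hypothetical helper: list the active swap partitions that live on one disk.
# $1 is the disk name (e.g. sdb); /proc/swaps-formatted text is read on stdin.
swap_parts_on_disk() {
    # NR > 1 skips the header line; index(...) == 1 keeps names that start with /dev/<disk>
    awk -v disk="/dev/$1" 'NR > 1 && index($1, disk) == 1 { print $1 }'
}

# Usage on the box (run by hand, as root):
#   swap_parts_on_disk sdb < /proc/swaps    # shows which sdb partitions hold swap
#   swapoff /dev/sdb1                       # only if sdb1 was listed above
```

Listing first and switching off by hand keeps the destructive step explicit, in case swap on a given box is not on sdb1.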

I cannot remove them from the WebUI. I have executed the commands you suggested, and so far the box has not blocked my connection :), I am already happy!!! Here is the screenshot:

As for the log, I was able to create it on md0 and have attached it.

Now I have degraded and RO disks, so I will manually run fsck.ext2 -fy /mnt/md1 and also on /mnt/sdb4.


fsck will fail, as it needs swap and swap uses sdb. You have to disable swap on sdb before trying anything else. Don't expect too much regarding md1... you might be able to recover the first half of it.
 

Next I am planning to reboot. If all goes well, the next thing I would like to do is convert ext2 to ext4, destroy the RAID1 and JBOD, and just have two standard ext4 disks. In what order do you recommend doing this?


Disable swap on sdb, back up md0, and try to back up what is possible from md1. sdb4 most probably has nothing special on it; it is used by D-Link only and hidden from the user.
Notice that, as the filesystems are mounted read-only, no more harm can happen to the data they contain.
Only after the backup should you try anything else.
Good luck, and don't hesitate to ask further questions. Full logs are essential to avoid back-and-forth questions: System->Utility, Logs -- kernel and system logs and system configuration.

Csaba Mogyorosi

Feb 2, 2018, 2:40:43 PM
to Alt-F
Hi Joao,

Thank you for your precise answer and description. I have done many things ever since my last message on the box.
- You and the logs are absolutely right: after installing Alt-F, I renamed ffp.orig back to ffp, which was a mistake; I thought the renaming was only needed until the migration was done.
- rcffp disable was the key: afterwards I had no more connection issues and could start troubleshooting the disks. So a big thank you here!!!
- since I had a degraded RAID1, a RO md1 and a RO sdb4, I did the following:
   - I was able to fix md1 by unmounting it and running fsck -fy /dev/md1
   - I did the same for sdb4; it gave me a lot of errors, but in the end it became RW
- restoring the RAID1 was the most difficult part; here I did the following:
    - under Disk/Raid, in Component Options, I selected sdb2 (this was the failing part of the RAID1) and set it to "fail"
    - I could not do much with it, so I selected "clear" as the next step; now it disappeared from the Component Options partitions
    - from the command prompt I ran fsck -fy /dev/sdb2 and could mount the drive, but I still didn't know what to do at this point, as it did not appear as sdb2 under Disk/Raid and I could not figure out how to get it back
    - this time I was able to see the full content of md0 and md1, so I did the backup quickly. All my data is saved!!! You are right, there is basically nothing on sdb4 and sda4. All data that was only on the DNS 323 is now backed up!
    - I used the command mdadm --add /dev/md0 /dev/sdb2; it added back the missing part of the RAID1 and started a long (about 2.5 hours) process to rebuild the RAID1 partition, which completed successfully
    - I have rebooted the system and now I have an error-free, healthy system!!!!
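
For reference, a rebuild like the one above can be followed from the command line by reading /proc/mdstat. The sketch below is only an illustration under assumptions: md_sync_status is a hypothetical helper name, and it expects the usual mdstat layout, where an "mdN : ..." line is followed by a "recovery = NN%" or "resync = NN%" progress line while a rebuild is running.

```sh
# Hypothetical helper: print "<array> <operation> <percent>" for every md array
# that is currently rebuilding. Reads /proc/mdstat-formatted text on stdin.
md_sync_status() {
    awk '
        /^md[0-9]+ :/ { name = $1 }          # remember the array being described
        /recovery =|resync =/ {              # progress line of a running rebuild
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) print name, $2, $i
        }'
}

# Usage on the box:
#   md_sync_status < /proc/mdstat
#   mdadm --detail /dev/md0      # fuller state report for a single array
```

When no rebuild is in progress the helper prints nothing.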

As you mentioned, these disks might need some maintenance, so my plan is to reformat them and create 2 standard ext4 disks. Please note that I will only use this box as a backup for a QNAP NAS, so for simplicity I will just use standard disks, unless you say it is not a wise idea. I will place it at a remote location as a backup NAS.

Now I am ready to start the disk maintenance and set up the box as new. If I go to Disk/Wizard and select "One big filesystem per disk, for easy management (standard)" and ext4,
is that the only thing I need to do? Will it reformat and do everything for me?? What is your recommendation with regard to setting up the box as new, the formats (RAIDx, standard disk, etc.), and ext3 or ext4?

Thank you again!!! I can hardly wait for your recommendation as to how to move forward.

João Cardoso

Feb 2, 2018, 8:18:59 PM
to Alt-F


Glad it is OK now.

But ffp can't be blamed for the logged disk errors and smartd reports.
I recommend that you run long SMART tests (Disk->Utilities, Health) and try to judge whether you can trust the disk. The question is whether that was a one-time event or whether errors are developing and starting to grow.

The Disk Wizard will repartition and reformat the disk(s), but that is not going to solve the (growing?) disk errors.
It is possible to force pending disk sectors in error to be remapped to good sectors up to a certain extent, but that only makes sense if you are confident that the good sectors will not become bad soon.
Forcing a write on the pending bad sectors will remap them if possible, and writing something to every byte on the disk will make sure that none is left out. Reformatting does not do that, as creating new filesystems only writes to certain sectors; the e2fsprogs-badblocks Alt-F package does, and you can use it in a data-destructive way, as that is much faster (although still very slow) and you have backups.

Csaba Mogyorosi

Feb 3, 2018, 3:09:40 AM
to al...@googlegroups.com
Thank you Joao!!!

Following your instructions, I have started the SMART long tests, one disk at a time. The test is running now on sda and will last about 3 hours. I will run it on sdb when it has finished on sda and post the logs. The sda logs are uploaded.
What are the steps I need to follow to run badblocks? I have a few questions:

Shall I install the e2fsprogs-badblocks Alt-F package on a USB drive and run it from there? At the moment it is installed on sda4.
Which command options do I need to use, and on which partitions or disks? badblocks /dev/md1 and then md0, sda4, sdb4? Or do I run it on /dev/sda and /dev/sdb?
Which options/parameters do you recommend for the badblocks command (it would be great to have a log afterwards)?
What will happen to Alt-F if I overwrite the whole disks? How do I make sure that I do not brick the box?

Thanks a lot.
sda_smart.log
sda msart long test.txt
sdb_smart.log
sdb msart long test.txt

Csaba Mogyorosi

Feb 3, 2018, 12:19:29 PM
to Alt-F
I am planning to run:
#badblocks -wsv -o /mnt/PENDRM-5LYV/sdb_badblocks.log /dev/sdb

It will wipe my sdb drive and write sdb_badblocks.log to the connected pendrive. When it's done, in about a day, I could go to Disk/Wizard and select "One big filesystem per disk, for easy management (standard)" and ext4 to re-image the disks. sda looks OK, so I am not planning to run badblocks on it.

Would you agree on the above steps?

Thank you 

João Cardoso

Feb 3, 2018, 1:22:27 PM
to al...@googlegroups.com
The SMART test shows that the drives are pretty old: 53509 hours, or about 6 years of 24/7 usage
9 Power_On_Hours          0x0032   027   027   000    Old_age   Always       -       53509
 
and 
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       1263786
 
is also very low (a normalized value of 001, from a raw count of 1263786 load cycles), meaning that you probably have a very low spindown timeout, which made the disk heads retract too often. That is of concern.
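
As a rough cross-check of those two figures, the arithmetic can be sketched as below. The helper name smart_age_summary is hypothetical, and the per-hour rate is simply the raw load cycle count divided by the raw power-on hours, which assumes the cycles were spread evenly over the drive's life.

```sh
# Hypothetical helper: turn two raw SMART values into more readable figures.
# $1 = Power_On_Hours raw value, $2 = Load_Cycle_Count raw value
smart_age_summary() {
    awk -v h="$1" -v c="$2" 'BEGIN {
        printf "%.1f years powered on, %.1f load cycles per hour\n", h / (24 * 365), c / h
    }'
}

smart_age_summary 53509 1263786
# prints: 6.1 years powered on, 23.6 load cycles per hour
```

Averaged this way, the heads parked roughly every two and a half minutes, which is consistent with a very short spindown or idle timeout.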

The
197 Current_Pending_Sector  0x0032   192   192   000    Old_age   Always       -       1464
is not that bad, but significant, and
 198 Offline_Uncorrectable   0x0030   200   199   000    Old_age   Offline      -       142
also.

The test was stopped by a "read failure" at LBA (512-byte sector) 1990397, near the start of the disk, so you might want to run badblocks on a range starting a little before that zone first(*), to see if the drive will remap those sectors. If that isn't possible, the best thing is to throw the disk away. Afterwards, run a new SMART long test (which is *much* faster than badblocks).

Of course, it is possible to collect a list of all bad blocks and pass it to mkfs (or run badblocks directly from mkfs), but that would assume that no new bad blocks develop in the future.

(*) use "badblocks -other-options device last-block first-block" to do that, but I think that you should use '-b 512' in this case, so that LBA (SMART) sectors and badblocks blocks correspond one to one. Or use a bigger badblocks block size, which is faster, and make the necessary LBA adjustments.
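
The "necessary LBA adjustments" amount to a division by the number of 512-byte sectors per badblocks block. A small sketch of that conversion, assuming block sizes that are multiples of 512; lba_to_block is a hypothetical helper name, not part of e2fsprogs:

```sh
# Hypothetical helper: convert a 512-byte LBA (as reported by SMART) into the
# block number badblocks would use for a given block size.
# $1 = LBA, $2 = badblocks block size in bytes (a multiple of 512)
lba_to_block() {
    # integer division: several consecutive LBAs share one larger block
    echo $(( $1 / ($2 / 512) ))
}

lba_to_block 1990397 512     # prints 1990397 (one to one, as with -b 512)
lba_to_block 1990397 1024    # prints 995198  (badblocks default block size)
lba_to_block 1990397 4096    # prints 248799  (mkfs default block size)
```

Dividing by the sectors-per-block ratio, instead of multiplying the LBA by 512 first, keeps the intermediate value small on shells with 32-bit arithmetic.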

The badblocks manual page:

DESCRIPTION
       badblocks is used to search for bad blocks on a device (usually a disk
       partition). device is the special file corresponding to the device (e.g
       /dev/hdc1). last-block is the last block to be checked; if it is not
       specified, the last block on the device is used as a default. first-block
       is an optional parameter specifying the starting block number for the
       test, which allows the testing to start in the middle of the disk. If it
       is not specified the first block on the disk is used as a default.

       Important note: If the output of badblocks is going to be fed to the
       e2fsck or mke2fs programs, it is important that the block size is
       properly specified, since the block numbers which are generated are very
       dependent on the block size in use by the filesystem(*). For this reason,
       it is strongly recommended that users not run badblocks directly, but
       rather use the -c option of the e2fsck and mke2fs programs.

(*) My note: the default block size of mkfs and family is 4096 bytes. The default 1024 bytes badblocks block size shows how old it is.


On Saturday, 3 February 2018 17:19:29 UTC, Csaba Mogyorosi wrote:
I am planning to run:
#badblocks -wsv -o /mnt/PENDRM-5LYV/sdb_badblocks.log /dev/sdb

It will wipe my sdb drive and write sdb_badblocks.log to the connected pendrive. When it's done, in about a day, I could go to Disk/Wizard and select "One big filesystem per disk, for easy management (standard)" and ext4 to re-image the disks. sda looks OK, so I am not planning to run badblocks on it.

Would you agree on the above steps?

I don't have to :-) but it looks OK 

Csaba Mogyorosi

Feb 3, 2018, 2:04:28 PM
to Alt-F
executed this command:

#badblocks -wsv -f -s -b 512 -o /mnt/PENDRM-5LYV/sdb_badblocks.log /dev/sdb 2000000 1700000 

 found 9 bad blocks
1932000
1932024
1932025
1932026
1932027
1932028
1932029
1932030
1932031

Since they are all on sdb4:
 
 Device Boot      Start         End      Blocks  Id System
/dev/sdb1              63     1060289      530113+ 82 Linux swap
/dev/sdb2         2088450  1174399694   586155622+ 83 Linux
/dev/sdb3      1174399695  1953520064   389560185  83 Linux
/dev/sdb4         1060290     2088449      514080  83 Linux

I am running this command now:
badblocks -wsv -f -s -b 512 -o /mnt/PENDRM-5LYV/sdb4_badblocks.log /dev/sdb4 

When it finishes, I will run the SMART long test again.

Thanks for the confirmation :) it is always nice to have an expert nodding on a newbie's idea!!!



João Cardoso

Feb 3, 2018, 3:21:51 PM
to Alt-F


On Saturday, 3 February 2018 19:04:28 UTC, Csaba Mogyorosi wrote:
executed this command:

#badblocks -wsv -f -s -b 512 -o /mnt/PENDRM-5LYV/sdb_badblocks.log /dev/sdb 2000000 1700000 

You got the list from the whole disk, sdb, and later on you are using it with the sdb4 disk partition. I *believe* that it does not work that way: the list you got applies to the disk, not to the partition, i.e., "last-block first-block" are relative and refer to the device specified in the command. For badblocks, a start of 0 using sdb4 is like using a start of 1060290 when applied to the whole disk (per the output of your fdisk -lu command).
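
The offset arithmetic can be sketched as follows, assuming '-b 512' so that badblocks block numbers are 512-byte sectors. The start and end sectors are copied from the fdisk listing posted in this thread; the variable and function names are hypothetical.

```sh
# Start and end sectors of sdb4, taken from the fdisk listing in this thread.
SDB4_START=1060290
SDB4_END=2088449

# Disk-relative sector -> partition-relative sector (512-byte units).
disk_to_part() { echo $(( $1 - $SDB4_START )); }

# Partition-relative sector -> disk-relative sector.
part_to_disk() { echo $(( $1 + $SDB4_START )); }

# Succeeds (exit status 0) when a disk-relative sector lies inside sdb4.
in_sdb4() { [ "$1" -ge "$SDB4_START" ] && [ "$1" -le "$SDB4_END" ]; }

disk_to_part 1932000    # prints 871710: first bad block found on /dev/sdb, seen from sdb4
part_to_disk 0          # prints 1060290: block 0 of sdb4, seen from the whole disk
```

So a list produced against /dev/sdb has to be shifted by the partition start before it means anything for /dev/sdb4, and the other way round.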

Csaba Mogyorosi

Feb 3, 2018, 3:28:32 PM
to Alt-F
Yes, I will need to do a calculation: sector 1 in the badblocks list is actually 1+1060290 on sdb. The bigger problem is that the logfile created is over 7MB, which means almost all the blocks are bad.

Csaba Mogyorosi

Mar 5, 2018, 11:08:16 AM
to Alt-F
Hi Joao,

I just wanted to thank you again for your help. I have finished my work on the DNS323. I finally replaced the faulty disk; whatever I tried, the bad blocks could not be fixed. Now, with the new disk, everything is set up and I am using the box as a backup for a QNAP NAS with rsync. Alt-F is excellent; it has made my DNS useful and operational again.