DNS-323 not responding after a few hours


Christian Perreault

Mar 11, 2015, 5:43:34 PM
to al...@googlegroups.com
Hi! I installed Alt-F 0.1RC4 for the first time on my DNS-323 a week and a half ago. Each time, after a few hours running, it stops responding (no web admin page, no SSH, no network share).

I have to manually reboot it by holding the power button for 3 seconds. I tried uninstalling all non-native packages (Alt-F packages: Transmission and MiniDLNA, and FFP), but the problem is still there. The power saving option is greyed out (I believe always disabled). I tried changing the spindown time from 20 minutes to 0 (which I believe means unlimited), but the problem is still there.

It looks like the box is going into some kind of hibernation mode, because when it stops responding there is no sound and some orange lights (left and right) blink.

Does anyone have any clue how to solve this? Which logs should I look at, etc.?

Thanks a lot!!

João Cardoso

Mar 12, 2015, 11:53:33 AM
to al...@googlegroups.com


On Wednesday, March 11, 2015 at 9:43:34 PM UTC, Christian Perreault wrote:
Hi! I installed Alt-F 0.1RC4

RC4.1 is most recent.
 
for the first time on my DNS-323 a week and a half ago. Each time, after a few hours running, it stops responding (no web admin page, no SSH, no network share).

I have to manually reboot it by holding the power button 3 seconds.

Then Alt-F is still running...
 
I tried uninstalling all non-native packages (Alt-F packages: Transmission and MiniDLNA, and FFP), but the problem is still there. The power saving option is greyed out (I believe always disabled).

That relates to disk power saving, not the box itself.
 
I tried changing the spindown time from 20 minutes to 0 (which I believe means unlimited), but the problem is still there.

Not related...
 

It looks like the box is going into some kind of hibernation mode, because when it stops responding there is no sound and some orange lights (left and right) blink.

That is explained in the "About Leds and Buttons" wiki: when the disks spin down for lack of activity they enter standby mode (the 20-minute spindown you referred to above), the orange LEDs blink briefly every three seconds, and the box power LED is turned off.

Does anyone have any clue how to solve this? Which logs should I look at, etc.?

You say that you can't access the box through the network (but the power button works), so the problem has to be network related. Can you 'ping' the box? Are you using a static or a dynamic (DHCP) IP?
Please attach a "System Configuration" log (System->Utilities->Logs)

SSH/telnet into the box and don't log out, so you can issue commands and see what is going on. You can, e.g., leave the command 'watch ifconfig eth0' running (CTRL-C to quit) to see whether the box IP changes or the ssh/telnet session closes by itself.
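As a complement to watching 'ifconfig eth0' interactively, the same check can be left running unattended so that a record survives the hang. The following is only a minimal sketch, not something from Alt-F itself: the function names, the log path and the polling interval are made up for illustration, and it assumes the older "inet addr:" ifconfig output format.

```shell
#!/bin/sh
# Sketch: log eth0's IPv4 address whenever it changes.
# Assumes the older net-tools/busybox "inet addr:x.x.x.x" ifconfig format.

# Read ifconfig output on stdin and print the first IPv4 address found.
extract_ipv4() {
    sed -n 's/.*inet addr:\([0-9.]*\).*/\1/p' | head -n 1
}

# Poll the interface and append a timestamped line on every change.
# The defaults below (eth0, /tmp/ip-watch.log, 10 seconds) are hypothetical.
watch_ip() {
    iface=${1:-eth0}
    log=${2:-/tmp/ip-watch.log}
    interval=${3:-10}
    last=""
    while :; do
        cur=$(ifconfig "$iface" 2>/dev/null | extract_ipv4)
        if [ "$cur" != "$last" ]; then
            echo "$(date) $iface: ${cur:-no address}" >> "$log"
            last=$cur
        fi
        sleep "$interval"
    done
}

# Usage (runs until interrupted with CTRL-C):
#   watch_ip eth0 /tmp/ip-watch.log 10
```

If /tmp lives in RAM on your box, point the log at a file on a disk instead, otherwise it is lost on reboot.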

 

Thanks a lot!!

Andrew Pywell

Mar 18, 2015, 10:37:29 PM
to al...@googlegroups.com
When recently setting up my DNS-323 (B1) with Alt-F (RC4.1), I also experienced on two occasions a connectivity issue. In my case, on both occasions it was whilst I was copying (restoring) my data back to the device as well as a sync process that was occurring following initialization of the disks (I used the wizard to create new partition, RAID 1 etc from fresh). Since the file copy and sync have completed the issue hasn't occurred.

In my case the front panel button did not work and I had to resort to power removal and fsck (ext2 so took about 45 minutes which I'm good with). When the issue occurred, there was no response to samba, http web interface, SSH etc. It did respond to ping suggesting some part of the kernel is still operational. Never once had any similar failure on vendor 1.09. Can't really add much more other than to recount my experience.

João Cardoso

Mar 19, 2015, 12:15:55 PM
to al...@googlegroups.com, andrew...@gmail.com


On Thursday, March 19, 2015 at 2:37:29 AM UTC, Andrew Pywell wrote:
When recently setting up my DNS-323 (B1) with Alt-F (RC4.1), I also experienced on two occasions a connectivity issue. In my case, on both occasions it was whilst I was copying (restoring) my data back to the device as well as a sync process that was occurring following initialization of the disks (I used the wizard to create new partition, RAID 1 etc from fresh). Since the file copy and sync have completed the issue hasn't occurred.

In my case the front panel button did not work and I had to resort to power removal and fsck (ext2 so took about 45 minutes which I'm good with). When the issue occurred, there was no response to samba, http web interface, SSH etc.


So all processes had died (no samba, ssh, webUI...), but the kernel was still running (ping).
That can happen when some process requires too much memory and the kernel kills processes to reclaim it. This is what is called the "OOM killer". It should obviously not happen frequently.

I experienced that when stress testing the Firmware Updater on a DNS-323 without any swap active, deliberately using a wrong and huge 45MB firmware file meant for a DNS-320L (trying to reproduce bug 369). Without swap, /tmp only has 32MB available, so in order to fulfill the 45MB request the kernel started killing samba, then dropbear, then httpd, then inetd, then sysctrl... then everything else. Eventually it gave up, but no process was left running. I observed all this behaviour through a serial console.
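If the box is still reachable (or a serial console is attached), the kernel log usually shows whether the OOM killer ran. A minimal sketch, assuming the usual kernel message wording ("Out of memory" and "oom-killer"); the function name is made up for illustration:

```shell
#!/bin/sh
# Sketch: count kernel log lines that point at the OOM killer.
# Reads kernel log text on stdin; matching is case-insensitive because the
# exact wording differs between kernel versions.
count_oom_events() {
    grep -c -i -E 'out of memory|oom-killer'
}

# Usage on the box:
#   dmesg | count_oom_events      # 0 means no OOM messages in the ring buffer
#   free                          # shows how much RAM and swap are in use
```

A non-zero count only says that the OOM killer ran at some point since boot; read the matching dmesg lines to see which processes were killed.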

Be warned that rsync can consume a lot of memory, as it builds an internal file list (about 100 bytes per file) even when no file is going to be transmitted. For the home folder on my desktop computer that means 352MB just for the file list!!!
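To get a feeling for these numbers before starting a big rsync, the file list size can be estimated up front. This is a rough sketch that simply applies the 100 bytes per file figure quoted above; the real per-file cost depends on the rsync version and options, and the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: estimate rsync's file list memory from a file count.
# BYTES_PER_FILE is the rough figure quoted in the thread, not a measured value.
BYTES_PER_FILE=100

# Count files and directories under a directory.
count_files() {
    find "$1" | wc -l | tr -d ' '
}

# Print the estimated file list size in bytes for a given number of entries.
estimate_flist_bytes() {
    echo $(( $1 * BYTES_PER_FILE ))
}

# Usage:
#   estimate_flist_bytes "$(count_files "$HOME")"
```

By the same arithmetic, a 352MB file list corresponds to roughly 3.5 million files, which is far more than a box with 64MB of RAM can hold without heavy swapping.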

Even if an rsync doesn't cause an OOM because swap is active, paging (swapping) starts to occur, making the whole system very slow and unusable. I obviously can't rsync my desktop home folder to a DNS-323; I have to use a DNS-325 (and I don't rsync the whole home folder).

That is why it is sometimes necessary to split one rsync into several rsyncs, each limited to a given number of files. There are some scripts dedicated to this purpose.
I tried using the following script to reduce rsync memory needs. It starts an rsync process on at most NFILES files, waits for it to complete, then starts another rsync, etc.

#!/bin/sh

# cp -a did 48MB/s
# rsync did 15MB/s

#set -x

# create NFILES lines files list to rsync

if test $# != 2; then
 echo rsync-split.sh source_dir dest_dir
 exit 1
fi

SDIR=$1/
DDIR=${2%%/}

echo $SDIR $DDIR

OPTS="-ahx --info=progress2" # --no-dirs

NFILES=100000 # 10MB memory + 10MB in memory file flist, total 20MB per rsync process
#NFILES=10000 # 1MB memory

cd $SDIR
i=0
rm -f /tmp/foo

find . | while read -r t; do
 echo "$t" >> /tmp/foo
 i=$((i+1))
 if test $i = $NFILES; then
  wait
  i=0
  mv /tmp/foo /tmp/bar
  rsync --files-from=/tmp/bar $OPTS $SDIR $DDIR &
 fi
done

if test -s /tmp/foo; then
 wait
 rsync --files-from=/tmp/foo $OPTS $SDIR $DDIR
fi

rm -f /tmp/foo /tmp/bar
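
The same idea can also be sketched with 'split', which cuts the file list into fixed-size chunks first and then runs one rsync per chunk, strictly one after another. This is a variant under stated assumptions, not the script above: the function names and the chunk directory are made up for illustration, and unlike the original it does not overlap the 'find' with a running rsync.

```shell
#!/bin/sh
# Sketch: split a directory's file list into chunks of at most NFILES lines,
# then rsync one chunk at a time. Function and file names are hypothetical.

# Write the file list of SRC into OUTDIR/chunk.aa, chunk.ab, ...
make_chunks() {
    src=$1; nfiles=$2; outdir=$3
    mkdir -p "$outdir"
    ( cd "$src" && find . ) | split -l "$nfiles" - "$outdir/chunk."
}

# Run one rsync per chunk file, stopping at the first failure.
rsync_chunks() {
    src=$1; dst=$2; outdir=$3
    for list in "$outdir"/chunk.*; do
        [ -f "$list" ] || continue
        rsync -ahx --files-from="$list" "$src/" "$dst" || return 1
    done
}

# Usage (use absolute paths; keep the chunk directory on a disk, not in RAM):
#   make_chunks /path/to/source 100000 /path/to/chunks
#   rsync_chunks /path/to/source /path/to/dest /path/to/chunks
```

Keeping the chunk files on a disk matters on a small box: with millions of files the lists themselves take tens of megabytes, which would not fit in a RAM-backed /tmp.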


It did respond to ping suggesting some part of the kernel is still operational. Never once had any similar failure on vendor 1.09. Can't really add much more other than to recount my experience.

Another memory hog is fsck. With 2/3/4/6 TB disks and millions of files on them, fsck needs a lot of memory!
 
 

Andrew Pywell

Mar 19, 2015, 6:55:21 PM
to al...@googlegroups.com
Actually I was referring to the sync process that occurs when a RAID 1 is created during the disk setup wizard. And in my case the data restore was via samba and robocopy (for which I hit a security/attributes issue and posted the cause/resolution in the robocopy thread). Over the course of approx. 18 hours restoring my data after switching from the vendor firmware to Alt-F, I had two hangs and none since. I kind of figured it could be hitting the wall resource-wise at the time, so I didn't post.

Christian Perreault

Mar 23, 2015, 12:05:06 PM
to al...@googlegroups.com, andrew...@gmail.com
Hi!
Thanks a lot for your help! I realized this was a "code 18" error. I noticed there were always many telnetd processes being started, consuming a lot of CPU. I found that suspicious and realized there was an old setting on my router that put my NAS in the DMZ. Thus, there were robot telnet connection attempts from the Web. Some of them were successful, so these malicious programs were crashing my box at some point. I disabled this DMZ setting and changed the admin password. My bad. Now I look forward to using Alt-F! Thanks again!
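
For anyone wanting to check for the same symptom, counting the established connections to the telnet port is a quick test. A minimal sketch, assuming the usual 'netstat -tn' column layout (protocol, queues, local address, foreign address, state); the function name is made up for illustration:

```shell
#!/bin/sh
# Sketch: count established TCP connections to a given local port.
# Reads "netstat -tn" style output on stdin (assumed column layout:
# proto, recv-q, send-q, local address, foreign address, state).
count_established() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $4 ~ (":" port "$") && $6 == "ESTABLISHED" { n++ }
        END { print n + 0 }
    '
}

# Usage on the box (23 is the standard telnet port):
#   netstat -tn | count_established 23
#   ps | grep '[t]elnetd'     # busybox ps lists all processes
```

Telnet sessions that you did not open yourself, especially from addresses outside your LAN, are a strong hint that the box is exposed to the Internet.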

Christian

João Cardoso

Mar 23, 2015, 12:51:39 PM
to al...@googlegroups.com, andrew...@gmail.com
Thanks for reporting back.
I hope that your experience will be useful for others. It should be written in caps: AN OLD SETTING ON MY ROUTER PUT MY NAS IN THE DMZ.

I had a similar experience many, many years ago. For professional reasons I had my home computer's sshd also in the DMZ, and when arriving home at night I noticed that the computer disk LED blinked repeatedly every two or three seconds in sync with the router network LED, even though I was not using the computer.
At first I didn't worry, but after a few days I found that suspicious and searched the computer logs. Yes, a script kiddie was attempting to access my home computer using simple and trivial login/password combinations. Fortunately I didn't have admin/admin as login/password, so there was no harm done. But since then I'm very careful with passwords, DMZ, automatic router port forwarding, UPnP...

"Automagic" comes at a price! If you don't do it yourself, you don't know that it is active!

Paulo Elifaz Andrielli

Mar 23, 2015, 1:02:08 PM
to al...@googlegroups.com
Just my 2 cents here......

When I was struggling to set up my network, a friend of mine said: "Man, just add it to the DMZ, and don't worry!".

I found another way to solve my problem, and this friend (using his DMZ setup) got BOTH 2TB DRIVES FORMATTED while he was at work. He simply arrived home and found the second drive being wiped by the guy who "hacked" his system.....

[]´s
Paulo
