I have been bitten by that a couple of times, some years ago.
The dns-daemon is the easier one to debug.
-If it is running and you issue a command such as "dns320l-daemon -x GetTemperature" from the command line, the program you invoked will try to contact the instance running in the background. If you get no answer, the daemon is blocked for some reason.
-If you kill the daemon, 'init' will try to relaunch it, and you will see that in syslog. If, after killing it, syslog shows init trying to relaunch it over and over with a message similar to "socket in use", the daemon did not remove /var/run/dns320l.socket at exit; you can remove it yourself with "rm /var/run/dns320l.socket".
-If you send a SIGUSR1 signal to the daemon, it will enter debug mode and syslog will fill with a lot of cryptic messages. A SIGUSR2 will make it return to info mode.
So, a healthy daemon answers the commands you send to it from the command line, /var/run/dns320l.socket exists, and it reacts to SIGUSR1/2.
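The three health criteria can be bundled into a small script to run the next time the problem shows up. This is a minimal sketch, not part of the firmware: the function names are hypothetical, the 10-second answer limit is arbitrary, `timeout` is assumed to accept `timeout SECONDS CMD` (GNU coreutils and recent BusyBox; older BusyBox builds want `timeout -t SECONDS`), and SIGUSR2 is assumed to be the signal that returns the daemon to info mode, as "SIGUSR1/2" suggests.

```shell
#!/bin/sh
# Hypothetical health check for dns320l-daemon, following the criteria above.
# DNS_SOCKET and DNS_CMD may be overridden from the environment.

DNS_SOCKET="${DNS_SOCKET:-/var/run/dns320l.socket}"
DNS_CMD="${DNS_CMD:-dns320l-daemon}"

check_socket() {
    # The daemon's socket file must exist (and be a socket) while it runs.
    [ -S "$DNS_SOCKET" ]
}

check_answer() {
    # Ask the background daemon for the temperature; no answer within
    # 10 seconds (an arbitrary limit) is treated as a blocked daemon.
    # Assumes "timeout SECONDS CMD" syntax (GNU coreutils, recent BusyBox).
    timeout 10 "$DNS_CMD" -x GetTemperature >/dev/null 2>&1
}

set_debug() {
    # "on" sends SIGUSR1 (debug mode); "off" sends SIGUSR2, assumed to
    # return the daemon to info mode. Fails if no such process is found.
    pid=$(pidof "$DNS_CMD") || return 1
    case "$1" in
        on)  kill -s USR1 $pid ;;
        off) kill -s USR2 $pid ;;
        *)   return 2 ;;
    esac
}

daemon_healthy() {
    if ! check_socket; then
        echo "no socket at $DNS_SOCKET" >&2
        return 1
    fi
    if ! check_answer; then
        echo "no answer from $DNS_CMD (blocked?)" >&2
        return 1
    fi
    echo "daemon answered"
}
```

Usage: source the file and run `daemon_healthy`, then `set_debug on` / `set_debug off` while watching syslog to confirm the daemon still reacts to signals. Keeping the checks separate makes it clearer which of the three criteria fails first.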
The daemon communicates over a serial line with the MCU (microcontroller), a small chip that actually reads the buttons and the temperature and sets the LEDs, the fan speed, etc. The daemon regularly reads the sensor temperature from the MCU and writes its value to /tmp/sys/temp1_input; it also regularly reads /tmp/sys/pwm1 and sets the fan accordingly.
sysctrl regularly reads the temperature from /tmp/sys/temp1_input and sets the fan speed by writing to /tmp/sys/pwm1. Before reading or writing those files it obtains a lock; otherwise wrong or partial values could result.
sysctrl only has LOG_INFO and LOG_ERROR, and both go to syslog; *changes* in temperature or fan speed are logged to fantemp.log, as you know. sysctrl is not relaunched by init if killed.
You can test whether it is running OK by pressing e.g. the USB button and watching for the event in syslog. Also, if you write to the pwm1 file, its previous value will reappear after a few (tens of?) seconds: 'cat /tmp/sys/pwm1; echo 127 > /tmp/sys/pwm1; cat /tmp/sys/pwm1' (the values are 0 -> fan off, 127 -> medium speed, 255 -> high speed). If dns-daemon is running OK it will react to those changes (you can kill sysctrl to preserve your changes and hear the fan speed changing). Of course, when you write to the file this way you did not get the lock first, so mixed results might happen.
The lock is obtained by creating the directory /var/lock/temp-lock if it does not exist, which is an atomic operation: 'mkdir /var/lock/temp-lock' returns 0 if the lock is obtained and non-zero otherwise, and 'rmdir /var/lock/temp-lock' removes the lock. The dns-daemon uses the same mechanism.
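The manual pwm1 test can also be done with the lock held, which avoids the mixed results mentioned above. This is a minimal sketch in POSIX sh, not code from sysctrl or dns-daemon: the function names, the ten-attempt lock timeout and the 60-second POLL_SECS default are assumptions; only the paths, the mkdir/rmdir lock and the 0/127/255 values come from the description above.

```shell
#!/bin/sh
# Hypothetical helpers: change pwm1 while holding the same mkdir lock that
# sysctrl and dns-daemon use, then watch whether sysctrl restores its value.
# LOCK_DIR, PWM_FILE and POLL_SECS may be overridden from the environment.

LOCK_DIR="${LOCK_DIR:-/var/lock/temp-lock}"
PWM_FILE="${PWM_FILE:-/tmp/sys/pwm1}"
POLL_SECS="${POLL_SECS:-60}"

lock_acquire() {
    # mkdir is atomic: status 0 means this process created the directory
    # and now owns the lock. Give up after about ten attempts.
    tries=0
    until mkdir "$LOCK_DIR" 2>/dev/null; do
        tries=$((tries + 1))
        if [ "$tries" -ge 10 ]; then
            return 1
        fi
        sleep 1
    done
    return 0
}

lock_release() {
    rmdir "$LOCK_DIR"
}

read_pwm() {
    lock_acquire || return 1
    value=$(cat "$PWM_FILE")
    status=$?
    lock_release
    [ "$status" -eq 0 ] && echo "$value"
}

write_pwm() {
    # $1: 0 = fan off, 127 = medium speed, 255 = high speed
    lock_acquire || return 1
    echo "$1" > "$PWM_FILE"
    status=$?
    lock_release
    return "$status"
}

test_pwm_revert() {
    # Write $1, then poll once per second for up to POLL_SECS seconds.
    # Returns 0 if pwm1 changes away from $1, 2 if it never does.
    before=$(read_pwm) || return 1
    echo "pwm1 before: $before"
    write_pwm "$1" || return 1
    elapsed=0
    while [ "$elapsed" -lt "$POLL_SECS" ]; do
        sleep 1
        elapsed=$((elapsed + 1))
        now=$(read_pwm) || return 1
        if [ "$now" != "$1" ]; then
            echo "pwm1 changed to $now after ${elapsed}s"
            return 0
        fi
    done
    echo "pwm1 still $1 after ${POLL_SECS}s (is sysctrl running?)"
    return 2
}
```

Usage: `test_pwm_revert 127` with sysctrl running should report a change back; with sysctrl killed it should time out and return 2. One property of mkdir-based locks is worth keeping in mind: if a process dies while holding the lock, the directory stays behind, and no other process can obtain the lock until it is removed by hand with rmdir.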
So you now have the big picture.
The problem is that the next time it happens you will have only one chance to debug it, and the next opportunity may come only a couple of weeks later... As I said above, I had a couple of similar issues in the past and was not able to diagnose the *real* cause. Fixing the issue, yes, is easy: kill the daemon, remove the socket file, or kill and restart sysctrl. The cause of the hang (I believe it's a deadlock or a race condition) remains unknown to me.
Let me know if you find out more about this,
Thanks
João
Thanks.