wviewd randomly crashing

Peter Dohrmann

Jan 4, 2016, 11:59:30 PM
to wview
I've been running wview flawlessly for several years.  My current version is 5.21.7 with an Oregon WMR200 using Debian 7.  Without any changes to my config that I can think of, in the past couple of months wviewd randomly hangs every day or two (no particular pattern) and is followed by radmrouted and htmlgend falling over.  I don't know why.  Here is an extract from wview.log:

Jan  5 11:50:05 weather htmlgend[3417]: <1451955005430> : Adding 5 minute sample for 2016-01-05 11:50...
Jan  5 11:50:10 weather htmlgend[3417]: <1451955010341> : Generated: 91 ms: 37 images, 17 template files
Jan  5 11:51:06 weather wviewd[3412]: <1451955066305> : wmr: sending RESET to console.
Jan  5 11:51:06 weather wviewd[3412]: <1451955066305> : wmr: Sending reset to console...
Jan  5 11:51:06 weather wviewd[3412]: <1451955066408> : wmr: opening HID device...
Jan  5 11:51:07 weather wviewd[3412]: <1451955067467> : packet checksum error
Jan  5 11:51:07 weather wviewd[3412]: <1451955067467> : DBG: Dumping 0x8068aac, 16 bytes:
Jan  5 11:51:07 weather wviewd[3412]: <1451955067467> : wviewd: recv sig 11: shutting down!
Jan  5 11:51:10 weather radmrouted[3400]: <1451955070250> : radQueueSend: write failed on fd 7: Broken pipe
Jan  5 11:51:10 weather radmrouted[3400]: <1451955070250> : SendToClient: wviewd: radProcessQueueSend failed!
Jan  5 11:51:10 weather radmrouted[3400]: <1451955070251> : QueueMsgHandler: wviewd: SendToClient failed!
Jan  5 11:51:10 weather radmrouted[3400]: <1451955070251> : radQueueSend: write failed on fd 7: Broken pipe
Jan  5 11:51:10 weather radmrouted[3400]: <1451955070251> : SendToClient: wviewd: radProcessQueueSend failed!
Jan  5 11:51:10 weather radmrouted[3400]: <1451955070251> : QueueMsgHandler: wviewd: SendToClient failed!
Jan  5 11:51:30 weather radmrouted[3400]: <1451955090396> : radQueueSend: write failed on fd 7: Broken pipe
Jan  5 11:51:30 weather radmrouted[3400]: <1451955090396> : SendToClient: wviewd: radProcessQueueSend failed!
Jan  5 11:51:30 weather radmrouted[3400]: <1451955090396> : QueueMsgHandler: wviewd: SendToClient failed!
Jan  5 11:52:10 weather htmlgend[3417]: <1451955130250> : wviewd NOT responding!

Any ideas?  I've disabled various cron jobs to see if they were colliding but that made no difference.
Any help appreciated.

Peter

Graham Eddy

Jan 5, 2016, 12:48:50 AM
to wview Google Group
hmm, there appear to be two things happening here, not sure if they are related
  1. “wmr: sending RESET to console” after thread has been running okay for a while means there has been no data received from station for too long and this is a final attempt to ‘bump’ it
  2. “DBG: Dumping 0x8068aac, 16 bytes:” (with no data actually dumped) followed by “wviewd: recv sig 11: shutting down!” strongly implies a wviewd (or radlib) bug - either a bad pointer to the buffer or an incorrect buffer length - which is what actually crashes wviewd
you could turn on all possible debugging/tracing of the usb dataflow to see whether the station, the usb cable or socket, or the usb driver is at fault - if all those are eliminated, log a defect report on wviewd

alternatively, if you’re compiling it yourself, insert some tracing code into wmrusbprotocol.c around the “wmr: sending RESET to console” statement to track the buffer pointer and its length, and the durations between receiving data packets from station
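
For anyone who cannot recompile, a lower-effort check is to scan wview.log and see whether each sig 11 crash is preceded by a console reset, which would suggest the two points above are related. The following is a minimal Python sketch, not part of wview: the function names and the 5-second window are assumptions made up for illustration, and the regular expression is based only on the line format in the log extract earlier in this thread.

```python
import re

# Matches the tail of the syslog lines in the extract above, e.g.
# "... wviewd[3412]: <1451955066305> : wmr: sending RESET to console."
# Groups: daemon name, pid, timestamp in milliseconds, message text.
LINE_RE = re.compile(r"(\w+)\[(\d+)\]: <(\d+)> : (.*)$")


def parse_line(line):
    """Return (daemon, pid, epoch_ms, message), or None if the line does not match."""
    m = LINE_RE.search(line.rstrip("\n"))
    if m is None:
        return None
    daemon, pid, ms, msg = m.groups()
    return daemon, int(pid), int(ms), msg


def find_events(lines):
    """Collect the timestamps (ms) of wviewd console resets and sig 11 crashes."""
    resets, crashes = [], []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        daemon, _pid, ms, msg = parsed
        if daemon != "wviewd":
            continue
        if "sending RESET to console" in msg:
            resets.append(ms)
        elif "recv sig 11" in msg:
            crashes.append(ms)
    return resets, crashes


def reset_before_crash(resets, crashes, window_ms=5000):
    """For each crash, return the ms elapsed since the most recent reset
    that happened within window_ms before it, or None if there was none."""
    result = []
    for crash in crashes:
        prior = [r for r in resets if 0 <= crash - r <= window_ms]
        result.append(crash - max(prior) if prior else None)
    return result
```

Passing the lines of the log file to `find_events` (using whatever path wview.log has on your system) and the result to `reset_before_crash` gives one entry per crash. For the extract posted above it yields `[1162]`, i.e. the crash came about 1.2 seconds after the reset; a `None` entry would mean a crash with no reset shortly before it.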
___________
Graham Eddy

Peter Dohrmann

Jan 5, 2016, 1:39:30 AM
to wview
Thanks Graham, much appreciated.  I'm an intermediate-level user and would appreciate your tips on how I might best turn on all possible debugging/tracing of the usb dataflow to see where the fault lies.  My system is about 6 years old.  Not sure if the WMR200 is prone to eventually fail or not.  If I can follow your advice it will be instructive.  BTW, I didn't compile my installation.




Graham Eddy

Jan 5, 2016, 2:04:51 AM
to wview Google Group
haven’t done it myself, and i’m on macs nowadays rather than linux.
however, start by looking at https://wiki.ubuntu.com/Kernel/Debugging/USB and note that one of the first things you would be looking for is a lack of data coming back from the station
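
One common way to look for that lack of data on Linux is the kernel's usbmon facility, which can write a text trace of USB traffic (typically read from a file such as /sys/kernel/debug/usb/usbmon/<bus>u once the usbmon module is loaded). The following is a rough Python sketch, assuming a trace has already been captured to a file in usbmon's text format: the function name, the 30-second threshold, and the bus and device numbers are hypothetical, and it has not been tried against a WMR200.

```python
def inbound_gaps(lines, bus, device, min_gap_s=30.0):
    """Find long silences between inbound data packets from one USB device.

    Expects usbmon text-format lines, whose leading fields are:
      URB tag, timestamp (microseconds), event type (S/C/E),
      address word such as "Ii:1:002:1" (type+direction:bus:device:endpoint),
      status, data length, ...
    Returns a list of (gap_seconds, timestamp_us_of_packet_ending_the_gap).
    """
    gaps = []
    last_us = None
    for line in lines:
        fields = line.split()
        # Only completed transfers ("C" = callback) tell us data arrived.
        if len(fields) < 6 or fields[2] != "C":
            continue
        addr = fields[3].split(":")
        # Keep inbound transfers only (direction letter "i").
        if len(addr) != 4 or not addr[0].endswith("i"):
            continue
        if not (addr[1].isdigit() and addr[2].isdigit()):
            continue
        if int(addr[1]) != bus or int(addr[2]) != device:
            continue
        # Status may carry extra parts after a colon (e.g. "0:8"); 0 means success.
        if fields[4].split(":")[0] != "0":
            continue
        # Ignore successful callbacks that carried no data.
        if not fields[5].isdigit() or int(fields[5]) == 0:
            continue
        if not fields[1].isdigit():
            continue
        ts = int(fields[1])
        # Timestamps in the text format can wrap around, so skip negative gaps.
        if last_us is not None and ts >= last_us:
            gap = (ts - last_us) / 1_000_000
            if gap >= min_gap_s:
                gaps.append((gap, ts))
        last_us = ts
    return gaps
```

The bus and device numbers to pass in are the ones `lsusb` reports for the station. Long gaps shortly before a “sending RESET to console” entry in wview.log would point at the station, cable, or USB stack rather than at wviewd itself.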
___________
Graham Eddy


Peter D.

Jan 18, 2016, 8:26:48 PM
to wview
Just a follow-up with a good outcome.  The problems all disappeared after turning the power off to the WMR200 base station, removing the batteries for 30 mins, then restoring power and redoing the initial setup of the unit.  It has run perfectly since then for more than 5 days.  Previously wview was stopping randomly every 6-12 hours (for the USB reasons outlined by Graham in this thread), but the problem wasn't wview itself.

I hope this might help someone else avoid weeks of futility.

PD