radmrouted failure


Ken Hornstein

Jun 25, 2009, 6:35:47 PM
to wv...@googlegroups.com
Okay, wview was working solidly for a while ... but now I'm getting this failure
about once per day. At some point I get this log message:

Jun 22 09:33:00 modred radmrouted[398]: <1245677580596> : radBufferRls: trying to release already free buffer or corrupt header

And then things go to crap, and nothing works until I restart everything.
I haven't yet dived into the code to figure out the problem, but does
anyone have an idea where I should start?

--Ken

kanewolf

Jun 25, 2009, 7:53:21 PM
to wview
What hardware platform? Is it really possible that there is a memory
problem -- bad DIMM????

Ken Hornstein

Jun 25, 2009, 8:19:08 PM
to wv...@googlegroups.com
>What hardware platform? Is it really possible that there is a memory
>problem -- bad DIMM????

It's a TS-7200 from Technologic Systems (http://www.embeddedarm.com,
I have no interest in them other than being a customer). It doesn't
_feel_ like a memory problem ... in my experience those crop up as
random problems, and I have certainly run a lot of stuff on this
box. I don't have any issues with it crashing or anything. Nevertheless,
it's not like I can try other memory; it's all soldered on. And this exact
same problem has happened 3 times now. I could be wrong, of course, but
I personally don't like saying "memory problem" until I've eliminated
everything else. I was just hoping not to have to dive into the
radlib code just yet.

--Ken

Matt Soffen

Jun 25, 2009, 11:20:23 PM
to wv...@googlegroups.com
One other question to ask is: when does it happen?

Have you stopped and restarted radlib/wview multiple times?

That's when I've seen the problem happen.

Matt

Mark S. Teel

Jun 26, 2009, 12:23:51 AM
to wv...@googlegroups.com
In fact, memory problems are typically not that random. They occur in
the same location(s). Further, although you might like to believe that
every time the operating system and applications load they are
randomly dispersed through memory, with no two "runs" landing in the
same memory locations, in reality it is quite the contrary.

You also apparently assume you are the third or fourth person to ever
use radlib. Otherwise, your "feel" would direct you to questions
concerning what is different for your platform than the many other
systems who run happily with radlib for years (yes years). To assume
first it is a radlib bug does not give me much confidence in your
debugging intuition.

Debugging is a process of reduction - that is reducing the number of
possibilities by ruling them out. It may be a radlib bug, but is that
the most likely cause given its maturity?

I have seen this type of problem reported 2 or 3 times - ever - and they
were DRAM problems. I'm not saying it is, but I'm highly skeptical it
is radlib. Application misuse of radlib is more likely, although also
remote. Otherwise, we would have more reports than just you.

Mark

Ken Hornstein

Jun 28, 2009, 10:19:39 PM
to wv...@googlegroups.com
>In fact, memory problems are typically not that random. They occur in
>the same location(s). Further, although you might like to believe that
>every time the operating system and applications load they are
>randomly dispersed through memory, with no two "runs" landing in the
>same memory locations, in reality it is quite the contrary.

Respectfully ... yes, I know that. But some of the backstory is
that I've used this box for plenty of other things, and many of
them were memory-intensive. If there was a memory error, I would
have expected it to show up before now. Yes, sure, a memory error
could have developed since I did that work (but it wasn't that long
ago). But in my experience, memory errors are really rare. It
just seems like that's the wrong place to start looking. Others
may differ in that opinion, of course.

>You also apparently assume you are the third or fourth person to ever
>use radlib. Otherwise, your "feel" would direct you to questions
>concerning what is different for your platform than the many other
>systems who run happily with radlib for years (yes years). To assume
>first it is a radlib bug does not give me much confidence in your
>debugging intuition.

Again, respectfully ... I would refer you back to my original note.
In it I said a) I'm getting this error, b) I haven't yet started
to debug it, c) does anyone have any idea where to start? In a
followup note, yes, I did say that I was hoping to not have to "dive
into the radlib code just yet". But I did not ever say anywhere, "Hey,
I think this is a bug in radlib". Sure, I did acknowledge that I was
going to have to dive into the radlib code, but by that I meant I
was going to have to look at the code, understand what that error
message _meant_, and what could be the possible cause of that error.
But I am not going in with any preconceptions as to the cause of this
error, and I don't believe that I implied that a bug in radlib
was the cause.

>Debugging is a process of reduction - that is reducing the number of
>possibilities by ruling them out. It may be a radlib bug, but is that
>the most likely cause given its maturity?

Respectfully ... I have no idea of the history of radlib; I had
never heard of it before I installed wview. I mean that as no
slight against radlib; there are plenty of things I haven't heard
of. I only mention that to indicate that I had no idea how mature
of a package it was. And as for ruling things out ... well, given
the situation with this hardware, short of buying another box I
have no easy way of testing the memory, since it's soldered on.
And I suspect that I'm probably one of the few people using radlib
on a NetBSD/arm system (certainly on this particular hardware). So
I'm sort of on the fringe here, and I know that. The obvious (to
me) debugging solution in this case would be to start with running
a debugger on radmrouted and see where that takes me. I am not
averse to the problem being with NetBSD (I did build a debugging
version of the C library for this system to track down the alignment
problem I reported in an earlier email). If there is something
else you would suggest on how to debug this problem, I would gladly
give it a try.

--Ken

Matt Soffen

Jun 28, 2009, 10:47:34 PM
to wv...@googlegroups.com
Perhaps one thing you could do to determine "where" it is dying is to use "/usr/bin/gcore" on the process before it dies (i.e. add it to the signal handling logic). Then at least you could figure out where the heck it is hanging up.
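Matt's suggestion can be sketched in C. This is a minimal illustration under stated assumptions, not wview or radlib code: it assumes a gcore binary at /usr/bin/gcore that accepts a bare PID (as FreeBSD's gcore and GDB's gcore script do), and the names fmt_pid, core_snapshot_handler and install_core_handlers are hypothetical. On a fatal signal the handler forks, runs gcore against its own PID, waits for the snapshot, then restores the default action so the process still dies. It only helps if a signal is actually delivered, and some systems restrict a child from ptrace-attaching to its parent, which gcore typically relies on.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: gcore lives here and accepts a bare PID argument. */
#define GCORE_PATH "/usr/bin/gcore"

/* Format a PID in decimal without snprintf(), which is not async-signal-safe. */
static void fmt_pid(pid_t pid, char *buf, size_t len)
{
    char tmp[24];
    size_t n = 0, i = 0;
    unsigned long v = (unsigned long)pid;

    assert(len >= 2); /* room for at least one digit plus the terminator */
    do {
        tmp[n++] = (char)('0' + v % 10); /* digits come out least-significant first */
        v /= 10;
    } while (v != 0 && n < sizeof tmp);
    while (n > 0 && i + 1 < len)
        buf[i++] = tmp[--n]; /* copy back most-significant first */
    buf[i] = '\0';
}

/* Hypothetical handler: snapshot this process with gcore, then die normally. */
static void core_snapshot_handler(int sig)
{
    char pidbuf[24];
    pid_t child;

    fmt_pid(getpid(), pidbuf, sizeof pidbuf);
    child = fork();
    if (child == 0) {
        char *const argv[] = { "gcore", pidbuf, NULL };
        char *const envp[] = { NULL };
        execve(GCORE_PATH, argv, envp);
        _exit(127); /* only reached if exec failed */
    }
    if (child > 0)
        waitpid(child, NULL, 0); /* let gcore attach, dump and detach */

    signal(sig, SIG_DFL); /* restore the default action ... */
    raise(sig);           /* ... which is delivered once the handler returns */
}

/* Install the handler for the signals a crashing daemon typically receives. */
static int install_core_handlers(void)
{
    const int sigs[] = { SIGSEGV, SIGBUS, SIGABRT };
    struct sigaction sa = { 0 };
    size_t k;

    sa.sa_handler = core_snapshot_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    for (k = 0; k < sizeof sigs / sizeof sigs[0]; k++)
        if (sigaction(sigs[k], &sa, NULL) != 0)
            return -1;
    return 0;
}
```

A daemon would call install_core_handlers() once at startup; the resulting core is written wherever the local gcore puts it.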

Matt

Ken Hornstein

Jun 28, 2009, 11:26:35 PM
to wv...@googlegroups.com
>Perhaps one thing you could do to determine "where" it is dying, is to use
>"/usr/bin/gcore" on the process before it dies (i.e. Add it to the signal
>handling logic). Then at least you could figure where the heck it is
>hanging up.

Hrm ... maybe I'm missing something, but will that actually help
in this particular case? I think the "interesting" stuff happens
the first time that radBufferRls() fails; I'm not sure it's easily
possible for me to have gcore run when that happens, especially since no
signal is posted. Perhaps the simplest thing to do is have radlib
call abort() for that error and examine the core file after that happens.
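Ken's abort() idea can be sketched as follows. radlib's actual radBufferRls() internals are not shown in this thread, so this is a minimal sketch assuming a hypothetical fixed pool whose buffers carry a magic number and an in-use flag; the names pool_get, pool_release, buf_check and BUF_MAGIC are illustrative only. The two checks mirror the two conditions in the logged message (an already-free buffer or a corrupt header). Aborting on the first bad release, rather than logging and carrying on, leaves a core file showing the call stack at the moment things first went wrong, provided core dumps are enabled (for example with ulimit -c).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define BUF_MAGIC  0x52414442u /* arbitrary marker value, an assumption */
#define POOL_COUNT 4
#define BUF_SIZE   64

/* Hypothetical layout; radlib's real buffer header is not shown in the thread. */
typedef struct {
    uint32_t      magic;  /* must equal BUF_MAGIC, otherwise the header is corrupt */
    int           in_use; /* 1 while handed out, 0 while sitting in the pool */
    unsigned char data[BUF_SIZE];
} PoolBuf;

typedef enum { BUF_OK, BUF_ALREADY_FREE, BUF_CORRUPT } BufStatus;

static PoolBuf pool[POOL_COUNT];

static void pool_init(void)
{
    for (size_t i = 0; i < POOL_COUNT; i++) {
        pool[i].magic = BUF_MAGIC;
        pool[i].in_use = 0;
    }
}

/* Hand out the first free buffer's payload, or NULL if the pool is exhausted. */
static void *pool_get(void)
{
    for (size_t i = 0; i < POOL_COUNT; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            return pool[i].data;
        }
    }
    return NULL;
}

/* Recover the header from a payload pointer (the usual container_of trick). */
static PoolBuf *hdr_of(void *data)
{
    assert(data != NULL);
    return (PoolBuf *)((unsigned char *)data - offsetof(PoolBuf, data));
}

/* Classify a header without changing it. */
static BufStatus buf_check(const PoolBuf *h)
{
    if (h->magic != BUF_MAGIC)
        return BUF_CORRUPT;
    if (!h->in_use)
        return BUF_ALREADY_FREE;
    return BUF_OK;
}

/* Release a buffer; on the first bad release, abort() so the OS can leave a core file. */
static void pool_release(void *data)
{
    PoolBuf *h = hdr_of(data);
    BufStatus st = buf_check(h);

    if (st != BUF_OK) {
        fprintf(stderr, "pool_release: %s\n",
                st == BUF_ALREADY_FREE ? "buffer already free" : "corrupt header");
        abort();
    }
    h->in_use = 0;
}
```

Loading the resulting core into a debugger alongside the daemon binary would then show the caller that triggered the bad release.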

One additional data point: before I wrote a startup script for
NetBSD, I wasn't starting up wvpmond (I was just starting up the
daemons by hand), and things ran without any problems (probably at
the most a week or two between reboots). It was only after I wrote
the startup script and started using wvpmond that I ran into this
problem. Starting Friday I decided to disable wvpmond, and so far
I haven't yet had radmrouted fail. I'll give that another week to
see if that affects this problem or not, then I'll go from there.

--Ken

Mark S. Teel

Jun 28, 2009, 11:32:43 PM
to wv...@googlegroups.com
That makes a bit more sense. Given that wvpmond is in the business of
stopping/restarting non-responsive processes, there may be something a
bit different for NetBSD.