My daemon app keeps terminating and I'm trying to figure out why and
where. I have a good idea, though. I'm using a push/pop FIFO system for
log messages. I have a syslog message after the push and after the pop
function and see both appearing all the time, hence I think that
both push() and pop() work fine. However, the messages get popped
off in a thread from where I send the data off. I'm using the following
code:
pthread_mutex_lock(&log_mtx);
Qstr=pop(&msglist, &locLen);
syslog(LOG_ERR, "DUBUG: popped succesfully!");
strncpy(res,Qstr,LOG_BUF_SIZE);
res[LOG_BUF_SIZE]='\0';
strcat(res,"\n");
pthread_mutex_unlock(&log_mtx);
if (log_sock>-1) {
sndlen=strlen(res);
reclen=send(log_sock, res, sndlen, 0);
if (reclen != sndlen) {
if (reclen==-1){
syslog(LOG_ERR, "nlog: send failed...,errno: %s!", strerror( errno ));
}
syslog(LOG_ERR, "nlog: \"%s\" not sent, strlen(%d) - sent %d",res, sndlen, reclen);
}
else {
syslog(LOG_ERR, "DEBUG: sent okay, receive ack");
The last message I see in syslog before my app crashes is "DUBUG:
popped succesfully!"
Now the declaration of those variables are as follows:
char res[LOG_BUF_SIZE+2]={0};
char *Qstr;
int locLen;
int sndlen;
msglist is of type char** and QLen is of type int; both are declared
globally.
LOG_BUF_SIZE is a define and set to 512.
Does anyone have a clue where my app is terminating, or should I be
looking somewhere else altogether? The system worked fine before I
implemented this new FIFO system...
Thanks for hints and suggestions!
Ron
Based on the information you've given, I'd suspect that pop() was
returning an invalid pointer or NULL...? Maybe you could log the value
of Qstr just to be sure. (The crash could absolutely be elsewhere,
though.)
Can you get a core dump? If you can, you might be able to bring it up in a
debugger and have it show you exactly the line it's crashing on:
http://beej.us/guide/bggdb/#coredump
-Beej
I'd check the return value of pop. Is it a reasonable pointer? NULL?
If that doesn't solve it, I'd try tools, like valgrind. Or running in
a debugger, or enabling cores.
-David
Yep as Beej had suggested, I included a check for NULL:
if (Qstr!=NULL)
syslog(LOG_ERR, "DUBUG: popped succesfully!");
else {
syslog(LOG_ERR, "DUBUG: popped not succesful!");
break;
}
>
> If that doesn't solve it, I'd try tools, like valgrind. Or running in
> a debugger, or enabling cores.
will try to use valgrind - don't know if it will run on my embedded
platform tho...
Debugger unfortunately will not be an option.... enabling cores - what
do you mean?
Thanks!
Ron
I just compiled valgrind on our dev system but there's no way I'll be able to
copy it onto our target system as the /opt/valgrind-3.2.3 directory is
81MB big... :(... ah, I'm getting desperate.... what can I do?
Other hints, recommendations and suggestions are highly appreciated!
Thanks!
Ron
The threading stuff will be rough, and I can't help much with it.
The first thing I'd do, honestly, would be to run the app in a debugger.
If you can get gdb or the equivalent on your target, then you should be able
to find out where it's crashing and in which thread.
-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet...@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
int push(char ***list, char *str, int curlen){
char **temp;
int i=0;
temp = realloc((*list),(curlen+1)*sizeof(*list));
if (temp==NULL){
syslog(LOG_ERR, "push(): Error reallocating memory for msglist");
for (i=curlen;i>=0;i--) {
free((*list)[i]);
QLen--;
}
free(list);
return -1;
}
(*list)=temp;
(*list)[curlen]=malloc (strlen(str)+1);
strcpy((*list)[curlen],str);
return ++curlen;
}
//-------------------------------------------------------
char *pop(char ***list, int *size) {
if(!*size) return 0;
char *first = (*list)[0];
memmove(*list, *list + 1, (*size) * sizeof(**list));
(*size) --;
(*list) = realloc(*list, (*size) * sizeof(**list));
return first;
}
//-------------------------------------------------------
Then I'm calling the push() like this:
pthread_mutex_lock(&log_mtx);
strcpy(string,Header);
strcat(string,buf);
syslog(level, buf);
QLen=push(&msglist, string, QLen);
//WriteToFile(string);
syslog(LOG_ERR, "DUBUG: pushed succesfully!");
pthread_mutex_unlock(&log_mtx);
and the pop() call is pasted above.
Thanks x1000 for all the ideas and hints!
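One thing that stands out in the pop() posted above: memmove() is asked to move *size elements starting at index 1, but only *size - 1 elements remain after the first one, so it reads one slot past the end of the array. The call site also passes &locLen rather than the global QLen that push() updates, which may or may not be intended. Below is a minimal sketch of a pop that moves only the remaining elements and copes with a failed or zero-sized realloc(). The name pop_front is made up for this example, and it assumes the caller owns and frees the returned string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical replacement for pop(): removes and returns the oldest string.
 * The caller owns the returned pointer and must free() it. */
char *pop_front(char ***list, int *size)
{
    char *first;
    char **shrunk;

    if (list == NULL || size == NULL || *list == NULL || *size <= 0)
        return NULL;

    first = (*list)[0];
    (*size)--;

    if (*size == 0) {
        /* Avoid realloc(ptr, 0), whose behaviour varies between systems. */
        free(*list);
        *list = NULL;
        return first;
    }

    /* Only *size elements remain after the first one. */
    memmove(*list, *list + 1, (size_t)*size * sizeof(**list));

    /* A failed shrink leaves the old block valid, so keep using it. */
    shrunk = realloc(*list, (size_t)*size * sizeof(**list));
    if (shrunk != NULL)
        *list = shrunk;

    return first;
}
```

The size passed in has to be the same counter that push() maintains, otherwise the two functions disagree about how long the array is.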
Disable watchdogs for debugging. :)
> int push(char ***list, char *str, int curlen){
> char **temp;
> int i=0;
>
> temp = realloc((*list),(curlen+1)*sizeof(*list));
Although it almost certainly doesn't matter, I'm pretty sure you want
sizeof(**list) here for consistency.
I would also say that this function is crying out to be cleaned
up, in a couple of very significant ways. You should not have the
length be a separate argument; "list" should be of a type such that
you can always do push(list, value); without having to know.
> //-------------------------------------------------------
> char *pop(char ***list, int *size) {
> if(!*size) return 0;
>
> char *first = (*list)[0];
>
> memmove(*list, *list + 1, (*size) * sizeof(**list));
> (*size) --;
> (*list) = realloc(*list, (*size) * sizeof(**list));
>
> return first;
> }
Same issues here -- size tracked elsewhere.
Suggestion (totally untested, etc.):
struct list {
struct list *next;
char *s;
};
void push(struct list **l, char *s) {
struct list *new;
if (!l)
return;
new = malloc(sizeof(*new));
if (!new)
return; /* out of memory: give up rather than dereference NULL */
new->next = *l;
new->s = s;
*l = new;
}
char *pop(struct list **l) {
struct list *old;
char *s;
if (!l)
return NULL;
old = *l;
if (old) {
*l = old->next;
s = old->s;
free(old);
return s;
} else {
return NULL;
}
}
Usage:
struct list *l = NULL;
push(&l, "foo");
push(&l, "bar");
pop(&l); /* => "bar" */
pop(&l); /* => "foo" */
pop(&l); /* => NULL */
Your current model is HUGELY vulnerable to possible clashes involving
the current length getting out of sync somehow, but you never need it if
you use an actual list.
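Since the original question asks for a FIFO, note that the list above hands strings back in reverse order of insertion, as its usage comments show. Here is a minimal sketch of a first-in, first-out variant that keeps a tail pointer. The names struct fifo, fifo_push and fifo_pop are made up for this example, and the queue stores the caller's pointers without copying the strings.

```c
#include <stdlib.h>

struct fifo_node {
    struct fifo_node *next;
    char *s;
};

/* Hypothetical queue type: initialise with { NULL, NULL }. */
struct fifo {
    struct fifo_node *head; /* oldest entry, popped first */
    struct fifo_node *tail; /* newest entry, pushed last */
};

/* Appends s at the tail. Returns 0 on success, -1 if malloc() fails. */
int fifo_push(struct fifo *q, char *s)
{
    struct fifo_node *n = malloc(sizeof(*n));

    if (n == NULL)
        return -1;
    n->next = NULL;
    n->s = s;
    if (q->tail != NULL)
        q->tail->next = n;
    else
        q->head = n;
    q->tail = n;
    return 0;
}

/* Removes and returns the oldest string, or NULL if the queue is empty. */
char *fifo_pop(struct fifo *q)
{
    struct fifo_node *old = q->head;
    char *s;

    if (old == NULL)
        return NULL;
    s = old->s;
    q->head = old->next;
    if (q->head == NULL)
        q->tail = NULL;
    free(old);
    return s;
}
```

Pushing "foo" and then "bar" makes fifo_pop() return "foo" first, then "bar", then NULL.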
If you do want to use an array, think about something like this:
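#include <assert.h>
#include <stdlib.h>
#include <string.h>
struct list {
    char **data;
    size_t curlen;
    size_t allocated;
};
void push(struct list *l, char *s) {
    if (l->curlen >= l->allocated) {
        /* an empty list has allocated == 0, so doubling alone would never grow it */
        size_t newcap = l->allocated ? l->allocated * 2 : 8;
        char **new = malloc(newcap * sizeof(*new));
        if (new) {
            if (l->curlen > 0)
                memcpy(new, l->data, l->curlen * sizeof(*new));
            free(l->data);
            l->data = new;
            l->allocated = newcap;
        } else {
            assert(!"I wrote an error handler");
        }
    }
    l->data[l->curlen++] = s;
}
char *pop(struct list *l) {
    if (l->curlen > 0) {
        return l->data[--l->curlen];
    }
    return NULL; /* empty list */
}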
(And yeah, this code is probably full of errors.)
Did that "fix" it?
> Here it says that it would say smthng like "Segmentation fault (core
> dumped)" but i don't even get ANY back log on the screen cause the app
> is running in daemon mode...
This is better asked on comp.unix.programmer, but you can use
setrlimit() to unlimit the writing of core files. If it's a daemon, its
cwd should be /, but it might not be able to write the file there
depending on which user it's running as... but I've never tried any of
this with a daemon so YMMV.
-Beej
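A minimal sketch of the setrlimit() idea, assuming a POSIX system. The helper name enable_core_dumps is made up for this example. It raises the soft core-size limit to whatever the hard limit allows, which an unprivileged process is permitted to do. Whether a core file then actually appears still depends on the hard limit, on write permission in the working directory, and on the system's core-dump settings.

```c
#include <sys/resource.h>

/* Hypothetical helper: call early in main().
 * Returns 0 on success, -1 on failure (errno is set by the failing call). */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max; /* raise the soft limit up to the hard limit */
    return setrlimit(RLIMIT_CORE, &rl);
}
```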
> pthread_mutex_lock(&log_mtx);
> Qstr=pop(&msglist, &locLen);
> syslog(LOG_ERR, "DUBUG: popped succesfully!");
> strncpy(res,Qstr,LOG_BUF_SIZE);
> res[LOG_BUF_SIZE]='\0';
This looks wrong. I think you mean:
res[LOG_BUF_SIZE-1] = 0;
But you would also mean that your strncpy should copy one character less.
> strcat(res,"\n");
This looks clumsy to me.
It also implies that your strncpy should copy yet one less character.
> pthread_mutex_unlock(&log_mtx);
> if (log_sock>-1) {
You could have tested this earlier and bailed out ASAP.
> sndlen=strlen(res);
This is ugly. IMHO.
HTH,
AvK
I assumed you were on a Linux flavor. Enabling cores usually means
setting ulimit to allow cores to dump on fatal signals, probably not
relevant to your embedded system. Umm, add more syslogs, repeat as
needed, to narrow down your problem seems like a good bet. You should
be able to figure out exactly what bad pointer causes the segfault,
and trace the problem that way.
You could crash on push if the malloc failed, which I assume is
possible on your limited memory embedded system. You could be messed
up on pop if somehow a shrinking realloc fails.
-David
Test your code on a hosted platform before running it on the embedded
target. Assuming your target is Linux, this shouldn't be too hard.
--
Ian Collins
> On Tue, 05 Jan 2010 11:40:30 -0800, cerr wrote:
>
>> pthread_mutex_lock(&log_mtx);
>> Qstr=pop(&msglist, &locLen);
>> syslog(LOG_ERR, "DUBUG: popped succesfully!");
>> strncpy(res,Qstr,LOG_BUF_SIZE);
>> res[LOG_BUF_SIZE]='\0';
>
> This looks wrong. I think you mean:
> res[LOG_BUF_SIZE-1] = 0;
Yes, it *looks* wrong, but...
> But you would also mean that your strncpy should copy one character less.
>
>> strcat(res,"\n");
>
> This looks clumsy to me.
> It also implies that your strncpy should copy yet one less
> character.
res is declared to be LOG_BUF_SIZE+2 characters long in code now snipped.
The real problem for anyone reading this is that LOG_BUF_SIZE is
misnamed. It is not the size of the buffer.
<snip>
--
Ben.
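To spell out the arithmetic: with LOG_BUF_SIZE at 512, res has 514 bytes. strncpy() fills indices 0 to 511 at most, the terminator goes at index 512, and strcat() then puts '\n' at index 512 at the latest and a new terminator at index 513, the last valid index. The sketch below wraps the three statements from the original post in a function so the worst case can be checked. The function name build_line is made up for this example.

```c
#include <string.h>

#define LOG_BUF_SIZE 512 /* maximum payload length, not the buffer size */

/* Hypothetical wrapper around the statements from the original post.
 * res must have room for LOG_BUF_SIZE + 2 bytes. Returns strlen(res). */
size_t build_line(char *res, const char *Qstr)
{
    strncpy(res, Qstr, LOG_BUF_SIZE); /* writes indices 0..511 at most */
    res[LOG_BUF_SIZE] = '\0';         /* index 512 */
    strcat(res, "\n");                /* '\n' at <= 512, '\0' at <= 513 */
    return strlen(res);
}
```

So the original code stays in bounds. It only looks wrong because the constant names the payload limit rather than the array size.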
This is the real answer for the OP. Even if you solve this problem
without gdb, you'll run into another unsolvable problem sooner or
later. Disabling the watchdog timer for debugging is mandatory.
> Moi <ro...@invalid.address.org> writes:
>
>> On Tue, 05 Jan 2010 11:40:30 -0800, cerr wrote:
>>
>
>>> off in a thread from where i send the data off. I'm using following
>>> code:
>>>
>>> pthread_mutex_lock(&log_mtx);
>>> Qstr=pop(&msglist, &locLen);
>>> syslog(LOG_ERR, "DUBUG: popped succesfully!");
>>> strncpy(res,Qstr,LOG_BUF_SIZE);
>>> res[LOG_BUF_SIZE]='\0';
>>
>> This looks wrong. I think you mean:
>> res[LOG_BUF_SIZE-1] = 0;
>
> Yes, it *looks* wrong, but...
>
>
>
> res is declared to be LOG_BUF_SIZE+2 characters long in code now snipped.
> The real problem for anyone reading this is that LOG_BUF_SIZE is misnamed.
> It is not the size of the buffer.
Yes, I could not agree more.
What the OP also does wrong is putting the declaration of res[]
_after_ its use.
This is disastrous for lazy people like me, who bail out on the first
spotted error, or the first page with (assumed) errors on it.
Also, I think the usage of the mutex-locks is rather strange.
I would expect the locks to be _inside_ the push() pop() functions.
The locks also do not preserve the order of the messages, if that would
be important to the OP.
AvK
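A minimal sketch of what putting the locks inside the functions could look like, assuming POSIX threads. The names struct msgq, msgq_init, msgq_push and msgq_pop and the fixed capacity of 64 are made up for this example. Callers never touch the mutex, so they cannot forget to take it or hold it across unrelated work.

```c
#include <pthread.h>
#include <stddef.h>

#define MSGQ_CAP 64 /* hypothetical fixed capacity */

struct msgq {
    pthread_mutex_t lock;
    char *slot[MSGQ_CAP];
    size_t head;  /* index of the oldest entry */
    size_t count; /* number of stored entries */
};

/* Prepares an empty queue. Returns 0 on success, an error number otherwise. */
int msgq_init(struct msgq *q)
{
    q->head = 0;
    q->count = 0;
    return pthread_mutex_init(&q->lock, NULL);
}

/* Stores s at the back. Returns 0 on success, -1 if the queue is full. */
int msgq_push(struct msgq *q, char *s)
{
    int rc = -1;

    pthread_mutex_lock(&q->lock);
    if (q->count < MSGQ_CAP) {
        q->slot[(q->head + q->count) % MSGQ_CAP] = s;
        q->count++;
        rc = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Removes and returns the oldest entry, or NULL if the queue is empty. */
char *msgq_pop(struct msgq *q)
{
    char *s = NULL;

    pthread_mutex_lock(&q->lock);
    if (q->count > 0) {
        s = q->slot[q->head];
        q->head = (q->head + 1) % MSGQ_CAP;
        q->count--;
    }
    pthread_mutex_unlock(&q->lock);
    return s;
}
```

The string copy and the send() can then happen outside the lock, since the popped pointer belongs to the caller once msgq_pop() returns.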
Exactly, but I don't know yet how I can do this. I need to figure that
out first; it seems as if there are several processes that depend on
each other and if one fails, the system reboots... :o
why would this be LOG_BUF_SIZE-1? I defined the variable like
char res[LOG_BUF_SIZE+2]={0};
and hence there's space for a null character at the end...
>
> > strcat(res,"\n");
>
> This looks clumsy to me.
> It also implies that your strncpy should copy yet one less character.
Yes, you're right here, it should be:
strncpy(res,Qstr,LOG_BUF_SIZE-1);
Then, if it didn't copy the terminating null character, I
add it one element before the last:
res[LOG_BUF_SIZE-1]='\0';
because I need to have enough room for a newline:
strcat(res,"\n");
Would this do it, you think?
>
> > pthread_mutex_unlock(&log_mtx);
> > if (log_sock>-1) {
>
> You could have tested this earlier and bailed out ASAP.
Check if the socket is alright... yeah, I can move this further up.
>
> > sndlen=strlen(res);
>
> This is ugly. IMHO.
That may be true, and I do appreciate the help I'm getting to clean up
this code.
Thanks a lot!
Ron
Hm, why would you put them inside? I put them outside so I can keep
the pop() and push() functions stand-alone, so they can be used in
multi-threaded environments, but if it's single-threaded you don't
require mutexes, eh?
Also, why would the order of messages not be preserved? I don't
understand... :o
Ron
The watchdog probably has to be initialized at some point during boot
up. Just find that line and comment it out. Anyway, I hate to sound
like a nanny about this, but being able to disable the watchdog timer
really is the only sensible way to develop.
This actually makes a whole lot of sense yes!
Great idea and it's so much simpler! :)
Let me try to implement this!
Ron
Exactly, and on top of that I wrote a little KeepAlive application that
provides the hardware heartbeat so the power supply doesn't
disconnect the power... sweet! :)
Okay,
I'm trying to use remote debugging with gdb. I compiled the binary with the
-ggdb3 flag, but whatever I do I don't get any clear information on what's
going on. The output I'm seeing is:
(gdb) target remote 192.168.101.55:1100
Remote debugging using 192.168.101.55:1100
0xb7e29aec in ?? ()
(gdb) continue
Continuing.
Program received signal SIGSTOP, Stopped (signal).
0xb7e29aec in ?? ()
(gdb) contiue
Undefined command: "contiue". Try "help".
(gdb) continue
Continuing.
Program received signal SIGPIPE, Broken pipe.
0xb7e29aec in ?? ()
(gdb)
Any clues? :o The SIGPIPE signal actually terminated my binary... :o
> On Jan 6, 4:09 am, Moi <r...@invalid.address.org> wrote:
>> On Tue, 05 Jan 2010 11:40:30 -0800, cerr wrote:
>> > Qstr=pop(&msglist, &locLen);
>> > syslog(LOG_ERR, "DUBUG: popped succesfully!");
>> > strncpy(res,Qstr,LOG_BUF_SIZE);
>> > res[LOG_BUF_SIZE]='\0';
>>
>> This looks wrong. I think you mean:
>> res[LOG_BUF_SIZE-1] = 0;
>>
>> But you would also mean that your strncpy should copy one character
>> less.
>
> why would this be LOG_BUF_SIZE-1? I defined the variable like char
> res[LOG_BUF_SIZE+2]={0};
Which was discussed upthread; it is very uncommon.
> and hence there's space for a null character at the end...
>>
>> > strcat(res,"\n");
>>
>> This looks clumsy to me.
>> It also implies that your strncpy should copy yet one less character.
> yes, you're right here, it should be: strncpy(res,Qstr,LOG_BUF_SIZE-1);
> then if it didn't copy the last null character to close the string i add
> it one element before the last:
> res[LOG_BUF_SIZE-1]='\0';
> because i need to have enough room for a new line: strcat(res,"\n");
>
>> > sndlen=strlen(res);
>>
>> This is ugly. IMHO.
> That may be true and i do appreciate the help i'm getting to clean-up
> this code.
IMO most people would write something like:
**************/
char res[SOME_SIZE];
int len, done;
char *data;
data = pop(...);
len = strlen(data); /* or maybe pop() could return the length (maybe by reference)
** , since it already knows it.
*/
if (len >= sizeof res) len = sizeof res -1;
memcpy(res, data, len);
res[len++] = '\n';
done = send( fd, res, len, ...);
if (done != len) ... etc ...
/*************
NOTE: the above code relies in only one place on
the data being nul-terminated. It does not *have*
to be nul-terminated, since send() and write() et al. have
a length argument.
HTH,
AvK
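A self-contained sketch of the same idea, with the elided parts filled in by assumptions: the helper name frame_line and its parameters are made up, and the send() call is left to the caller. It clamps the length so the newline always fits and does not nul-terminate the output, since the returned length is what gets passed to send().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies data into res, truncating if needed, and
 * appends '\n'. Returns the number of bytes to send; res is NOT
 * nul-terminated. Returns 0 if res cannot hold even the newline. */
size_t frame_line(char *res, size_t res_size, const char *data)
{
    size_t len;

    if (res_size == 0)
        return 0;
    len = strlen(data); /* the only place that needs a terminated input */
    if (len >= res_size)
        len = res_size - 1; /* leave one byte for the newline */
    memcpy(res, data, len);
    res[len++] = '\n';
    return len;
}
```

With this shape the buffer can be declared with its real size, and sizeof res can be passed as res_size without any +1 or +2 adjustments.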
I'm not sure, but my guess is that you didn't specify a symbol file.
After the 'target remote' command, use 'load' and 'symbol-file'.
Something along the lines of:
load foo.out.elf
symbol-file foo.out.elf
...substitute your actual filename, of course.
> Any clues? :o The SIGPIPE signal actually terminated my binary... :o
This is about 99% likely to not be a C problem. SIGPIPE is what you
get when you try to write data to a pipe that's been closed. What this
usually means is that some OTHER program aborted.
You could try ignoring it (sigignore). SIGPIPE may be benign, or may
indicate a problem. Only you/your app knows. e.g. in
cat foo | head -20
cat will get a SIGPIPE as I recall, nothing bad happening.
-David
But when I ignore the SIGPIPE in gdb I get the message and then it
tells me that the app exited because of SIGPIPE. And what might "some
other" program be? My app doesn't depend on anything really...
Hi Squeamizh,
Ok, I used "load /my/binary/on/the/local/fs" after "target remote" and
the app on my target actually terminated. If I try only
"symbol-file /my/binary/on/the/local/fs", I don't get any change in the
report when it terminates on SIGPIPE. :( How do I keep the app going
after the load?
Thanks,
Ron
How do you assume that? :o
I would like to dump cores. I set "ulimit -a" but I never see a core
dumped anywhere. What file would I be looking for?
> Umm, add more syslogs, repeat as
> needed, to narrow down your problem seems like a good bet. You should
> be able to figure out exactly what bad pointer causes the segfault,
> and trace the problem that way.
It doesn't seem to be a segfault but a SIGPIPE that causes my program
to crash, gdb says:
Program received signal SIGPIPE, Broken pipe.
Program terminated with signal SIGPIPE, Broken pipe.
The program no longer exists.
(gdb)
>
> You could crash on push if the malloc failed, which I assume is
> possible on your limited memory embedded system. You could be messed
> up on pop if somehow a shrinking realloc fails.
# free
shows me plenty of free memory...
ulimit -c unlimited
may do what you want. Cores often dump where the executable is, or in
some configurable directory.
Dunno what to suggest, other than add syslogs until you feel you
understand.
-David
A debugger's not an option? What is this, 1980? Have you thought about
disabling the watchdogs?
Yes, well I've gotten over this. Disabling the watchdogs alone wouldn't
help; I needed to write a little KeepAlive application to provide a
heartbeat to the power supply so it would not cut the power when the
heartbeat isn't provided by my "main app" (because it's stopped in a
debugger) - so we're good if you add 30 years, it's 2010 ;)
Standard C doesn't have sockets, select(), or send().
Aside from whether discussion of non-standard features is topical here
or not, you're really likely to get better help in a different
newsgroup, likely comp.unix.programmer.
--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Okay, thanks for redirecting me. I'll try my luck there...
--
Ron
Most likely, you want to ignore SIGPIPE, which will cause send to return
-1 in that case. But comp.unix.programmer is definitely a better forum.
--
Larry Jones
Mom must've put my cape in the wrong drawer. -- Calvin
> > > > > > sndlen=strlen(res);
> > > > > > reclen=send(log_sock, res, sndlen, 0);
> > > > > > if (reclen != sndlen) {
> > > > > >   if (reclen==-1){
> > > > > >     syslog(LOG_ERR, "nlog: send failed...,errno: %s!", strerror( errno ));
> > > > > >   }
> > > > > >   syslog(LOG_ERR, "nlog: \"%s\" not sent, strlen(%d) - sent %d",res, sndlen, reclen);
> > > > > > }
> > > > > > else {
> > > > > >   syslog(LOG_ERR, "DEBUG: sent okay, receive ack");
> It doesn't seem to be a segfault but a SIGPIPE that causes my program
> to crash, gdb says:
>
> Program received signal SIGPIPE, Broken pipe.
>
> Program terminated with signal SIGPIPE, Broken pipe.
> The program no longer exists.
> (gdb)
On at least some socket stacks, send() to a TCP socket that's become
disconnected gives SIGPIPE (not just write() to an actual pipe).
Assuming you don't actually need SIGPIPE anywhere else in your
program, signal(SIGPIPE,SIG_IGN) at startup, and then see what errno
you get on the send() attempt; it's likely to be informative.
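A minimal sketch of that approach, assuming a POSIX system. A socketpair() with one end closed stands in for the disconnected TCP connection here, and the function name send_after_peer_closed is made up for this example. With SIGPIPE ignored, send() fails and reports the reason through errno (typically EPIPE) instead of killing the process.

```c
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: returns the errno from send() on a socket whose peer
 * has closed, 0 if send() unexpectedly succeeded, or -1 on setup failure. */
int send_after_peer_closed(void)
{
    int sv[2];
    int err = 0;

    /* In a real program, do this once at startup. */
    if (signal(SIGPIPE, SIG_IGN) == SIG_ERR)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    close(sv[1]); /* simulate the other side going away */

    if (send(sv[0], "x", 1, 0) == -1)
        err = errno; /* inspect or log with strerror(err) */

    close(sv[0]);
    return err;
}
```

In the daemon itself, the existing reclen==-1 branch would then log the error string, and the code can close and reopen log_sock rather than dying.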