
Analysing intermittent extra CPU usage by a process.


kunal...@gmail.com

Mar 18, 2013, 4:15:10 PM
Hi,

If this group is not the right place, can someone please point me to the correct group?

I'm currently investigating an issue where a random application thread intermittently consumes more CPU than usual (not high CPU, just more). The process runs on Linux 2.6.21.7 (Wind River 2.0, 64-bit).
While timing the thread's CPU usage, I took an strace; it looks like this (excerpt from the middle):
=====================
126sendto(65, "\t\0\0\0\0\0\0\0\305\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\377"..., 164, 0, {sa_family=AF_INET, sin_port=htons(7701), sin_addr=inet_addr("172.18.31.1")}, 16) = 164
futex(0x2b22b8000020, FUTEX_WAIT, 2, NULL) = 0
futex(0x2b22b8000020, FUTEX_WAIT, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x2b22b8000020, FUTEX_WAIT, 2, NULL) = 0
futex(0x2b22b8000020, FUTEX_WAIT, 2, NULL) = 0
futex(0x2b22b8000020, FUTEX_WAIT, 2, NULL) = 0
futex(0x2b22b8000020, FUTEX_WAIT, 2, NULL) = 0
futex(0x2b22b8000020, FUTEX_WAIT, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x2b22b8000020, FUTEX_WAIT, 2, NULL) = 0
futex(0x2b22b8000020, FUTEX_WAIT, 2, NULL) = 0
futex(0x2b22b8000020, FUTEX_WAIT, 2, NULL) = 0
futex(0x2b22b8000020, FUTEX_WAIT, 2, NULL) = 0
futex(0x2b22b8000020, FUTEX_WAIT, 2, NULL) = 0
futex(0x2b22b8000020, FUTEX_WAIT, 2, NULL) = 0
futex(0x2b22b8000020, FUTEX_WAIT, 2, NULL) = 0
futex(0x2b22b8000020, FUTEX_WAIT, 2, NULL) = 0
futex(0x2b22b8000020, FUTEX_WAKE, 1) = 0
futex(0x2b22b8000020, FUTEX_WAIT, 2, NULL) = 0
futex(0x2b22b8000020, FUTEX_WAIT, 2, NULL) = 0
futex(0x2b22b8000020, FUTEX_WAIT, 2, NULL) = 0
futex(0x2b22b8000020, FUTEX_WAIT, 2, NULL) = 0
futex(0x2b22b8000020, FUTEX_WAIT, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
=========================

And with strace's summary option (-c):
Process 7622 attached - interrupt to quit
Process 7622 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.045077          67       676        82 futex
  0.00    0.000000           0         5           write
  0.00    0.000000           0         9           stat
  0.00    0.000000           0       183           writev
  0.00    0.000000           0       243           sendto
  0.00    0.000000           0       140           msgsnd
  0.00    0.000000           0       427           times
  0.00    0.000000           0       183           gettid
------ ----------- ----------- --------- --------- ----------------
100.00    0.045077                  1866        82 total
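The summary's figures cross-check cleanly. Futex accounts for all of the system-call time strace measured while attached (every other call rounds to zero), at roughly 67 µs per call, and about one futex call in eight returned an error (presumably the `EAGAIN` returns visible in the excerpt above):

```latex
\frac{0.045077\ \mathrm{s}}{676\ \text{calls}} \approx 66.7\ \mu\mathrm{s}\ \text{per call (shown as 67)}, \qquad
\frac{82\ \text{errors}}{676\ \text{calls}} \approx 12.1\,\%, \qquad
676 + 5 + 9 + 183 + 243 + 140 + 427 + 183 = 1866\ \text{calls in total}
```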


The application does use pthread rwlocks, malloc, etc.

I've checked for the leap second bug and it seems to be present, but I can't be fully sure it is the cause, because other processes on the same card don't show comparable CPU usage.

Is there a way to trace a futex call back to a specific API? Or is there any profiler I can attach at runtime, other than perf (not available here) or lstrace (I've yet to try this)?

Thanks,
~Kunal

Jorgen Grahn

Mar 19, 2013, 5:02:29 AM
On Mon, 2013-03-18, kunal...@gmail.com wrote:
...
> 126sendto(65, "\t\0\0\0\0\0\0\0\305\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\377"..., 164, 0, {sa_family=AF_INET, sin_port=htons(7701), sin_addr=inet_addr("172.18.31.1")}, 16) = 164
> futex(0x2b22b8000020, FUTEX_WAIT, 2, NULL) = 0
> futex(0x2b22b8000020, FUTEX_WAIT, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
...

> Is there a way to back trace futex to specific api?

I usually try to combine what strace(1) tells me with what ltrace(1)
tells me. tcpdump(1) can also be very helpful, if your application
mostly does network I/O.

The other key things for me are
- being able to test the code, i.e. feed it with realistic and
unrealistic inputs
- understanding the code by simply reading it

> OR attaching any
> profile runtime, excluding perf(not present)/lstrace(got to try this)?

I have not tried any such tools, but perhaps I should have.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Kunal Ekawde

Mar 19, 2013, 5:32:53 AM
Thanks for the reply, Jorgen,

On Tuesday, March 19, 2013 2:32:29 PM UTC+5:30, Jorgen Grahn wrote:
> On Mon, 2013-03-18, kunal wrote:
> ...
> > 126sendto(65, "\t\0\0\0\0\0\0\0\305\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\377"..., 164, 0, {sa_family=AF_INET, sin_port=htons(7701), sin_addr=inet_addr("172.18.31.1")}, 16) = 164
> > futex(0x2b22b8000020, FUTEX_WAIT, 2, NULL) = 0
> > futex(0x2b22b8000020, FUTEX_WAIT, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> ...
>
> > Is there a way to back trace futex to specific api?
>
> I usually try to combine what strace(1) tells me with what ltrace(1)
> tells me. tcpdump(1) can also be very helpful, if your application
> mostly does network I/O.

OK, I'll try ltrace.

> The other key things for me are
> - being able to test the code, i.e. feed it with realistic and
> unrealistic inputs
> - understanding the code by simply reading it
>
> > OR attaching any
> > profile runtime, excluding perf(not present)/lstrace(got to try this)?
>
> I have not tried any such tools, but perhaps I should have.

Sorry, that was a typo: I meant lsstack.

The problem is that the process is live in the network and I'm not able to reproduce this scenario in the lab. Most annoyingly, it only shows up 2-3 days after a restart, so I can't use oprofile, which needs a process restart and would then have to run for that long.

Jorgen Grahn

Mar 19, 2013, 2:58:45 PM
On Tue, 2013-03-19, Kunal Ekawde wrote:
> On Tuesday, March 19, 2013 2:32:29 PM UTC+5:30, Jorgen Grahn wrote:
...
>> I usually try to combine what strace(1) tells me with what ltrace(1)
>> tells me. tcpdump(1) can also be very helpful, if your application
>> mostly does network I/O.
>
> Ok, i shall try ltrace.

Note that ltrace seems a bit more dangerous than strace (and strace
isn't always safe either!) Even a well-behaved process or thread may
die if you attach ltrace to it.

>> > profile runtime, excluding perf(not present)/lstrace(got to try this)?
>> I have not tried any such tools, but perhaps I should have.
> Sorry, typo error, I meant lsstack.

Ok, I know 'lsstack' as 'pstack'. I use it sometimes.

> The problem here is that the process is live in network and not able
> reproduce this scenario in lab and most annoying is that it comes up
> after 2-3 days after restart so i can't run with oprofile(needs
> process restart) for such long time.

I see. I have been in that situation, too. Well, it is detective
work ... and the tools are just tools. Your mind and your experience
have to do the real work. Good luck!