Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

b2g 100% cpu usage

56 views
Skip to first unread message

pancake

unread,
May 24, 2013, 5:37:36 AM5/24/13
to dev-b2g
Yesterday I built b2g BRANCH=master for Keon, but the touch events was
mostly failing and the feeling was laggy.

By checking adb shell top i noticed that b2g was taking 100% cpu all the
time. So I strace it... and got this infinite loop.

The thing is that 'recv' on fd 1077949332 (wtf) is returning 0?? is this
an strace bug?

also see this:

read(35, 0xbeb9f680, 32) = -1 EAGAIN (Try again)

fd35 is a pipe (according to /proc/fd/35). So I guess it's a broken one,
but which one?

Also, I tried to run b2g by hand and got this error:

[TabChild] SHOW (w,h)= (0, 0)
loading app://communications.gaiamobile.org/ftu/index.html, 1
###################################### forms.js loaded
############################### browserElementPanning.js loaded
######################## BrowserElementChildPreload.js loaded
[Child 836] ###!!! ABORT: constructor for actor failed: file
/home/pancake/prg/B2G/objdir-gecko/ipc/ipdl/PLayerTransactionChild.cpp,
line 228


(works fine if i do 'start b2g' instead of '/system/b2g/b2g')

any idea?


-----8<-----[ strace log ]-------8<--------

recv(1077949332, 0x81, 2147483647,
MSG_DONTROUTE|MSG_PEEK|MSG_EOR|MSG_TRUNC|MSG_ERRQUEUE|MSG_DONTWAIT|MSG_CONFIRM|MSG_NOSIGNAL|0xfff20000)
= 0
gettimeofday({1369385114, 795038}, NULL) = 0
gettimeofday({1369385114, 795343}, NULL) = 0
gettimeofday({1369385114, 795526}, NULL) = 0
gettimeofday({1369385114, 795648}, NULL) = 0
msgget(0x1, 0xbeb9f7d0, 0, 0x1) = 0
msgget(0x1, 0xbeb9f7e0, 0, 0x1) = 0
gettimeofday({1369385114, 795953}, NULL) = 0
gettimeofday({1369385114, 796076}, NULL) = 0
gettimeofday({1369385114, 796533}, NULL) = 0
gettimeofday({1369385114, 797052}, NULL) = 0
gettimeofday({1369385114, 797174}, NULL) = 0
gettimeofday({1369385114, 797510}, NULL) = 0
gettimeofday({1369385114, 797632}, NULL) = 0
gettimeofday({1369385114, 797907}, NULL) = 0
gettimeofday({1369385114, 798090}, NULL) = 0
gettimeofday({1369385114, 798212}, NULL) = 0
recv(1077949332, 0x81, 2147483647,
MSG_DONTROUTE|MSG_EOR|MSG_TRUNC|MSG_ERRQUEUE|MSG_DONTWAIT|MSG_CONFIRM|MSG_NOSIGNAL|0xfff20000)
= 0
gettimeofday({1369385114, 798914}, NULL) = 0
gettimeofday({1369385114, 799219}, NULL) = 0
gettimeofday({1369385114, 799402}, NULL) = 0
gettimeofday({1369385114, 799524}, NULL) = 0
msgget(0x1, 0xbeb9f7d0, 0, 0x1) = 0
msgget(0x1, 0xbeb9f7e0, 0, 0x1) = 0
gettimeofday({1369385114, 799829}, NULL) = 0
gettimeofday({1369385114, 799982}, NULL) = 0
gettimeofday({1369385114, 800409}, NULL) = 0
gettimeofday({1369385114, 800989}, NULL) = 0
gettimeofday({1369385114, 801141}, NULL) = 0
gettimeofday({1369385114, 801508}, NULL) = 0
gettimeofday({1369385114, 801630}, NULL) = 0
gettimeofday({1369385114, 801874}, NULL) = 0
gettimeofday({1369385114, 802087}, NULL) = 0
gettimeofday({1369385114, 802210}, NULL) = 0
recv(1077949332, 0x81, 2147483647,
MSG_PEEK|MSG_EOR|MSG_TRUNC|MSG_ERRQUEUE|MSG_DONTWAIT|MSG_CONFIRM|MSG_NOSIGNAL|0xfff20000)
= 0
gettimeofday({1369385114, 802911}, NULL) = 0
gettimeofday({1369385114, 803217}, NULL) = 0
gettimeofday({1369385114, 803430}, NULL) = 0
gettimeofday({1369385114, 803552}, NULL) = 0
msgget(0x1, 0xbeb9f7d0, 0, 0x1) = 0
msgget(0x1, 0xbeb9f7e0, 0, 0x1) = 0
gettimeofday({1369385114, 803857}, NULL) = 0
gettimeofday({1369385114, 803980}, NULL) = 0
gettimeofday({1369385114, 804468}, NULL) = 0
gettimeofday({1369385114, 805017}, NULL) = 0
gettimeofday({1369385114, 805170}, NULL) = 0
gettimeofday({1369385114, 805505}, NULL) = 0
gettimeofday({1369385114, 805628}, NULL) = 0
gettimeofday({1369385114, 805902}, NULL) = 0
gettimeofday({1369385114, 806085}, NULL) = 0
gettimeofday({1369385114, 806207}, NULL) = 0
recv(1077949332, 0x81, 2147483647,
MSG_EOR|MSG_TRUNC|MSG_ERRQUEUE|MSG_DONTWAIT|MSG_CONFIRM|MSG_NOSIGNAL|0xfff20000)
= 0
gettimeofday({1369385114, 806909}, NULL) = 0
gettimeofday({1369385114, 807214}, NULL) = 0
gettimeofday({1369385114, 807428}, NULL) = 0
gettimeofday({1369385114, 807550}, NULL) = 0
msgget(0x1, 0xbeb9f7d0, 0, 0x1) = 0
semget(0x22, 0xbeb9f6b0, 0x10, 0) = 1
read(35, "w", 32) = 1
read(35, 0xbeb9f680, 32) = -1 EAGAIN (Try again)
msgget(0x1, 0xbeb9f7e0, 0, 0x1) = 0
gettimeofday({1369385114, 808130}, NULL) = 0
gettimeofday({1369385114, 808283}, NULL) = 0
gettimeofday({1369385114, 808710}, NULL) = 0
gettimeofday({1369385114, 809229}, NULL) = 0
gettimeofday({1369385114, 809381}, NULL) = 0
gettimeofday({1369385114, 809717}, NULL) = 0
gettimeofday({1369385114, 809839}, NULL) = 0
gettimeofday({1369385114, 810114}, NULL) = 0
gettimeofday({1369385114, 810297}, NULL) = 0
gettimeofday({1369385114, 810419}, NULL) = 0
recv(1077949332, 0x81, 2147483647,
MSG_DONTROUTE|MSG_PEEK|MSG_CTRUNC|MSG_EOR|MSG_ERRQUEUE|MSG_DONTWAIT|MSG_CONFIRM|MSG_PROBE|MSG_NOSIGNAL|0xfff20000)
= 0
write(36, "w", 1) = 1

pancake

unread,
May 24, 2013, 5:57:24 AM5/24/13
to dev-b2g
This is the firmware i built today (still reproducing the same problem)

https://www.dropbox.com/s/hctwhgec9958e8m/gp-keon-b2g-20130524-bug100cpu.tar.gz
> _______________________________________________
> dev-b2g mailing list
> dev...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-b2g
>

Kan-Ru Chen (陳侃如)

unread,
May 24, 2013, 7:29:27 AM5/24/13
to mozilla...@lists.mozilla.org
pancake <pan...@youterm.com> writes:

> Also, I tried to run b2g by hand and got this error:
>
> [TabChild] SHOW (w,h)= (0, 0)
> loading app://communications.gaiamobile.org/ftu/index.html, 1
> ###################################### forms.js loaded
> ############################### browserElementPanning.js loaded
> ######################## BrowserElementChildPreload.js loaded
> [Child 836] ###!!! ABORT: constructor for actor failed: file
> /home/pancake/prg/B2G/objdir-gecko/ipc/ipdl/PLayerTransactionChild.cpp,
> line 228
>
> (works fine if i do 'start b2g' instead of '/system/b2g/b2g')
>
> any idea?

I don't know why the recv went crazy but one cannot run /system/b2g/b2g
directly because some environment variables have to be set. Instead
of running /system/b2g/b2g, you could run '/system/bin/b2g.sh' or just 'b2g.sh'

Kanru

Justin Lebar

unread,
May 24, 2013, 8:59:42 AM5/24/13
to pancake, dev-b2g
This should be very easy to debug.

Just attach gdb to the relevant process (./run-gdb.sh attach PID --
you can find the pid by running adb shell b2g-ps), then break on
|read|, then you should quite quickly find out who's calling read in a
loop.

The real question will be why we get an infinite loop of EAGAIN's...

On Fri, May 24, 2013 at 5:57 AM, pancake <pan...@youterm.com> wrote:
> This is the firmware i built today (still reproducing the same problem)
>
> https://www.dropbox.com/s/hctwhgec9958e8m/gp-keon-b2g-20130524-bug100cpu.tar.gz
>
>
> On 05/24/13 11:37, pancake wrote:
>>
>> Yesterday I built b2g BRANCH=master for Keon, but the touch events was
>> mostly failing and the feeling was laggy.
>>
>> By checking adb shell top i noticed that b2g was taking 100% cpu all the
>> time. So I strace it... and got this infinite loop.
>>
>> The thing is that 'recv' on fd 1077949332 (wtf) is returning 0?? is this
>> an strace bug?
>>
>> also see this:
>>
>> read(35, 0xbeb9f680, 32) = -1 EAGAIN (Try again)
>>
>> fd35 is a pipe (according to /proc/fd/35). So I guess it's a broken one,
>> but which one?
>>
>> Also, I tried to run b2g by hand and got this error:
>>
>> [TabChild] SHOW (w,h)= (0, 0)
>> loading app://communications.gaiamobile.org/ftu/index.html, 1
>> ###################################### forms.js loaded
>> ############################### browserElementPanning.js loaded
>> ######################## BrowserElementChildPreload.js loaded
>> [Child 836] ###!!! ABORT: constructor for actor failed: file
>> /home/pancake/prg/B2G/objdir-gecko/ipc/ipdl/PLayerTransactionChild.cpp,
>> line 228
>>
>>
>> (works fine if i do 'start b2g' instead of '/system/b2g/b2g')
>>
>> any idea?
>>
>>

pancake

unread,
May 24, 2013, 10:01:27 AM5/24/13
to dev...@lists.mozilla.org
On 05/24/13 14:59, Justin Lebar wrote:
> This should be very easy to debug.
>
> Just attach gdb to the relevant process (./run-gdb.sh attach PID --
> you can find the pid by running adb shell b2g-ps), then break on
> |read|, then you should quite quickly find out who's calling read in a
> loop.
>
> The real question will be why we get an infinite loop of EAGAIN's...

b2g has 36 threads running.. after setting a breakpoint on "read" i got
the thread.

this is the backtrace:

(gdb) bt
#0 read () at bionic/libc/arch-arm/syscalls/read.S:8
#1 0x41379de6 in pipeHandler (fd=35, data=<value optimized out>)
at /home/pancake/prg/B2G/gecko/widget/gonk/nsAppShell.cpp:101
#2 0x41379cbe in FdHandler::run (this=0x4391bca0, mayWait=<value
optimized out>)
at /home/pancake/prg/B2G/gecko/widget/gonk/nsAppShell.h:50
#3 nsAppShell::ProcessNextNativeEvent (this=0x4391bca0, mayWait=<value
optimized out>)
at /home/pancake/prg/B2G/gecko/widget/gonk/nsAppShell.cpp:733
#4 0x41397aae in nsBaseAppShell::DoProcessNextNativeEvent (this=0x23,
mayWait=128, recursionDepth=32)
at /home/pancake/prg/B2G/gecko/widget/xpwidgets/nsBaseAppShell.cpp:139
#5 0x41397b74 in nsBaseAppShell::OnProcessNextEvent (this=0x4391bca0,
thr=0x40309940, mayWait=false, recursionDepth=0)
at /home/pancake/prg/B2G/gecko/widget/xpwidgets/nsBaseAppShell.cpp:286
#6 0x4162ea52 in nsThread::ProcessNextEvent (this=0x40309940,
mayWait=false, result=0xbe932857)
at /home/pancake/prg/B2G/gecko/xpcom/threads/nsThread.cpp:600
#7 0x4160ed9e in NS_ProcessNextEvent (thread=0x4391bca0, mayWait=false)
at /home/pancake/prg/B2G/objdir-gecko/xpcom/build/nsThreadUtils.cpp:238
#8 0x413e83a8 in mozilla::ipc::MessagePump::Run (this=0x40302460,
aDelegate=0x4032e0c0)
at /home/pancake/prg/B2G/gecko/ipc/glue/MessagePump.cpp:82
#9 0x4165cdcc in MessageLoop::RunInternal (this=0x0)
at
/home/pancake/prg/B2G/gecko/ipc/chromium/src/base/message_loop.cc:219
#10 0x4165ce82 in MessageLoop::RunHandler (this=0x4032e0c0)
at
/home/pancake/prg/B2G/gecko/ipc/chromium/src/base/message_loop.cc:212
#11 MessageLoop::Run (this=0x4032e0c0) at
/home/pancake/prg/B2G/gecko/ipc/chromium/src/base/message_loop.cc:186
#12 0x41397568 in nsBaseAppShell::Run (this=0x4391bca0)
at /home/pancake/prg/B2G/gecko/widget/xpwidgets/nsBaseAppShell.cpp:163
#13 0x412f775c in nsAppStartup::Run (this=0x43a4b790)
at
/home/pancake/prg/B2G/gecko/toolkit/components/startup/nsAppStartup.cpp:269
#14 0x40c81de4 in XREMain::XRE_mainRun (this=0xbe932a14) at
/home/pancake/prg/B2G/gecko/toolkit/xre/nsAppRunner.cpp:3872
#15 0x40c845e6 in XREMain::XRE_main (this=0xbe932a14, argc=<value
optimized out>, argv=<value optimized out>,
aAppData=0x1f178) at
/home/pancake/prg/B2G/gecko/toolkit/xre/nsAppRunner.cpp:3939
#16 0x40c84748 in XRE_main (argc=1, argv=0xbe934bf4, aAppData=0x1f178,
aFlags=<value optimized out>)
at /home/pancake/prg/B2G/gecko/toolkit/xre/nsAppRunner.cpp:4140
#17 0x0000997e in do_main (argc=1, argv=0xbe934bf4) at
/home/pancake/prg/B2G/gecko/b2g/app/nsBrowserApp.cpp:168
#18 main (argc=1, argv=0xbe934bf4) at
/home/pancake/prg/B2G/gecko/b2g/app/nsBrowserApp.cpp:261


If I check in /proc i can see that the pair of this pipe is still alive:

root@android:/proc/991/fd # ls -l | grep 7045
lr-x------ root root 2013-05-24 15:47 35 -> pipe:[7045]
l-wx------ root root 2013-05-24 15:47 36 -> pipe:[7045]


Going up in the backtrace i found this:

(gdb) list
96 pipeHandler(int fd, FdHandler *data)
97 {
98 ssize_t len;
99 do {
100 char tmp[32];
101 len = read(fd, tmp, sizeof(tmp));
102 } while (len > 0);
103 }
104
105 struct Touch {

I have set a breakpoint on |write| with a condition of 'fd=36', but i
was unable to trigger it.. so i dont know what is this thread doing..

On Peak b2g takes 50% (because it's dual core) and runs a little better,
but some touch events are misstaken because of this intensive cpu use..

What else can I do? If you want to join #b2g we can discuss this better
in the irc.

Thanks

--pancake

pancake

unread,
May 24, 2013, 10:54:45 AM5/24/13
to dev...@lists.mozilla.org
It must be in one of those commits:

http://lolcathost.org/b/spotthebug.txt

i will try to git bisect... can anyone else reproduce?

Dave Hylands

unread,
May 24, 2013, 11:52:14 AM5/24/13
to pancake, dev...@lists.mozilla.org
Hi pancake

----- Original Message -----
> From: "pancake" <pan...@youterm.com>
> To: dev...@lists.mozilla.org
> Sent: Friday, May 24, 2013 7:01:27 AM
> Subject: Re: [b2g] b2g 100% cpu usage
>
> On 05/24/13 14:59, Justin Lebar wrote:
> > This should be very easy to debug.
> >
> > Just attach gdb to the relevant process (./run-gdb.sh attach PID --
> > you can find the pid by running adb shell b2g-ps), then break on
> > |read|, then you should quite quickly find out who's calling read
> > |in a
> > loop.
> >
> > The real question will be why we get an infinite loop of
> > EAGAIN's...
>
> b2g has 36 threads running.. after setting a breakpoint on "read" i
> got
> the thread.

Well, you got a thread that calls read, not necessarily "the" thread which is consuming 100% of the CPU.

I'd recommend that you do:

top -t -m 20

to see which thread is causing the CPU usage.

Dave Hylands

Thinker K.F. Li

unread,
May 25, 2013, 6:57:45 AM5/25/13
to dev...@lists.mozilla.org
pipeHandler() is for signaling of dispatching input events or other
events. If you can break on nsAppShell::NotifyNativeEvent() and paste
the backtrace here. It may be helpful.


pancake <pan...@youterm.com> writes:

> On 05/24/13 14:59, Justin Lebar wrote:
>> This should be very easy to debug.
>>
>> Just attach gdb to the relevant process (./run-gdb.sh attach PID --
>> you can find the pid by running adb shell b2g-ps), then break on
>> |read|, then you should quite quickly find out who's calling read in a
>> loop.
>>
>> The real question will be why we get an infinite loop of EAGAIN's...
>
> b2g has 36 threads running.. after setting a breakpoint on "read" i
> got the thread.
>
> this is the backtrace:
>
> (gdb) bt
> #0 read () at bionic/libc/arch-arm/syscalls/read.S:8
> #1 0x41379de6 in pipeHandler (fd=35, data=<value optimized out>)
> at /home/pancake/prg/B2G/gecko/widget/gonk/nsAppShell.cpp:101
> #2 0x41379cbe in FdHandler::run (this=0x4391bca0, mayWait=<value
> optimized out>)
> at /home/pancake/prg/B2G/gecko/widget/gonk/nsAppShell.h:50
......... skip ...............

--
Sinker
--
天教懶漫帶疏狂

Dave Hylands

unread,
May 25, 2013, 10:37:41 AM5/25/13
to Thinker K.F. Li, dev...@lists.mozilla.org, Michael Wu
I don't see how that code should be consuming 100% of the CPU. On any error, len should be coming back as -1 which will break out of the loop. It will, however toss data though.

If the loop executes more than once, the data read from the everything except the last iteration is thrown away.

Dave Hylands

----- Original Message -----
> From: "Thinker K.F. Li" <thi...@codemud.net>
> To: dev...@lists.mozilla.org
> Sent: Saturday, May 25, 2013 3:57:45 AM
> Subject: Re: [b2g] b2g 100% cpu usage
>
0 new messages