I'm sure its not the hardware, so how to diagnose why my phone reboots
seemingly randomly.
Nexus One running stock 2.3.4
My previous Nexus One suffered the same issue running 2.2.x. I sent
it in to find the replacement suffers the same problem.
Note: I am in Japan so wonder if something related to the radio could
be the fault. Still, my GDP2 running stock 1.6 works fine. I run the
same apps on these (GDP2 1.6 and Nexus One 2.3.4) so am confused by
what it might be.
Well ok either 2.x doesn't like Japan or my apps but how to figure out
which one. Stop using all apps? Shouldn't there be a log if an app
took the system down? no? wishful thinking?
What should I do to figure this out?
Shawn
PS yes it rebooted while writing this message. I wanted to say that
it seems perhaps more frequent when using the sdcard but the phone was
just there idle...
usually the cause for a reboot is an unhandled exception in the system
server. you monitor the logs from logcat to know what happened.
of course, getting it at the right moment is a sort of game of chance,
but its a start.
As Chris Stratton mentioned, there are two different kinds of event
that look similar. One is a hard [kernel- or radio-level] reboot of
the entire device; the other is a restart of only the Android runtime
while the kernel keeps running properly.
If you see the initial static splash image for some period of time,
followed by the Android boot animation, then it's a hard reboot. If
instead the crash takes the device immediately to the animated portion
of the sequence, it's a runtime restart.
If it's a runtime restart, you need to try to pull a bugreport from
the device as soon as possible -- during the boot animation, even;
certainly as soon as possible after you hit the home application
again. Even 30 seconds can make a difference here; there is a *lot*
of log activity during early boot, and what you want to do is make
sure you capture the content of the logs leading up to the runtime
restart.
Just capturing 'logcat' is not sufficient; you need a full bugreport.
There's a reason all that information is in the bugreport. :)
So, let's assume that it was a hard reboot: a kernel- or radio-level
crash and reset. In this case, the single most valuable section of
the bugreport is the "LAST KMSG". That's a copy of the dmesg output
from the previous boot session of the kernel -- the one that ended
with a kernel or similar crash. Look at the end of that section to
see whether there was some sort of kernel-level issue. Oftentimes
you'll find that there's a kernel panic being reported, or some such
thing. At that point you're off and running to diagnose what the
problem was, taking your usual approach to debugging kernel crashes.
There can also be a panic log in the bugreport, which may contain
other significant information.
It's possible that there was a radio-side problem. In that sort of
case, the end of the "LAST KMSG" kernel log will read something like
this:
[ 535.383270] ARM9 has CRASHED
[ 535.383666] smem: DIAG 'MOD SM-TM 00000 '
Obviously you will then worth with your radio code provider to address
the issue.
Now, if the device crash was just a runtime restart, not a
kernel-level crash, then the bugreport has different useful
information. Most importantly, the various logs (main, radio, event)
remain intact and active through a runtime restart, so the logcat
block, the event log, and the radio log all show continuity across
whatever it was that caused the Android runtime to crash. Learn how
to read the event log and the logcat output in particular.
Runtime restarts are almost always for one of two reasons: either a
component running in the system_server process has crashed outright,
or something has caused the system_server's process primary looper
thread to deadlock. The deadlock case is announced by a line in the
logcat buffer reading something very like this:
04-04 14:06:16.888 885 1089 W Watchdog: *** WATCHDOG KILLING SYSTEM
PROCESS: null
I'm going to talk about the NON-deadlock cases first; they're simpler
to diagnose: just look in the logcat trace for the messages about the
system process crashing. There will be a Dalvik stack trace there to
the point at which the fatal exception was thrown, and you then know
where to go looking for the bug. As always, the event log and the
primary text log contain different information; both are useful for
reviewing the state and activity of the device leading up to a crash.
If the crash *was* due to a deadlock, things get a little interesting.
When the watchdog declares a deadlock and forcibly kills the system
process, it first captures the current stack trace of every
system_server process thread into the usual ANR stack trace file
(/data/anr/traces.txt on most devices). This file is automatically
included in a bugreport -- yet another reason to pull a full bugreport
rather than just logcat.
Look at the timestamp of where the watchdog declared a problem. In
the example line I gave above, it was 04-04 14:06:16.888. That tells
you which set of stack traces you're interested in within the "LAST
ANR" section of the bugreport. Find the system_server stack dumps
that were captured at the time of the watchdog's message. For
example, in the bugreport that my example came from, the "LAST ANR"
section was headed:
------ VM TRACES AT LAST ANR (/data/anr/traces.txt: 2011-04-04 14:06:14) ------
and the file holds *two* sets of system_process stacks. One of them
starts like this:
----- pid 885 at 2011-04-04 14:05:42 -----
Cmd line: system_server
and the second starts like this:
----- pid 885 at 2011-04-04 14:06:12 -----
Cmd line: system_server
Note that the first was taken 30 seconds before the second, and the
second was just a few seconds before the watchdog message about
killing the system process to restart the system. The watchdog takes
a thread snapshot after 30 seconds of the looper being unresponsive,
then another snapshot after a full minute of unresponsiveness; and
it's after that full minute that it declares the deadlock and restarts
everything. You probably want to start with the later thread stacks,
the ones that reflect the state of affairs when the watchdog finally
gave up.
In most Android applications the primary looper thread is called
"main", but this is *not* true of the system_process. The primary
system_process looper thread is titled "android.server.ServerThread".
Find that thread's stack trace that was taken at the time the watchdog
declared the deadlock, and you'll be off and running as far as
diagnosing what has caused that looper thread to be unresponsive for >
60 seconds.
At this point you've got the information you need to figure out what
locks are being contended for, what unapproved long-running operations
might be mistakenly being run on the system process's main looper,
etc. Dalvik's stack dump output is very useful, especially since it
tells you things like which other thread currently holds a lock being
requested.
Good luck!
--
christopher tate
android framework engineer
> --
> You received this message because you are subscribed to the Google Groups "android-platform" group.
> To post to this group, send email to android-...@googlegroups.com.
> To unsubscribe from this group, send email to android-platfo...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/android-platform?hl=en.
>
>
JBQ
--
Jean-Baptiste M. "JBQ" Queru
Software Engineer, Android Open-Source Project, Google.
Questions sent directly to me that have no reason for being private
will likely get ignored or forwarded to a public forum with no further
warning.
OK below is what I found.
[ 1818.629425] ARM9 has CRASHED
[ 1818.630249] smem: DIAG '[WCDMA] 09:29:27 FATAL: IST0 (mdsptask.c:00476) (1)
[ 1818.630279] '
[ 1818.631134]
[ 1818.631866] Restarting Linux version
2.6.37.4-cyanogenmod-01332-g7f230e8 (shade@toxygene) (gcc version
4.4.3 (GCC) ) #1 PREEMPT Tue Apr 12 12:54:14 EDT 2011
[ 1818.631927]
Does this look like a radio problem?
Shawn
FYI, I've seen this on two different handset running stock ROMs as
well as this one now on cynogenmod.
Build: GRI40
Bootloader: 0.35.0017
Radio: 5.08.00.04
Network: SoftBank
Kernel: Linux version 2.6.37.4-cyanogenmod-01332-g7f230e8
(shade@toxygene) (gcc version 4.4.3 (GCC) ) #1 PREEMPT Tue Apr 12
12:54:14 EDT 2011
Command line: board_mahimahi.disable_uart3=0
board_mahimahi.usb_h2w_sw=0 board_mahimahi.disable_sdcard=0
diag.enabled=0 board_mahimahi.debug_uart=0 smisize=0
androidboot.baseband=5.08.00.04 androidboot.cid=GOOGL001
androidboot.carrier=GOOGLE androidboot.mid=PB9910000
androidboot.keycaps=qwerty androidboot.mode=normal
androidboot.serialno=MB047P800301 androidboot.bootloader=0.35.0017
no_console_suspend=1 wire.search_count=5
[ 2836.578948] ARM9 has CRASHED
[ 2836.579284] smem: DIAG '[WCDMA] 22:27:18 FATAL: IST0
(mcalwcdma_demod.c:02059) (1)
[ 2836.579284] '
[ 2836.579620]
[ 2836.579864] Restarting Linux version
2.6.37.4-cyanogenmod-01332-g7f230e8 (shade@toxygene) (gcc version
4.4.3 (GCC) ) #1 PREEMPT Tue Apr 12 12:54:14 EDT 2011
[ 2836.579864]
below is another
>> [ 1818.629425] ARM9 has CRASHED
>> [ 1818.630249] smem: DIAG '[WCDMA] 09:29:27 FATAL: IST0 (mdsptask.c:00476) (1)
>> [ 1818.630279] '
>> [ 1818.631134]
>> [ 1818.631866] Restarting Linux version
>> 2.6.37.4-cyanogenmod-01332-g7f230e8 (shade@toxygene) (gcc version
>> 4.4.3 (GCC) ) #1 PREEMPT Tue Apr 12 12:54:14 EDT 2011
>> [ 1818.631927]
> Build: GRI40
> Bootloader: 0.35.0017
> Radio: 5.08.00.04
> Network: SoftBank
> Kernel: Linux version 2.6.37.4-cyanogenmod-01332-g7f230e8
> (shade@toxygene) (gcc version 4.4.3 (GCC) ) #1 PREEMPT Tue Apr 12
> 12:54:14 EDT 2011
> Command line: board_mahimahi.disable_uart3=0
> board_mahimahi.usb_h2w_sw=0 board_mahimahi.disable_sdcard=0
> diag.enabled=0 board_mahimahi.debug_uart=0 smisize=0
> androidboot.baseband=5.08.00.04 androidboot.cid=GOOGL001
> androidboot.carrier=GOOGLE androidboot.mid=PB9910000
> androidboot.keycaps=qwerty androidboot.mode=normal
> androidboot.serialno=MB047P800301 androidboot.bootloader=0.35.0017
> no_console_suspend=1 wire.search_count=5
Note: This is definately not CynogenMod related as sending the phone
with the original ROM back to HTC and trying a new handset did NOT
solve anything.
Here is another bugreport of what keeps my nexus one crashing. Where
would I file a bug report on this?
[ 2836.578948] ARM9 has CRASHED
[ 2836.579284] smem: DIAG '[WCDMA] 22:27:18 FATAL: IST0
(mcalwcdma_demod.c:02059) (1)
[ 2836.579284] '
[ 2836.579620]
I think the question is, what would cause this to spontaneously start
occurring? I've been on 2.3.4 for weeks now, with ample free space on
my Nexus One and no new apps or anything like that... what makes the
radio code just start freaking out? Meanwhile, my friend one office
over with the same exactly kernel hasn't had any issues...
> Which radio image version do you have ?
Radio: 5.08.00.04
> Can you try flashing this one - http://forum.cyanogenmod.com/topic/2547-new-radio-5080004/
I also tried 5.12.00.8 - same problem.
Shawn