Debian 8.1 / kernel 4.1.x test releases are unstable

Graham

Jul 12, 2015, 10:17:22 AM
to beagl...@googlegroups.com
I have several Rev C BBB units, that have been in use enough to be considered "trusted" hardware.

All of them, running Debian 8.1 with kernel 3.14, are rock solid. By that, I mean that they will run for months without problems.
Maybe longer; I have not left them undisturbed for any longer than that.

I have tried to run the Debian 8.1 / kernel 4.1.x test releases, and they all autonomously reboot several times per day.
No regular or reproducible pattern, nothing in the syslog, other than the reboot process itself.

This includes the 2015-07-05 kernel 4.1.1 release. (bone-debian-8.1-lxqt-4gb-armhf-2015-07-05-4gb.img)

All I do is put it on a 16 GB card, expand the partition from 4 GB to the full 16 GB, and turn off the 4 blinky lights. No other changes. The unit will reboot at random, several times per day.

This has been common to all the kernel 4.1 releases. Although the reboots follow no pattern, the problem is easy enough to reproduce. If you turn off the blinky lights, a reboot is easy to recognize, since they come back on when the board reboots. Or examine the syslog.

--- Graham

==

dl4mea

Jul 12, 2015, 12:22:25 PM
to beagl...@googlegroups.com
I absolutely agree with Graham's report. I also saw plenty of unexplained resets of the Beaglebone, same as Graham describes, with the boards just sitting on the table, bare, no cape, freshly flashed with a new image. My power supplies are 5V/2A from a German quality vendor and I'm using hundreds of them, so the power supply is not the cause. There is no information in journalctl -f and no information on the RS232 console; the board just resets without any indication.

I made a test with a bigger number of Beaglebone (white) and Beaglebone-Black, and these are the results within 24h. All but two of them are operating this release:
uname -a
Linux bb151f 4.1.0-rc8-bone9 #1 Wed Jun 17 00:05:43 UTC 2015 armv7l GNU/Linux

These are my test results:
MAC                BB     Comment                              Number of Resets / uptime
00:18:31:e0:54:35  White  stable since power on (1 day)        -
bc:6a:29:cc:a5:ae  White  stable since power on (7 days)       -
00:18:31:8b:59:4e  White  stable                               -
d4:94:a1:85:c2:3d  White  stable, running 4.1.1-bone9          -
78:a5:04:cd:cf:b3  Black  stable                               -
78:a5:04:ce:13:21  Black  stable since power on (12 days)      -
d0:5f:b8:d7:53:ec  Black  unstable                             reboots every 4-6h
6c:ec:eb:5d:26:09  Black  unstable, even with 4.2.0-rc1-bone1  3
d0:39:72:45:1c:f1  Black  unstable, got 1x stuck in U-Boot     6
78:a5:04:ca:a9:4e  Black                                       3
78:a5:04:fe:f6:11  Black                                       3
78:a5:04:cf:4f:8e  Black                                       6
78:a5:04:db:5d:63  Black                                       3
54:4a:16:c5:ea:75  Black                                       2
78:a5:04:cf:84:5a  Black                                       3
78:a5:04:fd:93:dc  Black                                       2
78:a5:04:fe:de:13  Black                                       5
78:a5:04:cf:5a:40  Black                                       4
78:a5:04:cf:6c:1f  Black                                       6
6c:ec:eb:a5:15:1f  Black                                       2
78:a5:04:cf:65:48  Black                                       4
78:a5:04:ca:8f:34  Black                                       3

I'm sampling the uptime of all boards every 30 min, and sometimes the simple script that collects that data gets stuck, so the true number is definitely higher.
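
For anyone who wants to repeat this kind of survey, here is a minimal Python sketch of how reboots can be inferred from periodic uptime samples. It is not the script used for the table above; the function name, the sample format and the 60-second tolerance are all assumptions made for illustration. A reboot is flagged whenever the reported uptime is clearly smaller than the previous uptime plus the wall-clock time that has passed.

```python
def count_reboots(samples, tolerance_s=60):
    """Count reboots in a sequence of (wall_clock_seconds, uptime_seconds) samples.

    A reboot is inferred when the uptime is noticeably lower than it would be
    if the board had kept running since the previous sample. Several reboots
    between two samples are counted only once, so the result is a lower bound.
    """
    reboots = 0
    previous = None
    for wall_clock, uptime in samples:
        if previous is not None:
            prev_wall, prev_uptime = previous
            expected = prev_uptime + (wall_clock - prev_wall)
            if uptime < expected - tolerance_s:
                reboots += 1
        previous = (wall_clock, uptime)
    return reboots


# Hypothetical samples taken every 30 minutes (1800 s); not data from this thread.
samples = [(0, 600), (1800, 2400), (3600, 300), (5400, 2100), (7200, 120)]
print(count_reboots(samples))
```

Because a stuck collector or two reboots inside one sampling window both hide events, a count obtained this way can only underestimate the real number, which matches the caveat above.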

All my Beaglebone (white) under test are rock-solid, and so are two BB-Black. Those two BB-Black are older devices from the first BB-Black batch on the market, while those that are unstable are mostly from the latest production by Embest.

As Graham reports, all the boards are stable when running something before 4.x.x, in my case this is the very old 3.8 Angstrom:
# uname -a
Linux beaglebone 3.8.13 #1 SMP Tue Jul 30 11:56:13 CEST 2013 armv7l GNU/Linux

# lsb_release -a
Distributor ID: Angstrom
Description:    Angstrom GNU/Linux v2012.12 (Core edition)
Release:        v2012.12
Codename:       Core edition

I am now changing my worst candidates back to
uname -a
Linux bb1cf1 3.19.3-bone4 #1 Fri Mar 27 16:05:22 UTC 2015 armv7l GNU/Linux

Any comment or help would be greatly appreciated. If I can add some testing, let me know.

--- Guenter (dl4mea)

Peter Hurley

Jul 12, 2015, 12:55:42 PM
to beagl...@googlegroups.com
The common debugging method for problems like this is to bisect.
However, if the start and end points are 3.14 and 4.1.x, respectively, that would be prohibitive.
Best to find a closer start point than 3.14.

Also, is 4.1.x stable if you don't mess with the image?

Regards,
Peter Hurley


dl4mea

Jul 12, 2015, 1:19:59 PM
to beagl...@googlegroups.com
Instabilities have been found by just flashing elinux.org images from, for example,
http://elinux.org/Beagleboard:BeagleBoneBlack_Debian#Jessie_Snapshot_console, in particular Flasher: (console) (BeagleBone Black eMMC),
and letting the board idle with network + serial console connected.


> Also, is 4.1.x stable if you don't mess with the image?

I am willing to try with several images, as I have 15 boards suffering from this under supervision.
But I don't understand which sequence to go for.
There are so many, if I look for them in
apt-cache search linux-image

--- Guenter (dl4mea)
 

Robert Nelson

Jul 12, 2015, 1:46:48 PM
to Beagle Board

apt-cache search linux-image | grep ti | grep 4.1

BTW, we had similar issues when we started testing 3.14.. I saw it happen on a board Thursday, won't be able to dig into it again until Monday.

>
> --- Guenter (dl4mea)
>  
>
> --
> For more options, visit http://beagleboard.org/discuss
> ---
> You received this message because you are subscribed to the Google Groups "BeagleBoard" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to beagleboard...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

William Hermans

Jul 12, 2015, 2:02:43 PM
to beagl...@googlegroups.com
watchdog was the first thing that popped into my mind heh.

Graham Haddock

Jul 12, 2015, 2:19:02 PM
to beagl...@googlegroups.com
I will try it by reloading a totally untouched "bone-debian-8.1-lxqt-4gb-armhf-2015-07-05-4gb.img",
and report back.  No cape, trusted Rev.C hardware and power supply. All communications via
Ethernet.



By my saying that 3.14 is rock solid, this includes up to "bone-debian-8.1-lxqt-4gb-armhf-2015-06-15-4gb.img",
which was the last non-kernel-4 test release.

Same hardware, same power supplies, same Ethernet connection. No other hardware or connections.


--- Graham

==




William Hermans

Jul 12, 2015, 3:19:54 PM
to beagl...@googlegroups.com
I've had this, or something similar happen to me a few times. When I did apt-get update again right after, it succeeded. But I'm still not sure of the cause.

On Sun, Jul 12, 2015 at 11:48 AM, 'dl4mea' via BeagleBoard <beagl...@googlegroups.com> wrote:
> If I look on one Beaglebone (not one of the "worst case" targets), I see only one package matching Robert's suggestion
> apt-cache search linux-image | grep ti | grep 4.1
> linux-image-4.1.1-ti-r2 - Linux kernel, version 4.1.1-ti-r2
>
> However, if I want to apt-get update on the two current worst case targets, I am getting
>
> Get:10 http://ftp.us.debian.org jessie-updates/non-free armhf Packages [20 B]
> Fetched 9247 kB in 20s (457 kB/s)
> W: Failed to fetch http://repos.rcn-ee.com/debian/dists/jessie/main/binary-armhf/Packages  Hash Sum mismatch
>
> E: Some index files failed to download. They have been ignored, or old ones used instead.
>
> This system has installed
> uname -a
> Linux bb6c1f 4.1.0-rc8-bone9 #1 Wed Jun 17 00:05:43 UTC 2015 armv7l GNU/Linux
>
> Is this a temporary hiccup or any other idea?

William Hermans

Jul 12, 2015, 3:22:50 PM
to beagl...@googlegroups.com
Anyway, guys, give me an idea of what you're doing on these boards when you get the random system resets, and I'll test here too. I have a couple of free beaglebones I can run arbitrary tests on at the moment.

William Hermans

Jul 12, 2015, 3:26:33 PM
to beagl...@googlegroups.com
By the way, currently on sdcard I am running wheezy 7.8 I believe.
debian@beaglebone:~$ cat /etc/dogtag
BeagleBoard.org Debian Image 2015-03-01
debian@beaglebone:~$ uname -a
Linux beaglebone 3.8.13-bone70 #1 SMP Fri Jan 23 02:15:42 UTC 2015 armv7l GNU/Linux

So I could apt-get install linux-image-4.1<whatever> and see if this could be related to the rootfs, or what.

dl4mea

Jul 12, 2015, 3:32:46 PM
to beagl...@googlegroups.com

Graham Haddock

Jul 12, 2015, 3:35:30 PM
to beagl...@googlegroups.com
Hi William:
Doing nothing with the board.  It is just sitting on the side connected to +5V power and Ethernet.
So, for example, late last night (Central US time) I loaded "bone-debian-8.1-lxqt-4gb-armhf-2015-07-05-4gb.img"
onto a trusted uSD card, expanded the memory using gparted to the full 16GB, and turned off the four blue
blinky lights. No other changes.

Then I went to bed.

Reading syslog,
(Times are GMT; boot completion is defined as systemd updating the time to network time.)

the initial boot (completion) was at JUL 12, 05:09:27
the lab was quiet, lights off, nothing running.
The BBB autonomously rebooted at 08:25:33, 13:13:22, and 14:32:27
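
To see how irregular the reboots are, the gaps between the boot times listed above can be worked out with a few lines of Python. This is only an illustrative sketch: the function name is made up, and the year is taken from the date of this post.

```python
from datetime import datetime


def boot_intervals(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the time differences between consecutive boot timestamps."""
    times = [datetime.strptime(t, fmt) for t in timestamps]
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Boot completion times (GMT) as reported in the post above.
boots = [
    "2015-07-12 05:09:27",
    "2015-07-12 08:25:33",
    "2015-07-12 13:13:22",
    "2015-07-12 14:32:27",
]
for delta in boot_intervals(boots):
    print(delta)
```

For these four timestamps the gaps come out as 3:16:06, 4:47:49 and 1:19:05, so there is no obvious fixed period.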

I am now rerunning with untouched reload of "bone-debian-8.1-lxqt-4gb-armhf-2015-07-05-4gb.img"
Just load, install and boot.  Talk to command line by SSH.

--- Graham

==

You received this message because you are subscribed to a topic in the Google Groups "BeagleBoard" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/beagleboard/lF1X1XINjDo/unsubscribe.
To unsubscribe from this group and all its topics, send an email to beagleboard...@googlegroups.com.

William Hermans

Jul 12, 2015, 3:47:02 PM
to beagl...@googlegroups.com
> Hi William:
> Doing nothing with the board.  It is just sitting on the side connected to +5V power and Ethernet.
> So, for example, late last night (Central US time) I loaded "bone-debian-8.1-lxqt-4gb-armhf-2015-07-05-4gb.img"
> onto a trusted uSD card, expanded the memory using gparted to the full 16GB, and turned off the four blue
> blinky lights. No other changes.
>
> Then I went to bed.
>
> Reading syslog,
> (Times are GMT; boot completion is defined as systemd updating the time to network time.)
>
> the initial boot (completion) was at JUL 12, 05:09:27
> the lab was quiet, lights off, nothing running.
> The BBB autonomously rebooted at 08:25:33, 13:13:22, and 14:32:27
>
> I am now rerunning with untouched reload of "bone-debian-8.1-lxqt-4gb-armhf-2015-07-05-4gb.img"
> Just load, install and boot.  Talk to command line by SSH.
>
> --- Graham

OK. Well, my own feeling is that this could be related to systemd, somehow. I have no proof to substantiate that.

So, I'll work on the problem bottom to top. What I mean by this is that I'll start with a rootfs that I know works. In my case wheezy 7.8. I've got that running now with

debian@beaglebone:~$ uname -a
Linux beaglebone 4.1.0-rc8-bone9 #1 Tue Jun 16 23:45:22 UTC 2015 armv7l GNU/Linux

I'll let it sit and idle for a day or so. After that, I'll download and flash the Jessie image, install sysv, disable systemd. Then start the "test" over again.

Oh, and if one of you could do me a favor: run one of your boards as is with

sudo cpufreq-set -g performance

and see if the problem clears up?
 

dl4mea

Jul 12, 2015, 3:50:28 PM
to beagl...@googlegroups.com
I now have one of the worst case rebooters running on 3.19.3-bone4 (already installed 8h ago)
root@bb1cf1:~# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 8.1 (jessie)
Release:        8.1
Codename:       jessie
root@bb1cf1:~# uname -a
Linux rc1cf1 3.19.3-bone4 #1 Fri Mar 27 16:05:22 UTC 2015 armv7l GNU/Linux
root@bb1cf1:~# uptime
 19:47:17 up  7:49,  1 user,  load average: 0.47, 0.17, 0.09

and one on Robert's suggestion 4.1.1-ti-r2
root@rc6c1f:~# uname -a
Linux rc6c1f 4.1.1-ti-r2 #1 SMP PREEMPT Wed Jul 8 17:03:29 UTC 2015 armv7l GNU/Linux
root@rc6c1f:~# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 8.1 (jessie)
Release:        8.1
Codename:       jessie
root@rc6c1f:~# uname -a
Linux rc6c1f 4.1.1-ti-r2 #1 SMP PREEMPT Wed Jul 8 17:03:29 UTC 2015 armv7l GNU/Linux
root@rc6c1f:~# uptime
 19:49:39 up 22 min,  1 user,  load average: 1.09, 0.69, 0.31



William Hermans

Jul 12, 2015, 4:06:58 PM
to beagl...@googlegroups.com
Something else that might help troubleshoot this issue: we could take a "snapshot" of each system by way of ps aux and store the snapshots somewhere for later examination, maybe pastebin.

http://pastebin.com/ydneAtne
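
A snapshot like that could also be taken on a schedule and kept locally. The following is a rough Python sketch under a few assumptions: the function name, the output directory and the file naming are invented here, and Python 3.7 or newer is assumed for subprocess.run(capture_output=...). It runs a command (ps aux by default) and writes the output to a file stamped with the UTC time.

```python
import subprocess
import time
from pathlib import Path


def save_snapshot(out_dir, command=("ps", "aux")):
    """Run `command` and store its output in a UTC-timestamped text file.

    Returns the path of the file that was written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    result = subprocess.run(list(command), capture_output=True, text=True, check=True)
    path = out_dir / f"snapshot-{stamp}.txt"
    path.write_text(result.stdout)
    return path
```

Calling save_snapshot("/var/tmp/snapshots") from cron every few minutes (the directory is just an example) would leave a trail whose last file before a reset shows what was running at the time.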

Peter Hurley

Jul 12, 2015, 4:29:32 PM
to Beagle Board
On 07/12/2015 03:35 PM, Graham Haddock wrote:
> Hi William:
> Doing nothing with the board. It is just sitting on the side connected to +5V power and Ethernet.
> So, for example, late last night (Central US time) I loaded "bone-debian-8.1-lxqt-4gb-armhf-2015-07-05-4gb.img"
> onto a trusted uSD card expanded the memory using gparted to the full 16GB, and turned off the four blue
> blinky lights. No other changes.
>
> Then I went to bed.
>
> Reading syslog,
> (Times are GMT, boot completion defined as systemd updating the time to network time.
>
> the initial boot (completion) was at JUL 12, 05:09:27
> the lab was quiet, lights off, nothing running.
> The BBB automously rebooted at 08:25:33, 13:13:22, and 14:32:27
>
> I am now rerunning with untouched reload of "bone-debian-8.1-lxqt-4gb-armhf-2015-07-05-4gb.img"
> Just load, install and boot. Talk to command line by SSH.

Yeah, I think the transition to linear irq domain (added at 3.18) made cpsw
a little extra flaky. Plus the new omap_8250 serial driver is not bug-free;
just found a flow control bug in the h/w last week.

I've had ssh shells go sideways on occasion, but not with that kind of
regularity or effect.

Like I said, the right diagnostic method is bisecting the kernel.
It's going to take a while (multiple days) if several hours are required to
distinguish good from bad kernel.

Regards,
Peter Hurley

William Hermans

Jul 12, 2015, 4:42:07 PM
to beagl...@googlegroups.com
> Yeah, I think the transition to linear irq domain (added at 3.18) made cpsw
> a little extra flaky. Plus the new omap_8250 serial driver is not bug-free;
> just found a flow control bug in the h/w last week.
>
> I've had ssh shells go sideways on occasion, but not with that kind of
> regularity or effect.
>
> Like I said, the right diagnostic method is bisecting the kernel.
> It's going to take a while (multiple days) if several hours are required to
> distinguish good from bad kernel.
>
> Regards,
> Peter Hurley

Hi Peter. "Bisecting the kernel" is unknown to me, as in I don't know what it means. But I was wondering if some sort of remote, very verbose logging might help? Currently I'm in the process of reading / learning advanced Linux programming, and have all these crazy ideas of what we could do. I'm just not sure what to "trap", or exactly how to trap it.

William Hermans

Jul 12, 2015, 5:01:00 PM
to beagl...@googlegroups.com
ah I see. following my own advice comes in handy sometimes . . . as in GitBisect. A bit out of my abilities.

Peter Hurley

Jul 12, 2015, 5:16:30 PM
to beagl...@googlegroups.com
On 07/12/2015 04:41 PM, William Hermans wrote:
> Yeah, I think the transition to linear irq domain (added at 3.18) made cpsw
> a little extra flaky. Plus the new omap_8250 serial driver is not bug-free;
> just found a flow control bug in the h/w last week.
>
> I've had ssh shells go sideways on occasion, but not with that kind of
> regularity or effect.
>
> Like I said, the right diagnostic method is bisecting the kernel.
> It's going to take a while (multiple days) if several hours are required to
> distinguish good from bad kernel.
>
> Regards,
> Peter Hurley
>
>
> Hi Peter. "bisecting the kernel" is unknown to me. As in the meaning, But I was wondering if some sort of remote, and very verbose logging might not help ? Currently I'm in the process of reading / learning advanced Linux programming, and have all these crazy ideas of what we could do. Just not sure what to "trap" and exactly 100% how to trap it.

Linux mainline kernel source is really just a massive linear series of patches,
one after the other, all tracked by git. Bisecting is a method of reducing the
number of patches under test by 1/2 at each iteration to arrive at a problem commit.

So, for example, let's say that I have a problem that cropped up on 4.2-rc1,
but the problem wasn't happening on 4.1-rc7.

I start a bisect with git:
$ git bisect start v4.2-rc1 v4.1-rc7
Bisecting: 6261 revisions left to test after this (roughly 13 steps)
[4570a37169d4b44d316f40b2ccc681dc93fedc7b] Merge tag 'sound-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

I build this kernel, test it, and mark it good or bad. Let's say the problem
doesn't exhibit in this kernel.

$ git bisect good
Bisecting: 3371 revisions left to test after this (roughly 12 steps)
[8d7804a2f03dbd34940fcb426450c730adf29dae] Merge tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Now I build this kernel, test it, and mark it good or bad. Let's say bad this time.

$ git bisect bad
[3d9f96d850e4bbfae24dc9aee03033dd77c81596] Merge tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Each time, the number of commits under test is reduced by 1/2.
For a small kernel like Beaglebone this is no big deal, couple of minutes build
time. For a 64-bit distro kernel, this can take several days for 14 iterations.
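
The halving described above is an ordinary binary search. The Python sketch below illustrates the idea on a purely linear, made-up list of commits; it is not how git implements bisect (real history contains merges, which git handles for you), and the commit numbers and the is_bad test are hypothetical stand-ins for "build, boot and wait for a reboot".

```python
def bisect_first_bad(commits, is_bad):
    """Find the first bad commit in a list ordered from oldest to newest.

    Assumes commits[0] is known good and commits[-1] is known bad.
    Returns (first_bad_commit, number_of_kernels_tested).
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(commits[mid]):
            bad = mid       # the problem was introduced at or before mid
        else:
            good = mid      # the problem was introduced after mid
    return commits[bad], tests


# Hypothetical history of 6000 commits after a known-good one.
commits = list(range(6001))
first_bad = 4000            # pretend this commit introduced the reboots
culprit, tests = bisect_first_bad(commits, lambda c: c >= first_bad)
print(culprit, tests)
```

With a few thousand commits between the good and bad points, the search needs on the order of a dozen test kernels, which is why each individual good/bad verdict matters so much.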

Having a bunch of BBBs, all testing the same kernel at the same time
significantly improves the confidence at each iteration that the kernel
is "good" or "bad" (since obviously a problem that takes time to manifest may
be mistakenly identified as "good" and then the bisect will narrow on the wrong
commits).
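
How much a group of boards helps can be estimated with a back-of-the-envelope model. The sketch below rests on an assumption that is not established anywhere in this thread: that on a bad kernel every board reboots independently, at random, with a constant average rate (exponentially distributed intervals). Under that assumption the chance of wrongly calling a bad kernel "good" is exp(-boards * test_hours / mean_hours_between_reboots). The 6-hour mean used in the example is a hypothetical figure.

```python
import math


def false_good_probability(mean_hours_between_reboots, boards, test_hours):
    """Chance that a bad kernel shows no reboot on any board during the test.

    Assumes independent boards and exponentially distributed reboot intervals.
    """
    return math.exp(-boards * test_hours / mean_hours_between_reboots)


# Hypothetical numbers: a bad kernel reboots a board every 6 h on average.
for boards in (1, 4, 13):
    p = false_good_probability(6.0, boards, test_hours=12.0)
    print(f"{boards:2d} board(s), 12 h idle: P(false 'good') = {p:.2e}")
```

Under these assumptions a single board left idle overnight still has a noticeable chance of missing the problem, while a dozen boards running the same kernel make a false "good" very unlikely.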

Instrumenting a problem like this is basically impossible.

Regards,
Peter Hurley

William Hermans

Jul 12, 2015, 5:25:19 PM
to beagl...@googlegroups.com
Thanks, Peter, for the in-depth explanation. I was actually just reading a very detailed blog post by a person bug hunting in Fedora 20 . . . the blog post could be considered a book in and of itself, and wow, yes, lots of learning to do before I can achieve the same myself.



dl4mea

Jul 12, 2015, 11:35:30 PM
to beagl...@googlegroups.com
Results from overnight test:

I used the worst rebooters for some tests:

(1) System bb1cf1 got installed with 3.19.3-bone4: no more reboot
uptime
 03:23:37 up 14:50,  1 user,  load average: 0.00, 0.01, 0.05

(2) System bb6c1f: installed with 4.1.1-ti-r2 #1 SMP PREEMPT Wed Jul 8 17:03:29 UTC 2015 armv7l GNU/Linux: 2 reboots
Jul 13 00:55:17 bb6c1f kernel: [    0.000000] Booting Linux on physical CPU 0x0
Jul 13 01:51:02 bb6c1f kernel: [    0.000000] Booting Linux on physical CPU 0x0

(3) System bb4f8e still has 4.1.0-rc8-bone9 #1 Wed Jun 17 00:05:43 UTC 2015 armv7l GNU/Linux: but cpufreq-set -g performance: no more reboot
uptime
 03:29:57 up  9:01,  1 user,  load average: 0.02, 0.06, 0.05

As I have to leave for the day, I will let all my systems run for at least 12h without changes.
If the results still look like this then, I will apply (1) and (3) to some more devices.

@RobertCNelson: If you have further suggestions which image to test, let me know.

--- Guenter (dl4mea)

William Hermans

Jul 13, 2015, 1:13:27 AM
to beagl...@googlegroups.com
> (3) System bb4f8e still has 4.1.0-rc8-bone9 #1 Wed Jun 17 00:05:43 UTC 2015 armv7l GNU/Linux: but cpufreq-set -g performance: no more reboot

Interesting . . . If memory serves correctly, that was the "fix" for an older kernel. So possibly older code crept into the newer kernel?

--

Graham Haddock

Jul 13, 2015, 9:42:10 AM
to beagl...@googlegroups.com
OK.
I took my Rev.C unit (1c:ba:8c:d9:5e:dd) and loaded "bone-debian-8.1-lxqt-4gb-armhf-2015-07-05-4gb.img" onto a
16 GB uSD card.  Unit, power supply and card are "trusted."

Absolutely no changes to the image, just install, boot, run. No updates, additions or modifications. 
No cape, only connections are 5V power and Ethernet.
Times are GMT/UTC. I define the boot completion as the time when systemd updates the internal time from the network.

Initial boot completion: Jul 12 19:12:09
Autonomous reboot:    Jul 13 10:54:19

This time it took 15 hours for the autonomous reboot to occur. I'll let this one keep going, and report.

--- Graham

==

Graham Haddock

Jul 13, 2015, 9:46:47 AM
to beagl...@googlegroups.com
What does "cpufreq-set -g performance" do?
Sounds like it would lock the BBB at max CPU clock speed, or at least, not let it go down to the lowest speeds.

I found the generic Debian docs on cpufreq-set, but not the BBB specific instruction set and meanings.

--- Graham

Robert Nelson

Jul 13, 2015, 9:54:37 AM
to Beagle Board
On Mon, Jul 13, 2015 at 8:46 AM, Graham Haddock <gra...@flexradio.com> wrote:
> What does "cpufreq-set -g performance" do?
> Sounds like it would lock the BBB at max CPU clock speed, or at least, not
> let it go down to the lowest speeds.

Correct...

> I found the generic Debian docs on cpufreq-set, but not the BBB specific
> instruction set and meanings.

It's a generic kernel interface, nothing bbb specific about it..
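
For reference, the governor is also visible through the kernel's generic cpufreq files in sysfs. The Python sketch below reads and sets the governor through those files; the helper names are made up for illustration, the paths are the standard cpufreq sysfs locations for CPU 0, and writing the governor needs root. It is offered as a sketch of the generic interface, not as a description of what cpufreq-set does internally.

```python
from pathlib import Path

# Standard cpufreq sysfs directory for CPU 0 (the BBB has a single core).
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def read_governor(cpufreq_dir=CPUFREQ_DIR):
    """Return the name of the governor currently in use."""
    return (Path(cpufreq_dir) / "scaling_governor").read_text().strip()


def available_governors(cpufreq_dir=CPUFREQ_DIR):
    """Return the list of governors the running kernel offers."""
    return (Path(cpufreq_dir) / "scaling_available_governors").read_text().split()


def set_governor(name, cpufreq_dir=CPUFREQ_DIR):
    """Select a governor by writing its name to sysfs (requires root)."""
    if name not in available_governors(cpufreq_dir):
        raise ValueError(f"governor {name!r} is not offered by this kernel")
    (Path(cpufreq_dir) / "scaling_governor").write_text(name)
```

On a board, read_governor() would typically return something like "ondemand", and set_governor("performance") run as root keeps the CPU at its highest available frequency, as confirmed above.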

Regards,

--
Robert Nelson
https://rcn-ee.com/

William Hermans

Jul 13, 2015, 3:01:28 PM
to beagl...@googlegroups.com
debian@beaglebone:~$ uptime
 11:58:48 up 23:22,  1 user,  load average: 0.22, 0.07, 0.06
debian@beaglebone:~$ uname -a
Linux beaglebone 4.1.0-rc8-bone9 #1 Tue Jun 16 23:45:22 UTC 2015 armv7l GNU/Linux

debian@beaglebone:~$ cat /etc/dogtag
BeagleBoard.org Debian Image 2015-03-01
debian@beaglebone:~$ cpufreq-info
cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009
Report errors and bugs to cpu...@vger.kernel.org, please.
analyzing CPU 0:
  driver: cpufreq-dt
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 300 us.
  hardware limits: 300 MHz - 1000 MHz
  available frequency steps: 300 MHz, 600 MHz, 800 MHz, 1000 MHz
  available cpufreq governors: conservative, ondemand, userspace, powersave, performance
  current policy: frequency should be within 300 MHz and 1000 MHz.
                  The governor "ondemand" may decide which speed to use
                  within this range.
  current CPU frequency is 300 MHz.
  cpufreq stats: 300 MHz:0.04%, 600 MHz:0.00%, 800 MHz:0.00%, 1000 MHz:99.96%  (4)
 
I'll let it idle longer, but pretty sure it will have no problems. This board is an element14 REVC for what that's worth.
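
When comparing many boards, the "cpufreq stats" line from cpufreq-info is easier to handle once parsed. This is a small illustrative Python sketch (the function name is invented) that turns that line, in the format shown above, into a mapping from frequency to the share of time spent there.

```python
import re


def parse_cpufreq_stats(line):
    """Return {frequency_in_MHz: percent_of_time} from a cpufreq-info stats line."""
    pairs = re.findall(r"(\d+) MHz:([\d.]+)%", line)
    return {int(mhz): float(pct) for mhz, pct in pairs}


# The stats line from the cpufreq-info output above.
line = "cpufreq stats: 300 MHz:0.04%, 600 MHz:0.00%, 800 MHz:0.00%, 1000 MHz:99.96%  (4)"
stats = parse_cpufreq_stats(line)
print(stats)
```

The regular expression only assumes the "<number> MHz:<percent>%" layout seen in this output; other cpufrequtils versions may print frequencies in different units.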

William Hermans

Jul 13, 2015, 3:04:33 PM
to beagl...@googlegroups.com
Oh, and in case this might be relevant. The board is powered by USB, with only ethernet plugged in.

rh_

Jul 13, 2015, 3:52:09 PM
to beagl...@googlegroups.com
On Sun, 12 Jul 2015 07:17:22 -0700 (PDT)
Graham <gra...@flex-radio.com> wrote:

> I have tried to run the Debian 8.1 / kernel 4.1.x test releases, and
> they all autonomously reboot several times per day.
> No regular or reproducible pattern, nothing in the syslog, other than
> the reboot process itself.

Interesting. I'd look between 4.0-rcx and 4.0. Good luck.

William Hermans

Jul 13, 2015, 5:19:52 PM
to beagl...@googlegroups.com
Ok, so what I am wondering is: if this is the kernel, why does the kernel work fine for me when using a Wheezy 7.8 rootfs? It is fairly safe to say that in my own case this is not related to the kernel or kernel modules . . . right?

Nuno Gonçalves

Jul 13, 2015, 6:08:50 PM
to beagl...@googlegroups.com
Tried cpufreq-set -g performance on a BBB but got a reset after a few hours anyway. The problem appears to be something else...

dl4mea

Jul 14, 2015, 1:04:44 AM
to beagl...@googlegroups.com
Results after two nights of testing:

(1) System bb1cf1 got installed with 3.19.3-bone4: still no reboot
uptime
 04:19:11 up 1 day, 13:51,  2 users,  load average: 0.00, 0.01, 0.05

(2) System bb6c1f: installed with 4.1.1-ti-r2 #1 SMP PREEMPT Wed Jul 8 17:03:29 UTC 2015 armv7l GNU/Linux: 2 reboots to a total of 4
Jul 13 00:55:17 bb6c1f kernel: [    0.000000] Booting Linux on physical CPU 0x0
Jul 13 01:51:02 bb6c1f kernel: [    0.000000] Booting Linux on physical CPU 0x0
Jul 13 21:46:08 rc6c1f kernel: [    0.000000] Booting Linux on physical CPU 0x0
Jul 14 04:02:45 rc6c1f kernel: [    0.000000] Booting Linux on physical CPU 0x0

(3) System bb4f8e still has 4.1.0-rc8-bone9 #1 Wed Jun 17 00:05:43 UTC 2015 armv7l GNU/Linux, but with cpufreq-set -g performance: rebooted at 15:50 after around 20h uptime; before that it had 6 reboots within 24h

(4) Some other systems ran with cpufreq-set -g performance; my feeling is that the number of reboots decreased

My conclusion:
  • cpufreq-set -g performance seems to improve the situation, but does not solve it.
  • 3.19.3-bone4 is stable
--- Guenter (dl4mea)
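
Boot lines like the ones quoted under (2) are a convenient way to count restarts. Here is a minimal Python sketch, with an invented function name, that pulls the timestamps of all lines containing the kernel's first boot message. It assumes the classic syslog layout in which the timestamp occupies the first 15 characters of each line.

```python
def boot_times(log_lines, marker="Booting Linux on physical CPU"):
    """Return the 'Mon DD HH:MM:SS' prefix of every log line containing the marker."""
    return [line[:15] for line in log_lines if marker in line]


# The four boot lines quoted under (2) above.
log = """\
Jul 13 00:55:17 bb6c1f kernel: [    0.000000] Booting Linux on physical CPU 0x0
Jul 13 01:51:02 bb6c1f kernel: [    0.000000] Booting Linux on physical CPU 0x0
Jul 13 21:46:08 rc6c1f kernel: [    0.000000] Booting Linux on physical CPU 0x0
Jul 14 04:02:45 rc6c1f kernel: [    0.000000] Booting Linux on physical CPU 0x0
""".splitlines()

for stamp in boot_times(log):
    print(stamp)
```

Every match is a boot, not necessarily an unexpected one, so any deliberate restarts in the same period have to be subtracted by hand.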

William Hermans

Jul 14, 2015, 1:19:39 AM
to beagl...@googlegroups.com
Still trucking along here:


debian@beaglebone:~$ uname -a
Linux beaglebone 4.1.0-rc8-bone9 #1 Tue Jun 16 23:45:22 UTC 2015 armv7l GNU/Linux
debian@beaglebone:~$ uptime
 22:18:07 up 1 day,  9:41,  1 user,  load average: 0.08, 0.03, 0.05

By the way, I'm using default "ondemand" cpufreq governor

--

William Hermans

Jul 14, 2015, 1:38:41 AM
to beagl...@googlegroups.com
Could this possibly be related to how "clean" provided AC mains is ? I'm just curious, as we've never had any of these problems, but we're also completely off grid. Also for the record our power here is very stable and clean. No blips, spikes, or any abnormalities one might see being connected to grid power.


William Hermans

Jul 14, 2015, 1:57:36 AM
to beagl...@googlegroups.com
Anyway my last comment was a bit of a stretch, seeing as this only affects some boards, and not all. However it does strike me as odd that both of you are having issues with the same kernel I'm running _right_now_, when it is running rock solid so far for me.

Which leads me to believe that *something* on the rootfs is perhaps somehow to blame. Either that, or something on these failing boards is somehow slightly out of tolerance. I'll let this run a while longer just to make sure before moving on to a Jessie image.

Do also keep in mind that while we do not own 100's of BBB's we do own 5, and none of these have ever shutdown without a reason . . . 2 A5A's and 3 element14 REVC's

Robert Nelson

Jul 14, 2015, 9:04:49 AM
to Beagle Board
Please give 4.1.2-ti-r3 some testing, as it has pm/cpuidle fixes from ti.

sudo apt-get update
sudo apt-get install linux-image-4.1.2-ti-r3

dl4mea

Jul 14, 2015, 3:34:16 PM
to beagl...@googlegroups.com
I installed 4.1.2-ti-r3 on 13 devices. Without executing cpufreq-set -g performance.
First impression is not good, as I have had 3 reboots since then, but more info after the night, in about 8h.

William Hermans

Jul 14, 2015, 4:49:57 PM
to beagl...@googlegroups.com
> Please give 4.1.2-ti-r3 some testing, as it has pm/cpuidle fixes from ti.
>
> sudo apt-get update
> sudo apt-get install linux-image-4.1.2-ti-r3

Robert, would it be helpful if I ran this on the wheezy 7.8 image ?

On Tue, Jul 14, 2015 at 12:34 PM, 'dl4mea' via BeagleBoard <beagl...@googlegroups.com> wrote:
> I installed 4.1.2-ti-r3 on 13 devices. Without executing cpufreq-set -g performance.
> First impression is not good, as I had 3 reboots since then, but more info after the night about 8h.

William Hermans

Jul 14, 2015, 4:51:58 PM
to beagl...@googlegroups.com
err, Wheezy 7.8 rootfs is what I meant to type . . .

Robert Nelson

Jul 14, 2015, 4:53:35 PM
to Beagle Board
On Tue, Jul 14, 2015 at 3:49 PM, William Hermans <yyr...@gmail.com> wrote:
>> Please give 4.1.2-ti-r3 some testing, as it has pm/cpuidle fixes from ti.
>>
>> sudo apt-get update
>> sudo apt-get install linux-image-4.1.2-ti-r3
>
>
> Robert, would it be helpful if I ran this on the wheezy 7.8 image ?

That shouldn't matter.. Mine rebooted after 3 hours.. I'm also
unplugging the ethernet to test an offline "npm install xyz" script..
what's odd, it rebooted on "git pull"..

William Hermans

Jul 14, 2015, 4:58:14 PM
to beagl...@googlegroups.com
Ok. Well before I reboot and run that linux-image . .

debian@beaglebone:~$ uptime
 13:57:28 up 2 days,  1:21,  1 user,  load average: 0.20, 0.17, 0.11

debian@beaglebone:~$ uname -a
Linux beaglebone 4.1.0-rc8-bone9 #1 Tue Jun 16 23:45:22 UTC 2015 armv7l GNU/Linux

dl4mea

Jul 14, 2015, 11:36:07 PM
to beagl...@googlegroups.com
Here are my results of 12h testing:

bba94e: 4 reboots
bbf611: no reboot
bb4f8e: 2 reboots
bb5d63: 4 reboots
bbea75: 1 reboot
bb845a: no reboot
bb93dc: 2 reboots
bbde13: 2 reboots
bb5a40: 2 reboots
bb6c1f: 4 reboots
bb151f: 1 reboot
bb6548: 1 reboot
bb8f34: 1 reboot

In parallel, I have set up one system with 3.19.3-bone4. That was stable for 12h while simply idling. Now, while running some software on it, it seems it does not have the same problem as 4.1.x, but instead it shows an "unexpected IRQ trap at vector 00" error after around 6h of operation (several times). To me that looks like 3.19.3-bone4 does not have the problem we're looking for in 4.1.x now.

-- Günter

William Hermans

Jul 15, 2015, 5:34:14 AM
to beagl...@googlegroups.com
Idle:

debian@beaglebone:~$ uname -a
Linux beaglebone 4.1.2-ti-r3 #1 SMP PREEMPT Tue Jul 14 06:54:47 UTC 2015 armv7l GNU/Linux
debian@beaglebone:~$ uptime
 02:33:27 up 12:35,  1 user,  load average: 0.01, 0.02, 0.05


William Hermans

Jul 17, 2015, 4:33:18 PM
to beagl...@googlegroups.com
UPDATE

debian@beaglebone:~$ uptime
 13:29:11 up 2 days, 23:30,  1 user,  load average: 0.07, 0.03, 0.05

debian@beaglebone:~$ uname -a
Linux beaglebone 4.1.2-ti-r3 #1 SMP PREEMPT Tue Jul 14 06:54:47 UTC 2015 armv7l GNU/Linux

Just idling. At some point I'll probably rewrite the CPU loading application I wrote early after the BBB's release (it was just a simple 15 lines of code or some such). But I am currently in the process of refactoring some code I consider to have a much higher priority . . .

Nuno Gonçalves

Jul 17, 2015, 8:48:37 PM
to beagl...@googlegroups.com
Running now 4.1.2-ti-r4.

In 8 hours had 2 resets.

Resets seem to happen more frequently if the BBB is idling as already mentioned by dl4mea.

Nuno

dl4mea

Jul 18, 2015, 3:47:46 AM
to beagl...@googlegroups.com
Just to let you know that my test setup of 13 BBB is still available.

Here are my results of 3 days + 12h testing:

bba94e: 13 reboots
bbf611: 9 reboots
bb4f8e: 12 reboots
bb5d63: 12 reboots
bbea75: 14 reboots
bb845a: 12 reboots
bb93dc: 10 reboots
bbde13: 6 reboots
bb5a40: 13 reboots
bb6c1f: 14 reboots
bb151f: 7 reboots
bb6548: 14 reboots
bb8f34: 9 reboots

The reboots are equally distributed over time.
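
These counts can be boiled down to an average rate. The Python sketch below does the arithmetic with the per-board numbers from the list above; the function name is made up, the "14 reboot" entry is read as 14, and it assumes that every board ran for the full 3 days + 12h (84 hours), which is an approximation.

```python
def reboot_rate(counts, test_hours):
    """Return (total_reboots, reboots_per_board_per_day, mean_hours_between_reboots)."""
    total = sum(counts.values())
    board_hours = len(counts) * test_hours
    per_board_per_day = total / board_hours * 24
    mean_hours_between = board_hours / total if total else float("inf")
    return total, per_board_per_day, mean_hours_between


# Reboot counts per board from the list above.
counts = {
    "bba94e": 13, "bbf611": 9, "bb4f8e": 12, "bb5d63": 12, "bbea75": 14,
    "bb845a": 12, "bb93dc": 10, "bbde13": 6, "bb5a40": 13, "bb6c1f": 14,
    "bb151f": 7, "bb6548": 14, "bb8f34": 9,
}
total, per_day, mean_hours = reboot_rate(counts, test_hours=3 * 24 + 12)
print(total, round(per_day, 2), round(mean_hours, 2))
```

Under those assumptions this works out to 145 reboots in total, roughly 3.2 per board per day, or one reboot about every 7.5 hours per board.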

I have two BB-White running:
root@bb22:~# cat /proc/device-tree/model
TI AM335x BeagleBone
root@bb22:~# uname -a
Linux bb22 4.1.0-rc8-bone9 #1 Wed Jun 17 00:05:43 UTC 2015 armv7l GNU/Linux
root@bb22:~# uptime
 07:39:40 up 8 days,  2:55,  1 user,  load average: 0.25, 0.21, 0.23
root@bb22:~# ssh root@bb110

root@bb110:~# cat /proc/device-tree/model
TI AM335x BeagleBone
root@bb110:~# uname -a
Linux bb110 4.1.0-rc8-bone9 #1 Wed Jun 17 00:05:43 UTC 2015 armv7l GNU/Linux
root@bb110:~# uptime
 07:40:01 up 13 days, 20:07,  2 users,  load average: 0.38, 0.36, 0.32

--- Günter (dl4mea)


dl4mea

Jul 18, 2015, 4:01:31 AM
to beagl...@googlegroups.com
Sorry, I did not notice that there are two more BB-White on my table:

root@bb5435:~# cat /proc/device-tree/model
TI AM335x BeagleBone
root@bb5435:~# uname -a
Linux bb5435 4.1.0-rc8-bone9 #1 Wed Jun 17 00:05:43 UTC 2015 armv7l GNU/Linux
root@bb5435:~# uptime
 07:57:01 up 7 days, 12:42,  2 users,  load average: 0.00, 0.01, 0.05

root@bb66:~# cat /proc/device-tree/model
TI AM335x BeagleBone
root@bb66:~# uname -a
Linux bb66 4.1.1-bone9 #1 Tue Jun 30 06:09:30 UTC 2015 armv7l GNU/Linux
root@bb66:~# uptime
 07:59:39 up 6 days, 21:02,  2 users,  load average: 0.00, 0.02, 0.05

They were not all powered on at the same time, so the uptimes differ according to when each one was first started.
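
To compare boards that were started at different times, the `uptime` lines can be converted to a common unit. The following is only a sketch: the function name and regular expression are made up for illustration, it handles just the "up N days, H:MM" form seen in these posts, and the sample lines are the BB-White outputs quoted in this thread.

```python
import re
from datetime import timedelta

# Matches "up N days, H:MM," (the day part is optional). Other forms that
# uptime can print, such as "up 5 min", are deliberately not handled here.
UPTIME_RE = re.compile(r"up\s+(?:(\d+)\s+days?,\s+)?(\d+):(\d+),")

def parse_uptime(line):
    """Return the uptime in an `uptime` output line as a timedelta."""
    m = UPTIME_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognised uptime line: {line!r}")
    days, hours, minutes = (int(g or 0) for g in m.groups())
    return timedelta(days=days, hours=hours, minutes=minutes)

# uptime output of the four BB-White boards, as posted in this thread.
samples = {
    "bb22": " 07:39:40 up 8 days,  2:55,  1 user,  load average: 0.25, 0.21, 0.23",
    "bb110": " 07:40:01 up 13 days, 20:07,  2 users,  load average: 0.38, 0.36, 0.32",
    "bb5435": " 07:57:01 up 7 days, 12:42,  2 users,  load average: 0.00, 0.01, 0.05",
    "bb66": " 07:59:39 up 6 days, 21:02,  2 users,  load average: 0.00, 0.02, 0.05",
}

for host, line in samples.items():
    hours = parse_uptime(line).total_seconds() / 3600
    print(f"{host}: up {hours:.1f} h")
```

Logging this value at regular intervals would also make a reboot visible as a sudden drop in uptime.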

William Hermans

Jul 18, 2015, 7:59:12 PM
to beagl...@googlegroups.com
Have you all read the "BBB intermittently rebooting" post? Maxim was saying a similar problem was happening to some people with the 3.2 kernel, and that grounding vbus or vUSB made the problem go away.

Not sure if this will fix it for all of you, but I can say that I am powering via USB, and have had zero problems with the same kernels you all are running. Also, I have been assuming you all are using the 5 V barrel jack . . .

Graham

Jul 18, 2015, 8:14:36 PM
to beagl...@googlegroups.com
For what it is worth...

I loaded the console version "bone-debian-8.1-console-armhf-2015-07-12-2gb.img" onto a uSD card.
Then booted and installed "linux-image-4.1.2-ti-r4"

Then rebooted, and let it run doing nothing else.
It autonomously rebooted after seven hours.

--- Graham

==

evilwulfie

Jul 18, 2015, 8:21:47 PM
to beagl...@googlegroups.com
Powered how?

Graham Haddock

Jul 18, 2015, 8:28:48 PM
to beagl...@googlegroups.com
I power it from the 5 V power connector.
The only other connection to the unit is an Ethernet cable.
--- Graham

==

Nuno Gonçalves

Jul 18, 2015, 8:34:33 PM
to beagl...@googlegroups.com
Also on the barrel jack.

For what it's worth, I believe the OMAP and PMIC reset-reason registers should be part of the boot log so that future reboot problems can be sorted out.
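
As an illustration of what logging the reset reason could look like, here is a small Python sketch that decodes a raw value of the AM335x PRM_RSTST register into flag names. The bit positions are an assumption written down from memory of the AM335x Technical Reference Manual and should be checked there before use; how the raw value is read (bootloader, kernel, or a memory-mapped read) is left out, and the PMIC side is not covered.

```python
# ASSUMPTION: bit positions of the AM335x PRM_RSTST reset-status flags, as
# recalled from the Technical Reference Manual. Verify before relying on them.
RSTST_BITS = {
    0: "GLOBAL_COLD_RST",     # cold (power-on) reset
    1: "GLOBAL_WARM_SW_RST",  # software-requested warm reset
    4: "WDT1_RST",            # watchdog timer reset
    5: "EXTERNAL_WARM_RST",   # external warm reset
    9: "ICEPICK_RST",         # reset requested by the debug logic
}

def decode_rstst(value):
    """Return the names of all reset flags set in a raw PRM_RSTST value."""
    return [name for bit, name in sorted(RSTST_BITS.items()) if value & (1 << bit)]

# Hypothetical raw values, purely for illustration (not read from any board).
for raw in (0x001, 0x011, 0x020):
    flags = ", ".join(decode_rstst(raw)) or "no flag set"
    print(f"PRM_RSTST=0x{raw:03x}: {flags}")
```

Printing such a line early in the boot log would at least distinguish a watchdog or software reset from a cold power-on reset.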

Nuno

evilwulfie

Jul 18, 2015, 8:46:21 PM
to beagl...@googlegroups.com
My friend powers his from USB and all is fine.
Sounds like it's related to the Vusb line issue posted here recently.

Graham Haddock

Jul 18, 2015, 11:35:40 PM
to beagl...@googlegroups.com
Which version of Debian and which kernel version is your friend running?

evilwulfie

Jul 18, 2015, 11:42:10 PM
to beagl...@googlegroups.com
See William Hermans' posts.

The same one you're using, if I read correctly.

I also had it running on one powered by a wall-wart USB power supply with no issues,
but it's been off for a while as I am not currently using it.

William Hermans

Jul 18, 2015, 11:42:23 PM
to beagl...@googlegroups.com
His friend is me ;)


debian@beaglebone:~$ uname -a
Linux beaglebone 4.1.2-ti-r3 #1 SMP PREEMPT Tue Jul 14 06:54:47 UTC 2015 armv7l GNU/Linux
debian@beaglebone:~$ cat /etc/dogtag
BeagleBoard.org Debian Image 2015-03-01
debian@beaglebone:~$ uptime
 20:42:18 up 4 days,  6:43,  1 user,  load average: 0.00, 0.01, 0.05

William Hermans

Jul 18, 2015, 11:43:53 PM
to beagl...@googlegroups.com
The rootfs is Wheezy 7.8 for what that's worth. But Robert already said that shouldn't matter.

William Hermans

Jul 18, 2015, 11:45:32 PM
to beagl...@googlegroups.com
grrr, and as usual . . . I keep forgetting to comment on stuff I've been thinking about all day, in one post :/

My "concerns" about systemd are long gone now. Every one of these linux-images I've run has booted up and kept using systemd.

dl4mea

Jul 21, 2015, 1:53:07 AM
to beagl...@googlegroups.com
Since we now have two threads about the same problem...
https://groups.google.com/forum/#!topic/beagleboard/2yOpE3XYJ1Y

My 13 BB-Blacks under test are powered from external +5 V with these power supplies:
http://www.deutronic.com/products/power-supplies/ac-adapter/esc15g-15-watt.html
The average number of reboots per BB-B was about 3 per day.

After reading this thread, I simply connected a USB cable from the front-side Type-A port to the back-side Mini-USB port. Since then the number of reboots has decreased drastically: within the last 12 h I saw just two in total.
Other systems under my control, but not located in my lab, are showing the same improvement.

--- Günter (dl4mea)
