POGS Computation Errors - Calling for help.


Daniel Carrion

Jul 31, 2013, 11:05:14 AM
to boinc-andr...@googlegroups.com
Hello

There seems to be a bug when running on BOINC for Android where POGS exits due to SIGILL and SIGSEGV errors. The problem is that it doesn't happen on every device; for example, it doesn't happen on my Samsung Galaxy S3 or my ODROID-U2. I'm hoping that the dumps in /data/tombstones will tell me what was going on when the application exited. It's clearly memory-related, but it's not clear why.

If there's anyone out there running POGS and getting these errors who is willing to assist in troubleshooting, please contact me or jump over onto the forums at http://pogs.theskynet.org/pogs. If I can find someone with a device experiencing this problem who is willing to troubleshoot, then I can hopefully get to the bottom of this.

Regards

Daniel

Iris Ailin-Pyzik

Jul 31, 2013, 11:12:34 AM
to boinc-andr...@googlegroups.com
Daniel -

I had those repeated errors on POGS, to the point where I quit running those tasks.  Asteroids ran with an occasional computation error, and I'd never had one on Einstein until yesterday.

I just looked in tombstones (not sure if it is truly the right directory or not) and found nothing at all there.

I'm willing to help, how do we proceed?

Iris

Daniel Carrion

Jul 31, 2013, 11:50:39 AM
to Iris Ailin-Pyzik, boinc-andr...@googlegroups.com
Hi Iris

Thanks for responding. I had a look and noticed that a couple of your Asteroid tasks also exited with SIGSEGV.

Bugger. I was hoping /data/tombstones would contain something, but I guess that might just be for core applications? It was on CM10.1 that I found entries in mine.

I'm going to build an application around the BOINC wrapper to simulate these errors to find the best way of getting users to provide full dumps/traces for applications exiting like that.
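A minimal sketch of that kind of simulation, written in Python for brevity rather than as part of the C++ BOINC wrapper, and assuming a POSIX system: a child process deliberately terminates itself with SIGSEGV or SIGILL, and the parent reports the terminating signal the way a supervising process sees it. The helper run_crashing_child is hypothetical, and a self-sent signal only mimics the exit status of a real crash; it does not reproduce the underlying fault.

```python
import signal
import subprocess
import sys

# Child script: disable core dumps, then send itself the signal named in argv[1].
# This mimics how a crashed process looks to its parent; it is not a real fault.
CHILD_CODE = """
import os, resource, signal, sys
resource.setrlimit(resource.RLIMIT_CORE, (0, 0))
os.kill(os.getpid(), getattr(signal, sys.argv[1]))
"""

def run_crashing_child(sig_name: str) -> str:
    """Run a child that dies with sig_name and describe how it exited (hypothetical helper)."""
    proc = subprocess.run([sys.executable, "-c", CHILD_CODE, sig_name])
    if proc.returncode < 0:
        # On POSIX, subprocess reports death-by-signal as a negative return code.
        return f"child terminated by {signal.Signals(-proc.returncode).name}"
    return f"child exited with status {proc.returncode}"

if __name__ == "__main__":
    for name in ("SIGSEGV", "SIGILL"):
        print(run_crashing_child(name))
```

The parent only sees the signal number, so any richer information (registers, backtrace) has to come from a crash dump such as a tombstone or a logcat capture.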

Regards

Daniel

--
You received this message because you are subscribed to the Google Groups "BOINC Android Testing" group.
To unsubscribe from this group and stop receiving emails from it, send an email to boinc-android-te...@googlegroups.com.
To post to this group, send email to boinc-andr...@googlegroups.com.
Visit this group at http://groups.google.com/group/boinc-android-testing.
To view this discussion on the web visit https://groups.google.com/d/msgid/boinc-android-testing/b4b479cd-0df1-4d31-a4fd-89a65ca673fa%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Daniel Carrion

Jul 31, 2013, 1:25:11 PM
to Iris Ailin-Pyzik, boinc-andr...@googlegroups.com
OK, I just simulated a SIGSEGV by compiling a deliberately broken fit_sed app, and it does seem to have generated a stack trace in /data/tombstones. I'm wondering whether this is something that needs to be enabled, or whether the dumps are located at /tombstones instead on some devices.

root@android:/data/tombstones # head -10 tombstone_05
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Build fingerprint: 'unknown'
Revision: '0'
pid: 24055, tid: 24055, name: fit_sed  >>> ../../projects/pogs.theskynet.org_pogs/fit_sed <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 000004b0
    r0 00000004  r1 befffa94  r2 befffaa8  r3 000004b0
    r4 000082f4  r5 befffa94  r6 00000004  r7 befffaa8
    r8 00000000  r9 00000000  sl 00000000  fp befffa5c
    ip 4001200c  sp befffa50  lr 40025735  pc 00008308  cpsr 60010050
    d0  0000000000000000  d1  0000000000000000
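When tombstones are gathered from several devices, a quick summary of each header can make them easier to compare. A minimal Python sketch, assuming the header layout shown in the tombstone_05 excerpt above (other Android versions may format these lines differently); summarize_tombstone and TombstoneSummary are hypothetical names.

```python
import re
from dataclasses import dataclass

# Patterns for the "pid:" and "signal" header lines, as laid out in the excerpt above.
PID_RE = re.compile(r"pid: (\d+), tid: (\d+), name: (\S+)")
SIG_RE = re.compile(r"signal (\d+) \((\w+)\), code (-?\d+) \((\w+)\), fault addr (\S+)")

@dataclass
class TombstoneSummary:
    pid: int
    tid: int
    name: str
    signo: int
    signame: str
    code: int
    code_name: str
    fault_addr: str

def summarize_tombstone(text: str) -> TombstoneSummary:
    """Extract process and signal details from a tombstone's header lines."""
    pid_m = PID_RE.search(text)
    sig_m = SIG_RE.search(text)
    if pid_m is None or sig_m is None:
        raise ValueError("not a recognisable tombstone header")
    return TombstoneSummary(
        pid=int(pid_m.group(1)),
        tid=int(pid_m.group(2)),
        name=pid_m.group(3),
        signo=int(sig_m.group(1)),
        signame=sig_m.group(2),
        code=int(sig_m.group(3)),
        code_name=sig_m.group(4),
        fault_addr=sig_m.group(5),
    )
```

Applied to the tombstone_05 excerpt, it reports pid 24055, process fit_sed, and signal 11 (SIGSEGV) with code SEGV_MAPERR at fault address 000004b0.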

Regards

Daniel

Iris Ailin-Pyzik

Jul 31, 2013, 2:47:03 PM
to Daniel Carrion, boinc-andr...@googlegroups.com
Daniel -

I let another POGS task load, and I'll let it run.  If it is true to form, I'll get an error, and I'll check tombstones again and let you know.

Iris

Daniel Carrion

Aug 1, 2013, 2:11:48 AM
to Iris Ailin-Pyzik, boinc-andr...@googlegroups.com
Just an update on this. Iris has provided me with some output from NativeBOINC's BugCatcher mode. It seems that it is the wrapper that is bombing out. Extracts from the log files:

One with SIGILL:

---- NATIVEBOINC BUGCATCH REPORT HEADER ----
BugReportTime:1375312087.187
CommandLine: "../../projects/pogs.theskynet.org_pogs/wrapper_arm-android-linux-gnu_340"
Manufacturer:motorola
ModelName:XT897
AndroidVer:4.1.2
KernelVer:3.0.42-gbd030d1
NativeBOINCVer:0.4.4.2.2b
Pid:5990
Signal:4

Other with SIGSEGV:

---- NATIVEBOINC BUGCATCH REPORT HEADER ----
BugReportTime:1375317764.520
CommandLine: "../../projects/pogs.theskynet.org_pogs/wrapper_arm-android-linux-gnu_340"
Manufacturer:motorola
ModelName:XT897
AndroidVer:4.1.2
KernelVer:3.0.42-gbd030d1
NativeBOINCVer:0.4.4.2.2b
Pid:6000
Signal:11
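The two BugCatch headers above use simple Key:Value lines, so reports from several devices can be compared programmatically. A minimal Python sketch, assuming only the fields visible in these headers (the full report may contain more); parse_bugcatch_header and signal_name are hypothetical names, and the signal lookup assumes the analysing machine is Linux, where 4 is SIGILL and 11 is SIGSEGV, as on Android.

```python
import signal

def parse_bugcatch_header(text: str) -> dict:
    """Parse 'Key:Value' lines from a NativeBOINC BugCatch report header."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # The banner line has no colon, so it (and any blank line) is skipped.
        if not sep:
            continue
        # CommandLine is quoted in the report; drop the surrounding quotes.
        fields[key.strip()] = value.strip().strip('"')
    return fields

def signal_name(fields: dict) -> str:
    """Translate the numeric Signal field using the local signal table."""
    return signal.Signals(int(fields["Signal"])).name
```

For the first header this yields ModelName XT897, Pid 5990 and signal SIGILL; for the second, Pid 6000 and SIGSEGV.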

At first glance, the stack trace suggests this happens when the wrapper goes to start a new task. I've contacted Mateusz from NativeBOINC for his thoughts on the output.

In the meantime, I will be whacking a whole lot of debug output into the wrapper code to see exactly where it exits. I would like to attach GDB to the running process, but for now I don't want to bother people with doing that. It's unfortunate that I can't reproduce this.

Thank you Iris for your assistance thus far.

Regards

Daniel

Iris Ailin-Pyzik

Aug 1, 2013, 8:18:40 AM
to Daniel Carrion, boinc-andr...@googlegroups.com
OK, just to clear some things up, because I think the thread has become convoluted.

I was running BOINC and getting all these errors on POGS, so Daniel asked me to try NativeBOINC to see whether he could make more sense of them using its bugcatcher feature. The dumps below are from bugcatcher.

To complicate further -

Last night I left my phone running NativeBOINC, and now it will not run POGS at all. It downloads a task, almost immediately reports it as done but with no output file, and fetches another. I rebooted the phone; same deal. See:

http://pogs.theskynet.org/pogs/results.php?hostid=10438
http://pogs.theskynet.org/pogs/results.php?hostid=10529

It seems to be treating the host as a new one after I re-started NativeBOINC.

I suspended the POGS project, deleted the tasks and added Einstein, which ran well under BOINC, and it seems to be happily chugging away on that one.

Just by way of contrast, this is an old Dell Streak that lives in our kitchen for internet radio; it has two errors but is otherwise running POGS under BOINC with no problems:

http://pogs.theskynet.org/pogs/results.php?hostid=9083

Iris

Daniel Carrion

Aug 1, 2013, 8:49:44 AM
to Iris Ailin-Pyzik, boinc-andr...@googlegroups.com
I apologize if I've confused the situation. For now, I think it's probably best if users experiencing the same errors on their devices stop crunching POGS until we get this issue resolved. It would be great if those who are experiencing them and are willing to help troubleshoot could head over to the POGS forums and post.

From Iris' assistance with troubleshooting, we know there is a definite problem with the wrapper exiting with SIGILL and SIGSEGV on some devices. I can see these errors on other Android hosts crunching POGS. The dumps Iris provided confirm this; however, it's not clear why it's happening.

I am now preparing a wrapper version with better debugging and logging to see exactly what causes it to crash. Once it's ready, I will post so that willing users experiencing the problem can help me test it, and I will then review the stderr logs once the tasks report back.

At the same time I've also contacted the NativeBOINC developer to see if he can provide some insight. I'm also having a poke around other Android projects to see if they are getting similar problems.

Again, I apologize for the confusion and for making the thread convoluted. It may be better for this troubleshooting to happen on the POGS forum instead of this group, as it's hard to follow as an email correspondence.

Regards

Daniel

Daniel Carrion

Aug 5, 2013, 3:02:13 AM
to boinc-andr...@googlegroups.com, Iris Ailin-Pyzik, Kevin Vinsen, David Anderson
Hello All

We've set up a test project site to troubleshoot problems with POGS tasks on Android. We invite everyone currently experiencing computation errors to help us test. The wrapper distributed by this project has debugging flags turned on and more logging.

The test project site to attach to: http://pogstest.theskynet.org/pogstest

You will need to create a new account.

For those who are able, we ask that you run logcat in the background before POGS tasks begin running. You can do this from a terminal emulator on your Android device, or by invoking adb from the SDK (adb shell) and then running:

logcat -s DEBUG > /sdcard/Download/logcat.txt

If the tasks crash, there should be some useful dump information in that logcat.txt file, which you can send me.
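Once a logcat.txt captured with the command above has been copied off the device (for example with adb pull /sdcard/Download/logcat.txt), individual crash dumps can be split out of it. A minimal Python sketch, assuming logcat's brief output format (lines such as "I/DEBUG   (  127): ...") and assuming each dump begins with the "*** *** ***" banner seen in the tombstone excerpt earlier in this thread; split_crash_reports is a hypothetical name.

```python
import re
import sys

# Assumed logcat "brief" prefix, e.g. "I/DEBUG   (  127): ". Lines in other formats pass through unchanged.
PREFIX_RE = re.compile(r"^[VDIWEF]/DEBUG\s*\(\s*\d+\):\s?")
BANNER = "*** *** ***"

def split_crash_reports(lines):
    """Group DEBUG log lines into one list per crash, starting a new list at each banner."""
    reports = []
    for raw in lines:
        line = PREFIX_RE.sub("", raw.rstrip("\n"))
        if line.startswith(BANNER):
            reports.append([line])
        elif reports:
            # Lines that appear before the first banner are ignored.
            reports[-1].append(line)
    return reports

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for i, report in enumerate(split_crash_reports(f), 1):
            print(f"--- crash {i} ---")
            print("\n".join(report[:10]))
```

Saved as a script and run with the path to logcat.txt as its argument, it prints the first ten lines of each dump, which normally include the signal line.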

We appreciate any assistance in helping us debug this problem. If you are helping out, please post over at http://pogs.theskynet.org/pogs/forum_thread.php?id=250.

Kind Regards

Daniel

Daniel Carrion

Aug 6, 2013, 2:42:30 AM
to boinc-andr...@googlegroups.com, Iris Ailin-Pyzik, Kevin Vinsen, David Anderson
Hello

Thanks to everyone who has attached so far. I can see there are errors in the task output... I hope there's something useful in that logcat dump :).

I have released another version that removes the redirection of stderr to a child-specific output file. Please update/reset the project on your phones to get this version.

Cheers

Daniel

Jérôme Cadet

Aug 6, 2013, 12:40:04 PM
to boinc-andr...@googlegroups.com, Iris Ailin-Pyzik, Kevin Vinsen, David Anderson
My first POGS WU finished OK on my SG2S after 13 hours of computation :)

Daniel Carrion

Aug 6, 2013, 12:55:08 PM
to Jérôme Cadet, boinc-andr...@googlegroups.com, Iris Ailin-Pyzik, Kevin Vinsen, David Anderson
Thanks, Jerome. Good to know it works on the Samsung Galaxy SII. I can see you're running kernel version 3.0.31, so I'm guessing you're on Android 4.1.2?



Jérôme Cadet

Aug 6, 2013, 1:09:06 PM
to Daniel Carrion, boinc-andr...@googlegroups.com, Iris Ailin-Pyzik, Kevin Vinsen, David Anderson
Hi

I'm using CyanogenMod 10.1.0-i9100G (this is the G variant of the SG2S) with Android 4.2.2 and a rooted kernel (3.0.31-CM-gc2ca3a7).

J.



Daniel Carrion

Aug 7, 2013, 2:18:31 AM
to boinc-andr...@googlegroups.com, Iris Ailin-Pyzik, Kevin Vinsen, David Anderson
Hello Again

I have pushed out yet another version that logs output during the task poll loop for 60 iterations. If you can, please attach so we can see whether this reveals any pattern as to where it's exiting.
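A rough Python sketch of what bounded poll-loop logging can look like, assuming a simple launch-then-poll structure. The real wrapper is C++ and does more in its loop than this; run_and_poll, DEBUG_POLLS and POLL_PERIOD are hypothetical names, and the one-second period is an assumed value.

```python
import subprocess
import sys
import time

DEBUG_POLLS = 60   # log only the first 60 iterations, mirroring the "60 counts" in the message
POLL_PERIOD = 1.0  # seconds between polls; an assumed value, not taken from the wrapper source

def _stderr_log(msg: str) -> None:
    print(msg, file=sys.stderr, flush=True)

def run_and_poll(cmd, poll_period=POLL_PERIOD, debug_polls=DEBUG_POLLS, log=_stderr_log):
    """Start cmd, poll until it exits, and log the first debug_polls iterations (hypothetical helper)."""
    child = subprocess.Popen(cmd)
    count = 0
    while True:
        status = child.poll()  # None while the child is still running, else its return code
        if count < debug_polls:
            log(f"poll {count}: pid={child.pid} status={status}")
        if status is not None:
            return status
        count += 1
        time.sleep(poll_period)
```

Capping the number of logged iterations keeps stderr small enough to be reported back, while the last logged line before a crash still narrows down where in the loop the process died.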

Regards

Daniel

Daniel Carrion

Aug 11, 2013, 5:43:42 PM
to boinc-andr...@googlegroups.com, BOINC Developers Mailing List, Iris Ailin-Pyzik, Kevin Vinsen, David Anderson
Greetings

There is one more version that has just been released to http://pogstest.theskynet.org/pogstest. It is built from a recent commit to the BOINC repository that excludes continuous CPU time calculation on Android, to work around a bug that crashes the wrapper. It's essentially identical to the recently released version 0.09. I ask all testers to update/reset the project to receive version 0.10 app tasks. This will probably be the version released to http://pogs.theskynet.org/pogs.
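The thread does not say how the BOINC commit handles CPU time once continuous calculation is excluded. Purely as an illustration of the general trade-off, here is a minimal Python sketch that avoids sampling a running child's CPU time and instead reads it once after the child has exited, using POSIX getrusage with RUSAGE_CHILDREN. This is a swapped-in technique, not the wrapper's code, and run_and_measure_cpu is a hypothetical name.

```python
import resource
import subprocess
import sys

def run_and_measure_cpu(cmd):
    """Run cmd to completion and return (exit_code, cpu_seconds), measured once after exit."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    proc = subprocess.run(cmd)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # RUSAGE_CHILDREN only counts terminated children that have been waited for
    # (plus their waited-for descendants), so the difference covers the child just reaped.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return proc.returncode, cpu

if __name__ == "__main__":
    code, cpu = run_and_measure_cpu([sys.executable, "-c", "sum(range(10**6))"])
    print(f"exit={code} cpu={cpu:.3f}s")
```

The cost of this approach is that no CPU time is available while the task is still running, which is the kind of progress reporting a continuous calculation provides.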

Additionally, I am going to keep an instance of these wrapper tasks up at my project site here, as we still need to get to the bottom of the problem. If you have this issue and are willing to spare an Android device out of hours, please attach to the project so I can collect relevant debugging output for analysis.

Thanks to all that have helped thus far.

Cheers

Daniel