gdb / beaglebone / xenomai / threads


Drew Moore

May 20, 2014, 5:46:39 PM
to machi...@googlegroups.com
Hey guys..

I'm having trouble running my multi-threaded Xenomai-enabled app under gdb on a BeagleBone Black.

Shouldn't this work? This link says it should..
http://www.xenomai.org/index.php/FAQs#How_can_GDB_be_used.3F

I'm using the latest Machinekit Debian images (uname shows Thu May 8, but I see the trouble in the other recent images as well).
I'm trying to distill it down to a simple example that shows the behavior:

When I compile xenomai/examples/native/trivial_periodic.c with -g and single-step through main, demo_task exits prematurely.
If I just run it in gdb, it runs as expected. But if I create a new trivial thread (one that simply sleeps) after demo_task has been created, the act of creating the new thread seems to kill the previous thread, in gdb. If I create it before demo_task, the new trivial thread dies but demo_task runs. If I single-step, both threads die. If I sleep for a few seconds between create and start, both threads die, in gdb. All of these examples run as expected outside of gdb.

One thing I find confusing is that while single-stepping, I can get the message about the thread exiting before the thread has even been started with rt_task_start.

A similar pthreads-only app steps and runs as expected on the same SD image.

Can anybody shed light on this? How do you folks debug? What am I missing?

Thanks in advance..

Drew

Michael Haberler

May 20, 2014, 8:07:32 PM
to machi...@googlegroups.com


On Tuesday, May 20, 2014 11:46:39 PM UTC+2, Drew Moore wrote:
> One thing I find confusing is that while single-stepping, I can get the message about the thread exiting before the thread has even been started with rt_task_start.

I think the confusion is about what Xenomai RT threads are.

Xenomai is a hypervisor, i.e. RT threads run sort of "underneath" the Linux kernel, like glorified interrupt handlers, with the Linux kernel being essentially a background task. You cannot expect stock POSIX threads behavior from an RT thread, because the thread context is essentially outside the Linux kernel. That impacts debugging with gdb too.

If you set a breakpoint in an RT thread, the immediate consequence is that this thread loses timing accuracy and hence is 'domain switched' to become a vanilla Linux thread.

> A similar pthreads-only app steps and runs as expected on the same SD image.

> Can anybody shed light on this? How do you folks debug? What am I missing?

Debugging: if there is an issue with the application logic per se, what I do is run the POSIX flavor of Machinekit. That takes out the peculiarities of Xenomai RT threads and makes gdb work as expected, since it's all POSIX threads. Most application logic bugs aren't timing-related anyway.

If issues are timing-related, the details of the domain switch event will tell you where this happened. A typical case: you used a Linux system call from an RT thread, and that RT thread got domain-switched to a normal POSIX thread, with the timing impact that entails.
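One way to capture those domain-switch details is sketched here under assumptions. In Xenomai 2.x's native skin, a task can ask to be sent SIGXCPU whenever it drops to secondary (Linux) mode, via rt_task_set_mode(0, T_WARNSW, NULL). Check the API docs for your version. The sketch covers only the Linux side: a SIGXCPU handler that dumps a backtrace using glibc's backtrace() and backtrace_symbols_fd(). The names on_sigxcpu, install_switch_tracer and switch_count are hypothetical. The handler can be exercised without Xenomai by raising SIGXCPU by hand.

```c
#define _GNU_SOURCE
#include <execinfo.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical counter of reported mode switches. */
static volatile sig_atomic_t switch_count;

/* Hypothetical SIGXCPU handler: report the switch and where it happened. */
static void on_sigxcpu(int sig)
{
    static const char hdr[] = "RT task switched to secondary mode, backtrace:\n";
    void *frames[32];
    (void)sig;
    switch_count++;
    ssize_t w = write(STDERR_FILENO, hdr, sizeof hdr - 1);
    (void)w;
    int n = backtrace(frames, 32);
    /* backtrace_symbols_fd() writes straight to the fd without calling malloc(). */
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
}

/* Install the handler; returns 0 on success, -1 on failure (errno set). */
static int install_switch_tracer(void)
{
    struct sigaction sa;
    void *warmup[1];

    /* The first backtrace() call may lazily load libgcc; do it here, not in the handler. */
    backtrace(warmup, 1);

    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sa.sa_handler = on_sigxcpu;
    return sigaction(SIGXCPU, &sa, NULL);
}
```

In a real RT task, call install_switch_tracer() during init. Then enable the switch warning with whatever your Xenomai version provides, and read the backtrace from stderr when it fires.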

But usually a good stare at the code will reveal what's wrong. Simple rule: any Linux system call, or any library function which might cause a Linux system call, is suspect. For instance, printf is suspect, since it calls write(2), and that will likely get your thread domain-switched. That's the reason why Xenomai has an rt_printf() function.

Any code with dynamic memory allocation is generally not a good idea: once the allocator needs a new slab of memory, bang, domain-switched. Note there is no C++ code in the RT threads' execution path in Machinekit, and for good reasons: C++ memory management is just a bit too automatic for this fairly restrictive environment.

You could use RT-PREEMPT kernels, which are easier to handle from an application perspective. There is a price, though: latency is usually half an order of magnitude worse.
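A common way to follow the dynamic-allocation rule is to reserve all memory up front. Allocation in the RT path then becomes a free-list pop with no system calls. This is a minimal sketch, assuming a single RT task owns the pool (there is no locking). The names pool_init, pool_alloc and pool_free and the block sizes are hypothetical, not a Machinekit or Xenomai API.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_BLOCKS     64    /* hypothetical sizing */
#define POOL_BLOCK_SIZE 128

typedef union block {
    union block  *next;                  /* free-list link while the block is unused */
    unsigned char data[POOL_BLOCK_SIZE]; /* payload while the block is allocated */
} block_t;

static block_t  pool_storage[POOL_BLOCKS]; /* static storage: no heap, no syscalls */
static block_t *free_list;

/* Call once from non-RT init code. */
static void pool_init(void)
{
    free_list = NULL;
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        pool_storage[i].next = free_list;
        free_list = &pool_storage[i];
    }
}

/* O(1) and syscall-free; returns NULL when the pool is exhausted. */
static void *pool_alloc(void)
{
    block_t *b = free_list;
    if (b != NULL)
        free_list = b->next;
    return b;
}

/* Return a block obtained from pool_alloc() to the free list. */
static void pool_free(void *p)
{
    block_t *b = p;
    assert(b != NULL);
    b->next = free_list;
    free_list = b;
}
```

Running out of blocks shows up as a NULL return, which the RT code must handle. Growing the pool at that point would bring back exactly the allocator call the rule is trying to avoid.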

The rtapi_msg_print functions use a ringbuffer with negligible delay, so they are usually a good help for narrowing down the issue without much impact on timing. Sometimes taking timestamps with rtapi_get_time() also helps.
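The ringbuffer idea can be sketched as a single-producer/single-consumer ring. The RT task only copies a record into a preallocated slot, and a non-RT thread drains it and does the printf(). This is a hypothetical illustration, not the rtapi_msg_print implementation, and all names are made up. The caller supplies the timestamp (in Machinekit it might come from rtapi_get_time()), since asking Linux for the time may itself cause a domain switch on some setups.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RING_SLOTS 256          /* must be a power of two */
#define MSG_LEN    64

struct log_rec {
    uint64_t ts_ns;             /* timestamp supplied by the caller */
    char     msg[MSG_LEN];
};

struct log_ring {
    struct log_rec rec[RING_SLOTS];
    _Atomic size_t head;        /* advanced only by the RT producer */
    _Atomic size_t tail;        /* advanced only by the non-RT consumer */
};

/* RT side: copy into a preallocated slot; never blocks, makes no syscalls.
 * Returns false (message dropped) when the ring is full. */
static bool log_push(struct log_ring *r, uint64_t ts_ns, const char *msg)
{
    assert(msg != NULL);
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SLOTS)
        return false;
    struct log_rec *rec = &r->rec[head & (RING_SLOTS - 1)];
    rec->ts_ns = ts_ns;
    strncpy(rec->msg, msg, MSG_LEN - 1);
    rec->msg[MSG_LEN - 1] = '\0';
    /* Publish the record only after it is fully written. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Non-RT side: take one record out; printf() is safe to call from here. */
static bool log_pop(struct log_ring *r, struct log_rec *out)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail == head)
        return false;
    *out = r->rec[tail & (RING_SLOTS - 1)];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

Dropping messages when the ring is full is a deliberate choice here: the RT side should never wait for the logger.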

For stronger tools, consider the ipipe tracer, and possibly the LTTng trace toolkit (looks promising, but I have no applicable experience).

For the gory details of RT exception handling, read https://github.com/machinekit/machinekit/blob/master/src/rtapi/xenomai.c for a start. Also look at some of the precautions in rtapi/rtapi_app.cc, for instance those that prevent page faults in RT threads.
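The usual page-fault precautions for RT code on Linux are to lock all memory with mlockall() and to touch ("prefault") the stack before entering the RT loop. This is a minimal sketch of that generic technique, not the code in rtapi_app.cc. It assumes 64 KiB covers the worst-case stack depth, and the function names are hypothetical. mlockall() normally needs root or a large enough RLIMIT_MEMLOCK, so the sketch returns the errno instead of aborting.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define PREFAULT_STACK_BYTES (64 * 1024)  /* assumption: tune to worst-case stack use */

/* Touch one byte per page of a stack buffer so those pages are mapped now.
 * Each thread has its own stack, so each RT thread should call this at startup. */
static void prefault_stack(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    volatile unsigned char buf[PREFAULT_STACK_BYTES];
    for (size_t i = 0; i < sizeof buf; i += (size_t)page)
        buf[i] = 0;  /* volatile store, so the compiler cannot drop it */
}

/* Lock current and future pages into RAM (process-wide), then prefault this
 * thread's stack. Returns 0 on success or the errno from mlockall(). */
static int lock_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return errno;
    prefault_stack();
    return 0;
}
```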

You'll probably get more advice on the theory behind this on the Xenomai list. We got this to work, prefer not to rock the boat, and follow simple rules which might be sub-optimal ;).

I don't think software versions have anything to do with what you observe.

- Michael


Drew Moore

May 21, 2014, 10:59:17 AM
to machi...@googlegroups.com
Thanks for the reply, you've given me stuff to check out.

I don't think it explains why the thread exits before it has even started.
(rt_task_create doesn't even take the address of the function to execute, yet gdb tells me the thread has exited shortly after it has been created.)
Here's (effectively) my code. It's a modified version of the trivial-periodic example code.

// int rt_task_create(RT_TASK *task, const char *name, int stksize, int prio, int mode);
rt_task_create(&first_task, "first task", 0, 90, 0);
rt_task_create(&second_task, "second task", 0, 80, 0);
sleep(3); // neither task has been started with rt_task_start()
printf("hello world\n");
 

then at the command line:

gdb ./myprog
run

At this point I get four messages: two "New Thread" messages and two "Thread Exited" messages,
then a 3 second pause, followed by the hello world.

I haven't even called rt_task_start on either thread, so I don't think it is the code in my threads!

I have not started any realtime code yet, so I don't think it's a domain switch. My breakpoints (if I set them) never even get hit.

I'll check out the ipipe tracer and see if that gives me any clues.
I'll also move this to the Xenomai list if nobody has any more ideas here.

Thanks again!

Michael Haberler

May 21, 2014, 1:31:10 PM
to machi...@googlegroups.com


On Wednesday, May 21, 2014 4:59:17 PM UTC+2, Drew Moore wrote:
> Here's (effectively) my code. It's a modified version of the trivial-periodic example code.

Please pastebin your example in full.

-m

Drew Moore

May 21, 2014, 4:33:06 PM
to machi...@googlegroups.com
Here's my modified /home/machinekit/xenomai-2.6/examples/native/trivial-periodic.c

http://pastebin.com/MUgLAbGX

Then at the terminal:

cd xenomai-2.6/examples/native
export MY_CFLAGS="-g"
make trivial-periodic
gdb ./trivial-periodic
run

At this point, I get the four thread messages, a 3 second pause, then "hello world" and nothing else.

Thanks for looking at this; I guess it's probably a little off-topic for this group.
I googled a bit but didn't find anyone else having similar trouble.

Michael Haberler

May 21, 2014, 6:58:22 PM
to machi...@googlegroups.com


On Wednesday, May 21, 2014 10:33:06 PM UTC+2, Drew Moore wrote:
> At this point, I get the four thread messages, a 3 second pause, then "hello world" and nothing else.

I can reproduce this - no idea why.

 

> Thanks for looking at this; I guess it's probably a little off-topic for this group.
> I googled a bit but didn't find anyone else having similar trouble.

Your best bet is the Xenomai group.

-m
 

Drew Moore

May 28, 2014, 10:27:45 AM
to machi...@googlegroups.com


On Wednesday, May 21, 2014 6:58:22 PM UTC-4, Michael Haberler wrote:

> I can reproduce this - no idea why.
>
> Your best bet is the Xenomai group.
>
> -m
 

I did get an answer. After some discussion on the Xenomai list and some debugging of my own, I found that rt_task_create fires up a new thread that runs the "trampoline task."
The trampoline task waits around for you to call rt_task_start and bounce the real task off of it. In the debugger, this wait for the real task gets an unexpected signal, and the trampoline task exits without ever getting the real task. So now the thread messages make sense.
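The trampoline pattern can be modeled with plain pthreads. This is an analogy with hypothetical names, not Xenomai's actual code. "Create" spawns a real thread that immediately parks, and "start" later hands it the entry point. That is why gdb reports a new thread at create time. It is also why the thread can exit before "start" if its wait ends in a way the loop does not treat as "keep waiting". In this model the loop simply re-waits on spurious wakeups.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical trampoline: a thread that exists before its entry point is known. */
struct tramp {
    pthread_t       thread;
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    void          (*entry)(void *);  /* NULL until tramp_start() */
    void           *arg;
};

static void *tramp_body(void *p)
{
    struct tramp *t = p;
    pthread_mutex_lock(&t->lock);
    while (t->entry == NULL)         /* park; the thread is already visible to gdb */
        pthread_cond_wait(&t->cond, &t->lock);
    void (*entry)(void *) = t->entry;
    void *arg = t->arg;
    pthread_mutex_unlock(&t->lock);
    entry(arg);                      /* "bounce" into the real task */
    return NULL;
}

/* Like a create call: spawns the thread but runs no user code yet. */
static int tramp_create(struct tramp *t)
{
    t->entry = NULL;
    t->arg = NULL;
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->cond, NULL);
    return pthread_create(&t->thread, NULL, tramp_body, t);
}

/* Like a start call: hands the parked thread its entry point. */
static void tramp_start(struct tramp *t, void (*entry)(void *), void *arg)
{
    assert(entry != NULL);
    pthread_mutex_lock(&t->lock);
    t->entry = entry;
    t->arg = arg;
    pthread_cond_signal(&t->cond);
    pthread_mutex_unlock(&t->lock);
}

/* Hypothetical demo task for the usage example: bumps a counter. */
static void demo_entry(void *arg)
{
    int *counter = arg;
    (*counter)++;
}
```

Usage: tramp_create(&t) followed later by tramp_start(&t, demo_entry, &counter). The thread exists from the first call, but demo_entry runs only after the second.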

To make it keep waiting, I modified line 110 of skins/native/task.c:

- while(err == -EINTR);
+ while(err == -EINTR || err == -ENOSYS);

(and put the new libnative in /usr/xenomai/lib)

I'm not sure if this is a "correct fix" but it doesn't seem like a bad fix, and seems to be doing the job.

Drew

Michael Haberler

May 28, 2014, 10:48:41 AM
to machi...@googlegroups.com
Drew,


Thanks for drilling down, appreciated!

I followed up on the Xenomai list: I can't detect any violation of the conditions Philippe mentioned there, using a slightly earlier kernel version.

Very curious what will come out of this. That said, the whole thing runs fine AFAICT, but you never know.

- Michael 


Drew Moore

May 30, 2014, 9:51:39 AM
to machi...@googlegroups.com
Just to update the machinekit list..

My solution to this *was* a hack. No Xenomai call should ever return ENOSYS, so there was a bug in the kernel.

Gilles Chanteperdrix (Xenomai) pondered this for a short while (he was able to reproduce it as well). He came up with a patch that deletes four lines in ipipe.c and moves a label up a dozen lines in entry-common.S. In his words: "The result is embarrassingly simple: the BUG_ON is wrong, and we simply need to pass restarted syscalls through __ipipe_syscall_root."

He issued a pull request including this patch. I'm not sure when that will get into Machinekit, as the patch was for ipipe 3.14.0, but when it does, gdb should work fine with Xenomai in Machinekit.

John Morris

May 30, 2014, 12:27:22 PM
to Drew Moore, machi...@googlegroups.com
Hi Drew,

I'm about to release new Xenomai kernel packages for Machinekit, but
based on 3.8.13 for now. Do you have an idea if Gilles's patch can be
back-ported to 3.8.13? I could plop it into the packaging.

Point me to the patch and I'll see if it can be worked in.

John

Drew

May 30, 2014, 12:34:58 PM
to John Morris, machi...@googlegroups.com
This is a cut and paste of Gilles' message on the Xenomai list.. (between the triple quotes)

It looks like it could be backported.


"""

The result is embarrassingly simple: the BUG_ON is wrong, and we simply
need to pass restarted syscalls through __ipipe_syscall_root.


diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index 68a80d3..8f29cb1 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -446,6 +446,7 @@ ENTRY(vector_swi)
        eor     scno, scno, #__NR_SYSCALL_BASE  @ check OS number
 #endif

+local_restart:
 #ifdef CONFIG_IPIPE
        mov     r1, sp
        mov     r0, scno
@@ -457,7 +458,6 @@ ENTRY(vector_swi)
        ldmia   sp, { r0 - r3 }
 #endif /* CONFIG_IPIPE */

-local_restart:
        ldr     r10, [tsk, #TI_FLAGS]           @ check for syscall tracing
        stmdb   sp!, {r4, r5}                   @ push fifth and sixth args

diff --git a/arch/arm/kernel/ipipe.c b/arch/arm/kernel/ipipe.c
index abdbe29..3a2266c 100644
--- a/arch/arm/kernel/ipipe.c
+++ b/arch/arm/kernel/ipipe.c
@@ -440,10 +440,6 @@ asmlinkage int __ipipe_syscall_root(unsigned long scno, struct pt_regs *regs)

        fast_irq_enable(flags);
 out:
-#ifdef CONFIG_IPIPE_DEBUG_INTERNAL
-       BUG_ON(ret > 0 && current_thread_info()->restart_block.fn !=
-              do_no_restart_syscall);
-#endif
        return ret;
 }

"""

Drew

May 30, 2014, 12:37:07 PM
to John Morris, machi...@googlegroups.com
Here's his pull request..
I'm assuming "fix syscall restarting" is the one..

"""
The following changes since commit a7adc69ef0473e88b4c9678408cd53800d08f0f5:

  arm/ipipe: export smp_on_up (2014-05-10 11:22:05 +0200)

are available in the git repository at:

  git://git.xenomai.org/ipipe-gch.git for-ipipe-3.14.0

for you to fetch changes up to 4cbe7685083be26834850e62fb16802eea3610ef:

  arm/ipipe: Add support for Ti da850 board (2014-05-29 14:29:53 +0200)

----------------------------------------------------------------
Gilles Chanteperdrix (5):
      arm/ipipe: avoid calling irq_enter/irq_exit for IPIs
      arm/ipipe: forbid context tracking with I-pipe
      arm/fcse: always set fcse_large_process
      arm/fcse: pass mm to services instead of fcse pid
      arm/ipipe: fix syscall restarting

Peter Howard (2):
      dma: edma: fix incorrect SG list handling
      arm/ipipe: Add support for Ti da850 board

 arch/arm/Kconfig                   |    2 +-
 arch/arm/include/asm/fcse.h        |   74 +++++++++++++++++++++++-----------------
 arch/arm/include/asm/mmu_context.h |   40 ++++++++--------------
 arch/arm/kernel/entry-common.S     |    2 +-
 arch/arm/kernel/ipipe.c            |    4 ---
 arch/arm/kernel/smp.c              |    8 ++---
 arch/arm/mach-davinci/Kconfig      |    1 +
 arch/arm/mach-davinci/cp_intc.c    |    1 +
 arch/arm/mach-davinci/time.c       |   46 +++++++++++++++++++++++++
 arch/arm/mm/fcse.c                 |   51 ++++++++++++++-------------
 drivers/dma/edma.c                 |    6 ++--
 drivers/gpio/gpio-davinci.c        |    3 +-
 12 files changed, 143 insertions(+), 95 deletions(-)
"""
