Does anybody know how to force the compiler to use a call instruction instead
of br (branch) for a function call, as seen in the disassembly?
It is extremely important to me that a specific function is called using
call instead of brunch, which is what the compiler always uses.
Please help.....
---------------------------------------
This message was sent using the comp.arch.embedded web interface on
http://www.EmbeddedRelated.com
> Dear all,
>
> Does anybody know how to force the compiler to use a call instruction
> instead of br (branch) for a function call, as seen in the disassembly?
> It is extremely important to me that a specific function is called
> using call instead of brunch, which is what the compiler always uses.
>
A) Why is it so important to you to use CALL rather than BR? You may
be falling into the trap of attacking the wrong problem.
B) I think the world would be a generally happier place if more
processors had a dedicated brunch instruction. I figure that properly
implemented it ought to take a good hour and a half to return, and then
come back with the stack smelling of coffee and bacon.
--
Rob Gaddi, Highland Technology
Email address is currently out of order
> void Spin(void){
> for(;;){}
> }
The compiler has deduced that a branch instruction is as good as a call
instruction for this/these calls of Spin. There can be two reasons for that:
1. If the compiler has seen the code of Spin (if it is in the same
source-code file as the calling function) it may have deduced that Spin
never returns, so it does not need the return address that a call
instruction would push on the stack. Of course the compiler cannot know
that your scheduler breaks C semantics (I assume by interrupting the
eternal loop in Spin) and needs the return address.
2. If the call to Spin is the last statement in the calling function (a
"tail call"), the compiler understands that the call does not have to
push a return address, because Spin will return (assuming it would
return) to the end of the calling function, which immediately returns to
*its* caller. The branch instruction leaves the calling function's
return address on the stack, so when Spin returns (assuming it could
return) it will take a short-cut and return to the caller of the calling
function. This optimization saves time and stack space.
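To make reason 2 concrete, here is a minimal sketch of the two call shapes. The names task_tail, task_not_tail and after_spin are hypothetical, and Spin is given a returning stand-in body only so that the fragment can run on a host. Which instruction a given compiler emits for each shape is up to that compiler.

```c
/* Stand-in for the OP's Spin: it returns, so the sketch can run on a host. */
static int spin_calls;
static volatile int after_spin;

void Spin(void) { spin_calls++; }

/* Spin() is the last thing this function does: a tail call.
   A compiler may emit a branch/jump here and let Spin reuse the
   return address that task_tail's own caller pushed. */
void task_tail(void)
{
    Spin();
}

/* Something observable happens after Spin() returns, so the call is
   not in tail position and needs a return address of its own. */
void task_not_tail(void)
{
    Spin();
    after_spin = 1;
}
```

Comparing the disassembly of the two callers is a quick way to check whether tail position is what triggers the branch on a particular compiler.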
In case 1, try to put the Spin function in its own source-code file and
compile it separately. When the compiler then compiles a call to Spin,
it should assume that Spin may return, and therefore needs a return
address and a call instruction, not a branch.
In case 2, you could add some statement in the calling function after
the call to Spin, that is, make sure that the call to Spin is never a
tail call. On the other hand, since a tail call still leaves a valid
return address on the stack, your scheduler could use this return
address (the return address for the function that calls Spin). Then you
don't have to do anything, it should work even with a branch instruction.
Another possibility is to avoid the "High" optimization level of the
compiler. I did not find a specific explanation of the
tail-call-to-branch optimization in my copy of the compiler manual, but
the "High" level seems to have most of the inter-procedural
optimizations, of which this may be one. Try the "Medium" level for the
compilation of the calling functions, and hope that the compiler does
not do tail-call optimization at this level.
HTH,
--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
. @ .
>> This is why I need it....
>> The function I'm calling looks something like this:
>> void Spin(void){
>> for(;;){}
>> }
>
> The compiler has deduced that a branch instruction is as good as a call
> instruction for this/these calls of Spin. There can be two reasons for
> that:
[return address]
> In case 2, you could add some statement in the calling function after
> the call to Spin, that is, make sure that the call to Spin is never a
> tail call. On the other hand, since a tail call still leaves a valid
> return address on the stack, your scheduler could use this return
> address (the return address for the function that calls Spin). Then you
> don't have to do anything, it should work even with a branch instruction.
>
> Another possibility is to avoid the "High" optimization level of the
> compiler. I did not find a specific explanation of the
> tail-call-to-branch optimization in my copy of the compiler manual, but
> the "High" level seems to have most of the inter-procedural
> optimizations, of which this may be one. Try the "Medium" level for the
> compilation of the calling functions, and hope that the compiler does
> not do tail-call optimization at this level.
volatile void Spin(void) {} ?
>This is why I need it....
>The function I'm calling looks something like this:
>void Spin(void){
>for(;;){}
>}
>So if it is called with a call instruction, the pc will be saved on the
>stack before entering, and it will point to the instruction after the call
>to Spin.... So I want to use that pc and save the context, so that when my
>scheduler schedules that task again it will not continue spinning in that
>forever loop but will jump to the next instruction after the Spin
>function.....
> branch doesn't push the pc to the stack, so that's my problem ;)
Why don't you call a context_switch() function ?
--
42Bastian
Do not email to bast...@yahoo.com, it's a spam-only account :-)
Use <same-name>@monlynx.de instead !
Your diagnosis of the problem is fair enough, but your workarounds are,
IMHO, totally wrong. Anything that involves trying to trick or cripple
the compiler (separate compiled files, disabling optimisations, fake
extra inline assembly, gratuitous function pointer usage, etc.) is at
best an ugly hack, and at worst a maintenance nightmare. Remember, the
compiler is free to work around all these workarounds - lying to your
tools is a bad idea.
The function is called by branch, not call, because it never returns.
That's what you (OP) wrote in the source code, so that's what the
compiler does. If you want the function to return, you have to write
code that allows the function to return. In particular, you need to
have some way of exiting the spin, otherwise it is useless. Thus you
should write your spin function so that it exits when that condition is
satisfied. For example,
void Spin(volatile uint8_t *pBlockedFlag) {
while (!(*pBlockedFlag)) ;
}
If you can't see why you need something along these lines, you'll have
to think a bit harder about how you want your code to work. But telling
the compiler you want a tight infinite loop, and then trying to find
some way to break out of it, is definitely not the answer.
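A minimal sketch of that idea, assuming a hypothetical flag resume_flag that some interrupt handler or scheduler hook (here the stand-in scheduler_release_task) sets when the task may continue. These names are illustrative only and are not taken from the OP's code.

```c
#include <stdint.h>

/* Hypothetical flag: nonzero once the scheduler wants the task to continue. */
static volatile uint8_t resume_flag;

/* Stand-in for an interrupt handler or scheduler hook that releases the task. */
void scheduler_release_task(void)
{
    resume_flag = 1;
}

/* Spin until the flag is set. The volatile qualifier forces a fresh read
   of the flag on every iteration, so the loop has a real exit. */
void Spin(volatile uint8_t *pResumeFlag)
{
    while (!(*pResumeFlag)) {
        /* wait for the scheduler */
    }
    *pResumeFlag = 0;   /* consume the event so the next Spin() waits again */
}
```

On a host without interrupts the flag has to be set before Spin is called, otherwise the loop never ends. On a target, the handler would set it asynchronously. Because the function has an exit that the compiler can see, it has to be compiled as a function that can return.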
Could you graciously use a few more precious key-strokes to explain what
you mean by that cryptic comment?
>> In case 2, you could add some statement in the calling function after
>> the call to Spin, that is, make sure that the call to Spin is never a
>> tail call. On the other hand, since a tail call still leaves a valid
>> return address on the stack, your scheduler could use this return
>> address (the return address for the function that calls Spin). Then
>> you don't have to do anything, it should work even with a branch
>> instruction.
>>
>> Another possibility is to avoid the "High" optimization level of the
>> compiler. I did not find a specific explanation of the
>> tail-call-to-branch optimization in my copy of the compiler manual,
>> but the "High" level seems to have most of the inter-procedural
>> optimizations, of which this may be one. Try the "Medium" level for
>> the compilation of the calling functions, and hope that the compiler
>> does not do tail-call optimization at this level.
>
> volatile void Spin(void) {} ?
The IAR MSP430 C compiler reference guide explains "volatile" for
objects only, it does not give "volatile" any meaning for functions.
Interestingly, it accepts the above "volatile" function declaration
without complaint. What do you suppose "volatile" should do, here?
I experimented a bit with the IAR MSP430 compiler (current "kickstart"
version), and it uses call instructions to call a non-returning function
containing only an eternal for-loop, even if the function is presented
in the same source-code file as the call. If the function is marked with
the __noreturn keyword the compiler will use a branch or jump
instruction, though. (I assume that the OP has not marked Spin with
__noreturn.)
So it seems my suggested reason 1 is not the true explanation.
>> 2. If the call to Spin is the last statement in the calling function
>> (a "tail call"), the compiler understands that the call does not have
>> to push a return address, because Spin will return (assuming it would
>> return) to the end of the calling function, which immediately returns
>> to *its* caller. The branch instruction leaves the calling function's
>> return address on the stack, so when Spin returns (assuming it could
>> return) it will take a short-cut and return to the caller of the
>> calling function. This optimization saves time and stack space.
In my small experiments, the IAR compiler does code a tail call to Spin
using a branch or jump instruction, instead of a call. So reason 2 is a
possible explanation for the OP's observation. Interestingly, this
happens even if the optimization level is set to "None", so this advice
of mine:
>> Another possibility is to avoid the "High" optimization level of the
>> compiler.
does not work.
David Brown wrote:
> Your diagnosis of the problem is fair enough, but your workarounds are,
> IMHO, totally wrong. Anything that involves trying to trick or cripple
> the compiler (separate compiled files, disabling optimisations, fake
> extra inline assembly, gratuitous function pointer usage, etc.) is at
> best an ugly hack, and at worst a maintenance nightmare. Remember, the
> compiler is free to work around all these workarounds - lying to your
> tools is a bad idea.
In general I agree with you, David, but the OP is trying to run C code
under a custom scheduler, apparently in some kind of simple
multi-threading or coroutine style. This is out of scope for the C
language, so the operation of the scheduler will involve some things
that the compiler does not know about -- and should not (have to) know
about. The scheduler/kernel routines should follow the C compiler's
calling protocols, but will themselves do things that exceed C's semantics.
Of course, the person writing the scheduler should know all about the C
compiler's calling protocols and run-time system so that the scheduler
can save and restore thread contexts properly.
The Spin function seems intended to be part of the application/scheduler
interface; an application task calls it when it has finished its job and
yields to the scheduler. Writing this "yield" routine as an eternal loop
is unusual, but can be OK for a custom kernel. In a more conventional
kernel, the application would call a kernel "yield" or "suspend_me"
function, the kernel would check if some other thread is ready to run,
and if not the kernel would stick in a loop, or schedule a looping "null
thread" that is always ready to run.
> The function is called by branch, not call, because it never returns.
That could be a reason, but I now doubt it for the IAR compiler -- see
my note on experiments above. The tail-call explanation is the more
likely one.
> That's what you (OP) wrote in the source code, so that's what the
> compiler does.
> If you want the function to return, you have to write
> code that allows the function to return. In particular, you need to
> have some way of exiting the spin, otherwise it is useless.
As I understand it, the OP's scheduler (most likely running in an
interrupt handler) will break out of the "eternal" loop by popping the
return address from the stack into the PC, forcing a return from Spin.
This is legal MSP430 code, but out of C semantics.
> If you can't see why you need something along these lines, you'll have
> to think a bit harder about how you want your code to work. But telling
> the compiler you want a tight infinite loop, and then trying to find
> some way to break out of it, is definitely not the answer.
Making Spin test a flag that the scheduler sets is a solution, but a
different solution.
It could be safer to write Spin in assembly language, to prevent the C
compiler gaining any false knowledge about its behaviour, such as "does
not return" knowledge. But if the OP knows that the C compiler does not
transport such knowledge across compilation units, writing Spin in C
(for separate compilation) is safe. Of course this has to be rechecked
for each new version of the compiler, so it is indeed a maintenance
burden, over and above the burden of checking for changes in the calling
protocols and run-time system structure, which a scheduler author has to
do for every compiler version anyway.
Summary: Tail call optimization is the likely cause of the compiler
using a branch instead of a call instruction. So:
- If the scheduler needs the return address (on the stack) only for
resuming execution at the code following the call to Spin, there is no
problem; the branch instruction leaves the return address of the calling
function on the stack, and the scheduler can resume execution at this
address.
- If the scheduler needs the return address to mark the location of the
call to Spin (but why?), there is a problem if the call happens through
a branch instruction, since the stacked return address then marks the
location of the call to the function that calls Spin (or even the call
to some even higher-level function, if there is more than one tail call
at the end of the call path). In this case, and as there seems to be no
way to disable the tail-call optimization in the IAR compiler, the only
option is to make sure that no call to Spin is a tail-call. Or use some
other kind of Spin, for example following David's suggestions.
It's always fun to test and compare compilers. The stable version of
gcc for the msp430 is an older version - 3.2.3 (with 4.x under
development). It always "calls" the function even when it knows it is
non-returning, and there is a "ret" after the call (and after the
infinite loop). Newer gcc versions give tighter code (testing with
avr-gcc 4.3.2) - a function calling Spin() inlines the infinite loop
into caller. There are no jumps, calls, or returns.
The point here is that such details vary from compiler to compiler, and
from version to version. The compiler will do exactly what you tell it,
but you can't rely on it using a particular method to implement a
particular construct.
>>> 2. If the call to Spin is the last statement in the calling function
>>> (a "tail call"), the compiler understands that the call does not have
>>> to push a return address, because Spin will return (assuming it would
>>> return) to the end of the calling function, which immediately returns
>>> to *its* caller. The branch instruction leaves the calling function's
>>> return address on the stack, so when Spin returns (assuming it could
>>> return) it will take a short-cut and return to the caller of the
>>> calling function. This optimization saves time and stack space.
>
> In my small experiments, the IAR compiler does code a tail call to Spin
> using a branch or jump instruction, instead of a call. So reason 2 is a
> possible explanation for the OP's observation. Interestingly, this
> happens even if the optimization level is set to "None", so this advice
> of mine:
>
>>> Another possibility is to avoid the "High" optimization level of the
>>> compiler.
>
> does not work.
>
Optimisation levels are never more than a hint to the compiler. You are
just making a suggestion as to how it should balance compile time, ease
of debugging, and size and speed of the generated code. Optimisation
flags are never demands, and the compiler is free to apply all its
optimisations at any level (though obviously it is more user-friendly to
have some correlation). Code that is dependent on the optimisation
level for correctness is broken code. (Obviously it can be dependent on
the optimisation level for size and speed requirements.)
> David Brown wrote:
>
>> Your diagnosis of the problem is fair enough, but your workarounds
>> are, IMHO, totally wrong. Anything that involves trying to trick or
>> cripple the compiler (separate compiled files, disabling
>> optimisations, fake extra inline assembly, gratuitous function pointer
>> usage, etc.) is at best an ugly hack, and at worst a maintenance
>> nightmare. Remember, the compiler is free to work around all these
>> workarounds - lying to your tools is a bad idea.
>
> In general I agree with you, David, but the OP is trying to run C code
> under a custom scheduler, apparently in some kind of simple
> multi-threading or coroutine style. This is out of scope for the C
> language, so the operation of the scheduler will involve some things
> that the compiler does not know about -- and should not (have to) know
> about. The scheduler/kernel routines should follow the C compiler's
> calling protocols, but will themselves do things that exceed C's semantics.
>
No, the scheduler/kernel should /not/ rely on the compiler's calling
protocols. The compiler can change these as it wants, and mix them for
different functions. If the scheduler depends on the compiler using
particular instructions to call a function, the scheduler is broken - a
pre-emptive scheduler can assume /nothing/ about the code it is pre-empting.
If you have a scheduler that for some reason needs a way to get a
function's return address, then it needs to use a compiler-specific
feature such as gcc's "__builtin_return_address()" function. If the
compiler doesn't have such a feature, then you are out of luck. Get a
different compiler, or write a scheduler that doesn't depend on knowing
the return address.
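As a rough illustration of the gcc feature mentioned above, the following sketch reads the return address of the current function. The wrapper name current_return_address is hypothetical. The builtin and the noinline attribute are gcc extensions (also accepted by clang), and nothing here applies to the IAR compiler.

```c
#include <stddef.h>

/* noinline keeps this a real call, so there is a return address to read. */
__attribute__((noinline))
void *current_return_address(void)
{
    /* Level 0 asks for the return address of the current function. */
    return __builtin_return_address(0);
}
```

The value is whatever the compiler's chosen calling sequence left as the return address, so after a tail call it would be the address in the caller's caller.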
Under no circumstances is it correct to tell the compiler you have an
infinite loop, and then complain because you can't see how to break out
of it.
> Of course, the person writing the scheduler should know all about the C
> compiler's calling protocols and run-time system so that the scheduler
> can save and restore thread contexts properly.
>
> The Spin function seems intended to be part of the application/scheduler
> interface; an application task calls it when it has finished its job and
> yields to the scheduler. Writing this "yield" routine as an eternal loop
> is unusual, but can be OK for a custom kernel. In a more conventional
It is not "unusual", it is "wrong".
There is no point in trying to help the OP find some workaround to get
this system to compile - he must fix the code.
> kernel, the application would call a kernel "yield" or "suspend_me"
> function, the kernel would check if some other thread is ready to run,
> and if not the kernel would stick in a loop, or schedule a looping "null
> thread" that is always ready to run.
>
Exactly. When a task has finished, control must be returned to the
scheduler, either by calling a "yield" function, or by returning to its
caller (the kernel). You could, I suppose, end a task in an infinite
loop and rely on the pre-empter to make sure other tasks get processor
time. But you certainly wouldn't expect that thread to ever leave the
infinite loop - that's why it's called an "infinite loop".
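A minimal sketch of the "return to the caller (the kernel)" style, assuming a run-to-completion design in which each task does one slice of work and then returns. All names here (task_fn, task_a, task_b, scheduler_run) are hypothetical and only illustrate the control flow.

```c
#include <stddef.h>

typedef void (*task_fn)(void);

static int runs_a, runs_b;

/* Each task does one slice of work and yields simply by returning. */
static void task_a(void) { runs_a++; }
static void task_b(void) { runs_b++; }

/* Run every task once per pass. Control always comes back to the
   scheduler through an ordinary function return. */
void scheduler_run(const task_fn *tasks, size_t count, int passes)
{
    for (int p = 0; p < passes; p++) {
        for (size_t i = 0; i < count; i++) {
            tasks[i]();
        }
    }
}
```

A real kernel would loop forever and idle when no task is ready. The pass count is there only so that the sketch terminates.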
>> The function is called by branch, not call, because it never returns.
>
> That could be a reason, but I now doubt it for the IAR compiler -- see
> my note on experiments above. The tail-call explanation is the more
> likely one.
>
>> That's what you (OP) wrote in the source code, so that's what the
>> compiler does.
>> If you want the function to return, you have to write code that allows
>> the function to return. In particular, you need to have some way of
>> exiting the spin, otherwise it is useless.
>
> As I understand it, the OP's scheduler (most likely running in an
> interrupt handler) will break out of the "eternal" loop by popping the
> return address from the stack into the PC, forcing a return from Spin.
> This is legal MSP430 code, but out of C semantics.
>
If the OP wants to write such brain-dead code in some sort of non-C,
that's up to him - but he should not expect to use a C compiler to
achieve it.
>> If you can't see why you need something along these lines, you'll have
>> to think a bit harder about how you want your code to work. But
>> telling the compiler you want a tight infinite loop, and then trying
>> to find some way to break out of it, is definitely not the answer.
>
> Making Spin test a flag that the scheduler sets is a solution, but a
> different solution.
>
That's /almost/ correct. Making Spin test a flag /is/ a solution. But
it's not a "different solution", because he doesn't have a solution at
the moment - his scheduler concept /cannot/ be made to work the way he
thinks.
An infinite loop is a dead end to the thread that hits it - no exits, no
escapes, no returns. It's dead. The end.
Rather than trying to play Dr. Frankenstein, the OP should re-think the
way his scheduler should work, and what Spin() should actually do. In
particular, if he wants the function to be able to return, he must give
it a way to return.
> It could be safer to write Spin in assembly language, to prevent the C
> compiler gaining any false knowledge about its behaviour, such as "does
> not return" knowledge.
Rubbish. Fake assembly to lie to the compiler is not the answer.
> But if the OP knows that the C compiler does not
> transport such knowledge across compilation units, writing Spin in C
> (for separate compilation) is safe. Of course this has to be rechecked
Dangerous rubbish. Code that relies on separate compilation is as broken
as code that relies on hobbling the optimiser. You don't have that
choice - the compiler can transport anything it wants across compilation
units, and you can't choose to stop that.
> for each new version of the compiler, so it is indeed a maintenance
> burden, over and above the burden of checking for changes in the calling
> protocols and run-time system structure, which a scheduler author has to
> do for every compiler version anyway.
>
> Summary: Tail call optimization is the likely cause of the compiler
> using a branch instead of a call instruction. So:
>
Real summary:
The original idea is /wrong/. An infinite loop has no exit and no
return. If the function Spin() needs to exit, it should have an exit.
Write code that says what you want it to do, don't write something
totally different and rely on layers of workarounds, compiler-specific
hacks, assembly tricks and other nonsense.
>>>> brOS wrote:
>>>>>> On Mon, 23 Nov 2009 14:19:14 -0600
>>>>>> "brOS" <bogdanr...@gmail.com> wrote:
>>>>>>
>>>>>>> Dear all,
>>>>>>>
>>>>>>> Does anybody know how to force the compiler to use a call
>>>>>>> instruction instead of br (branch) for a function call, as seen
>>>>>>> in the disassembly?
>>>>>>> It is extremely important to me that a specific function is called
>>>>>>> using call instead of brunch, which is what the compiler always uses.
>>>>>>>
>>>> ...
>>>>>>
>>>>> This is why I need it....
>>>>> The function I'm calling looks something like this:
>>>>> void Spin(void){
>>>>> for(;;){}
>>>>> }
>>>>> So if it is called with a call instruction, the pc will be saved on
>>>>> the stack before entering, and it will point to the instruction after
>>>>> the call to Spin.... So I want to use that pc and save the context,
>>>>> so that when my scheduler schedules that task again it will not
>>>>> continue spinning in that forever loop but will jump to the next
>>>>> instruction after the Spin function.....
>>>>> branch doesn't push the pc to the stack, so that's my problem ;)
> Niklas Holsti wrote:
>> ... The scheduler/kernel routines should follow the C compiler's
>> calling protocols, but will themselves do things that exceed C's
>> semantics.
David Brown wrote:
>
> No, the scheduler/kernel should /not/ rely on the compiler's calling
> protocols.
I didn't say "rely"-- I said "follow". If the application calls a kernel
routine, it will use the compiler's calling protocols that, for example,
say which registers must be preserved, and which can be overwritten. The
kernel routine should follow these rules, but is certainly allowed to
change the values of the overwritable registers, for example. (Note, I
am not talking about *pre-emption* here, nor was the OP, I believe.)
> The compiler can change these as it wants, and mix them for
> different functions. If the scheduler depends on the compiler using
> particular instructions to call a function,
The question here is not really about particular instructions, but about
the state in which the Spin routine is entered, specifically whether
there is a usable return address on the stack. The presence of a return
address on the stack must be defined in the compiler's calling protocol
if the compiler is meant to be able to interface to assembly-language
routines or generally "foreign" routines.
> the scheduler is broken - a
> pre-emptive scheduler can assume /nothing/ about the code it is
> pre-empting.
In principle true -- for a preemptive scheduler. (The OP is most likely
not making a pre-emptive scheduler, however.) But in practice a
pre-emptive scheduler must sometimes know about the run-time
architecture of the pre-empted software. For example, some small systems
use statically allocated memory for thread-specific data, such as
additional working "registers" for floating-point libraries. A
pre-emptive kernel has to know about such things in order to save and
restore context. The alternative is to disable preemption while a thread
uses such software-defined shared resources; the choice is a trade-off
between latency and context-switching overhead.
But that is veering off-topic, I think.
> If you have a scheduler that for some reason needs a way to get a
> function's return address, then it needs to use a compiler-specific
> feature such as gcc's "__builtin_return_address()" function. If the
> compiler doesn't have such a feature, then you are out of luck. Get a
> different compiler, or write a scheduler that doesn't depend on knowing
> the return address.
Not very helpful to the OP. But "tough love", perhaps :-)
> Under no circumstances is it correct to tell the compiler you have an
> infinite loop, and then complain because you can't see how to break out
> of it.
Who was complaining? The OP seems to know perfectly well how to break
out of this loop by changing the PC in the scheduler (when the looping
code is interrupted).
> There is no point in trying to help the OP find some workaround to get
> this system to compile - he must fix the code.
Eh? The system compiles. And can work, if the compiler's use of a branch
instruction instead of a call instruction is only due to tail-call
optimization, and there is always a return address on the stack.
>> ... the application would call a kernel "yield" or "suspend_me"
>> function, the kernel would check if some other thread is ready to run,
>> and if not the kernel would stick in a loop, or schedule a looping
>> "null thread" that is always ready to run.
>>
>
> Exactly. When a task has finished, control must be returned to the
> scheduler, either by calling a "yield" function, or by returning to its
> caller (the kernel). You could, I suppose, end a task in an infinite
> loop and rely on the pre-empter to make sure other tasks get processor
> time. But you certainly wouldn't expect that thread to ever leave the
> infinite loop - that's why it's called an "infinite loop".
It is not uncommon for kernels to (internally) use an eternal loop
("lab: jump lab") to wait for the next interrupt that creates some work
to do, as the OP does in Spin. Yes, the loop is syntactically
eternal/infinite, but in the presence of interrupts it can be terminated.
>> As I understand it, the OP's scheduler (most likely running in an
>> interrupt handler) will break out of the "eternal" loop by popping the
>> return address from the stack into the PC, forcing a return from Spin.
>> This is legal MSP430 code, but out of C semantics.
>>
>
> If the OP wants to write such brain-dead code in some sort of non-C,
> that's up to him - but he should not expect to use a C compiler to
> achieve it.
The OP is combining C semantics -- the loop is eternal -- with interrupt
semantics -- the loop can be broken. This approach is normal for writing
kernels and schedulers, but of course has its pitfalls.
I agree that it would be cleaner to write the non-C-semantics code, such
as Spin, in assembly language.
>> Making Spin test a flag that the scheduler sets is a solution, but a
>> different solution.
>
> That's /almost/ correct. Making Spin test a flag /is/ a solution. But
> it's not a "different solution", because he doesn't have a solution at
> the moment - his scheduler concept /cannot/ be made to work the way he
> thinks.
Sure it can -- that is, an interrupt handler can break the Spin loop and
resume execution at the point after the Spin call, as long as there is a
return address.
>> It could be safer to write Spin in assembly language, to prevent the C
>> compiler gaining any false knowledge about its behaviour, such as
>> "does not return" knowledge.
>
> Rubbish. Fake assembly to lie to the compiler is not the answer.
There is nothing "fake" about this. A kernel/scheduler (especially if
pre-emptive) has to go beyond C semantics. Using assembly language is
the normal way to do this. And using the return address is the normal
way for a kernel to save the PC of a thread, when a kernel routine
suspends the thread.
> The original idea is /wrong/. An infinite loop has no exit and no
> return.
The loop in Spin can be terminated by a scheduler using PC
manipulations, as in a typical scheduler. Nothing wrong about that,
although it is risky to write it in C, for the reason that we agree on:
the C compiler will only see the C semantics, and may use them in ways
that cause problems for this idea.
The OP has given us very little information to go on - a lot of what we
both are writing about is speculation (and I am just as likely to guess
incorrectly as you). However, since an infinite loop can clearly never
be broken without pre-emption, I am assuming he /does/ want pre-emption.
Certainly the kernel should follow the compiler's conventions for
function calling - it should, as far as practically possible, be written
in C, and thus calling conventions follow automatically. I
misinterpreted your post - I thought you meant the kernel could assume
that the code it is scheduling always follows the compiler's conventions.
>> The compiler can change these as it wants, and mix them for different
>> functions. If the scheduler depends on the compiler using particular
>> instructions to call a function,
>
> The question here is not really about particular instructions, but about
> the state in which the Spin routine is entered, specifically whether
> there is a usable return address on the stack. The presence of a return
> address on the stack must be defined in the compiler's calling protocol
> if the compiler is meant to be able to interface to assembly-language
> routines or generally "foreign" routines.
>
That is only true at the points at which it actually /is/ interfaced to
"foreign" code. When a C function calls another C function, the
compiler can use or abuse whatever calling convention it likes at the
time. Good compilers can and will do all sorts of re-arrangements to
get better code, including inlining code bodies, changing register
usage, or using a "branch" instead of a "call" when the called function
cannot return. Nothing you can do with compiler flags, separate
compilation, or other tricks can change that in a reliable way.
>> the scheduler is broken - a pre-emptive scheduler can assume /nothing/
>> about the code it is pre-empting.
>
> In principle true -- for a preemptive scheduler. (The OP is most likely
> not making a pre-emptive scheduler, however.) But in practice a
The code is just as broken for a co-operative scheduler. As you have
said yourself, when a task wants to release the processor it should call
the kernel scheduler.
I don't know how much you have worked on schedulers, but I get the
impression you know what you are doing and could write one perfectly
well. You would solve the same sorts of problems in a similar way to
the way I or most other scheduler writers would. So I don't really want
to sound like I am trying to teach you something you already know about.
But I just cannot comprehend why you are defending the OP's bad design,
and trying to find ways to jam that square peg into a round hole.
You know as well as I do that writing a tight infinite loop, and then
trying to find some way to go around the compiler to break out of the
loop, is bad design from step 1. Everything else in this thread is of
minor relevance (though interesting).
> pre-emptive scheduler must sometimes know about the run-time
> architecture of the pre-empted software. For example, some small systems
> use statically allocated memory for thread-specific data, such as
> additional working "registers" for floating-point libraries. A
> pre-emptive kernel has to know about such things in order to save and
> restore context. The alternative is to disable preemption while a thread
> uses such software-defined shared resources; the choice is a trade-off
> between latency and context-switching overhead.
>
That is true enough. In such a situation, the OS must know whether
these additional "registers" (or for some devices, they are real
registers) must be preserved and restored. In "normal" embedded code
the same situation turns up with interrupts. For example, when using
the embedded multiplier on the msp430 you must disable interrupts or be
sure that the interrupt routines don't use the multiplier.
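As a rough sketch of the "disable interrupts around the shared resource" option, here is a host-runnable simulation in C. Everything in it is a stand-in: `irq_enabled`, `irq_disable`, `irq_restore` and the `mpy_*` variables are hypothetical names that only imitate the interrupt-enable state and the multiplier registers. On a real msp430 you would use the compiler's own interrupt-control intrinsics and the device's register definitions instead.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the target's interrupt control. */
volatile int irq_enabled = 1;
void irq_disable(void) { irq_enabled = 0; }
void irq_restore(int was_enabled) { irq_enabled = was_enabled; }

/* Simulated multiplier registers: two operands and a result. */
volatile uint16_t mpy_op1, mpy_op2;
volatile uint32_t mpy_result;

/* Multiply with the shared multiplier protected from interrupts.
   The previous interrupt state is restored rather than blindly
   re-enabled, so the function is safe to call with interrupts off. */
uint32_t mul16_protected(uint16_t a, uint16_t b)
{
    int was_enabled = irq_enabled;   /* remember the caller's state */
    uint32_t r;

    irq_disable();
    mpy_op1 = a;
    mpy_op2 = b;
    /* On real hardware the peripheral computes this by itself. */
    mpy_result = (uint32_t)mpy_op1 * mpy_op2;
    r = mpy_result;
    irq_restore(was_enabled);
    return r;
}
```

The other option mentioned above, keeping the multiplier out of every interrupt routine, needs no code at all, but it does need discipline and compiler-generated multiplies in handlers must be checked too.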
> But that is veering off-topic, I think.
>
Only a little :-)
>> If you have a scheduler that for some reason needs a way to get a
>> function's return address, then it needs to use a compiler-specific
>> feature such as gcc's "__builtin_return_address()" function. If the
>> compiler doesn't have such a feature, then you are out of luck. Get a
>> different compiler, or write a scheduler that doesn't depend on
>> knowing the return address.
>
> Not very helpful to the OP. But "tough love", perhaps :-)
>
That is, IMHO, what the OP needs here. Any advice he gets that helps him
continue down his original path is false help.
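For reference, the gcc builtin mentioned in the quoted text can be used roughly as below. This is a minimal sketch, not a recommendation: `__builtin_return_address` is a gcc extension (clang also accepts it), not standard C, and the function name `caller_address` is made up for the illustration.

```c
#include <stddef.h>

/* Returns the address the current call will return to.
   noinline matters: if the function were inlined into its caller,
   level 0 would describe the caller's own return address instead. */
__attribute__((noinline)) void *caller_address(void)
{
    return __builtin_return_address(0);
}
```

Note that this only answers the question from inside the called function. It does not help an interrupt handler that wants the return address of whatever function it happened to interrupt.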
>> Under no circumstances is it correct to tell the compiler you have an
>> infinite loop, and then complain because you can't see how to break
>> out of it.
>
> Who was complaining? The OP seems to know perfectly well how to break
> out of this loop by changing the PC in the scheduler (when the looping
> code is interrupted).
>
He is complaining because although he knows he has to change the PC, he
doesn't know what new value to use.
>> There is no point in trying to help the OP find some workaround to get
>> this system to compile - he must fix the code.
>
> Eh? The system compiles. And can work, if the compiler's use of a branch
> instruction instead of a call instruction is only due to tail-call
> optimization, and there is always a return address on the stack.
>
A system can work (assuming for a moment that it can be made to work),
and yet still be so badly designed and fragile that it is "broken".
>>> ... the application would call a kernel "yield" or "suspend_me"
>>> function, the kernel would check if some other thread is ready to
>>> run, and if not the kernel would stick in a loop, or schedule a
>>> looping "null thread" that is always ready to run.
>>>
>>
>> Exactly. When a task has finished, control must be returned to the
>> scheduler, either by calling a "yield" function, or by returning to
>> its caller (the kernel). You could, I suppose, end a task in an
>> infinite loop and rely on the pre-empter to make sure other tasks get
>> processor time. But you certainly wouldn't expect that thread to ever
>> leave the infinite loop - that's why it's called an "infinite loop".
>
> It is not uncommon for kernels to (internally) use an eternal loop
> ("lab: jump lab") to wait for the next interrupt that creates some work
> to do, as the OP does in Spin. Yes, the loop is syntactically
> eternal/infinite, but in the presence of interrupts it can be terminated.
>
It is certainly possible to have such an infinite loop in the kernel -
but only as an idle function for when the processor is doing nothing.
The thread is never expected to continue beyond the loop, or return from
it in any way.
>>> As I understand it, the OP's scheduler (most likely running in an
>>> interrupt handler) will break out of the "eternal" loop by popping
>>> the return address from the stack into the PC, forcing a return from
>>> Spin. This is legal MSP430 code, but out of C semantics.
>>>
>>
>> If the OP wants to write such brain-dead code in some sort of non-C,
>> that's up to him - but he should not expect to use a C compiler to
>> achieve it.
>
> The OP is combining C semantics -- the loop is eternal -- with interrupt
> semantics -- the loop can be broken. This approach is normal for writing
> kernels and schedulers, but of course has its pitfalls.
>
It is perfectly common and reasonable to have almost-infinite loops. An
obvious example is a real spin lock, as implemented in real working
schedulers - you have a tight loop that checks for an external event
(such as a flag set within an interrupt routine or another task), and
exits the loop when the flag is set. But the critical point here is
that the loop has an exit clause. If you want to write a loop that will
be exited, you write a loop with an exit clause.
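A minimal sketch of such a loop in plain C, under the assumption that a periodic tick interrupt exists. The names `tick_flag`, `tick_isr` and `spin_until_tick` are hypothetical, and attaching the handler to the timer vector is compiler-specific and not shown.

```c
#include <stdbool.h>

/* Set by the tick interrupt handler, read by the waiting task.
   'volatile' forces a fresh read on every pass of the loop. */
volatile bool tick_flag = false;

/* Body of the (hypothetical) tick interrupt handler. */
void tick_isr(void)
{
    tick_flag = true;
}

/* A spin loop with an exit clause. Because the loop can terminate,
   the compiler must keep a normal return path for the function. */
void spin_until_tick(void)
{
    while (!tick_flag) {
        /* busy-wait until the handler sets the flag */
    }
    tick_flag = false;   /* consume the event before the next wait */
}
```

Since the function visibly returns, the compiler has no grounds to treat a call to it as "never returns", whether it ends up using a call instruction, a tail-call branch, or inlining.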
> I agree that it would be cleaner to write the non-C-semantics code, such
> as Spin, in assembly language.
>
I am /not/ saying this sort of code should be written in assembly - I am
saying it should not be written at all! It can never be "clean" code.
But if it is written in assembly, then at least you are giving the tools
no useful information, instead of directly lying to them.
>>> Making Spin test a flag that the scheduler sets is a solution, but a
>>> different solution.
>>
>> That's /almost/ correct. Making Spin test a flag /is/ a solution.
>> But it's not a "different solution", because he doesn't have a
>> solution at the moment - his scheduler concept /cannot/ be made to
>> work the way he thinks.
>
> Sure it can -- that is, an interrupt handler can break the Spin loop and
> resume execution at the point after the Spin call, as long as there is a
> return address.
>
How is this in any way "better" than having Spin loop until a flag is
set, and have the interrupt handler set that flag? Doing it the right
way is entirely standard C, is far easier, far safer, far more portable,
far more maintainable, and is smaller and faster than any sort of hack
you might conceivably get working.
>>> It could be safer to write Spin in assembly language, to prevent the
>>> C compiler gaining any false knowledge about its behaviour, such as
>>> "does not return" knowledge.
>>
>> Rubbish. Fake assembly to lie to the compiler is not the answer.
>
> There is nothing "fake" about this. A kernel/scheduler (especially if
> pre-emptive) has to go beyond C semantics. Using assembly language is
> the normal way to do this. And using the return address is the normal
> way for a kernel to save the PC of a thread, when a kernel routine
> suspends the thread.
>
Using assembly language where assembly language is needed is absolutely
fine - and a pre-emptive scheduler is always going to need some assembly
language. But using assembly language to try to force the compiler not
to optimise some code is almost always bad design.
And the scheduler gets the PC of a thread by looking at the return
address for the interrupt routine, not by trying to dig down the stack
and guess the return address for the current function in the interrupted
thread.
>> The original idea is /wrong/. An infinite loop has no exit and no
>> return.
>
> The loop in Spin can be terminated by a scheduler using PC
> manipulations, as in a typical scheduler. Nothing wrong about that,
> although it is risky to write it in C, for the reason that we agree on:
> the C compiler will only see the C semantics, and may use them in ways
> that cause problems for this idea.
>
One thing we haven't really discussed here is how the interrupt routine
/ scheduler knows that the thread is in the Spin function. Is it going
to take the real thread PC (from the interrupt routine's return stack)
and compare it to the address of the Spin function to determine if the
thread is currently at the "lab: jump lab" instruction? If it is there,
then it will look deeper in the stack for the previous return address,
and return to that point. If not, then the thread is somewhere else and
the interrupt routine (or the scheduler) must return there.
While such a scheme may theoretically be made to work, it is needlessly
complicated, very fragile and highly dependent on getting the code
compiled in exactly the right way, and hopelessly restrictive and
inflexible.
Maybe there is something here that I'm missing - perhaps the OP will
come back to us with some more information.
Yes.
I wrote my answer assuming that the OP knows what he or she is doing but
was concerned that the branch instruction might not leave a good return
address on the stack.
> However, since an infinite loop can clearly never
> be broken without pre-emption, I am assuming he /does/ want pre-emption.
I would not call it pre-emption, but interruption. To me, pre-emption
means suspending a task at some arbitrary point in its execution and
switching control to another task. In the OP's code, the Spin function
seems to be the expected place for suspending and resuming the task, so
the task is prepared for it, at that point. This looks like co-operative
multi-tasking.
My guess about the OP's design was that the Spin function would be used
for consuming the rest of a thread's time-slice when the thread has
finished its current job, and that the OP would not try to schedule
another ready thread to use this (slack) time, perhaps in order to have
deterministic time-triggered behaviour, or perhaps to avoid pre-emptions.
> I misinterpreted your post - I thought you meant the kernel could assume
> that the code it is scheduling always follows the compiler's conventions.
I agree completely that a pre-emptive kernel cannot assume that. (Well,
there may be *some* conventions that always hold, for example relating
to the stack pointer. But all conventions known to hold at a "foreign"
call are generally not true at arbitrary points.)
> That is only true at the points at which it actually /is/ interfaced to
> "foreign" code. When a C function calls another C function, the
> compiler can use or abuse whatever calling convention it likes at the
> time.
Agreed. For most embedded compilers, though, anything in a separate
compilation is considered "foreign". But as you say, there is no
guarantee in general.
> The code is just as broken for a co-operative scheduler. As you have
> said yourself, when a task wants to release the processor it should call
> the kernel scheduler.
In my guess as to what the OP is doing, the call to Spin *is* this call,
which would make the OP's kernel a rather special one. On the other
hand, perhaps I mis-guessed, and the call to Spin happens *within* the
OP's kernel, after the kernel has done the more normal things such as
looking for other ready tasks.
> I don't know how much you have worked on schedulers, but I get the
> impression you know what you are doing and could write one perfectly
> well.
Thanks. I've written a couple of simple, co-operative ones, a while ago,
for obsolete processors, and studied a few other, current ones from the
point of view of static WCET analysis.
> But I just cannot comprehend why you are defending the OP's bad design,
> and trying to find ways to jam that square peg into a round hole.
>
> You know as well as I do that writing a tight infinite loop, and then
> trying to find some way to go around the compiler to break out of the
> loop, is bad design from step 1. Everything else in this thread is of
> minor relevance (though interesting).
I'm not so ready to call this "bad design" without knowing more about
the OP's requirements and design. The code generated for Spin is exactly
the kind of tight eternal loop that you often find in a kernel where the
kernel has no ready tasks and waits for an interrupt. I haven't tried
it, but it seems to me that writing this loop as a conditional,
flag-checking one could increase (by a little) the latency for resuming
the right task when an interrupt happens, compared to resuming the task
directly from the interrupt handler and simply abandoning the tight loop.
It may be bad practice to rely on the C compiler to generate this code,
and perhaps I should have said so in my original reply to the OP. It has
been said now, good.
David Brown wrote:
>>> Under no circumstances is it correct to tell the compiler you have an
>>> infinite loop, and then complain because you can't see how to break
>>> out of it.
Niklas Holsti replied:
>> Who was complaining? The OP seems to know perfectly well how to break
>> out of this loop by changing the PC in the scheduler (when the looping
>> code is interrupted).
David Brown replied:
> He is complaining because although he knows he has to change the PC, he
> doesn't know what new value to use.
Because the OP thought that a branch instruction would not leave a
return address on the stack. But if the branch instruction implements a
tail call, it does leave a return address (although for an outer call).
> How is this in any way "better" than having Spin loop until a flag is
> set, and have the interrupt handler set that flag?
See my comment on latency, above. But of course this is again a guess as
to why the OP is doing it this way.
> Using assembly language where assembly language is needed is absolutely
> fine - and a pre-emptive scheduler is always going to need some assembly
> language. But using assembly language to try to force the compiler not
> to optimise some code is almost always bad design.
Writing a function in assembly language (and not, of course, as "in-line
assembly code" in a C file) is a pretty sure way of making the C
compiler treat it as a "foreign" function and so ensure that calls use
the standard conventions, including pushing a return address.
> And the scheduler gets the PC of a thread by looking at the return
> address for the interrupt routine,
Or the return address of the call from the thread to the kernel
function, which is the case for Spin (I guess).
> One thing we haven't really discussed here is how the interrupt routine
> / scheduler knows that the thread is in the Spin function. Is it going
> to take the real thread PC (from the interrupt routine's return stack)
> and compare it to the address of the Spin function to determine if the
> thread is currently at the "lab: jump lab" instruction? If it is there,
> then it will look deeper in the stack for the previous return address,
> and return to that point.
That is (also) my guess of what the OP is trying to do.
> If not, then the thread is somewhere else and
> the interrupt routine (or the scheduler) must return there.
Maybe not. In my guess of the OP's design, if the thread is not in Spin
when the interrupt happens, the thread has exceeded its time-slice. I
don't of course know what the OP intends the kernel/scheduler to do, in
that case; perhaps log a fatal error and reboot. Another choice is to
set an error flag and let the thread continue until the next tick, when
it is checked again.
> Maybe there is something here that I'm missing - perhaps the OP will
> come back to us with some more information.
That would be good.
> B) I think the world would be a generally happier place if more
> processors had a dedicated brunch instruction. I figure that properly
> implemented it ought to take a good hour and a half to return, and then
> come back with the stack smelling of coffee and bacon.
*PROPERLY* implemented it should divert to the nearest pub and not
return until the keg is dry.
I don't think "the OP knows what he or she is doing" is a fair
assumption, based on the posted code for Spin() !
>> However, since an infinite loop can clearly never be broken without
>> pre-emption, I am assuming he /does/ want pre-emption.
>
> I would not call it pre-emption, but interruption. To me, pre-emption
> means suspending a task at some arbitrary point in its execution and
> switching control to another task. In the OP's code, the Spin function
> seems to be the expected place for suspending and resuming the task, so
> the task is prepared for it, at that point. This looks like co-operative
> multi-tasking.
>
It's possible (or maybe even likely) that the OP is /trying/ to
implement a co-operative scheduler. But it doesn't actually co-operate
- an eternal loop is not co-operative, even if you cheat and break
out using interrupts. Interrupts are inherently asynchronous - if the
thread can be suspended by an interrupt function, that is pre-emptive
multitasking.
> My guess about the OP's design was that the Spin function would be used
> for consuming the rest of a thread's time-slice when the thread has
> finished its current job, and that the OP would not try to schedule
> another ready thread to use this (slack) time, perhaps in order to have
> deterministic time-triggered behaviour, or perhaps to avoid pre-emptions.
>
That could well be the intention. But spinning like that is a silly
idea, and even if he wants to do what you suggest here, the
implementation is totally wrong. The interrupt should set a flag, and
the spin lock should block waiting for the flag.
>> I misinterpreted your post - I thought you meant the kernel could
>> assume that the code it is scheduling always follows the compiler's
>> conventions.
>
> I agree completely that a pre-emptive kernel cannot assume that. (Well,
> there may be *some* conventions that always hold, for example relating
> to the stack pointer. But all conventions known to hold at a "foreign"
> call are generally not true at arbitrary points.)
>
>> That is only true at the points at which it actually /is/ interfaced
>> to "foreign" code. When a C function calls another C function, the
>> compiler can use or abuse whatever calling convention it likes at the
>> time.
>
> Agreed. For most embedded compilers, though, anything in a separate
> compilation is considered "foreign". But as you say, there is no
> guarantee in general.
>
These days, full program optimisation is not uncommon. Even gcc
(despite its critics' opinions) can do reasonable full program
optimisation by compiling all the C modules in one shot.
>> The code is just as broken for a co-operative scheduler. As you have
>> said yourself, when a task wants to release the processor it should
>> call the kernel scheduler.
>
> In my guess as to what the OP is doing, the call to Spin *is* this call,
> which would make the OP's kernel a rather special one. On the other
> hand, perhaps I mis-guessed, and the call to Spin happens *within* the
> OP's kernel, after the kernel has done the more normal things such as
> looking for other ready tasks.
>
>> I don't know how much you have worked on schedulers, but I get the
>> impression you know what you are doing and could write one perfectly
>> well.
>
> Thanks. I've written a couple of simple, co-operative ones, a while ago,
> for obsolete processors, and studied a few other, current ones from the
> point of view of static WCET analysis.
>
I think most of our apparent disagreements have their basis in different
guesses as to what we think the OP is trying to do.
Hopefully the OP is still reading the thread, and will take some
inspiration from our discussion!
>> But I just cannot comprehend why you are defending the OP's bad
>> design, and trying to find ways to jam that square peg into a round hole.
>>
>> You know as well as I do that writing a tight infinite loop, and then
>> trying to find some way to go around the compiler to break out of the
>> loop, is bad design from step 1. Everything else in this thread is of
>> minor relevance (though interesting).
>
> I'm not so ready to call this "bad design" without knowing more about
> the OP's requirements and design. The code generated for Spin is exactly
> the kind of tight eternal loop that you often find in a kernel where the
> kernel has no ready tasks and waits for an interrupt. I haven't tried
> it, but it seems to me that writing this loop as a conditional,
> flag-checking one could increase (by a little) the latency for resuming
> the right task when an interrupt happens, compared to resuming the task
> directly from the interrupt handler and simply abandoning the tight loop.
>
Nah, the loop overhead to continually read a flag would be a few cycles
at most. The interrupt function overhead to figure out return addresses
from the stack will be much, much worse.
When I see someone write one thing, and mean another, I see a mistake.
When the author knows what he has written and wants to find some way
to work around this difference rather than correcting the code, I see a
bad design. Maybe I'm just less tolerant than you.
That is true, but my point is that you should not use assembly like this
just to "get around" the compiler - not without very good reasons. I've
often seen people use assembly code to try to force the compiler to act
in some way, when they could have done much better while staying within C.
>> And the scheduler gets the PC of a thread by looking at the return
>> address for the interrupt routine,
>
> Or the return address of the call from the thread to the kernel
> function, which is the case for Spin (I guess).
>
>> One thing we haven't really discussed here is how the interrupt
>> routine / scheduler knows that the thread is in the Spin function. Is
>> it going to take the real thread PC (from the interrupt routine's
>> return stack) and compare it to the address of the Spin function to
>> determine if the thread is currently at the "lab: jump lab"
>> instruction? If it is there, then it will look deeper in the stack
>> for the previous return address, and return to that point.
>
> That is (also) my guess of what the OP is trying to do.
>
>> If not, then the thread is somewhere else and the interrupt routine
>> (or the scheduler) must return there.
>
> Maybe not. In my guess of the OP's design, if the thread is not in Spin
> when the interrupt happens, the thread has exceeded its time-slice. I
> don't of course know what the OP intends the kernel/scheduler to do, in
> that case; perhaps log a fatal error and reboot. Another choice is to
> set an error flag and let the thread continue until the next tick, when
> it is checked again.
>
Your guesses as to the OP's ideas make a certain sense - perhaps he is
trying to implement a sort of fixed time-slice scheduler. The
implementation of Spin() is still wrong (you'll never convince me
otherwise!), but that might bring us a little closer to helping him get
a working implementation.
Well, what constitutes "co-operation" may be a matter of precise
definition (in real life, sometimes of litigation :-). In my guess of
the OP's kernel/scheduler design, the suspension is designed to happen
only when the thread is looping in the Spin function. By calling Spin
the thread shows that it is ready to be suspended, so it is co-operating
in my view. (As discussed earlier, we don't know what happens if the
scheduler interrupt finds the thread is *not* in Spin.)
>> Agreed. For most embedded compilers, though, anything in a separate
>> compilation is considered "foreign". But as you say, there is no
>> guarantee in general.
>>
>
> These days, full program optimisation is not uncommon. Even gcc
> (despite its critics' opinions) can do reasonable full program
> optimisation by compiling all the C modules in one shot.
Sure, but in that case it would not be "separate compilation".
Interesting question, though: Is there a standard way in a C environment
to ensure that the standard calling sequence is used for an extern
function, with no C-calling-C optimizations?
>> I'm not so ready to call this "bad design" without knowing more about
>> the OP's requirements and design. The code generated for Spin is
>> exactly the kind of tight eternal loop that you often find in a kernel
>> where the kernel has no ready tasks and waits for an interrupt. I
>> haven't tried it, but it seems to me that writing this loop as a
>> conditional, flag-checking one could increase (by a little) the
>> latency for resuming the right task when an interrupt happens,
>> compared to resuming the task directly from the interrupt handler and
>> simply abandoning the tight loop.
>>
>
> Nah, the loop overhead to continually read a flag would be a few cycles
> at most. The interrupt function overhead to figure out return addresses
> from the stack will be much, much worse.
Let's consider what the kernel has to do, in my guess of the OP's
design, considering the two cases of (A) an unconditional "eternal" loop
and (B) a flag-checking loop.
The kernel knows which thread is running.
When the thread finishes its job in this time-slice, it calls Spin,
expecting to be resumed at the next instruction after the Spin call, say
instruction R.
The Spin function loops, eating up the rest of the time-slice.
The tick interrupt comes in.
The tick interrupt handler saves the context of the interrupted thread.
By comparing its PC to the address of the Spin loop, it can check that
the thread has not overrun its time-slice. At this point:
- For (A) the handler gets the return address of Spin
by a POP and stores this return address as the resumption
point for the thread to be suspended.
- For (B) the handler stores the interrupted PC (in
the flag-checking loop) as the resumption point.
The interrupt handler (scheduler) finds the thread to run in the next
time-slice. In case (B) it then sets the (thread-specific) flag on which
Spin is waiting. In case (A) it does not need to set any flag.
The handler restores the context of the new thread. As the last step in
this, it pushes the resumption address and the restored status register
and does return-from-interrupt (RETI).
In case (A), the thread is resumed immediately at the desired
instruction, the instruction R that follows the Spin call.
In case (B), the thread is resumed in the middle of the flag-checking
loop. It still has to read the flag, branch out of the loop, and execute
a return instruction (effectively a POP from the stack), before
instruction R is reached.
In summary, case (A) and case (B) both have to POP the stack to get to
instruction R, but case (B) also has to set a flag and check a flag. It
is a close call, but you might save some cycles in case (A). Moreover, in
case (B) the flag has to be thread-specific, so it has to be passed to
Spin with a parameter, consuming more cycles.
Niklas Holsti wrote:
> In summary, case (A) and case (B) both have to POP the stack to get to
> instruction R, but case (B) also has to set a flag and check a flag.
Plus the flag has to be cleared at some point (being careful to avoid
race conditions).
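One way to avoid the clearing step altogether, sketched here under the assumption that the tick handler can count ticks instead of setting a flag. The names `tick_count`, `tick_isr`, `tick_elapsed` and `wait_next_tick` are hypothetical, and the sketch assumes the counter can be read atomically (true for a 16-bit variable on a 16-bit CPU such as the msp430).

```c
#include <stdbool.h>
#include <stdint.h>

/* Incremented by the tick handler only; tasks never write it,
   so there is nothing for a task to clear and no clearing race. */
volatile uint16_t tick_count = 0;

/* Body of the (hypothetical) tick interrupt handler. */
void tick_isr(void)
{
    tick_count++;
}

/* True once at least one tick has happened since 'start' was sampled. */
bool tick_elapsed(uint16_t start)
{
    return tick_count != start;
}

/* Wait for the next tick after the moment of the call. */
void wait_next_tick(void)
{
    uint16_t start = tick_count;
    while (!tick_elapsed(start)) {
        /* busy-wait */
    }
}
```

The behaviour differs slightly from a flag: a tick that arrived before `wait_next_tick` was called is not remembered, so a task that overran its slice waits for the following tick rather than returning at once.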
> Moreover, in
> case (B) the flag has to be thread-specific, so it has to be passed to
> Spin with a parameter, consuming more cycles.
>
... except if the flag is in a register that is cleared in Spin before
the loop, but set by the scheduler in the context that is restored
(except for this flag) when the Spin loop is resumed.
Niklas Holsti wrote:
> Moreover, in case (B) the flag has to be thread-specific, so it has to
> be passed to Spin with a parameter, consuming more cycles.
In fact the flag can be global, not thread-specific, since only one
thread is resumed at a time. But since this thread *is* resumed, it is
certain to find the flag set, which goes to show that the flag is
redundant in this design, and the flagless unconditional loop in case
(A) makes more sense.
Fair enough.
What about gcc 4.5 with -flto ? Then you can compile C modules
separately into object files, but the object files hold a copy of the
internal trees as well as generated object code. When you link these
object files, the trees are used for link-time optimisation, including
inlining across modules. You lose all clarity in the definitions of
"compile", "link", and "separate compilation". But that is a
digression, especially since the msp430 gcc port is not (yet) updated to
gcc 4.5, which is itself not yet released.
> Interesting question, though: Is there a standard way in a C environment
> to ensure that the standard calling sequence is used for an extern
> function, with no C-calling-C optimizations?
>
I think the only way is by being sure that the compiler can't access the
code for a function declared as "extern". It should not be hard to do,
but you may have to do it explicitly. For example, if you use a
compiler's IDE and project manager, you might have to go out of your way
to force true separate compilation.
>>> I'm not so ready to call this "bad design" without knowing more about
>>> the OP's requirements and design. The code generated for Spin is
>>> exactly the kind of tight eternal loop that you often find in a
>>> kernel where the kernel has no ready tasks and waits for an
>>> interrupt. I haven't tried it, but it seems to me that writing this
>>> loop as a conditional, flag-checking one could increase (by a little)
>>> the latency for resuming the right task when an interrupt happens,
>>> compared to resuming the task directly from the interrupt handler and
>>> simply abandoning the tight loop.
>>>
>>
>> Nah, the loop overhead to continually read a flag would be a few
>> cycles at most. The interrupt function overhead to figure out return
>> addresses from the stack will be much, much worse.
>
> Let's consider what the kernel has to do, in my guess of the OP's
> design, considering the two cases of (A) an unconditional "eternal" loop
> and (B) a flag-checking loop.
>
> The kernel knows which thread is running.
>
> When the thread finishes its job in this time-slice, it calls Spin,
> expecting to be resumed at the next instruction after the Spin call, say
> instruction R.
>
> The Spin function loops, eating up the rest of the time-slice.
>
> The tick interrupt comes in.
>
> The tick interrupt handler saves the context of the interrupted thread.
> By comparing its PC to the address of the Spin loop, it can check that
> the thread has not overrun its time-slice. At this point:
>
Note that a sensible Spin function would tell the kernel that it is
finished and entering the spin loop, rather than leaving the interrupt
handler to figure it out in this fragile way.
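A sketch of what that could look like, assuming a single task per time-slice as in the guessed design. The names `in_spin`, `slice_tick`, `overruns`, `tick_handler` and `spin_to_end_of_slice` are hypothetical, and what to do on an overrun is only counted here, since the intended policy is unknown.

```c
#include <stdbool.h>

volatile bool in_spin = false;      /* task has finished its job        */
volatile bool slice_tick = false;   /* set by the handler on each tick  */
volatile unsigned overruns = 0;     /* ticks that found the task busy   */

/* Body of the (hypothetical) tick interrupt handler. It reads a flag
   the task set itself instead of comparing the interrupted PC with
   the address of the spin loop. */
void tick_handler(void)
{
    if (!in_spin) {
        overruns++;                 /* task did not finish in time */
    }
    slice_tick = true;
}

/* Called by the task when its work for this slice is done. */
void spin_to_end_of_slice(void)
{
    in_spin = true;                 /* tell the kernel we are done */
    while (!slice_tick) {
        /* busy-wait for the end of the slice */
    }
    slice_tick = false;
    in_spin = false;                /* a new slice begins */
}
```

The overrun check then no longer depends on where the compiler placed the loop or on how the function was called.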
> - For (A) the handler gets the return address of Spin
> by a POP and stores this return address as the resumption
> point for the thread to be suspended.
>
Assuming, of course, that you've figured out a way to do that safely and
reliably....
> - For (B) the handler stores the interrupted PC (in
> the flag-checking loop) as the resumption point.
>
This bit will typically require some assembly, compiler-specific
features, or some knowledge of the way the compiler generates interrupt
routines. But that's unavoidable when you have an interrupt-based
scheduler.
> The interrupt handler (scheduler) finds the thread to run in the next
> time-slice. In case (B) it then sets the (thread-specific) flag on which
> Spin is waiting. In case (A) it does not need to set any flag.
>
Fair enough, although setting a flag is not exactly a hard job, and can be
done within standard C.
> The handler restores the context of the new thread. As the last step in
> this, it pushes the resumption address and the restored status register
> and does return-from-interrupt (RETI).
>
OK.
> In case (A), the thread is resumed immediately at the desired
> instruction, the instruction R that follows the Spin call.
>
Again assuming that it is possible to figure out the address of R in
a reliable way...
> In case (B), the thread is resumed in the middle of the flag-checking
> loop. It still has to read the flag, branch out of the loop, and execute
> a return instruction (effectively a POP from the stack), before
> instruction R is reached.
>
Yes, you can expect it to take about 3 or 4 instructions before getting
to R. That would still be a lot less time than you spend messing around
getting the address of R in case (A), so case (B) wins here in time.
> In summary, case (A) and case (B) both have to POP the stack to get to
> instruction R, but case (B) also has to set a flag and check a flag. It
> is a close call, but you might save some cycles in case (A). Moreover, in
> case (B) the flag has to be thread-specific, so it has to be passed to
> Spin with a parameter, consuming more cycles.
>
Remember that all ideas about how case (A) could feasibly be implemented
are based on hobbling the compiler. Write the code correctly (case B),
and you can let the optimiser do its job - that will overwhelm any
conceivable time advantage case A might have had. Among other things,
Spin() could be inlined in its calling function and remove most of the
overhead.
Even making the great leap of faith that there is a reliable way to get
the desired return address, and then making a second leap of faith that
case A is faster, the concept is /still/ wrong. There is no way that
shaving a few cycles off the latency could justify using this horrible
hack. If those cycles matter, you need a new design.
[ snip ]
Niklas Holsti wrote:
>>>> Agreed. For most embedded compilers, though, anything in a separate
>>>> compilation is considered "foreign". But as you say, there is no
>>>> guarantee in general.
David Brown wrote:
>>> These days, full program optimisation is not uncommon. Even gcc
>>> (despite its critics' opinions) can do reasonable full program
>>> optimisation by compiling all the C modules in one shot.
Niklas Holsti wrote:
>> Sure, but in that case it would not be "separate compilation".
And David Brown wrote:
> Fair enough.
>
> What about gcc 4.5 with -flto ? Then you can compile C modules
> separately into object files, but the object files hold a copy of the
> internal trees as well as generated object code. When you link these
> object files, the trees are used for link-time optimisation, including
> inlining across modules.
That is interesting info on gcc, and new to me (I don't follow gcc
development that closely). Thanks, David, this is definitely something
one needs to know about (future) gcc.
> You lose all clarity in the definitions of
> "compile", "link", and "separate compilation".
Yes indeed, at least for the purpose of ensuring a standard calling
sequence is used for a given function.
> But that is a
> digression, especially since the msp430 gcc port is not (yet) updated to
> gcc 4.5, which is itself not yet released.
And the OP specifically asked about the IAR MSP430 compiler, anyway.
>> Interesting question, though: Is there a standard way in a C
>> environment to ensure that the standard calling sequence is used for
>> an extern function, with no C-calling-C optimizations?
>>
>
> I think the only way is by being sure that the compiler can't access the
> code for a function declared as "extern". It should not be hard to do,
> but you may have to do it explicitly. For example, if you use a
> compiler's IDE and project manager, you might have to go out of your way
> to force true separate compilation.
I was afraid that would be the answer, as far as it goes. Yuck.
>>>> I'm not so ready to call this "bad design" without knowing more
>>>> about the OP's requirements and design. The code generated for Spin
>>>> is exactly the kind of tight eternal loop that you often find in a
>>>> kernel where the kernel has no ready tasks and waits for an
>>>> interrupt. I haven't tried it, but it seems to me that writing this
>>>> loop as a conditional, flag-checking one could increase (by a
>>>> little) the latency for resuming the right task when an interrupt
>>>> happens, compared to resuming the task directly from the interrupt
>>>> handler and simply abandoning the tight loop.
>>>>
>>>
>>> Nah, the loop overhead to continually read a flag would be a few
>>> cycles at most. The interrupt function overhead to figure out return
>>> addresses from the stack will be much, much worse.
There are several things under discussion here:
- Whether it makes sense to use a routine Spin, containing a loop
(whether conditional or unconditional) as the last thing a thread should
call in its time-slice, such that threads are always suspended and
resumed only at this point, that is, within the call of Spin.
David, I think you have more or less agreed that this is a workable
design for a non-preemptive (in my definition) time-sliced system that
does not schedule other threads to use the slack left over in one
thread's time-slice. I won't say more to defend it at this point.
- How difficult or time-consuming is it for an interrupt handler that
interrupts the loop in Spin to find the return address of that call of Spin?
Assuming that
o the thread calls Spin using the normal calling sequence, in
which the return address is left on top of the stack,
o the code in Spin does not push more data on the stack, and
o the handler is written in MSP430 assembly language,
then this is just one POP.W instruction, executed after the interrupt
handler has popped the saved status register and saved interrupt-point
PC from the stack. (I'm not very familiar with the MSP430 instruction
set and its interrupt handling, so this may be a bit optimistic. But the
MSP430 instruction set is claimed to be strong on stack accesses, so it
should not be much harder.)
Thus, getting the return address of Spin (under the above assumptions)
is quick and well-defined.
In fact, the tick interrupt handler could do it smartly as follows:
1) Pop the saved status register.
2) Pop the saved interrupt-point PC, check that it points to the loop in
Spin, and then discard it.
3) Push back the saved status register.
This makes the two top words on the stack be the resumption PC (the
return address for the Spin call) and the saved status register, exactly
the state needed for a future RETI to resume this thread. It is not even
necessary to get and manipulate the return address for the Spin call.
(I'm assuming that each thread has its own stack area.)
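To make the three steps concrete, here is a small C model of them. It is only a sketch: the stack is an ordinary array, and the names thread_stack, push, pop and suspend_in_spin, as well as all addresses, are made up for illustration. A real handler would be a few MSP430 assembly instructions working on SP directly.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of one thread's stack: 16-bit words, growing downwards.
   sp is an index into mem; mem[sp] is the word on top of the stack. */
#define STACK_WORDS 16

typedef struct {
    uint16_t mem[STACK_WORDS];
    int sp;
} thread_stack;

static void push(thread_stack *s, uint16_t w) { s->mem[--s->sp] = w; }
static uint16_t pop(thread_stack *s)          { return s->mem[s->sp++]; }

/* Steps 1-3 from the text. Returns false, with the stack restored to its
   original state, if the interrupted PC is not the Spin loop (an overrun). */
static bool suspend_in_spin(thread_stack *s, uint16_t spin_loop_addr)
{
    uint16_t sr = pop(s);          /* 1) pop the saved status register   */
    uint16_t pc = pop(s);          /* 2) pop the interrupt-point PC ...  */
    if (pc != spin_loop_addr) {    /*    ... and check it is in Spin     */
        push(s, pc);               /* not in Spin: put the frame back    */
        push(s, sr);
        return false;
    }
    push(s, sr);                   /* 3) push the status register back   */
    return true;
}
```

After a successful call the two top words are the saved status register and the return address of the Spin call, in that order, which is the layout a later RETI expects.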
- Whether the Spin routine can or should be written in C.
If the C compiler generates code for Spin and for the calls to Spin that
satisfies the above assumptions, it can be written in C. But David is
right to say that it is hard to be sure that the assumptions do hold,
and will continue to hold, if Spin is written in C. So let's assume that
we write Spin in assembly language, which lets us be sure that the
assumptions hold.
- Whether the thread-resumption latency can be shorter if the Spin loop
is unconditional, and the return address of Spin is saved and used as
the resumption point (case A), compared to the latency when the Spin
loop polls a flag, the address in the interrupted loop is saved and used
as the resumption point, and the interrupt handler sets the flag to make
the loop terminate (case B).
I comment on that below.
>> Let's consider what the kernel has to do, in my guess of the OP's
>> design, considering the two cases of (A) an unconditional "eternal"
>> loop and (B) a flag-checking loop.
>>
>> The kernel knows which thread is running.
>>
>> When the thread finishes its job in this time-slice, it calls Spin,
>> expecting to be resumed at the next instruction after the Spin call,
>> say instruction R.
>>
>> The Spin function loops, eating up the rest of the time-slice.
>>
>> The tick interrupt comes in.
>>
>> The tick interrupt handler saves the context of the interrupted
>> thread. By comparing its PC to the address of the Spin loop, it can
>> check that the thread has not overrun its time-slice. At this point:
>>
>
> Note that a sensible Spin function would tell the kernel that it is
> finished and entering the spin loop,
Why would the kernel need that information, if it is not going to
schedule another thread for the rest of this time-slice?
> rather than leaving the interrupt
> handler to figure it out in this fragile way.
I don't see much fragility in it. It is beautifully simple: if the
thread finished what it had to do, it is in Spin; otherwise not. (I hope
I am lauding the OP here, not my own guess about the design.)
And perhaps the OP's kernel actually has an "I_am_done" kernel call,
which just ends up in Spin.
>> - For (A) the handler gets the return address of Spin
>> by a POP and stores this return address as the resumption
>> point for the thread to be suspended.
>>
>
> Assuming, of course, that you've figured out a way to do that safely and
> reliably....
See above: a POP instruction. It is safe and reliable, if Spin is
written in assembly language.
>> In case (B), the thread is resumed in the middle of the flag-checking
>> loop. It still has to read the flag, branch out of the loop, and
>> execute a return instruction (effectively a POP from the stack),
>> before instruction R is reached.
>>
>
> Yes, you can expect it to take about 3 or 4 instructions before getting
> to R. That would still be a lot less time than you spend messing around
> getting the address of R in case (A), so case (B) wins here in time.
Getting the address of R takes one POP in case (A). Probably faster than
these 3-4 instructions, at least not much slower.
> Remember that all ideas about how case (A) could feasibly be implemented
> are based on hobbling the compiler.
No, it is based only on putting the right code in Spin, and ensuring
that Spin is called with the standard calling sequence that leaves a
return address on top of stack. This is readily and normally done by
writing Spin in assembly language. The compiler is not hobbled in the C
code parts.
> Write the code correctly (case B),
> and you can let the optimiser do its job - that will overwhelm any
> conceivable time advantage case A might have had.
It can hardly overwhelm it if Spin contains just the loop -- not much to
optimise there. Or do you mean to write the tick interrupt handler in C?
I know that some C compilers claim that you can use them to write
interrupt handlers, but to me this seems more fragile than writing Spin
in C. Especially for an interrupt handler that is meant to switch
threads, not just manage some peripheral device and return to the
interrupted thread.
However, the OP said that Spin in the OP's kernel contains some other
things, too, so it's hard to say what the optimiser could do.
> Among other things,
> Spin() could be inlined in its calling function and remove most of the
> overhead.
What "overhead" are you talking about, David? If Spin's profile is as
simple as the OP showed (void Spin (void)), inlining would directly save
only one call or branch instruction per time-slice. Perhaps the
optimiser could let the thread keep more local data in registers over
the (in-lined) Spin call, avoiding some store/load instructions. The IAR
MSP430 compiler defines R12-R15 as scratch registers (caller-save) and
R4-R11 as preserved registers (callee-save), so an inlined function
could increase the available registers from 8 to 12; hard to say if that
would be significant.
If Spin is inlined we lose the ability to check for time-slice overruns
by checking that the interrupted thread is in the unique and only Spin,
so I think that Spin should not be inlined. Perhaps you consider this to
be one of the "fragile" aspects of this overrun-checking method, but it
is not difficult to make sure that Spin is not inlined.
> Even making the great leap of faith that there is a reliable way to get
> the desired return address,
Wow, this makes me feel like a preacher. But "raise your eyes to the
text above", and believe! :-)
There can hardly be a more reliable aspect of a standard calling
convention than the presence and location of the return address.
> and then making a second leap of faith that
> case A is faster, the concept is /still/ wrong.
That sounds rather dogmatic. If it works and is reliable, why is it
"wrong"? Too heretical?
> There is no way that shaving a few cycles off the latency
> could justify using this horrible hack.
Although I still think that case A is a bit faster (and feel I have
given good reasons above and in my preceding posting) it isn't the main
point in favour of case A, the unconditional loop. Given this design of
a Spin function in which the threads are suspended and resumed, a
flag-polling loop is logically unnecessary: after the tick interrupt,
the thread that gets to poll the flag is *the* scheduled thread, so
polling the flag is superfluous.
I think the design is a neat solution to specific, limited requirements.
It is a bit tricky, but interrupt-handling and thread-switching are
often tricky. I mean "tricky" in the sense of "a trick", not in the
sense of "difficult".
Having seen this thread up to this point, I think the key point is
summarized in the quote.
You assume that the OP has a very thorough knowledge of what he's
doing and why. To me, his questions show that this might not be the
case!
Solving his question to the letter will probably not be of much value.
To mention a few points that support this impression of mine:
- What's the behaviour of a function that does not return, when it
suddenly returns? To the best of my knowledge the behaviour will be
undefined. You can't complain when undefined behaviour turns out to
not be what you expected.
- How should a scheduler detect that the interrupted task is in fact
in the spin loop? Should it use the linker map to find the code
address (and what about optimizations)? Should it match instruction
patterns (and how do you force those to be generated by the
compiler)? In my opinion there is no safe way to detect this, unless
the function "helps" the scheduler (for example using a Yield() call,
global vars, etc).
Since we're talking about a C program, we are constrained by the rules
of C. We programmers exhaust those rules to our advantage, and the
compiler writers exhaust "their side" of the rules to make better
compilers. Therefore we can't break the rules, and C is not the
correct tool for what the OP wants (IF it really is what he wants).
The clean way is to use any language or tool to create the desired
functionality and encapsulate it into a linkable object. Only then is it
compatible with the C portion of the design, and will stay so until
the compiler calling conventions change.
I say this as programmer who has written schedulers on various
architectures, and coincidentally also a virtual CPU of the
architecture used by the OP (involving detailed inspection of
instruction set and compiler conventions to create efficient hooks
from virtual to physical).
Best regards
Marc
See <http://gcc.gnu.org/gcc-4.5/changes.html> and
<http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-flto-796>
(and also <http://gcc.gnu.org/> more generally).
It was news to me too that the LTO (link time optimisation) branch of
gcc had been merged with the main development line. I've known of its
existence for a long time, but for many years it has been (or appeared
to be) a bit of a blue-sky project with a lot of ideas and limited
working code.
It will still take a while before LTO rolls down to gcc compilers
popular in c.a.e. gcc 4.5 is in stage 3 (no new features, bug fix and
testing) - it will be early next year before we can expect a first
release. It will take a while from then for Code Sourcery to qualify
and verify it thoroughly on their targets, and the 32-bit embedded gcc
suppliers will pick it up from there. Smaller ports, such as avr-gcc,
will take longer - they have fewer developers and resources. For
out-of-tree ports such as the msp430, it depends entirely on what the
developers want to prioritise.
At the moment (gcc 4.3), using -combine and -fwhole-program can get you
quite a lot of these effects for C programming. Basically, it treats
all the files in the program as a single big C file with everything
declared "static".
LTO will give several advantages on top of that. Files can be
individually compiled - useful for large projects, when files are in
different directories, or when you want different compiler options.
Libraries can also have LTO information. You can use languages other
than C (for example, C++), and mix them together. And gcc 4.5 has a
number of new optimisations that are only relevant for whole-program
compilation.
>> You lose all clarity in the definitions of "compile", "link", and
>> "separate compilation".
>
> Yes indeed, at least for the purpose of ensuring a standard calling
> sequence is used for a given function.
>
>> But that is a digression, especially since the msp430 gcc port is not
>> (yet) updated to gcc 4.5, which is itself not yet released.
>
> And the OP specifically asked about the IAR MSP430 compiler, anyway.
>
>>> Interesting question, though: Is there a standard way in a C
>>> environment to ensure that the standard calling sequence is used for
>>> an extern function, with no C-calling-C optimizations?
>>>
>>
>> I think the only way is by being sure that the compiler can't access
>> the code for a function declared as "extern". It should not be hard
>> to do, but you may have to do it explicitly. For example, if you use
>> a compiler's IDE and project manager, you might have to go out of your
>> way to force true separate compilation.
>
> I was afraid that would be the answer, as far as it goes. Yuck.
>
It is like a lot of things in C development - finding a general,
portable, standards-compliant solution is hard, even though making it
work in a real life project is typically very easy. The trick is to
find a balance for a solution that is general enough without being
overly complicated.
>>>>> I'm not so ready to call this "bad design" without knowing more
>>>>> about the OP's requirements and design. The code generated for Spin
>>>>> is exactly the kind of tight eternal loop that you often find in a
>>>>> kernel where the kernel has no ready tasks and waits for an
>>>>> interrupt. I haven't tried it, but it seems to me that writing this
>>>>> loop as a conditional, flag-checking one could increase (by a
>>>>> little) the latency for resuming the right task when an interrupt
>>>>> happens, compared to resuming the task directly from the interrupt
>>>>> handler and simply abandoning the tight loop.
>>>>>
>>>>
>>>> Nah, the loop overhead to continually read a flag would be a few
>>>> cycles at most. The interrupt function overhead to figure out
>>>> return addresses from the stack will be much, much worse.
>
> There are several things under discussion here:
>
> - Whether it makes sense to use a routine Spin, containing a loop
> (whether conditional or unconditional) as the last thing a thread should
> call in its time-slice, such that threads are always suspended and
> resumed only at this point, that is, within the call of Spin.
>
> David, I think you have more or less agreed that this is a workable
> design for a non-preemptive (in my definition) time-sliced system that
> does not schedule other threads to use the slack left over in one
> thread's time-slice. I won't say more to defend it at this point.
>
OK.
> - How difficult or time-consuming is it for an interrupt handler that
> interrupts the loop in Spin to find the return address of that call of
> Spin?
>
> Assuming that
>
> o the thread calls Spin using the normal calling sequence, in
> which the return address is left on top of the stack,
>
You can't assume that (otherwise the OP would never have asked the
question in the first place...), although we can probably assume this
can be forced in some way.
> o the code in Spin does not push more data on the stack, and
>
It may make sense for the end-of-time-slice function to do more than
just spin. Then it may have to make a separate call to Spin.
> o the handler is written in MSP430 assembly language,
>
That's a big requirement, and totally unnecessary except as a way to
implement this bad idea.
> then this is just one POP.W instruction, executed after the interrupt
> handler has popped the saved status register and saved interrupt-point
> PC from the stack. (I'm not very familiar with the MSP430 instruction
> set and its interrupt handling, so this may be a bit optimistic. But the
> MSP430 instruction set is claimed to be strong on stack accesses, so it
> should not be much harder.)
>
That is mostly true (you can't just pop the top of the stack, because
the interrupt function must first preserve a register or two - but as
you say the msp430 has good stack access instructions), given your
assumptions. But as I noted above, the assumptions are not reasonable,
IMHO.
> Thus, getting the return address of Spin (under the above assumptions)
> is quick and well-defined.
>
> In fact, the tick interrupt handler could do it smartly as follows:
>
> 1) Pop the saved status register.
> 2) Pop the saved interrupt-point PC, check that it points to the loop in
> Spin, and then discard it.
> 3) Push back the saved status register.
>
Once you have taken into account saving a working register or two (easy
enough), then that's a fairly elegant implementation of a very ugly hack.
If this system is for scheduling important or time-critical tasks, and
there is no prioritised pre-emption, it is very likely that you need to
track when the task's work is done for testing and verification, or for
tracking errors.
But you are correct that the kernel might not /need/ that information.
>> rather than leaving the interrupt handler to figure it out in this
>> fragile way.
>
> I don't see much fragility in it. It is beautifully simple: if the
> thread finished what it had to do, it is in Spin; otherwise not. (I hope
> I am lauding the OP here, not my own guess about the design.)
>
Working around the compiler in this way /is/ fragile. It is a hack, and
it is dependent on details of the compiler, the processor, the stack
structure, it requires assembly for what should be simple C code, and it
hinders the compiler's optimiser.
However, I have to agree that you have come up with a simple
implementation of this design (and I am lauding /your/ implementation
here, not the OP's bad design, or our guesses about it).
> And perhaps the OP's kernel actually has an "I_am_done" kernel call,
> which just ends up in Spin.
>
>>> - For (A) the handler gets the return address of Spin
>>> by a POP and stores this return address as the resumption
>>> point for the thread to be suspended.
>>>
>>
>> Assuming, of course, that you've figured out a way to do that safely
>> and reliably....
>
> See above: a POP instruction. It is safe and reliable, if Spin is
> written in assembly language.
>
>>> In case (B), the thread is resumed in the middle of the flag-checking
>>> loop. It still has to read the flag, branch out of the loop, and
>>> execute a return instruction (effectively a POP from the stack),
>>> before instruction R is reached.
>>>
>>
>> Yes, you can expect it to take about 3 or 4 instructions before
>> getting to R. That would still be a lot less time than you spend
>> messing around getting the address of R in case (A), so case (B) wins
>> here in time.
>
> Getting the address of R takes one POP in case (A). Probably faster than
> these 3-4 instructions, at least not much slower.
>
Don't forget the comparisons to check that you are in the spin loop.
>> Remember that all ideas about how case (A) could feasibly be
>> implemented are based on hobbling the compiler.
>
> No, it is based only on putting the right code in Spin, and ensuring
> that Spin is called with the standard calling sequence that leaves a
> return address on top of stack. This is readily and normally done by
> writing Spin in assembly language. The compiler is not hobbled in the C
> code parts.
>
A function as simple as Spin case B would often be inlined into the
calling function (either explicitly, or via whole-program optimisation).
Code like that is smaller as well as faster when inlined.
>> Write the code correctly (case B), and you can let the optimiser do
>> its job - that will overwhelm any conceivable time advantage case A
>> might have had.
>
> It can hardly overwhelm it if Spin contains just the loop -- not much to
> optimise there.
The code calling Spin can be better optimised if Spin is a proper C
function, and the compiler knows its definition.
> Or do you mean to write the tick interrupt handler in C?
> I know that some C compilers claim that you can use them to write
> interrupt handlers, but to me this seems more fragile than writing Spin
> in C. Especially for an interrupt handler that is meant to switch
> threads, not just manage some peripheral device and return to the
> interrupted thread.
>
The majority of interrupt handlers in embedded systems are written in C
these days. For general interrupts, if you are not happy to trust your
compiler to generate good and safe interrupt code, get a better
compiler! But thread switching code will almost certainly need some
inline assembly at least.
> However, the OP said that Spin in the OP's kernel contains some other
> things, too, so it's hard to say what the optimiser could do.
>
>> Among other things, Spin() could be inlined in its calling function
>> and remove most of the overhead.
>
> What "overhead" are you talking about, David?
I am referring to the check of the exit flag that you think makes my
Spin function too slow compared to your version.
> If Spin's profile is as
> simple as the OP showed (void Spin (void)), inlining would directly save
> only one call or branch instruction per time-slice. Perhaps the
> optimiser could let the thread keep more local data in registers over
> the (in-lined) Spin call, avoiding some store/load instructions. The IAR
> MSP430 compiler defines R12-R15 as scratch registers (caller-save) and
> R4-R11 as preserved registers (callee-save), so an inlined function
> could increase the available registers from 8 to 12; hard to say if that
> would be significant.
>
It is, as you say, hard to be sure - especially if Spin contains other code.
> If Spin is inlined we lose the ability to check for time-slice overruns
> by checking that the interrupted thread is in the unique and only Spin,
Yes, but you don't /need/ to check call or return addresses if you write
proper C code...
> so I think that Spin should not be inlined. Perhaps you consider this to
> be one of the "fragile" aspects of this overrun-checking method, but it
> is not difficult to make sure that Spin is not inlined.
>
>> Even making the great leap of faith that there is a reliable way to
>> get the desired return address,
>
> Wow, this makes me feel like a preacher. But "raise your eyes to the
> text above", and believe! :-)
>
I think we'll just have to agree to disagree on this one.
> There can hardly be a more reliable aspect of a standard calling
> convention than the presence and location of the return address.
>
>> and then making a second leap of faith that
>> case A is faster, the concept is /still/ wrong.
>
> That sounds rather dogmatic. If it works and is reliable, why is it
> "wrong"? Too heretical?
>
Yes, it's against my religion :-) You don't write hacked code based on
lying to the compiler, assembly code, and stack manipulation tricks when
there are perfectly safe, efficient and reliable ways to do the same job
with C. It's about writing legible, maintainable, portable code that is
clear in its purpose and easy to verify. Even if this is nothing more
than a simple test program, you should maintain a certain level of
development quality.
I am not saying that /all/ such hacks are a bad thing - just that you
have to have very good reason for using them. Shaving off a few
processor cycles (if you are correct and you /do/ save time) is very
seldom a good enough reason.
Just because code is accepted by the compiler, and works in practice,
does not mean it cannot be *wrong*. And yes, I know I am pontificating
- doesn't that beat preaching?
You may of course be right, Marc. My impression is based on two things:
the OP seems to have a workable design for the scheduler, and the OP
seems able to read and understand the assembly-language code the
compiler has generated. The OP's worry was only that a branch
instruction would not leave a return address that the scheduler could
use to resume the thread. I think that the OP did not know about tail
calls implemented as branches, or did not remember this possibility.
> To mention a few points that support this impression of mine:
>
> - What's the behaviour of a function that does not return, when it
> suddenly returns? To the best of my knowledge the behaviour will be
> undefined. You can't complain when undefined behaviour turns out to
> not be what you expected.
The behaviour is defined by the code that the C compiler generates, and
which the OP seems to have inspected. Of course, it is risky to rely on
future compilations giving the same code. Writing Spin in assembly
language would remove that risk.
I do understand that your question is about behaviour as defined in the
C standard, but this is not a pure C program.
> - How should a scheduler detect that the interrupted task is in fact
> in the spin loop? Should it use the linker map to find the code
> address (and what about optimizations)? Should it match instruction
> patterns (and how do you force those to be generated by the
> compiler)? In my opinion there is no safe way to detect this, unless
> the function "helps" the scheduler (for example using a Yield() call,
> global vars, etc).
As I understood it, the Spin function is part of the OP's
kernel/scheduler. If the loop in Spin is of the form "lab: jump lab",
the scheduler interrupt handler can compare the PC at the interrupt
point to the address of the label "lab". If Spin is written in assembly
language, "lab" can be defined as a global symbol so its address is
accessible to the scheduler as a constant. Or the Spin module can define
a globally visible data word that holds the address of "lab".
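Seen from the C side, the second variant (a data word holding the address of "lab") might look roughly like this. It is a sketch with invented names: spin_lab_word and interrupted_in_spin are hypothetical, and the constant is given a made-up value here only so that the fragment compiles on its own; in a real build the word would be emitted by the assembly module that contains Spin.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical data word holding the address of the "lab: jump lab"
   instruction. The value below is a placeholder for illustration only. */
const uint16_t spin_lab_word = 0x4500u;

/* True if the PC saved at the interrupt point is the Spin loop instruction,
   that is, the thread had finished its work and was only spinning. */
bool interrupted_in_spin(uint16_t interrupted_pc)
{
    return interrupted_pc == spin_lab_word;
}
```

Because the loop is a single instruction that jumps to itself, one equality comparison is enough; a longer loop would need a range check instead.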
But I don't know how, or even if, the OP intends to check that the task
has reached Spin. Perhaps the OP's Spin sets a flag before entering the
loop. Anyway, the check can be done easily and quickly, whether using a
flag or using the PC.
> Since we're talking about a C program, we are constrained by the rules
> of C.
We are talking about a program consisting of some C code, divided into
several threads/tasks (which is outside C semantics, I believe), plus a
thread scheduler (also outside C semantics), probably implemented by a
tick interrupt handler. In other words, a typical multi-threaded embedded program.
> We programmers exhaust those rules to our advantage, and the
> compiler writers exhaust "their side" of the rules to make better
> compilers. Therefore we can't break the rules, and C is not the
> correct tool for what the OP wants (IF it really is what he wants).
That would certainly be my approach -- for my own purposes, I am not at
all an "anything that works is good" programmer; I usually write in Ada,
and enjoy it. I would write Spin in assembly language, as well as any
coding-sensitive parts of the scheduler.
> The clean way is to use any language or tool to create the desired
> functionality and encapsulate it into a linkable object. Only then is it
> compatible with the C portion of the design, and will stay so until
> the compiler calling conventions change.
I agree fully, but I did not want to lecture the OP, only answer the
OP's question. I did advise the OP to use separate compilation for Spin.
In retrospect, I should have advised the OP to use assembly language for
Spin.
[ snip lots ]
>> Assuming that
>>
>> o the thread calls Spin using the normal calling sequence, in
>> which the return address is left on top of the stack,
>>
>
> You can't assume that (otherwise the OP would never have asked the
> question in the first place...), although we can probably assume this
> can be forced in some way.
The OP was specifically asking how to force the use of a normal calling
sequence. The OP thought that a branch instruction did not represent a
normal calling sequence, and it doesn't, except when it is implementing
a tail call, as I think was the case in the OP's problem.
>> o the code in Spin does not push more data on the stack, and
>>
>
> It may make sense for the end-of-time-slice function to do more than
> just spin. Then it may have to make a separate call to Spin.
Agreed. Or pop whatever it has pushed on the stack, before spinning. Or
even better, save the context of the thread before spinning, which would
really decrease the thread-resumption latency.
>> o the handler is written in MSP430 assembly language,
>>
>
> That's a big requirement, and totally unnecessary except as a way to
> implement this bad idea.
Pooh. You have admitted that parts of a thread switch must be written in
assembly language. So this code is in that part.
> Don't forget the comparisons to check that you are in the spin loop.
Yep. But if you want a check for time-slice overrun, that has to be done
somehow, perhaps with a flag. Both checks are about equally fast, I
think. If you don't want to check for time-slice overrun, you need a
pre-emptive scheduler.
>> If Spin is inlined we lose the ability to check for time-slice overruns
>> by checking that the interrupted thread is in the unique and only Spin,
>
> Yes, but you don't /need/ to check call or return addresses if you write
> proper C code...
But if you still want to check for time-slice overrun you have to use
flags, and watch out for race conditions. The amount of inlined code is
growing...
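A minimal sketch of the flag-based overrun check mentioned here. The names in_spin, overrun_count, enter_spin and tick_handler are hypothetical and not taken from the OP's kernel, and the tick handler is written as an ordinary function so that the logic can be tried on a host. On the target it would be the body of the tick interrupt.

```c
#include <stdbool.h>

/* Hypothetical names; nothing here comes from the OP's kernel. */
static volatile bool in_spin = false;        /* set by the task just before it spins */
static volatile unsigned overrun_count = 0;  /* number of slices that ran too long   */

/* Called by the task when its work for this time slice is done.
   On a real target the spin loop or low-power mode would follow. */
void enter_spin(void)
{
    in_spin = true;  /* a single store, so the handler sees the old or the new value */
}

/* Body of the tick interrupt handler, called here as a plain function. */
void tick_handler(void)
{
    if (!in_spin) {
        overrun_count++;   /* the task was still working when the tick came */
    }
    in_spin = false;       /* the next slice starts in the "not spinning" state */
}
```

Note the race: a tick that arrives after the work is finished but before the flag is set is counted as an overrun, so this check errs on the conservative side.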
>>> Even making the great leap of faith that there is a reliable way to
>>> get the desired return address,
>>
>> Wow, this makes me feel like a preacher. But "raise your eyes to the
>> text above", and believe! :-)
>>
>
> I think we'll just have to agree to disagree on this one.
OK.
>>> the concept is /still/ wrong.
>>
>> That sounds rather dogmatic. If it works and is reliable, why is it
>> "wrong"? Too heretical?
>>
>
> Yes, it's against my religion :-) You don't write hacked code based on
> lying to the compiler, assembly code, and stack manipulation tricks when
> there are perfectly safe, efficient and reliable ways to do the same job
> with C.
Except for the assembly code that you need to switch threads.
The only "lying" that has been done here is writing the eternal Spin
loop in C, from which the compiler could deduce that no Spin call
returns. I think Spin should be written in assembly language and
considered part of the thread-switching code.
In fact (but don't take this too grievously :-), the only non-standard C
code for "stack manipulation tricks" was shown by you, when you referred
to the special gcc function for getting the return address.
> And yes, I know I am pontificating
> - doesn't that beat preaching?
Good one, David! Luckily I'm an atheist... well, perhaps an agnostic for
the purposes of this thread.
I'd like to thank you for your posts, because you helped me a lot. I'm
also sorry that I couldn't reply earlier...
I would also like to explain what I wanted with my Spin function.
First, it is an infinite-loop function that increments an idle counter, only
for testing the code in the simulator. For real work I use low-power mode
instead.
The Spin function is part of a kernel API call, Task_Suspend_no_sched().
This function should suspend the task, change its state, put it in a
suspended list, save its context and then enter low-power mode until the
next tick, when the scheduler should be called. So I needed that PC in
order to save the task's context in such a way that, when the task is
scheduled again, it can jump outside the Spin, or to the instruction after
the _bis(LPM1), or something like that.
So that was the idea, and finally I implemented it as shown below.
The task routine should look like this:
void TaskRoutine(void){
    while(1){
        //do some work
        Task_Suspend_no_sched();
    }
}
Task_Suspend_no_sched should look like this:
void Task_Suspend_no_sched(void){
    change_state();
    put_in_suspend_list();
    save_context(); // should use the PC placed on the stack by
                    // the call of Task_Suspend_no_sched
    //_bis(LPM1); or a loop like the one below
    for(;;){
        Idlecnt++;
    }
}
So the ISR that provides the system tick should interrupt the loop or the
low-power mode, and then the scheduler should be called. When the scheduler
schedules this task again, it should jump to the beginning of the task.
Thank you for your comments and suggestions, they were very useful. :)
Then all the comments about your approach (being inappropriate) were
dead-on.
Your function Task_Suspend_no_sched() should really be named something
like WaitForTick(). And it should do just that: wait for the next
system tick.
The implementation does not require breaking out of endless loops or
other fancy stuff. It is basic task switching theory, described in OS
literature and all over the internet.
Using such a function, all the rest falls nicely into place.
Good luck with your project!
Marc
> Task_Suspend_no_sched should look like this:
>
> void Task_Suspend_no_sched(void){
>     change_state();
>     put_in_suspend_list();
>     save_context();//it should use PC placed on stack by
>                    // Task_Suspend_no_sched call
>     //_bis(LPM1); or loop like below
>     for(;;){
>         Idlecnt++;
>     }
> }
Marc Jet wrote:
> brOS [Bogdan Rosandic] wrote:
>> So ISR which provide system tick should interrupt loop or low power mode,
>> and then scheduler should be called. When Scheduler schedule this task
>> again it should jump at the beginning of the task.
>
> Then all the comments about your approach (being inappropriate) were
> dead-on.
Bogdan's extended description of his design closely matches my guess of
his design, as far as I can tell. Therefore I don't agree with Marc's
conclusion. I think Bogdan's approach is reasonable, except that it is
risky to use C for code such as save_context() and the unconditional
spin loop. But perhaps Bogdan is only showing C-like pseudocode?
There is one possibly significant difference between Bogdan's
description and my guess. The example task that Bogdan shows:
brOS [Bogdan Rosandic] wrote:
> Task routine should look like this :
>
> void TaskRoutine(void){
>     while(1){
>
>         //do some work
>
>         Task_Suspend_no_sched();
>     }
>
> }
contains just one call of Task_Suspend_no_sched, at the end of the
eternal while(1) loop. If this is true of all tasks in Bogdan's
application, the design could be simplified by inverting the control.
Instead of TaskRoutine calling the kernel/Scheduler through
Task_Suspend_no_sched, and the Scheduler then resuming the TaskRoutine
after this call, the Scheduler could always call TaskRoutine at its
entry point, and the TaskRoutine could return to the kernel after doing
its work. The while(1) in TaskRoutine would be removed, as would the call
of Task_Suspend_no_sched, and their functions would be taken over by
things that the kernel/Scheduler does in between calls of TaskRoutine.
The TaskRoutine would be just
void TaskRoutine(void){
//do some work
}
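A minimal sketch of such an inverted design, assuming a fixed table of tasks that are simply called in turn. TaskA, TaskB, task_table and scheduler_pass are hypothetical names, and the counters stand in for "do some work". The state changes and list handling that the kernel would do between calls are left out.

```c
#include <stddef.h>

/* Hypothetical task table; all names are illustrative only. */
typedef void (*task_fn)(void);

static unsigned work_a, work_b;           /* stand-ins for "do some work" */
static void TaskA(void) { work_a++; }
static void TaskB(void) { work_b++; }

static const task_fn task_table[] = { TaskA, TaskB };
#define TASK_COUNT (sizeof task_table / sizeof task_table[0])

/* One scheduler pass: call every task from its entry point.  Each task
   returns when its work is done, so no task PC or stack has to be saved. */
void scheduler_pass(void)
{
    for (size_t i = 0; i < TASK_COUNT; i++) {
        task_table[i]();
    }
    /* On a real target: wait here for the next tick (spin or low-power mode). */
}
```

Because every task runs to completion and returns, the only context that survives between ticks is whatever the task keeps in static data.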
A TaskRoutine with a single call of Task_Suspend_no_sched corresponds to
a very strict design rule for real-time systems called the single
suspension point rule. By this rule, each task shall have a single point
where it can be suspended and resumed. The rule is good for
schedulability analysis, but can be difficult to use if the task must
perform complex timing or interaction sequences, because the task must
then use data variables to remember what it is doing -- how far the
sequence has advanced -- perhaps by means of a finite-state automaton.
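A minimal sketch of a task that keeps its place in a sequence with a state variable, assuming a hypothetical three-step sequence (start, wait two ticks, finish) and a kernel that calls SequenceTask once per tick. None of these names come from Bogdan's code.

```c
/* Hypothetical three-step sequence: start, wait two ticks, finish. */
enum step { STEP_START, STEP_WAIT, STEP_FINISH };

static enum step current_step = STEP_START;  /* how far the sequence has advanced */
static unsigned ticks_waited;
static unsigned sequences_done;

/* Called once per tick.  Does one step, then returns to the single
   suspension point, which is the call from the kernel. */
void SequenceTask(void)
{
    switch (current_step) {
    case STEP_START:
        ticks_waited = 0;
        current_step = STEP_WAIT;
        break;
    case STEP_WAIT:
        if (++ticks_waited >= 2) {
            current_step = STEP_FINISH;
        }
        break;
    case STEP_FINISH:
        sequences_done++;
        current_step = STEP_START;
        break;
    }
}
```

With these assumptions one full sequence takes four calls: one to start, two to wait, and one to finish.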
> Your function Task_Suspend_no_sched() should really be named something
> like WaitForTick(). And it should do just that: wait for the next
> system tick.
Marc, did you not note that the Scheduler may schedule *another* task,
not simply continue the one that is "waiting for the next system tick"
in your terms? So the waiting task may be waiting for a longer time,
several ticks, and it is necessary to save its context, including the
PC. One way to get the PC is to retrieve the return address for
Task_Suspend_no_sched, as Bogdan does. Another way is to put the task in
a loop until it is interrupted, then retrieve the PC of the interrupt
point (in other words, the return address of the interrupt handler).
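A minimal sketch of the first way, using the gcc built-in for the return address that came up earlier in the thread. It is a GCC/Clang extension, not standard C, and the names Task_Suspend_sketch and saved_return_pc are hypothetical. It captures only the address; a real context save also needs the stack pointer and registers, which is one more argument for writing that part in assembly language.

```c
/* GCC and Clang extension, not standard C. */
static void *saved_return_pc;   /* hypothetical slot in the task's saved context */

/* noinline keeps this a real function with its own return address. */
__attribute__((noinline))
void Task_Suspend_sketch(void)
{
    /* Level 0 is the address this function will return to.  If the
       compiler turned the caller's call into a tail-call branch, that
       address lies in the caller's caller, not in the caller. */
    saved_return_pc = __builtin_return_address(0);
}
```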
> The implementation does not require breaking out of endless loops or
> other fancy stuff. It is basic task switching theory, described in OS
> literature and all over the internet.
Any kernel call that can suspend the calling task must retrieve the PC,
which seems to be the kind of "fancy stuff" that Marc means.
As for the endless loop, most schedulers can encounter a situation where
no real task is ready to do any real work until the next interrupt
happens. There are several ways to deal with this:
- Use a special instruction or configuration that halts, idles, or
powers-down the processor until an interrupt comes in. This is what
Bogdan plans to do in the real system. (As an aside, this can have a
nasty drawback: in one project where I was involved and it was tried,
the resulting square-wave variation in the processor's power consumption
disturbed the sensitive analog electronics on the board, so we had to
use an eternal loop instead. That was not an MSP430, however, but a
space-qualified 80C32, so it used rather more power.)
- Schedule a lowest-priority null task that contains just an eternal
loop that does nothing, or perhaps maintains a processor-load indicator
such as Bogdan's Idlecnt. But this is not so easy in a non-preemptive
scheduler.
- Use a spin loop in the kernel itself, as Bogdan does. Earlier
discussion in this thread shows that making this loop conditional is
logically redundant. The only reason for not using an eternal
(unconditional) loop would be to avoid confusing the C compiler, which
is one reason why this loop should be written in assembly language. I
have seen such eternal null loops in more than one kernel, including
commercial kernels. I think they are an appropriate solution to this
requirement.
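A minimal sketch of how an idle counter such as Idlecnt might be turned into a processor-load figure. The helper load_percent and the calibration value idle_max are assumptions, not part of Bogdan's kernel. idle_max would have to be measured on the target as the count reached over one period when no task does any work.

```c
#include <stdint.h>

/* Hypothetical helper: converts an idle-loop count into a load figure.
   idle_count: idle-counter increments seen during one measurement period.
   idle_max:   count seen over the same period with no task doing work. */
unsigned load_percent(uint32_t idle_count, uint32_t idle_max)
{
    if (idle_max == 0) {
        return 100;                 /* no calibration: report fully loaded */
    }
    if (idle_count > idle_max) {
        idle_count = idle_max;      /* clamp measurement jitter */
    }
    /* The 64-bit intermediate avoids overflow of 100 * idle_count. */
    uint32_t idle_percent = (uint32_t)((100ULL * idle_count) / idle_max);
    return 100u - idle_percent;
}
```

For example, an idle count of 250 against a calibrated maximum of 1000 means the processor was idle a quarter of the time, so the load is 75 percent.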
> Good luck with your project!
Good luck from me, too, Bogdan.