You're stuck with this problem because you are forcing processing and
timeliness constraints into the ISRs. An ISR should *only* do what it
ABSOLUTELY MUST. "In and out", lickety-split!
Your description SUGGESTS that you are implementing your comms system
as a state machine at the RxIRQ level, something like (pseudocode):
GetCharacter:
retrieve character from comms hardware
note error flags associated with this reception
if any error, do some sort of error recovery/logging
(or, do state specific error recovery, as appropriate)
ret
AwaitSoH:
header = GetCharacter()
if (header != Start_of_Header)
diagnostic("SoH not received when anticipated")
// leave RxIRQ as is; remain in the state awaiting SoH
else
set_RxIRQ(SoHReceived)
return from interrupt
// the above assumes SoH doesn't occur in a message body. If it
// does, then you revisit this state occasionally as you sync up to
// the data stream
SoHReceived:
address = GetCharacter()
if (address != MyAddress)
diagnostic("message APPARENTLY not intended for me")
set_RxIRQ(AwaitSoH) // a simplification, for illustration
else
message_length = 0 // prepare for payload to follow
InitializeChecksum(address)
set_RxIRQ(AccumulateMessage)
return from interrupt
AccumulateMessage:
datum = GetCharacter()
buffer[message_length++] = datum
UpdateChecksum(datum)
if (message_length >= MESSAGE_LENGTH)
set_RxIRQ(AwaitChecksum)
return from interrupt
AwaitChecksum:
checksum = GetCharacter()
if (checksum != computedChecksum)
diagnostic("message failed integrity check")
error
else
parse message (unless you've been doing this incrementally)
act on message
prepare result
wait until master has turned off its bus driver
turn on your bus driver and transmitter
(or, have the scheduler do so)
schedule your reply for transmission
set_RxIRQ(AwaitSoH)
return from interrupt
[Note that you may, instead, have folded all of this into one static RxIRQ
by conditionally examining a "state variable" (byte counter?) and acting
accordingly:
if (byte_count == 1)
check if correct header
else if (byte_count == 2)
check if correct address
else if (byte_count ...
I manipulate the interrupt vector instead, as each little ISR "knows"
what the next ISR should be -- so why introduce a bunch of conditionals?]
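[To make the "vector manipulation" concrete, here's a minimal C sketch
of the idea. The names (set_RxIRQ, await_soh, etc.) and the constants
are hypothetical; on real hardware the "vector" might be the actual
interrupt vector rather than a function pointer called from a common
stub. The checksum state is omitted to keep it short:]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical protocol constants -- adjust to suit */
#define START_OF_HEADER 0x01
#define MY_ADDRESS      0x42
#define MESSAGE_LENGTH  4

typedef void (*rx_handler_t)(uint8_t datum);

static rx_handler_t rx_vector;          /* the "interrupt vector" being swapped */
static uint8_t buffer[MESSAGE_LENGTH];
static int     message_length;

static void await_soh(uint8_t datum);
static void soh_received(uint8_t datum);
static void accumulate_message(uint8_t datum);

static void set_RxIRQ(rx_handler_t h) { rx_vector = h; }

static void await_soh(uint8_t datum)
{
    if (datum == START_OF_HEADER)
        set_RxIRQ(soh_received);
    /* else: remain in this state, resynchronizing to the stream */
}

static void soh_received(uint8_t datum)
{
    if (datum != MY_ADDRESS) {
        set_RxIRQ(await_soh);           /* not for us; hunt for the next SoH */
    } else {
        message_length = 0;             /* prepare for payload to follow */
        set_RxIRQ(accumulate_message);
    }
}

static void accumulate_message(uint8_t datum)
{
    buffer[message_length++] = datum;
    if (message_length >= MESSAGE_LENGTH)
        set_RxIRQ(await_soh);           /* checksum state omitted for brevity */
}

/* The real RxIRQ just dispatches through the current vector */
static void RxIRQ(uint8_t datum) { rx_vector(datum); }
```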
And, your TxIRQ (once the reply has been scheduled):
TxIRQ:
SendCharacter(buffer[message_length++])
if (message_length >= MESSAGE_LENGTH)
set_TxIRQ(ShutdownTx)
return from interrupt
ShutdownTx:
wait until last character cleared transmitter (NOT holding reg)!
wait until line has stabilized (bus driver)
turn off bus driver
set_TxIRQ(none)
return from interrupt
[Of course, message lengths can vary between Tx and Rx, etc]
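[The "cleared transmitter" check, sketched against hypothetical UART
status bits -- the real names and addresses depend on your part. The
important distinction is "shift register drained", not merely "holding
register free":]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical UART status bits.  TX_SHIFT_EMPTY must mean the last
 * bit has actually left the wire; dropping the bus driver on
 * TX_HOLDING_EMPTY alone clips the final character. */
#define TX_HOLDING_EMPTY (1u << 0)   /* room in the holding register */
#define TX_SHIFT_EMPTY   (1u << 1)   /* shift register drained       */

static uint32_t uart_status;          /* stand-in for the real register */

static int safe_to_drop_driver(void)
{
    /* Only when BOTH are empty has the final character cleared. */
    return (uart_status & (TX_HOLDING_EMPTY | TX_SHIFT_EMPTY))
            == (TX_HOLDING_EMPTY | TX_SHIFT_EMPTY);
}
```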
So, all of your IRQs are lightweight. EXCEPT the "AwaitChecksum"
state. There, you have to do a fair bit of processing ON THE HEELS OF
the final character in the master's transmission (you could add an
additional "trailer" but that's just more characters to receive and
process and doesn't fundamentally change the algorithm).
The delays ("wait until...") are all relatively short. Yet, not
necessarily "opcode timing" short. So, you sort of have to sit around
twiddling your thumbs until you think you've met the required delays
(you can't make any *observations* to tell you when the time is right...
when the master has turned off its bus driver, etc.)
Or, throw some hardware resource at the problem (small interval timing).
All of that waiting wants to be done OUTSIDE the ISR. Yet, you can't
afford for it to be unbounded -- because your master no doubt expects a
reply in *some* time period else a dropped message would hang all comms!
(and it probably uses a lack of reply to indicate a failure of your node)
I.e., there are LOWER and UPPER bounds on when you can start your reply.
Too soon and you collide with the tail end of the master's transmission;
too late and you risk the end of your reply running into the master's
*next* transmission.
[Or, you can add a timer to the master so that it doesn't start its next
transmission until it is *sure* you are finished transmitting]
Likewise, all of the *processing* that isn't time critical (or, SHOULDN'T
be!) wants to happen outside the ISR.
If, instead, you could note the time at which a "request" from the master
was sent and use that as a reference point from which to determine
when you would have a CHANCE to deliver a reply ("timeslot"), then
you can do all of this processing AND waiting outside of the IRQ.
[If you force that time to be JUST the duration of the master's message,
then you don't have any real leeway -- you have to act promptly! You're
stuck with your present dilemma.]
For example, assume you have a 1ms periodic interrupt (change to
suit your needs). Assume you are delivering data at 9600 baud
(change to suit your needs). Assume messages from the master are
M characters long.
[I've chosen numbers that make the math relatively easy so you don't
have to dig out a calculator. Changing the values just changes the
math. I.e., characters are arriving roughly at the same rate as your
periodic interrupt -- though they aren't guaranteed to be in
a particular phase relationship with it. (this is not a requirement
of this approach, just a coincidence for the numbers I have chosen)]
On a particular node, you notice a "SoH" received sometime between
periodic interrupt S-1 and S (because you modify your AwaitSoH
ISR to signal an event that you can then examine -- or, let it
capture the "periodic counter" *in* the ISR and post that time value
as the "event").
You KNOW the master's message will not be complete until at least
(S-1)+M but definitely before S+M -- because you have *designed* to
this goal!
Furthermore, you know that your timeslot is offset X ms from the
StartOfHeader in the master's message (i.e., time ~S). You KNOW
that you can't safely turn on your bus driver until S+X (because those
are the rules of the protocol) but you *do* know that the master
will have turned its bus driver off by then (because it is following
the same rules!)
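[A sketch of that arithmetic in C, using the example numbers above
(1 ms tick, ~1 character/ms at 9600 baud, M-character messages). The
1-tick guard band and the slot width W are assumptions for
illustration, not requirements of the approach:]

```c
#include <assert.h>
#include <stdint.h>

#define M 16                      /* master message length, characters */

/* SoH seen between ticks S-1 and S; this node's timeslot begins X ms
 * after the SoH.  Returns the tick at which the node may turn on its
 * bus driver.  X must exceed M (plus margin) so the master is done
 * transmitting -- and has dropped its driver -- first. */
static uint32_t reply_tick(uint32_t S, uint32_t X)
{
    return S + X;
}

/* Per-node offset: slot 0 opens after the message completes (tick S+M,
 * plus a 1-tick guard band); each slot is W ticks wide. */
static uint32_t slot_offset(uint32_t node, uint32_t W)
{
    return M + 1 + node * W;
}
```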
So, you schedule a job that turns on your bus driver at S+X and
pushes *a* reply onto the bus.
*A* reply.
This need not be *the* reply to the message from the master sent
at time S! It may, instead, be a reply to the message sent by
the master at the *previous* "time S". Or, the one before that!
Instead of tail-gating the master's message and trying to reply
as soon as the last character in its message has cleared the medium
(or, some epsilon later), you decouple your replies from the master's
requests.
With the timeslot, you *know* you can send a reply EVEN IF THE MESSAGE
FROM THE MASTER IS NOT FOR YOU! I.e., you don't have to check the address,
decode the message, act on it AND compose your reply *now*. Likewise,
other nodes know that they can send THEIR replies even when the current
message is for *you* and not them!
You just support some number of outstanding messages to each node
(perhaps just "1" but bigger numbers are better) and tag them with
a (small) sequence number -- so the master can pair replies to
outstanding requests AND so you can see when the master has given
up on an "old" request that you perhaps forgot to acknowledge.
[E.g., you can conceivably reply to message 2 before replying to message
1 -- if that makes sense in your current execution environment. If
not, then Reply2 has to wait to be scheduled until Reply1 has been sent.
You are free to arrange those criteria as fits your processing capabilities.
You don't HAVE TO reply to the message *now*.]
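[One possible shape for that bookkeeping, sketched in C. The 4-bit
sequence number and the window of 4 outstanding requests are arbitrary
choices for illustration -- pick whatever your protocol agrees on:]

```c
#include <assert.h>
#include <stdint.h>

#define SEQ_BITS        4
#define SEQ_MASK        ((1u << SEQ_BITS) - 1)
#define MAX_OUTSTANDING 4          /* how many replies may be pending */

typedef struct {
    uint8_t seq;
    uint8_t pending;               /* reply not yet sent */
} request_slot;

static request_slot outstanding[MAX_OUTSTANDING];

/* Record a newly received request; returns 0 if we are saturated
 * (i.e., the master exceeded the agreed window). */
static int note_request(uint8_t seq)
{
    for (int i = 0; i < MAX_OUTSTANDING; i++) {
        if (!outstanding[i].pending) {
            outstanding[i].seq = seq & SEQ_MASK;
            outstanding[i].pending = 1;
            return 1;
        }
    }
    return 0;
}

/* Mark a request answered once its (possibly out-of-order) reply has
 * actually been transmitted. */
static int reply_sent(uint8_t seq)
{
    for (int i = 0; i < MAX_OUTSTANDING; i++) {
        if (outstanding[i].pending &&
            outstanding[i].seq == (seq & SEQ_MASK)) {
            outstanding[i].pending = 0;
            return 1;
        }
    }
    return 0;                      /* master already gave up on this one? */
}
```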
Note that this can be scaled by supporting a smaller number of
timeslots than there are physical nodes in the system -- the
master can allow nodes (1 - Q) to use the Q timeslots following
*this* message (before it issues the NEXT message) and the (Q+1 - 2Q)
nodes to use the Q timeslots following the NEXT message.
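[That node-to-timeslot mapping is just integer division, assuming the
nodes are numbered from 1:]

```c
#include <assert.h>
#include <stdint.h>

/* With Q timeslots per master message, nodes 1..Q answer after message
 * k, nodes Q+1..2Q after message k+1, and so on.  *round says how many
 * messages after *this* one grant the node a slot; *slot says which of
 * the Q slots it gets. */
static void slot_for_node(uint32_t node, uint32_t Q,
                          uint32_t *round, uint32_t *slot)
{
    *round = (node - 1) / Q;
    *slot  = (node - 1) % Q;
}
```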
The point is, each node knows AHEAD OF TIME when it can reply (when
it can turn on its bus driver) instead of having "very little notice"
and having to react promptly.
It also allows the nodes to know that communications are "fair"
and "deterministic". Any node knows how long it must wait before
it is *guaranteed* a chance to access the medium. (if there was
no guarantee, then nodes wouldn't have been able to PREDICT when
they should acquire the medium and place their messages on it)
You've moved the:
parse message (unless you've been doing this incrementally)
act on message
prepare result
steps from the ISR into a lower priority task where you, presumably,
have more leeway in addressing those needs (than you would in an
ISR that wants to be short!). *All* of your ISRs are now short
(because they just empty the receiver or stuff the transmitter
and don't do any *decoding*, processing, etc.)
RxIRQ:
datum = GetCharacter()
FIFO <- (datum,timestamp)
return from interrupt
Something else watches the FIFO to try to identify messages
within. When it does, it knows when the message began (because
of the timestamp on that datum) so it can figure out when a reply
*should* be scheduled -- even though the reply might not be the
reply for this message.
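[A minimal single-producer/single-consumer version of that FIFO, in C.
The power-of-two size and the timestamp granularity (the periodic
interrupt count) are assumptions; a real implementation would also
count overruns somewhere visible:]

```c
#include <assert.h>
#include <stdint.h>

#define FIFO_SIZE 64               /* must be a power of two */

typedef struct {
    uint8_t  datum;
    uint32_t timestamp;            /* periodic-interrupt count at Rx */
} rx_entry;

static rx_entry fifo[FIFO_SIZE];
static volatile uint32_t head;     /* written only by the ISR  */
static volatile uint32_t tail;     /* written only by the task */

/* Called from RxIRQ: stash character + time of arrival, nothing more. */
static int fifo_put(uint8_t datum, uint32_t now)
{
    uint32_t next = (head + 1) & (FIFO_SIZE - 1);
    if (next == tail)
        return 0;                  /* overrun; drop the datum */
    fifo[head] = (rx_entry){ datum, now };
    head = next;
    return 1;
}

/* Called from the message-assembly task, outside interrupt context. */
static int fifo_get(rx_entry *out)
{
    if (tail == head)
        return 0;                  /* empty */
    *out = fifo[tail];
    tail = (tail + 1) & (FIFO_SIZE - 1);
    return 1;
}
```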
Instead of having to deliver a response *NOW*, your protocol moves
the time that you are granted to fabricate a response out of the
low level "driver". You could conceivably allow different timeouts
for different types of messages, etc.
And, because you know when to expect a message from the master
(even if it is not for you!) -- because the protocol has the master
sending a message followed by Q reply timeslots -- and when, relative
to that, to present your reply, the timing of the bus arbitration is
decoupled from the immediacy of a particular "Rx IRQ". You're not
trying to accurately time/delay "character times", "bit times" or smaller.
You, instead, rely on a timebase that you already have in place,
making your timing decisions in a more tolerant environment (including,
potentially, the enabling and disabling of the bus driver).
If you look at higher performance "packet protocols", you will tend
to see this same pattern repeated. I.e., you don't send an ethernet
message and expect the receiving interrupt on the destination node
to prepare and deliver a reply! Nor do you expect the sending node
to twiddle its thumbs awaiting that reply; nor the reply to message 1
before it sends message 2 (to you *or* some other node).
You just have to agree, ahead of time (i.e., protocol specification!)
how long you are willing to wait for old acknowledgements and how
many you are willing to let remain outstanding at any given time.
This defines the time you have to "handle" a particular message; NOT
some artificially tight constraint IN an ISR.
Then, "do the math" for your particular periodic interrupt rate
and data rates (and message lengths, node counts, etc.) so you can
ensure each node gets serviced in a timely fashion. The point isn't
to maximize throughput but, rather, to be able to provide *guarantees*
without overspecifying hardware or needlessly tightening constraints
on the software.
This turns a hard real-time approach (where missing a deadline means
you simply abandon the action that you were attempting) into a softer
one with more flexible deadlines (where missing a deadline doesn't
render the effort "wasted" but, rather, salvageable at some possibly
reduced "value", at a later time).
Otherwise, you "must" reply to this message now. And, if you can't,
what makes you think you will be able to reply the *next* time it is
sent to you (i.e., as a "retry")? Your implementation is "more brittle".
(what happens if some node needs an ISR for some other capability
and that ISR interferes with your TIMELY processing of "this" message?)
[I currently use a variation of this "over ethernet" (where it isn't
necessary) in anticipation of porting the design to a wireless
implementation (where folks can't all "talk at once" yet deterministic
delivery is required). I.e., "bus driver" in my case is "RF transmitter"]
*Think* about this sort of implementation... what it means/costs in your
hardware. Put ballpark numbers on how much time it allows slaves to
handle messages, etc. Decide what the impact on the devices (master?)
making those requests might be: e.g., if they BLOCK until they get a
reply (and no other processing CAN occur on that node), then you
would want to favor more prompt responses.
OTOH, if you can do meaningful work while awaiting a reply, then the
impact of a delayed reply is minimized.
I reviewed a colleague's implementation of a product a few weeks back
that had several CAN nodes collaborating on the services provided.
Every message request he sent out, I asked:
"What happens if you NEVER get a reply to this?
What happens if the reply is delayed?
Is this node effectively *stuck* in that get_reply() for the duration?"
And, eventually:
"Ahhhh... so that's why you are doing all this work in your ISRs that
one would more safely do elsewhere -- less precious! Can't risk things
catching fire while you're waiting for a reply!"