On 12/02/2021 19:35, Marcel Mueller wrote:
> Am 12.02.21 um 10:09 schrieb David Brown:
>>> Use the C++ standard atomic<T> and check for is_always_lock_free with a
>>> static assertion and you are fine.
>>
>> There is no problem with the types that are always lock free - the
>> compiler generates code for these directly. But then, for types that
>> are always lock free, you generally don't need atomics at all - with a
>> single core system, "volatile" is fine (plus the occasional memory
>> barrier) for loads and stores.
>
> Strictly speaking this is not true if you take DMA into account. But
> this is not a common use case.
That depends very much on the way DMA and the memory system are
implemented. On many microcontrollers, volatile is all you need. On
others, the memory barrier instructions generated with atomics are not
sufficient - you need explicit cache flush instructions. (That's the
kind of thing that makes low-level code so much fun!)
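
For the simple case (single core, no DMA, no data cache in the way), a minimal sketch of the "volatile plus the occasional barrier" pattern could look like the following. All names (`PwmConfig`, `main_set_config_once`, `isr_current_duty`) are made up for illustration. `std::atomic_signal_fence` is used here only as a compiler barrier; treating it that way for plain and volatile objects relies on how common compilers behave rather than on a strict reading of the standard.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical configuration block, written once by the main loop at start-up.
struct PwmConfig {
    std::uint32_t period;
    std::uint32_t duty;
};

PwmConfig pwm_config{0, 0};              // plain object, written only by the main loop
volatile bool pwm_config_valid = false;  // flag, also written only by the main loop

// Main-loop side: fill in the data first, then raise the flag.
void main_set_config_once(std::uint32_t period, std::uint32_t duty) {
    pwm_config.period = period;
    pwm_config.duty = duty;
    // Compiler-only barrier: keep the stores above from being moved
    // past the volatile store below. No CPU barrier instruction is emitted.
    std::atomic_signal_fence(std::memory_order_release);
    pwm_config_valid = true;
}

// Interrupt side: read-only, returns 0 until the configuration is published.
std::uint32_t isr_current_duty() {
    if (!pwm_config_valid) {
        return 0;
    }
    // Compiler-only barrier: keep the load below from being hoisted
    // above the flag check.
    std::atomic_signal_fence(std::memory_order_acquire);
    return pwm_config.duty;
}
```

Each object has exactly one writer, which is the sanity rule mentioned further down. Once DMA or a data cache is involved, this sketch is no longer enough and the platform's cache maintenance operations are needed as well.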
>
>> (And if you stick to the sanity rule of
>> only writing to a given object in /one/ thread, then RMW operations are
>> fine too.)
>
> Indeed.
>
>> The fun comes with bigger types. And on a 32-bit processor, that
>> includes 64-bit types - which are not uncommon. That is when having
>> atomic types in the language becomes a big benefit - and it is when they
>> fail completely with this implementation.
>
> Sure, when the hardware does not allow lock free access, then there are
> no generic, satisfactory solutions.
I think (correct me if I'm wrong) that every system will have a limit to
the lock-free sizes it supports.
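
To see where that limit lies for a given target, the check suggested at the top of the thread can be wrapped up roughly as below. This is a sketch assuming C++17; `Sample`, `lock_free_v` and `RequireLockFree` are hypothetical names, while `std::atomic<T>::is_always_lock_free` is the standard member.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical payload, 64 bytes: expected to be wider than the lock-free
// limit on mainstream targets.
struct Sample {
    std::uint64_t timestamp;
    std::uint64_t reading[7];
};

// Compile-time query: true only if std::atomic<T> is lock-free for every
// object of that type on this target.
template <typename T>
constexpr bool lock_free_v = std::atomic<T>::is_always_lock_free;

// Using RequireLockFree<T>::type instead of std::atomic<T> stops the build
// when the toolchain would otherwise fall back to a lock-based library
// implementation.
template <typename T>
struct RequireLockFree {
    static_assert(lock_free_v<T>,
                  "std::atomic<T> is not lock-free on this target");
    using type = std::atomic<T>;
};
```

On a 32-bit microcontroller one would expect `lock_free_v<std::uint32_t>` to be true and `lock_free_v<std::uint64_t>` to be false, but that depends on the toolchain and should be checked rather than assumed.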
>
>> (Just for fun, the Cortex-M can /read/ 64-bit data atomically, as the
>> double-read instruction is restartable. But it can't write it
>> atomically - an interrupt in the middle of the instruction will leave
>> the object halfway updated.)
>
> In this case you cannot use DWCAS on this platform. You need to seek for
> other solutions. E.g. store and replace a 32 bit pointer to the actual
> value.
>
Yes, that could perhaps be a way to handle things, but off the top of my
head I can't see how to do this safely and generically.
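
One restricted case where the pointer idea does look workable is a single writer combined with a reader that the writer can never preempt, for example the main loop writing and an interrupt function reading. A rough sketch under exactly that assumption follows; `PublishedValue`, `publish` and `read` are made-up names, not part of any real library.

```cpp
#include <atomic>
#include <cstdint>

// Assumptions (not checked by the code):
//  - publish() is only ever called from one context (e.g. the main loop);
//  - read() is only called from a context that runs to completion before
//    the writer resumes (e.g. an interrupt function).
template <typename T>
class PublishedValue {
public:
    explicit PublishedValue(const T& initial)
        : slots_{initial, initial}, current_(&slots_[0]) {}

    // Writer: fill the slot nobody is looking at, then swap the pointer.
    void publish(const T& v) {
        T* active = current_.load(std::memory_order_relaxed);
        T* spare = (active == &slots_[0]) ? &slots_[1] : &slots_[0];
        *spare = v;  // may take several instructions; not yet visible
        current_.store(spare, std::memory_order_release);  // single pointer store
    }

    // Reader: follow the pointer and copy the value it refers to.
    T read() const {
        return *current_.load(std::memory_order_acquire);
    }

private:
    static_assert(std::atomic<T*>::is_always_lock_free,
                  "pointer-sized atomics must be lock-free for this to help");
    T slots_[2];
    std::atomic<T*> current_;
};
```

This is not the generic solution asked for. With the roles reversed (interrupt writes, main loop reads), two updates arriving during one read would overwrite the slot being copied, so the reader could still see a half-updated value.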
>
>> This is an implementation library in a compiler - it is not covered by
>> the C++ standard either. The standard only says what the code should
>> do, not how it should do it (and in this case, the code does not work).
>
> So the library needs to be adjusted platform dependent.
Yes.
>
>>> It also implies
>>> other impacts, e.g. making IRQ response times unpredictable.
>>> Personally I dislike this old hack from the 70s/80s.
>>
>> The "old hack" is far and away the easiest, safest and most efficient
>> method. IRQ response times are /always/ unpredictable - if you think
>> otherwise, you have misunderstood the system. The best you can do is
>> figure out a maximum response time,
>
> There are systems with guaranteed maximum values.
Of course - but that is the maximum of the inherent cpu maximum response
time, maximum delays from memory systems, maximum run-times of other
interrupt functions (that have not re-enabled interrupts), and so on.
Interrupt-disable sections which are shorter than the maximum interrupt
function time will not affect the maximum response time for interrupts.
>
>> and even then it may only apply to
>> the highest priority interrupt.
>
> Of course.
>
>> Real time systems are not about saying
>> when things will happen - they are about giving guarantees for the
>> maximum delays.
>
> Exactly. But that is enough.
Yes.
>
>> Remember, during most interrupt handling functions on most systems,
>> interrupts are disabled - your maximum interrupt response is already
>> increased by the time it takes to run your timer interrupt, or UART
>> interrupt, or ADC interrupt, or whatever other interrupts you have.
>> These will all be much longer than the time for a couple of loads or
>> stores.
>
> Context switches can be quite expensive if your hardware has many
> registers (including MMU) and no distinct register sets for different
> priority levels.
>
Yes, context switches can be expensive - but I don't see how that is
relevant. Interrupt response times don't usually have to take full
context switch times into account, because you don't need a full context
switch to get to the point where you are able to react quickly to the
urgent event. Most cpus preserve the instruction pointer, flag
register, and perhaps one or two general purpose registers when an
interrupt occurs - they rarely do full context switches.
(There are more niche architectures with very fast and deterministic
response times.)
>> Thus disabling interrupts around atomic accesses does not
>> affect IRQ response times.
>
> Feel free to do so if it is suitable on your platform. You already
> mentioned that this is not sufficient on multi core systems, which
> become quite common for embedded systems too nowadays.
>
Embedded systems can broadly be divided into three groups. There are
"big" systems, for which multi-core is becoming more common - these are
dominated by Linux, and you can use whatever multi-threading techniques
you like from the Linux world. There are "small" systems, with a single
core - these are far and away the largest in quantity. In between, you
have asymmetric multiprocessing - devices with a big fast core for hard
work (or possibly multiple fast cores running Linux), and a small core
for low-power states, handling simple low-level devices, or dedicated to
things like wireless communication. Although you have more than one
core in the same package, these are running different programs (and
perhaps different OS's).
I see very little in the way of symmetric multicore systems running
RTOS's. That may change, however.
>
>> There is no single maximally efficient solution that will work in all
>> cases - any generic method will be overkill for some use-cases. But a
>> toolchain-provided generic method that doesn't always work - that is,
>> IMHO, worse than no implementation.
>
> Obviously your particular case is sensitive to priority inversion.
>
> But this is always the case when you use libraries. They cover /common
> cases/ not all cases.
> If a generic atomic library does not guarantee forward progress when used
> with different priorities it is not suitable for this case.
>
All RTOS systems are sensitive to priority inversion, as are all small
single-core bare-metal embedded systems (interrupt functions are,
in effect, high priority pre-emptive threads). It is not a special case
- it applies to just about every use of devices such as the Cortex-M or
other microcontrollers. You cannot use the gcc-provided atomics library
for even the simple situation of having a 64-bit value shared between an
interrupt function and a main loop or other thread - it will kill your
system if there is an access conflict.
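
For comparison, the interrupt-disable approach for that 64-bit case can be sketched as below. The `port` functions are hypothetical host-side stubs so that the example compiles anywhere; on a Cortex-M they would typically save PRIMASK, disable interrupts, and later restore the saved state. `IrqLock` and `Shared64` are likewise made-up names, and the sketch assumes C++17 and a single core.

```cpp
#include <cstdint>

// Hypothetical port layer (stubbed for a host build). A real port would
// touch the CPU's interrupt mask instead of this counter.
namespace port {
inline int irq_disable_depth = 0;

inline unsigned irq_save_and_disable() {
    return static_cast<unsigned>(irq_disable_depth++);  // return previous state
}

inline void irq_restore(unsigned saved) {
    irq_disable_depth = static_cast<int>(saved);        // put previous state back
}
}  // namespace port

// RAII guard: interrupts are off for the lifetime of the object, and the
// previous state is restored on scope exit, so nesting is harmless.
class IrqLock {
public:
    IrqLock() : saved_(port::irq_save_and_disable()) {}
    ~IrqLock() { port::irq_restore(saved_); }
    IrqLock(const IrqLock&) = delete;
    IrqLock& operator=(const IrqLock&) = delete;

private:
    unsigned saved_;
};

// 64-bit value shared between the main loop and an interrupt function.
class Shared64 {
public:
    std::uint64_t load() const {
        IrqLock lock;      // a couple of loads with interrupts disabled
        return value_;
    }
    void store(std::uint64_t v) {
        IrqLock lock;      // a couple of stores with interrupts disabled
        value_ = v;
    }

private:
    volatile std::uint64_t value_ = 0;
};
```

The protected section is only a pair of loads or stores, so it is far shorter than any realistic interrupt function. It cannot deadlock against an interrupt, which is the failure mode of a lock-based library fallback.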