Arduino APIs, performance, and C++ templates (long)

Clifford Heath

May 15, 2016, 8:28:41 PM
to Connected Community HackerSpace
Folk,

I'm doing a little project, my first with Arduinos. I had assumed that the
choice to use C++ meant that the APIs would be fancy object-oriented
APIs that generate inline assembly code for performance. I normally do
more bare-metal stuff, including building C++ APIs for the peripherals
of the MC68HC11 more than a decade ago, so I was keen to see what
can be achieved using more modern C++ compilers.

To say I've been disappointed is an understatement. The standard of
the code is simply awful. The g++ compiler is fantastic, but the Arduino
APIs just don't use that power.

As an example, "digitalWrite" takes over 50 cycles, compared to the
expected 2. I know that there are libraries that work faster, but why are
the default libraries so bad? Even calling these methods takes at least
*three* times the code space that's required. I drilled into the library to see
what's going on, but that's not the topic here. I wanted to show how things
could be better, and to see if anyone here is interested in making it
happen (personally I actually want to do this for ST's ARM range, but
will assist if someone wants to do AVR versions).

Using template metaprogramming, we can get nice object-oriented APIs
that also map directly to the hardware instructions. Unfortunately it's not
easy to use the existing Arduino port definitions as template parameters,
which might mean having to redefine some of the #defines of the low-
level hardware (more below). So here's a minimal example that works,
and shows what could be achieved by following this route:

#include <stdint.h>

template <int Port, uint8_t Mask>
class Pin
{
public:
    Pin& operator=(bool b)
    {
        if (b)
            *(volatile uint8_t*)Port |= Mask;   // set the bit(s) in Mask
        else
            *(volatile uint8_t*)Port &= ~Mask;  // clear the bit(s) in Mask
        return *this;
    }
};

Pin<0x25, 0x01> portBp0;

Note that 0x25 is the memory-mapped address of PORTB (its I/O
address is 0x05, but memory-mapping adds an offset of 0x20, if I
understand the AVR hardware correctly).
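The address arithmetic in that note can be checked at compile time. A minimal sketch, assuming the classic-AVR convention that data-space address = I/O address + 0x20, and that SBI/CBI encode the I/O address in a 5-bit field (so they reach only I/O addresses 0x00 to 0x1F). The helper names are hypothetical:

```cpp
#include <cstdint>

// Offset between I/O space and data space on classic AVRs (as described above).
constexpr std::uint16_t kIoToDataOffset = 0x20;

// Hypothetical helpers: convert between the two address spaces.
constexpr std::uint16_t ioToData(std::uint8_t io) { return io + kIoToDataOffset; }
constexpr std::uint8_t dataToIo(std::uint16_t data)
{
    return static_cast<std::uint8_t>(data - kIoToDataOffset);
}

// SBI/CBI hold a 5-bit I/O address, so only data addresses 0x20..0x3F
// (I/O addresses 0x00..0x1F) can be targeted by them.
constexpr bool sbiReachable(std::uint16_t data)
{
    return data >= kIoToDataOffset && data < kIoToDataOffset + 0x20;
}

static_assert(ioToData(0x05) == 0x25, "PORTB: I/O 0x05 is data 0x25");
static_assert(sbiReachable(0x25), "PORTB is within SBI/CBI range");
```

Registers above data address 0x3F still exist, but they need the longer load/modify/store sequence instead of a single `sbi`/`cbi`.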

Now, when I write "portBp0 = 1;" I get exactly one instruction emitted
("sbi") which takes the expected 2 cycles (1 on XMEGA). Same deal for
"portBp0 = 0;", the instruction is "cbi". Both are single-word instructions,
whereas a call to digitalWrite takes three or four words of code space.

Note that I would have preferred to define the template like this:

template <volatile uint8_t* Port, uint8_t Mask>
class Pin
{...};

That would allow removing the casts on uses of Port, but instantiating
the template then requires a cast:

Pin<PORTB, 0x01> portBp0;

which translates roughly to:

Pin<(volatile uint8_t*)0x25, 0x01> portBp0;

... and that's not valid for a template parameter: a pointer template
argument must be the address of an object with linkage, and an integer
cast to a pointer doesn't qualify. The only method I know that does work
is to define the port variable as extern, in a particular section, and use
the linker script or the linker option --just-symbols to define the
location. This means we can also use a C++ reference instead of a pointer:

extern volatile uint8_t PortB; // address provided to the linker
template <volatile uint8_t& Port, uint8_t Mask>
class Pin
{...};

Pin<PortB, 0x01> portBp1;

It's quite a lot of fiddling to use a linker script, although --just-symbols
looks easy enough; either way you can't use the standard AVR header
files for the values :(.
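The reference-parameter form can be tried on an ordinary host compiler. A minimal sketch, assuming a plain global `fakePortB` (hypothetical) stands in for the linker-located `PortB`; on the AVR the variable would be the extern declaration above, placed by the linker. The read-back `operator bool` is an addition for testing:

```cpp
#include <cassert>   // assert() for the usage checks
#include <cstdint>

// Hypothetical stand-in for the linker-located register. A reference template
// argument only needs an object with linkage, which a global provides.
volatile std::uint8_t fakePortB = 0;

template <volatile std::uint8_t& Port, std::uint8_t Mask>
class Pin
{
public:
    // Set or clear the masked bit(s) with a read-modify-write.
    Pin& operator=(bool b)
    {
        if (b)
            Port = static_cast<std::uint8_t>(Port | Mask);
        else
            Port = static_cast<std::uint8_t>(Port & ~Mask);
        return *this;
    }

    // Read back the current state of the masked bit(s).
    explicit operator bool() const { return (Port & Mask) != 0; }
};

Pin<fakePortB, 0x01> portBp0;
Pin<fakePortB, 0x02> portBp1;
```

Each `Pin` object is empty; all of its state lives in the template arguments, so `portBp0 = true;` needs no object data at run time.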

One option might be to define a structure for all the registers in a given
AVR variant (and just locate the structure using --just-symbols), e.g.

extern struct {
...
volatile uint8_t PortB; // ... at offset 0x25 in the structure.
...
} CPU;

void clear_B()
{
CPU.PortB = 0;
}
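The layout of such a structure can be pinned down with compile-time checks, so a miscounted padding array fails to build instead of silently touching the wrong register. A sketch, assuming ATmega328P-style addresses (PINB = 0x23, DDRB = 0x24, PORTB = 0x25) and a hypothetical `Cpu` type that starts at data address 0x00; on the target it would be declared `extern volatile Cpu CPU;` and located by the linker:

```cpp
#include <cstddef>
#include <cstdint>

// One AVR I/O port as three consecutive registers: PINx, DDRx, PORTx.
struct AvrIOPort {
    std::uint8_t pin;
    std::uint8_t ddr;
    std::uint8_t data;
};

// Hypothetical map of data addresses 0x00..0xFF: padding for the register
// file and lower I/O registers, then port B at 0x23..0x25, then the rest.
struct Cpu {
    std::uint8_t reserved0[0x23];
    AvrIOPort portB;
    std::uint8_t reserved1[0x100 - 0x26];
};

// Fail the build if the members don't land at the hardware addresses.
static_assert(offsetof(Cpu, portB) + offsetof(AvrIOPort, data) == 0x25,
              "PORTB must sit at data address 0x25");
static_assert(sizeof(Cpu) == 0x100, "Cpu must cover 0x00..0xFF");
```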

The other advantage of using templates is that we can extend them (by
derivation) to set up the port correctly, and to check for collisions in port usage:

template <volatile uint8_t& Port, uint8_t Mask>
class OutputPin : public Pin<Port, Mask>
{
public:
    OutputPin()
    {
        // (Check with a pin registry that this pin isn't already assigned to something else?)
        // Set up port direction...
    }
};

This also means that you can dynamically assign port pins just by
defining a local variable in a function, and the pin will be set up for
you when you hit that function.
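A host-side sketch of that idea, assuming a hypothetical `PinRegistry` that records the claimed bits of each port so a second `OutputPin` on the same bit is detected. The destructor that returns the pin to input when the local variable goes out of scope is an assumption, not part of the design above, and `fakePortB` stands in for the real port registers:

```cpp
#include <cassert>   // assert() for the usage checks
#include <cstdint>

// Hypothetical stand-in for one port's registers (PINx, DDRx, PORTx).
struct AvrIOPort {
    volatile std::uint8_t pin;
    volatile std::uint8_t ddr;
    volatile std::uint8_t data;
};
AvrIOPort fakePortB;   // zero-initialised global

// Hypothetical registry: one claimed-bits mask per port.
template <AvrIOPort& Port>
struct PinRegistry {
    static std::uint8_t claimed;
    // Returns false if any requested bit is already owned.
    static bool claim(std::uint8_t mask)
    {
        if (claimed & mask) return false;
        claimed = static_cast<std::uint8_t>(claimed | mask);
        return true;
    }
    static void release(std::uint8_t mask)
    {
        claimed = static_cast<std::uint8_t>(claimed & ~mask);
    }
};
template <AvrIOPort& Port>
std::uint8_t PinRegistry<Port>::claimed = 0;

template <AvrIOPort& Port, std::uint8_t Mask>
class OutputPin
{
public:
    // Claim the pin and make it an output when the object comes into scope.
    OutputPin() : owned_(PinRegistry<Port>::claim(Mask))
    {
        if (owned_) Port.ddr = static_cast<std::uint8_t>(Port.ddr | Mask);
    }
    // Return the pin to input and release it when the object goes out of scope.
    ~OutputPin()
    {
        if (owned_) {
            Port.ddr = static_cast<std::uint8_t>(Port.ddr & ~Mask);
            PinRegistry<Port>::release(Mask);
        }
    }
    OutputPin(const OutputPin&) = delete;
    OutputPin& operator=(const OutputPin&) = delete;

    bool owned() const { return owned_; }

    // Drive the pin high or low, but only if this object owns it.
    OutputPin& operator=(bool b)
    {
        if (owned_)
            Port.data = static_cast<std::uint8_t>(b ? (Port.data | Mask)
                                                    : (Port.data & ~Mask));
        return *this;
    }

private:
    bool owned_;
};
```

Here a collision is reported through `owned()`; a real design might instead halt or flag the error, which is left open in the original.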

With more work, you could set up templates for whole ports, or for ranges
of pins on the same port:

template <volatile uint8_t& Port, uint8_t Mask, int Shift>
class PinRange
{
public:
    operator int()
    {
        return (Port & Mask) >> Shift;
    }

    PinRange& operator=(int val)
    {
        Port = (Port & ~Mask) | ((val << Shift) & Mask);
        return *this;
    }

    PinRange& operator++()
    {
        *this = (int)*this + 1;
        return *this;
    }
    // ..., etc
};

PinRange<CPU.PortB, 0x1C, 2> portBpins23and4;   // mask 0x1C = bits 2, 3 and 4

The g++ compiler is quite capable of turning all these templates and
meta-programming into the most efficient possible inline machine code,
with none of the downsides of the Arduino approach.
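The read-modify-write arithmetic of PinRange can be checked off-target. A sketch with a plain global `fakePortB` (hypothetical) in place of `CPU.PortB`; it uses mask 0x1C, which selects pins 2, 3 and 4 (0x16 would select pins 1, 2 and 4), and makes the conversion `const` so a value can be read back:

```cpp
#include <cassert>   // assert() for the usage checks
#include <cstdint>

// Hypothetical stand-in for PORTB.
volatile std::uint8_t fakePortB = 0;

// Adjacent bits of one port, read and written as a small integer.
// Mask selects the bits; Shift is the position of the lowest one.
template <volatile std::uint8_t& Port, std::uint8_t Mask, int Shift>
class PinRange
{
public:
    // Extract the field value.
    operator int() const { return (Port & Mask) >> Shift; }

    // Replace the field, leaving the port's other bits untouched.
    PinRange& operator=(int val)
    {
        Port = static_cast<std::uint8_t>((Port & ~Mask) | ((val << Shift) & Mask));
        return *this;
    }

    // Increment the field; overflow wraps within the field's width.
    PinRange& operator++()
    {
        *this = static_cast<int>(*this) + 1;
        return *this;
    }
};

// Pins 2, 3 and 4: mask 0b00011100 = 0x1C, shift 2.
PinRange<fakePortB, 0x1C, 2> portBpins234;
```

Because the mask is applied after shifting, an out-of-range value such as 8 in a 3-bit field simply wraps to 0 rather than spilling into neighbouring pins.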

Anyhow, I hope I've piqued someone's interest. Your comments would be
welcome.

Clifford Heath.

Clifford Heath

May 16, 2016, 1:43:13 AM
to Connected Community HackerSpace
Folk,

I spent a few hours playing with this approach, and when you go to
"extern" definitions with the address provided to the linker, the compiler
no longer recognises that it can substitute "sbi" for "lds", "ori" and "sts",
so you get long-form code.

I tried to force the issue using inline "asm" calls to the SBI instruction,
but then gcc won't coerce the (unknown, but possibly 16-bit) address into
the 6-bit field, even when I try various ways to force it. I think that Atmel
have hacked gcc just enough to work for the cases they care about.

The upshot of that is I can't make proper use of a "struct" (because I
can't locate it in memory).

I.e. I can't see any way to use SBI/CBI instructions on registers in this
struct:

struct __attribute__((packed)) AvrIOPort {
uint8_t pin;
uint8_t ddr;
uint8_t data;
};
extern volatile AvrIOPort PortB; // Address set by a linker option

or on the SBI/CBI-addressable registers (the first 0x20 I/O registers, data
addresses 0x20 to 0x3F) in my much larger "CPU" structure (which maps the
entire 0x00 to 0xFF block).

Here is the code which fails:

template <volatile AvrIOPort& Port, uint8_t Number>
class Pin
{
public:
    Pin& operator=(bool b)
    {
        // Note: even if the constraint were accepted, sbi/cbi want the I/O
        // address (data address minus 0x20), not the data address &Port.data.
        if (b)
            // Port.data |= (1 << Number);
            asm volatile(
                "  sbi %[portdata],%[portbit]\n"
                :                                   // Output operands
                : [portdata] "I" (&Port.data),      // Input operands
                  [portbit]  "I" (Number)           // sbi takes a bit number (0..7), not a mask
                :                                   // Clobbers
            );
        else
            // Port.data &= ~(1 << Number);
            asm volatile(
                "  cbi %[portdata],%[portbit]\n"
                :                                   // Output operands
                : [portdata] "I" (&Port.data),      // Input operands
                  [portbit]  "I" (Number)
                :                                   // Clobbers
            );
        return *this;
    }
    void output() { }
};

Pin<PortB, 0> portBp0;

The compiler doesn't know the (external) address of "Port.data" until link
time, so it can't prove that it fits into a 6-bit field (as required by the
"I" constraint), and it complains "impossible constraint".

I can still make this all work using #defines for all the register addresses,
but it's a lot uglier than using structs.
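A sketch of that fallback, written with `constexpr` integer constants rather than macros. Because the I/O address and bit number are template arguments known at compile time, the "I" constraints should be satisfiable, so an AVR build can emit a single `sbi`/`cbi`; on a host compiler a hypothetical `ioSpace` array stands in for the registers so the logic can be exercised. Note that `sbi`/`cbi` take a bit number rather than a mask, and an I/O address (0x05 for PORTB) rather than a data address (0x25). The names `IoPin`, `PORTB_IO` and `ioSpace` are illustrative, not from any library:

```cpp
#include <cassert>   // assert() for the usage checks
#include <cstdint>

// Register addresses as plain integer constants (the "#define" approach),
// given as I/O-space addresses; values assumed from the ATmega328P layout.
constexpr std::uint8_t PINB_IO  = 0x03;
constexpr std::uint8_t DDRB_IO  = 0x04;
constexpr std::uint8_t PORTB_IO = 0x05;

#if !defined(__AVR__)
// Host stand-in for the 32 bit-addressable I/O registers (hypothetical).
volatile std::uint8_t ioSpace[0x20];
#endif

template <std::uint8_t IoAddr, std::uint8_t Bit>
class IoPin
{
    static_assert(IoAddr < 0x20, "SBI/CBI reach only I/O addresses 0x00..0x1F");
    static_assert(Bit < 8, "bit number must be 0..7");

public:
    IoPin& operator=(bool b)
    {
#if defined(__AVR__)
        // Both operands are compile-time constants, so "I" can be met.
        if (b)
            asm volatile("sbi %0, %1" : : "I"(IoAddr), "I"(Bit));
        else
            asm volatile("cbi %0, %1" : : "I"(IoAddr), "I"(Bit));
#else
        // Host fallback: same effect as sbi/cbi on the simulated register.
        if (b)
            ioSpace[IoAddr] = static_cast<std::uint8_t>(ioSpace[IoAddr] | (1u << Bit));
        else
            ioSpace[IoAddr] = static_cast<std::uint8_t>(ioSpace[IoAddr] & ~(1u << Bit));
#endif
        return *this;
    }
};

IoPin<PORTB_IO, 0> portBp0;
IoPin<PORTB_IO, 3> portBp3;
```

The trade-off is the one described above: integer constants lose the grouping a struct gives, but they stay compile-time constants, which is what the "I" constraint needs.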

I hope I don't have the same problem with the ARM version of gcc.

Perhaps someone will get value from my discussion above.

Clifford Heath.


