[freertos - Open Discussion and Support] STM32F4 with FPU

SourceForge.net

Oct 16, 2011, 7:56:57 AM
to SourceForge.net

Read and respond to this message at:
https://sourceforge.net/projects/freertos/forums/forum/382005/topic/4761747
By:

Hi!

I just got my Discovery board and would like to try out the FPU. Has anyone
written a port yet? Or is there an estimate of when it will be officially supported?

I just had a quick look at the architecture manual. It seems that FreeRTOS
would have to store the entire state of the FPU, adding at least 32 x 4 bytes
to each task's context (are all 32 FPU registers really used by compilers? That
seems like an awful lot!). Perhaps I'll give it a try myself.

SourceForge.net

Oct 16, 2011, 8:07:55 AM
to SourceForge.net
By: richardbarry

A lot of thought and work has already gone into supporting the Cortex-M4F, but
support is not yet officially available. Note that if you have the FPU turned
off then the standard Cortex-M3 port will work fine, but having the FPU turned
on is much more complex than you might imagine.

The easy option, if you wish to do it yourself, is to set the FPU related registers
to save and restore the FPU context automatically on each interrupt. This is
[b]horrendously[/b] inefficient with the VFP architecture of the M4F, especially
when you consider that only a few tasks will ever use the FPU. Only half the
context (s0-s15 and FPSCR) can be saved automatically, so the other half
(s16-s31) has to be saved manually.

Another option is to allow tasks to register themselves as FPU context users,
then manually save the FPU context for just those tasks. That is a little more
efficient, but will still result in FPU contexts being saved unnecessarily
sometimes.
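A rough host-side sketch of the "register as an FPU user" option, to show where the unnecessary saves come from. Everything here is hypothetical: `task_t`, `task_register_fpu_user()` and `switch_context()` are illustrative names, not FreeRTOS API, and the FPU register file is simulated with a plain array so the logic can be compiled and run on a PC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FPU_REG_COUNT 32 /* s0-s31 */

/* Simulated FPU register file; on a real M4F these would be s0-s31. */
static float fpu_hw[FPU_REG_COUNT];

/* Hypothetical task control block, NOT the real FreeRTOS TCB. */
typedef struct {
    const char *name;
    bool uses_fpu;                /* set once via task_register_fpu_user() */
    float fpu_ctx[FPU_REG_COUNT]; /* saved s0-s31, only meaningful if uses_fpu */
} task_t;

static unsigned fpu_saves;    /* number of FPU context saves performed */
static unsigned fpu_restores; /* number of FPU context restores performed */

/* A task calls this once, before its first floating-point instruction. */
void task_register_fpu_user(task_t *t)
{
    assert(t != NULL);
    t->uses_fpu = true;
    memset(t->fpu_ctx, 0, sizeof t->fpu_ctx);
}

/* Called by the (simulated) scheduler on every task switch. */
void switch_context(task_t *from, task_t *to)
{
    if (from != NULL && from->uses_fpu) {
        /* Saved even if 'from' did not touch the FPU in this time slice. */
        memcpy(from->fpu_ctx, fpu_hw, sizeof fpu_hw);
        fpu_saves++;
    }
    if (to != NULL && to->uses_fpu) {
        /* Non-FPU tasks leave fpu_hw alone, so nothing to do for them. */
        memcpy(fpu_hw, to->fpu_ctx, sizeof fpu_hw);
        fpu_restores++;
    }
}
```

The save in `switch_context()` happens on every switch away from a registered task, whether or not that task executed a single FP instruction in its time slice; that is the remaining inefficiency. The sketch also relies on non-FPU tasks really never touching the FP registers.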

Another extreme is to attempt to use the lazy save mechanism of the FPU ([b]note:[/b]
lazy save is turned on by default). If you do that, then you have an extremely
complex problem to implement, and if interrupts use the FPU too (they might
if they are doing something like motor control), then there are a dozen corner
cases to take care of once interrupts start nesting, and they are nearly impossible
to test.
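The hardware behaviours mentioned above (automatic saving of half the context, and lazy saving as the reset default) are selected by the ASPEN (bit 31) and LSPEN (bit 30) bits of the FP Context Control Register, FPCCR, at 0xE000EF34; the FPU itself is enabled by granting CP10/CP11 access in CPACR at 0xE000ED88. The sketch below only computes register values so it can run on a host; the mode names and helper function are hypothetical, and in CMSIS these registers are usually reached as `SCB->CPACR` and `FPU->FPCCR`.

```c
#include <assert.h>
#include <stdint.h>

/* Cortex-M4F system control addresses (ARMv7-M). */
#define CPACR_ADDR 0xE000ED88u /* Coprocessor Access Control Register */
#define FPCCR_ADDR 0xE000EF34u /* FP Context Control Register */

#define CPACR_CP10_CP11_FULL ((3u << 20) | (3u << 22)) /* full access to CP10/CP11 */
#define FPCCR_ASPEN (1u << 31) /* stack s0-s15 + FPSCR automatically on exception entry */
#define FPCCR_LSPEN (1u << 30) /* ...but lazily: only reserve the space at entry */

/* Hypothetical names for the three useful ASPEN/LSPEN combinations. */
typedef enum {
    FP_SAVE_NONE,      /* software must save everything itself */
    FP_SAVE_AUTOMATIC, /* hardware always stacks s0-s15 + FPSCR */
    FP_SAVE_LAZY       /* hardware reserves space, stacks on first FP use (reset default) */
} fp_save_mode;

/* Return a new FPCCR value with the save-mode bits set, keeping all other bits. */
uint32_t fpccr_with_mode(uint32_t current, fp_save_mode mode)
{
    uint32_t v = current & ~(FPCCR_ASPEN | FPCCR_LSPEN);
    switch (mode) {
    case FP_SAVE_NONE:      break;
    case FP_SAVE_AUTOMATIC: v |= FPCCR_ASPEN; break;
    case FP_SAVE_LAZY:      v |= FPCCR_ASPEN | FPCCR_LSPEN; break;
    default:                assert(0 && "unknown mode");
    }
    return v;
}

/* On target (never on the host) the values would be written roughly like:
 *   *(volatile uint32_t *)CPACR_ADDR |= CPACR_CP10_CP11_FULL;
 *   *(volatile uint32_t *)FPCCR_ADDR =
 *       fpccr_with_mode(*(volatile uint32_t *)FPCCR_ADDR, FP_SAVE_AUTOMATIC);
 * followed by DSB and ISB barriers.
 */
```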

Yet another option is to perform a software lazy save.

Etc. Etc.
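One way to do a software lazy save is to turn FPU access off on every task switch and only move FP registers when a task faults on its first FP instruction. The host-side simulation below shows the bookkeeping, assuming a hypothetical TCB layout; the "coprocessor disabled" fault is simulated by a function call. On a real M4F it would be the UsageFault (NOCP) raised when CP10/CP11 access is disabled in CPACR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FPU_REG_COUNT 32

static float fpu_hw[FPU_REG_COUNT]; /* simulated s0-s31 */
static bool fpu_enabled;            /* simulated CPACR CP10/CP11 access */

/* Hypothetical TCB, for illustration only. */
typedef struct {
    const char *name;
    float fpu_ctx[FPU_REG_COUNT];
} task_t;

static task_t *current_task; /* task that is running now */
static task_t *fpu_owner;    /* task whose state is live in fpu_hw, or NULL */
static unsigned fpu_saves, fpu_restores;

/* Scheduler hook: no FP registers move here, only access is toggled. */
void lazy_switch_to(task_t *next)
{
    assert(next != NULL);
    current_task = next;
    /* If 'next' still owns the registers, it can use them directly. */
    fpu_enabled = (next == fpu_owner);
}

/* What the "coprocessor disabled" fault handler would do (simulated). */
static void fpu_trap_handler(void)
{
    if (fpu_owner != NULL) {
        memcpy(fpu_owner->fpu_ctx, fpu_hw, sizeof fpu_hw);
        fpu_saves++;
    }
    memcpy(fpu_hw, current_task->fpu_ctx, sizeof fpu_hw);
    fpu_restores++;
    fpu_owner = current_task;
    fpu_enabled = true;
}

/* Stands in for the running task executing an FP instruction. */
float *fpu_access(void)
{
    if (!fpu_enabled)
        fpu_trap_handler();
    return fpu_hw;
}
```

Tasks that never use the FPU cost nothing, and switching back to the current owner costs nothing either. A real implementation would also have to clear `fpu_owner` when the owning task is deleted and cope with interrupts that use the FPU.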


Also a word of warning: take extreme care to set up your compiler so that
it does not randomly use FPU registers as temporary registers in tasks that
are not themselves using the FPU. Some compilers do that unless special,
non-default command-line options are used.

Have fun.

Regards.

SourceForge.net

Nov 25, 2011, 6:47:11 PM
to SourceForge.net

Hi again!

I think I've got my port up and running. Please find it here:

https://github.com/thomask77/FreeRTOS_ARM_CM4F

Before I started, I did some performance measurements. As you said, the time
for a full FPU state save/restore is quite long. A pair of vpush {s0-s31}/vpop
{s0-s31} takes around 400 ns on my STM32F407 @ 168 MHz.

On the other hand, that translates to just ~68 cycles, which is not that bad
at all if you consider the overall performance gain of the FPU vs. software
emulation.
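The ns-to-cycles conversion is simple arithmetic: at 168 MHz one cycle is about 5.95 ns, so 400 ns is 67.2 cycles, and 68 cycles is about 405 ns, consistent with the "~68 cycles" figure. A small helper pair (hypothetical names) for checking such numbers:

```c
#include <assert.h>

/* Convert a measured duration in nanoseconds to CPU clock cycles.
 * At f MHz the core runs f cycles per microsecond, i.e. f/1000 per ns. */
double ns_to_cycles(double ns, double cpu_mhz)
{
    assert(cpu_mhz > 0.0);
    return ns * cpu_mhz / 1000.0;
}

/* Inverse conversion: cycles back to nanoseconds. */
double cycles_to_ns(double cycles, double cpu_mhz)
{
    assert(cpu_mhz > 0.0);
    return cycles * 1000.0 / cpu_mhz;
}
```

For example, `ns_to_cycles(400.0, 168.0)` gives 67.2 and `cycles_to_ns(68.0, 168.0)` gives roughly 404.8.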

Still, I don't want to have the performance hit for things like serial-port
or motor-control interrupts. So I'll leave the hardware lazy-save mode enabled.

Without an OS switching tasks, the CPU will just do the right thing anyway:

The AAPCS says that s0-s15 are caller-saved scratch registers, so the hardware
(lazily) stacks them, together with FPSCR, on exception entry. s16-s31 are
callee-saved, so the compiler saves them in any function (including an ISR)
that uses them. There is a performance hit of ~200 ns on entry/exit if the lazy
save is actually triggered. For interrupts without FPU instructions there is
no additional overhead.
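The split described above can be written down as two C structs (the type names are hypothetical, chosen for illustration). The hardware extended frame is the 8-word basic frame plus s0-s15, FPSCR and one reserved word, 26 words in total; s16-s31 (and r4-r11) are left to software. Together that is 32 FP registers plus FPSCR, i.e. 132 bytes of FP state per task, which answers the "32 x 4 bytes" estimate from the first post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Frame the Cortex-M4F pushes by itself on exception entry when the
 * interrupted context had an active FP context (lowest address first).
 * With lazy stacking the s0-s15/FPSCR slots are only reserved at entry
 * and filled on the first FP instruction inside the handler. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr; /* basic frame: 8 words */
    float    s[16];    /* s0-s15: caller-saved scratch registers per AAPCS */
    uint32_t fpscr;
    uint32_t reserved; /* keeps the frame size a multiple of 8 bytes */
} hw_extended_frame_t;

/* Registers that software (compiler prologue or RTOS context switch) saves. */
typedef struct {
    uint32_t r4_r11[8];   /* callee-saved core registers */
    float    s16_s31[16]; /* callee-saved FP registers per AAPCS */
} sw_saved_regs_t;

static_assert(sizeof(hw_extended_frame_t) == 26 * 4, "extended frame is 26 words");
static_assert(sizeof(sw_saved_regs_t) == 24 * 4, "r4-r11 plus s16-s31");
```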

The only time when all registers must be saved and restored is for a task switch.
This will take about 400 ns longer than without the FPU.

I added the extended stack frame registers to pxPortInitialiseStack, vPortSVCHandler
and xPortPendSVHandler. Additionally, vPortSVCHandler marks the stack frame
as an extended frame (by clearing bit 4 of the EXC_RETURN value in LR).
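Bit 4 of EXC_RETURN is the frame-type bit: 0 means the stacked frame is the extended (FP) frame, 1 means the basic frame. So 0xFFFFFFFD (Thread mode, PSP, basic frame) becomes 0xFFFFFFED when marked as extended. A host-testable sketch of the bit manipulation; the helper names are hypothetical, and on target this is done in the handler's assembly before `bx lr`. A port can test the same bit in PendSV to decide whether s16-s31 need saving.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXC_RETURN_FTYPE (1u << 4) /* 0 = extended (FP) frame, 1 = basic frame */
#define EXC_RETURN_MODE  (1u << 3) /* 1 = return to Thread mode */
#define EXC_RETURN_SPSEL (1u << 2) /* 1 = unstack from PSP, 0 = from MSP */

#define EXC_RETURN_THREAD_PSP_BASIC    0xFFFFFFFDu
#define EXC_RETURN_THREAD_PSP_EXTENDED 0xFFFFFFEDu

/* True if the frame described by this EXC_RETURN contains FP state. */
static inline bool exc_return_is_extended(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_FTYPE) == 0;
}

/* Mark an EXC_RETURN value as "extended frame" by clearing bit 4.
 * The stack must then really hold the 18 extra FP words, or the
 * exception return will unstack garbage into s0-s15/FPSCR. */
static inline uint32_t exc_return_mark_extended(uint32_t exc_return)
{
    return exc_return & ~EXC_RETURN_FTYPE;
}
```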


I must warn that the code is _not_ yet fully tested! Use at your own risk!

Have fun,
Thomas Kindler <mail...@t-kindler.de>
