>
> In function mmc_send_cmd(cpu/omap3/mmc.c).
>
> 145 OMAP_HSMMC_BLK = BLEN_512BYTESLEN | NBLK_STPCNT;
> 146 OMAP_HSMMC_STAT = 0xFFFFFFFF;
> 147 OMAP_HSMMC_ARG = arg;
> 148 OMAP_HSMMC_CMD = cmd | CMD_TYPE_NORMAL | CICE_NOCHECK |
> 149 CCCE_NOCHECK | MSBS_SGLEBLK | ACEN_DISABLE |
> BCE_DISABLE |
> 150 DE_DISABLE;
>
> Thant means CPU write to MMC register follow the sequence of
> OMAP_HSMMC_BLK -> OMAP_HSMMC_STAT -> OMAP_HSMMC_ARG -> OMAP_HSMMC_CMD.
>
> But the compiler reordes the register access sequence.
> r2= 0x4809c000
> 40201d78: e3a03c02 mov r3, #512 ; 0x200
> 40201d7c: e5821108 str r1, [r2, #264] <-0x108 OMAP_HSMMC_ARG
> 40201d80: e5823104 str r3, [r2, #260] <-0x104 OMAP_HSMMC_BLK
> 40201d84: e582010c str r0, [r2, #268] <-0x10C OMAP_HSMMC_CMD
> 40201d88: e3e03000 mvn r3, #0 ; 0x0
> 40201d8c: e5823130 str r3, [r2, #304] <-0x130 OMAP_HSMMC_STAT
> Can anyone tell me why compiler reorde the register access sequence?
> How to stop it?
A C compiler is free to reorder operations as long as the semantics
are the same. In this case, the compiler believes that the semantics
are the same since the memory operations are being done to different
locations. As to why the compiler chose to do so in this case, it is
likely because without knowledge of the temporal side-effects, the
compiler chose to hide instruction latencies. If there were no side-
effects, the immediate move into r3 will take a few cycles due to the
Cortex-A8 pipeline. The next instruction specified, the write to
OMAP_HSMMC_BLK, has a data dependency on r3 and so must wait until the
immediate move has finished before it can be executed. Rather than
have a bubble in the pipeline, the compiler has placed what it
believes to be an independent operation in that slot (the move of r1
to OMAP_HSMMC_BLK).
Now, how to fix it. You can attempt to play with the optimizer (-O0)
or with compiler flags to turn of instruction scheduling, but those
are hackish and are not guaranteed to resolve the issue on anything
other than the exact toolchain you are using. A way to do this so
that the compiler knows better is to do these few operations as
volatile inline functions or asm blocks. Essentially, by breaking up
each line into a separate inline function that is marked volatile or a
single asm block that is marked volatile, the compiler is informed
that the calls to the functions or the position of the asm block
_cannot_ be reordered. I personally prefer using an inline volatile
asm block since the entire operation can be done in a single block.
--
Rick Altherr
kc8...@kc8apf.net
"He said he hadn't had a byte in three days. I had a short, so I split
it with him."
-- Unsigned
This is why using structures or suchlike to access IO registers is
"teh l0ss". You are probably better off defining a set of IO accessor
functions (they can be inline) with the proper compiler code to ensure
that the compiler will issue the instructions in the right order (this
may require a write-memory-barrier to be issued).
--
Ben
Q: What's a light-year?
A: One-third less calories than a regular year.
>
> Hi everyone,
>
> I am the developer of qemu-omap3(http://code.google.com/p/qemu-omap3/)
> and I am doing MMC card emulation right now.
>
> I meet a strange code reorder problem while compiling x-loader for
> qemu-omap3.
> x-loader souce code is downloaded from
> http://beagleboard.googlecode.com/files/u-boot_beagle_revb.tar.gz.
Don't use that code, use http://www.sakoman.net/cgi-bin/gitweb.cgi?p=x-load-omap3.git;a=shortlog
instead. Same goes for u-boot.
> The arm-none-linux-gnueabi-gcc I use is gcc version 4.2.1
> (CodeSourcery Sourcery G++ Lite 2007q3-51).
>
> In function mmc_send_cmd(cpu/omap3/mmc.c).
>
> 145 OMAP_HSMMC_BLK = BLEN_512BYTESLEN | NBLK_STPCNT;
> 146 OMAP_HSMMC_STAT = 0xFFFFFFFF;
> 147 OMAP_HSMMC_ARG = arg;
> 148 OMAP_HSMMC_CMD = cmd | CMD_TYPE_NORMAL | CICE_NOCHECK |
> 149 CCCE_NOCHECK | MSBS_SGLEBLK | ACEN_DISABLE |
> BCE_DISABLE |
> 150 DE_DISABLE;
>
> Thant means CPU write to MMC register follow the sequence of
> OMAP_HSMMC_BLK -> OMAP_HSMMC_STAT -> OMAP_HSMMC_ARG -> OMAP_HSMMC_CMD.
>
> But the compiler reordes the register access sequence.
> r2= 0x4809c000
> 40201d78: e3a03c02 mov r3, #512 ; 0x200
> 40201d7c: e5821108 str r1, [r2, #264] <-0x108 OMAP_HSMMC_ARG
> 40201d80: e5823104 str r3, [r2, #260] <-0x104 OMAP_HSMMC_BLK
> 40201d84: e582010c str r0, [r2, #268] <-0x10C OMAP_HSMMC_CMD
> 40201d88: e3e03000 mvn r3, #0 ; 0x0
> 40201d8c: e5823130 str r3, [r2, #304] <-0x130 OMAP_HSMMC_STAT
>
> It will confuse the MMC emulation.
>
> Can anyone tell me why compiler reorde the register access sequence?
> How to stop it?
regards,
Koen
> The information in http://elinux.org/BeagleBoard#Cortex_A8_ARM
> Note: CodeSourcery 2007q3 has an issue with -Os option. There is a fix
> (needs recompilation)
>
> Is it about -Os option?
>
> Thanks.
>
>
> --~--~---------~--~----~------------~-------~--~----~
> You received this message because you are subscribed to the Google
> Groups "Beagle Board" group.
> To post to this group, send email to discu...@beagleboard.org.
> To unsubscribe from this group, send email to beagleboard...@beagleboard.org
> For more options, visit this group at http://groups.google.com/group/beagleboard?hl=en
> -~----------~----~----~----~------~----~------~--~---
>
Please fix the problem the proper way (using volatile in the definitions) and not by using the patch/’hack’ suggested below by Koen…
– I really think it’s ugly introducing a new volatile variable in order to work around the definition-error in the h-file.
Furthermore the patch doesn’t fix the problem – Just hides it. If i.e. a new revision of GCC would like to switch the order of the lines:
OMAP_HSMMC_STAT = OMAP_HSMMC_STAT;
OMAP_HSMMC_CON &= ~INIT_INITSTREAM;
, at the bottom of the patch, it’s still free to do it as long these aren’t declared volatile as well. This might therefore still cause other unforeseen issues…
(I haven’t checked if the particular order of these two lines are really important – The two lines are only meant as an example)
Therefore please change to use ‘volatile’ for each and every register access – It’s the only way to ensure the HW will work properly.
Furthermore you of cause have to mark the register-address-region(s) as non-cacheable and non-bufferable in the MMU configuration, but this is a complete other story J I think you safely can assume that this is set right – In case it isn’t everything would work ‘strange’ J
Best regards – Enjoy X-mas everybody
Søren
PS: I totally agree with Koen, that you should use the GIT repository instead of the old tar.gz file.
Only problem is, that the definition still doesn’t contain ‘volatile’ in the GIT tree either J
Fra:
beagl...@googlegroups.com [mailto:beagl...@googlegroups.com] På vegne
af Koen Kooi
Sendt: 20. december 2008 12:10
Til: beagl...@googlegroups.com
Emne: [beagleboard] Re: code reorder of x-loader using
arm-none-linux-gnueabi-gcc
Op 20 dec 2008, om 06:41 heeft yajin het volgende geschreven:
Please fix the problem the proper way (using volatile in the definitions) and not by using the patch/’hack’ suggested below by Koen…
– I really think it’s ugly introducing a new volatile variable in order to work around the definition-error in the h-file.
Furthermore the patch doesn’t fix the problem – Just hides it. If i.e. a new revision of GCC would like to switch the order of the lines:
OMAP_HSMMC_STAT = OMAP_HSMMC_STAT;
OMAP_HSMMC_CON &= ~INIT_INITSTREAM;
, at the bottom of the patch, it’s still free to do it as long these aren’t declared volatile as well. This might therefore still cause other unforeseen issues…
(I haven’t checked if the particular order of these two lines are really important – The two lines are only meant as an example)
Therefore please change to use ‘volatile’ for each and every register access – It’s the only way to ensure the HW will work properly.
Furthermore you of cause have to mark the register-address-region(s) as non-cacheable and non-bufferable in the MMU configuration, but this is a complete other story J I think you safely can assume that this is set right – In case it isn’t everything would work ‘strange’ J
Best regards – Enjoy X-mas everybody
Søren
PS: I totally agree with Koen, that you should use the GIT repository instead of the old tar.gz file.
Only problem is, that the definition still doesn’t contain ‘volatile’ in the GIT tree either J
Fra:
beagl...@googlegroups.com [mailto:beagl...@googlegroups.com] På vegne
af Koen Kooi
Sendt: 20. december 2008 12:10
Til:
beagl...@googlegroups.com
Emne: [beagleboard] Re: code reorder of
x-loader using arm-none-linux-gnueabi-gcc
Op 20 dec 2008, om 06:41 heeft yajin het volgende geschreven:
Without having investigated the issue in detail, I would expect, that the reason it works is, that the write to the OMAP_HSMMC_STAT happens before the command complete bit is being set by the HW. In order for the HW to set the command complete bit, it needs to do several clock cycles on the MMC clk (Maximum 52 MHz AFAIR), which is much slower than the ARM core running at 500-600MHz.
Although it’s clearly a sequential mistake, it seems to be saved by the dealys introduced in the real HW…
I hope the above explanation was clear/understandable?
Søren
<BR
I'd need to consult the C standard, but I'm fairly certain that
marking a pointer as volatile only means that the value that pointer
points to cannot be cached in a register. It shouldn't change the
compilers ability to reorder things. That is what write barriers and
synchronization instructions are for.
a volatile pointer by definition is a memory adress containing a pointer
that may be changed from outside current program flow (eg. annother
task, isr routine)
this is not the same as a pointer to a volatile memory location (or any
other type).
but I also don't know what you get when you succeed in defining a
pointer to a volatile function.
Beside: I've never seen or heard of it before... (probably due to my
lack of experience with gcc:-))
Anyway the C compiler is not allowed to change order of C instructions
(";" separated...).
This has nothing to do with volatiles being used within inside each
instruction.
The order inside a single instruction of course is up to the compiler.
For easier understanding: The volatile keyword shall cause the compiler
to directly read the contens of the variables memory location before
using it within a C instruction (not the same as operation!), and must
write the contens immidiatly back to the memory location when finishing
the instruction.
Thus it can not be kept in a working register,- in turn declaring
somthing "volatile register" must give a compilation error.
When it comes to CPUs with caching and pipelines, or different execution
units that can act in parallel then it is up to the compiler to ensure
that the result has established before reading it in next instruction. -
In this case think of volatile hardware register that even change
without program flow... It must be ensured these are not being cached.
However there are processors that can not directly modify memory contens...
For each C instruction these will copy the contens to a working register
(ex. accumulator) modify it according to the C instruction and write it
back before the next instruction is executed.
Assuming : volatile int a;
a++;a++; // this will two times (read value - increase value, write
value) from/to memory
where as:
a = (a * 5) + (a * 3) + a; // this shall (but may not) once read value
- calculate - write value;
The latter one largely depends on implementation of the compiler, some
implementers of compilers consider it more safe to read a each time from
memory, even within one C instruction.
Keep this in mind when porting between platforms.
So if you intentionally mean:
get value - calculate result - write result to value without getting
messed up from interrupt service routine then _do_ code this way...
*prevent isr / dispatching*
Get value - calculate - write value;
*enable isr / dispatching*
even the instruction
a++;
although looking quite harmless may not be safe against interrupt
service routines,- if 'a' is not atomic on the specific platform you may
end up in big trouble.
Rick Altherr schrieb:
>
> Hi Rick,
>
> a volatile pointer by definition is a memory adress containing a
> pointer
> that may be changed from outside current program flow (eg. annother
> task, isr routine)
> this is not the same as a pointer to a volatile memory location (or
> any
> other type).
>
Right, all of the register macros should be defined as volatile
pointers.
> but I also don't know what you get when you succeed in defining a
> pointer to a volatile function.
> Beside: I've never seen or heard of it before... (probably due to my
> lack of experience with gcc:-))
>
No idea, but a pointer to a volatile function isn't necessary in this
case.
> Anyway the C compiler is not allowed to change order of C instructions
> (";" separated...).
> This has nothing to do with volatiles being used within inside each
> instruction.
> The order inside a single instruction of course is up to the compiler.
>
The C compiler isn't allowed to change the semantic order of C
"instructions." That does allow the C compiler to intermix two C
"instructions" as long as they are completed in the semantic order.
That is exactly what the C compiler is doing in this case. For ARM, a
*(unsigned *)REGISTER = 0x1 C "instruction" assembles down to 2 ASM
instructions: a mov immediate and a store. The subsequent *(unsigned
*)REGISTER2 = arg assembles as a single instruction since arg is
already in a register. Since the two C "instructions" share no data,
they have no semantic ordering between the two operations. Thus, the
compiler intermixes them to avoid a pipeline bubble. Now, if the
registers were marked with volatile, aliasing rules may come into play
and the compiler may not be allowed to assume that the two pointers
are not equivalent and thus a data dependency enforces a semantic order.
> When it comes to CPUs with caching and pipelines, or different
> execution
> units that can act in parallel then it is up to the compiler to ensure
> that the result has established before reading it in next
> instruction. -
> In this case think of volatile hardware register that even change
> without program flow... It must be ensured these are not being cached.
>
Right, but in this case there is no data dependency so there is no
semantic ordering between the two register operations at the C level.
C is unaware of the hardware side effects of the operations.
All good points when dealing with volatiles. You still need to watch
when a specific set of operations must happen in a specific order.
You either need to use a volatile asm block or a synchronizing
instruction between the C operations.
Rick
I almost agree, but to me the compiler for a specific platform _must
know and deal with special considerations_ being imposed by the target cpu.
Ex: take a look at Motorola DSP CPU's (I remember 56k CPUs) and a 'good'
compiler for it.
If you take a look at the resulting ASM code you will find up to 3
instructions per row!
The compiler I used then would even take care of NOP'ping 2'nd 3'rd
instruction or following row(s) if it was necessary (It was the DSP 56k
Compiler on Next Station..).
Same (not three instructions per ASM row of course) goes for the
pentium, as for almost any last generation CPU that have Arithmethic Co
processors, pipelines and caches.
However with a new compiler, a newly ported (which is prone to errors)
or a compiler that is as universal as gcc you will find problems in
keeping up with these 'special' considerations for specific targets, -
then there is no way but using ASM routines...
Of course you are right when it comes down to special timing, and/or
sequence considerations for I/O devices of controllers, such as special
hardware registers interfering with pipelining and caching.
Ex. Watchdog registers...
The compiler must not know about these, so if it generates code that
does not meet the requirements of these registers, you will either have
to turn off optimization or provide an ASM function to do the job.
As for the example "volatile int a; a= (a * 5) + (a * 3) + a;"
I missed to provide a reason why the compiler should only once read a
for the above operation...
The reason is that we for example could be reading/writing a status
register,- if a is read multiple times and we are checking flags then we
will end up in trouble as it changes after first read...
Best practice is again to code what we want, so we take a copy before
processing it...
To my knowledge the 'C' definition is not very clear on how to deal with
volatiles inside a C-instruction. Because of this, there are different
interpretations resulting in non portable expressions if the target
and/or source variable is a hardware register...
Rick, thanks for the discussion there are not too much people to share
thoughts on this and I still feel I don't have everything in the right
order on these issues :-).
regards,
AdT
Rick Altherr schrieb:
On Dec 20, 2008, at 5:22 PM, Adrianes den Toom wrote:
Hi Rick,
a volatile pointer by definition is a memory adress containing a pointer
that may be changed from outside current program flow (eg. annother
task, isr routine)
this is not the same as a pointer to a volatile memory location (or any
other type).
Right, all of the register macros should be defined as volatile pointers.
Hi Toom,
Thanks for a fruitful discussion – As you wrote earlier not too many persons, knows too much about these issues in real detail J
> I doubt there is a clear statement for the C definition on how to handle the optimization of instruction sequence
> if all variables (not being connected) are declared volatile. The compiler can not know if the change of sequence
> will compromise the operation on HW
I/O control registers intended by the user because this is way out of its
scope.
This is indeed my main argument why the compiler needs to accept the order of the statements given by the user. I’m 99.999% confident, that the compiler isn’t allowed to change the order of (non connected) volatile read/writes, since this would make reliable HW peripheral accesses impossible from the C language…
With respect to your DSP56k headache I as well agree, although I have to admit that I haven’t touched such a DSP for ages, although I still clearly remember to complete un-understandable ASM in case you switch on all optimizations J
Best regards
Søren
>I’m 99.999% confident, that the
compiler isn’t allowed to change the order of (non connected) volatile
read/writes