I was wondering if anyone had suggestions as to what to look out for
when porting a Linux application written in C from x86 to the ARM core
in the OMAP3.
I have found several sites about issues with porting to ARM processors
in general, but wasn't sure how many still apply to the ARM
Cortex-A8.
I have found some information about unsigned char vs. signed char that
does still seem to apply.
I have also read a lot about alignment issues that I haven't really seen
in my tests.
Any ideas/suggestions/pointers are most welcome.
Ben Anderson
Those two are the big ones. Unless you are dealing with hardware I/O, you
probably don't need to worry about anything else.
If you aren't seeing alignment issues, then you must be writing some pretty
decent C code. In particular, anytime you are casting pointers you are probably
asking for trouble. x86 is very forgiving, ARM isn't.
Signed vs. unsigned is a compiler issue. You can throw switches in gcc to make
unspecified chars be signed or unsigned, it's just that the defaults for ARM are
opposite to x86. Again, writing portable C code is the way to avoid this
problem. Dan Saks has some columns on embedded.com about this topic.
b.g.
--
Bill Gatliff
bg...@billgatliff.com
Forgot one more: endianness.
Any time you do anything that assumes a certain byte-ordering for multi-byte
values, the differences between ARM and x86 will trip you up. This includes
casting integer pointers to character pointers and then dereferencing the char
pointer via []'s, using unions to pick apart bytes of an integer, memcpy'ing
structures to storage on one machine, and then reading them back on another machine.
The << and >> operators work properly, because the C language specification is
clear as to how they are supposed to work. If you look at the assembly language
and memory representations of values as you shift them, however, you'll see they
behave differently in ARM vs. x86.
As with the others, portable C code is your friend here too.
> On 2 okt, 21:31, John Beetem <johnbee...@yahoo.com> wrote:
>> You're in luck. Cortex-A8 allows unaligned access, according to the
>> Cortex-A8 Technical Reference at www.arm.com. Earlier ARM processors
>> did not allow unaligned access. Unaligned accesses usually cause a
>> loss of performance, so it's best to avoid them.
Unaligned accesses cost one cycle extra on Cortex-A8. This is much
less than manually shifting the bytes into place. Writing the code
properly avoids most unaligned data, but sometimes it's unavoidable,
for instance in networking code.
>> You're also in luck regarding little versus big-endian byte
>> numbering. ARM is an "either-endian" architecture, but specific
>> implementations usually hard-wire one or the other. As far as I can
>> tell, OMAP uses little-endian like the x86. (Or else it's either-
>> endian and software sets it to little-endian by default.) Little
>> versus big-endian would be an issue if you wanted to port to a big-
>> endian architecture like PowerPC.
>
> You can still run FPA code on the cortex which is mixed-endian. Why
> one would do that, I don't know :)
You can't run FPA code on a Cortex CPU. It doesn't have an FPA unit.
You probably meant soft-fpa.
--
Måns Rullgård
ma...@mansr.com
What is it about casting of pointers that is bad? Is it de-referencing
pointers to un-aligned data elements?
I don't mean to say that casting of any type in general is good. I am
just trying to get a firm grasp on what the issues are.
> Signed vs. unsigned is a compiler issue. You can throw switches in gcc to make
> unspecified chars be signed or unsigned, it's just that the defaults for ARM are
> opposite to x86. Again, writing portable C code is the way to avoid this
> problem. Dan Saks has some columns on embedded.com about this topic.
>
>
>
> b.g.
I looked through some of Dan Saks's articles, and in one he does
caution against casting pointers, but gives no real details.
I am still having a hard time finding this info via Google. So if anyone
knows of a site that goes through the alignment/pointer issue,
possibly with some examples, let me know. It would be much appreciated.
Thanks all of you for your input!
Ben Anderson
As I've tinkered with it, NEON provides major acceleration for media-intensive
activities. But for more general-purpose floating-point operations, it doesn't
outperform a true FPA.
So is that a yes, or a no? It depends. :)
Correct.
Generally speaking, if you have to cast a pointer then you must have lied to the
compiler at some point about what the object in question actually is. That's
almost always a bad idea, particularly so when writing portable code. Better to
clearly convey to the compiler what's going on, and let it and the rules of C
work to your advantage.
> I don't mean to say that casting of any type in general is good. I am
> just trying to get a firm grasp on what the issues are.
If you cast a char* to an int*, then you risk problems on machines where the
alignment restrictions are different for the two data types. x86 doesn't care,
so if your char wasn't word-aligned then when you dereference the casted pointer
to int, nothing bad happens. On ARM, however, you get an exception (*).
* - except on some of the newest ARM cores, apparently. I avoid the problem so
that I don't have to care which core I'm running on!
These kinds of problems can be nasty to test for, because they seem to travel
along with dynamically-allocated data structures, buffers, etc. that are very
sensitive to system state. So you might make several passes over the same code
successfully before the !kaboom! happens. Better to prove the code right by
inspection beforehand, which is only possible if you follow C's rules carefully.
>
> I looked through some of Dan Saks's articles, and in one he does
> caution against casting pointers, but gives no real details.
>
> I am still having a hard time finding this info via Google. So if anyone
> knows of a site that goes through the alignment/pointer issue,
> possibly with some examples, let me know. It would be much appreciated.
Just don't cast pointers, and you should be fine. Here's a bad one I see from
time to time:
int i;
char *ibuf = (char*)&i;
char a, b, c, d;
/* break apart i into its four bytes */
a = ibuf[0];
b = ibuf[1];
c = ibuf[2];
d = ibuf[3];
The values of a, b, c, and d will be different on x86 vs. ARM. Ditto if you go
the opposite way:
char cbuf[sizeof(int)];
int i;
i = cbuf[0];
i = (i << 8) + cbuf[1];
i = (i << 8) + cbuf[2];
i = (i << 8) + cbuf[3];
This code is actually portable if you load up cbuf the right way each time,
regardless of the endianness of the machine (which can be tricky). BUT, you
will still get different values for i if chars are signed on one machine but
unsigned on another. BUT BUT, you won't see the problem until the
most-significant bit of a byte in cbuf is set.
The two examples above aren't casts per se, but they are definitely
representation transformations of the same kind that casts cause. So I lump
them together.
Just watch out for stuff like that. You tend to know when you're in risky
territory, because the code starts to look very much like the above.
Forgot to mention: fix this by making cbuf an array of _unsigned_ char in all cases.
I would be surprised to see single precision code running faster on an
FPA unit than on a NEON unit at the same frequency. For double
precision, well NEON doesn't support it.
Laurent
Indeed.. and of course the Cortex-A8 also has VFPv3. VFP is the
successor to ARM's ancient FPA coprocessor, and can do single and double
precision operations. For general floating point you can use a mixture
of VFPv3 and NEON instructions, since these use the same registers.
--
Torne Wuff
to...@wolfpuppy.org.uk
The Cortex-A8 has a VFPlite unit, which implements the full VFPv3
instruction set but is not pipelined, so floating-point-heavy code runs
rather slowly. Thanks to the instruction FIFO, though, it shouldn't
impact execution speed too much when the bulk of the code runs in the
ARM pipeline.
When in "runfast" mode, single-precision VFP instructions execute in the
NEON pipeline. If using gcc, adding the flags -ffast-math -fno-math-errno
and avoiding double precision is advisable, assuming the results are still
accurate enough.
--
Måns Rullgård
ma...@mansr.com
The issue I am having is that an application developer is casting an
(unsigned char *) to a (double *) and then dereferencing it to read a double.
Of course this has historically worked fine on x86, but it hangs on
the Cortex-A8.
Here is a sample of what the application does.
--------------------------------------------------------------
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
#pragma pack(1)
typedef struct INFO {
    int fld1;
    int fld2;
    char fld3;
    double fld4;
} INFO;
#pragma pack()
INFO info = { 1, 2, '3', 4.0 };
int main(int argc, char **argv)
{
    unsigned char *data;
    double dbl;
    data = (unsigned char *)&info.fld4;
    printf("&info=0x%08x &info.fld4=0x%08x offsetof(INFO, fld4)=%d\n",
           (unsigned int)&info, (unsigned int)data, offsetof(INFO, fld4));
    dbl = *(double *)data;
    printf("fld4=%g dbl=%g\n", info.fld4, dbl);
    return 0;
}
--------------------------------------------------------------
The issue is that this works or doesn't work depending on small changes
to the code, for example adding extra variables in different places.
Removing or adding a printf also seems to affect it. Making the
double aligned also solves the problem (obviously).
When it doesn't work the program will just sit in what appears to be an
endless loop with cpu usage maxed out. It also causes the number in the
"User" field in /proc/cpu/alignment to change.
If I turn on warnings (echo 1 > /proc/cpu/alignment), I get the following
repeated over and over.
Alignment trap: lt (1211) PC=0x00008394 Instr=0xe8980300
Address=0x000105e5 FSR 0x001
It is interesting to note that the Address in the warning is the same as
the address of the data in the above example.
The hard part is that I can get this code to fail or succeed in various
ways. At some points the code has succeeded even though the data
variable was pointing to a misaligned address. This indicates to me that
it's not *just* a pointer-to-misaligned-data problem. But I have no
idea why it works sometimes and not others.
I can also get the code to work by turning on the automatic kernel fix
for unaligned accesses (echo 2 > /proc/cpu/alignment).
I am also not sure anymore what the real problem is. I am not sure if
there is some coding flaw I am not seeing, some ARM vs. x86 issue I am
not understanding, or a bug in gcc, the libs, etc. Is it possible
that the kernel is doing something that is needed for the earlier ARMs
but gets in the way on the Cortex-A8?
According to the Cortex-A8 documents (mentioned in a previous email), the
Cortex-A8 can access unaligned data. Does this also apply to pointers?
I know there are ways to get around doing it this way, but I have to
satisfy my curiosity as to what is going wrong, and I am not sure the
application developer will like me telling them to just do it differently
without an explanation as to why, especially when part of the draw of the
processor was its ability to handle unaligned data accesses.
Again, thanks for all the input, and hopefully others can benefit from this
discussion as well.
Ben Anderson
> Ben,
>
> Accessing data using a pointer shouldn't make a difference.
>
> I checked the Cortex-A8 Technical Reference Manual section 4.2 and it
> says that Cortex-A8 supports unaligned words and half-words, but
> doesn't say anything about doubles. It may be that double words must
> be aligned.
The alignment requirement depends on the instructions used for
load/store. The VLDR and VLDM instructions (formerly FLDD/FLDS and
FLDMD/FLDMS) require 4-byte alignment. The VLDn (n=1,2,3,4) NEON
instructions take an optional alignment specifier. If no alignment is
specified, none is required; otherwise, the specified alignment is
required. If the A bit in the System Control Register is set,
strict alignment is always required.
When compiling normal floating-point code for Cortex-A8, gcc tends to
issue VLDR instructions for loads.
--
Måns Rullgård
ma...@mansr.com
On 3 Oct 2008, at 04:29, Bill Gatliff wrote:
> Here's a bad one I see from
> time to time:
>
> int i;
> char *ibuf = (char*)&i;
> char a, b, c, d;
>
> /* break apart i into its four bytes */
> a = ibuf[0];
> b = ibuf[1];
> c = ibuf[2];
> d = ibuf[3];
>
>
> The values of a, b, c, and d will be different on x86 vs. ARM.
Apologies if this is a stupid question, but just to clarify: Do you
mean a, b, c, d will be different in their signedness (though, only
different if you say something like anInt = (int)a and would therefore
see their sign)? It looks instead like your example is talking about
endianness, and I'm having trouble seeing how that could be the case
going from one LE machine to another LE machine. (Unless GCC ARM does
something freaky I don't know about...) :)
Cheers,
Matt