porting from x86 to omap3 (arm)

Ben Anderson

Oct 2, 2008, 2:20:41 PM

to Beagle Board

All,

I was wondering if anyone had suggestions as to what to look out for
when porting a Linux application written in C from x86 to the ARM core
in the OMAP3.

I have found several sites about issues with porting to ARM processors
in general, but wasn't sure how many still apply to the ARM Cortex-A8.

I have found some information about unsigned char vs signed char that
does seem to still be the case.

I have also read a lot about alignment issues that I haven't really seen
in my tests.

Any ideas/suggestions/pointers are most welcome.

Ben Anderson

Bill Gatliff

Oct 2, 2008, 2:42:08 PM

to beagl...@googlegroups.com

Ben Anderson wrote:
> All,
>
> I was wondering if anyone had suggestions as to what to look out for
> when porting from a Linux application written in C for x86 to arm in the
> omap3.
>
> I have found several sites about issues dealing with porting to an arm
> processor in general but wasn't sure how many still apply to the arm
> that the arm cortex A8.
>
> I have found some information about unsigned char vs signed char that
> does seem to still be the case.
>
> I have also read a lot about alignment issues that I haven't really seen
> in my tests.

Those two are the big ones. Unless you are dealing with hardware i/o, then you
probably don't need to worry about anything else.

If you aren't seeing alignment issues, then you must be writing some pretty
decent C code. In particular, anytime you are casting pointers you are probably
asking for trouble. x86 is very forgiving, ARM isn't.

Signed vs. unsigned is a compiler issue. You can throw switches in gcc to make
unspecified chars be signed or unsigned, it's just that the defaults for ARM are
opposite to x86. Again, writing portable C code is the way to avoid this
problem. Dan Saks has some columns on embedded.com about this topic.
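
To make the signedness point concrete, here is a small sketch. The function names are made up for this example. It reports whether plain char is signed under the current compiler and flags, and shows that only the plain-char conversion can change its result between targets. The gcc switches that flip the default are -fsigned-char and -funsigned-char.

```c
#include <limits.h>

/* 1 if plain char is signed with this compiler and these flags, else 0. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Widening a plain char: the result for bytes >= 0x80 depends on the
 * signedness of plain char, so it can differ between x86 and ARM builds. */
int widen_plain(char c)
{
    return c;
}

/* Widening an unsigned char: always 0..255, whatever the target's default. */
int widen_unsigned(unsigned char c)
{
    return c;
}
```

Spelling out "signed char" or "unsigned char" wherever the byte value matters removes the dependency on the compiler default.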

b.g.
--
Bill Gatliff
bg...@billgatliff.com

Bill Gatliff

Oct 2, 2008, 3:29:17 PM

to beagl...@googlegroups.com

Forgot one more: endianness.

Any time you do anything that assumes a certain byte-ordering for multi-byte
values, the differences between ARM and x86 will trip you up. This includes
casting integer pointers to character pointers and then dereferencing the char
pointer via []'s, using unions to pick apart bytes of an integer, memcpy'ing
structures to storage on one machine, and then reading them back on another machine.

The << and >> operators work properly, because the C language specification is
clear as to how they are supposed to work. If you look at the assembly language
and memory representations of values as you shift them, however, you'll see they
behave differently in ARM vs. x86.

As with the others, portable C code is your friend here too.
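
A minimal sketch of what "portable" can look like here, assuming the data is exchanged as a 32-bit value in big-endian byte order (that choice, and the names put_be32 and get_be32, are illustrative and not from this thread). Because only shifts and masks are used, the result does not depend on how the host stores integers in memory.

```c
#include <stdint.h>

/* Write v into out[0..3] most-significant byte first (big-endian),
 * independent of how the host CPU lays out integers in memory. */
void put_be32(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)((v >> 24) & 0xFFu);
    out[1] = (unsigned char)((v >> 16) & 0xFFu);
    out[2] = (unsigned char)((v >> 8) & 0xFFu);
    out[3] = (unsigned char)(v & 0xFFu);
}

/* Rebuild the value from four bytes stored most-significant byte first. */
uint32_t get_be32(const unsigned char *in)
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}
```

Writing each field of a structure this way, instead of memcpy'ing the whole structure to storage, gives the same bytes on either kind of machine.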

John Beetem

Oct 2, 2008, 3:31:22 PM

to Beagle Board

You're in luck. Cortex-A8 allows unaligned access, according to the
Cortex-A8 Technical Reference at www.arm.com. Earlier ARM processors
did not allow unaligned access. Unaligned accesses usually cause a
loss of performance, so it's best to avoid them.

You're also in luck regarding little versus big-endian byte
numbering. ARM is an "either-endian" architecture, but specific
implementations usually hard-wire one or the other. As far as I can
tell, OMAP uses little-endian like the x86. (Or else it's either-
endian and software sets it to little-endian by default.) Little
versus big-endian would be an issue if you wanted to port to a big-
endian architecture like PowerPC.

As Bill mentioned, signed versus unsigned char is a compiler option
and always an issue when porting code from one compiler to another,
even on the same architecture. Other gotchas along these lines
include how structures are packed, and what byte size is used to
implement enums.
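
One way to check the packing and enum gotchas is to have both builds report the layout they actually chose. The sketch below uses a made-up struct and enum purely for illustration. Comparing the three numbers between the x86 and ARM builds shows whether the compilers agree. With gcc, the -fshort-enums option is one switch that changes the enum size.

```c
#include <stddef.h>

/* Hypothetical structure: a one-byte field followed by a double. */
struct sample {
    char tag;
    double value;   /* the compiler may insert padding before this field */
};

enum color { RED, GREEN, BLUE };

/* Offset the compiler chose for the double. */
size_t sample_value_offset(void) { return offsetof(struct sample, value); }

/* Total size of the structure, including any padding. */
size_t sample_size(void) { return sizeof(struct sample); }

/* Number of bytes used to store the enum. */
size_t color_size(void) { return sizeof(enum color); }
```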

John Beetem

Oct 2, 2008, 3:38:59 PM

to Beagle Board

Here's another subtle one: multi-byte character constants.

Modern C compilers let you define a multi-byte character constant like
'ab', which packs two characters into a single integer value (in
standard C the constant has type int). However, which character ends up
in which byte is not well specified, so it depends on the compiler.
Borland C and GCC do it differently even if both are compiling for x86.
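
A sketch of one way around this, assuming the goal is simply a two-character tag with a known layout: build the value explicitly instead of writing 'ab'. The helper name tag2 is made up for this example.

```c
/* Pack two characters into one value with explicit placement: hi goes in
 * bits 15..8, lo in bits 7..0, on every compiler and target. */
unsigned int tag2(char hi, char lo)
{
    return ((unsigned int)(unsigned char)hi << 8) |
           (unsigned int)(unsigned char)lo;
}
```

With this, tag2('a', 'b') always places 'a' in the upper byte, whatever the compiler does with 'ab'.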

Koen Kooi

Oct 2, 2008, 3:48:56 PM

to Beagle Board

On 2 Oct, 21:31, John Beetem <johnbee...@yahoo.com> wrote:
> You're in luck. Cortex-A8 allows unaligned access, according to the
> Cortex-A8 Technical Reference at www.arm.com. Earlier ARM processors
> did not allow unaligned access. Unaligned accesses usually cause a
> loss of performance, so it's best to avoid them.
>
> You're also in luck regarding little versus big-endian byte
> numbering. ARM is an "either-endian" architecture, but specific
> implementations usually hard-wire one or the other. As far as I can
> tell, OMAP uses little-endian like the x86. (Or else it's either-
> endian and software sets it to little-endian by default.) Little
> versus big-endian would be an issue if you wanted to port to a big-
> endian architecture like PowerPC.

You can still run FPA code on the cortex which is mixed-endian. Why
one would do that, I don't know :)

regards,

Koen

Måns Rullgård

Oct 2, 2008, 6:39:49 PM

to beagl...@googlegroups.com

Koen Kooi <koen...@gmail.com> writes:

> On 2 Oct, 21:31, John Beetem <johnbee...@yahoo.com> wrote:
>> You're in luck. Cortex-A8 allows unaligned access, according to the
>> Cortex-A8 Technical Reference at www.arm.com. Earlier ARM processors
>> did not allow unaligned access. Unaligned accesses usually cause a
>> loss of performance, so it's best to avoid them.

Unaligned accesses cost one cycle extra on Cortex-A8. This is much
less than manually shifting the bytes into place. Writing the code
properly avoids most unaligned data, but sometimes it's unavoidable,
for instance in networking code.
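
For the unavoidable cases, such as a field at an arbitrary offset in a packet buffer, one common approach is to copy the bytes out with memcpy rather than cast the pointer. This is a sketch only: load_u32_unaligned is a hypothetical helper name, and what instructions the compiler emits for the copy on a given core is up to the compiler.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value in host byte order from an address that may not be
 * 4-byte aligned. memcpy operates on bytes, so no alignment is assumed. */
uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```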

>> You're also in luck regarding little versus big-endian byte
>> numbering. ARM is an "either-endian" architecture, but specific
>> implementations usually hard-wire one or the other. As far as I can
>> tell, OMAP uses little-endian like the x86. (Or else it's either-
>> endian and software sets it to little-endian by default.) Little
>> versus big-endian would be an issue if you wanted to port to a big-
>> endian architecture like PowerPC.
>
> You can still run FPA code on the cortex which is mixed-endian. Why
> one would do that, I don't know :)

You can't run FPA code on a Cortex CPU. It doesn't have an FPA unit.
You probably meant soft-fpa.

--
Måns Rullgård
ma...@mansr.com

Ben Anderson

Oct 2, 2008, 7:21:51 PM

to beagl...@googlegroups.com

On Thu, 2008-10-02 at 13:42 -0500, Bill Gatliff wrote:
> Ben Anderson wrote:
> > All,
> >
> > I was wondering if anyone had suggestions as to what to look out for
> > when porting from a Linux application written in C for x86 to arm in the
> > omap3.
> >
> > I have found several sites about issues dealing with porting to an arm
> > processor in general but wasn't sure how many still apply to the arm
> > that the arm cortex A8.
> >
> > I have found some information about unsigned char vs signed char that
> > does seem to still be the case.
> >
> > I have also read a lot about alignment issues that I haven't really seen
> > in my tests.
>
> Those two are the big ones. Unless you are dealing with hardware i/o, then you
> probably don't need to worry about anything else.
>
> If you aren't seeing alignment issues, then you must be writing some pretty
> decent C code. In particular, anytime you are casting pointers you are probably
> asking for trouble. x86 is very forgiving, ARM isn't.
>

What is it about casting of pointers that is bad? Is it de-referencing
pointers to un-aligned data elements?

I don't mean to say that casting of any type in general is good. I am
just trying to get a firm grasp on what the issues are.

> Signed vs. unsigned is a compiler issue. You can throw switches in gcc to make
> unspecified chars be signed or unsigned, it's just that the defaults for ARM are
> opposite to x86. Again, writing portable C code is the way to avoid this
> problem. Dan Saks has some columns on embedded.com about this topic.
>
>
>
> b.g.

I looked through some of Dan Saks's articles, and in one he does caution
against casting pointers, but gives no real details.

I am still having a hard time finding this info via Google. So if anyone
knows of a site that goes through the alignment/pointer issue,
possibly with some examples, let me know. It would be much appreciated.

Thanks all of you for your input!

Ben Anderson

John Beetem

Oct 2, 2008, 8:01:51 PM

to Beagle Board

> I am still having a hard time finding this info via Google. So if anyone
> knows of some site that goes through the alignment/pointer issue with
> possibly some examples let me know. It would be much appreciated.

Wikipedia to the rescue: http://en.wikipedia.org/wiki/Data_structure_alignment

Which points to this nifty page: http://www.eventhelix.com/RealtimeMantra/ByteAlignmentAndOrdering.htm

For tech stuff, I always start with Wikipedia.

glennswest

Oct 2, 2008, 10:12:27 PM

to Beagle Board

Actually, most of the "fun" is in the cross-compiling setup.
If you attach a SATA-based disk or network-mount a nice shared drive,
there is no reason you cannot build a native compiler. (Cross tool
supports it.)
Then you compile on the Beagle itself, and most of the cross-compile
issues go away.
Of course, if you use something like arm-debian, most packages are
available as binaries anyway (including the compiler).

It may be slower than your quad core, but not by that much.

Also, I like working in Ruby for GUI development, which the OMAP can
run easily. And once Ruby is compiled, the rest is interpreted.

Albert Nguyen

Oct 2, 2008, 10:49:27 PM

to beagl...@googlegroups.com

Isn't the NEON coprocessor kinda like an FPA?

Bill Gatliff

Oct 2, 2008, 11:11:12 PM

to beagl...@googlegroups.com

Albert Nguyen wrote:
> Isn't the NEON coprocessor kinda like an FPA?

As I've tinkered with it, NEON provides major acceleration for media-intensive
activities. But for more general-purpose floating-point operations, it doesn't
outperform a true FPA.

So is that a yes, or a no? It depends. :)

Bill Gatliff

Oct 2, 2008, 11:29:40 PM

to beagl...@googlegroups.com

Ben Anderson wrote:
>
> What is it about casting of pointers that is bad? Is it de-referencing
> pointers to un-aligned data elements?

Correct.

Generally speaking, if you have to cast a pointer then you must have lied to the
compiler at some point about what the object in question actually is. That's
almost always a bad idea, particularly so when writing portable code. Better to
clearly convey to the compiler what's going on, and let it and the rules of C
work to your advantage.

> I don't mean to say that casting of any type in general is good. I am
> just trying to get a firm grasp on what the issues are.

If you cast a char* to an int*, then you risk problems on machines where the
alignment restrictions are different for the two data types. x86 doesn't care,
so if your char wasn't word-aligned, nothing bad happens when you dereference
the cast pointer as an int. On ARM, however, you get an exception (*).

* - except on some of the newest ARM cores, apparently. I avoid the problem so
that I don't have to care which core I'm running on!

These kinds of problems can be nasty to test for, because they seem to travel
along with dynamically-allocated data structures, buffers, etc. that are very
sensitive to system state. So you might make several passes over the same code
successfully before the !kaboom! happens. Better to prove the code right by
inspection beforehand, which is only possible if you follow C's rules carefully.

>
> I looked through some of Dan Saks articles and in one he does mention
> caution against casting of pointers but no real details.
>
> I am still having a hard time finding this info via Google. So if anyone
> knows of some site that goes through the alignment/pointer issue with
> possibly some examples let me know. It would be much appreciated.

Just don't cast pointers, and you should be fine. Here's a bad one I see from
time to time:

int i;
char *ibuf = (char*)&i;
char a, b, c, d;

/* break apart i into its four bytes */
a = ibuf[0];
b = ibuf[1];
c = ibuf[2];
d = ibuf[3];

The values of a, b, c, and d will be different on x86 vs. ARM. Ditto if you go
the opposite way:

char cbuf[sizeof(int)];
int i;

i = cbuf[0];
i = (i << 8) + cbuf[1];
i = (i << 8) + cbuf[2];
i = (i << 8) + cbuf[3];

This code is actually portable if you load up cbuf the right way each time,
regardless of the endianness of the machine (which can be tricky). BUT, you
will still get different values for i if chars are signed on one machine but
unsigned on another. BUT BUT, you won't see the problem until the
most-significant bit of a byte in cbuf is set.

The two above examples aren't casts per se, but they are definitely
representation transformations of the same kind that casts cause. So I lump
them together.

Just watch out for stuff like that. You tend to know when you're in risky
territory, because the code starts to look very much like the above.

Bill Gatliff

Oct 3, 2008, 12:33:34 AM

to beagl...@googlegroups.com

Bill Gatliff wrote:
> BUT, you will still get different values for i if char's are signed on one machine but
> unsigned on another. BUT BUT, you won't see the problem until the
> most-significant bit of a byte in cbuf is set.

Forgot to mention: fix this by declaring cbuf as an array of _unsigned_ char in all cases.
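
To see why the element type matters, here is a reduced sketch of the cbuf arithmetic using two bytes instead of four. The function names are made up, and the signedness is spelled out explicitly so both behaviours can be compared on the same machine.

```c
/* The cbuf arithmetic reduced to two bytes, with signed elements:
 * a byte with its top bit set is negative and pulls the result down. */
int combine_signed(const signed char *b)
{
    return b[0] * 256 + b[1];
}

/* The same arithmetic with unsigned elements: each byte is always 0..255,
 * so the result is the same whatever the target's default for plain char. */
unsigned int combine_unsigned(const unsigned char *b)
{
    return ((unsigned int)b[0] << 8) + b[1];
}
```

For the bytes 0x01, 0x80 the unsigned version yields 0x0180, while the signed version computes 256 + (-128) and yields 128. That is the difference that only shows up once a most-significant bit is set.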

Laurent Desnogues

Oct 3, 2008, 2:30:50 AM

to beagl...@googlegroups.com

On Fri, Oct 3, 2008 at 5:11 AM, Bill Gatliff <bg...@billgatliff.com> wrote:
>
> Albert Nguyen wrote:
>> Isn't the NEON coprocessor kinda like an FPA?
>
> As I've tinkered with it, NEON provides major acceleration for media-intensive
> activities. But for more general-purpose floating-point operations, it doesn't
> outperform a true FPA.
>
> So is that a yes, or a no? It depends. :)

I would be surprised to see single precision code running faster on an
FPA unit than on a NEON unit at the same frequency. For double
precision, well NEON doesn't support it.

Laurent

Måns Rullgård

Oct 3, 2008, 5:09:48 AM

to beagl...@googlegroups.com

No more so than an ARM processor is kinda like a PPC.

--
Måns Rullgård
ma...@mansr.com

Torne Wuff

Oct 3, 2008, 6:03:09 AM

to beagl...@googlegroups.com

Indeed.. and of course the Cortex-A8 also has VFPv3. VFP is the
successor to ARM's ancient FPA coprocessor, and can do single and double
precision operations. For general floating point you can use a mixture
of VFPv3 and NEON instructions, since these use the same registers.

--
Torne Wuff
to...@wolfpuppy.org.uk

Måns Rullgård

Oct 3, 2008, 6:16:06 AM

to beagl...@googlegroups.com

The Cortex-A8 has a VFPlite unit, which implements the full VFPv3
instruction set but is not pipelined, so floating-point-heavy code
runs rather slowly. Thanks to the instruction FIFO, though, it shouldn't
impact execution speed too much when the bulk of the code runs in the
ARM pipeline.

When in "runfast" mode, single-precision VFP instructions execute in the
NEON pipeline. If using gcc, adding the flags -ffast-math -fno-math-errno,
and avoiding double precision is advisable, assuming results are still
accurate enough.

--
Måns Rullgård
ma...@mansr.com

Ben Anderson

Oct 3, 2008, 1:10:33 PM

to Beagle Board

The issue I am having is that an application developer is casting an
(unsigned char *) to a (double *) and then dereferencing it to read a double.
Of course this has historically worked fine on x86, but it hangs on
the Cortex-A8.

Here is a sample of what the application does.

--------------------------------------------------------------
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>

#pragma pack(1)
typedef struct INFO {
    int fld1;
    int fld2;
    char fld3;
    double fld4;
} INFO;
#pragma pack()

INFO info = { 1, 2, '3', 4.0 };

int main(int argc, char **argv)
{
    unsigned char *data;
    double dbl;

    data = (unsigned char *)&info.fld4;
    printf("&info=0x%08x &info.fld4=0x%08x offsetof(INFO, fld4)=%d\n",
           (unsigned int)&info, (unsigned int)data, offsetof(INFO, fld4));

    dbl = *(double *)data;

    printf("fld4=%g dbl=%g\n", info.fld4, dbl);

    return 0;
}
--------------------------------------------------------------

The issue is that this works or doesn't work depending on small changes
to the code, for example adding extra variables in different places.
Removing or adding a printf also seems to affect it. Making the
double aligned also solves the problem (obviously).

When it doesn't work the program will just sit in what appears to be an
endless loop with cpu usage maxed out. It also causes the number in the
"User" field in /proc/cpu/alignment to change.

If I turn on warning (echo 1 > /proc/cpu/alignment) I get the following
repeated over and over.

Alignment trap: lt (1211) PC=0x00008394 Instr=0xe8980300
Address=0x000105e5 FSR 0x001

It is interesting to note that the Address in the warning is the same as
the address of the data in the above example.

The hard part is that I can get this code to fail/succeed in various
ways. The code has at some points succeeded even though the data
variable is pointing to a misaligned address. This indicates to me that
it's not *just* a pointer-to-misaligned-data problem. But I have no
idea why it works sometimes and not others.

I can also get the code to work by turning on the automatic kernel fix
for unaligned accesses (echo 2 > /proc/cpu/alignment).

I am also not sure anymore what the real problem is. I am not sure if
there is some coding flaw I am not seeing, some ARM vs. x86 issue I am
not understanding, or if it is a bug in gcc/libs/etc. Is it possible
that the kernel is doing something that is needed for the earlier ARMs
but gets in the way for the Cortex-A8?

According to the Cortex-A8 documents (mentioned in a previous email), the
Cortex-A8 can access unaligned data. Does this also apply to pointers?

I know there are ways to get around doing it this way, but I have to
satisfy my curiosity as to what is going wrong, and I am not sure the
application developer will like me telling them to just do it differently
without an explanation as to why, especially when part of the draw of the
processor was its ability to handle unaligned data access.

Again, thanks for all the input, and hopefully others can benefit from this
discussion as well.

Ben Anderson

John Beetem

Oct 3, 2008, 7:36:06 PM

to Beagle Board

Ben,

Accessing data using a pointer shouldn't make a difference.

I checked the Cortex-A8 Technical Reference Manual section 4.2 and it
says that Cortex-A8 supports unaligned words and half-words, but
doesn't say anything about doubles. It may be that double words must
be aligned.

In your example, "#pragma pack(1)" aligns all fields to byte
boundaries. This aligns double field fld4 to an odd address, since it
immediately follows the char. This is consistent with your alignment
trap at address 0x000105E5 which is also odd.

Is there a particular reason that the structures must be packed to
byte boundaries? If not, it may work to remove the #pragma pack
directives and let the compiler align fields to its preferred
boundaries. Your code will run faster as well.

The 8088 processor in the original IBM PC had an 8-bit data bus, so
there was no penalty for non-aligned access. The x86 compilers
therefore preferred to pack structures as tightly as possible to fit
into those 64KB segments.

John

Måns Rullgård

Oct 3, 2008, 8:12:20 PM

to beagl...@googlegroups.com

John Beetem <johnb...@yahoo.com> writes:

> Ben,
>
> Accessing data using a pointer shouldn't make a difference.
>
> I checked the Cortex-A8 Technical Reference Manual section 4.2 and it
> says that Cortex-A8 supports unaligned words and half-words, but
> doesn't say anything about doubles. It may be that double words must
> be aligned.

The alignment requirement depends on the instructions used for
load/store. The VLDR and VLDM instructions (formerly FLDD/FLDS and
FLDMD/FLDMS) require 4-byte alignment. The VLDn (n=1,2,3,4) NEON
instructions take an optional alignment specifier. If no alignment is
specified, none is required; otherwise, the specified alignment is
required. If the A bit in the System Control Register is set,
strict alignment is always required.

When compiling normal floating-point code for Cortex-A8, gcc tends to
issue VLDR instructions for loads.
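
Given that, one way the sample application could avoid the trap is to stop dereferencing a cast (double *) and copy the bytes instead. This is a sketch under that assumption, not necessarily the only fix. The INFO layout is taken from Ben's sample, while load_double_unaligned and read_fld4 are made-up helper names.

```c
#include <stddef.h>
#include <string.h>

#pragma pack(1)
typedef struct INFO {
    int fld1;
    int fld2;
    char fld3;
    double fld4;    /* lands at an odd offset because of pack(1) */
} INFO;
#pragma pack()

/* Read a double from an address that may not be suitably aligned.
 * memcpy works on bytes, so no aligned floating-point load is assumed. */
double load_double_unaligned(const unsigned char *data)
{
    double d;
    memcpy(&d, data, sizeof d);
    return d;
}

/* Fetch fld4 through a byte pointer, as the application does, but without
 * dereferencing a cast (double *). */
double read_fld4(const INFO *info)
{
    const unsigned char *base = (const unsigned char *)info;
    return load_double_unaligned(base + offsetof(INFO, fld4));
}
```

Reading info.fld4 by name should also be safe, since the compiler can see that the structure is packed. It is the cast through a plain (double *) that hides the misalignment from it.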

--
Måns Rullgård
ma...@mansr.com

Matt Evans

Oct 3, 2008, 6:20:04 AM

to beagl...@googlegroups.com

Hi Bill,

On 3 Oct 2008, at 04:29, Bill Gatliff wrote:

> Here's a bad one I see from
> time to time:
>
> int i;
> char *ibuf = (char*)&i;
> char a, b, c, d;
>
> /* break apart i into its four bytes */
> a = ibuf[0];
> b = ibuf[1];
> c = ibuf[2];
> d = ibuf[3];
>
>
> The values of a, b, c, and d will be different on x86 vs. ARM.

Apologies if this is a stupid question, but just to clarify: Do you
mean a, b, c, d will be different in their signedness (though, only
different if you say something like anInt = (int)a and would therefore
see their sign)? It looks instead like your example is talking about
endianness, and I'm having trouble seeing how that could be the case
going from one LE machine to another LE machine. (Unless GCC ARM does
something freaky I don't know about...) :)