Can someone simply explain the difference between ARM and THUMB
instructions and which is better?
I'm a bit confused and not finding any good answers.
James
Neither is simply "better", or they wouldn't both exist.
>>> I'm a bit confused and not finding any good answers.
>>
>> http://www.arm.com/products/CPUs/architecture.html
>
> http://www.arm.com/products/CPUs/archi-thumb.html
The above link tells you the advantages of Thumb over ARM, but not the other
way round.
Until ARMv6T2 (with Thumb2), ARM instructions are still required for certain
functions, notably exception handling. The ARM instruction set is also
naturally much more flexible, since it has 32 bits to encode in. The
notable limitations in Thumb are (a short sketch follows the list):
- no condition codes on most Thumb instructions (available on almost all ARM
instructions)
- most Thumb data processing instructions set status flags (optional in most
ARM instructions)
- most Thumb instructions only have access to half the register file
- inline shift functions are mostly unavailable with Thumb instructions.
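To illustrate the condition-code, flag and shifter points, a sketch in
pre-UAL syntax (register choices are arbitrary):
    ADDCS   r0, r0, r1, LSL #2  ; ARM: conditional add with an inline
                                ; shift, flags left untouched
versus Thumb:
    BCC     skip                ; no condition field, so branch around
    LSL     r2, r1, #2          ; separate shift, and it sets the flags
    ADD     r0, r0, r2          ; low registers only; also sets flags
skip: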
HTH
John
--
John Penton, posting as an individual unless specifically indicated
otherwise.
Somewhat OT, but neither ARM nor Thumb instructions are able to
provide a 32-bit immediate operand[*], right?
[*] in which you have freedom to choose independently every bit.
> Until ARMv6T2 (with Thumb2), ARM instructions are still required for
> certain functions, notably exception handling. The ARM instruction set is
> also naturally much more flexible, since it has 32-bits to encode in. The
> notable limitations in Thumb are:
> - no condition codes on most Thumb instructions (available on almost all
> ARM instructions)
> - most Thumb data processing instructions set status flags (optional in
> most ARM instructions)
> - most Thumb instructions only have access to half the register file
> - inline shift functions are mostly unavailable with Thumb instructions.
Don't forget:
- small branch ranges for BL/B/Bcc (need branch veneers in large
functions/programs)
- no immediates on many instructions (ANDS, ORRS etc), small ranges on
others (ADD, LDR) - means more literal pool loads
- most instructions are 2 operand rather than 3 (means extra moves - see
the sketch after this list)
- only immediate and register index addressing modes (no pre/post increment)
- no coprocessor/media/dsp instructions
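A sketch of the 2-operand point (registers arbitrary):
    AND     r0, r1, r2          ; ARM: 3 operands, r1 and r2 preserved
    MOV     r0, r1              ; Thumb: AND is 2-operand, so copy
    AND     r0, r2              ; first, then combine - one extra
                                ; instruction, and the flags are set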
Thumb-2 solves these problems by fusing ARM with Thumb.
Wilco
You are correct. The boffins at ARM couldn't think of any way
to make a 32-bit immediate fit in 32 bits while also using the
same 32 bits to encode some instruction.
You can get a 32 bit constant by loading from a literal pool,
but this is obviously slower.
My question did not imply that they have to share the same 32-bit
space. Many 8- and 16-bit cores provide 8- and 16-bit immediate
operands at the immediately following positions in the code memory. I
guess my question can be reworded as: all ARM instructions are one
word, and all Thumb instructions are one half-word, right?
I'm just curious: Why couldn't they allow having a 32-bit immediate
operand right after the 32-bit instruction?
>You can get a 32 bit constant by loading from a literal pool,
>but this is obviously slower.
Yes. What I'm saying is a particular case of this, but having the
operand right next to the instruction (rather than far away, in a
table) is convenient, when you code in assembly.
Best,
Thanks, I at least knew that. It's just that the compiler I use supports
both ARM and Thumb settings for how to compile the code, and I didn't
understand the difference.
>
>
>>>>I'm a bit confused and not finding any good answers.
>>>
>>>http://www.arm.com/products/CPUs/architecture.html
>>
>>http://www.arm.com/products/CPUs/archi-thumb.html
>
>
Thanks everyone for these two links, they describe a lot about the
architecture and I think I know what the R means in the CPU part number now.
Just a little background..... about a year ago my company wanted to
upgrade our product to make it faster, have more FLASH, RAM, etc. The
ARM solution seemed the most promising. I researched a bit and ended
up with an AT91R40008 (Atmel ARM) processor. We added 4MB of external
FLASH, 2MB of external SRAM, all running at 73MHz.
My company was also happy with the development tools we were using with
the AVR series (IAR Embedded Workbench; they have excellent support!).
They also have a version for the ARM processor, which made it easy to
port code from one platform to the other.
I spent about 2 months developing with an EB40A development board and
managed to learn a lot of cool things about the ARM processor we are now
using.
I've only recently gotten back into the research end when I noticed
the settings in the compiler for ARM/Thumb instructions.... the only
thing is that documentation is scarce.
Thanks,
James
Yes, except for Thumb BL (IIRC). Thumb-2 changes that.
> I'm just curious: Why couldn't they allow having a 32-bit
> immediate operand right after the 32-bit instruction?
It's not the 'RISC way'.
I assume that pulling raw data in through the instruction bus is
poor in terms of gate count and programming model.
The 'RISC way' is to make the life of the programmer slightly
harder in order to make the chip simpler, because you only
write the code once, but execute it a lot.
Thumb BL mostly executes as a pair of 16-bit instructions on ARM7TDMI, but
is treated by the architecture as a single 32-bit instruction.
Thumb-2 also allows many more literals to be loaded in a single instruction
without using the literal pool.
> >>You can get a 32 bit constant by loading from a literal
> >>pool, but this is obviously slower.
> >
> > Yes. What I'm saying is a particular case of this, but
> > having the operand right next to the instruction (rather
> > than far away, in a table) is convenient, when you code in
> > assembly.
The assembler should give you an easy way to do this. The common syntax for
ARM is:
LDR Rd, =<literal>
for which the assembler will generate a MOV or similar instruction if
possible, otherwise a load from a literal pool (which the assembler will
create for you). You just need to make sure the assembler can put literal
pools close enough (usually not a problem).
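For example (a sketch; the constants are arbitrary, and LTORG is the
armasm directive that places a literal pool):
    LDR     r0, =0xFF           ; fits an immediate: assembler emits
                                ; MOV r0, #0xFF
    LDR     r1, =0x12345678     ; no valid immediate encoding, so the
                                ; assembler puts the value in a literal
                                ; pool and emits a PC-relative LDR
    LTORG                       ; place a literal pool here if needed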
Mike.
>My company was also happy with the development tools we were using with
>the AVR series (IAR Embedded Workbench; they have excellent support!).
I also tried their 30-day eval version for ARM. Nice (although not as
nice as CrossWorks, in my opinion), but the debugger crashes many
times when using a parallel port Wiggler compatible ARM-JTAG device. I
guess you were using a USB version, right?
Are you sure? My understanding was that Thumb instruction data is _always_
(even in Thumb2) treated as 16-bit quantities (BL and Thumb2 included) -
particularly when it comes to determining which byte is which in a
big-endian system.
> Thumb-2 also allows many more literals to be loaded in a single
> instruction without using the literal pool.
Yup. Indeed, v6T2 and later allow any 32-bit literal to be loaded in two
(32-bit) instructions, in ARM or Thumb state, which is equivalent to what
the OP was after.
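For example, with the v6T2 MOVW/MOVT pair (a sketch; the constant is
arbitrary):
    MOVW    r0, #0x5678         ; r0 = 0x00005678
    MOVT    r0, #0x1234         ; r0 = 0x12345678 (writes the top 16
                                ; bits, bottom half unchanged)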
>>>> You can get a 32 bit constant by loading from a literal
>>>> pool, but this is obviously slower.
>>>
>>> Yes. What I'm saying is a particular case of this, but
>>> having the operand right next to the instruction (rather
>>> than far away, in a table) is convenient, when you code in
>>> assembly.
>
> The assembler should give you an easy way to do this. The common
> syntax for ARM is:
>
> LDR Rd, =<literal>
>
> for which the assembler will generate a MOV or similar instruction if
> possible, otherwise a load from a literal pool (which the assembler
> will create for you). You just need to make sure the assembler can
> put literal pools close enough (usually not a problem).
Are you referring to "the R profile for real-time systems", or something
else that I have missed?
If the former, then maybe not - ARM have yet to release any v7R processors.
A/R/M was not applied prior to ARMv7.
I haven't had any problems other than that.
Well, a 32-bit Thumb-2 instruction is architecturally and
microarchitecturally one instruction. However, the instructions are defined
in terms of two 16-bit halfwords.
-p
--
Paul Gotch
Software Engineer
Development Systems ARM Limited.
Indeed. Prior to Thumb-2 this was architecturally the case ("These Thumb
instructions [the BL/BLX pairs] must always occur in [pairs]") -- that is,
you're architecturally not allowed to put other instructions between the two
halfwords. But the encoding used meant that it was possible to build an
implementation where that would work. That's not the case in Thumb-2: the
encodings do not allow for this kind of implementation.
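(For reference, a sketch from memory of how the pre-Thumb-2 pair works;
check the ARM ARM for the exact encoding:)
    ; first halfword:   LR := PC + (sign_extend(offset_hi) << 12)
    ; second halfword:  PC := LR + (offset_lo << 1)
    ;                   LR := address of next instruction, bit 0 set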
Anyway, that's just a diversion from the point! :-)
Mike.
--
Signature in for repair.
> codic <e@e.e> wrote in
> news:6io8o19hneoag01or...@4ax.com:
...
> > I'm just curious: Why couldn't they allow having a 32-bit
> > immediate operand right after the 32-bit instruction?
>
> It's not the 'RISC way'.
It's not a religious thing :-)
> I assume that pulling raw data in through the instruction bus is
> poor in terms of gate count and programming model.
> The 'RISC way' is to make the life of the programmer slightly
> harder in order to make the chip simpler, because you only
> write the code once, but execute it a lot.
Any immediate mode is shipping data in from the instruction bus.
Fixed length instructions mean you always know where the next
instruction starts. This helps enormously in a pipeline; the
subsequent decode can begin before the current instruction is
complete. [Okay - that's a simplification.]
Now consider superscalar issue - how do you know what is an
instruction and what is an address field? Either wait or speculate.
Fast, variable length instruction processors tend to get hot.
-- Jim Garside
So why not use ARM opcodes all the time?
Because it is slower, but only in narrow-bus systems.
If (like Philips) you feed the core with a 128-bit FLASH bus, then
there is no speed penalty with ARM. There is a size penalty, but ARM
code can do more with fewer opcodes - so how much of a penalty depends
on how good/bad you want it to look.
> Thumb-2 solves these problems by fusing ARM with Thumb.
but adds some minor ones
** it is currently vaporware, with full details still under NDA
** it is not binary compatible - so might fly as well as the Philips
XA51. That was their New/better 80C51 "just recompile the source, boys.."
Yes, Thumb2 fixes many of the issues, and makes a better
microcontroller than the old ARM7 processor, but binary compatible it is
not, and history is littered with 'better technically' offerings, that
never made it over the software inertia hurdle.
Watching it with interest.
-jg
>Well, a 32-bit Thumb-2 instruction is architecturally and
>microarchitecturally one instruction. However, the instructions are defined
>in terms of two 16-bit halfwords.
And can we download a specification for Thumb-2 yet? A quick
trawl of the ARM site and Google revealed no technical specs.
Time for a new ARM ARM?
Stephen
--
Stephen Pelc, steph...@mpeforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeforth.com - free VFX Forth downloads
> Wilco Dijkstra wrote:
...
> > Thumb-2 solves these problems by fusing ARM with Thumb.
>
> but adds some minor ones
> ** it is currently vaporware, with full details still under NDA
> ** it is not binary compatible - so might fly as well as the Philips
> XA51. That was their New/better 80C51 "just recompile the source, boys.."
>
> Yes, Thumb2 fixes many of the issues, and makes a better
> microcontroller than the old ARM7 processor, but binary compatible it is
> not, and history is littered with 'better technically' offerings, that
> never made it over the software inertia hurdle.
I believe it -is- binary compatible with Thumb, so an ARM/Thumb-2
processor will still run old object code. There is some information
accompanying the free, time-bombed demo software.
The version of these tools I looked at didn't quite assemble Thumb and
Thumb-2 identically, but close enough that differences are probably
bugs. (The assembler and disassembler disagreed in places!) The
instruction set design clearly works around the existing codings.
-- Jim Garside
Note: all the information I have on this has been deduced from open
sources; I have no `inside' information about Thumb-2.
The ARM1156T2-S exists. The ARM ARM has not been published yet but the
TRM for the ARM1156 is available from
http://www.arm.com/pdfs/DDI0338C_arm1156t2s_r0p0_trm.pdf
An instruction reference exists in the Assembler Guide for RVDS 2.2
which is available from
http://www.arm.com/pdfs/DUI0204F_rvct_assembler_guide.pdf
Full instruction set encoding details are available from ARM under NDA
until the ARM ARM covering Thumb-2 is published.
> ** it is not binary compatible
We've had this argument before. The ARM1156T2-S is fully backwards
compatible with ARM and THUMB binaries.
However binaries compiled for Thumb-2 will only execute on a processor
implementing Thumb-2, just as binaries compiled specifically for an
ARMv6 processor will not execute on an ARMv5 processor.
A Thumb-2 only processor exists in the shape of ARM Cortex-M3.
Which confirms what I said: "full details still under NDA".
>>** it is not binary compatible
>
>
> We've had this argument before. The ARM1156T2-S is fully backwards
> compatible with ARM and THUMB binaries.
>
> However binaries compiled for Thumb-2 will only execute on a processor
> implementing Thumb-2, just as binaries compiled specifically for an
> ARMv6 processor will not execute on an ARMv5 processor.
>
> A Thumb-2 only processor exists in the shape of ARM Cortex-M3.
.. and that Thumb-2-only is thus NOT binary compatible with any ARM
core that supports Thumb and ARM opcodes.
You can see the confusion that results from the poor clarity of the
names. I guess ARM marketing wants to 'brand' Cortex, and they have
trouble grasping that not all Cortex cores are binary compatible, or
perhaps believe they can just talk over such issues.
Philips tried that, and yes, the parts still exist, but no, they never
hit critical mass.
-jg
Apart from the increased hardware complexity in decode (as already
mentioned) this would also increase codesize and therefore reduce
performance and increase power consumption. This is because constants
would need to be duplicated in every instruction in which they are used.
The codesize cost of inline literals as you describe can be quite large;
for example, all instructions accessing global variables would double in
size. When a literal pool is used, a literal can be reused several times by
different instructions (even in different functions). On a cached system
these literals end up in the cache and so are cheap to access.
It is also possible to use one literal for several purposes. A common
optimization is to use a single literal (base pointer) to access many global
variables. This reduces the number of literals needed even further and the
overhead of having to load them. On some RISCs several registers are
reserved just to hold these base pointers - ARM has 16 registers so
doesn't do this.
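A sketch of the base-pointer trick (globals_base and the offsets are made
up for illustration):
    LDR     r4, =globals_base   ; one literal load...
    LDR     r0, [r4, #0]        ; ...then several globals are reached
    LDR     r1, [r4, #4]        ; via small immediate offsets
    STR     r2, [r4, #8]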
Wilco
>>>** it is not binary compatible
>>
>>
>> We've had this argument before. The ARM1156T2-S is fully backwards
>> compatible with ARM and THUMB binaries.
>>
>> However binaries compiled for Thumb-2 will only execute on a processor
>> implementing Thumb-2, just as binaries compiled specifically for an
>> ARMv6 processor will not execute on an ARMv5 processor.
>>
>> A Thumb-2 only processor exists in the shape of ARM Cortex-M3.
>
> .. and that Thumb-2-only is thus NOT binary compatible with any ARM core
> that supports Thumb and ARM opcodes.
Again, that is incorrect. They are binary compatible because both can
run Thumb-only binaries. This is exactly the same as other optional
features like VFP. A CPU without VFP is binary compatible with a
CPU with VFP as both can run binaries that don't use VFP. Compatibility
is a function of both the architecture and how the software is compiled.
You can compile a program to use all features of a CPU or to use a
subset so it can run on a range of CPUs. If you choose to use all
features then you're limiting your application to run on all CPUs of the
same or higher architecture and with the same set of optional features.
So all v4 ARM code runs on v4t, v5, v6 and Cortex-A and -R, while v4t
Thumb code runs on v5, v6 and Cortex-A, -R and -M. Additionally, all
Cortex-M code runs on Cortex-R and -A and all Cortex-R code runs on
Cortex-A. So Cortex is not different apart from the name.
If however you want to run an application on a wide range of architectures
then you'll have to compile for a subset of the CPU features. So for example
you may need to stop using VFP, Neon, v6 media, v5E DSP, ARM, Thumb-2
or Thumb-1 if the CPUs you want to compile for do not support these. It's
your decision, so if you use feature X in your application then yes, you
can't run it on a CPU that doesn't have feature X.
> You can see the confusion that results from the poor clarity of the
> names. I guess ARM marketing wants to 'brand' Cortex, and they have
> trouble grasping that not all Cortex cores are binary compatible, or
> perhaps believe they can just talk over such issues.
> Philips tried that, and yes, the parts still exist, but no, they never
> hit critical mass.
The trouble is in your understanding of how architectures and binary
compatibility work. If you can't understand the above then you shouldn't
be making wild claims about compatibility; leave it to the experts!
Wilco
You miss the point. It is NOT about a common subset, but about what
opcodes MIGHT reside in a library, or hidden in a working system,
or in code that customers load to run on that system....
I'll make it very simple:
What does a Thumb-2-only core DO when it encounters a 32 bit ARM opcode ?
Perhaps our difference is the level of proof on Binary compatible -
You seem to require only that it can 'run some, common subset, code'
I apply a more rigorous, real-world level of proof: If code that
RUNS on one core, CRASHES/fails on another, then it is NOT binary
compatible.
Perhaps users can say which "Binary Compatible" matters most to them ?
To me, one sounds like "marketing depts Binary Compatible"
[Viz: can we find _some_ code that runs on both ? - Hey, Great! ]
vs
" Engineers Binary Compatible "
[viz: Can we use our existing libraries & historical code ?
Can our customers use their proven code, on this new device ]
Just a simple Yes/No will do here :)
Common-subset-Binary-Compatible is fine, provided you make that very
clear to users.
Right now, it matters little because there ARE no Cortex M3
microcontrollers.....
-jg
Binary compatible at the application level is one thing but binary
compatible at the OS level is another. Thumb-2 ISA-only cores handle
exceptions and interrupts in a manner that is different to ARM ISA cores.
As such they are completely different. Now it may be possible to build code
that works with either by having separate low-level exception handlers, but
that's a bit like saying fat binaries are binary compatible because they
have code that runs on one ISA and code that runs on another. What people
need is confidence that code developed for one system will work on another,
which means minimizing the distinct code paths.
Peter
>>>.. and that Thumb-2-only is thus NOT binary compatible with any ARM core
>>>that supports Thumb and ARM opcodes.
>>
>> Again, that is incorrect. They are binary compatible because both can
>> run Thumb-only binaries.
>
> You miss the point. It is NOT about a common subset, but about what
> opcodes MIGHT reside in a library, or hidden in a working system,
> or in code that customers load to run on that system....
The solution for this problem is to mark all objects and images with the
architecture version so that incompatibilities can be diagnosed.
> I'll make it very simple:
> What does a Thumb-2-only core DO when it encounters a 32 bit ARM opcode ?
It'll take the undefined instruction trap as it can't execute ARM
instructions. At that point it's up to the OS to decide what to do: it can
report an error, kill the current process, or emulate the instruction...
> Perhaps our difference is the level of proof on Binary compatible -
>
> You seem to require only that it can 'run some, common subset, code'
Your "common subset" is what most people call architecture.
> I apply a more rigorous, real-world level of proof: If code that
> RUNS on one core, CRASHES/fails on another, then it is NOT binary
> compatible.
That is not rigorous but rubbish... For any two cores that are not identical
you can find a code sequence that runs correctly on one but fails on the
other. The architecture defines such sequences to be illegal. However the
existence proof of illegal instructions has no effect on binary
compatibility precisely because the architecture defines them to be illegal.
A rigorous definition would be to say that code is compatible with two CPUs
A & B if and only if it uses an inclusive subset of the intersection of the
features of A and B. The feature set needed by objects X and Y is the union
of the features of X and Y. Apply these rules recursively and you can
calculate the features needed by any piece of code, the features supplied
by a set of CPUs, and therefore whether an application is compatible with a
set of CPUs.
Apply these rules to Cortex-M3 and, say, ARM7TDMI, and anybody can figure
out that you can run all v4t Thumb code on both, but not ARM. Since most
code compiled for ARM7 is Thumb code, not having ARM is not a problem.
> Perhaps users can say which "Binary Compatible" matters most to them ?
According to your definition AMD CPUs are incompatible with Intel CPUs.
How many people would agree with that statement, do you think?
> To me, one sounds like "marketing depts Binary Compatible"
> [Viz: can we find _some_ code that runs on both ? - Hey, Great! ]
> vs
> " Engineers Binary Compatible "
> [viz: Can we use our existing libraries & historical code ?
> Can our customers use their proven code, on this new device ]
>
> Just a simple Yes/No will do here :)
Yes, all existing Thumb-1 code will run fine. Remember that Thumb-1 is
a subset of Thumb-2.
> Common-subset-Binary-Compatible is fine, provided you make that very clear
> to users.
What exactly is unclear in "Thumb-2-only"?
> Right now, it matters little because there ARE no Cortex M3
> microcontrollers.....
First silicon is expected soon. See http://www.embedded-developer.com/cm3/
Wilco
> Binary compatible at the application level is one thing but binary
> compatible at the OS level is another. Thumb-2 ISA-only cores handle
> exceptions and interrupts in a manner that is different to ARM ISA cores.
> As such they are completely different.
No, most of the OS code is still the same; the only differences are at the
lowest level (eg. the first few instructions in an interrupt handler). This
is nothing new and is true for all ARM cores. CPUs with Thumb-1, Thumb-2,
caches, MMU, MPU, TCM, VFP, Neon, VIC, TrustZone, etc need different
startup/exception code to set up and use all those features.
However this fact has nothing to do with binary compatibility. Linux manages
to run the same binaries on many different CPUs even though they need
different startup code. So the suggestion that CPUs that need different
startup code are not binary compatible is obviously wrong.
> Now it may be possible to build code that works with either by having
> separate low-level exception handlers, but that's a bit like saying fat
> binaries are binary compatible because they have code that runs on one ISA
> and code that runs on another. What people need is confidence that code
> developed for one system will work on another, which means minimizing the
> distinct code paths.
The code at the lowest level always needs modifications for each CPU, but
the amount of such code is typically small. One of the goals of the new
exception model is to reduce the amount of such code further as well as the
amount of assembler. For example nested interrupt handlers are 100% C on
Cortex-M3, while on other ARMs you have to use assembler veneers.
Wilco
I think I'll rest my case here.
Users can decide for themselves if that presents no problem, or something
that needs to be watched in their systems.
One question:
How does Thumb-2-only _know_ it is an ARM opcode ? - are all Thumb-2
opcodes mapped into empty/spare opcode space, on present ARM/Thumb
cores, or has Thumb-2 used some of those binary opcodes, for the
extensions ?
-jg
> One question:
> How does Thumb-2-only _know_ it is an ARM opcode ? - are
> all Thumb-2 opcodes mapped into empty/spare opcode space,
> on present ARM/Thumb cores, or has Thumb-2 used some of
> those binary opcodes, for the extensions ?
In exactly the same way as Thumb knows.
Code is just numbers. The ARM instruction set is 32-bit numbers
and it covers all of that space - with some undefined or
unpredictable instructions. The THUMB instruction set is 16-bit
numbers; likewise it covers all of that space - with some
undefined or unpredictable instructions.
How does the core know whether to execute (IIRC the encoding):
andeq r0,r0,r0 (ARM)
or add r0,r0,r0 / add r0,r0,r0 (THUMB)?
The numbers are the same; the core decides which decoding to use by
which _state_ it is executing in.
Thumb-2-only cores will fault state-change instructions instead
of switching to ARM state.
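For example (a sketch; thumb_func is a made-up label):
    LDR     r0, =thumb_func+1   ; bit 0 set selects Thumb state
    BX      r0                  ; fine on any Thumb-capable core
A BX to an address with bit 0 clear asks for ARM state, and that is
what faults on a Thumb-2-only core.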