Yahoo Groups archive

Lpc2000

Re: W(or not) to use ARM mode

2005-07-19 by embeddedjanitor

lp fan's comments are pretty much in agreement with what I have 
observed, but I can't help but stir the pot a little:

> maybe there is one parameter missing in the equations we looked at 
> so far. That parameter is "how many instructions are needed to 
> perform a certain function?"
> 
> ARM mode provides instructions (conditional ones) that are not 
> available in Thumb mode. On average you need approx. 7 ARM 
> instructions to execute the same functionality as with 10 Thumb 
> instructions. The bottleneck is not the fetch any more if you have a 
> wide bus or 0 wait states, it is the number of instructions you need 
> to execute. Executing an ARM instruction does not take longer than 
> executing a Thumb instruction, so executing 7 ARM instructions is 
> faster than executing 10 Thumb instructions but 10 Thumb instructions 
> need less space in the Flash!

Thumb instructions are actually a compressed 16-bit encoding of a 
subset of the ARM instructions, so every Thumb instruction is 
essentially mapped to an equivalent ARM instruction for execution.

There are a few operations that you cannot do in Thumb (e.g. msr), 
but for most code the biggest difference is that the ARM instructions 
have the condition field on all instructions while the Thumb 
instructions only have it on branches.

As a general rule of thumb, Thumb code takes about 65% of the space 
of the equivalent ARM code. Since a Thumb instruction is half the size 
of an ARM instruction, that means Thumb needs 65%/50% = 1.3x as many 
instructions as ARM, and all of them have to be executed.
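
For anyone who wants to plug in their own numbers, the arithmetic behind the 1.3x figure can be sketched in a few lines of Python. The function name is made up for illustration, and it assumes every Thumb instruction is 2 bytes and every ARM instruction is 4 bytes, which is only approximately true (Thumb BL, for example, occupies two halfwords).

```python
def thumb_instruction_ratio(size_ratio, thumb_bytes=2, arm_bytes=4):
    """Return the Thumb instruction count relative to ARM.

    size_ratio is Thumb code size divided by ARM code size (e.g. 0.65).
    Assumes fixed 2-byte Thumb and 4-byte ARM instructions.
    """
    # size_ratio = (n_thumb * thumb_bytes) / (n_arm * arm_bytes)
    # so n_thumb / n_arm = size_ratio * arm_bytes / thumb_bytes
    return size_ratio * arm_bytes / thumb_bytes


if __name__ == "__main__":
    ratio = thumb_instruction_ratio(0.65)
    print(f"Thumb executes about {ratio:.2f}x as many instructions as ARM")
```

A size ratio of 0.5 would mean a one-to-one instruction mapping; anything above that is extra Thumb instructions that have to be fetched and executed.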


Like all such general rules of thumb, this one can only be applied in 
very broad-brush terms, and there are some corner cases:

* Sometimes the ARM mode condition fields are not used, in which case 
they provide no advantage.
* Sometimes the ARM code can execute many times faster than Thumb. For 
example, using the condition fields often means that the CPU can 
continue execution with no branching. This saves a lot because 
branching breaks the pipeline.
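
To put rough numbers on the branching point, here is a toy cycle model in Python. The cycle costs are my assumptions for an ARM7TDMI-style core running with zero wait states (1 cycle for a data-processing instruction whether or not its condition passes, 3 cycles for a taken branch, 1 cycle for a branch that is not taken), and the function names are invented for the sketch. It ignores the compare instruction, which both styles need.

```python
# Assumed cycle costs for an ARM7TDMI-style core with zero wait states.
CYCLES_ALU = 1               # data-processing instruction, executed or skipped
CYCLES_BRANCH_TAKEN = 3      # taken branch has to refill the pipeline
CYCLES_BRANCH_NOT_TAKEN = 1  # condition failed, falls through


def cycles_conditional(n_guarded):
    """ARM style: every guarded instruction carries its own condition field."""
    # Cost is the same whether the condition passes or fails.
    return n_guarded * CYCLES_ALU


def cycles_branch_over(n_guarded, condition_true):
    """Thumb style: a conditional branch skips over the guarded instructions."""
    if condition_true:
        # Branch falls through, then the guarded instructions run.
        return CYCLES_BRANCH_NOT_TAKEN + n_guarded * CYCLES_ALU
    # Branch is taken and the guarded instructions are skipped.
    return CYCLES_BRANCH_TAKEN


if __name__ == "__main__":
    for cond in (True, False):
        print(cond, cycles_conditional(1), cycles_branch_over(1, cond))
```

Under these assumptions a single guarded instruction costs 1 cycle in the conditional form against 2 or 3 cycles in the branch-over form. The advantage is not unconditional: once more than three instructions are guarded and the condition is false, branching over them is cheaper than stepping through them.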


> 
> The other bottleneck, fetching the instructions so that CPU does not 
> have to wait is only true when you execute from a Flash that can not 
> keep up with the CPU execution cycles. If you can fetch the 
> instructions, no matter whether ARM or Thumb, ten times faster than 
> executing them, you will have to wait for the CPU.  It is the 
> combination of zero waitstates for 32-bit accesses and the CPU speed 
> in MHz (given the same ARM7 CPU) that determine the overall 
> performance.
> 
> To verify your or my approach, I would recommend executing some code 
> from internal RAM, once optimized for speed in ARM mode and once 
> optimized for size in Thumb mode. 
> 
> Running the same code with the same optimization rules on a LPC2106 
> from Flash will give you almost identical results as running from 
> RAM. If you do the same thing on devices with narrower busses, e.g. 
> OKI and Atmel, the benefit for ARM evaporates, it will probably even 
> turn into a speed loss.

Yes it does. Thumb tends to be faster on 16-bit buses.
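
A back-of-envelope sketch of why, in Python: count the bus accesses needed to fetch the 7 ARM versus 10 Thumb instructions from the quoted example. The function name is invented, and the model assumes straight-line code, no prefetch buffer or cache, and that a wide bus delivers two Thumb instructions per access.

```python
import math


def bus_fetches(n_instructions, instruction_bits, bus_bits):
    """Bus accesses needed to fetch a straight-line run of instructions.

    Simplified model: total instruction bits divided by bus width,
    rounded up. Ignores branches, data accesses and wait states.
    """
    return math.ceil(n_instructions * instruction_bits / bus_bits)


if __name__ == "__main__":
    for bus in (16, 32):
        arm = bus_fetches(7, 32, bus)
        thumb = bus_fetches(10, 16, bus)
        print(f"{bus}-bit bus: ARM {arm} fetches, Thumb {thumb} fetches")
```

On a 16-bit bus the 7 ARM instructions need 14 accesses against 10 for Thumb, so if the fetch is the bottleneck the extra Thumb instructions are more than paid for. On a 32-bit bus the fetch counts drop to 7 and 5, and the number of instructions executed dominates again.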

As always, read the docs. The SAM7, for example, is pretty much 
optimised for Thumb.
