Synth-DIY Yahoo! Groups Archives

Hi Bastian,

Maybe one parameter is missing from the equations we have looked at 
so far. That parameter is "how many instructions are needed to 
perform a certain function?"

ARM mode provides instructions (conditional ones) that are not 
available in Thumb mode. On average you need approx. 7 ARM 
instructions to do the same work as 10 Thumb instructions. With a 
wide bus or 0 wait states the fetch is no longer the bottleneck; the 
number of instructions you need to execute is. Executing an ARM 
instruction does not take longer than executing a Thumb instruction, 
so executing 7 ARM instructions is faster than executing 10 Thumb 
instructions, but the 10 Thumb instructions need less space in the 
Flash (20 bytes versus 28)!

The other bottleneck, the CPU having to wait for instruction 
fetches, only exists when you execute from a Flash that cannot keep 
up with the CPU's execution cycles. If you can fetch instructions, 
whether ARM or Thumb, ten times faster than you can execute them, 
the CPU itself is the limit. It is the combination of zero wait 
states for 32-bit accesses and the CPU clock in MHz (given the same 
ARM7 CPU) that determines the overall performance.

To verify your approach or mine, I would recommend executing some 
code from internal RAM, once optimized for speed in ARM mode and 
once optimized for size in Thumb mode. 

Running the same code with the same optimization settings on an 
LPC2106 from Flash will give you almost the same results as running 
from RAM. If you do the same thing on devices with narrower buses, 
e.g. from OKI and Atmel, the benefit of ARM mode evaporates; it will 
probably even turn into a speed loss.

ARM mode is faster as long as the bus is not the bottleneck. 
Comparing the top speed of an LPC21xx running from Flash in ARM mode 
with a SAM7S also running from Flash in ARM mode will show a 
significant difference. 

Bob.


--- In lpc2000@yahoogroups.com, 42Bastian Schick <bastian42@m...> 
wrote:
> lpc2100_fan <lpc2100_fan@y...> wrote on Mon, 18 Jul 2005 16:03:08 
> -0000:
> 
> > If you have a 16-bit memory interface or worse a 16-bit Flash
> > interface, use Thumb mode under all circumstances! It is faster
> > and smaller code.
> 
> Agree.
> 
> > If you have a 32-bit interface and any kind of waitstates
> > associated with it, Thumb is probably still faster and always
> > smaller. A 32-bit RAM with zero waitstates though will give you up
> > to 30% higher performance in ARM mode compared to Thumb with the
> > up to 30% code size penalty.
> 
> Don't agree. At least not fully. If the CPU has a small cache
> (4 bytes), then it fetches 2 thumb insns with one bus access.
> 
> I looked a lot into generated assembly code to check if I should use
> Thumb or ARM, and in 99% of the cases there is no real benefit of
> the ARM mode.
> I even tried to re-write parts of the RTOS - I write for living -
> in ARM mode and found that there is only in very small cases a
> benefit (and there of course I use it).
> 
> > Using busses like the LPC2106 has them (128-bit to the Flash), ARM
> > mode will definitely be faster. The same applies as for the RAM
> > above.
> > Using ARM mode for a complete program would be waste of code
> > memory space, using ARM mode in interrupt service routines (that
> > are entered in ARM mode anyhow) is a smart idea with the Philips
> > devices.
> 
> In thumb-mode you can fetch up to 8 insns from flash instead of 4
> ARM insns.
> Giving the 30% you mentioned above, 8 thumb insns == 5.7 ARM insns.
> 
> 
> -- 
> 42Bastian Schick
Lpc2000

W(or not) to use ARM mode
