Hi Bastian,

maybe there is one parameter missing in the equations we have looked at so far: how many instructions are needed to perform a certain function? ARM mode provides instructions (conditional ones) that are not available in Thumb mode. On average you need approx. 7 ARM instructions to get the same functionality as 10 Thumb instructions.

With a wide bus or 0 wait states, the bottleneck is no longer the fetch, it is the number of instructions you need to execute. Executing an ARM instruction does not take longer than executing a Thumb instruction, so executing 7 ARM instructions is faster than executing 10 Thumb instructions, but 10 Thumb instructions need less space in the Flash!

The other bottleneck, fetching instructions fast enough that the CPU does not have to wait, only applies when you execute from a Flash that cannot keep up with the CPU execution cycles. If you can fetch instructions, whether ARM or Thumb, ten times faster than the CPU can execute them, the CPU is what you are waiting for. It is the combination of zero waitstates for 32-bit accesses and the CPU speed in MHz (given the same ARM7 CPU) that determines the overall performance.

To verify your approach or mine, I would recommend executing some code from internal RAM, once optimized for speed in ARM mode and once optimized for size in Thumb mode. Running the same code with the same optimization rules on an LPC2106 from Flash will give you almost identical results to running from RAM. If you do the same thing on devices with narrower busses, e.g. OKI and Atmel, the benefit of ARM mode evaporates and will probably even turn into a speed loss. ARM mode is faster as long as the bus is not the bottleneck.

Comparing the top speed of an LPC21xx running from Flash in ARM mode to a SAM7S also running from Flash in ARM mode will give you a significant difference.

Bob.
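The argument above can be put into a rough toy model. This is only a sketch under stated assumptions, not a measurement: every instruction is taken to execute in one cycle, a bus access is taken to cost 1 + wait states cycles, fetch and execute are taken to overlap fully, and a narrow bus is assumed to have a small buffer so that one 32-bit access can feed two Thumb instructions. The function name `cycles` and its parameters are made up for this illustration; branches, data accesses and pipeline refills are ignored.

```python
def cycles(n_instr, instr_bits, bus_bits, wait_states):
    """Toy estimate of the cycles needed to run n_instr instructions.

    Assumptions (not from any datasheet): 1 cycle per instruction,
    (1 + wait_states) cycles per bus access, fetch overlaps execute,
    so the slower of the two sets the pace.
    """
    access_cycles = 1 + wait_states
    if bus_bits >= instr_bits:
        # One access delivers one or more whole instructions.
        fetch_per_instr = access_cycles / (bus_bits // instr_bits)
    else:
        # Several accesses are needed for a single instruction.
        fetch_per_instr = access_cycles * (instr_bits // bus_bits)
    return n_instr * max(1.0, fetch_per_instr)


# 7 ARM instructions vs. 10 Thumb instructions for the same function.
for bus_bits, ws in [(128, 0), (32, 1), (16, 0)]:
    arm = cycles(7, 32, bus_bits, ws)
    thumb = cycles(10, 16, bus_bits, ws)
    print(f"{bus_bits:3d}-bit bus, {ws} ws: ARM {arm:.0f}, Thumb {thumb:.0f}")
```

In this model ARM mode wins only in the first case, where the bus is not the bottleneck. With a 32-bit bus and one wait state, or with a 16-bit bus, the fetch cost per ARM instruction doubles and Thumb comes out ahead.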
--- In lpc2000@yahoogroups.com, 42Bastian Schick <bastian42@m...> wrote:
> lpc2100_fan <lpc2100_fan@y...> wrote on Mon, 18 Jul 2005 16:03:08 -0000:
>
> > If you have a 16-bit memory interface or worse a 16-bit Flash
> > interface, use Thumb mode under all circumstances! It is faster and
> > smaller code.
>
> Agree.
>
> > If you have a 32-bit interface and any kind of waitstates associated
> > with it, Thumb is probably still faster and always smaller. A 32-bit
> > RAM with zero waitstates though will give you up to 30% higher
> > performance in ARM mode compared to Thumb with the up to 30% code size
> > penalty.
>
> Don't agree. At least not fully. If the CPU has a small cache (4 bytes),
> then it fetches 2 thumb insns with one bus access.
>
> I looked a lot into generated assembly code to check if I should use
> Thumb or ARM, and in 99% of the cases there is no real benefit of the
> ARM mode.
> I even tried to re-write parts of the RTOS - I write for living - in ARM
> mode and found that there is only in very small cases a benefit (and there
> of course I use it).
>
> > Using busses like the LPC2106 has them (128-bit to the Flash), ARM
> > mode will definitely be faster. The same applies as for the RAM above.
> > Using ARM mode for a complete program would be waste of code memory
> > space, using ARM mode in interrupt service routines (that are entered
> > in ARM mode anyhow) is a smart idea with the Philips devices.
>
> In thumb-mode you can fetch up to 8 insns from flash instead of 4 ARM
> insns.
> Giving the 30% you mentioned above, 8 thumb insns == 5.7 ARM insns.
>
>
> --
> 42Bastian Schick
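The fetch arithmetic at the end of the quoted message can be checked with a few lines. This is a sketch only: it assumes a 128-bit flash line and takes the rough ratio from this thread (7 ARM instructions doing the work of 10 Thumb instructions) at face value. The names `FLASH_LINE_BITS`, `ARM_PER_THUMB` and `work_per_fetch` are made up for the illustration.

```python
# Assumptions: a 128-bit flash line, and the thread's rough ratio that
# 7 ARM instructions do the work of 10 Thumb instructions.
FLASH_LINE_BITS = 128
ARM_PER_THUMB = 7 / 10


def work_per_fetch(instr_bits, arm_equiv_per_instr):
    """Work delivered by one flash line, in ARM-instruction equivalents."""
    return (FLASH_LINE_BITS // instr_bits) * arm_equiv_per_instr


thumb_work = work_per_fetch(16, ARM_PER_THUMB)  # 8 Thumb instructions per line
arm_work = work_per_fetch(32, 1.0)              # 4 ARM instructions per line
print(f"Thumb: {thumb_work:.1f}  ARM: {arm_work:.1f}")
```

With the 7:10 ratio this gives about 5.6 ARM-equivalent instructions per Thumb line fetch, close to the 5.7 quoted, against 4 for ARM mode. That only matters when the fetch is the limiting factor; if the flash keeps up with the CPU, the instruction count decides.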
Message
W(or not) to use ARM mode
2005-07-19 by lpc2100_fan