Synth-DIY Yahoo! Groups Archives

Thread

LPC2000 vs 8-bit MCUs

2005-03-07 by geno_24@yahoo.com

I found that FreeRTOS did a performance comparison on several 8-bit 
and 32-bit parts.  The link is http://www.freertos.org/PC/

This is a good resource if you are trying to figure out the 
performance improvement you'll get by going to the LPC2000 from an 
8-bit part.  The test parameters are given on the website.

Does anyone else have these kinds of comparisons or know where I 
could find them?


Test               Philips LPC2106    Atmel AVR      Microchip    TI
                   Flash [RAM]        (ATMega323)    PIC18F452    MSP430F449

16bit addition     9.2µs [7.4µs]      55.2µs         71.6µs       27µs
16bit multiply     9.7µs [8.2µs]      71.4µs         193µs        72.4µs
16bit division     26.4µs [22µs]      536µs          940µs        480µs
32bit multiply     10.4µs [8.76µs]    180µs          344µs        182µs
32bit subtract     9.1µs [7.6µs]      88.1µs         76.4µs       57.2µs
Bubble sort        432µs [420µs]      834µs          3.33ms       992µs
Blk mem move&comp  1.1ms [1.08ms]     7.9ms          12.4ms       6.75ms
Cond'l branch to   48µs [46.8µs]      245.6µs        169µs        131µs
PUSH'ing&POP'ing   43µs               258µs          412µs        314µs

The PUSH and POP test was performed by pushing and popping a single 
register at a time. The ARM7 is capable of pushing/popping more than 
one register in a single instruction.

Re: LPC2000 vs 8-bit MCUs

2005-03-07 by embeddedjanitor

I'm always very skeptical of the usefulness of these low-level 
performance numbers. They can be useful for seeding the intuition 
engine, but should not be relied on too heavily.

These numbers can be useful if you know you have a lot of algorithmic 
work to do (e.g. let's say you need to do a bunch of floating point 
calculations on a regular basis).

When it comes to measuring the performance of actual running code, a 
lot of other factors come into play, especially in embedded scenarios. 
For example, peripheral access speed: some devices are very slow at 
bit toggling and the like.

You also need to check that the benchmarks match the way you use code. 
For example, the bubble sort. Was it sorting 8-bit or 32-bit values? 
An 8-bitter will chew 8-bit values nicely and do a relatively poor job 
on 32-bit values. On an ARM, the 32-bit stuff will be faster than the 
8-bit stuff.
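
One way to check the element-width question on your own parts is to time the same sort with both data types. The following is only a minimal sketch: the macro and function names (`DEFINE_BUBBLE_SORT`, `bubble_sort_u8`, `bubble_sort_u32`) are made up for illustration and are not taken from the FreeRTOS test code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generates one bubble sort per element type, so the identical algorithm
   can be timed with 8-bit and with 32-bit data on the same part.
   Names are hypothetical, not from the benchmark being discussed. */
#define DEFINE_BUBBLE_SORT(name, type)                          \
    void name(type *a, size_t n)                                \
    {                                                           \
        assert(a != NULL || n == 0);                            \
        if (n < 2) {                                            \
            return;                                             \
        }                                                       \
        for (size_t pass = 0; pass < n - 1; pass++) {           \
            /* After each pass the largest remaining element */ \
            /* has bubbled to the end of the unsorted part.  */ \
            for (size_t j = 0; j + 1 < n - pass; j++) {         \
                if (a[j] > a[j + 1]) {                          \
                    type tmp = a[j];                            \
                    a[j] = a[j + 1];                            \
                    a[j + 1] = tmp;                             \
                }                                               \
            }                                                   \
        }                                                       \
    }

DEFINE_BUBBLE_SORT(bubble_sort_u8, uint8_t)
DEFINE_BUBBLE_SORT(bubble_sort_u32, uint32_t)
```

Timing `bubble_sort_u8` and `bubble_sort_u32` over arrays of the same length on each target would show how much of a published figure depends on the element width that was chosen.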

I'm not sure I believe some of these numbers. For example, the 32-bit 
subtraction should be way faster on an ARM. My hunch is that a lot of 
time is being chewed up by looping overheads or saving/storing the 
values (something that is often optimised out in real code) or some 
other time waster.


In short, I subscribe to the adage: lies, damned lies and benchmarks.

-- Charles





Re: [lpc2000] LPC2000 vs 8-bit MCUs

2005-03-07 by microbit

Hi "geno" and Charles,


> I'm not sure I believe some of these numbers. For example, the 32-bit 
> subtraction should be way faster on an ARM. My hunch is that a lot of 
> time is being chewed up by looping overheads or saving/storing the 
> values (something that is often optimised out in real code) or some 
> other time waster.

I concur here. For example:

Test               Philips LPC2106    Atmel AVR      Microchip    TI
                   Flash [RAM]        (ATMega323)    PIC18F452    MSP430F449

32bit multiply     10.4µs [8.76µs]    180µs          344µs        182µs

I find it doubtful that an ATMega takes the same amount of time for a 32-bit multiply as an MSP430
(assuming the same clocks).
The F449 has a 16x16 hardware multiplier, while the AVR has only an 8x8 (fractional) multiply, notwithstanding the 8- vs. 16-bit architecture.
That also assumes that a 16x16 into 32-bit multiply is intended.
Probably the hardware multiplier wasn't used on the MSP430, which is silly: since the ARM has such an instruction, the
comparison is only fair if the MSP430's hardware multiplier is used too.
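
To illustrate why a 16x16 multiplier should help with a 32-bit multiply, here is a rough sketch of a 32x32 multiply (low 32 bits of the result) built from three 16x16 partial products. It assumes the benchmark multiplies two 32-bit operands, which the thread leaves open. `mul16x16` is a hypothetical stand-in for a hardware multiplier, and this is not what any particular compiler's runtime library is known to emit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 16x16 -> 32 hardware multiply.
   On a real MSP430F449 this would be the multiplier peripheral. */
static uint32_t mul16x16(uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;
}

/* 32x32 multiply keeping the low 32 bits, from three partial products.
   The a_hi * b_hi term only affects bits 32 and up, so it is dropped. */
uint32_t mul32_from_16(uint32_t a, uint32_t b)
{
    uint16_t a_lo = (uint16_t)(a & 0xFFFFu);
    uint16_t a_hi = (uint16_t)(a >> 16);
    uint16_t b_lo = (uint16_t)(b & 0xFFFFu);
    uint16_t b_hi = (uint16_t)(b >> 16);

    uint32_t low = mul16x16(a_lo, b_lo);
    /* The sum may wrap, but only its low 16 bits survive the shift. */
    uint32_t cross = mul16x16(a_lo, b_hi) + mul16x16(a_hi, b_lo);

    return low + (cross << 16);
}

/* Self-check against the native 32-bit multiply. */
void check_mul32(uint32_t a, uint32_t b)
{
    assert(mul32_from_16(a, b) == (uint32_t)(a * b));
}
```

With a hardware 16x16 unit that is three multiplies plus a few adds and shifts. With only an 8x8 multiply, each 16x16 product itself has to be assembled from smaller pieces, so similar times on the two parts would be surprising.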

-- Kris



Re: LPC2000 vs 8-bit MCUs

2005-03-08 by tonalbuilder2002

Don't limit yourself to integers.  Those LPCs shine in floating 
point math, which is a place few PICs and such dare to go.

I did some comparisons on the type of real world calculations I use, 
comparing the same code on an LPC-2106 at 59 MHz, compiled with GCC in 
CrossWorks, and a 16-bit Freescale MC9S12 processor at 25 MHz, 
compiled with CodeWarrior.

The following code snippets are the interiors of for() loops.  
The times are in microseconds per iteration, including the for() loop 
overhead.  The variable "i" is a 32 bit int for the LPC-2106, and a 
16 bit int for the MC9S12...

for (i = 0; i < 32767; i++)...

There are no optimizations, and I am confident all the code is 
actually present and being executed for both processors.

Both CPUs execute out of on-board flash.

volatile float f1,f2,f3;

//********************************
f1 = (float) i + 1;
f2 = f1 * 3.0f;
f3 = f1 / f2;

LPC-2106, 10usec / iteration
MC9S12, 32usec/iteration
//********************************
f1 = (float) i;
f2 = sqrtf(f1);

LPC-2106, 38usec / iteration
MC9S12, 260usec / iteration
//********************************
f1 = (float) (i & 0x7fff);
f2 = cosf(f1);

LPC-2106, 59usec / iteration
MC9S12, 410usec / iteration
//********************************
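
For anyone wanting to repeat this kind of measurement, a minimal harness around the first snippet might look like the sketch below. It assumes the standard C `clock()` function is usable as the time source. The post does not say how the times were taken, and on a bare-metal target a hardware timer would normally be read instead. The function name `usec_per_iteration` is made up for illustration.

```c
#include <assert.h>
#include <time.h>

/* volatile, as in the original snippets, so the compiler cannot
   discard the floating point work inside the loop. */
static volatile float f1, f2, f3;

/* Returns the average time per loop iteration in microseconds,
   including the for() loop overhead. */
double usec_per_iteration(long iterations)
{
    assert(iterations > 0);

    clock_t start = clock();
    for (long i = 0; i < iterations; i++) {
        f1 = (float) i + 1;
        f2 = f1 * 3.0f;
        f3 = f1 / f2;
    }
    clock_t stop = clock();

    double total_usec = (double)(stop - start) * 1.0e6 / CLOCKS_PER_SEC;
    return total_usec / (double)iterations;
}
```

Calling `usec_per_iteration(32767)` mirrors the loop count used above. The resolution of `clock()` is coarse on many systems, so a larger iteration count may be needed to get a stable figure.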

On a math intensive splining algorithm the LPC-2106 is about 4.5 
times faster than the MC9S12.

Times on a 59 MHz LPC-2214 seem to be identical to the LPC-2106.
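
Since the two parts run at different clock rates, it can help to look at the figures above both as raw speedups and normalised per clock. A small sketch of that arithmetic follows. The helper names `speedup` and `speedup_per_clock` are illustrative only, and the per-clock figure ignores flash wait states and other memory effects.

```c
#include <assert.h>

/* Raw speedup: how many times faster the "fast" part finishes. */
double speedup(double slow_usec, double fast_usec)
{
    assert(fast_usec > 0.0);
    return slow_usec / fast_usec;
}

/* Speedup per clock: usec * MHz gives clock cycles per iteration,
   so this compares cycle counts rather than wall-clock times. */
double speedup_per_clock(double slow_usec, double slow_mhz,
                         double fast_usec, double fast_mhz)
{
    assert(fast_usec > 0.0 && fast_mhz > 0.0);
    return (slow_usec * slow_mhz) / (fast_usec * fast_mhz);
}
```

Plugging in the numbers from this post (MC9S12 at 25 MHz, LPC-2106 at 59 MHz) gives raw speedups of roughly 3.2x, 6.8x and 6.9x for the three snippets, and roughly 1.4x, 2.9x and 2.9x per clock.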

Naturally there are some caveats, such as CodeWarrior seeming to make 
calls to double precision cos() from within its cosf() function.  But 
I just call 'em as I see 'em.

Aside from the fast floating point, I am also impressed by the speed 
with which GNU converts between float and int, which is bread and 
butter for my applications and a place where some compilers I have 
used are very weak.

Bottom line...LPC kicks fanny in the world of floating point, if not 
most other places.  And don't turn up your nose at that GNU compiler, 
either!

Bill T.
http://www.kupercontrols.com


Re: LPC2000 vs 8-bit MCUs

2005-03-08 by Richard

The results on this page should be taken in context - see the 
page "Context" on the site.  In particular:

"The test results show a 'comparison' of several 'systems' being used 
in a 'normal' manner. 

+ 'Comparison' because I don't attempt to give an absolute measure - 
only provide results so different systems can be compared to each 
other. 

+ 'System' in that both the compiler and hardware are included. 

+ 'Normal' in that no attempt is made to optimise to the particular 
hardware. Only C code is used. If for example the hardware included a 
hardware multiplier then the test does not specifically write 
assembler code to ensure the multiplication is done in the fastest 
possible way"

The data does not pretend to be anything it is not and I agree with 
all the comments that have been made.

Loop overheads are of course present, but are minimised by performing 
each operation many times in each pass of the loop.
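
The effect of repeating the operation inside the loop can be put as simple arithmetic: if one pass costs a fixed overhead plus N operations, the apparent time per operation is the true time plus overhead/N. A sketch of that calculation follows. The function name is hypothetical and the sample values in the note below are invented, not measurements from the site.

```c
#include <assert.h>

/* Apparent time per operation when each loop pass pays overhead_usec
   once and performs the operation ops_per_pass times:
       (overhead + N * op) / N  =  op + overhead / N               */
double apparent_usec_per_op(double op_usec, double overhead_usec,
                            unsigned ops_per_pass)
{
    assert(ops_per_pass > 0);
    return op_usec + overhead_usec / (double)ops_per_pass;
}
```

With an invented 1.0 usec operation and 2.0 usec of loop overhead, one operation per pass reads as 3.0 usec, while four per pass reads as 1.5 usec, and the error keeps shrinking as the count goes up.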

Best regards.






