Synth-DIY Yahoo! Groups Archives

I'd agree: the speed of the multiply is only part of the issue. 

We've been implementing DSP-type code on 32-bit RISC for a few years
now (and on the LPC2000 series since it came out). A few pointers:

1. Generally, for core DSP-type functions, you're looking at a pure DSP 
being faster by a factor of about 10:1 for most filter operations. The 
bulk of the time goes on buffer management: it's the lack of circular 
buffers with single-cycle pointer updates that's the real killer.

2. Having said that, there's a lot you can do with the available speed: 
we run a V.22bis modem on the part with plenty of CPU left over. This 
uses a significant amount of both FIR and IIR filtering.

3. The only real way to get a figure for the speed of a multiply is to 
do some simple benchmarking, making sure you use real data (i.e. not 
zero times zero). Try putting together an inner core of multiply and 
shift operations, putting it in a wrapper that's called 100k times, and 
timing how long the whole lot takes.

4. Optimisation settings on the compiler are ***critical***. This is 
common to practically all RISC architectures and all compilers for 
them. The good news is that the compilers are very good at optimising. 
We use the GNU compiler (GCC) on ARM7 and IAR for some other platforms. 
From memory we use the next-to-highest level of optimisation for GCC 
(the highest tends to bloat the code size too much): contact me 
directly if you want more details. Note: we have had problems in the 
past with IAR producing bad (i.e. wrong) code at high optimisation 
levels. The best approach is to look at the assembler output from the 
compiler for the critical sections of code, and choose the optimisation 
setting that gives the best results. The bottom line is that we've 
found no need to hand-code in assembler.

5. There are a few tricks you can do with pointers to speed the 
implementation of filters. Can't give any details I'm afraid.

6. One thing I would say is that for very small filters (e.g. the 
4-pole IIR mentioned below), we've found it more efficient to move the 
data through the filter, rather than use pointers, as in:

reg[3] = reg[2];
reg[2] = reg[1];
reg[1] = reg[0];
reg[0] = new;
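To make points 1 and 6 concrete, here is a minimal sketch (not our production code) of the same small filter written both ways: a circular-buffer delay line, where every tap pays for an index wrap test that a DSP's address generator would handle for free, and the "move the data" version. It assumes a 4-tap FIR in Q15 fixed point rather than the 4-pole IIR from the question, the names (fir_circular, fir_shift, NTAPS) are hypothetical, and the coefficients are assumed to be scaled so the 32-bit accumulator cannot overflow.

```c
#include <stddef.h>
#include <stdint.h>

#define NTAPS 4   /* hypothetical length; fir_shift below is written out for exactly 4 */

/* Circular-buffer delay line (point 1). */
typedef struct {
    int16_t delay[NTAPS];   /* past input samples, Q15 */
    size_t  head;           /* index of the newest sample */
} fir_circ_state;

int16_t fir_circular(fir_circ_state *s, const int16_t c[NTAPS], int16_t x)
{
    s->head = (s->head + 1 == NTAPS) ? 0 : s->head + 1;   /* wrap */
    s->delay[s->head] = x;

    int32_t acc = 0;
    size_t idx = s->head;
    for (size_t k = 0; k < NTAPS; k++) {
        acc += (int32_t)c[k] * s->delay[idx];     /* 16x16 -> 32-bit MAC */
        idx = (idx == 0) ? NTAPS - 1 : idx - 1;   /* wrap again, walking backwards */
    }
    return (int16_t)(acc >> 15);   /* back to Q15, no saturation */
}

/* Shifted delay line (point 6): move the data, not the pointer. */
typedef struct {
    int16_t reg[NTAPS];   /* reg[0] = newest sample, reg[3] = oldest */
} fir_shift_state;

int16_t fir_shift(fir_shift_state *s, const int16_t c[NTAPS], int16_t x)
{
    s->reg[3] = s->reg[2];
    s->reg[2] = s->reg[1];
    s->reg[1] = s->reg[0];
    s->reg[0] = x;

    /* Fixed indices: no loop counter and no wrap test. */
    int32_t acc = (int32_t)c[0] * s->reg[0] + (int32_t)c[1] * s->reg[1]
                + (int32_t)c[2] * s->reg[2] + (int32_t)c[3] * s->reg[3];
    return (int16_t)(acc >> 15);   /* back to Q15, no saturation */
}
```

Both functions give identical output for the same input; which one is actually faster on a given part is something to confirm with a benchmark and a look at the compiler's assembler output.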

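A rough sketch of the benchmark described in point 3, assuming a hosted C environment where clock() from <time.h> works; on a bare LPC2132 you would read a free-running hardware timer instead. The operands are filled from a simple linear congruential generator so the multiplies see non-trivial data, and the kernel is called through a volatile function pointer so the optimiser can't hoist the loop-invariant work out of the timing loop. All names and sizes here are illustrative.

```c
#include <stdint.h>
#include <time.h>

#define ITERATIONS 100000UL   /* the "called 100k times" wrapper */
#define NDATA      64         /* hypothetical block size */

static int16_t data_a[NDATA], data_b[NDATA];
static volatile int32_t sink;   /* volatile store: results can't be discarded */

/* Inner core: 16x16 -> 32-bit multiplies, each followed by a shift. */
int32_t mul_shift_core(const int16_t *a, const int16_t *b, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += ((int32_t)a[i] * b[i]) >> 8;
    return acc;
}

/* Called through a volatile pointer so the compiler cannot inline the
   core and compute it once outside the timing loop. */
static int32_t (*volatile core_fn)(const int16_t *, const int16_t *, int) = mul_shift_core;

/* Fill the operands with "real" data (not zero times zero) from a simple LCG. */
void fill_test_data(void)
{
    uint32_t seed = 12345u;
    for (int i = 0; i < NDATA; i++) {
        seed = seed * 1664525u + 1013904223u;
        data_a[i] = (int16_t)((int32_t)(seed >> 16) - 32768);
        seed = seed * 1664525u + 1013904223u;
        data_b[i] = (int16_t)((int32_t)(seed >> 16) - 32768);
    }
}

/* Time the whole lot and return elapsed CPU seconds. */
double benchmark_seconds(void)
{
    clock_t start = clock();
    for (unsigned long r = 0; r < ITERATIONS; r++)
        sink = core_fn(data_a, data_b, NDATA);
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Dividing the elapsed time by ITERATIONS * NDATA gives a per-multiply figure that includes load, shift and loop overhead, which is usually closer to what a filter will really see than the bare MUL timing.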
Hope this is of some help.

BTW: I'd be curious to hear what other people are doing in this area 
(i.e. DSP on RISC).

Regards
Brendan Murphy


--- In lpc2000@yahoogroups.com, "James Dabbs" <jdabbs@t...> wrote:
> > I'm making an application on an LPC2132 that must perform a
> > number of DSP operations. To find out how many operations I 
> > can perform, I need to know how many clock cycles it takes 
> > the LPC2132 to perform a 16-bit times 16-bit = 32-bit
> > multiplication?
> 
> I believe it's 2-5 cycles to execute the MUL instruction.
> 
> Compared to dedicated DSPs, I've found the bigger limitation to be
> ARM's lack of circular buffer management. In a 4-pole IIR filter, I
> saw most of the CPU bandwidth spent managing pointers and ldr/str'ing
> rather than doing the actual math.
Lpc2000

Re: LPC2132 as DSP processor
