I'd agree: the speed of the multiply is only part of the issue. We've been implementing DSP-type code on 32-bit RISC for a few years now (and on the LPC2000 series since it came out). A few pointers:

1. Generally, for core DSP-type functions, you're looking at roughly a 10:1 speed penalty compared with a pure DSP for most filter operations. The bulk of this goes on buffer management: it's the lack of circular buffers with single-cycle pointer updates that's the real killer.

2. Having said that, there's a lot you can do with the available speed: we run a V.22bis modem on the part with plenty of CPU left over. This uses a significant amount of both FIR and IIR filtering.

3. The only real way to get a figure for the speed of a multiply is to do some simple benchmarking, making sure you use real data (i.e. not zero times zero). Try putting together an inner core of multiply and shift operations, put it in a wrapper that's called 100k times, and time how long the whole lot takes.

4. Optimisation settings on the compiler are ***critical***. This is common to practically all RISC architectures and all compilers for them. The good news is that the compilers are very good at optimising. We use the GNU compiler on the ARM7 and IAR for some other platforms. From memory, we use the next-to-highest optimisation level with GNU (the highest tends to bloat the code size too much): contact me directly if you want more details. Note: in the past we have had problems with IAR producing bad (i.e. wrong) code at high optimisation levels. The best approach is to look at the compiler's assembler output for the critical sections of code and choose the optimisation setting that gives the best results. The bottom line is that we've found no need to hand-code in assembler.

5. There are a few tricks you can do with pointers to speed up the implementation of filters. I can't give any details, I'm afraid.

6. One thing I would say is that for very small filters (e.g.
the 4-pole IIR mentioned below), we've found it more efficient to move the data through the filter rather than use pointers, as in:

    reg[3] = reg[2];
    reg[2] = reg[1];
    reg[1] = reg[0];
    reg[0] = new;

Hope this is of some help.

BTW: I'd be curious to hear what other people are doing in this area (i.e. DSP on RISC).

Regards
Brendan Murphy

--- In lpc2000@yahoogroups.com, "James Dabbs" <jdabbs@t...> wrote:
> > I'm making an application on a LPC2132 that must perform a
> > number of DSP operations. To find out how many operations I
> > can perform, I need to know how many clock cycles it takes
> > the LPC2132 to perform a 16 bit times 16 bit = 32 bit multiplication?
>
> I believe it's 2-5 cycles to execute the MUL instruction.
>
> Compared to dedicated DSPs, I've found the bigger limitation to be
> ARM's lack of circular buffer management. In a 4-pole IIR filter, I saw
> most of the CPU BW spent managing pointers and ldr/str'ing rather than
> doing the actual math.
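A minimal host-side sketch of the benchmarking approach in point 3. The names `mul_shift`, `bench_mul_shift` and `test_coef` are hypothetical, and `clock()` is used only so the sketch runs on a desktop; on the LPC2132 itself you would read one of the on-chip timers instead. The input data changes every iteration and is never zero, and `volatile` is there to stop the optimiser from folding the loop away.

```c
#include <stdint.h>
#include <time.h>

/* Inner core: one Q15 multiply-and-shift (16 x 16 -> 32 bits, scaled back). */
int32_t mul_shift(int16_t a, int16_t b)
{
    return ((int32_t)a * b) >> 15;
}

/* volatile input and output keep the compiler from constant-folding the
   loop or discarding its result. */
static volatile int16_t test_coef = 12345;  /* arbitrary non-zero test value */
static volatile uint32_t sink;

/* Run the core n times on varying, non-zero data; return elapsed seconds.
   clock() is a host-side stand-in (assumption): on the target, read a
   hardware timer before and after the loop instead. */
double bench_mul_shift(unsigned long n)
{
    uint32_t sum = 0;  /* unsigned, so wraparound is well defined */
    clock_t t0 = clock();
    for (unsigned long i = 0; i < n; i++) {
        /* Pseudo-varying sample in 1..32767: never zero times zero. */
        int16_t x = (int16_t)(((i * 40503u) & 0x7FFFu) | 1u);
        sum += (uint32_t)mul_shift(test_coef, x);
    }
    clock_t t1 = clock();
    sink = sum;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Calling `bench_mul_shift(100000UL)`, dividing the result by the iteration count and multiplying by the CPU clock frequency gives an approximate cycles-per-iteration figure. That figure includes loop overhead as well as the multiply, and it is worth measuring at the optimisation level you actually intend to ship with.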
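A sketch of the "move the data through the filter" idea from point 6, in C. To keep it short this is a 4-tap Q15 FIR rather than the 4-pole IIR from the quoted message (an assumption made for brevity; the same shuffling applies to an IIR's state variables), and `fir4_t` / `fir4_step` are hypothetical names. The coefficients are assumed to have absolute values summing to at most 1.0 in Q15, so the 32-bit accumulator cannot overflow.

```c
#include <stdint.h>

#define FIR4_TAPS 4

/* Hypothetical filter state: the delay line is a tiny array that is shuffled
   on every sample instead of being addressed through a moving pointer. */
typedef struct {
    int16_t reg[FIR4_TAPS];   /* reg[0] holds the newest sample */
    int16_t coef[FIR4_TAPS];  /* Q15 coefficients, sum of |coef| assumed <= 1.0 */
} fir4_t;

/* Process one input sample and return one Q15 output sample. */
int16_t fir4_step(fir4_t *f, int16_t in)
{
    /* Move the data through the filter, as in the post. */
    f->reg[3] = f->reg[2];
    f->reg[2] = f->reg[1];
    f->reg[1] = f->reg[0];
    f->reg[0] = in;

    /* Four 16 x 16 -> 32-bit products summed in a 32-bit accumulator. */
    int32_t acc = (int32_t)f->coef[0] * f->reg[0]
                + (int32_t)f->coef[1] * f->reg[1]
                + (int32_t)f->coef[2] * f->reg[2]
                + (int32_t)f->coef[3] * f->reg[3];

    /* Scale back to Q15. Right-shifting a negative value is
       implementation-defined in C; GCC documents it as sign extension. */
    return (int16_t)(acc >> 15);
}
```

With four taps of 0.25 (8192 in Q15) this is a moving average: feeding a constant 4000 into a zeroed filter gives 1000, 2000, 3000 and then 4000 as the delay line fills. With only four taps, the three register moves are cheap compared with maintaining and wrapping an index.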
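For comparison, here is the kind of pointer/index bookkeeping that point 1 and the quoted message describe as the real cost: a circular delay line whose index is wrapped in software. A DSP's modulo addressing performs this wrap in hardware; on the ARM7 it becomes extra instructions in the inner loop. This is an illustrative sketch, not the poster's code: `fir_circ_t`, `fir_circ_step`, the 8-tap length and the power-of-two mask trick are all assumptions chosen to keep the wrap as cheap as possible.

```c
#include <stdint.h>

#define CIRC_TAPS 8                  /* power of two, so the wrap is one AND */
#define CIRC_MASK (CIRC_TAPS - 1u)

/* Hypothetical filter state with a circular delay line. */
typedef struct {
    int16_t  buf[CIRC_TAPS];   /* sample history */
    unsigned head;             /* index of the newest sample */
    int16_t  coef[CIRC_TAPS];  /* Q15 coefficients, sum of |coef| assumed <= 1.0 */
} fir_circ_t;

int16_t fir_circ_step(fir_circ_t *f, int16_t in)
{
    /* Advance and wrap the write index. A DSP's modulo addressing does this
       in hardware; here it is explicit arithmetic on every sample. */
    f->head = (f->head + 1u) & CIRC_MASK;
    f->buf[f->head] = in;

    int32_t acc = 0;
    unsigned idx = f->head;
    for (unsigned k = 0; k < CIRC_TAPS; k++) {
        acc += (int32_t)f->coef[k] * f->buf[idx];
        idx = (idx - 1u) & CIRC_MASK;  /* step back through history, wrapping at 0 */
    }
    return (int16_t)(acc >> 15);       /* back to Q15 */
}
```

With eight coefficients of 0.125 (4096 in Q15) and a constant input of 8000, successive outputs from a zeroed filter are 1000, 2000, ... up to 8000 once the buffer is full. Every tap costs an index update and mask on top of the load and multiply-accumulate, which is the overhead the thread is pointing at.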
Re: LPC2132 as DSP processor
2005-05-10 by brendanmurphy37