Yahoo Groups archive

Lpc2000

Re: [lpc2100] Optimization of Capture Routine -> cycle table ?

2004-02-01 by capiman@t-online.de

This was a great idea !

I am now at around 5,8956 MBytes / second, which is close to 5,898 MBytes /
sec. ( = Fosc * 4 / 10).
So the two operations ( ldr ip, [r0, #0] and strb ip, [r2], #1) seem to
take 10 cycles in total.
Is this correct ?
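
For readers following along: the "great idea" is presumably the loop unrolling suggested in the quoted reply. The unrolled code itself is not shown in the thread, so the following is only a minimal sketch of what it might look like. The function name, the unroll factor of 8, and passing the port register as a pointer are all assumptions made for illustration; on the target one would pass the address of IOPIN.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the code from the thread: stores the low byte
   of a port register into buf, eight samples per loop iteration. */
void capture_unrolled(volatile const uint32_t *port, unsigned char *buf, size_t len)
{
    unsigned char *stop = buf + len;

    /* Main loop: eight load/store pairs per compare-and-branch. */
    while (stop - buf >= 8) {
        buf[0] = (unsigned char)*port;
        buf[1] = (unsigned char)*port;
        buf[2] = (unsigned char)*port;
        buf[3] = (unsigned char)*port;
        buf[4] = (unsigned char)*port;
        buf[5] = (unsigned char)*port;
        buf[6] = (unsigned char)*port;
        buf[7] = (unsigned char)*port;
        buf += 8;
    }

    /* Tail: remaining samples when len is not a multiple of 8. */
    while (buf < stop) {
        *buf++ = (unsigned char)*port;
    }
}
```

With the branch cost spread over eight samples, the time per byte approaches the cost of one load/store pair, which is what the measurement above reflects.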

Re-adding my shift instruction gives me around 5,360400 MBytes / s, so 11
cycles. So the shift itself takes 1 cycle ?
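
The cycle counts above follow from dividing the core clock by the measured throughput. A small sketch of that arithmetic, assuming a 14.7456 MHz oscillator (not stated in the thread, only implied by the "Fosc * 4 / 10" figure) and a clock multiplier of 4; the function name is made up for illustration:

```c
/* Assumed values: the thread only gives "Fosc * 4 / 10 = 5,898 MBytes/s",
   which implies Fosc = 14.7456 MHz with a multiplier of 4. */
#define FOSC_HZ   14745600.0
#define PLL_MULT  4.0

/* Core clock ticks spent per captured byte at a measured rate in bytes/s. */
double cycles_per_byte(double bytes_per_second)
{
    return (FOSC_HZ * PLL_MULT) / bytes_per_second;
}
```

Called with 5895600.0 (bytes/s) this gives roughly 10 ticks per byte, and with 5360400.0 roughly 11, matching the numbers in the message.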

BTW: Is there a cycle table somewhere for LPC21xx on the net or in an
appnote / manual ? Is it the same as for original ARMs ?

Greetings,

          Martin

----- Original Message ----- 
From: "Ben Dooks" <ben@...>
To: <lpc2100@yahoogroups.com>
Sent: Sunday, February 01, 2004 11:55 AM
Subject: Re: [lpc2100] Optimization of Capture Routine


> On Sat, Jan 31, 2004 at 09:01:05PM +0100, capiman@... wrote:
> > Hello,
> >
> > i want to read in 1 byte multiple times from the port pins as fast as possible:
> > Currently i have the following C code:
> >
> > unsigned char Data[60000];
> >
> > void CaptureBuffer()
> > {
> >     unsigned char *ptr = &Data[0];
> >     unsigned char *ptrend = &Data[60000];
> >
> >     while(ptr < ptrend)
> >     {
> >         (*ptr) = (IOPIN >> LA_D0_BIT) & 0xff;
> >         ptr++;
> >     }
> > }
> >
> > When i compile it with gcc and option -O3, i can capture with around 3,9 MBytes/sec.
> > Avoiding the shift (by using P0.0 - P0.7) gives me 4,2 MBytes/sec.
> > Leaving out the (*ptr) = (IOPIN...) instruction gives me 11,8 MBytes/sec, but no more functionality :-)
> >
> > Can i improve the speed with inline assembler ? Produced assembler code already looks very compact...
> >
> > .L142:
> >  ldr ip, [r0, #0]
> >  strb ip, [r2], #1
> >  cmp r2, r1
> >  bcc .L142
> >
> > Are there any other tricks ?
>
> unrolling the loop a bit may help, as it reduces the number of branch
> instructions needed.
>
> -- 
> Ben
>
> Q:      What's a light-year?
> A:      One-third less calories than a regular year.
>
>
>
>
> Yahoo! Groups Links
>
> To visit your group on the web, go to:
>  http://groups.yahoo.com/group/lpc2100/
>
> To unsubscribe from this group, send an email to:
>  lpc2100-unsubscribe@yahoogroups.com
>
> Your use of Yahoo! Groups is subject to:
>  http://docs.yahoo.com/info/terms/
>
>

Re: Optimization of Capture Routine -> cycle table -> memory access times

2004-02-01 by Peter

--- In lpc2100@yahoogroups.com, capiman@t... wrote:
> So the two operations ( ldr ip, [r0, #0] and strb ip, [r2], #1)
> seems to take in sum 10 cycles.
> Is this correct ?

Purely from a core point of view, on ARM7TDMI a load takes three 
cycles:

1) Sets up the address
2) Reads from memory
3) Writes the data into the specified register

A store takes two cycles

1) Set up the address
2) Write the register value to memory.

Now the second cycle in each case may take a number of processor 
clock ticks depending on the memory system - there may be a delay 
because of a bridge between the AMBA bus and the memory bus, or 
there may be waitstates due to memory setup/access delays.
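
Combining these core-only counts with the 10 ticks measured earlier in the thread for one ldr/strb pair gives a rough idea of how many ticks go to the memory system. This is only a sketch of the subtraction: the function name is hypothetical, and the result does not tell you how the stall is split between the I/O read and the RAM write.

```c
/* Core-only cycle counts for ARM7TDMI as described above. */
enum { LDR_CORE_CYCLES = 3, STR_CORE_CYCLES = 2 };

/* Clock ticks left over for memory-system stalls, given a measured tick
   count for one load/store pair. Says nothing about how the stall is
   divided between the load and the store. */
int stall_ticks(int measured_ticks)
{
    return measured_ticks - (LDR_CORE_CYCLES + STR_CORE_CYCLES);
}
```

For the measured 10 ticks this leaves 5 ticks unaccounted for by the core alone.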

Something I really need to know is how long the second cycle for I/O 
and RAM accesses takes in processor clock ticks on lpc210x. If 
anyone can help I'd be most grateful.

For a cycle table, I'd suggest downloading the 7TDMI datasheet from 
ARM: http://www.arm.com/pdfs/DDI0029G_7TDMI_R3_trm.pdf - it goes 
into great depth explaining what's going on in each cycle of an 
instruction, allowing you to determine if it will be affected by 
external timing effects.

Peter.

Re: Optimization of Capture Routine -> cycle table ?

2004-02-01 by lpc2100_fan

Hi Martin,

the answer to your question about the number of cycles needed for
certain instructions is "it depends". First of all, the LPC2100 devices
are regular ARM7 devices, so if you have a general ARM7 cycle table, it
is valid. In particular, though, there are several accelerators in the
LPC2100 that speed up execution.
First, it has a 128-bit wide Flash interface. This means that you will
always fetch 128 bits after a branch. In fact the device even has two
memory blocks of 128-bit width which are accessed alternately. Long
story short, you get the best performance out of the device if you
can locate the start address of a time-critical function on a 128-bit
boundary!
This is my best available hint for generally speeding up the micro.
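
With GCC (the compiler mentioned earlier in the thread), one way to request this kind of placement is the `aligned` function attribute; 128 bits is 16 bytes. A minimal sketch, assuming the linker script does not override the alignment. The function name and body are placeholders, and whether it actually helps should be measured on the target.

```c
#include <stdint.h>

/* 128 bits = 16 bytes. The aligned attribute is a GCC extension; the
   function name and body here are placeholders for a time-critical routine. */
__attribute__((aligned(16)))
uint32_t time_critical_sum(const uint32_t *data, uint32_t count)
{
    uint32_t sum = 0;
    while (count--) {
        sum += *data++;
    }
    return sum;
}
```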

Cheers, Bob


--- In lpc2100@yahoogroups.com, capiman@t... wrote:
> This was a great idea !
> 
> I am now at around 5,8956 MBytes / second, which is close to 5,898 MBytes /
> sec. ( = Fosc * 4 / 10).
> So the two operations ( ldr ip, [r0, #0] and strb ip, [r2], #1) seems to
> take in sum 10 cycles.
> Is this correct ?
> 
> Re-adding my shift instruction: gives me around 5,360400 MBytes / s, so 11
> cycles. So shift itself takes 1 cycle ?
> 
> BTW: Is there a cycle table somewhere for LPC21xx on the net or in an
> appnote / manual ? Is it the same as for original ARMs ?
> 
> Greetings,
> 
>           Martin
> 
> ----- Original Message ----- 
> From: "Ben Dooks" <ben@f...>
> To: <lpc2100@yahoogroups.com>
> Sent: Sunday, February 01, 2004 11:55 AM
> Subject: Re: [lpc2100] Optimization of Capture Routine
> 
> 
> > On Sat, Jan 31, 2004 at 09:01:05PM +0100, capiman@t... wrote:
> > > Hello,
> > >
> > > i want to read in 1 byte multiple times from the port pins as fast as possible:
> > > Currently i have the following C code:
> > >
> > > unsigned char Data[60000];
> > >
> > > void CaptureBuffer()
> > > {
> > >     unsigned char *ptr = &Data[0];
> > >     unsigned char *ptrend = &Data[60000];
> > >
> > >     while(ptr < ptrend)
> > >     {
> > >         (*ptr) = (IOPIN >> LA_D0_BIT) & 0xff;
> > >         ptr++;
> > >     }
> > > }
> > >
> > > When i compile it with gcc and option -O3, i can capture with around 3,9 MBytes/sec.
> > > Avoiding the shift (by using P0.0 - P0.7) gives me 4,2 MBytes/sec.
> > > Leaving out the (*ptr) = (IOPIN...) instruction gives me 11,8 MBytes/sec, but no more functionality :-)
> > >
> > > Can i improve the speed with inline assembler ? Produced assembler code already looks very compact...
> > >
> > > .L142:
> > >  ldr ip, [r0, #0]
> > >  strb ip, [r2], #1
> > >  cmp r2, r1
> > >  bcc .L142
> > >
> > > Are there any other tricks ?
> >
> > unrolling the loop a bit may help, as it reduces the number of branch
> > instructions needed.
> >
> > -- 
> > Ben
> >
> > Q:      What's a light-year?
> > A:      One-third less calories than a regular year.
> >
> >
> >
> >
> > Yahoo! Groups Links
> >
> > To visit your group on the web, go to:
> >  http://groups.yahoo.com/group/lpc2100/
> >
> > To unsubscribe from this group, send an email to:
> >  lpc2100-unsubscribe@yahoogroups.com
> >
> > Your use of Yahoo! Groups is subject to:
> >  http://docs.yahoo.com/info/terms/
> >
> >

Re: Optimization of Capture Routine -> cycle table -> memory access times

2004-02-02 by lpc2100

There may not be any timing differences, but the LPC21xx uses the
7TDMI-S Rev4:

http://www.arm.com/pdfs/DDI0234A_7TDMIS_R4.pdf

--- In lpc2100@yahoogroups.com, "Peter " <pmaloy@c...> wrote:
> --- In lpc2100@yahoogroups.com, capiman@t... wrote:
> > So the two operations ( ldr ip, [r0, #0] and strb ip, [r2], #1)
> > seems to take in sum 10 cycles.
> > Is this correct ?
> 
> Purely from a core point of view, on ARM7TDMI a load takes three 
> cycles:
> 
> 1) Sets up the address
> 2) Reads from memory
> 3) Writes the data into the specified register
> 
> A store takes two cycles
> 
> 1) Set up the address
> 2) Write the register value to memory.
> 
> Now the second cycle in each case may take a number of processor 
> clock ticks depending on the memory system - there may be a delay 
> because of a bridge between the AMBA bus and the memory bus, or 
> there may be waitstates due to memory setup/access delays.
> 
> Something I really need to know is how long the second cycle for I/O
> and RAM accesses takes in processor clock ticks on lpc210x. If 
> anyone can help I'd be most grateful.
> 
> For a cycle table, I'd suggest downloading the 7TDMI datasheet from 
> ARM: http://www.arm.com/pdfs/DDI0029G_7TDMI_R3_trm.pdf - it goes 
> into great depth explaining what's going on in each cycle of an 
> instruction, allowing you to determine if it will be affected by 
> external timing effects.
> 
> Peter.
