Yahoo Groups archive

Lpc2000

Index last updated: 2026-04-28 23:31 UTC

Strange GCC compiler assembler output

2006-05-03 by Jan Thogersen

Hi,

Now I'm trying to dig into the assembler generation from the GCC 
compiler and I've noticed something that I don't quite understand.

Here is the dump that got my attention:
    if (buf_inpos == buf_outpos) T0MR0++; else { // The fifo is empty
       E59F3130   ldr r3, [pc, #304]
       E5D33000   ldrb r3, [r3]
       E51B101C   ldr r1, [r11, #-28]
       E1530001   cmp r3, r1
       1A000009   bne 0x0000093C
       E3A0220E   mov r2, #0xE0000000
       E2822901   add r2, r2, #0x00004000
       E2822018   add r2, r2, #0x00000018
       E3A0320E   mov r3, #0xE0000000
       E2833901   add r3, r3, #0x00004000
       E2833018   add r3, r3, #0x00000018
       E5933000   ldr r3, [r3]
       E2833001   add r3, r3, #0x00000001
       E5823000   str r3, [r2]
       EA000031   b 0x00000A04

How come it takes 3 instructions to load the correct address into r2?
First it moves #0xE0000000 to r2, then adds #0x00004000 and finally adds
#0x00000018 to end up with #0xE0004018, which is the address of T0MR0.
But why doesn't it just move #0xE0004018 into r2 in the first instruction?

This strange way of loading the memory location is seen throughout the whole
dump!

I'm compiling at optimization level 3.

Best regards
  Jan

Re: Strange GCC compiler assembler output

2006-05-03 by jayasooriah

--- In lpc2000@yahoogroups.com, Jan Thogersen <jan@...> wrote:

>        E3A0220E   mov r2, #0xE0000000
>        E2822901   add r2, r2, #0x00004000
>        E2822018   add r2, r2, #0x00000018
...
> How come it takes 3 instruction to load the correct address into r2. 

My guess is that you are generating 16-bit code and the compiler is
minimising code size.  The above three instructions would then occupy
three 16-bit locations in memory, while the replacement
 
>        ldr r2 #e0004018

would require one 32-bit instruction and another 32-bit to store the
constant.

Jaya

Re: [lpc2000] Strange GCC compiler assembler output

2006-05-03 by Dominic Rath

On Wednesday 03 May 2006 14:19, Jan Thogersen wrote:
> How come it takes 3 instruction to load the correct address into r2.
> First it moves #0xE0000000 to r2, then adds #0x00004000 and finally adds
> #0x00000018 to end up with #0xE0004018 which is the address of T0MR0.
> But why don't it just move #0xE0004018 into the r2 in the first
> instruction?

You can't have an opcode plus a 32-bit operand when the instruction itself is
only 32 bits wide. Move immediate takes a 12-bit operand, consisting of an
8-bit value and a 4-bit rotation field, which lets you rotate the 8-bit value
to any even bit position.
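
The rule above can be checked mechanically. What follows is only a minimal sketch in C, not code from GCC or any toolchain; the helper names rotl32 and arm_imm_encodable are made up for illustration. It tests whether a 32-bit constant fits the "8-bit value plus even rotation" form by undoing each of the 16 possible rotations.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n in 0..31). */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    n &= 31u;
    return n ? (v << n) | (v >> (32u - n)) : v;
}

/*
 * Hypothetical helper: true if `value` can be written as an 8-bit
 * constant rotated right by an even amount (0, 2, ..., 30), i.e. as a
 * single ARM data-processing immediate.
 */
static bool arm_imm_encodable(uint32_t value)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Undo a rotate-right by `rot`; the result must fit in 8 bits. */
        if (rotl32(value, rot) <= 0xFFu)
            return true;
    }
    return false;
}
```

With this sketch, each of the three pieces from the dump (0xE0000000, 0x00004000, 0x00000018) passes the check, while the full address 0xE0004018 does not, which is why a single mov cannot hold it.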

Regards,

Dominic

Re: Strange GCC compiler assembler output

2006-05-03 by jayasooriah

Oops... I should have looked at the binary code -- you *are* generating
32-bit code.

In this case, I think the compiler decided not to use literal pools
for some reason.  Maybe there is no place nearby to place the pool
constants.

So I really have no idea what it is trying to optimise, if this is
what it is doing!

Jaya

Re: [lpc2000] Strange GCC compiler assembler output

2006-05-03 by Ralph Hempel

Jan Thogersen wrote:

> How come it takes 3 instruction to load the correct address into r2. 
> First it moves #0xE0000000 to r2, then adds #0x00004000 and finally adds 
> #0x00000018 to end up with #0xE0004018 which is the address of T0MR0. 
> But why don't it just move #0xE0004018 into the r2 in the first instruction?

It's because of the way ARM instructions are encoded. Each instruction
is exactly 32 bits long, so there's no room for a full 32-bit constant
inside a single instruction.

<http://www.heyrick.co.uk/assembler/qfinder.html>

The ARM move instruction actually works out to something like:

Move an 8-bit value into Rd, rotated right by an even number of bits (0 to 30).

<http://www.heyrick.co.uk/assembler/mov.html#mov>
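
To see the encoding at work on the actual dump, here is a small C sketch that pulls the immediate operand out of the three instruction words Jan posted. The helper names (rotr32, arm_decode_imm, dump_address) are invented for this example, and the code assumes the words are data-processing instructions in immediate form, which is what the disassembly shows.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (n in 0..31). */
static uint32_t rotr32(uint32_t v, unsigned n)
{
    n &= 31u;
    return n ? (v >> n) | (v << (32u - n)) : v;
}

/*
 * Hypothetical helper: extract the immediate operand from an ARM
 * data-processing instruction word that uses the immediate form.
 * Bits 7..0 hold the 8-bit value, bits 11..8 hold the rotation field;
 * the value is rotated right by twice the field.
 */
static uint32_t arm_decode_imm(uint32_t insn)
{
    uint32_t imm8 = insn & 0xFFu;
    unsigned rot  = (insn >> 8) & 0xFu;
    return rotr32(imm8, 2u * rot);
}

/* Sum the immediates of the mov/add/add sequence from the dump. */
static uint32_t dump_address(void)
{
    return arm_decode_imm(0xE3A0220Eu)   /* mov r2, #0xE0000000     */
         + arm_decode_imm(0xE2822901u)   /* add r2, r2, #0x00004000 */
         + arm_decode_imm(0xE2822018u);  /* add r2, r2, #0x00000018 */
}
```

For example, E3A0220E has rotation field 2 and value 0x0E, so 0x0E rotated right by 4 bits gives 0xE0000000; the three immediates add up to the T0MR0 address 0xE0004018.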

Here's a quick reference card in PDF format:

<http://www.arm.com/pdfs/QRC0001H_rvct_v2.1_arm.pdf>

Hope this helps.

Ralph

Re: Strange GCC compiler assembler output

2006-05-03 by brendanmurphy37

Jan,

There's certainly something strange going on here.

As others have pointed out, you can't load a 32-bit literal with the
syntax you suggest. It can be done either with something like a
load/shift/add over one or more instructions, as is the case here, or
by loading indirectly, as in:

 ldr pc, do_reset_addr

do_reset_addr: .long do_reset

(i.e. the 32-bit literal value is stored nearby).

That much is normal.

However, the strange thing is that you say the compiler
optimisation is at level 3, yet it loads identical literal values
into both r2 and r3 for no apparent reason.  GCC is normally very
good at optimisation.

How is T0MR0 declared? Is "volatile" involved at all?

Also, the initial comparison looks a bit strange: how are
buf_inpos and buf_outpos declared?

Brendan


Re: [lpc2000] Re: Strange GCC compiler assembler output

2006-05-03 by Ralph Hempel

brendanmurphy37 wrote:

> However, the something strange is more that you say the compiler 
> optimisation is at level 3, yet it loads identical literal values 
> into both r2 and r3 for no apparent reason.  GCC is normally very 
> good at optimisation.
> 
> What way is TMR0 declared? Is "volatile" involved at all?
> 
> Also the initial comparison looks a bit strange: what way are 
> buf_inpos and buf_outpos declared? 

The whole sequence might have been even better as:

   mov r2, #0xE0000000
   add r2, r2, #0x00004000
   add r2, r2, #0x00000018
   ldr r3, [r2]
   add r3, r3, #0x00000001
   str r3, [r2]

But if you REALLY are that short of cycles, maybe inline assembler
is a better choice. (Kidding)

The buf_inpos and buf_outpos are still a mystery. One looks like it's
being loaded from a fixed value that's stored close by in code space:

   ldr r3, [pc, #304]

while the other looks like it's coming off the stack frame:

   ldr r1, [r11, #-28]

Ralph

Re: Strange GCC compiler assembler output

2006-05-03 by jayasooriah

I did a quick experiment that possibly explains what is happening.

--- In lpc2000@yahoogroups.com, Jan Thogersen <jan@...> wrote:
>        E3A0220E   mov r2, #0xE0000000
>        E2822901   add r2, r2, #0x00004000
>        E2822018   add r2, r2, #0x00000018
...
> How come it takes 3 instruction to load the correct address into r2.
> First it moves #0xE0000000 to r2, then adds #0x00004000 and finally adds
> #0x00000018 to end up with #0xE0004018 which is the address of T0MR0.

My source code looks like this:

> int
> foo(void)
> {
>         return (0x12345678);
> 
> } // foo()

The -O3 output looks like this:

> e59f0000        ldr     r0, [pc, #0]
> e1a0f00e        mov     pc, lr
> 12345678        .word   0x12345678

Changing return value to 0x12005678 gives:

> e3a02412        mov     r2, #0x12000000
> e2821c56        add     r1, r2, #0x5600
> e2810078        add     r0, r1, #0x78
> e1a0f00e        mov     pc, lr

The logic seems to be that if the constant can be loaded with MOV or
MOV plus up to two ADD instructions, GCC prefers this sequence to LDR.

It is possible that MOV plus two ADDs is considered better than LDR once
speed and size are both taken into account, possibly because of the
instruction cache penalty when fetching literal pool constants in the
LDR case.

I don't know if anyone has added an -mcpu option for the LPC that could
use a different cost table, given that the MAM does not necessarily incur
a speed penalty when accessing literal pool constants.
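
To make the "MOV plus up to two ADDs" observation concrete, here is a rough C sketch that counts how many 8-bit, even-aligned chunks are needed to cover a constant. This is an assumption-laden illustration, not GCC's actual algorithm: the helper name arm_const_chunks is made up, and the simple scan from the low end ignores chunks that wrap around bit 31 as well as MVN/SUB tricks, so it only gives an upper bound on the instruction count.

```c
#include <stdint.h>

/*
 * Hypothetical helper: count how many 8-bit chunks, each starting at an
 * even bit position, are needed to cover all set bits of `value`.
 * Each chunk corresponds to one MOV or ADD immediate. This greedy scan
 * gives an upper bound, not the sequence GCC would actually pick.
 */
static unsigned arm_const_chunks(uint32_t value)
{
    unsigned count = 0;

    while (value != 0) {
        unsigned low = 0;
        /* Find the lowest set bit... */
        while (((value >> low) & 1u) == 0)
            low++;
        /* ...round down to an even position and clear 8 bits from there. */
        low &= ~1u;
        value &= ~((uint32_t)0xFFu << low);
        count++;
    }
    /* Zero still needs one instruction (mov rX, #0). */
    return count ? count : 1;
}
```

Under these assumptions the sketch reports 3 chunks for 0x12005678 and for 0xE0004018, and 4 chunks for 0x12345678, which matches the pattern seen in the experiment: three instructions are emitted inline, while the constant needing more goes to the literal pool.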

Jaya

Re: [lpc2000] Re: Strange GCC compiler assembler output

2006-05-03 by Jan Thogersen

Hi,

My declarations are as follows:
global:
#define T0MR0 (*(volatile unsigned long *)0xE0004018)
static uint8 buf_inpos;

local inside function:
  register uint8 buf_outpos;

Is the strange asm output because of the volatile thingy? AFAIK volatile
tells the compiler NOT to reuse values left in registers, and instead to
reload them from the original location every single time. Right?

regards
  Jan

Re: Strange GCC compiler assembler output

2006-05-03 by brendanmurphy37

--- In lpc2000@yahoogroups.com, Ralph Hempel <rhempel@...> wrote:

> The buf_inpos and buf_outpos are still a mystery. One looks like it's
> being loaded from a fixed value that's stored close by in code space:
> 
>    ldr r3, [pc, #304]
> 
> while the other looks like it's coming off the stack frame:
> 
>    ldr r1, [r11, #-28]
> 
> Ralph

The thing that struck me as strange about the comparison is that it's 
loading one as a byte the other as a word and then comparing.

> >>        E59F3130   ldr r3, [pc, #304]
> >>        E5D33000   ldrb r3, [r3]
> >>        E51B101C   ldr r1, [r11, #-28]
> >>        E1530001   cmp r3, r1

Hence my interest in seeing how they were declared.

Isn't this fun, second-guessing a compiler! NOT! 

Note to self: find something more productive to do with my time....

Brendan

Re: Strange GCC compiler assembler output

2006-05-03 by brendanmurphy37

--- In lpc2000@yahoogroups.com, Jan Thogersen <jan@...> wrote:
>
> Hi,
> 
> My declarations is like the following:
> global:
> #define T0MR0 (*(volatile unsigned long *)0xE0004018)
> static uint8 buf_inpos;
> 
> local inside function:
>   register uint8 buf_outpos;
> 
> Is the strange asm output because of the volatile thingy? AFAIK volatile
> tells the compiler NOT to reuse values left in regs. Instead reload it
> from the original location every single time? Right?
> 
> regards
>   Jan

Yes, you're correct: volatile tells it to load the specified object
each time. The odd thing here, though, is that it's loading the address
of the object twice.

I'd advise you to check the T0MR0 declaration very carefully to make sure
you're not casting the constant to a "volatile pointer to an int"
rather than a "pointer to a volatile int". The two aren't the same:
you need the latter. Maybe someone else could advise: I'm afraid I
don't carry the precedence rules around in my head. Having said that,
chances are it's correct. It's still a mystery why the compiler
didn't just do a "mov r2, r3" having gone to the trouble of loading
up the literal.

Given there's no "memory increment" ARM instruction (I think!), it's 
always going to have to do a "load/increment/store" by the way.
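
For anyone who wants to see the two declarations side by side, here is a small C sketch. It is only an illustration: fake_t0mr0, p, q and increment_t0mr0 are made-up names, and the register is replaced by an ordinary variable so the code can run on a host PC rather than on the LPC2000.

```c
/* Stand-in for the timer match register so the sketch runs on a host PC.
 * On the LPC2000 the macro would use the fixed address 0xE0004018 instead. */
static unsigned long fake_t0mr0;
#define T0MR0 (*(volatile unsigned long *)&fake_t0mr0)

/* Pointer to a volatile object: every access through p touches memory. */
static volatile unsigned long *const p = &fake_t0mr0;

/* Volatile pointer to a plain object: only the pointer itself is volatile. */
static unsigned long *volatile q = &fake_t0mr0;

/* T0MR0++ is a read, an add, and a write; written out step by step. */
static unsigned long increment_t0mr0(void)
{
    unsigned long tmp = T0MR0;  /* load  */
    tmp = tmp + 1;              /* add   */
    T0MR0 = tmp;                /* store */
    return T0MR0;
}
```

In the cast used by the T0MR0 macro, volatile sits to the left of the `*`, so it qualifies the pointed-to unsigned long; that is the "pointer to a volatile" form, like p above.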

Given the declaration of the buffer pointers, the other generated 
code kind of makes sense: it certainly explains why they are loaded 
differently (one being global, the other a local).
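
On the byte-versus-word comparison: in C, both uint8 operands are promoted to int before the == is evaluated, so comparing a byte-loaded value with a word-held one is consistent with the language rules as long as the word stays in the range 0..255. A minimal sketch follows; the typedef is an assumed definition of the poster's uint8 type, and the function names are invented for the example.

```c
#include <stdint.h>

typedef uint8_t uint8;  /* assumed definition of the poster's uint8 type */

/* Both operands are promoted to int before the comparison takes place. */
static int positions_equal(uint8 in_pos, uint8 out_pos)
{
    return in_pos == out_pos;
}

/* The promoted sum is an int, so it can exceed the 8-bit range... */
static int promoted_sum(uint8 a, uint8 b)
{
    return a + b;
}

/* ...while storing it back into a uint8 wraps modulo 256. */
static uint8 wrapped_sum(uint8 a, uint8 b)
{
    return (uint8)(a + b);
}
```

Whether this is what the compiler was doing in Jan's function cannot be told from the excerpt alone; the sketch only shows why mixing the two load widths is not wrong in itself.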

Brendan

Re: Strange GCC compiler assembler output

2006-05-03 by brendanmurphy37

Jan,

Looking at it again, the declaration is fine. It's still a mystery why
the compiler loaded the literal twice, though, rather than copying it
from one register to another.

Brendan

Re: Strange GCC compiler assembler output

2006-05-03 by jayasooriah

Jan,

The output you got seems very strange.  Which platform are you
compiling on, and with which version of GCC?

I suggest you do the equivalent of the following, where I have created
the source "foo.c" with T0MR0 defined as you have, and then got my
build of the compiler to generate assembler source.

What it generates seems reasonable but very different from yours. I
suspect there could be something else obscuring your code that is not
visible from your excerpt.

Jaya

> [temp] cat foo.c
> #define T0MR0 (*(volatile unsigned long *)0xE0004018)
> 
> void foo(void)
> {
>         T0MR0++;
> 
> } // foo()
> [temp] arm-esdk-gcc -O3 -S foo.c
> [temp] cat foo.s
>         .file   "foo.c"
>         .text
>         .align  2
>         .global _foo
>         .type   _foo, %function
> _foo:
>         @ args = 0, pretend = 0, frame = 0
>         @ frame_needed = 0, uses_anonymous_args = 0
>         @ link register save eliminated.
>         mov     r3, #-536870912
>         add     r0, r3, #16384
>         ldr     ip, [r0, #24]
>         add     r1, ip, #1
>         @ lr needed for prologue
>         str     r1, [r0, #24]
>         mov     pc, lr
>         .size   _foo, .-_foo
>         .ident  "GCC: (GNU) 3.3.2"
> [temp]

Re: [lpc2000] Re: Strange GCC compiler assembler output

2006-05-05 by 42Bastian Schick

jayasooriah schrieb:
> Oppps... I should have seen the binary code -- you *are* generating
> 32-bit code.
> 
> In this case, I think the compiler decided not to use literal pools
> for some reason.  May be there is no place nearby to place the pool
> constants.
> 
> So I really have no idea what it is trying to optimise, if this is
> what it is doing!

An ldr rn,[pc,#n] (which is the real opcode behind ldr rn,=<const>)
might cause pipeline stalls that cost more than the 3 mov/add instructions.
It would be interesting to see whether the compiler behaves differently
when optimizing for size rather than for speed.


-- 
42Bastian
