Synth-DIY Yahoo! Groups Archives

Re: [AVR-Chat] Re: ASM vs. C

2009-01-15 by subscriptions@aeolusdevelopment.com

David VanHorn Wrote
>> With asm, you have to tediously study one great column of code before and
>> after you do any chopping and pasting. HLLs have a 2D visual aspect to
>> it.
>
>It's certainly possible to write ASM as one huge tarball of cruft, but
>that's not how I do it.
>
>Here's an ISR that I just compiled with optimization set for smallest
>code, in GCC:
>
>// Timer 0 overflow ISR
>ISR(TIMER0_OVF_vect)
>{
>     498:	1f 92       	push	r1

<snip>

>Only TWO of those instructions are actually needed!
>If it were me writing, it would look like this:
>
>T0_OVF_ISR:
>          sbi ADCSRA,(1<<ADSC)
>          reti
>
>Since SBI and CBI don't affect flags, and no registers are used,
>nothing needs to be pushed or popped at all!
>I'm still astounded that at this late date, this kind of thing happens.

This is one of the reasons I don't trust architecture specific keywords
(interrupt keywords in particular).

And also a good illustration of where mixing asm with C can be of great
benefit (reducing the latency substantially and decreasing size to boot).

Robert


Re: [AVR-Chat] Re: ASM vs. C

2009-01-15 by David VanHorn

> This is one of the reasons I don't trust architecture specific keywords
> (interrupt keywords in particular).
>
> And also a good illustration of where mixing asm with C can be of great
> benefit (reducing the latency substantially and decreasing size to boot)

This is hilarious to me in several layers.

The most obvious one is the one I pointed out, NOTHING needs be pushed.
Then we have the fact that ISRs aren't supposed to be able to count on
R1 being zeroed, but the code DOES that.

Can't a modern compiler figure out that all this was unnecessary?

Re: ASM vs. C

2009-01-15 by Don Kinzer

--- In AVR-Chat@yahoogroups.com, David Kelly <dkelly@...> wrote:
> If you read somewhere that it's not supposed to be then
> clearly you are reading old documentation.
This misconception may have arisen from something that I tried to 
explain in an earlier post.  What I meant to convey is that the code 
generated for an arbitrary function f() relies on r1 being zero.  
Since an ISR may call an arbitrary function, the code generated for 
an ISR must ensure that r1 is zero before it makes such a call.  
Consequently, since an ISR may be invoked at an arbitrary time 
(particularly when r1 is non-zero as it might be after a MUL 
instruction), the prologue of the generated ISR code needs to save 
r1 on the stack and set it to zero.

Dave's point is that his ISR clearly doesn't call any external 
functions so the saving/setting/restoring of r1 is rather 
pointless.  While that is certainly true, it is also true that if 
this minor inefficiency is intolerable in your application, you can 
always write the ISR in pure assembly language.  This can be done in 
either of two ways.

1) You can use __attribute__((naked)) to tell the compiler not to 
emit any prologue/epilogue code and then use inline assembly 
language in the ISR to produce *all* of the code needed for the ISR 
including the RETI.

2) You can implement the ISR in assembly language in a .S file.
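
As a rough illustration of option 1, a naked ISR might look like the sketch below. It assumes avr-gcc with an avr-libc recent enough to provide ISR_NAKED, and a device whose ADCSRA lies in the SBI-reachable I/O range (below 0x20); the operand names reg and bit are just labels chosen for this example.

```c
#include <avr/io.h>
#include <avr/interrupt.h>

/* Sketch only: assumes an AVR target built with avr-gcc/avr-libc, and a
 * device whose ADCSRA sits in the SBI-reachable I/O range (below 0x20). */
ISR(TIMER0_OVF_vect, ISR_NAKED)
{
    /* No prologue/epilogue is generated, so the body must not disturb
     * any register or SREG, and must supply its own RETI. */
    __asm__ __volatile__(
        "sbi %[reg], %[bit]" "\n\t"
        "reti"               "\n\t"
        :
        : [reg] "I" (_SFR_IO_ADDR(ADCSRA)),
          [bit] "I" (ADSC)
    );
}
```

Because SBI changes neither flags nor registers, nothing needs saving here. Any C statement added to such a body would run without the usual register and SREG protection.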

Here is the code for your Timer0 Overflow ISR to put in a .S file.

#include <avr/io.h>

.section .text

.global TIMER0_OVF_vect
TIMER0_OVF_vect:
  sbi _SFR_IO_ADDR(ADCSRA), ADSC  ; set ADSC to start an ADC conversion
  reti

.end

Of course, this code won't assemble for any AVR device which has its
ADCSRA above I/O address 0x1f, since SBI can only reach the lower 32
I/O registers.  If you want the code to work for those devices, too,
you'll have to add some conditionals which might look like this:

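#define IO(x) _SFR_IO_ADDR(x)

.global TIMER0_OVF_vect
TIMER0_OVF_vect:
#if (IO(ADCSRA) < 0x20)
  ; ADCSRA is within SBI range: no registers or flags are touched
  sbi IO(ADCSRA), ADSC
#else
#define tmpReg r24
  ; ADCSRA is out of SBI range: read-modify-write through a scratch
  ; register, preserving it and SREG (ORI changes the flags)
  push tmpReg
  in tmpReg, IO(SREG)
  push tmpReg
  lds tmpReg, ADCSRA
  ori tmpReg, _BV(ADSC)
  sts ADCSRA, tmpReg
  pop tmpReg
  out IO(SREG), tmpReg
  pop tmpReg
#undef tmpReg
#endif
  reti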
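
The conditional above boils down to which instructions can reach a given register. Here is a small host-side C sketch of that rule, assuming a classic AVR memory map where I/O address = data-space address - 0x20. The type and function names are made up for this illustration and are not part of avr-libc.

```c
#include <stdint.h>

/* Hypothetical classification of how a special function register can be
 * accessed, given its data-space (memory) address on a classic AVR. */
typedef enum {
    ACCESS_NOT_SFR,   /* 0x00-0x1F: general-purpose register file      */
    ACCESS_SBI_CBI,   /* 0x20-0x3F: I/O 0x00-0x1F, bit instructions OK */
    ACCESS_IN_OUT,    /* 0x40-0x5F: I/O 0x20-0x3F, IN/OUT only         */
    ACCESS_LDS_STS    /* 0x60 and up: extended I/O, LDS/STS only       */
} sfr_access_t;

sfr_access_t sfr_access_for(uint16_t mem_addr)
{
    if (mem_addr < 0x20) return ACCESS_NOT_SFR;
    if (mem_addr < 0x40) return ACCESS_SBI_CBI;
    if (mem_addr < 0x60) return ACCESS_IN_OUT;
    return ACCESS_LDS_STS;
}
```

For example, ADCSRA is at 0x26 on an ATmega8 (SBI works) and at 0x7A on the ATmega48 family (LDS/STS needed), going by the datasheets as I recall them; check the datasheet for your device.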

Don Kinzer
ZBasic Microcontrollers
http://www.basic.net

Re: [AVR-Chat] Re: ASM vs. C

2009-01-15 by David Kelly

On Thu, Jan 15, 2009 at 10:55:41AM -0500, David VanHorn wrote:
> > This is one of the reasons I don't trust architecture specific
> > keywords (interrupt keywords in particular).
> >
> > And also a good illustration of where mixing asm with C can be of
> > great benefit (reducing the latency substantially and decreasing
> > size to boot)
> 
> This is hilarious to me in several layers.
> 
> The most obvious one is the one I pointed out, NOTHING needs be
> pushed. Then we have the fact that ISRs aren't supposed to be able to
> count on R1 being zeroed, but the code DOES that.

Why not count on R1 being zeroed? If you read somewhere that it's not
supposed to be then clearly you are reading old documentation. Welcome
to Open Source.

> Can't a modern compiler figure out that all this was unnecessary?

No doubt it could. But why sweat it?

IIRC it takes 4 or 5 cycles to RETI, and Joerg says at
http://www.nongnu.org/avr-libc/user-manual/group__asmdemo.html that
dispatch takes 4 cycles to set up and 2 to RJMP through the vector.
(Probably 3 for > 64K AVRs.) The generated code was 11 instructions vs 2.
So the "big" difference is 20 cycles (6 for the IRQ call, 10 instructions,
4 for RETI) vs 11.
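
Written out as a small C sketch, the estimate looks like this. The constants are the rough figures quoted above, not measurements, and the model assumes one cycle per instruction in the ISR body. That is a simplification: on classic AVR cores PUSH, POP and SBI each take 2 cycles, so real totals would be somewhat higher on both sides.

```c
/* Rough cost model from the estimates above:
 * total = interrupt dispatch + ISR body (excluding RETI) + RETI.
 * All figures are approximations from the post, not measured values. */
enum {
    DISPATCH_CYCLES = 6,  /* 4 to set up + 2 for the RJMP in the vector */
    RETI_CYCLES     = 4   /* "4 or 5" per the post; 4 used here */
};

/* body_cycles: cycles spent in the ISR body, assumed one per instruction */
unsigned isr_total_cycles(unsigned body_cycles)
{
    return DISPATCH_CYCLES + body_cycles + RETI_CYCLES;
}
```

With these assumptions, isr_total_cycles(10) gives 20 for the compiler's version and isr_total_cycles(1) gives 11 for the hand-written one.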

Twice almost-zero is still almost zero. This is not a "substantial"
improvement as has been claimed.

Q: If one builds a million of these things, how many centuries would these
million have to run to save as much time as has already been wasted
talking about it?

A: Never. The time saved will be burned in a busy loop waiting on other
things.

-- 
David Kelly N4HHE, dkelly@HiWAAY.net
========================================================================
Whom computers would destroy, they must first drive mad.

Re: [AVR-Chat] Re: ASM vs. C

2009-01-15 by David VanHorn

> Twice almost-zero is still almost zero. This is not a "substantial"
> improvement as has been claimed.

I've built systems where I had to scratch for every instruction.. LOTS
of code re-use, and still barely fit it in the device.
"buy a larger chip" is the classic answer, but $0.25 * a few million
is pretty significant money.


> A: Never. The time saved will be burned in a busy loop waiting on other
> things.

Could be... In some applications (NOT this one), you're fighting for
speed, and literally need every cycle.
"Buy a faster chip" is the classic solution; see above, plus higher EMI
emissions and power consumption.