Yahoo Groups archive

Lpc2000

Index last updated: 2026-04-28 23:31 UTC

Thread

32 bit Division

32 bit Division

2004-08-31 by bobbruce000

Anyone have fast working code for unsigned 32 bit division?
Currently, I am using the __udivsi3 routine that comes with
gcc, but it is slow and appears to be written to work on
early ARMs without hardware multipliers.

Re: 32 bit Division

2004-09-07 by Frank Sergeant

"bobbruce000" <bobbruce000@...> writes:

> Anyone have fast working code for unsigned 32 bit division?
> Currently, I am using the __udivsi3 routine that comes with
> gcc, but it is slow and appears to be written to work on
> early ARMs without hardware multipliers.

Below my sig is the division routine, in ARM assembler, that I am using
in my Forth.  I don't know that it is any faster.  It doesn't use the
multiplier.  How would you use the multiplier such that it would be
faster?  I can see how to do it with a known divisor (by multiplying by
the reciprocal of the divisor) but not with a routine that accepts an
arbitrary divisor.  Do you have some sort of Newton's Method that would
be faster?  I don't actually do enough division at the moment to need it
any faster, but it is an interesting topic.

Also, see the question in the code in case anyone knows a way to reduce
the two instructions to just one, repeated here:

         @mov  r1, r1, rol #1    @ this is what we'd like to do
         mov r1, r1, lsl #1      @   but we accomplish it with these
         adc r1, r1, #0          @   two instructions.  Is there a
                                 @   way to do it in just one?

In the listing, 'DSTK' is the register R10 and 'nxt' is more or less the
Forth equivalent of a subroutine return in that it transfers control to
the next Forth word.

I suspect there are some "early out" improvements that could be made for
smaller divisors, etc.  Also, I think ARM may have had some sample code
including a division algorithm.


-- 
Frank
http://pygmy.utoh.org/riscy


@@@@ U/MOD ( dividend divisor - uR uQ) 
@ Divide 32bits by 32bits with 32bit remainder and 32bit quotient.
@ See Patterson & Hennessy, Computer Organization & Design, around
@  pp. 219-221.  This algorithm is a cross between the book's
@  second and third versions.
UslashMOD:
        @ divisor is in r0 (TOS)
        @ put dividend into r2 and extend it to 64 bits with r1=0
        ldr r2, [DSTK]
        mov r1, #0
         @ Divide   r1:r2  by  r0
         @ For each of the 32 bits,
         @   shift r1:r2 left
         @   subtract r0 from r1
         @   if result is negative
         @      undo subtraction and new quotient bit (lsbit of r2) gets 0
         @   otherwise
         @      new quotient bit (lsbit of r2) gets 1
         @ Remainder is in r1 and quotient is in r2
         @ Use r3 to count the iterations
      mov r3, #32
       @ First, shift dividend (r1:r2 left by one bit)
1:       movs r2, r2, lsl #1     @ msbit is now in Carry
         @mov  r1, r1, rol #1    @ this is what we'd like to do
         mov r1, r1, lsl #1      @   but we accomplish it with these
         adc r1, r1, #0          @   two instructions.  Is there a
                                 @   way to do it in just one?
       @ Subtract divisor from dividend.  If result is not negative,
       @  set least significant bit of quotient (in r2).  Otherwise,
       @  undo the subtraction by adding divisor back to r1.
        subs r1, r1, TOS
         @ new quotient bit should be 1 only if subtraction result 
         @ was non-negative, i.e. add complement of sign to r2
        addpl r2, r2, #1
         @ if subtraction result was negative, add back the divisor
         @ to r1, i.e. undo the subtraction
        addmi r1, r1, TOS
      subs r3, r3, #1
      bne 1b
      @ At this point, r2 contains the quotient and r1 the remainder 
      str r1, [DSTK]
      mov TOS, r2
      nxt

Re: 32 bit Division

2004-09-07 by Peter

--- In lpc2000@yahoogroups.com, Frank Sergeant <frank@p...> wrote:
<snip>

> Also, see the question in the code in case anyone knows a way to 
reduce
> the two instructions to just one, repeated here:
> 
>          @mov  r1, r1, rol #1    @ this is what we'd like to do
>          mov r1, r1, lsl #1      @   but we accomplish it with 
these
>          adc r1, r1, #0          @   two instructions.  Is there a
>                                  @   way to do it in just one?
> 

Unfortunately, the immediate and the shifted register have to come 
out on the same register read port in ARM7.

An alternative is to hold zero in another register (for example R12) 
and do this:

    adc r1, r12, r1, lsl #1    @ R12 contains zero 

The added speed *may* be worth the cost of tying up a register. R12 
is allowed to be corrupted during a function call so it doesn't have 
to be stacked.

Peter.

Re: 32 bit Division

2004-09-07 by lpc2100

Info on the following link may also be useful

http://www.poppyfields.net/acorn/tech/division.shtml

Tom

Re: 32 bit Division

2004-09-08 by Frank Sergeant

> An alternative is to hold zero in another register (for example R12) 
> and do this:
>     adc r1, r12, r1, lsl #1    @ R12 contains zero 

> http://www.poppyfields.net/acorn/tech/division.shtml

Thank you, Peter and Tom, for the suggestion and link.

I believe I'll start with a scratch register to hold zero and pursue the
link when/if the speed becomes more important.  It looks like a great
resource. 


-- 
Frank

Move to quarantaine

This moves the raw source file on disk only. The archive index is not changed automatically, so you still need to run a manual refresh afterward.