Yahoo Groups archive

Lpc2000

Index last updated: 2026-04-28 23:31 UTC

Message

Re: 32 bit Division

2004-09-07 by Frank Sergeant

"bobbruce000" <bobbruce000@...> writes:

> Anyone have fast working code for unsigned 32 bit division?
> Currently, I am using the __udivsi3 routine that comes with
> gcc, but it is slow and appears to be written to work on
> early ARMs without hardware multipliers.

Below my sig is the division routine, in ARM assembler, that I am using
in my Forth.  I don't know that it is any faster.  It doesn't use the
multiplier.  How would you use the multiplier such that it would be
faster?  I can see how to do it with a known divisor (by multiplying by
the reciprocal of the divisor) but not with a routine that accepts an
arbitrary divisor.  Do you have some sort of Newton's Method that would
be faster?  I don't actually do enough division at the moment to need it
any faster, but it is an interesting topic.

Also, see the question in the code in case anyone knows a way to reduce
the two instructions to just one, repeated here:

         @mov  r1, r1, rol #1    @ this is what we'd like to do
         mov r1, r1, lsl #1      @   but we accomplish it with these
         adc r1, r1, #0          @   two instructions.  Is there a
                                 @   way to do it in just one?

In the listing, 'DSTK' is the register R10 and 'nxt' is more or less the
Forth equivalent of a subroutine return in that it transfers control to
the next Forth word.

I suspect there are some "early out" improvements that could be made for
smaller divisors, etc.  Also, I think ARM may have had some sample code
including a division algorithm.


-- 
Frank
http://pygmy.utoh.org/riscy


@@@@ U/MOD ( dividend divisor - uR uQ) 
@ Divide 32bits by 32bits with 32bit remainder and 32bit quotient.
@ See Patterson & Hennessy, Computer Organization & Design, around
@  pp. 219-221.  This algorithm is a cross between the book's
@  second and third versions.
UslashMOD:
        @ divisor is in r0 (TOS)
        @ put dividend into r2 and extend it to 64 bits with r1=0
        ldr r2, [DSTK]
        mov r1, #0
         @ Divide   r1:r2  by  r0
         @ For each of the 32 bits,
         @   shift r1:r2 left
         @   subtract r0 from r1
         @   if result is negative
         @      undo subtraction and new quotient bit (lsbit of r2) gets 0
         @   otherwise
         @      new quotient bit (lsbit of r2) gets 1
         @ Remainder is in r1 and quotient is in r2
         @ Use r3 to count the iterations
      mov r3, #32
       @ First, shift dividend (r1:r2 left by one bit)
1:       movs r2, r2, lsl #1     @ msbit is now in Carry
         @mov  r1, r1, rol #1    @ this is what we'd like to do
         mov r1, r1, lsl #1      @   but we accomplish it with these
         adc r1, r1, #0          @   two instructions.  Is there a
                                 @   way to do it in just one?
       @ Subtract divisor from dividend.  If result is not negative,
       @  set least significant bit of quotient (in r2).  Otherwise,
       @  undo the subtraction by adding divisor back to r1.
        subs r1, r1, TOS
         @ new quotient bit should be 1 only if subtraction result 
         @ was non-negative, i.e. add complement of sign to r2
        addpl r2, r2, #1
         @ if subtraction result was negative, add back the divisor
         @ to r1, i.e. undo the subtraction
        addmi r1, r1, TOS
      subs r3, r3, #1
      bne 1b
      @ At this point, r2 contains the quotient and r1 the remainder 
      str r1, [DSTK]
      mov TOS, r2
      nxt

Attachments

Move to quarantaine

This moves the raw source file on disk only. The archive index is not changed automatically, so you still need to run a manual refresh afterward.