# Math : Novel Methods of Integer Multiplication and Division

This video explains the basics of multiplication using binary shift (doubling) and addition... just like the Egyptians used to do it!

Also:

• http://www.reenigne.org/blog/multiplying-faster-with-squares/ (a+b)2 = a2 + b2 + 2ab and (a-b)2 = a2 + b2 - 2ab and subtracting these gives: 4ab = (a+b)2 - (a-b)2 or ab = (a+b)2/4 - (a-b)2/4. So if we keep in memory a table of x2/4 we can do a multiply with an add, two subtractions and two table lookups. The tables have a+b number of entries rather than a*b, but must hold values as large as a*b if the full range of possible output is to be supported.+
• http://www.azillionmonkeys.com/qed/amult.html x86 Integer multiplying by constants
• http://www.cs.uiowa.edu/~jones/bcd/divide.html Jones on reciprocal multiplication. Many examples in C.
```	DST = ((SRC     ) + SRC)     ; /* times 2 */
DST = ((SRC << 1) + SRC)     ; /* times 3 */
DST = ((SRC     ) + SRC) << 1; /* times 4 */
DST = ((SRC << 2) + SRC)     ; /* times 5 */
DST = ((SRC << 1) + SRC) << 1; /* times 6 */
DST = ((SRC     ) + SRC) << 2; /* times 8 */
DST = ((SRC << 3) + SRC)     ; /* times 9 */
DST = ((SRC << 2) + SRC) << 1; /* times 10 */
DST = ((SRC << 1) + SRC) << 2; /* times 12 */
```

```	DST = ((((SRC << 1) + SRC) << 1) + SRC); /* times 7 */

```

Notation:

• x5:0 indicates bytes 0...5, all collectively representing the 48 bit number "x".
• x2:0 represents the least-significant 3 bytes of the number "x", a 24-bit quantity.

Nikolai Golovchenko says:

## 24 x 24 Multiplication algorithm

```        x5:0 = x2:0 * y2:0,

x2:0 - input1,
y2:0 - input2,
x5:0 - output,
0 - least significant byte.

1) clear sum x5:3 = 0
2) clear carry
3) shift x2:0 right

4) if carry set, add y2:0 to x5:3
5) shift x5:0 right (carry from previous addition shifted to MSb and
next bit of input to carry)
6) repeat 23 more times from 4
```

## 48 / 24 Division algorithm

```        x5:0 = x5:0 / y2:0

1) Check y2:0 for zero
2) Clear temporary Temp2:0 = 0
3) Set up counter Counter = 48

4) Shift left x5:0
5) Shift left Temp2:0 to get next bit from x5:0
6) Save carry (MSb of Temp might be set), e.g. in Counter<7>
7) Substract y2:0 from Temp2:0
8) If carry=1 (no borrow), set Counter<7>
9) If Counter<7>=0 (borrow and Msb of Temp was zero), add y2:0 to Temp2:0 to
restore it
10) carry=Counter<7> (next bit of result)
11) clear Counter<7>
12) decrement Counter, and repeat from 4 if not zero
13) Final shift left x5:0
```

Since result is known to fit in 3 bytes, then first 24 loops can be skipped. It would save also 3 temporary bytes Temp2:0.

# Novel Methods of Integer Multiplication and Division

G. Reichborn-Kjennerud
Tandberg Data A/S
POB 9, Korsvoll, Oslo 8
Norway

You may be familiar with the method of multiplication, variously alleged to be of Kenyan, Russian, or even Himalayan origin, in which you repeatedly halve the multiplicand and double the multiplier until the multiplicand becomes 1. Then the sum of those multipliers that have a multiplicand counterpart of odd value becomes the product. This sounds complicated, but it's really not; table 1 shows an example.

Multiplicand Multiplier Partial Column (c) Remainder Sum Expressed of Division in Terms of of Column Original (a) by 2 Multiplier Procedure: Repeatedly halve the multiplicand (discarding re- mainders) and double the multiplier until the former is 1. For every odd multiplicand, add the respective multiplier. Example: 44 x 51 (a) (b) (c) (d) (e) 44 51 0 22 102 0 11 204 204 4 x 51 1 5 408 408 8x51 1 2 816 0 1 1632 1632 32 x 51 1 Total 2244 = 44 x 51 101100 is binary for 44

This algorithm readily lends itself to coding, as exemplified by the sequence in 8080 code shown in listing 1. Halving is done by shifting to the right, and the odd/even test is performed by checking the carry. Doubling is done by adding to itself using the DAD instruction, which is also used for summing up the output terms.

Repeated halving of a number and then noting the odd/even results is a nice way of finding the binary form of the number (the last bit found being the most significant one). It also tells something of the binary nature of the Kenyan method.

Listing 1: An implementation of the Kenyan algorithm for integer multiplication for the 8080 microprocessor.

```;multiplibation program MULT

;input multiplication factors in HL and DE, one of which must.
;necessarily be an 8-bit number; if not, carry is set
;output product in DE, carry set if overflow.

;********************** Initial test to find 8-bit factor
MULT:	xra	a	;clear A
ora	d	;is D zero?
jz	found	;yes, DE number is 8-bit fabtor
xra	a	;no, DE number was not 8-bit factor
ora	h	;is H zero then?
stc
rnz		;no, return with carry set
xchg		;yes, place 8-bit factor in DE
found:	mov	a,e	;transfer multiplicand to A

;********************** Multiplication starts in earnest
lxi	d,0	;clear DE to receive output terms
ana	a	;8-bit factor now in A; clear carry.
next:	rar		;halve the multiplicand;- result odd?
jnc     even	;no, don't add multiplier term
xchg		;yes, therefore,
rb		;overflow, carry set on return
xchg		;put multiplier back. in HL
even:	ana	a	;already reached 1 by halving?
rz		;Yes, retuPn with result, carry cleared
dad	h	;no, double the multip.lier and
jnc	next	;continue the process.
ret		;overflow, carry set on return
```

Some time ago I became intrigued by the possibility of finding a procedure for division that was similar to the Kenyan method of multiplication. I came up with the following scheme: The divisor is repeatedly doubled until just less than the dividend, then successively subtracted from the dividend. Every time the subtraction operation gives a positive result, a 1 is noted; otherwise a O is recorded. Remarkably enough, the resultant sequence of 0's and 1's constitutes the quotient directly in binary form, as shown in table 2.

 Procedure: Double the divisor until it is just less than the divi- dend. Then try to subtract the doubled divisors, starting with the largest, from the dividend. Note a 1 if the subtraction is possible otherwise, note a zero and do not perform the subtraction. The 1s and Os constitute the binary form of the quotient. To ob- tain the decimal form, multiply the latter digits with the corre- sponding terms in a power of 2 series, arranged in reverse order. The quotient is the sum of the resultant terms. To obtain decimal accuracy, multiply the dividend initially by an Nth power of 10. Then, after the division is complete, divide the quotient by the same power of 10 (moving the decimal point N places). Example: 2246/51 Counter Double: 51 0 102 1 204 2 408 3 816 4 1632 5 Subtract: 2246 -1632 614 1 X 32 = 32 5 -816 0 x 16 = 0 4 614 -408 206 1 x 8 = 8 3 -204 2 1 x 4 = 4 2 -102 0 x 2 = 0 1 2 -51 0 x 1 = O 0 Remainder: 2 Quotient: 101100 = 44

Notice that the procedure is quite mechanical, with none of the trial-and-error search for the next correct quotient digit that is characteristic of the conventional method Furthermore, it lends itself beautifully to coding (see listing 2). There need be no 8-bit restrictions on any of the numbers; the dividend, divisor, quotient, and remainder can all be entered as 16-bit numbers.

Listing 2: An implementation of the author's integer-division algorithm for the 8080 microprocessor.

```;division program DIVIDE

;input dividend in BC and input divisor in HL.
:output quotient in HL and output remainder in DE.
;carry set if division by zero

;********************** Test for division by zero and prepare
DIVIDE:	mov	a,h	;for reverse polarity subtraction
ora
stc
rz		;division by zero; abort operation; carry set
mov	a,b	;put 2's complement of BC +1 into DE for
cma		;purposes of subtraction. (BC will be
mov	d,a	;incremented to enable subtraction when minuend
mov	a,c	;and subtrahend are having equal values).
cma
mov	e,a	;dividend in negative form now in DE
inx	b	;BC +1; dividend incremented
xra	a	;reset counter A and
sta	quot	;clear the quotient buffer
sta	quot+1	;(high-order part of quotient buffer)
jmp	double	;start the division in earnest

;*********************** First phase: Doubling the divisor
double: inr	a	;increment counter
push	h	;save divisor
dad	h	;double it, but go to second phase if
jc	change	;HL now is larger than dividend in B
dad	d	;comparison with dividend by subtraction
jnc	restore	;keep doubling unless HL now is larger than BC

;*********************** Second phase: Subtracting from the dividend
;                                      and accumulating quotient bits.

change: mov	b,a	;transfer count to new counter
subtrct:pop	h	;Fetch halved divisor as positive subtrahend
dad	d	;subtract by using negative dividend as minuend
jc	shiftc	;the carry bit becomes the quotient bit
xchg		;equivalent of adding back if subtraction fails
shiftc:	cmc		;invert quotient bit from reverse polarity
lda	quot	;shift quotient bits
ral
sta	quot	;and place into temporary storage
lda	quot+l
ral
sta	quot+1
dcr	b	;count-down finished?
jnz	subtrct	;no, continue process
lhld	quot	;yes, place output quotient in HL.
mov	a,e	;change remainder in DE into proper polarity

cma
mov	e,a
mov	a,d
cma
mov	d,a
ret		;division operation completed

quot:	d	2	;buffer for evolving quotient
```

To handle 16-bit numbers, the add-to-itself DAD H instruction is used for doubling the divisor, and the necessary comparison with the dividend is accomplished by reverse-polarity addition, using the negative value of the dividend (in the DE register pair) and testing on the carry. Care is taken to restore the divisor before the next doubling by adding back the positive value (in the BC register pair). The doubled divisors are put in temporary storage by pushing them to the stack.

For the necessary subtraction of the doubled divisors from the dividend, reverse-polarity addition is used again. Luckily, the dividend is already present in negative form (in the DE register pair), and the divisors can be used in their existing positive form as they are popped from the stack for subtraction. The carry is then indicative of a positive or negative result, and for every subtraction, it is shifted into a register pair to form the final quotient. A counter sees to it that there are no more subtractions than there were doubling operations. The contents of the DE register pair constitute the remainder (in complemented form).

As we have seen, odd ways of multiplying and dividing can lead to useful code algorithms. But the reverse can also be true. Machine-code algorithms can lead to odd but perhaps not so useful manual methods.

First, consider a table used for multiplying by a fixed number K, based on using the 8080 DAD instruction (see table 3). The multiplicand is loaded into two register pairs (HL and DE), and the product is obtained by executing a sequence of DAD H and DAD D commands in the order given beneath each value of K (operand sequences for K=2 to K=32 have been included). DAD H doubles the accumulated multiplicand in the HL pair, and DAD D adds the original multiplicand to the HL pair.

Procedure: Input multiplicand in both HL and DE register pairs. Constant K is the multiplier. Then perform a series of DAD D and DAD H Instructions in the order given by the sequence of Ds and Hs under the given value of K. The final product will be in the HL register pair If every DAD instruction is followed by a test of carry (JC or RC), carry will be set in case of overflow.

```K = 2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
DAD H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H
DAD    D  H  H  D  D  H  H  H  H  D  D  D  D  H  H  H  H  H  H  H  D  D  D  D  D  D  D  D  H  H
DAD          D  H  H  H  H  D  D  H  H  H  H  H  H  H  H  D  D  D  D  H  H  H  H  H  H  H  H  H
DAD                D     D  H  H  H  H  D  D  H  H  D  D  H  H  H  H  H  H  H  H  D  D  D  D  H
DAD                            D     D  H  H     D  H  H  H  H  D  D  H  H  D  D  H  H  H  H  H
DAD                                        D           D     D  H  H     D  H  H  H  H  D  D
DAD                                                                D           D     D  H  H

```

Table 3: An algorithm for integer multiplication for 8080 microporcessors.

It seems natural to look for a general algorithm based on DAD Hs and DAD Ds. If you look hard at table 3, you'll see a familiar pattern emerge: the Hs and Ds actually represent K in binary form. The Os are represented by H, whereas the 1's are represented by H and D as a group. True, the most significant bit is missing, but that will always be a 1 anyway. As an example, consider K=19. The sequence is H H (H D) (H D), which translates into (1) 0 0 1 1.

Thus, we can multiply by shifting the multiplier and examining the carry. When carry is cleared, we perform a DAD H operation, and when it is set, we do both a DAD H and a DAD D. This gives us the code in listing 3

Listing 3: an implementation in 8080 assembly language of the integer-multiplication algorithm given in tables 3 and 4.

```;multiplication program DADDY

;input multiplicand in DE and input multiplier in A
;output product in HL, carry set if overflow

;********************** Test for zero and leading zeroes, (8-bit
;                         factor already determined and placed in A)

DADDY:	lxi	h,0	;clear output product register
mvi	b,8+1	;set bit counter
ana	a	;is multiplier in A zero? (carry cleared)
rz		;yes, skip multiplication operation; O in HL
skip:	dcr	b	;check multiplier bit
jnc	skip	;yes, ignore it and check next bit
; cleared
;********************** Multiplication starts in earnest
next:	dcr	b	;more multiplier bits?
rz		;no, return with result in HL
rc		;overflow, carry set on return
ral		;is the multiplier bit a l?
jnc	next	;no, check the next bit
jnc	next	;check the next bit multiplicand
ret		;overflow, carry set on return

```

Now for the manual method that can be derived from this: Repeatedly halve the multipler until it becomes 1 (in order to find the binary form). Reverse the sequence of halved multipliers and ignore the 1. Repeatedly double the multiplicand. Whenever the corresponding halved multiplier is odd, add also the original multiplicand to the accumulated doubled multiplicands; table 4 gives us an example of this method. Oh well, not everything is progress. But then, progress isn't everything.

 Procedure: Repeatedly halve the multiplier (discarding remainders) until you reach 1. Ignore the 1 and arrange the resultant halved multipliers vertically in reverse order. For each halved multiplier, double the multiplicand. Add also the initial multiplicand if the halved multiplier is an odd number. Example: 44 x 51 Repeatedly halve the multiplier: 51 25 12 6 3 odd/even halved multipliers: Resultant Comment 44 44 Double the multiplicand by adding to itself 3 +44 Add initial multiplicand 132 132 6 + 0 Don't add initial multiplicand 264 264 12 +0 528 528 25 + 44 1100 1100 51 + 44 Final product: 2244

Questions:

how about 'vedic' multiplication methods? while not so beautifull without parallelism (i.e. fpga array) , they are still interesting, and allow performing multiplication of large numbers in just few steps (i.e. 6 clock cycles for 64bit mul. if fpga is used)
simplest way to perform it is to multiply bit like one multiplies on paper, so i.e.
```      101011
*       0010
------------
000000
+    101011
+   000000
-------------
01010110
```

notice that while CPU has to repeat the 'shift and multiply'
for each bit of the multiplier , fpga can do it in parallel in just one cycle (shift is just adressing to destination register, moving data there (including masking by multiplier) - takes just one cycle , actually one 'slope' , as not even full 'cycle' is needed.)
then we have \$multiplier_bit_count_size array of sums to make.
for 64bit multiplier, this would equal to as many as 64 operations, so 64 cycles, but we can once again try to be smart the 'vedic' way; addition operations can be split into parallel operations :

a+b+c+d=(a+b)+(c+d)

which means when we have addition list resulting from multiply:

```        0110
*       0101
------------
0110 (a)
0000  (b)
0110   (c)
+    0000    (d)
------------
```

we can add it in pretty much any order i.e. (0110 + 00000) + (011000 + 0000000) according to the above rule . This mean we can quickly add each pair , which makes it 32 parallel operations per first clock cyle, then 16 for 2nd, 8 for 3rd, 4, for 4th , 2 for 5th , and voila.

Also if extra logic would be use to detect 'pairs' which will unlikely cause any overflow in the addition (i.e. 001 + 010) they could be instantly merged to peform (a+b)+c additions, as it does not require involving additional 'overflow sum' cycle. We can group our adds to ones which will unlikely influence each other.

The order can be even more free, if we have logic allowing us to detect zeros, and choose only 'non zero' substrate, allowing to further reduce amount of operations (though at cost of non-predictable lenght of operation and more logic complexity)

Including the masked preloading of the array mentioned earlier, this all equals to just 6 cycles, and of those, just two involve ALU bus (fetching number and multiplier is one, and placing final addition result in destination is another)

so assuming clocking the MUL array 6x faster than the ALU bus (which is quite doable in cmos, assuming we talk about up to ~1ghz speeds), we can practicaly deliver MUL instruction in just 2 ALU bus clock cycles, while if registers can be independent (separate ALU bus to result register) - just one clock cycle.

In asm it takes bit more looping unfortunatelly , but it still makes practical method for multiplying of insanely large (64bit and more) numbers quite a breeze.

+

 file: /Techref/method/math/muldiv.htm, 28KB, , updated: 2021/6/13 11:02, local time: 2024/7/25 09:50, TOP NEW HELP FIND:  44.192.94.177:LOG IN

 ©2024 These pages are served without commercial sponsorship. (No popup ads, etc...).Bandwidth abuse increases hosting cost forcing sponsorship or shutdown. This server aggressively defends against automated copying for any reason including offline viewing, duplication, etc... Please respect this requirement and DO NOT RIP THIS SITE. Questions?Please DO link to this page! Digg it! / MAKE! Novel Methods of Integer Multiplication and Division

After you find an appropriate page, you are invited to your to this massmind site! (posts will be visible only to you before review) Just type a nice message (short messages are blocked as spam) in the box and press the Post button. (HTML welcomed, but not the <A tag: Instead, use the link box to link to another page. A tutorial is available Members can login to post directly, become page editors, and be credited for their posts.

Attn spammers: All posts are reviewed before being made visible to anyone other than the poster.
 Did you find what you needed? "No. I'm looking for: " "No. Take me to the search page." "No. Take me to the top so I can drill down by catagory" "No. I'm willing to pay for help, please refer me to a qualified consultant"

.