 # PICMicrocontollerBasicMath Square Methods

Calculating x^2.

Archive:

Summary:

• A 4 bit number IMHO is most efficiently squared using a lookup table.
```       ; assumes W contains a 4 bit number, and squares it.
; by John Payson
Sqr4:
db      0,1,4,9,16,25,36,49,64,81,100,121,144,169,196,225

```
• John Payson notes that "BTW, on the subject of squaring, a potentially useful observation in some cases is that
``` (a+b)^2 = a^2 + b^2 + 2ab
```

Thus, any multiplication could be performed as three squaring operations, a subtraction, and a shift."

• David Cary notes that multiplication could be performed as three t() lookup operations, and some addition and subtraction:
```a*b = t(a+b) - ( t(a) + t(b) )

; calculate t(W), for 0 <= W <= 22
; by David Cary
get_t_w:
PAGESEL \$+2
db 0,1,3,6, H'0A', H'0F', H'15', H'1C', H'24', H'2D'
db H'37', H'42', H'4E', H'5B', H'69', H'78'
db H'88', H'99', H'AB', H'BE', H'D2', H'E7', H'FD'
```
• If you already have subroutines to perform NxN bit multiply, the smallest *code* to do N bit squaring is to call the NxN bit multiply routine.
• Andy David asked "What's the fastest way to square a 10-bit number?"

Stuart Taylor suggested starting with a standard 16x16 bit multiply and trimming it down to a 10x10 bit multiply.

Keith Dowsett suggested "break it down into three simpler multiplications:

```(a+b)^2 = a^2 + b^2 + 2ab
```

where b is the bottom 8 bits of the number, a is the top 2 bits, and use a conventional 8x8 bit multiply for b^2, and (for speed) use a customized 2x8 bit multiply for 2ab, and a 4 entry lookup table for a^2.

• Inspired by Keith Dowsett, Scott Dattalo suggests, instead of breaking the number N into 2 bits and 8 bits, breaking the number into 5 bits and 5 bits:
```N = a*32 + b

a = 5 msb's
b = 5 lsb's

Now using Keith's observation:

N^2 = 32^2 * a^2 + 64*a*b + b^2

=  (a^2)<<10 + (a*b)<<5 + b^2
```

which can be evaluated with one general-purpose 5x5 bit multiply and a lookup table for squaring 5 bit numbers.

Then Scott wonders if breaking the number into 3 nybbles would be even faster:

```  N = a*256 + b*16 + c
N^2 = (a*256)^2 + (b*16)^2 + c^2 +
2*(a*256*b*16 + a*256*c + b*16*c)
= (a^2)<<16 + (b^2)<<8 + c^2 +
((a*b)<<12 + (a*c)<<8 + b*c<<4))<<1
```

(See 4x4 bit multiply )

Andrew Warren speculates that a standard 10-bit x 10-bit multiplication will turn out to be faster than any of the complicated schemes above.

• Andy David has some tips on speeding up a 10 square by using 10x10 bit multiply:
A 10x10 multiply IS faster than 16x16, {for calculating a square} mostly because the result is guaranteed to fit in three bytes rather than 4. If you don't have the code space to unroll the whole multiplication into inline code (as I showed in my "Solution #1"), looped code can still be made pretty fast by writing three loops:

One for the first few bits of the multiplier, where the result of each addition of the multiplicand to the intermediate result is guaranteed to fit in the least-significant two bytes,

one for the next group of bits, where the addition may affect all three bytes of the result, and

one for the final few bits, where the addition is guaranteed to only affect the two most-significant bytes of the result.

John Payson implemented Andy David's tips, in a completely unrolled loop (sacrificing code space for speed) See the debugged version

```; [warning: untested]
; by John Payson 1996-11-05
clrf    DstH
clrf    DstM
clrf    DstL
movf    SrcL,w
andlw   \$0F
btfss   Src,4
rrf     DstM
rrf     DstL
btfss   Src,5
rrf     DstM
rrf     DstL
btfss   Src,6
rrf     DstM
rrf     DstL
btfss   Src,7
call    Sqr4
swapf   SrcL
andlw   \$0F
call    Sqr4
; At this point, 16-bit result is in DstM:DstH
; 25 words of code prior to this point (plus a
; 17-word table-lookup).  Total execution time:
; 35 cycles up to this point.
btfss   SrcH,0
goto   NoBit8
movf    SrcL,w
btfsc   C
incf   DstH
btfsc   C
incf   DstH
incf    DstH
; Another 9 words for bit 8; 3 or 9 cycles to exec.
NoBit8:
btfss   SrcH,1
goto   NoBit9
movlw   4
btfss   SrcH,0
movlw  8
rlf     SrcL,w
btfsc   C
incf   DstH
btfsc   C
incf   DstH
btfsc   C
incf   DstH
btfsc   C
incf   DstH
; Another 17 words for bit 9; 3 or 17 cycles to execute
; Total worst-case time: 35+26 = 61 cycles.
NoBit9:
retlw   0       ; All done!
Sqr4:
db      0,1,4,9,16,25,36,49,64,81,100,121,144,169,196,225
```

Questions:

• hishamel2_AT_hotmail.com asks: " Is there a consise routine (i.e less than 80 instructions) for taking the square (x^2) of 12 bit unsigned number.

Hisham" +

 file: /Techref/microchip/math/sq/index.htm, 7KB, , updated: 2010/4/14 23:14, local time: 2023/5/31 22:44, TOP NEW HELP FIND:  35.172.111.47:LOG IN

 ©2023 These pages are served without commercial sponsorship. (No popup ads, etc...).Bandwidth abuse increases hosting cost forcing sponsorship or shutdown. This server aggressively defends against automated copying for any reason including offline viewing, duplication, etc... Please respect this requirement and DO NOT RIP THIS SITE. Questions?Please DO link to this page! Digg it! / MAKE! PIC Microcontoller Basic Math Square Methods

After you find an appropriate page, you are invited to your to this massmind site! (posts will be visible only to you before review) Just type a nice message (short messages are blocked as spam) in the box and press the Post button. (HTML welcomed, but not the <A tag: Instead, use the link box to link to another page. A tutorial is available Members can login to post directly, become page editors, and be credited for their posts.

Attn spammers: All posts are reviewed before being made visible to anyone other than the poster.
 Did you find what you needed? "No. I'm looking for: " "No. Take me to the search page." "No. Take me to the top so I can drill down by catagory" "No. I'm willing to pay for help, please refer me to a qualified consultant"

.