Archive for October, 2007

4.2.4 ANSWERS (Web hosting faq) TO EXERCISES 577 10. When 1

Thursday, October 25th, 2007

4.2.4 ANSWERS TO EXERCISES 577 10. When 1 < r < 10 the generating function C(z) has simple poles at the points 1 f wn, where wn = 2Tni/ In 10, hence C(z) = l qor; l + c * (ln ;,;l yl;lw ) + E(z) n#O n where E(z) is analytic in the entire plane. Thus if B = arctan(27r/ In lo), 2 -w,lnr _ 1 cm = log,, r -1 - ~ !J? e + em In lo n>OCC wn(l + w,p ) sin(m0 + 2~ log,, T) -sin(&) 1 = log,, r -1 + ~(1 -k 4r2/(ln 10)2)m 2 (1 + 16+/(ln 10)2)m/2 > 11. When (log, U) mod 1 is uniformly distributed in [ 0, I), so is (log, l/U) mod 1 = (1 -log,U)modl. 12. We have h(z) = JITb f(x) dx g(z/bx)/bx + sZ1 f(x) dx g(z/x)/x, and the same holds for 1(z) = llTb f(x) dx l(z/bx)/bx + s, f(x) dx @/x)/x, hence g(z/bx)qzpxj dz/x) w - l(z)= = dx -+ -@lx) l(z) sl,bf(x) l(zlbx)sz f(x)dx @lx). Since f(x) 2 0, ](h(z)-L(z))/~(z)] 5 S:/,f(~)dxA(g)+~~ f(x)dxA(g) for all z, hence A(h) 5 A(g). By symmetry, A(h) 5 A(f). [Bell System Tech. J. 49 (1970), 1609-1625.1 13. Let X = (log, U) mod 1 and Y = (log, V) mod 1, so that X and Y are inde- pendently and uniformly distributed in [ 0,l). No left shift is needed if and only if X + Y 2 1, and this occurs with probability 3. (Similarly, the probability is 3 that floating point division by Algorithm 4.2.lM needs no normalization shifts; this needs only the weaker assumption that both of the operands independently have the same distribution.) 14. For convenience, the calculations are shown here for b = 10. If k = 0, the probability of a carry is 01 (See Fig. A-7.) The value of the integral is Fig. A-7. and

576 ANSWERS TO EXERCISES 4.2.4 3. log,, 2.4 (Web design programs)

Wednesday, October 24th, 2007

576 ANSWERS TO EXERCISES 4.2.4 3. log,, 2.4 -log,, 2.3 z 1.84834%. 4. The pages would be uniformly gray (same as random point on a slide rule ). 5. The probability that 1Ofu 5 r is (r -l)/lO + (r -l)/lOO + … = (r -1)/9. So in this case the leading digits are uniformly distributed; e.g., leading digit 1 occurs with probability 4. 6. The probability that there are three leading zero bits is log,s 2 = a; the probability that there are two leading zero bits is log,, 4 -log,, 2 = $; and similarly for the other two cases. The average number of leading zero bits is 11, so the average number of significant bits is p + a. The worst case, p -1 bits, occurs however with rather high probability. In practice, it is usually necessary to base error estimates on the worst case, since a chain of calculations is only as strong as its weakest link. In the error analysis of Section 4.2.2, the upper bound on relative rounding error for floating hex is 21pp. In the binary case we can have p + 1 significant bits at all times (cf. exercise 4.2.1-3) with relative rounding errors bounded by 2-imp. Extensive computational experience confirms that floating binary (even with p-bit precision instead of p + 1) produces significantly more accurate results than (p + 2)-bit floating hex. Tables 1 and 2 show that hexadecimal arithmetic can be done a little faster, since fewer cycles are needed when scaling to the right or normalizing to the left. But this fact is insignificant compared to the substantial advantages of b = 2 over other radices (cf. also Theorem 4.2.2C and exercises 4.2.2-13, 15, 21), especially since floating binary can be made as fast as floating hex with only a tiny increase in total processor cost. 7. For example, suppose that xm(F(lOk . sk) -F(lOk )) = log5k/log 10k and also that xm(F(lOkm . 4k) -F(lOk )) = log4k/ log 10 ; then c (F(lOk . 5k) -F(lOkm 4k)) = log,, 2 4 m for all k. But now let E be a small positive number, and choose 6 > 0 so that F(z) < E for 0 < x < 6, and choose A4 > 0 so that F(z) > 1 - E for x > M. We can take k so large that 10pk . 5k < 6 and 4k > M; hence by the monotonicity of F, c (F(lOk 5k) -F(lOkm . 4k)) 5 c (F(lOk . sk) -F(lOk@- ) 5k)) m ?7%<0 + c (WO k(m+l) .4k) _ F(lOkm . 4k)) WI20 = F(10-k5k) + 1 -F(10k4k) < 2~. 8. When s > r, Ps(lOns) is 1 for small n, and 0 when [lO s] > [lO r]. The least n for which this happens may be arbitrarily large, so no uniform bound can be given for NO(E) independent of s. (In general, calculus textbooks prove that such a uniform bound would imply that the limit function So(s) would be continuous, and it isn t.) 9. Let ql, q2, . be such that PO(n) = ql(n;l) + q2( y1) +. . . for all n. It follows that P,(n) = l-mqi(n;l) + 2-,qs(,, ) $ 9.. for all m and n.

4.2.4 ANSWERS (Frontpage web hosting) TO EXERCISES 575 STZ EXPO ExPo+-0.

Tuesday, October 23rd, 2007

4.2.4 ANSWERS TO EXERCISES 575 STZ EXPO ExPo+-0. SLAX 1 Remove exponent. JMP DNORM Normalize and exit. SINGLE STJ EXITF Convert to single precision: JOV OFLO Ensure overflow is off. STA TEMP LD2 TEMP(EXPD) r12 + e. DEC2 QQ-Q Correct for difference in excess. SLAX 2 Remove exponent. JMP NORM Normalize, round, and exit. 1 7. All three routines give zero as the answer if and only if the exact result would be zero, so we need not worry about zero denominators in the expressions for relative error. The worst case of the addition routine is pretty bad: Visualized in decimal notation, if the inputs are 1.0000000 and .99999999, the answer is b- instead of b- ; thus the maximum relative error 61 is b - 1, where b is the byte size. For multiplication and division, we may assume that both operands are positive and have the same exponent QQ. The maximum error in multiplication is readily bounded by considering Fig. 4: When uv 2 l/b, we have 0 5 u u -2~ @v < 3bmg + (b -l)b- , so the relative error is bounded by (b + 2)bF . When l/b2 5 wu < l/b, we have 0 2 uv -u @J v < 3bVg, so the relative error in this case is bounded by 3beg/uv < 3be7. We take 62 to be the larger of the two estimates, namely 3bp7. Division requires a more careful analysis of Program D. The quantity actually computed by the subroutine is cr - 6 -bc((a -6 )(p -6 ) -6 ) -5, where (Y = (uLm + al)/bvmr p = vl/bvm, and the nonnegative truncation errors (b,6 , 6 , 6 ) are respectively less than (b-l , bp5, bh5, be6); finally 6, (the truncation during normaliza- tion) is nonnegative and less than either b- or b- , depending on whether scaling occurs or not. The actual value of the quotient is a/(1 + b@) = a -b@3 f b2cQ2b , where 6 is the nonnegative error due to truncation of the infinite series (2); here ,j < E;I = b-10, since it is an alternating series. The relative error is therefore the absolute value of (bd $ b&P/a + b& /a) -(6/a + bd 6 fa + b2P26 f 6,/a), times (1 + be@. The positive terms in this expression are bounded by b- + bps + b- , and the negative terms are bounded by bps + b-l2 + bps plus the contribution by the normalizing phase, which can be about bp7 in magnitude. It is therefore clear that the potentially greatest part of the relative error comes during the normalization phase, and that 6s = (b + 2)b- is a safe upper bound for the relative error. 8. Addition: If e, 5 e, + 1, the entire relative error occurs during the normalization phase, so it is bounded above by b- . If e, 2 e, + 2, and if the signs are the same, again the entire error may be ascribed to normalization; if the signs are opposite, the error due to shifting digits out of the register is in the opposite direction from the subsequent error introduced during normalization. Both of these errors are bounded by bp7, hence & = be7. (This is substantially better then the result in exercise 7.) Multiplication: An analysis as in exercise 7 gives 62 = (b + 2)bF . SECTION 4.2.4 1. Since fraction overflow can occur only when the operands have the same sign, this is the probability that fraction overflow occurs divided by the probability that the operands have the same sign, namely, 7%/($(91%)) = 15%.

574 ANSWERS (Cedant web hosting) TO EXERCISES 4.2.2 ul < 0

Tuesday, October 23rd, 2007

574 ANSWERS TO EXERCISES 4.2.2 ul < 0 < 1~~ and 2)~ < 0 < v,; in the latter case we compute four products, and the answer is [min(ul v We, 1~~ v vl), max(ul A wi, u7 A vY)].) Finally, u @ v is undefined if 2ri < 0 < v,; otherwise we use the formulas for multiplication with 01 and w,. replaced respectively by v, and VT , where z A y. = xLAy, xvy- = xvy, (*0)-l = jy20, (+m)-1 = *o. [Cf. E. R. Hansen, Math. Comp. 22 (1968), 374-384. An alternative scheme, in which division by 0 gives no error messages and intervals may be neighborhoods of co, has been proposed by W. M. Kahan. In Kahan s scheme, for example, the reciprocal of [-1, +l] is [+I, -11, and an attempt to multiply an interval containing 0 by an interval containing co yields [-co, fco], the set of all numbers. See Numerical Analysis, Univ. Michigan Engineering Summer Conf. Notes No. 6818 (1968).] 25. Cancellation reveals previous errors in the computation of u and w. For example, if E is small, we often get poor accuracy when computing f(~ + E) 8 f(z), because the rounded calculation of f(x + t) destroys much of the information about E. It is desirable to rewrite such formulas as E @ g(x, E), where g(x, E) = (f(x + E) - f(x))/6 is first computed symbolically. Thus, if f(x) = x2 then g(x, E) = 2x + 6; if f(x) = fi then g(z, E) = l/( fi $ &). 26. See Math. Comp. 32 (1978), 227-232. SECTION 4.2.3 1. First, (wm, wl) = (.573, .248); then wmul/vm = .290; so the answer is (.572, ,958). This in fact is the correct result to six decimals. 2. The answer is not affected, since the normalization routine truncates to eight places and can never look at this particular byte position. (Scaling to the left occurs at most once during normalization, since the inputs are normalized.) 3. Overflow obviously cannot occur at line 09, since we are adding two-byte quantities, or at line 22, since we are adding four-byte quantities. In line 30 we are computing the sum of three four-byte quantities, so this cannot overflow. Finally, in line 32, overflow is impossible because the product fiLfv must be less than unity. 4. Insert JOV OFLO; ENTI 0 between lines 03 and 04. Replace lines 21-22 by ADD TEMP(ABS) ; JNOV *+2; INCl 1 , and change lines 28-31 to SLAX 5; ADD TEMP; JNOV *+2; INCI 1; ENTX 0,l; SRC 5 . This adds five lines of code and only 1, 2, or 3 units of execution time. 5. Insert JOV OFLO after line 06. Change lines 22, 31, 39 respectively to SRAX 0, I , SLAX 5 , ADD ACC Between lines 40 and 41, insert DECZ 1; JNOV DNORM; INC2 1; INCX 1; SRC 1 . (It s tempting to remove the DEC2 1 in favor of STZ EXPO , but then INC2 1 might overflow rI2!) This adds six lines of code; the running time decreases by 3u, unless there is fraction overflow, when it increases by 721. 6.DOUBLE STJ EXITDF Convert to double precision: ENTX 0 Clear rX. STA TEMP LD2 TEMP(EXP) rI2 + e. INC2 QQ-Q Correct for difference in excess.

4.2.2 ANSWERS TO EXERCISES 573 Ykqk + qkok (Web hosting account)

Monday, October 22nd, 2007

4.2.2 ANSWERS TO EXERCISES 573 Ykqk + qkok + YkvkOk)gk-I, for 1 < k < n. Thus fn = 1 + ql -71 + (4n t'?rmS of 2nd order) + (higher order terms) = 1 + ~1 -yi +,0(nc2) is sufficiently small. [The Kahan summation formula was first published in CACA4 8 (1965), 40; cf. Proc. IFIP Congress (lgi l), 2, 1232. For another approach to accurate summation, see R. J. Hanson, CACM 18 (1975), 57-58. See also G. Bohlender, IEEE Trans. C-26 (1977), 621-632, for algorithms that compute round(zi $. . .fsn) and round(zi . . xn) exactly, given (~1,. . . , xn}.] 20. By the proof of Theorem C, (47) fails for e w = p only if ] u( + f 2 ]u, -u( 2 bP- +b- ; hence ]fU] 2 ]fv] 2 l-($b-l)beP. Thisratherrarecase, inwhich ]fw] before normalization takes its maximum value 2, is necessary and sufficient for failure. 21. (Solution by G. W. Veltkamp.) Let c = 2 rpP1 + 1; we may assume that p 2 2, so c is representable. First compute u = u @ c, ui = (u 0 u ) @ u , u2 = u @ ul; similarly, v = w @ c, 2ri = (v 8 v ) $ u , v2 = w 0 VI. Then set w t u @ u, w' + (((Ul @Vl@W) cB('1Ll@aV2))$(U2 @Wl))@(zLZ @w2). It suffices to prove this when u, v > 0 and e U = e, = p, so that u and v are integers E [2p- , 2p). Then u = ui + u2 where 2p-1 2 ui 2 2p, u1 mod2rp/ 1 = 0, and ]uz] 5 2Tp 21-1; similarly u = wi + 212. The operations during the calculation of 20 are exact, because w -uivl is a multiple of 2p- such that ]w -urvi ( 5 Jw - uw] + IUZWl + UlW2 + t&w21 2 2p- + 2p+ p12 + 2p- ; and similarly ]w -2~1~ -uivz] 5 Iw -uw( + Iu2wI < 2p-l + 2 p +l+p, where w -ulul -ulv2 is a multiple of 2 p/21. 22. We may assume that bp- 2 u, v < bP. If uw 2 b2p-1, then xi = uv -r where JrJ 2 $bp- , hence 22 = round(u -r/v) = xo (since [r/v] 5 +bpP1/bp- 2 3, and equality implies v = bp- hence r = 0). If UD > b2p-1, then ~1 = uv -r where JrJ 5 3bp, hence xl/v 2 u -r/v < bP + +b and x2 2 bP. If 52 = bP, then x3 = x1 (for otherwise (bP -$)w 5 xi 2 bp(w -1)). If x2 < bP and xi > b2p-1, then let x2 = XI/W +q where )q) 5 f; we have xs = round(xl + qw) = xi. Finally if 22 < bP and x1 = b2p- and 5s < b2p-1, then x4 = x2 by the first case above. This situation arises, for example, when b = 10, p = 2, u = 19, w = 55, xi = 1000, 22 = 18, z3 = 990. 23. Let [uJ = n, so that u @ 1 = u 0 n = u -n + r where Jr] 5 +bAp; we wish to show that round(n -r) = n. The result is clear if In] > 1; and r = 0 when n = 0 or 1, so the only subtle case is when n = -1, r = –+bmp. The identity fails iff b is a multiple of 4 and –b-l < u < -be2 and umod2bYP = 2bCp (e.g., p = 3, b = 8, u = -(.0124)8). 24. Let u = [ui, u7], w = [v,, We]. Then u $ u = [ul v ~1, ur Au?], where x A y = y A 2, x A +0 = x for all x, 5 f& -0 = x for all x # +0, Ic A +oo = fee for all x # -co, and x & --oo needn t be defined; x v y = -((-x) A(-y)). If x @ y would overflow in normal floating point arithmetic because x + y is too large, then 2 A y is foe and x v y is the largest representable number. For subtraction, let u 0 v = u $ (-v), where --2, = (-v,, -wr]. Multiplication is somewhat more complicated. The correct procedure is to let u@w = [min(ul~vl,ul~v,,u,~vl,2L~~wr), max(ul~wl,u1~w7.,u7~w1,uT~wIUT)], where xAy = yAs, xa(-y) = -(xvy) = (-x)Ay; xA+0 = (j-0 for x > 0, -0 for z < 0); xA--0 = -(xA+O); xA+oo = (+co for x > +0, –co for x < -0). (It is possible to determine the min and max simply by looking at the signs of ul, u,., VI, and v,., thereby computing only two of the eight products, except when

Web hosting india - 572 ANSWERSTOEXERCISES 42.2 15. u < 2, implies

Sunday, October 21st, 2007

572 ANSWERSTOEXERCISES 42.2 15. u < 2, implies that (u @ u) @ 2 5 (u @ u) @ 2 5 (v @ v) @ 2, so the condition holds for all u and v iff it holds whenever ZL = v. For base b = 2, the condition is therefore always satisfied (barring overflow); but for b > 2 there are numbers v # w such that v $ v = w $ w, hence the condition fails. [On the other hand, the formula u CD ((v 8 4 0 2) does g ive a midpoint in the correct range. Proof: It suffices to show that 2~ + (v 8 u) @ 2 5 v, i.e., (v 8 u) @ 2 2 v -u; and it is easy to verify that round(+round(z)) 5 x for all x 2 0.1 16. (a) Exponent changes occur at C,, = 11.111111, Es, = 101.11111, Es,, = 1001.1102, c,,,, = 10001 .020, c9000, = 100000.91, ~soosls = 1000000.0; therefore = 1109099.1. c 1000000 (b) C l n. It suffices to find the coefficient of xl, since the coefficient of xk will be just the same except with all subscripts increased by k -1. Let (fk, gk) denote the coefficient of 21 in (Sk -ck, ck) respectively. Then fr = (~+rll)(~-~Y1-~l6l-~l~l–61a~-~lbl~l), 91 =(1+61)(1+rll)(yl+al+ylal), and fk = (1 -7kffk -bkck -ykbkok)fk-1 + (?k -77k + Yk6k + ykqk + ykbkqk + ykqkck + 6kvkak + ykbkqkgk)gk-1, gk = gk(l + yk)(l + bk)fk-1 -(1 + 6k)(-/k +

Free web servers - 4.2.2 ANSWERS TO EXERCISES 571 4. Yes; let

Sunday, October 21st, 2007

4.2.2 ANSWERS TO EXERCISES 571 4. Yes; let l/u x o = w, where v is large. 5. Not always; in decimal arithmetic take u = w = 9. 6. (a) Yes. (b) Only for b+p 5 4 (try u = 1-bPp). [W. M. Kahan observes that the identity does hold whenever b-l < fU 5 b-l/ . It follows that 1 @ (1 @ (1 @ u)) = 1 @ u for all u.] 7. If u and w are consecutive floating binary numbers, u $ v = 2u or 27,~. When it is 2v we often have u@@J@ < 2v@. For example, u = (.lO . . . 001)2, v = (.lO . . . OlO)s, u @v = 2v, and u@ + v@ = (.lO.. 011)2. 8. (a) -, ZZ; (b) -, zz; (c) -, M; (d) -; (e) -. 9. ]u -w] 2 ]u -v] + ]w - w] 5 ~1 min(beUeq, bevpq) + ~2 min(beuyq, beW-q) 5 elbeUmq + E2bewpq < (~1 + ~2) max(beUeq, bewpq ). The result cannot be strengthened in general, since for &ample we might have e, very small compared to both e, and ezu, and this means that u -w might be fairly large under the hypotheses. 10. We have (.a~. . . ap--lap)* @ (.9.. .99)b = (.a~. up-l(ap -l))b if aP 2 1; here 9 stands for b -1. Furthermore, (.a~. . . up-iap)b @ (1.0.. . O)b = (.a,. u,-~O)~, so the multiplication is not monotone if b > 2 and up 2 2. But when b = 2, this argument can be extended to show that multiplication is monotone; obviously the certain computer had b > 2. 11. Without loss of generality, let z be an integer, ]z] < bP. If e 5 0, then t = 0. If 0 < e 5 p, then z -t has at most p f 1 digits, the least significant being zero. If e > p, then z -t = 0. [The result holds also under the weaker hypothesis Jt] < 2be.] 12. Assume that e, = p, e, 5 0, u > 0. Case 1, u > bp- . Case (la), w = u + 1, w 2 f, e, = 0. Then u = u or u+ 1, v = 1, u = u, w = 1 or 0. Case (lb), w = u, (v] 5 f. Then u = u, v = 0, u = u, v = 0. If ]v] = f and more general rounding is permitted we might also have u = u* 1, v = Fl. Case (lc), w = u-1, v 5 -4, e, = 0. Then u = u or u -1, v = -1, u = u, w = -1 or 0. Case 2, u = bpel. Case (2a), w = u + 1, v 2 f, e, = 0. Like (la). Case (2b), w = u, Iv] 5 f, u 2 u. Like (lb). Case (2c), w = u, ]v] 5 +, u < u. Then u = u -j/b where u = j/b + VI and ]vi I 5 f b-l for some positive integer j 5 4 b; we have v = 0, u = u, v = j/b. Case (2d), w < u. Then w = u -jfb where v = ---j/b + v1 and ]vi] < ab- for some positive integer j 5 b; we have (v , u ) = (--j/b, u), and (u , v ) = (u, --j/b) or (u-l/b, (1 --j)/b), the latter case only when vi = abE . In all cases ueu = u-u , v~~ =v-v ,~~~ =u-~ ,v~w =~-~ ,round(w-u-w)=w-~-v. 13. Since round(z) = 0 iff z = 0, we want to find a large set of integer pairs (m, n) with the property that m @ n is an integer iff m/n is. Assume that ]m], In] < bP. If m/n is an integer, then m @ n = m/n is also. Conversely if m/n is not an integer, but m @ n is, we have l/In] < Im @n -m/n] < ~lm/nlbl-P, hence ]m] > 2bP- . Our answer is therefore to require ]m] < 2bp- and 0 < In] < bP. (Slightly weaker hypotheses are also possible.) 14. I(~@~)c3w–vwl I I(u~ u)@w-(uc3 u)wI+IwI Iu@v-uvl I S(,,,),,$ be,-+, 6U@U I (1-t b)6(UB,,)B,. Now le(,B,)B, -e,m(,B,,)] 5 2, so we may take E = ;(l + b)b2-p.

Make web site - 570 ANSWERSTOEXERCISES 4.2.1 LD2 TEMP(EXP) r12 t e,.

Saturday, October 20th, 2007

570 ANSWERSTOEXERCISES 4.2.1 LD2 TEMP(EXP) r12 t e,. DEC2 Q J2NP ++3 SLA 0.2 Remove integer part of U. ENT2 0 JANN l.F ENN2 0.2 Fraction is negative: find SRAX 0,2 its complement. ENT2 0 JAZ ++2 INCA 1 ADD WMl Add word size minus one. 1H INC2 Q Prepare to normalize the answer. JMP NORM Normalize, round, and exit. 8H EQU l(l:l) WMl CON 8B-1,8B-l(1: 4) Word size minus one 1 16. If Jc] 2 Id], then set r + d @ c, s + c @ (r @I d); x +-(a @ (b @ r)) @ s, y t (bG(a@r))0s. 0th erwise set r +-c@d, s t $@(T@c); z t ((a@r)$b)@s, y + ((b@r)@a)@s. Then x+iy is the desired approximation to (a+bi)/(c+di). [CACM 5 (1963) 435. Other algorithms for complex arithmetic and function evaluation are given by P. Wynn, BIT 2 (1962), 232-255; see also Paul Friedland, CACM 10 (1967), 665.1 17. See Robert Morris, IEEE Transactions C-20 (1971) 1578-1579. Error analysis is more difficult with such systems, so interval arithmetic is correspondingly more desirable. 18. For positive numbers: shift fraction left until fi = 1, then round, then if the fraction is zero (rounding overflow) shift it right again. For negative numbers: shift fraction left until fi = 0, then round, then if the fraction is zero (rounding underflow) shift it right again. 19. (43f(l if e, < e,)-(1 if fraction overflow)-(10 if result zero)+(4 if magnitude is rounded up)+(l if first rounding digit is b/2)+(5 if rounding digits are b/20.. .0)+(7 if rounding overflow) + 7N + A(-1 + (11 if N > O)))u, where N is the number of left shifts during normalization and A = 1 if rX receives nonzero digits (otherwise A = 0). The maximum time of 73~ occurs for example when v. = +50 01000000, v == -4649999999, b = 100. [The average time, considering the data in Section 4.2.4, will be about 4512~1 SECTION 4.2.2 1.u~v=u$-w=-v~u=-(v$–2L)=-(w~u). 2. u @ x 2 u @ 0 = u, by (8), (2), (6); h ence by (8) again, (U @ x) $ v 2 u $ v. Similarly, (8) and (6) together with (2) imply that (U @ x) $ (V $ y) 2 (U @ x) @ v. 3. u = 8.0000001, v = 1.2500008, w = 8.0000008; (u @ v) @ w = 80.000064, u @ (w @ m) = 80.000057.

4.2.1 ANSWERS TO EXERCISES 569 6. (Business web site) (Consider the

Friday, October 19th, 2007

4.2.1 ANSWERS TO EXERCISES 569 6. (Consider the case ezL = cur fU = -fv in Program A.) Register A retains its previous sign, as in ADD. 7. Say that a number is normalized iff it is zero or its fraction part lies in the range Q < IfI < 4. A (p + 1)-place accumulator suffices for addition and subtraction; rounding (except during division) is equivalent to truncation. A very pleasant system indeed! We might represent numbers with excess-zero exponent, inserted between the first and subsequent digits of the fraction, and complemented if the fraction is negative, so that fixed-point order is preserved. 8. (a) (06, +.12345679) $ (06, –.12345678), (01, +.10345678) $ (00, -.94000000); (b) (99, +X7654321) $ itself, (99, +.99999999) @ (91, +.50000000). 9. a = c = (-50, +.lOOOOOOO), b = (-41, +.20000000), d = (-41, +.80000000), y = (11, +.looooooo). 10. (50, +.99999000) @ (55, +.99999000). 11. (50, +.10000001) @ (50, +.99999990). 12. If 0 < lfvl < IfUl, then lfvl I IfvI-bb-p; hence l/b < Ifu/fvl 5 l-b-p/lfvJ < 1-bpP. If 0 < lfvl 5 IfiLl, we have l/b < Ifu/f,,l/b 5 ((l-b-p)/(l/b))/b = l-bmP. 13. See J. Michael Yohe, IEEE Transactions C-22 (1973), 577-586; cf. also exercise 4.2.2-24. 14. FIX STJ 9F Float-to-fix subroutine: STA TEMP LDl TEMP(EXP) rI1 + e. SLA 1 rA+fffffO. JAZ 9F Is input zero? DECl 1 CMPA =0= (1: 1) If leading byte is zero, JE q-4 shift left again. ENNI -Q-4,1 JIN FIXOVFLO Is magnitude too large? ENTX 0 SHAX 0,l CMPX =1//2= JL 9F JG *+2 JAO 9F STA *+1 (0 : 0) Round, if necessary. INCA 1 Add &l (overflow is impossible) 9H JMP * Exit from subroutine. 1 15. FP STJ EXITF Fractional part subroutine: JOV OFLO Ensure overflow is off. STA TEMP TEMP t u. ENTX 0 SLA 1 rA t fU.

568 ANSWERS TO EXERCISES 4.1 of integers that (Geocities web hosting)

Friday, October 19th, 2007

568 ANSWERS TO EXERCISES 4.1 of integers that can be expressed as {ti b + al , . . . , t, b + a,} in the above manner for m different sequences (ai, . . , 07), summed over all choices of m different sequences (al,…,&). Given m different sequences (a?), . . , a$ )) for 1 2 1 2 m, the number of such sets is k,-i({(si f j -a! ))/b 1 1 5 i 5 r, 1 5 1 5 m}). Thus there is a collection of sets T(S) such that k(S)= c CTLl(T), where each CT is an integer. Furthermore if T E T(S), its elements are near those of S; we have min T 2 (min S - max D)/b and max T 5 (max S + b -1 - min D)/b. Thus we obtain simultaneous recurrence relations for the sequences (km(S)), where S runs through the nonempty integer subsets of [I, u + 11, in the notation of exercise 19. Since k, = k,(S) for any one-element set S, the sequence (kn) appears these recurrences. The coefficients CT can be computed from the first few values of kn(S), so we can obtain a system of equations defining the generating functions ks(z) = C k,(S),? = Wll I 1) + z CTET(S) w-4. For example, when D = {-l,O, 3) and b = 3 we have 1 = -4 and 2~ = 4, so the relevant sets S are {0}, (0, l}, (-1, l}, and (-1, 0, l}. The corresponding sequences for n 5 3 are (1,3,8,21), (0, 1,3,8), (O,O, 1,4), and (O,O, 0,O); so we obtain ko(z) = 1+ z(3ko(z) -kOl(Z)), koz(z) = z(ko1(z)+ koz(z)), kOl(Z) = zko(z), lOlZ(Z) = 0, and k(z) = l/(1 -32 + z ). In this case k, = Fznt2 and ,k,({O, 2)) = Fznel - I. SECTION 4.2.1 1. iV = (62,+.60 22 52 00); h = (37, +.lO 54 50 00). Note that 10h would be (38, +.Ol 05 45 00). 2. bE-q(l _ b-p), b–q-P; b-(1 -b-p), b-9-1, 3. When e does not have its smallest value, the most significant one bit (which appears in all such normalized numbers) need not appear in the computer word. 4. (51, +.10209877); (50, +.12346000); (53, -j-.99999999). The third answer would be (54, +.lOOOOOOO) if the first operand had been (45, –.50000000). 5. If x -y and m is an integer then mb + .r - mb + y. Furthermore x -y implies x/b -y/b, by considering all possible cases. Another crucial property is that x and y will round to the same integer, whenever x -y. Now if bApp2 F,, # fV we must have ( bp-t2 fV) mod b # 0; hence the transformation leaves fV unchanged unless e, -e, > 2. Since u was normalized, it is nonzero and IfiL + fvl > b- -b-2 2 b-2: th e 1ead m g nonzero digit of fU + fv must be at most two places to the right of the radix point, and the rounding operation will convert bpf3(fu f f,,) to an integer, where j 5 1. The proof will be complete if we can show that bp+j+ (fu + fv) -bpfjW1 (f,, + bppp2F,). By the previous paragraph, we have bp+ (f,, +f,,) -bp+ f,, + F, = bpf2(fu + bYpp2F,), which implies the desired result for allj 2 1. Note that, when b > 2 is even, such an integer F, always exists; but when b = 2 we require p+ 3 bits (let 2F, be an integer). When b is odd, an integer F, always exists except in the case of division, when a remainder of i b is possible.