4.2.4 ANSWERS TO EXERCISES 575

        STZ  EXPO          EXPO ← 0.
        SLAX 1             Remove exponent.
        JMP  DNORM         Normalize and exit.
SINGLE  STJ  EXITF         Convert to single precision:
        JOV  OFLO          Ensure overflow is off.
        STA  TEMP
        LD2  TEMP(EXPD)    rI2 ← e.
        DEC2 QQ-Q          Correct for difference in excess.
        SLAX 2             Remove exponent.
        JMP  NORM          Normalize, round, and exit.

7. All three routines give zero as the answer if and only if the exact result would be zero, so we need not worry about zero denominators in the expressions for relative error. The worst case of the addition routine is pretty bad: Visualized in decimal notation, if the inputs are 1.0000000 and .99999999, the answer is $b^{-7}$ instead of $b^{-8}$; thus the maximum relative error $\delta_1$ is $b - 1$, where $b$ is the byte size.

For multiplication and division, we may assume that both operands are positive and have the same exponent QQ. The maximum error in multiplication is readily bounded by considering Fig. 4: When $uv \ge 1/b$, we have $0 \le uv - u \otimes v < 3b^{-9} + (b-1)b^{-9}$, so the relative error is bounded by $(b+2)b^{-8}$. When $1/b^2 \le uv < 1/b$, we have $0 \le uv - u \otimes v < 3b^{-9}$, so the relative error in this case is bounded by $3b^{-9}/uv \le 3b^{-7}$. We take $\delta_2$ to be the larger of the two estimates, namely $3b^{-7}$.

Division requires a more careful analysis of Program D. The quantity actually computed by the subroutine is $\alpha - \delta - b\epsilon((\alpha - \delta'')(\beta - \delta') - \delta''') - \delta_n$, where $\alpha = (u_m + \epsilon u_l)/bv_m$, $\beta = v_l/bv_m$, and the nonnegative truncation errors $(\delta, \delta', \delta'', \delta''')$ are respectively less than $(b^{-10}, b^{-5}, b^{-5}, b^{-6})$; finally $\delta_n$ (the truncation during normalization) is nonnegative and less than either $b^{-9}$ or $b^{-8}$, depending on whether scaling occurs or not. The actual value of the quotient is $\alpha/(1 + b\epsilon\beta) = \alpha - b\epsilon\alpha\beta + b^2\alpha\beta^2\delta''''$, where $\delta''''$ is the nonnegative error due to truncation of the infinite series (2); here $\delta'''' < \epsilon^2 = b^{-10}$, since it is an alternating series. The relative error is therefore the absolute value of $(b\epsilon\delta' + b\epsilon\delta''\beta/\alpha + b\epsilon\delta'''/\alpha) - (\delta/\alpha + b\epsilon\delta'\delta''/\alpha + b^2\beta^2\delta'''' + \delta_n/\alpha)$, times $(1 + b\epsilon\beta)$.
The positive terms in this expression are bounded by $b^{-9} + b^{-8} + b^{-8}$, and the negative terms are bounded by $b^{-8} + b^{-12} + b^{-8}$ plus the contribution by the normalizing phase, which can be about $b^{-7}$ in magnitude. It is therefore clear that the potentially greatest part of the relative error comes during the normalization phase, and that $\delta_3 = (b + 2)b^{-8}$ is a safe upper bound for the relative error.

8. Addition: If $e_u \le e_v + 1$, the entire relative error occurs during the normalization phase, so it is bounded above by $b^{-7}$. If $e_u \ge e_v + 2$, and if the signs are the same, again the entire error may be ascribed to normalization; if the signs are opposite, the error due to shifting digits out of the register is in the opposite direction from the subsequent error introduced during normalization. Both of these errors are bounded by $b^{-7}$, hence $\delta_1 = b^{-7}$. (This is substantially better than the result in exercise 7.)

Multiplication: An analysis as in exercise 7 gives $\delta_2 = (b + 2)b^{-8}$.

SECTION 4.2.4

1. Since fraction overflow can occur only when the operands have the same sign, this is the probability that fraction overflow occurs divided by the probability that the operands have the same sign, namely, $7\%/(\tfrac{1}{2}(91\%)) \approx 15\%$.
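
The arithmetic in the last answer can be checked with a short Python sketch. It is only an illustration: the function name `overflow_given_same_sign` is hypothetical, and the inputs 7% and ½(91%) are simply the figures quoted in the answer. Because overflow implies that the signs agree, the conditional probability reduces to a plain quotient.

```python
from fractions import Fraction


def overflow_given_same_sign(p_overflow: Fraction, p_same_sign: Fraction) -> Fraction:
    """Return P(overflow | same sign).

    Assumes overflow can occur only when the signs agree, so that
    P(overflow and same sign) = P(overflow).
    """
    if p_same_sign <= 0:
        raise ValueError("p_same_sign must be positive")
    return p_overflow / p_same_sign


# Figures quoted in the answer: 7% overflow, (1/2)(91%) same sign.
p_overflow = Fraction(7, 100)
p_same_sign = Fraction(1, 2) * Fraction(91, 100)

result = overflow_given_same_sign(p_overflow, p_same_sign)
print(result)                        # exact value: 2/13
print(f"{float(result) * 100:.1f}%")  # about 15.4%, which rounds to the 15% quoted
```

Using `Fraction` keeps the quotient exact, which shows that the quoted 15% is a rounding of 2/13.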
