Archive for August, 2007

4.6.4 EVALUATION OF POLYNOMIALS 495 Theorem W (S.

Friday, August 24th, 2007

Theorem W (S. Winograd, 1975). Let $p(u)$ be a monic polynomial of degree $n$ whose complete factorization over a given infinite field is

$$p(u) = p_1(u)^{e_1} \cdots p_q(u)^{e_q}. \tag{73}$$

Then the rank of the tensor (72) corresponding to the bilinear forms (69) is $2n - q$ over this field.

Proof. The bilinear forms can be evaluated with only $2n - q$ chain multiplications by using rules (56), (57), (58) in an appropriate fashion, so we must prove only that the rank $r$ is $\ge 2n - q$. The above discussion establishes the fact that $\mathrm{rank}(T_{(ij)k}) = n$; hence by Lemma T, any $n \times r$ realization $A$, $B$, $C$ of $(T_{ijk})$ has $\mathrm{rank}(C) = n$. Our strategy will be to use Lemma T again, by finding a vector $(v_0, v_1, \ldots, v_{n-1})$ that has the following two properties:

a) The vector $(v_0, v_1, \ldots, v_{n-1})C$ has at most $q + r - n$ nonzero coefficients.

b) The matrix $v(P) = \sum_{0 \le k < n} v_k P^k$ is nonsingular.

This and Lemma T will prove that $q + r - n \ge n$, since the identity

$$\sum_{1 \le l \le r} a_{il}\, b_{jl} \Bigl(\sum_{0 \le k < n} v_k c_{kl}\Bigr) = v(P)_{ij}$$

shows how to realize the $n \times n \times 1$ tensor $v(P)$ of rank $n$ with $q + r - n$ chain multiplications.

We may assume for convenience that the first $n$ columns of $C$ are linearly independent. Let $D$ be the $n \times n$ matrix such that the first $n$ columns of $DC$ are equal to the identity matrix. Our goal will be achieved if there is a linear combination $(v_0, v_1, \ldots, v_{n-1})$ of at most $q$ rows of $D$, such that $v(P)$ is nonsingular; such a vector will satisfy conditions (a) and (b).

Since the rows of $D$ are linearly independent, no irreducible factor $p_\lambda(u)$ divides the polynomials corresponding to every row. Given a vector $w = (w_0, w_1, \ldots, w_{n-1})$, let covered($w$) be the set of all $\lambda$ such that $w(u)$ is not a multiple of $p_\lambda(u)$. From two vectors $v$ and $w$ we can find a linear combination $v + \alpha w$ such that

$$\mathrm{covered}(v + \alpha w) = \mathrm{covered}(v) \cup \mathrm{covered}(w), \tag{74}$$

for some $\alpha$ in the field. The reason is that if $\lambda$ is covered by $v$ or $w$ but not both, then $\lambda$ is covered by $v + \alpha w$ for all nonzero $\alpha$; if $\lambda$ is covered by both $v$ and $w$ but $\lambda$ is not covered by $v + \alpha w$, then $\lambda$ is covered by $v + \beta w$ for all $\beta \ne \alpha$. By trying $q + 1$ different values of $\alpha$, at least one must yield (74). In this way we can systematically construct a linear combination of at most $q$ rows of $D$, covering all $\lambda$ for $1 \le \lambda \le q$. ∎
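As a quick sanity check of the statement (a worked instance added for illustration, not part of the original proof), take the cyclic-convolution modulus $p(u) = u^4 - 1$, so that $n = 4$. The factorizations below are standard; the ranks follow simply by substituting into $2n - q$.

```latex
\begin{align*}
\text{over the rationals:}\quad
  & u^4 - 1 = (u-1)(u+1)(u^2+1), & q &= 3, & 2n - q &= 8 - 3 = 5;\\
\text{over the complex numbers:}\quad
  & u^4 - 1 = (u-1)(u+1)(u-i)(u+i), & q &= 4, & 2n - q &= 8 - 4 = 4;\\
\text{for } p(u) = u^n:\quad
  & p(u) = p_1(u)^n \text{ with } p_1(u) = u, & q &= 1, & 2n - q &= 2n - 1.
\end{align*}
```

The first line matches the count of 5 multiplications quoted elsewhere in this section for cyclic convolution of degree 4; the second shows how the rank depends on the field, since more factors become available over a larger field.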

494 ARITHMETIC 4.6.4 illustrated important techniques that are

Friday, August 24th, 2007

illustrated important techniques that are useful in a variety of other situations. For example, Winograd has used this approach to compute Fourier transforms using significantly fewer multiplications than the fast Fourier transform algorithm needs (see exercise 53).

Let us conclude this section by determining the exact rank of the $n \times n \times n$ tensor that corresponds to the multiplication of two polynomials modulo a third,

$$z_0 + z_1 u + \cdots + z_{n-1} u^{n-1} = (x_0 + x_1 u + \cdots + x_{n-1} u^{n-1})(y_0 + y_1 u + \cdots + y_{n-1} u^{n-1}) \bmod p(u). \tag{69}$$

Here $p(u)$ stands for any given monic polynomial of degree $n$; in particular, $p(u)$ might be $u^n - 1$, so one of the results of our investigation will be to deduce the rank of the tensor corresponding to cyclic convolution of degree $n$. It will be convenient to write $p(u)$ in the form

$$p(u) = u^n - p_{n-1} u^{n-1} - \cdots - p_1 u - p_0, \tag{70}$$

so that $u^n \equiv p_0 + p_1 u + \cdots + p_{n-1} u^{n-1}$ (modulo $p(u)$).

The tensor element $t_{ijk}$ is the coefficient of $u^k$ in $u^{i+j} \bmod p(u)$; and this is the element in row $i$, column $k$ of the matrix $P^j$, where

$$P = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ p_0 & p_1 & p_2 & \cdots & p_{n-1} \end{pmatrix} \tag{71}$$

is called the companion matrix of $p(u)$. (The indices $i$, $j$, $k$ in our discussion will run from 0 to $n - 1$ instead of from 1 to $n$.) It is convenient to transpose the tensor, for if $T_{ijk} = t_{ikj}$ the individual layers of $(T_{ijk})$ for $k = 0, 1, 2, \ldots, n - 1$ are simply given by the matrices

$$I \quad P \quad P^2 \quad \ldots \quad P^{n-1}. \tag{72}$$

The first rows of the matrices in (72) are respectively the unit vectors $(1, 0, 0, \ldots, 0)$, $(0, 1, 0, \ldots, 0)$, $(0, 0, 1, \ldots, 0)$, ..., $(0, 0, 0, \ldots, 1)$, hence a linear combination such as $\sum_{0 \le k < n} v_k P^k$ …
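The claim that row $i$ of $P^j$ holds the coefficients of $u^{i+j} \bmod p(u)$ can be checked numerically. The following is a minimal sketch, assuming $p(u)$ is given by the list `p = [p_0, ..., p_{n-1}]` from (70); the function names `companion`, `power_mod`, and `rows_match_powers` are illustrative and do not come from the text.

```python
def companion(p):
    """Companion matrix (71) for u^n = p[0] + p[1]*u + ... + p[n-1]*u^(n-1)."""
    n = len(p)
    # Rows 0..n-2 have a single 1 just right of the diagonal; the last row is p.
    rows = [[1 if k == i + 1 else 0 for k in range(n)] for i in range(n - 1)]
    rows.append(list(p))
    return rows


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][t] * b[t][k] for t in range(n)) for k in range(n)]
            for i in range(n)]


def power_mod(e, p):
    """Coefficients of u^e mod p(u), obtained by multiplying by u e times."""
    n = len(p)
    c = [1] + [0] * (n - 1)
    for _ in range(e):
        top = c[-1]                    # coefficient that would become u^n
        c = [0] + c[:-1]               # multiply by u
        c = [ck + top * pk for ck, pk in zip(c, p)]  # substitute for u^n
    return c


def rows_match_powers(p):
    """True if row i of P^j equals the coefficients of u^(i+j) mod p(u)."""
    n = len(p)
    P = companion(p)
    Pj = [[int(i == k) for k in range(n)] for i in range(n)]  # P^0 = I
    for j in range(n):
        for i in range(n):
            if Pj[i] != power_mod(i + j, p):
                return False
        Pj = matmul(Pj, P)
    return True


if __name__ == "__main__":
    print(companion([1, 0, 0, 0]))          # companion matrix of u^4 - 1
    print(rows_match_powers([1, 0, 0, 0]))
    print(rows_match_powers([2, -3, 5]))
```

For $p(u) = u^n - 1$ the companion matrix is a cyclic permutation matrix, which is why this case corresponds to cyclic convolution.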

4.6.4 EVALUATION OF POLYNOMIALS 493 treating the subscripts

Thursday, August 23rd, 2007

treating the subscripts modulo $n$, since $t_{ijk} = 1$ if and only if $i + j \equiv k$ (modulo $n$). Thus if $(a_{il})$, $(b_{jl})$, $(c_{kl})$ is a realization of the cyclic convolution, so is $(c_{kl})$, $(b_{-j,l})$, $(a_{il})$; in particular, we can realize (61) by transforming (64) into

[display (66) not reproduced]

Now all of the complicated scalars appear in the $A$ matrix. This is important in practice, since we often want to compute the convolution for many values of $y_0, y_1, y_2, y_3$ but for a fixed choice of $x_0, x_1, x_2, x_3$. In such a situation, the arithmetic on $x$'s can be done once and for all, and we need not count it. Thus (66) leads to the following scheme for evaluating the cyclic convolution $w_0, w_1, w_2, w_3$ when $x_0, x_1, x_2, x_3$ are known in advance:

$$\begin{aligned}
&s_1 = y_0 + y_2,\quad s_2 = y_1 + y_3,\quad s_3 = s_1 + s_2,\quad s_4 = s_1 - s_2,\\
&s_5 = y_0 - y_2,\quad s_6 = y_3 - y_1,\quad s_7 = s_5 - s_6;\\
&m_1 = \tfrac{x_0 + x_1 + x_2 + x_3}{4} \cdot s_3,\quad m_2 = \tfrac{x_0 - x_1 + x_2 - x_3}{4} \cdot s_4,\quad m_3 = \tfrac{x_0 + x_1 - x_2 - x_3}{2} \cdot s_5,\\
&m_4 = \tfrac{-x_0 + x_1 + x_2 - x_3}{2} \cdot s_6,\quad m_5 = \tfrac{x_3 - x_1}{2} \cdot s_7;\\
&t_1 = m_1 + m_2,\quad t_2 = m_3 + m_5,\quad t_3 = m_1 - m_2,\quad t_4 = m_4 - m_5;\\
&w_0 = t_1 + t_2,\quad w_1 = t_3 + t_4,\quad w_2 = t_1 - t_2,\quad w_3 = t_3 - t_4.
\end{aligned} \tag{67}$$

There are 5 multiplications and 15 additions, while the definition of cyclic convolution involves 16 multiplications and 12 additions. We will prove later that 5 multiplications are necessary.

Going back to our original multiplication problem (52), using (60), we have derived the realization

[display (68) not reproduced]

This scheme uses one more than the minimum number of chain multiplications, but it requires far fewer parameter multiplications than (55). Of course, it must be admitted that the scheme is still rather complicated: If our goal is simply to compute the coefficients $z_0, z_1, \ldots, z_5$ of the product of two given polynomials $(x_0 + x_1 u + x_2 u^2)(y_0 + y_1 u + y_2 u^2 + y_3 u^3)$, as a one-shot problem, our best bet is still to use the obvious method that does 12 multiplications and 6 additions, unless (say) the $x$'s and $y$'s are matrices. Note that if the $x$'s are fixed as the $y$'s vary, the new scheme does the evaluation with 7 multiplications and 17 additions.
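Scheme (67) can be transcribed almost line for line and compared with the defining sums. The sketch below assumes exact rational arithmetic via `fractions.Fraction` so that the divisions by 2 and 4 introduce no rounding; the function names are illustrative, and the five constants `c1` to `c5` are the quantities that depend only on the $x$'s and would be precomputed once in practice.

```python
from fractions import Fraction


def cyclic4_direct(x, y):
    """Degree-4 cyclic convolution from the definition: 16 multiplications."""
    return [sum(x[i] * y[(k - i) % 4] for i in range(4)) for k in range(4)]


def cyclic4_scheme67(x, y):
    """Degree-4 cyclic convolution via scheme (67): 5 chain multiplications."""
    x0, x1, x2, x3 = (Fraction(v) for v in x)
    y0, y1, y2, y3 = y

    # Constants depending only on x (computed once when x is fixed).
    c1 = (x0 + x1 + x2 + x3) / 4
    c2 = (x0 - x1 + x2 - x3) / 4
    c3 = (x0 + x1 - x2 - x3) / 2
    c4 = (-x0 + x1 + x2 - x3) / 2
    c5 = (x3 - x1) / 2

    # Seven additions on the y's.
    s1 = y0 + y2
    s2 = y1 + y3
    s3 = s1 + s2
    s4 = s1 - s2
    s5 = y0 - y2
    s6 = y3 - y1
    s7 = s5 - s6

    # The five chain multiplications.
    m1 = c1 * s3
    m2 = c2 * s4
    m3 = c3 * s5
    m4 = c4 * s6
    m5 = c5 * s7

    # Eight more additions to assemble the result.
    t1 = m1 + m2
    t2 = m3 + m5
    t3 = m1 - m2
    t4 = m4 - m5
    return [t1 + t2, t3 + t4, t1 - t2, t3 - t4]


if __name__ == "__main__":
    x, y = [1, 2, 3, 4], [5, 6, 7, 8]
    print(cyclic4_direct(x, y))
    print(cyclic4_scheme67(x, y) == cyclic4_direct(x, y))
```

The additions inside `c1` to `c5` are not counted among the 15, in line with the remark that arithmetic on the $x$'s can be done once and for all.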
Even though (68) isn't especially useful as it stands, our derivation has

492 ARITHMETIC 4.6.4 kth coefficient wk is the

Wednesday, August 22nd, 2007

$k$th coefficient $w_k$ is the bilinear form $\sum x_i y_j$ summed over all $i$ and $j$ with $i + j \equiv k$ (modulo $n$).

The cyclic convolution of degree 4 can be obtained by applying rule (57). The first step is to find the factors of $u^4 - 1$, namely $(u - 1)(u + 1)(u^2 + 1)$. We could write this as $(u^2 - 1)(u^2 + 1)$, then apply rule (57), then use (57) again on the part modulo $(u^2 - 1) = (u - 1)(u + 1)$; but it is easier to generalize the Chinese remainder rule (57) directly to the case of several relatively prime factors. For example, we have

$$\begin{aligned}
x(u)y(u) \bmod q_1(u)q_2(u)q_3(u) = \bigl(\,&a_1(u)q_2(u)q_3(u)\,(x(u)y(u) \bmod q_1(u)) \\
{}+{} &a_2(u)q_1(u)q_3(u)\,(x(u)y(u) \bmod q_2(u)) \\
{}+{} &a_3(u)q_1(u)q_2(u)\,(x(u)y(u) \bmod q_3(u))\bigr) \bmod q_1(u)q_2(u)q_3(u),
\end{aligned} \tag{62}$$

where $a_1(u)q_2(u)q_3(u) + a_2(u)q_1(u)q_3(u) + a_3(u)q_1(u)q_2(u) = 1$. (The latter equation can be understood in another way, by noting that the partial fraction expansion of $1/q_1(u)q_2(u)q_3(u)$ is $a_1(u)/q_1(u) + a_2(u)/q_2(u) + a_3(u)/q_3(u)$. When each of the $q$'s is a linear polynomial $u - \alpha_i$, the generalized Chinese remainder rule reduces to ordinary interpolation as in Eq. (41), since $f(u) \bmod (u - \alpha_i) = f(\alpha_i)$.) From (62) we obtain

$$x(u)y(u) \bmod (u^4 - 1) = \Bigl(\tfrac{u^3 + u^2 + u + 1}{4}\, x(1)y(1) - \tfrac{u^3 - u^2 + u - 1}{4}\, x(-1)y(-1) - \tfrac{u^2 - 1}{2}\,\bigl(x(u)y(u) \bmod (u^2 + 1)\bigr)\Bigr) \bmod (u^4 - 1). \tag{63}$$

The remaining problem is to evaluate $x(u)y(u) \bmod (u^2 + 1)$, and it is time to invoke rule (58). First we reduce $x(u)$ and $y(u)$ mod $(u^2 + 1)$, obtaining $X(u) = (x_0 - x_2) + (x_1 - x_3)u$, $Y(u) = (y_0 - y_2) + (y_1 - y_3)u$. Then (58) tells us to evaluate $X(u)Y(u) = Z_0 + Z_1 u + Z_2 u^2$, and to reduce this in turn modulo $(u^2 + 1)$, obtaining $(Z_0 - Z_2) + Z_1 u$. The job of computing $X(u)Y(u)$ is simple; we can use rule (56) with $p(u) = u(u + 1)$ and we get

$$Z_0 = X_0 Y_0, \qquad Z_1 = X_0 Y_0 - (X_0 - X_1)(Y_0 - Y_1) + X_1 Y_1, \qquad Z_2 = X_1 Y_1.$$

(We have thereby rediscovered the trick of Eq. 4.3.3-2 in a more systematic way.) Putting everything together yields the following realization $A$, $B$, $C$ of degree-4 cyclic convolution:

[display (64) not reproduced]

Here $\bar 1$ stands for $-1$ and $\bar 2$ for $-2$.
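The step modulo $u^2 + 1$ can be sketched in a few lines: three multiplications produce $Z_0$, $Z_1$, $Z_2$, and the reduction gives $(Z_0 - Z_2) + Z_1 u$. The function names below are illustrative. Because arithmetic modulo $u^2 + 1$ behaves like complex multiplication with $u$ in the role of $i$, Python's built-in `complex` type serves as an independent check.

```python
def mul_mod_u2_plus_1(X0, X1, Y0, Y1):
    """(X0 + X1*u)(Y0 + Y1*u) mod (u^2 + 1), using three multiplications."""
    m0 = X0 * Y0
    m1 = X1 * Y1
    m2 = (X0 - X1) * (Y0 - Y1)
    Z0 = m0
    Z1 = m0 - m2 + m1        # equals X0*Y1 + X1*Y0
    Z2 = m1
    # Reduce Z0 + Z1*u + Z2*u^2 using u^2 = -1.
    return Z0 - Z2, Z1


def cyclic4_mod_u2_plus_1(x, y):
    """x(u)y(u) mod (u^2 + 1) for polynomials of degree 3, as in the text."""
    X0, X1 = x[0] - x[2], x[1] - x[3]    # x(u) mod (u^2 + 1)
    Y0, Y1 = y[0] - y[2], y[1] - y[3]    # y(u) mod (u^2 + 1)
    return mul_mod_u2_plus_1(X0, X1, Y0, Y1)


if __name__ == "__main__":
    print(mul_mod_u2_plus_1(1, 2, 3, 4))
    print(complex(1, 2) * complex(3, 4))
    print(cyclic4_mod_u2_plus_1([1, 2, 3, 4], [5, 6, 7, 8]))
```

The pair returned by `cyclic4_mod_u2_plus_1` is $(w_0 - w_2,\ w_1 - w_3)$, the residue of the cyclic convolution modulo $u^2 + 1$.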
The tensor for cyclic convolution of degree $n$ satisfies

$$t_{i,j,k} = t_{k,-j,i}, \tag{65}$$

4.6.4 EVALUATION OF POLYNOMIALS 491 where a(u)r(u) +

Tuesday, August 21st, 2007

where $a(u)r(u) + b(u)q(u) = 1$; this is essentially the Chinese remainder theorem applied to polynomials.

In the third place, to evaluate the coefficients of $x(u)y(u) \bmod p(u)$ when $p(u)$ has only one irreducible factor over the field of coefficients, one can use the identity

$$x(u)y(u) \bmod p(u) = \bigl(x(u) \bmod p(u)\bigr)\bigl(y(u) \bmod p(u)\bigr) \bmod p(u). \tag{58}$$

Repeated application of (56), (57), and (58) tends to produce efficient schemes, as we shall see.

For our example problem (52), let us choose $p(u) = u^5 - u$ and apply (56); the reason for this choice of $p(u)$ will appear as we proceed. Writing $p(u) = u(u^4 - 1)$, rule (57) reduces to

$$x(u)y(u) \bmod u(u^4 - 1) = \bigl(-(u^4 - 1)\,x_0 y_0 + u^4\,(x(u)y(u) \bmod (u^4 - 1))\bigr) \bmod (u^5 - u). \tag{59}$$

Here we have used the fact that $x(u)y(u) \bmod u = x_0 y_0$; in general it is a good idea to choose $p(u)$ in such a way that $p(0) = 0$, so that this simplification can be used. If we could now determine the coefficients $w_0, w_1, w_2, w_3$ of the polynomial $x(u)y(u) \bmod (u^4 - 1) = w_0 + w_1 u + w_2 u^2 + w_3 u^3$, our problem would be solved, since

$$u^4\,(x(u)y(u) \bmod (u^4 - 1)) \bmod (u^5 - u) = w_0 u^4 + w_1 u + w_2 u^2 + w_3 u^3,$$

and the combination of (56) and (59) would reduce to

$$x(u)y(u) = x_0 y_0 + (w_1 - x_2 y_3)u + w_2 u^2 + w_3 u^3 + (w_0 - x_0 y_0)u^4 + x_2 y_3 u^5. \tag{60}$$

(This formula can, of course, be verified directly.)

The problem remaining to be solved is to compute $x(u)y(u) \bmod (u^4 - 1)$; and this subproblem is interesting in itself. Let us momentarily allow $x(u)$ to be of degree 3 instead of degree 2. Then the coefficients of $x(u)y(u) \bmod (u^4 - 1)$ are respectively

$$\begin{aligned}
&x_0 y_0 + x_1 y_3 + x_2 y_2 + x_3 y_1, \qquad x_0 y_1 + x_1 y_0 + x_2 y_3 + x_3 y_2,\\
&x_0 y_2 + x_1 y_1 + x_2 y_0 + x_3 y_3, \qquad x_0 y_3 + x_1 y_2 + x_2 y_1 + x_3 y_0,
\end{aligned}$$

and the corresponding tensor is

[display (61) not reproduced]

In general when $\deg(x) = \deg(y) = n - 1$, the coefficients of $x(u)y(u) \bmod (u^n - 1)$ are called the cyclic convolution of $(x_0, x_1, \ldots, x_{n-1})$ and $(y_0, y_1, \ldots, y_{n-1})$. The
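Substituting (59) into (56) with $p(u) = u^5 - u$, and collecting powers of $u$, expresses the six product coefficients in terms of $x_0 y_0$, $x_2 y_3$, and the cyclic convolution $w_0, \ldots, w_3$ (taken with $x_3 = 0$): $z_0 = x_0 y_0$, $z_1 = w_1 - x_2 y_3$, $z_2 = w_2$, $z_3 = w_3$, $z_4 = w_0 - x_0 y_0$, $z_5 = x_2 y_3$. The sketch below checks this combination directly. The function names are illustrative, and the cyclic convolution is computed here from its definition rather than by any fast scheme.

```python
def cyclic4(x, y):
    """Degree-4 cyclic convolution computed from the definition."""
    return [sum(x[i] * y[(k - i) % 4] for i in range(4)) for k in range(4)]


def product_via_cyclic(x, y):
    """Coefficients z0..z5 of x(u)y(u) for deg x = 2 and deg y = 3.

    Pads x with x3 = 0, takes the cyclic convolution w, and corrects the
    wrap-around using the two extra products x0*y0 and x2*y3.
    """
    w = cyclic4(list(x) + [0], y)
    low = x[0] * y[0]      # x(u)y(u) mod u
    high = x[2] * y[3]     # leading coefficient x_m * y_n from rule (56)
    return [low, w[1] - high, w[2], w[3], w[0] - low, high]


def product_direct(x, y):
    """Schoolbook polynomial product: 12 multiplications for these degrees."""
    z = [0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            z[i + j] += xi * yj
    return z


if __name__ == "__main__":
    x, y = [1, 2, 3], [4, 5, 6, 7]
    print(product_direct(x, y))
    print(product_via_cyclic(x, y))
```

With a five-multiplication scheme for the cyclic convolution, this route uses 5 + 2 = 7 chain multiplications in total.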

490 ARITHMETIC 4.6.4 For brevity, we may write

Tuesday, August 21st, 2007

For brevity, we may write (52) as $x(u)y(u) = z(u)$, letting $x(u)$ denote the polynomial $x_0 + x_1 u + x_2 u^2$, etc. Note that we have come full circle from the way we began this section, since Eq. (1) refers to $u(x)$, not $x(u)$; the notation has changed because the coefficients of the polynomials are now the variables of interest to us.

If each of the six matrices in (53) is regarded as a vector of length 12 indexed by $(i, j)$, it is clear that the vectors are linearly independent, since they are nonzero in different positions; hence the rank of (53) is at least 6 by Lemma T. Conversely, it is possible to obtain the coefficients $z_0, z_1, \ldots, z_5$ by making only six chain multiplications, for example by computing

$$x(0)y(0), \quad x(1)y(1), \quad \ldots, \quad x(5)y(5); \tag{54}$$

this gives the values of $z(0), z(1), \ldots, z(5)$, and the formulas developed above for interpolation will yield the coefficients of $z(u)$. The evaluation of $x(j)$ and $y(j)$ can be carried out entirely in terms of additions and/or parameter multiplications, and the interpolation formula merely takes linear combinations of these values. Thus, all of the chain multiplications are shown in (54) and the rank of (53) is 6. (We used essentially this same technique when multiplying high-precision numbers in Algorithm 4.3.3C.)

The realization $A$, $B$, $C$ of (53) sketched in the above paragraph turns out to be

[display (55) not reproduced]

Thus, the scheme does indeed require the minimum number of chain multiplications, but it is completely impractical because it involves so many additions and parameter multiplications. We shall now study a practical approach to the generation of more efficient schemes, suggested by S. Winograd.

In the first place, to evaluate the coefficients of $x(u)y(u)$ when $\deg(x) = m$ and $\deg(y) = n$, one can use the identity

$$x(u)y(u) = \bigl(x(u)y(u) \bmod p(u)\bigr) + x_m y_n\, p(u), \tag{56}$$

when $p(u)$ is any monic polynomial of degree $m + n$. The polynomial $p(u)$ should be chosen so that the coefficients of $x(u)y(u) \bmod p(u)$ are easy to evaluate.

In the second place, to evaluate the coefficients of $x(u)y(u) \bmod p(u)$, when the polynomial $p(u)$ can be factored into $q(u)r(u)$ where $\gcd(q(u), r(u)) = 1$, one can use the identity

$$x(u)y(u) \bmod q(u)r(u) = \bigl(a(u)r(u)\,(x(u)y(u) \bmod q(u)) + b(u)q(u)\,(x(u)y(u) \bmod r(u))\bigr) \bmod q(u)r(u) \tag{57}$$
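The six-multiplication scheme (54) can be sketched as evaluate, multiply pointwise, interpolate. The version below is a minimal sketch assuming exact rational arithmetic and plain Lagrange interpolation; the function names are illustrative, and the interpolation is written for clarity rather than to minimize additions. Only the six products in `multiply_by_evaluation` involve both an $x$-quantity and a $y$-quantity; everything else multiplies by fixed constants.

```python
from fractions import Fraction


def poly_eval(coeffs, t):
    """Evaluate a polynomial given by ascending coefficients, by Horner's rule."""
    result = 0
    for a in reversed(coeffs):
        result = result * t + a
    return result


def interpolate(points, values):
    """Ascending coefficients of the polynomial through (points[i], values[i])."""
    n = len(points)
    coeffs = [Fraction(0)] * n
    for i in range(n):
        basis = [Fraction(1)]          # product of (u - points[j]) for j != i
        denom = Fraction(1)
        for j in range(n):
            if j == i:
                continue
            basis = [Fraction(0)] + basis            # multiply by u
            for k in range(len(basis) - 1):
                basis[k] -= points[j] * basis[k + 1]  # subtract points[j] * old
            denom *= points[i] - points[j]
        for k in range(n):
            coeffs[k] += values[i] * basis[k] / denom
    return coeffs


def multiply_by_evaluation(x, y):
    """Product of a degree-2 and a degree-3 polynomial via six point products."""
    points = list(range(6))
    products = [poly_eval(x, t) * poly_eval(y, t) for t in points]  # (54)
    return interpolate(points, products)


if __name__ == "__main__":
    print(multiply_by_evaluation([1, 2, 3], [4, 5, 6, 7]))
```

The fractions that appear inside `interpolate` illustrate why the text calls this scheme impractical: the chain multiplications are minimal, but the parameter multiplications are numerous.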

4.6.4 EVALUATION OF POLYNOMIALS 489 Nauk 27,5 (1972),

Monday, August 20th, 2007

Nauk 27, 5 (1972), 249-250; J. E. Hopcroft and J. Musinski, SIAM J. Computing 2 (1973), 159-173.]

When the tensor $(t_{ijk})$ can be represented as a sum (49) of $r$ rank-one tensors, let $A$, $B$, $C$ be the matrices $(a_{il})$, $(b_{jl})$, $(c_{kl})$ of respective sizes $m \times r$, $n \times r$, $s \times r$; we shall say that $A$, $B$, $C$ is a realization of the tensor $(t_{ijk})$. For example, the realization of $2 \times 2$ matrix multiplication in (50) can be specified by the matrices

[display (51) not reproduced]

An $m \times n \times s$ tensor $(t_{ijk})$ can also be represented as a matrix by grouping its subscripts together. We shall write $(t_{(ij)k})$ for the $mn \times s$ matrix whose rows are indexed by the pair of subscripts $(i, j)$ and whose columns are indexed by $k$. Similarly, $(t_{k(ij)})$ stands for the $s \times mn$ matrix that contains $t_{ijk}$ in row $k$ and column $(i, j)$; $(t_{(ik)j})$ is an $ms \times n$ matrix, and so on. The indices of an array need not be integers, and we are using ordered pairs as indices here. We can use this notation to derive the following simple but useful lower bound on the rank of a tensor.

Lemma T. Let $A$, $B$, $C$ be a realization of an $m \times n \times s$ tensor $(t_{ijk})$. Then $\mathrm{rank}(A) \ge \mathrm{rank}(t_{i(jk)})$, $\mathrm{rank}(B) \ge \mathrm{rank}(t_{j(ik)})$, and $\mathrm{rank}(C) \ge \mathrm{rank}(t_{k(ij)})$; consequently

$$\mathrm{rank}(t_{ijk}) \ge \max\bigl(\mathrm{rank}(t_{i(jk)}),\ \mathrm{rank}(t_{j(ik)}),\ \mathrm{rank}(t_{k(ij)})\bigr).$$

Proof. It suffices by symmetry to show that $r \ge \mathrm{rank}(A) \ge \mathrm{rank}(t_{i(jk)})$. Since $A$ is an $m \times r$ matrix, it is obvious that $A$ cannot have rank greater than $r$. Furthermore, according to (49), the matrix $(t_{i(jk)})$ is equal to $AQ$, where $Q$ is the $r \times ns$ matrix defined by $Q_{l(jk)} = b_{jl} c_{kl}$. If $x$ is any row vector such that $xA = 0$ then $xAQ = 0$, hence all linear dependencies in $A$ occur also in $AQ$. It follows that $\mathrm{rank}(AQ) \le \mathrm{rank}(A)$. ∎

As an example of the use of Lemma T, let us consider the problem of polynomial multiplication. Suppose we want to multiply a general polynomial of degree 2 by a general polynomial of degree 3, obtaining the coefficients of the product:

$$(x_0 + x_1 u + x_2 u^2)(y_0 + y_1 u + y_2 u^2 + y_3 u^3) = z_0 + z_1 u + z_2 u^2 + z_3 u^3 + z_4 u^4 + z_5 u^5. \tag{52}$$

This is the problem of evaluating six bilinear forms corresponding to the $3 \times 4 \times 6$ tensor
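The lower bound of Lemma T is easy to evaluate for the polynomial-multiplication tensor of (52), whose entry $t_{ijk}$ is 1 exactly when $i + j = k$ (indices from 0). The sketch below flattens the tensor in the three ways named in the lemma and computes each matrix rank by Gaussian elimination over the rationals. All function names are illustrative assumptions, not notation from the text.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(v) for v in r] for r in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def poly_mult_tensor(deg_x, deg_y):
    """t[i][j][k] = 1 if i + j == k, else 0, for the product of two polynomials."""
    return [[[int(i + j == k) for k in range(deg_x + deg_y + 1)]
             for j in range(deg_y + 1)]
            for i in range(deg_x + 1)]


def lemma_t_bound(t):
    """max(rank t_i(jk), rank t_j(ik), rank t_k(ij)): a lower bound on tensor rank."""
    I, J, K = len(t), len(t[0]), len(t[0][0])
    t_i_jk = [[t[i][j][k] for j in range(J) for k in range(K)] for i in range(I)]
    t_j_ik = [[t[i][j][k] for i in range(I) for k in range(K)] for j in range(J)]
    t_k_ij = [[t[i][j][k] for i in range(I) for j in range(J)] for k in range(K)]
    return max(matrix_rank(t_i_jk), matrix_rank(t_j_ik), matrix_rank(t_k_ij))


if __name__ == "__main__":
    print(lemma_t_bound(poly_mult_tensor(2, 3)))
```

For the $3 \times 4 \times 6$ case the three flattenings have ranks 3, 4, and 6, so the bound is 6.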

488 ARITHMETIC 4.6.4 A nonzero tensor (tijk) is

Monday, August 20th, 2007

A nonzero tensor $(t_{ijk})$ is said to be of rank one if there are three vectors $(a_1, \ldots, a_m)$, $(b_1, \ldots, b_n)$, $(c_1, \ldots, c_s)$ such that $t_{ijk} = a_i b_j c_k$ for all $i$, $j$, $k$. We can extend this definition to all tensors by saying that the rank of $(t_{ijk})$ is the minimum number $r$ such that $(t_{ijk})$ is expressible as the sum of $r$ rank-one tensors in the given field. Comparing this definition with Eq. (49) shows that the rank of a tensor is the minimum number of chain multiplications in a normal evaluation of the corresponding bilinear forms. Incidentally, when $s = 1$ the tensor $(t_{ijk})$ is just an ordinary matrix, and the rank of $(t_{ij1})$ as a tensor is the same as its rank as a matrix (see exercise 49). The concept of tensor rank was introduced by F. L. Hitchcock in J. Math. and Physics 6 (1927), 164-189; its application to the complexity of polynomial evaluation was pointed out in an important paper by V. Strassen, J. für die reine und angew. Math. 264 (1973), 184-202.

Winograd's scheme (35) for matrix multiplication is abnormal because it mixes $x$'s and $y$'s before multiplying them. The Strassen-Winograd scheme (36), on the other hand, does not rely on the commutativity of multiplication, so it is normal. In fact, (36) corresponds to the following way to represent the $4 \times 4 \times 4$ tensor for $2 \times 2$ matrix multiplication as a sum of seven rank-one tensors:

[display (50) not reproduced]

(Here $\bar 1$ stands for $-1$.)

The fact that (49) is symmetric in $i$, $j$, $k$ and invariant under a variety of transformations makes the study of tensor rank mathematically tractable, and it also leads to some surprising consequences about bilinear forms. We can permute the indices $i$, $j$, $k$ to obtain transposed bilinear forms, and the transposed tensor clearly has the same rank; but the corresponding bilinear forms are conceptually quite different.
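A small concrete case may help with the definition. The sketch below builds a tensor as a sum of rank-one terms $a_{il} b_{jl} c_{kl}$ from three matrices $A$, $B$, $C$. The particular matrices are an assumption chosen for illustration: they encode the familiar three-multiplication product of two linear polynomials, using the products $x_0 y_0$, $x_1 y_1$, and $(x_0 + x_1)(y_0 + y_1)$. Swapping the roles of the matrices realizes a transposed tensor with the same number of terms.

```python
def tensor_from_realization(A, B, C):
    """t[i][j][k] = sum over l of A[i][l] * B[j][l] * C[k][l]."""
    r = len(A[0])
    return [[[sum(A[i][l] * B[j][l] * C[k][l] for l in range(r))
              for k in range(len(C))]
             for j in range(len(B))]
            for i in range(len(A))]


# Illustrative realization with r = 3 rank-one terms (columns l = 0, 1, 2):
#   term 0: x0 * y0,  term 1: x1 * y1,  term 2: (x0 + x1) * (y0 + y1)
A = [[1, 0, 1],
     [0, 1, 1]]
B = [[1, 0, 1],
     [0, 1, 1]]
# z0 = term0,  z1 = term2 - term0 - term1,  z2 = term1
C = [[1, 0, 0],
     [-1, -1, 1],
     [0, 1, 0]]

# Target tensor for (x0 + x1*u)(y0 + y1*u): t[i][j][k] = 1 iff i + j == k.
TARGET = [[[int(i + j == k) for k in range(3)] for j in range(2)]
          for i in range(2)]

if __name__ == "__main__":
    print(tensor_from_realization(A, B, C) == TARGET)
    # Permuting the matrices realizes the transposed tensor T[k][j][i] = t[i][j][k].
    transposed = tensor_from_realization(C, B, A)
    print(all(transposed[k][j][i] == TARGET[i][j][k]
              for i in range(2) for j in range(2) for k in range(3)))
```

Since this $2 \times 2 \times 3$ tensor has a flattening of rank 3, three terms is also the minimum, so its rank is exactly 3.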
For example, a normal scheme for evaluating an $(m \times n)$ times $(n \times s)$ matrix product implies the existence of a normal scheme to evaluate an $(n \times s)$ times $(s \times m)$ matrix product, using the same number of chain multiplications. In matrix terms these two problems hardly seem to be related at all (they involve different numbers of dot products on vectors of different sizes), but in tensor terms they are equivalent. [Cf. V. Ya. Pan, Uspekhi Mat.

4.6.4 EVALUATION OF POLYNOMIALS 487 A remarkable modification

Sunday, August 19th, 2007

A remarkable modification of the method of divided differences, an extension that applies to rational functions instead of to polynomials, was introduced by T. N. Thiele in 1909. Thiele's method of reciprocal differences is discussed in L. M. Milne-Thomson's Calculus of Finite Differences (London: Macmillan, 1933), Chapter 5; see also R. W. Floyd, CACM 3 (1960), 508.

*Bilinear forms. Several of the problems we have considered in this section are special cases of the general problem of evaluating a set of bilinear forms

$$z_k = \sum_{\substack{1 \le i \le m \\ 1 \le j \le n}} t_{ijk}\, x_i y_j, \qquad \text{for } 1 \le k \le s, \tag{45}$$

486 ARITHMETIC 4.6.4 For example, suppose that we

Saturday, August 18th, 2007

For example, suppose that we want to estimate $\tfrac32!$ from the values of $0!$, $1!$, $2!$, and $3!$, using a cubic polynomial. The divided differences are

$$\begin{array}{c|cccc}
x & y & y' & y'' & y''' \\ \hline
0 & 1 & & & \\
  &   & 0 & & \\
1 & 1 & & \tfrac12 & \\
  &   & 1 & & \tfrac13 \\
2 & 2 & & \tfrac32 & \\
  &   & 4 & & \\
3 & 6 & & &
\end{array}$$

so $u_{[0]}(x) = u_{[1]}(x) = 1$, $u_{[2]}(x) = \tfrac12 x(x - 1) + 1$, $u_{[3]}(x) = \tfrac13 x(x - 1)(x - 2) + \tfrac12 x(x - 1) + 1$. Setting $x = \tfrac32$ in the latter polynomial gives $-\tfrac18 + \tfrac38 + 1 = 1.25$; presumably the correct value is $\Gamma(\tfrac32 + 1) = \tfrac34\sqrt{\pi} \approx 1.33$.

An important and somewhat surprising application of polynomial interpolation was discovered by Adi Shamir [CACM 22 (1979), 612-613], who observed that polynomials mod $p$ can be used to share a secret. This means that we can design a system of secret keys or passwords such that the knowledge of any $n + 1$ of the keys enables efficient calculation of a magic number $N$ that unlocks a door (say), but the knowledge of any $n$ of the keys gives no information whatsoever about $N$. Shamir's amazingly simple solution to this problem is to choose a random polynomial $u(x) = u_n x^n + \cdots + u_1 x + u_0$, where $0 \le u_i < p$ and $p$ is a large prime number. Each part of the secret is an integer $x$ in the range $0 < x < p$, together with the value of $u(x) \bmod p$; and the supersecret number $N$ is the constant term $u_0$. Given $n + 1$ values $u(x_i)$, we can deduce $N$ by interpolation. But if only $n$ values of $u(x_i)$ are given, there is a unique polynomial $u(x)$ having a given constant term but the same values at $x_1, \ldots, x_n$; thus the $n$ values do not make one particular $N$ more likely than any other.

It is instructive to note that evaluation of the interpolation polynomial is just a special case of the Chinese remainder algorithm of Section 4.3.2 and exercise 4.6.2-3, since we know the values of $u_{[n]}(x)$ modulo the relatively prime polynomials $x - x_0$, ..., $x - x_n$. (As we have seen in Section 4.6.2, $f(x) \bmod (x - x_0) = f(x_0)$.) Under this interpretation, Newton's formula (42) is precisely the mixed-radix representation of Eq. 4.3.2-24; and 4.3.2-23 yields another way to compute $\alpha_0, \ldots, \alpha_n$, using the same number of operations as (44).

By applying fast Fourier transforms, it is possible to reduce the running time for interpolation to $O(n (\log n)^2)$, and a similar reduction can also be made for related algorithms such as the solution to the Chinese remainder problem and the evaluation of an $n$th degree polynomial at $n$ different points. [See E. Horowitz, Inf. Proc. Letters 1 (1972), 157-163; R. Moenck and A. Borodin, J. Comp. Syst. Sci. 8 (1974), 336-385; and A. Borodin, Complexity of Sequential and Parallel Numerical Algorithms, ed. by J. F. Traub (New York: Academic Press, 1973), 149-180.] However, this must be regarded as a purely theoretical possibility at present, since the known algorithms have a rather large overhead factor that makes them unattractive unless $n$ is quite large.
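Both interpolation examples on this page can be reproduced in a few lines. The sketch below is illustrative only. The first half computes Newton's divided-difference coefficients with exact fractions and evaluates the cubic at $x = \tfrac32$. The second half is a toy version of the secret-sharing idea, using Lagrange interpolation at 0 modulo a prime. The function names, the choice of share points $x = 1, 2, \ldots$, and the prime $2^{61} - 1$ are assumptions made for the example. The modular inverse uses `pow(den, -1, p)`, which requires Python 3.8 or later.

```python
import secrets
from fractions import Fraction


def divided_differences(xs, ys):
    """Newton coefficients a_0..a_n of the polynomial through (xs[i], ys[i])."""
    coef = [Fraction(y) for y in ys]
    n = len(xs)
    for level in range(1, n):
        # Work from the bottom up so coef[i - 1] still holds the previous level.
        for i in range(n - 1, level - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - level])
    return coef


def newton_eval(xs, coef, x):
    """Evaluate a_0 + a_1(x - x_0) + a_2(x - x_0)(x - x_1) + ... by nesting."""
    result = coef[-1]
    for i in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[i]) + coef[i]
    return result


def make_shares(secret, n, count, p):
    """Shares (x, u(x) mod p) of a random degree-n polynomial with u(0) = secret."""
    coeffs = [secret % p] + [secrets.randbelow(p) for _ in range(n)]
    return [(x, sum(c * pow(x, k, p) for k, c in enumerate(coeffs)) % p)
            for x in range(1, count + 1)]


def recover_secret(shares, p):
    """Constant term of the interpolating polynomial mod p (needs n + 1 shares)."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if j != i:
                num = num * (-xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total


if __name__ == "__main__":
    xs, ys = [0, 1, 2, 3], [1, 1, 2, 6]            # 0!, 1!, 2!, 3!
    coef = divided_differences(xs, ys)
    print(coef, newton_eval(xs, coef, Fraction(3, 2)))

    p = 2**61 - 1                                   # a Mersenne prime
    shares = make_shares(123456789, n=2, count=5, p=p)
    print(recover_secret(shares[:3], p), recover_secret(shares[2:], p))
```

Any three of the five shares recover the secret here because the polynomial has degree $n = 2$. A real deployment would need further care (for example, how the shares are distributed and stored), which this sketch does not address.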