IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 43, NO. 6, NOVEMBER 1997, p. 1843

A Class of Array Codes Correcting Multiple Column Erasures

Osnat Keren and Simon Litsyn, Member, IEEE

Abstract—A family of binary array codes of size (p − 1) × n, p a prime, correcting multiple column erasures is proposed.

The codes coincide with a subclass of shortened Reed–Solomon codes and achieve the maximum possible correcting capability. The complexity of encoding and decoding is proportional to rnp, where r is the number of correctable erasures; i.e., the scheme is simpler than the Forney decoding algorithm. The length n of the codes is at most 2p − 1, that is, twice the length of the Blaum–Roth codes having comparable decoding complexity.

Index Terms—Array codes, burst correction, erasures correction, decoding, Reed–Solomon codes.

I. INTRODUCTION

Consider a code consisting of binary arrays of size , where the first columns carry information. The number of codewords is thus . Let stand for such a code capable of correcting erasures of columns. The problem of constructing is equivalent to finding a code over GF of length and minimum distance . The maximal correcting capability is achieved if it is maximum-distance separable (MDS), i.e., it corrects up to erasures. Such codes find applications in storage systems, communications over parallel channels, and packet transmission in networks (see, e.g., [3]–[6], [8], [11]–[14]).

A code for can be obtained by shortening the Reed–Solomon code over GF . The complexity of the best known algorithms for decoding Reed–Solomon codes is proportional to bit operations (see, e.g., [1]). However, it is possible to construct codes achieving the maximal error-correcting capability that at the same time have smaller decoding complexity. Such codes have been presented in earlier papers [3] and [9] (see also [2], [8], [12], and references therein). In particular, Blaum and Roth [3] described a family of codes of length at most , where is a prime, having decoding complexity proportional to . The decoding procedure uses only cyclic shifts and XOR operations. The penalty for reducing complexity is a restriction on the maximal length of the codes.

In this paper, we introduce a family of codes of length with parity symbols (binary columns), where is a prime. The maximal length of the codes is twice that of the Blaum–Roth codes, whereas the decoding complexity remains of the same order.

Manuscript received May 20, 1996; revised April 15, 1997. The authors are with the Department of Electrical Engineering-Systems, Tel-Aviv University, Ramat-Aviv 69978, Tel-Aviv, Israel. Publisher Item Identifier S 0018-9448(97)06709-6.

The simplification is achieved by considering the columns as elements of GF and performing the calculations over a ring containing the field. In general, correcting erasures is simpler than correcting errors, since the locations of the erroneous columns are known. By Forney [7], the contribution of each erased column to the syndrome can be singled out. Indeed, it can be seen as a system of binary equations. The matrix of the system is a product of sparse circulant matrices, each of them having a structure corresponding to the case of two erasures. Hence retrieving erasures can be performed iteratively by applying the algorithm for decoding two erasures times. Correcting erasures is, therefore, times as complex as correcting two erasures.

Decoding two erasures involves Gaussian elimination. We simplify the triangulation process by rearranging the order of rows and columns. Indeed, we choose a relevant field basis such that the matrix has a narrow strip of along the diagonal. Clearly, the thinner the strip is, the fewer calculations are required. We show that there always exists a basis that leads to a narrow enough strip. Notice that encoding can be implemented using the decoding algorithm. Indeed, we may consider the redundant columns as being erased.

The paper is organized as follows. In Section II, we introduce the codes and show that they are MDS. In the next section, we give an example of decoding two erasures in a array code. Then, in Section IV, we discuss properties of the fields and their enveloping rings. In Section V, we formalize the decoding scheme for the case of two erasures. Finally, in Section VI, we generalize the algorithm to decode an arbitrary number of erasures.
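The cyclic-shift arithmetic behind this simplification can be sketched in a few lines. This is our toy illustration in Python, not the authors' implementation: in the ring of binary polynomials modulo x^p − 1, multiplication by x^k is a cyclic shift of the coefficient vector, and addition is a coordinatewise XOR.

```python
# Toy model of the enveloping ring GF(2)[x]/(x^p - 1): a ring element is a
# list of p bits, multiplication by x^k is a cyclic shift, addition is XOR.
p = 11  # an example prime with 2 primitive mod p

def ring_mul_xk(a, k):
    """Multiply the ring element a by x^k: a right cyclic shift by k."""
    k %= len(a)
    return a[-k:] + a[:-k] if k else a[:]

def ring_add(a, b):
    """Add two ring elements over GF(2): coordinatewise XOR."""
    return [u ^ v for u, v in zip(a, b)]

a = [1, 0, 1] + [0] * (p - 3)                              # 1 + x^2
assert ring_mul_xk(a, 1) == [0, 1, 0, 1] + [0] * (p - 4)   # x + x^3
assert ring_add(a, a) == [0] * p                           # characteristic 2
```

Because only shifts and XORs appear, no finite-field multiplication tables are needed.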

II. CODE CONSTRUCTION

Let be a prime number such that is primitive in GF . For such , the polynomial is irreducible (see Lemma 3 in the Appendix). Denote by the field GF defined by . Since is a factor of , it is an irreducible nonprimitive polynomial with a root, say , of order . Let be the polynomial ring modulo . To avoid confusion between the elements of and , we use Greek letters for the elements of and bold italic letters for the elements of ; binary coefficients are underlined. With a slight abuse of notation, we identify the vectorial and polynomial representations of the field and ring elements.
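The irreducibility condition can be checked directly. Below is a minimal sketch of ours (not from the paper), using a naive trial-division test over GF(2) with polynomials packed as Python ints: when 2 is primitive modulo p, the polynomial 1 + x + … + x^{p−1} has no nontrivial factor (e.g., p = 11), while for p = 7, where 2 has order 3, it factors.

```python
def gf2_mod(a, b):
    """Remainder of a mod b, binary polynomials packed as ints (bit i = coeff of x^i)."""
    db = b.bit_length() - 1
    while a and a.bit_length() - 1 >= db:
        a ^= b << (a.bit_length() - 1 - db)
    return a

def irreducible(f):
    """Naive test: f has no factor of degree 1 .. deg(f) // 2."""
    d = f.bit_length() - 1
    return all(gf2_mod(f, g) != 0
               for k in range(1, d // 2 + 1)
               for g in range(1 << k, 1 << (k + 1)))

M = lambda p: (1 << p) - 1        # M_p(x) = 1 + x + ... + x^(p-1)
assert irreducible(M(11))         # 2 is primitive mod 11
assert not irreducible(M(7))      # 2 has order 3 mod 7, so M_7 factors
```

Trial division is exponential in the degree; it is meant only to make the lemma concrete for small primes.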



Let be the cyclic multiplicative group of order . Since is the minimal polynomial of , every collection of elements of is linearly independent. The set can be used as a basis for . Denote by the set . Clearly, ; otherwise, there exists a polynomial of degree at most with a root , contradicting that is the minimal polynomial of . Let be the union of and , . As shown later, consists of the elements whose representation in has at most two nonzero coefficients.

Now, we define an array code of length with redundant symbols. The parity-check matrix of the code is shown at the bottom of this page. Here and . The matrix is the parity-check matrix of a shortened Reed–Solomon code over , and therefore

Lemma 1: The extended array code is an MDS code of length over , with minimum distance .

III. EXAMPLE OF DECODING THE CODE

To illustrate the idea of the decoding algorithm we start with an example. Let . It is easy to verify that is primitive in GF . The field GF is defined by

Let be a root of , of order . Clearly, is also a root of . We define the code by the following parity-check matrix in :

(2)

Namely,

(3)

This is equivalent to

(4)
(5)

where . It is convenient to solve (5) in the enveloping ring , where multiplication by a power of is simply a cyclic shift. Recall that we can return to the original field by reducing modulo the polynomial . Equation (5), in terms of elements in , is of the form

(6)

In the next section, we will show how to map elements in to elements in and vice versa using an operator . Moreover, we will prove that, since GCD , there exists an that satisfies (6) with . Equation (6) can be written as a set of eleven equations over GF(2)
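Returning from the ring to the field is cheap. A sketch of ours (toy values, p = 5): since M_p(x) = 1 + x + … + x^{p−1} vanishes in the field, x^{p−1} equals the sum of all lower powers, so the reduction is a single conditional XOR with the all-ones vector.

```python
def ring_to_field(a):
    """Reduce a ring element (p bits) modulo M_p(x) = 1 + x + ... + x^(p-1)
    to a field element (p-1 bits): one conditional XOR over GF(2)."""
    p = len(a)
    if a[p - 1]:
        # x^(p-1) = 1 + x + ... + x^(p-2) in the field, so flip all lower bits.
        return [bit ^ 1 for bit in a[:p - 1]]
    return a[:p - 1]

assert ring_to_field([0, 0, 1, 0, 0]) == [0, 0, 1, 0]   # x^2 stays x^2
assert ring_to_field([0, 0, 0, 0, 1]) == [1, 1, 1, 1]   # x^4 -> 1+x+x^2+x^3
```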

(7)
(1)

Let , , be the transmitted codeword, and let be the received word. The vector equals , except in the coordinates where erasures have occurred. At these coordinates we set to be . Let be the syndrome vector, . Assume that and were erased, i.e., we have erasures in positions and . The vector

satisfies . To reconstruct the value of the erased symbols we have to solve the following set of linear equations


which is equivalent to (8) at the top of the following page. The matrix is defined by the locator polynomial . Notice that the rows of are cyclic shifts of the first row. Moreover, the matrix is of the form , where is an (upper) triangular matrix and is the matrix containing the remaining rows. In order to find , we need to get rid of the nonzero elements below the main diagonal of the six rows of . We use the term “triangulate” for the process of eliminating the nonzero elements below the main diagonal. We now show how the size of can be reduced to three rows. Recall that is defined by the equation

(9)

or
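The cyclic structure noted above can be made concrete: the rows of the binary system's matrix are the cyclic shifts of the locator polynomial's coefficient vector, i.e., a circulant. A toy sketch of ours:

```python
def circulant(first_row):
    """Build the p x p circulant whose i-th row is first_row shifted right by i."""
    p = len(first_row)
    return [first_row[-i:] + first_row[:-i] if i else first_row[:]
            for i in range(p)]

C = circulant([1, 1, 0, 0, 0])   # locator 1 + x over GF(2), p = 5 (toy values)
assert C[1] == [0, 1, 1, 0, 0]   # each row is a one-step shift of the previous
assert C[4] == [1, 0, 0, 0, 1]
```

Because every row is a shift of the first, any elimination rule derived for one row applies, shifted, to all others; this is what keeps the triangulation cheap.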


KEREN AND LITSYN: ARRAY CODES CORRECTING MULTIPLE COLUMN ERASURES

(8)

(10) Let us define the following permutation: . Namely,

where

and

(13)

(14)

Equation (8) now becomes

(15)

The matrix is triangular, and we get by successive elimination starting with

(11)

The size of the triangular submatrix is , and the size of is . The number of rows to be triangulated has been reduced to three. The matrix has two parts, . contains the row that has one on both sides of the diagonal. is the submatrix of the two rows that have two ’s to the left of the diagonal. Notice that the second row of is a cyclic shift of the first one. Clearly, it is sufficient to define how to triangulate the first row of , and then to apply it to the second row. To make triangular with respect to , we need to XOR rows and with the row . To triangulate row , the first row of , we need to XOR rows and with the row . In order to triangulate the second row of , we simply have to XOR it with rows shifted by one, namely, XOR rows and with the row . We get

(12)

The element is obtained from by the inverse permutation

(16)

The values of the erased columns are and

IV. PROPERTIES OF THE ENVELOPING RING

The reduced complexity of decoding is achieved by performing the calculations in the enveloping ring. In this section, we describe some properties of . Recall that is the field defined by and is the enveloping ring of polynomials modulo . As before, we use Greek letters for the elements of and bold italic letters for the elements of . Let and be the vectors and of the relevant size (which is clear from the context). Let


be the vector of coefficients of .

TABLE I

Define by

(17)

Notice that there are two values of such that , namely, and . The basis elements are mapped to themselves, for . Furthermore, let us define the rotation operator as

(18)

where the indices are regarded modulo . Multiplication of by is simply a cyclic shift of , and . Moreover, since , it is easy to verify that

(19)

where, by convention, is . The conventional representation of an element is a polynomial in . However, can also be seen as a polynomial in any . We can move from one representation to the other by applying the permutation defined below to the vector of coefficients. Define , and . Clearly, is a bijection (one-to-one and onto).

Denote by the permutation of the coordinates of

(20)

Lemma 2: For any , there exists such that , where . The proof follows directly from Lemma 6 in the Appendix. Table I lists the values of for which the width of is minimal. For example, let , and

Indeed,

(21)

For example, consider the element in . Denote by the maximal number of successive zero coordinates of , where the indices are regarded modulo . Define the width of as .
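The width just defined is easy to compute; a sketch of ours, where the longest cyclic run of zeros is found by scanning the doubled coefficient vector:

```python
def width(a):
    """Width of a nonzero ring element: p minus the longest cyclic run of zeros."""
    p = len(a)
    best = run = 0
    for bit in a + a:                    # doubling unrolls the cyclic wraparound
        run = run + 1 if bit == 0 else 0
        best = max(best, min(run, p - 1))  # a nonzero element has at most p-1 zeros
    return p - best

assert width([1, 0, 0, 0, 0, 1, 0]) == 3   # longest zero run is 4, so width 7-4
assert width([0, 0, 1, 0, 0, 0, 0]) == 1   # the run of 6 zeros wraps around
```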

Decoding of a code defined over usually involves arithmetic operations in that field. To find the value of the erased symbol, say , we need to solve the equation (22)


where and are elements of . There exists a unique solution to (22), but finding the inverse of in is quite a complicated task. Clearly, it would be much easier to solve it in , where the arithmetic involves simple binary operations. The question is whether there exists a solution to (22) in . By Lemma 5 in the Appendix, if (22) is transformed into

(23)

where is either or (depending on the GCD of polynomials defined in the lemma), then there exists which satisfies (23). For example:

1) Consider the equation in . GCD ; therefore,


Let and be the values of the erased columns at positions and , respectively.

Algorithm A: When both erasures occur in the left part or in the right part, i.e., , or , the linear equations coincide with those in [3]

or, equivalently,

(26)
(27)

where , and may be or . The equation transforms to in .

2) Consider the equation in .

(28)

GCD . To simplify the computation, we define

since for any , has an even number of terms. Therefore, the equation transforms to in .

(29)

V. CORRECTING TWO ERASURES

In this section, we give a formal description of the decoding of two erasures in the code. Let be the transmitted codeword. Assume that two erasures occurred in coordinates and . Let be the received codeword with zero columns at locations and . If , then one erasure occurred, and the value of the erased column is as defined below. Let . The syndrome is defined by , namely,

(24)

Note that if , then . We transform (27) to as follows:

(30)

where or . So the algorithm proposed by Blaum and Roth [3] can be used here. The complexity of decoding, in terms of the number of bit XOR operations, is . The following table presents the complexity of each step of Algorithm A:

Algorithm A
Step 1: Syndrome calculation (24)
Step 2: Calculate (28)
Step 3: Extract (30)
Step 4: Inverse transform, extract and

Algorithm B is invoked when the erasures are located one in each part. Assume that one erasure occurred at position , , in the left part of and the second erasure occurred at coordinate , , in the right part of . The erased symbols, say and , satisfy the following set of linear equations:

. Otherwise,

and we get a set of two linear equations over

(25)

Recall that the parity-check matrix has two parts: the columns in the left part are of the form , and the columns in the right one are of the form . We split the decoding algorithm into two different algorithms. Algorithm A is invoked when the two erasures are located in the same part; Algorithm B is used when the erasures are located one in each part.

(31)

or

(32)
(33)

where .


By (23), translating (33) to leads to the following:

(34)

where , , and , by Lemma 5. We can also write it as a set of binary equations

and expressed in terms of

. Equation (35) and

is (38)

. is defined by the permutation (35)

where is a matrix over GF . The matrix is defined by the locator polynomial , GF , and all its cyclic shifts. Notice that is partially triangular; the size of the triangular part is (without loss of generality we assume that ). To extract , we triangulate . Clearly, if we start with a matrix which is partially triangular, we can reduce the complexity of decoding. Our goal is then to minimize the width of the locator polynomial that defines . We pick a permutation for which , GF , is of width smaller than . The matrix corresponding to the last polynomial has fewer than rows which have not been triangulated. Now we give a description of Algorithm B.

• Step 1: Calculate the syndrome (24) in the ring. Let . We have

Indeed, and .

• Notice that is also of the form and is partially triangular, but now the size of the nontriangular part is less than .

• Step 4: Extract . We use the cyclic structure of to eliminate the nonzero elements of that are placed under the main diagonal of . The matrix has two parts, . contains rows with one placed below the diagonal. contains rows with two ’s below the diagonal. Notice that rows of are cyclic shifts of the first row of . So, it is sufficient to describe how to eliminate ’s preceding the diagonal, and apply the same procedure to the remaining rows. Denote by the matrix whose rows are triangulated with respect to . Equation (35) transforms into

(39)

(36)

where (40)

• Step 2: Calculate in the ring, . • Step 3: Find a relevant permutation . The permutation that minimizes the width of is determined uniquely by the values of and . It can be stored in a table (e.g., Table I), or it can be computed in real time as follows: we define the absolute value of as

a) Find (i.e., find such that ).
b) Calculate .
c) Calculate , , and .

• Step 6: Compute . We compute and apply the inverse transformation from to to obtain and .

Unless the values of and are taken from a table, they can be computed by no more than operations over GF . An efficient algorithm can be based on continued fractions. We omit details. Recall that we picked which minimizes
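As a simple alternative to the table lookup or the continued-fraction computation, the minimizing exponent can be found by brute force over all candidates. This is our illustration only (the names `permute` and `best_exponent` are ours), not the paper's efficient procedure:

```python
def width(a):
    """p minus the longest cyclic run of zero coordinates of a nonzero element."""
    p = len(a)
    best = run = 0
    for bit in a + a:                      # doubling unrolls the cyclic wraparound
        run = run + 1 if bit == 0 else 0
        best = max(best, min(run, p - 1))
    return p - best

def permute(a, t):
    """Coordinate permutation i -> t*i (mod p): a basis change to powers of
    a different root of unity (one of the p-1 admissible permutations)."""
    p = len(a)
    b = [0] * p
    for i in range(p):
        b[(t * i) % p] = a[i]
    return b

def best_exponent(a):
    """The t in 1..p-1 whose permuted representation has the smallest width."""
    return min(range(1, len(a)), key=lambda t: width(permute(a, t)))

a = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]     # toy locator, p = 11: width 7
t = best_exponent(a)
assert width(permute(a, t)) == 3          # the three 1's become adjacent
```

Brute force costs O(p^2) bit operations per element, which is fine for the small primes used here; the continued-fraction route mentioned above avoids the scan.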

therefore, may be negative. In that case, we need to rotate the rows of the matrix. Let be defined as follows: otherwise

Processing one row requires at most binary operations. Therefore, the complexity of triangulating is upper-bounded by the complexity of triangulating a matrix.

• Step 5: Apply the inverse permutation to

(37)

The complexity of Algorithm B is dominated by the complexity of calculating the syndrome (Step 1); therefore, the complexity of Algorithm B is of order . The following table gives the complexity of each step of decoding in Algorithm B.

Algorithm B (complexity of each step, in additions in GF ):

Step 1: Syndrome calculation
Step 2: Calculate
Step 3: Apply relevant permutation
Step 4: Extract
Step 5: Inverse permutation to
Step 6: Inverse transform, extract and

VI. CORRECTING ERASURES

Let

In this section, we address decoding erasures. Recall that is a binary block code with columns which are elements of , and parity symbols. Let be the transmitted codeword and be the received word. As before, the word equals in all coordinates but those where the erasures occurred. In these coordinates, we substitute . Let , , be the syndrome,
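The syndrome computation can be done entirely in the ring. A toy sketch of ours (columns zero-padded to p bits; multiplication by a power of x is a cyclic shift, so only shifts and XORs are used):

```python
def syndromes(cols, r):
    """s_j = sum_i cols[i] * x^(i*j) in GF(2)[x]/(x^p - 1), for j = 0..r-1."""
    p = len(cols[0])
    out = []
    for j in range(r):
        s = [0] * p
        for i, c in enumerate(cols):
            k = (i * j) % p
            shifted = c[-k:] + c[:-k] if k else c   # column i shifted by i*j
            s = [u ^ v for u, v in zip(s, shifted)]
        out.append(s)
    return out

S = syndromes([[1, 0, 0, 0, 0], [0, 1, 0, 0, 0]], 2)   # p = 5, two toy columns
assert S[0] == [1, 1, 0, 0, 0]    # s_0 is the plain XOR of the columns
assert S[1] == [1, 0, 1, 0, 0]    # in s_1 the second column is shifted by one
```

The double loop performs O(rnp) bit operations, matching the complexity estimate given for Step 1.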

(46) We now describe Algorithm C for decoding erasures: Algorithm C: • Step 1: Syndrome calculation (41): Let be a vector of elements over .

(41)

Assume now that columns have been erased at coordinates . Let be the value of the erased column in the position . The elements satisfy a set of linear equations over

(42) Indeed, it is sufficient to compute only the first syndromes. Recall that the column vector can be represented as a power of a primitive , . Equation (42) is


Clearly, the calculation of the syndrome requires no more than binary operations.

• Step 2: Calculate —the right side of (44). We start by calculating in the ring by multiplying by as follows (done once for all erasures):


(43)

Clearly, system (43) has a unique solution. From Forney’s algorithm for decoding BCH codes we have that each satisfies

Notice that

therefore, the coefficient of is

(44)

where is the coefficient of in the polynomial . The polynomial is the erasure-evaluator polynomial. The polynomial is the syndrome polynomial, and is the erasure-locator polynomial defined as

(47)

This yields the following procedure for calculating in the ring. For to do (for each erasure):

(45)

These polynomials are defined over the field. The mapping between a polynomial over and a polynomial over is defined as the mapping of the coefficients. For example,

• Step 3: Extract —the left side of (44). Notice that and are elements of ; therefore, their sum is either , or , or , where and . Now the “piling” algorithm proposed by Blaum and Roth [3] can be used:


is the ring of polynomials of degree over GF , modulo the polynomial , where is of degree . Clearly, . are elements of . is a polynomial of degree .

Correcting erasures is performed iteratively by applying the algorithm of two-erasure decoding times. For to do (for each erasure): for to do: if and , solve the equation using the relevant Algorithm A or B (Section V); else .

The complexity of the last step is times the complexity of Algorithm A or B, namely, it is of order . Clearly, the syndrome calculation (Step 1), requiring no more than binary operations, dominates the complexity of the whole decoding algorithm, i.e., the complexity of Algorithm C is of order .

Let be the complexity of multiplication in . The best known estimate for is . The following table compares the complexities of the proposed algorithm and the Forney algorithm (see the bottom of this page).

APPENDIX

A. Irreducibility of

Lemma 3: Let be a prime for which is a primitive element of GF . Then, the polynomial is irreducible over GF .

Proof: Let GF be a root of . Since is prime, the element is of order . The degree of the minimal polynomial of , , is determined by the size of the cyclotomic coset containing , namely,

We now present two lemmas. The first one states that an equation that can be solved over the ring can be solved over the field . The second lemma states that there exists a mapping of an equation over to an equation over the ring such that the transformed equation has a solution in .

Lemma 4: If there exists an satisfying the equation

(50)

in , then satisfies .

It is simple to verify this by applying to both sides of (50). We now show that there is always such for which (50) is satisfied.

Lemma 5: There exists a polynomial of degree and a polynomial such that (50) holds.

Proof: Let GCD

GCD

is the smallest positive integer such that . Equivalently, or (49)
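The coset size in this proof is simply the multiplicative order of 2 modulo p, i.e., the smallest k with 2^k ≡ 1 (mod p); a quick computational check of ours:

```python
def order_of_two(p):
    """Smallest k >= 1 with 2^k = 1 (mod p): the size of the cyclotomic coset of 1."""
    e, k = 2 % p, 1
    while e != 1:
        e, k = (2 * e) % p, k + 1
    return k

assert order_of_two(11) == 10   # 2 primitive mod 11: minimal polynomial of degree 10
assert order_of_two(7) == 3     # 2 not primitive mod 7: the polynomial splits
```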

Since is primitive in GF , the smallest that satisfies (49) is . Therefore, the degree of is . Indeed, , since the minimal polynomial is unique.

B. Equation

(51)

Notice that is less than or equal to . Equation (50) has a solution if . Therefore, if equals , or if , then for any , and in particular for , there is a solution to (50). Otherwise, a solution exists if

(52)

(48) Here

the equation

Since GCD , by the Euclidean algorithm there exists of degree at most such that (52) holds. Now, let and play the role of and , respectively. Since , the degree of is less than . Therefore, or . By Lemma 5, the equation in transforms to in , where , and is either or .
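The Euclidean-algorithm step used here can be sketched as an extended GCD for binary polynomials. This is our illustration, with polynomials packed as Python ints and toy inputs, not the paper's procedure:

```python
def gf2_divmod(a, b):
    """Quotient and remainder for binary polynomials packed as ints."""
    q, db = 0, b.bit_length() - 1
    while a and a.bit_length() - 1 >= db:
        s = a.bit_length() - 1 - db
        q ^= 1 << s
        a ^= b << s
    return q, a

def gf2_mul(a, b):
    """Carry-less product in GF(2)[x]."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def gf2_ext_gcd(a, b):
    """Return (g, u, v) with u*a + v*b = g = gcd(a, b) in GF(2)[x]."""
    if b == 0:
        return a, 1, 0
    q, r = gf2_divmod(a, b)
    g, u, v = gf2_ext_gcd(b, r)
    return g, v, u ^ gf2_mul(q, v)    # over GF(2), subtraction is XOR

a, b = 0b1011, (1 << 5) ^ 1           # x^3 + x + 1 and x^5 + 1 (toy values)
g, u, v = gf2_ext_gcd(a, b)
assert gf2_mul(u, a) ^ gf2_mul(v, b) == g   # Bezout identity holds
```

The cofactor returned for the first argument plays the role of the polynomial whose existence the proof invokes.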

in

Let us use the following notations: is the field of polynomials of degree over GF , modulo an irreducible polynomial .

The table at the bottom of this page compares, for each step (syndrome calculation; , right side of (44); , right side of (44); product, left side of (44); extract , left side of (44)), the complexity of the Forney algorithm with that of Algorithm C.

C. Representation of in

The following two lemmas state that there exists a representation of , , as a polynomial in such that the width of is less than .



Lemma 6: For any prime and GF , there exists GF , , such that

is (58)

(53) Proof: By the Dirichlet theorem, for every real number and an arbitrary , there exists a rational fraction , such that

Since each defines uniquely , (58) is equivalent to

(59)

By Lemma 6, there always exist and such that ; therefore, . Let be the real number ; then there exist and such that

(54)

By (54), is in the range

(55)

Since is less than , we can apply modulo to both sides of (55), yielding . This completes the proof, since we have shown that there exists an element in GF such that .

We now prove Lemma 2 of Section IV, which states that for any , there exists an such that , where .

Proof of Lemma 2: Recall that is a polynomial in , namely,

To simplify notation, define

(56)

This is equivalent to

(57)

Every element of GF has a unique inverse; thus . The width of is upper-bounded by , where

Table I lists, for given and GF , the values of and for which minimizes .

ACKNOWLEDGMENT

In the process of working on this paper, the authors enjoyed inspiring discussions with M. Blaum. They are also grateful to T. Kløve for very useful comments and suggestions.

REFERENCES

[1] R. E. Blahut, Theory and Practice of Error Control Codes. Reading, MA: Addison-Wesley, 1984.
[2] M. Blaum, J. Bruck, and A. Vardy, “MDS array codes with independent parity symbols,” IEEE Trans. Inform. Theory, vol. 42, pp. 529–542, Mar. 1996.
[3] M. Blaum and R. M. Roth, “New array codes for multiple phased burst correction,” IEEE Trans. Inform. Theory, vol. 39, pp. 66–77, Jan. 1993.
[4] M. Blaum, P. G. Farrell, and H. C. A. van Tilborg, “Multiple burst-correcting array codes,” IEEE Trans. Inform. Theory, vol. 34, pp. 1061–1066, Jan. 1988.
[5] D. Cohn and R. L. Stevenson, “Using redundancy to speed up disk arrays,” in Communications and Cryptography. Norwell, MA: Kluwer, 1994, pp. 59–67.
[6] P. G. Farrell and S. J. Hopkins, “Burst-error-correcting array codes,” Radio Elect. Eng., vol. 52, pp. 188–192, 1982.
[7] G. D. Forney, “On decoding BCH codes,” IEEE Trans. Inform. Theory, vol. IT-11, pp. 549–557, 1965.
[8] S. J. Hong and A. Patel, “A general class of maximal codes for computer applications,” IEEE Trans. Comput., vol. C-21, no. 12, pp. 1322–1331, 1972.
[9] O. Keren and S. Litsyn, “Codes for correcting phased burst erasures,” submitted for publication.
[10] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-Correcting Codes. Amsterdam, The Netherlands: North-Holland, 1977.
[11] A. M. Patel, “Adaptive cross parity code for a high density magnetic tape subsystem,” IBM J. Res. Develop., vol. 29, pp. 546–562, 1985.
[12] T. R. N. Rao and E. Fujiwara, Error Control Coding for Computer Systems. Englewood Cliffs, NJ: Prentice-Hall, 1989.
[13] C. V. Srinivasan, “Codes for error correcting in high speed memory system, Pt. II: Correction of temporary and catastrophic errors,” IEEE Trans. Comput., vol. C-20, pp. 1514–1520, Dec. 1971.
[14] G. V. Zaitsev, V. A. Zinoviev, and N. V. Semakov, “Codes with minimal density of checks for correcting erroneous bytes, erasures and defects,” Probl. Inform. Transm., vol. 19, no. 3, pp. 29–37, 1983.