
So the error positions are $(1, \beta, 1)$, $(1, \beta^2, 1)$, and $(1, \beta^4, 1)$. (Note that changing $s_0$ to 1 in this example will lead to

$$E_7 = 1 + y/z,$$
$$E_8 = (1 + y/z)(1 + xy/z^2),$$
$$E_{10} = (1 + y/z)(1 + x^2y/z^3),$$

all $R$-multiples of $E_7$. The error positions in this case are $(\beta, 1, 1)$, $(\beta^2, 1, 1)$, and $(\beta^4, 1, 1)$.)

VII. SUMMARY

In general, when decoding an error relative to an algebraic-geometric code $C^*(D, mP)$, there is a vector space of error-locator functions of dimension $l(m) - e$. Most algorithms settle for any element of this space as an error-locator and deal with extraneous zeros later. If one considers the ideal $\mathcal{I}$ of all error-locator functions, then there is a generating set of size at most $\rho$, the smallest positive pole size at $P$. The one-dimensional Berlekamp-Massey version of the Feng-Rao algorithm given here is sufficient to produce such a generating set reasonably efficiently, and the error positions (for any error of weight $e < \delta^* = m - 2g + 2$, the designed distance of the code) will be exactly the common zeros $V(\mathcal{I})$ of those error-locator functions. (For further efficiency, "higher dimensional" Berlekamp-Massey algorithms can be worked out in a straightforward manner as well.) This Feng-Rao-type algorithm gives the designed distance independently of the Riemann-Roch theorem, and the algorithm consists merely of row reduction with shifting (as with any Berlekamp-Massey-type algorithm), coupled with a Feng-Rao majority-vote scheme to produce further syndromes. Moreover, such a strategy can be used on arbitrary divisors $G$ (though at present it is not provable that one can achieve decoding up to the designed distance efficiently in this manner). The generating set found may, in addition, allow for efficient calculation of the common zeros. So the algorithm given here has the advantages that:

1) it treats all projective points;
2) it decodes up to the designed minimum distance;
3) it uses a (one-dimensional) Berlekamp-Massey row-reduction algorithm to row-reduce the syndrome matrix $S$ efficiently (that is, with roughly the running time that should be expected);
4) it produces a small set of generators for the whole error-locator ideal $\mathcal{I}$, rather than settling for a single error-locator function with possibly extraneous zeros. (A minimal Gröbner basis can be extracted from this, or produced directly from a Berlekamp-Massey-type row-reduction algorithm that treats rows of the syndrome matrix as grids generated by the minimal nonzero elements of $\mathcal{I}$ and shifts in all the grid directions.)
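To make the "row reduction with shifting" ingredient of item 3) concrete, here is a minimal sketch of the classical one-dimensional Berlekamp-Massey algorithm over a prime field, in Python. This is the textbook shift-register synthesis of Massey's paper, not the Feng-Rao generalization described above; the field, the example sequence, and the function name are illustrative assumptions.

```python
# Sketch: classical one-dimensional Berlekamp-Massey over GF(p).
def berlekamp_massey(s, p):
    """Return (C, L): shortest connection polynomial C (C[0] == 1) and length L
    with sum_{i=0..L} C[i]*s[n-i] == 0 (mod p) for all n >= L."""
    C, B = [1], [1]    # current and previous connection polynomials
    L, m, b = 0, 1, 1  # LFSR length, shift since last length change, last discrepancy
    for n in range(len(s)):
        # Discrepancy between s[n] and the LFSR's prediction.
        d = s[n] % p
        for i in range(1, L + 1):
            d = (d + C[i] * s[n - i]) % p
        if d == 0:
            m += 1
            continue
        T = C[:]                        # save C before modifying it
        coef = (d * pow(b, -1, p)) % p  # d / b in GF(p)
        if len(B) + m > len(C):         # make room for the shifted update
            C += [0] * (len(B) + m - len(C))
        for i, Bi in enumerate(B):      # C(x) <- C(x) - (d/b) * x^m * B(x)
            C[i + m] = (C[i + m] - coef * Bi) % p
        if 2 * L <= n:                  # length change: swap roles of B and C
            L, B, b, m = n + 1 - L, T, d, 1
        else:
            m += 1
    return C, L

# Example: s[n] = s[n-1] + s[n-2] (mod 7) yields C = [1, 6, 6], L = 2.
print(berlekamp_massey([1, 1, 2, 3, 5, 1, 6], 7))
```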


Information-Theoretic Approach to Unimodal Density Estimation

Patrick L. Brockett, A. Charnes, and Kwang H. Paick, Member, IEEE

Abstract—We extend the maximum entropy information-theoretic density estimation method to provide a technique which guarantees that the resulting density is unimodal. The method inputs data in the form of moment or quantile constraints and consequently can handle both data-derived and non-data-derived information.

Index Terms—Information theory, MDI, density estimation, maximum entropy, unimodality.

I. INTRODUCTION

In many problems encountered in engineering and signal processing, it is useful to estimate the probability density function corresponding to some random phenomenon under study. Often the density is known to be unimodal, and the first few cumulants or certain percentiles are known or can be estimated effectively from the available data. However, the precise parametric formula for the generating density cannot be determined from physical considerations alone and often may not match any of the commonly assumed densities. If only certain moments or percentiles are given instead of the raw data, then nonparametric kernel density estimation is not possible. If a usual maximum entropy solution is attempted (cf. Burg [1] or Parzen [2]), the resulting density may not be unimodal.

Manuscript received June 20, 1989; revised May 29, 1992. P. L. Brockett is with the Center for Cybernetic Studies, CBA 5.202 (B6501), The University of Texas at Austin, Austin, TX 78712-1177 USA. A. Charnes (deceased) was with the Center for Cybernetic Studies, The University of Texas at Austin, Austin, TX 78712-1177 USA. K. H. Paick is with the Department of Computer Science, High Energy Physics, Box 355, Prairie View A&M University, Prairie View, TX 77446-0355 USA. IEEE Log Number 9410400.


Fig. 1. ME density when the information is in the form of percentile constraints.

In this correspondence we present a new method for unimodal density estimation which extends the maximum entropy technique to guarantee unimodality of the resultant density. This is done by using a method of Kemperman [3] for transforming moment problems, and coupling this with an information-theoretic generalization of Laplace's famous "principle of insufficient reason" to motivate the use of a maximum entropy principle. The resulting density estimate is unimodal and is rendered in closed analytic form. It should be remarked that our method is also applicable to the situation of prior information which is not necessarily data-derived, and can be used for developing a unimodal prior distribution for subsequent Bayesian analysis. This topic is pursued in a separate paper [4].

The information-theoretic approach to probability density estimation proposed in this correspondence is different from previous maximum entropy (ME) density estimation approaches (e.g., Parzen [2]). The ME density obtained in previous approaches is not guaranteed to be unimodal. Moreover, when certain percentiles are used as constraints, the resulting ME density can be a "lumpy" step function. Fig. 1 illustrates this point. We use the following four percentile constraints on an unknown random variable $Y$: $\Pr[3 \le Y \le 4] = \Pr[6 \le Y \le 7] = 0.13$, $\Pr[0 \le Y \le 10] = 1.0$, and $\Pr[0 \le Y \le 5] = 0.5$, and estimate the density of $Y$ via maximum entropy. Using raw moment constraints instead of percentile constraints makes the ME density smooth. However, the resulting ME density may still fail to be unimodal. Fig. 2 illustrates this point using four moment functions ($a_1(x) = x$, $a_2(x) = x^2$, $a_3(x) = x^3$, and $a_4(x) = x^4$) and corresponding given raw moments ($\theta_1 = 4.2963$, $\theta_2 = 20.9492$, $\theta_3 = 108.1973$, and $\theta_4 = 576.9984$). As Fig. 2 illustrates, the ME density estimation approach may fail to provide unimodality in this situation as well.

In Section II we present the information-theoretic estimation procedure. In Section III we show how to transform the problem of unimodal density estimation to that of an estimation involving a derived auxiliary variable. In Section IV we present the actual estimation procedure which ensures a unimodal density estimate. This procedure also guarantees that the resultant density has the collection of desirable characteristics which were constrained into the estimation process. In Section V we provide some numerical results. A summary section completes the correspondence.

Fig. 2. ME density when the information is in the first four raw moments.
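For interval (percentile) constraints like those above, the ME solution is piecewise constant: the density is uniform on each cell of the partition induced by the constraint endpoints. The following minimal Python sketch (the cell bookkeeping is our own illustrative construction, not code from the paper) reproduces the step shape of Fig. 1 from the four constraints given in the text.

```python
# Sketch: step-shaped ME density under the four percentile constraints of Fig. 1.
# Endpoints {0,3,4,5,6,7,10} partition [0,10] into cells; subject to
# Pr[3<=Y<=4] = Pr[6<=Y<=7] = 0.13 and Pr[0<=Y<=5] = 0.5 (total mass 1.0),
# maximum entropy spreads the leftover mass uniformly within each group.
pinned = {(3.0, 4.0): 0.13, (6.0, 7.0): 0.13}
rest_low = 0.5 - 0.13            # mass shared by [0,3] and [4,5] (total length 4)
rest_high = (1.0 - 0.5) - 0.13   # mass shared by [5,6] and [7,10] (total length 4)
density = {(0.0, 3.0): rest_low / 4.0, (4.0, 5.0): rest_low / 4.0,
           (5.0, 6.0): rest_high / 4.0, (7.0, 10.0): rest_high / 4.0}
density.update({(a, b): m / (b - a) for (a, b), m in pinned.items()})

def f_me(x):
    """Step-function ME density: constant on each cell, zero outside [0, 10]."""
    for (a, b), d in density.items():
        if a <= x < b or (x == 10.0 and b == 10.0):
            return d
    return 0.0

# f_me is 0.0925 on [0,3), [4,6), [7,10] and jumps to 0.13 on [3,4) and [6,7):
# the "lumpy" step function of Fig. 1.
```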

II. MAXIMUM ENTROPY AND MDI DENSITY ESTIMATION

The concept of statistical information and density estimation for numerical data is paramount in statistics, economics, engineering, signal detection, and other fields. Wiener [5] remarked quite early, in 1948, that the Shannon measure of information from statistical communication theory could eventually replace Fisher's concept of statistical information for a sample. For instance, using a measure of information distance between two measures first developed by Kullback and Leibler [6] in 1951, following the work of Khinchin, it has been shown how to estimate the order in an autoregressive time series model, how to estimate the number of factors in a factor analysis model, and how to analyze contingency tables (cf. Akaike [7], [8] and Gokhale and Kullback [9]). Minimizing this statistical "distance" subject to the given constraints is called "Khinchin-Kullback-Leibler estimation," "minimum cross entropy," or "minimum discrimination information" (MDI) estimation in the literature. Mathematically, the problem is to pick that density function $f$ which is "as close as possible" to some other given function $g$, and for which $f$ satisfies certain given moment constraints, e.g.

$$\min_{f} \int f(x)\,\ln\!\left(\frac{f(x)}{g(x)}\right)\lambda(dx) \tag{1}$$

subject to

$$\int a_i(x)\,f(x)\,\lambda(dx) = \theta_i, \qquad i = 0, 1, 2, \cdots, k. \tag{2}$$

Here $\lambda$ is some dominating measure for $f$ and $g$ (usually Lebesgue measure in the continuous case, or counting measure in the discrete case), $\theta_1, \cdots, \theta_k$ are given moment values for the "moment functions" $a_i(x)$, and $a_0(x) \equiv 1 = \theta_0$ (so that (2) with $i = 0$ is the normalization constraint). A moment function $a_i(x)$ may be used to generate moment or cumulant constraints, e.g., when $a_i$ is a polynomial, or may generate percentile constraints, e.g., when $a_i$ is an indicator function for an interval.
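For example, with these two standard choices,

$$a_i(x) = x^i \;\Rightarrow\; \theta_i = \int x^i f(x)\,\lambda(dx) = EY^i, \qquad a_i(x) = \mathbf{1}_{[c,d]}(x) \;\Rightarrow\; \theta_i = \Pr[c \le Y \le d].$$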


In many applications there is no a priori choice of the given distribution $g$ which is immediately apparent to use in (1). In this case we express our ignorance by choosing all $x$ values to be equally likely, i.e., $g(x) \equiv 1$. In this case the MDI objective function is of the form

$$\int f(x)\,\ln f(x)\,\lambda(dx).$$

This is precisely minus the entropy of the density, and the MDI problem becomes an ME problem. The ME criterion can be thought of as taking the most "uncertain" distribution possible subject to the given constraints. Accordingly, this principle of maximum entropy may be construed as an extension of the famous Laplace "principle of insufficient reason," which postulates a uniform distribution in situations in which nothing is known about the variable in question.

The minimization of (1) subject to the given constraints (2) can be carried out by Lagrange multipliers. The short derivation given below can essentially be found in Guiasu [10]. Introducing a Lagrange multiplier $\tau_i$ for each constraint in (2) and changing from a minimization to a maximization, we wish to maximize

$$-\int f(x)\,\ln\frac{f(x)}{g(x)}\,\lambda(dx) + \sum_{i=0}^{k}\tau_i\left(\int a_i(x)\,f(x)\,\lambda(dx) - \theta_i\right).$$

The maximizing density has the exponential form

$$f^*(x) = g(x)\,\exp\!\Big(\sum_{i=0}^{k}\tau_i a_i(x)\Big),$$

with the multipliers $\tau_0, \tau_1, \cdots, \tau_k$ determined by the constraints (2). Indeed, for any $f$ satisfying (2),

$$\int f\,\ln\frac{f}{g}\,d\lambda = -\int f\,\ln\frac{f^*}{f}\,d\lambda + \sum_{i=0}^{k}\tau_i\theta_i \;\ge\; \sum_{i=0}^{k}\tau_i\theta_i = \int f^*\,\ln\frac{f^*}{g}\,d\lambda.$$

The inequality follows since $\ln r \le r - 1$ with equality only at $r = 1$; thus the inequality becomes an equality when $f = f^*$.
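A minimal numerical sketch of this ME fit for the four raw moments of the Fig. 2 example follows, in Python. The support $[0, 10]$, the optimizer choice, and all names are our own illustrative assumptions; the multipliers are found by minimizing the convex dual $\log Z(\tau) - \tau\cdot\theta$, whose stationarity conditions are exactly the moment constraints (2).

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize

# Target raw moments theta_1..theta_4 from the Fig. 2 example in the text.
theta = np.array([4.2963, 20.9492, 108.1973, 576.9984])

def log_partition(tau):
    # Z(tau) = integral over [0,10] of exp(tau_1 x + ... + tau_4 x^4) dx
    z, _ = quad(lambda x: np.exp(sum(t * x**(i + 1) for i, t in enumerate(tau))),
                0.0, 10.0)
    return np.log(z)

def dual(tau):
    # Convex dual of the ME problem; its minimizer matches the moments.
    return log_partition(tau) - tau @ theta

res = minimize(dual, x0=np.zeros(4), method="Nelder-Mead",
               options={"maxiter": 50000, "xatol": 1e-10, "fatol": 1e-12})
tau = res.x
Z = np.exp(log_partition(tau))

def f_me(x):
    # ME density of exponential form f(x) = exp(sum_i tau_i x^i) / Z.
    return np.exp(sum(t * x**(i + 1) for i, t in enumerate(tau))) / Z
```

In practice one would rescale $x$ (e.g., to $[0, 1]$) before fitting, since powers up to $x^4$ on $[0, 10]$ make the dual ill-conditioned.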

III. TRANSFORMING THE PROBLEM VIA THE KHINCHIN REPRESENTATION

By a theorem of Khinchin, a random variable $Y$ is unimodal about zero if and only if it can be represented as $Y = UX$, where $X$ and $U$ are independent, and $U$ is uniformly distributed over $[0, 1]$. A proof of this result can be found, for example, in Feller [13, p. 158]. From the above result, Kemperman [3] shows how to use the structural relationship between $X$ and $Y$ to determine the moments of $X$ from the moments of $Y$. Namely, it follows immediately that for any function $h$

$$Eh(Y) = Eh^*(X), \qquad\text{where}\qquad h^*(x) = E\big(h(UX)\mid X = x\big) = \frac{1}{x}\int_0^x h(t)\,dt.$$

Our technique for solving the problem (1) with constraint set (2) and an additional unimodality constraint may now be explained as follows. If $Y$ is unimodal with mode $m$, then $Y - m$ is unimodal about zero. First we transform the given moment constraints on the variable $Y$ into moment constraints on the auxiliary variable $X$, where the new moment functions for $X$ are

$$a_i^*(x) = \frac{1}{x}\int_0^x a_i(t + m)\,dt. \tag{3}$$

If the mode is unknown, then a consistent estimator $\hat m$ may be used. (See Sager [14] for such a nonparametric mode estimator.) We then solve the transformed MDI problem involving the constrained estimation of $f_X$. Using the estimated $X$ density we then transform it back to obtain the estimated density for $Y$. If $X$ is estimated by $\hat X$, then $Y$ is estimated by $m + U\hat X$, and consequently is unimodal by Khinchin's theorem. The details are given in the next section.

IV. OBTAINING THE ESTIMATED DENSITY

By decomposing the original variable $Y$ via the Khinchin representation as $Y - m = UX$, we are able to transform the constraint set (2) on $Y$'s density into constraints involving $X$. Namely, by Kemperman's technique

$$\theta_i = E\big(a_i(Y)\big) = E\big(a_i(UX + m)\big) = E\big(a_i^*(X)\big), \qquad i = 0, 1, \cdots, k, \tag{4}$$

with $a_i^*$ as in (3).
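For the polynomial moment functions of the Fig. 2 example, $a_i(y) = y^i$, the transformed moment functions in (3) take the closed form

$$a_i^*(x) = \frac{1}{x}\int_0^x (t + m)^i\,dt = \frac{(x + m)^{i+1} - m^{i+1}}{(i + 1)\,x},$$

so the constraints (4) remain simple polynomial-type moment conditions on $X$.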