A Parameter-free Kernel Design Based on Cumulative Distribution Function for Correntropy Jongmin Lee, Pingping Zhu, and Jose C. Principe

Abstract— This paper proposes a parameter-free kernel that is translation invariant and positive definite. The new kernel is based on the data cumulative distribution function (CDF), which provides all the statistical information about the observed samples. Without an explicit kernel size parameter, this novel kernel is used to define the autocorrentropy function, which is a generalized similarity measure, and a spectral density estimator. Numerical examples show that the proposed method provides performance comparable to the existing Gaussian kernel with optimized kernel size.

I. INTRODUCTION

Given a time series, the autocorrelation function is generally considered a similarity measure between the data set and a shifted copy of the data. If the data is periodic, the autocorrelation function is also periodic. However, in many practical applications where non-Gaussianities and nonlinearities exist, there may be a need for alternative measures of similarity. In [1] [2] [3], a generalized correlation named correntropy was proposed, which includes a sum of even-order moments of the data when the Gaussian kernel is utilized. By analogy with the Wiener-Khinchin theorem [4], the Fourier transform of the autocorrentropy also exists, yielding a new spectral estimator that is not restricted to second-order moments. But a free parameter, the kernel size of the Gaussian kernel, must be carefully selected. In this paper we propose a new parameter-free kernel for the autocorrentropy function. The proposed kernel utilizes a translation invariant function that is based on the cumulative distribution function (CDF) of the data. This allows the kernel to adjust its size automatically to the data statistics and to reflect the full statistical information of the given data. It is proved that the proposed kernel is translation invariant and positive definite, and that the correntropy spectral density with this kernel can be estimated by the Fourier transform. Numerical examples for the detection of a sine wave in different background noises show that the autocorrentropy is superior to the autocorrelation function in non-Gaussian noise and close to it in Gaussian noise. Note that correntropy with this new kernel does not involve any free parameter, just like the autocorrelation. The rest of this paper is organized as follows. Section II describes related work, namely autocorrentropy with its advances and limitations. In Section III, we design a new

Jongmin Lee ([email protected]), Pingping Zhu ([email protected]), and Jose C. Principe ([email protected]) are with Computational NeuroEngineering Laboratory (CNEL), the Department of Electrical and Computer Engineering, University of Florida, Gainesville, Florida, United States. This work was partially supported by NSF IIS 0964197.

parameter-free kernel using the CDF and the convolution theorem. This kernel can be used for spectral density estimation. In Section IV, numerical examples demonstrate the performance of the new kernel. Finally, we conclude the paper in Section V.

II. RELATED WORK

Given a real strictly stationary random process {x(t), t ∈ T}, with t denoting time and T an index set of interest, the generalized autocorrelation, called the autocorrentropy function, is defined as

Vx(t, s; σ) = E[κσ(xt, xs)]    (1)

where κσ(·, ·) is a positive definite kernel and σ is a kernel size parameter selected by the user. The centered autocorrentropy function, which is the generalized autocovariance function, is defined as

Ux(t, s; σ) = E_{xt,xs}[κσ(xt, xs)] − E_{xt}E_{xs}[κσ(xt, xs)].    (2)

When a translation invariant kernel such as the Gaussian is utilized,

κσ(xt, xs) = (1/(√(2π)σ)) exp(−(xt − xs)²/(2σ²)),    (3)

the centered correntropy function Ux(t, s; σ) is a function of the lag τ = t − s. By analogy with the Wiener-Khinchin theorem, the correntropy spectral density is defined [3] as

S(ω; σ) = ∫_{−∞}^{∞} Ux(τ; σ) e^{−jωτ} dτ.    (4)
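As a concrete illustration, the sample estimators of (1), (2), and (4) can be sketched as follows. This is a minimal Python sketch under the assumption of ergodicity (expectations replaced by sample averages over time); the function names are ours, not from the paper.

```python
import numpy as np

def gaussian_kernel(d, sigma):
    """Gaussian kernel of (3), evaluated at the differences d."""
    return np.exp(-d**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

def autocorrentropy(x, sigma, max_lag):
    """Sample estimate of V_x(tau) in (1) for tau = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    return np.array([gaussian_kernel(x[tau:] - x[:len(x) - tau], sigma).mean()
                     for tau in range(max_lag + 1)])

def centered_autocorrentropy(x, sigma, max_lag):
    """Sample estimate of U_x(tau) in (2): subtract the mean kernel
    value over all sample pairs (the E_{x_t}E_{x_s} term)."""
    x = np.asarray(x, dtype=float)
    mean_term = gaussian_kernel(x[:, None] - x[None, :], sigma).mean()
    return autocorrentropy(x, sigma, max_lag) - mean_term

# Correntropy spectral density (4), approximated by a DFT over the
# estimated lags (magnitude shown for simplicity).
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 0.1 * np.arange(512)) + rng.normal(size=512)
U = centered_autocorrentropy(x, sigma=1.0, max_lag=128)
S = np.abs(np.fft.rfft(U))
```

In practice the lag-domain estimate U is windowed to the available record length, which is why the DFT above is only an approximation of the integral in (4).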

The Fourier transform of Ux(τ; σ) projects the generalized energy of periodic patterns onto the corresponding eigenfunction e^{−jωτ} with frequency ω. Note that the spectral representation is built from all even moments of the random variable [3]. The correntropy with the Gaussian kernel may extract the periodic structure of the data much better than the autocorrelation when the data is corrupted by non-Gaussian noise. However, a free parameter, the kernel size σ, which is inherent in most translation invariant kernel methods, must be carefully selected. In fact, correntropy with the Gaussian kernel is a function of two arguments: the lag and the kernel size, which plays the role of a scale parameter for the similarity [3]. In [5] a parameter-free triangular kernel was introduced, which normalizes the observed samples by their extreme values. This kernel is translation invariant and positive definite, but it may not reflect the full statistical information of the data.
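The need to select σ carefully is the usual difficulty of Gaussian-kernel methods. One common heuristic borrowed from kernel density estimation, not prescribed by this paper, is Silverman's rule of thumb, sketched below for reference:

```python
import numpy as np

def silverman_bandwidth(x):
    """Silverman's rule of thumb from kernel density estimation:
    sigma = 1.06 * min(std, IQR/1.34) * n^(-1/5)."""
    x = np.asarray(x, dtype=float)
    iqr = np.subtract(*np.percentile(x, [75, 25]))  # 75th minus 25th percentile
    return 1.06 * min(x.std(), iqr / 1.34) * len(x) ** (-0.2)

rng = np.random.default_rng(1)
sigma = silverman_bandwidth(rng.normal(size=1000))
```

Such rules target density estimation rather than similarity measurement, so the kernel size that is optimal for correntropy-based detection may differ, which is precisely the motivation for the parameter-free design below.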

Fig. 1. CDFs of X and Z = −|X| (top) and the corresponding kernels κ(x − x') (bottom), for (a) α = 1.5 and (b) α = 2.0; curves are shown for γ = 1 and γ = 5, with Gaussian kernels κg,σ(·) of σ = 2.6 and σ = 10 for comparison. The CDF kernel is defined in Section III-B and compared with the Gaussian kernels that will be used in Section IV.

III. PARAMETER-FREE KERNEL DESIGN AND CORRENTROPY SPECTRAL DENSITY

In this section we design a new kernel that eliminates the kernel size parameter while reflecting the full statistical information of the data. First, a translation invariant function is generated from the data cumulative distribution function (CDF). Then, a positive definite kernel is formulated using the convolution theorem. Finally, this new kernel is applied to correntropy and its spectral density estimation. Based on the Wiener-Khinchin theorem, it is shown that this new spectral estimator exists.

A. A Translation Invariant Function from the CDF

The CDF contains the full information about the data's amplitude distribution, hence it is a good candidate for designing a new kernel. Let X be the observed value. We utilize the CDF of the negative absolute value of the data (i.e., Z = −|X|, −∞ < Z ≤ 0). Because the original data is modified by the absolute value, the slope of the CDF changes, and the kernel bandwidth is essentially controlled by the mid-range of the CDF. The CDFs of X and Z are illustrated in the upper panel of Fig. 1, assuming that X is zero-mean and symmetric. Even if the data is neither zero-mean nor symmetric, we may still use this construction after subtracting the mean or the median of the data. Based on the CDF of Z, we define a function g : X × X → R such that

g(x, x') = ∫_{−∞}^{−|x−x'|} fZ(z) dz = FZ(−|x − x'|)    (5)

where fZ(z) is the probability density function of Z. When x = x', g(x, x') = 1. As |x − x'| → ∞, g(x, x') → 0 on both the negative and the positive sides of x − x'.

The negative side of g(x, x') is illustrated in the upper panel of Fig. 1 for various distributions (the positive side is symmetric). The function g(x, x') in (5) depends only on the difference between x and x':

g(x, x') = g(x − x').    (6)

g is referred to as a translation invariant function since it is independent of the absolute position of x and depends only on x − x' [6].

B. Translation Invariant Positive Definite Kernel

According to Bochner's theorem and its Fourier criterion, the translation invariant function g(x, x') = g(x − x') is a positive definite kernel if its Fourier transform is nonnegative [7]. Moreover, convolution in the original space becomes a product in the Fourier domain. g(x − x') itself is not positive definite; therefore we design a new kernel that is a positive definite function:

κ(x, x') = κ(x − x') = (g ⊗ g)(x − x')    (7)

where ⊗ denotes the convolution operator and g(x − x') is defined in (5). Let F denote the Fourier transform. F[g(x − x')] is real because g(x − x') is real and even, and F[κ(x − x')] is nonnegative:

F[κ(x − x')] = F[(g ⊗ g)(x − x')] = F[g(x − x')] · F[g(x − x')] ≥ 0.    (8)

Therefore, κ(x, x') in (7) is a translation invariant, positive definite, and parameter-free kernel.

C. Correntropy Spectral Density Estimation

Using the parameter-free CDF kernel of (7) with lag index τ = t − s of a random process x(t), the autocorrentropy function in (1) is redefined as

Vx(τ) = E[κ(xt − xs)] = E[(g ⊗ g)(xt − xs)]    (9)

where g(xt − xs) = FZ(−|xt − xs|). Also, the centered autocorrentropy function in (2) becomes

Ux(τ) = E_{xt,xs}[(g ⊗ g)(xt − xs)] − E_{xt}E_{xs}[(g ⊗ g)(xt − xs)].    (10)

The correntropy spectral density is still given by (4) and remains the same except for σ:

S(ω) = ∫_{−∞}^{∞} Ux(τ) e^{−jωτ} dτ.    (11)

D. Existence of Correntropy and its Spectral Density
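In practice FZ is unknown and must be replaced by the empirical CDF of the observed samples. A minimal sketch of this plug-in construction follows; the helper `make_g` is our own name, and the median subtraction follows the suggestion in the text for data that is not zero-mean and symmetric.

```python
import numpy as np

def make_g(x):
    """Return g(d) = F_Z(-|d|), where F_Z is the empirical CDF of
    Z = -|X - median(X)| built from the observed samples x."""
    z = np.sort(-np.abs(np.asarray(x, dtype=float) - np.median(x)))
    n = len(z)

    def g(d):
        # Empirical CDF at -|d|: fraction of samples with z <= -|d|.
        return np.searchsorted(z, -np.abs(d), side='right') / n

    return g

rng = np.random.default_rng(2)
g = make_g(rng.normal(size=2000))
```

By construction g(0) = 1 (all samples of Z are at most 0) and g(d) decreases monotonically toward 0 as |d| grows, mirroring the limiting behavior stated above.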

We further show that for any strictly stationary random process the correntropy exists, and so does its spectral density via the Fourier transform. The absolute value of the correntropy with probability density function f can be bounded as follows:

|Vx(τ)| = |∫_{−∞}^{∞} κ(x) f(x) dx| ≤ ∫_{−∞}^{∞} |κ(x) f(x)| dx ≤ ‖κ(x)‖_∞ ‖f(x)‖_1    (12)

where κ(x) is a translation invariant positive definite kernel and x is an arbitrary random variable of the stationary random process at lag τ. Both the Gaussian and the CDF kernels are bounded. When κ is the Gaussian kernel, the correntropy always exists because the Gaussian function is absolutely integrable (L1), so the correntropy values are finite. We now show that the CDF kernel we built is also absolutely integrable:

‖κ(x)‖_1 = ∫_{−∞}^{∞} |(g ⊗ g)(x)| dx = ∫_{−∞}^{∞} |∫_{−∞}^{∞} g(u) g(x − u) du| dx = ∫_{−∞}^{∞} g(u) (∫_{−∞}^{∞} g(x − u) dx) du = (∫_{−∞}^{∞} g(u) du)(∫_{−∞}^{∞} g(x) dx)    (13)

where the absolute values can be dropped because g ≥ 0. With g defined in (5), the integral of g in (13) is

∫_{−∞}^{∞} g(x) dx = 2 ∫_{−∞}^{0} FZ(−|x|) dx = 2 ∫_{−∞}^{0} FZ(z) dz    (14)

where Z = −|X| and −∞ < Z ≤ 0. Since the integral of FZ(z) is finite, the CDF kernel is absolutely integrable (L1), and thus the correntropy with the CDF kernel always exists.

When one is interested in spectral estimation, another issue is the integrability of the correntropy over the lags, which is a sufficient condition for the Fourier transform to exist. In practice this is not a concern since we always use finite windows, so the integral always exists. Our experience shows that for large lags the correntropy decays faster than the autocorrelation function, which can be expected because the kernel decreases on average when its two arguments differ; however, we have not been able to prove this fact mathematically. The only mandatory step for correntropy is to subtract its mean value, which is always positive because of the positive nature of the kernel; i.e., we should always work with the centered correntropy for spectral estimation.
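Properties (8) and (13)-(14) are easy to check numerically. The sketch below assumes X ~ N(0, 1), for which FZ(z) = 2Φ(z) on z ≤ 0, evaluates κ = g ⊗ g on a grid, and verifies that the Fourier transform of κ is (numerically) nonnegative and that ∫g dx matches the closed form 2∫_{−∞}^{0} 2Φ(z) dz = 4φ(0) = 4/√(2π), using the identity ∫_{−∞}^{0} Φ(z) dz = φ(0).

```python
import numpy as np
from math import erf

def Phi(t):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(t / np.sqrt(2.0)))

# g(d) = F_Z(-|d|) with F_Z(z) = 2*Phi(z) for Z = -|X|, X ~ N(0,1).
d = np.linspace(-30.0, 30.0, 4001)
dx = d[1] - d[0]
g = np.array([2.0 * Phi(-abs(v)) for v in d])

# kappa = g (convolved with) g, eq. (7), via discrete convolution.
kappa = np.convolve(g, g, mode='same') * dx

# Fourier transform of kappa: should be nonnegative, eq. (8).
# ifftshift aligns the even function so the DFT comes out real.
K = np.fft.fft(np.fft.ifftshift(kappa)).real

# L1 norms, eqs. (13)-(14): ||kappa||_1 = (int g)^2, int g = 4/sqrt(2*pi).
int_g = g.sum() * dx
int_kappa = kappa.sum() * dx
```

The grid is wide enough (|d| ≤ 30) that the truncated tails of g are negligible, so the discrete convolution and Riemann sums approximate the integrals closely.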


IV. NUMERICAL TESTS

We present numerical examples to demonstrate the performance of the CDF kernel. The observed data x(t), from which we estimate the spectral density, is the sum of a real sinusoid u(t) = a1 sin(ωt) and zero-mean symmetric white noise n(t) that is either Gaussian or α-stable distributed. The amplitude a1 depends on the signal-to-noise level used in the tests, and ω is known. A simple binary hypothesis test is considered to detect the presence of the signal u(t) buried in impulsive noise at a low signal-to-noise ratio. For α-stable noise with α < 2 we define the generalized SNR (GSNR): GSNR = Pu/γ², where Pu and γ are, respectively, the power of u(t) and the scale parameter (or dispersion ratio) of the α-stable noise. When α = 2.0, the α-stable noise is equivalent to Gaussian noise with variance 2γ².

Spectral densities are estimated for the normalized frequency ω = 0.1 in four different noises: (α = 1.5, γ = 1), (α = 1.5, γ = 5), (α = 2, γ = 1), and (α = 2, γ = 5). Fig. 2 compares the correntropy spectral densities of the CDF kernel with those of the other kernels; we compare the relative height between the peak and the noise floor. When the symmetric α-stable noise has (α = 1.5, γ = 1), the CDF kernel and the Gaussian kernel with σ = 2.6 achieve a lower noise floor than the others. However, for γ = 5 the kernel with σ = 2.6 is no longer the best; σ = 10 performs better. When the noise is Gaussian (α = 2.0), as in Fig. 2(b), the product kernel is the best for both γ = 1 and γ = 5. While the Gaussian kernel with the larger kernel size is close to the best one, the kernel with σ = 2.6 for γ = 5 performs poorly. The CDF kernel, in contrast, automatically adjusts its size so that it stays quite close to the best kernels regardless of the distribution or scale of the observed data. Here the kernel sizes σ = 2.6 and σ = 10 were chosen after scanning over a range, as shown in the upper panel of Fig. 4.

Fig. 2. Normalized spectral estimates in various noises, for (a) α = 1.5 and (b) α = 2.0. GSNR = −3 dB, ω = 0.1. Due to the different scales of the kernels, each spectral density is normalized by its highest peak; the ensemble average over 50 estimates is presented. The product kernel corresponds to the autocorrelation; the other kernels are used for the autocorrentropy. Note that the estimates of the CDF kernel change little across noise environments while the others vary considerably.

Fig. 3 shows the results of a simple statistical test: identifying the presence of a sine wave in noise. The binary hypothesis is H0: x(t) = n(t) versus H1: x(t) = u(t) + n(t), and the presence of u(t) at frequency ω is to be detected at low GSNR. The decision statistic ζω is the peak value of the spectral density at ω. The detection probability Pd(ηω) = Pr(ζω > ηω | H1) and the false alarm probability Pfa(ηω) = Pr(ζω > ηω | H0) are compared for a threshold ηω determined by Pfa. For the sine wave with normalized frequency ω = 0.1 and the different noises, 10,000 Monte Carlo experiments were performed. The presence of the signal at the desired frequency may not be detected with the autocorrelation when the noise power spectrum is too high. In particular, in Fig. 3(a) the ROC curves show that the autocorrelation function detects poorly because of the impulsiveness and the higher noise floor; these results can be related to the spectral estimates in Fig. 2(a). The correntropy with the Gaussian kernel achieves a better detection probability if the appropriate kernel size is selected: σ = 2.6 for (α = 1.5, γ = 1) and about σ = 10 for (α = 1.5, γ = 5). The CDF kernel, however, eliminates this free parameter while staying close to the highest detection probability regardless of the distribution of the observed data. Even when the noise is Gaussian, the CDF kernel shows comparable performance, as illustrated in Fig. 3(b). To provide more insight into the performance of the CDF kernel, Fig. 4 examines the choice of kernel size directly.

Fig. 3. ROC curves for (a) α = 1.5 and (b) α = 2.0. GSNR = −12 dB. The triangular kernel, which is also parameter-free and briefly described in Section II, is included for comparison alongside the product, Gaussian (σ = 2.6 and σ = 10), and CDF kernels.
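The test signal used in these experiments can be reproduced along the following lines. This is a sketch under our own parameterization: symmetric α-stable samples are drawn via the Chambers-Mallows-Stuck method and scaled by γ; at α = 2 this yields Gaussian noise of variance 2γ², matching the text. The helper names are ours.

```python
import numpy as np

def sas_noise(alpha, gamma, size, rng):
    """Symmetric alpha-stable samples via the Chambers-Mallows-Stuck
    method, scaled by gamma. For alpha = 2 this reduces to N(0, 2*gamma^2)."""
    V = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    X = (np.sin(alpha * V) / np.cos(V) ** (1 / alpha)
         * (np.cos((1 - alpha) * V) / W) ** ((1 - alpha) / alpha))
    return gamma * X

def sine_amplitude(gsnr_db, gamma):
    """Amplitude a1 such that GSNR = P_u / gamma^2 = (a1^2 / 2) / gamma^2."""
    return np.sqrt(2 * gamma**2 * 10 ** (gsnr_db / 10))

rng = np.random.default_rng(3)
gamma = 1.0
a1 = sine_amplitude(-3.0, gamma)                 # GSNR = -3 dB, as in Fig. 2
t = np.arange(1024)
x = a1 * np.sin(2 * np.pi * 0.1 * t) + sas_noise(1.5, gamma, t.size, rng)
```

The decision statistic ζω is then obtained by taking the peak of the estimated (centered) correntropy spectral density at the known frequency, and the ROC is traced by sweeping the threshold over Monte Carlo runs of both hypotheses.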

Fig. 4. Detection probability vs. kernel size for the Gaussian kernel in three noise distributions, (α = 1.5, γ = 1), (α = 1.5, γ = 5), and (α = 1.3, γ = 3), at Pfa = 0.1 (top). For comparison with Fig. 3(a), the ROC curves for the case (α = 1.3, γ = 3) are also presented (bottom).

The upper panel of Fig. 4 illustrates the optimized kernel sizes for detection, and the lower panel compares the CDF kernel with the other kernels in another non-Gaussian noise environment (α = 1.3, γ = 3).

V. CONCLUSION

We have designed a new CDF-based kernel. The CDF contains all the statistical information about the given data, which allows the new kernel to adjust its size automatically to the data statistics. We have shown that the kernel is translation invariant, positive definite, and parameter-free. The autocorrentropy function based on this kernel estimates spectral densities with performance comparable to existing methods such as the autocorrelation and the autocorrentropy that adopts the Gaussian kernel with optimized kernel size. Further theoretical work is needed to establish the conditions under which the correntropy spectral estimate exists, and whether the CDF kernel includes the sum of the moments of the data statistics. Overall, this methodology is ready to be applied to nonparametric spectral estimation; recall that correntropy spectral estimation is not limited to second-order moments as the conventional power spectrum is.

REFERENCES

[1] I. Santamaria, P. P. Pokharel, and J. C. Principe, "Generalized correlation function: Definition, properties, and application to blind equalization," IEEE Transactions on Signal Processing, vol. 54, no. 6, pp. 2187-2197, June 2006.

[2] W. Liu, P. P. Pokharel, and J. C. Principe, "Correntropy: Properties and applications in non-Gaussian signal processing," IEEE Transactions on Signal Processing, vol. 55, no. 11, pp. 5286-5298, Nov. 2007.
[3] J. C. Principe, Information Theoretic Learning: Renyi's Entropy and Kernel Perspectives, New York: Springer, 2010.
[4] H. Stark and J. W. Woods, Probability and Random Processes with Applications to Signal Processing, Prentice Hall, 2002.
[5] S. Seth, M. Rao, I. Park, and J. C. Principe, "A unified framework for quadratic measures of independence," IEEE Transactions on Signal Processing, vol. 59, no. 8, pp. 3624-3635, Aug. 2011.
[6] B. Scholkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, Cambridge, MA, 2001.
[7] S. Bochner, Lectures on Fourier Integrals, Princeton Univ. Press, Princeton, NJ, 1959.