Reconstruction of Aperiodic FRI Signals and Estimation of the Rate of Innovation Based on the State Space Method

A. Erdozain, P. M. Crespo
CEIT and Tecnun (University of Navarra)
Manuel de Lardizábal 15, 20018 San Sebastián, Spain

Abstract

In the absence of observation noise, it is known that it is possible to develop exact sampling schemes for a large class of parametric nonbandlimited signals, namely, certain signals of finite rate of innovation (FRI signals), either periodic or aperiodic, such as streams of Diracs, nonuniform splines or piecewise polynomials. A common feature of such signals is that they have a finite number of degrees of freedom per unit of time and they can be reconstructed from a finite number of uniform samples of the filtered signal. Unfortunately, the accuracy of such reconstruction substantially degrades when the samples are distorted by noise. For the case of periodic FRI signals, good algorithms based on the state space method have been proposed which are robust against noise. However, in the case of aperiodic signals, these algorithms may still fail to accurately reconstruct the signals due to ill-conditioning problems that often arise in the matrices involved. This paper proposes a new reconstruction method for aperiodic FRI signals that is also based on the state space method but has considerably better numerical conditioning than previous reconstruction algorithms. This advantage is achieved by using a frequency domain formulation.

Preprint submitted to Elsevier Science

22 October 2010

Key words: Signals with finite rate of innovation, State Space method, Reconstruction algorithm

1 Introduction

The sampling of continuous-time waveforms was first studied by H. Nyquist in 1928 [1]. The criterion that bears his name, proved in 1948 by Shannon [2], states that the original continuous-time signal x(t) can be perfectly reconstructed if the sampling rate fs = 1/T > 2B, where B is the signal bandwidth and T is the sampling period. If the sampling is performed following this criterion, the original signal can be reconstructed by the Whittaker-Shannon interpolation formula [3]:

x(t) = Σ_{n=−∞}^{∞} x(nT) sinc(t/T − n),  (1)

where sinc(t) = sin(πt)/(πt). Therefore, from (1) it can be deduced that the signal x(t) has 2B degrees of freedom per unit of time. However, not all signals with a finite number of degrees of freedom (the number of degrees of freedom per unit of time is widely known as the rate of innovation ρ) can be directly sampled without loss of information, due to their infinite bandwidth. For instance, a Dirac pulse has two degrees of freedom (its time position and its amplitude), but as it has infinite bandwidth, when uniform sampling is performed directly on the continuous signal, the information contained in the signal is lost. Nevertheless, in [4] it is proved that a group of nonbandlimited parametric signals, either periodic or aperiodic, such as streams of Diracs, nonuniform splines or piecewise polynomials, can be reconstructed from their uniform samples if the signal has been filtered by a bandlimited kernel function h(t) before being sampled:

z[n] = z(nT) = x(t) ∗ h(t)|_{t=nT} = ∫_{−∞}^{∞} h(τ − nT) x(τ) dτ,

Email addresses: [email protected] (A. Erdozain), [email protected] (P. M. Crespo).

x(t) = T({z[n]}),

where n = 0, . . . , N − 1, z(t) is the filtered version of the original signal x(t), and T denotes the corresponding reconstruction transformation needed to recover x(t) from the samples {z[n]}. Regarding the kernel functions, Gaussian and sinc functions are the most common, but a wider set of suitable h(t) functions can be found in [5]. The estimation procedure proposed in [4] is based on the use of annihilating filters [6]. Depending on whether the Finite Rate of Innovation (FRI) signals (namely the streams of Diracs, nonuniform splines and piecewise polynomials) are periodic or aperiodic, a frequency domain or a time domain formulation is used, respectively. Unfortunately, in either case, the accuracy of the reconstruction substantially degrades when the samples are distorted by noise.

In this context, the authors in [7] proposed a new reconstruction scheme to reduce the estimation error in the presence of noise. It is based on the state space method, a scheme first introduced in array processing and spectral estimation [8,9]. This method was elaborated for both periodic and aperiodic x(t) signals by using, again, a frequency domain or a time domain formulation, respectively. Although the method reduces the estimation error when compared to [4], the procedure can run into computational problems when considering the case of aperiodic FRI signals, because in the time domain formulation the procedure may give rise to ill-conditioned matrices.

This paper proposes a new reconstruction method for aperiodic FRI signals, also based on the state space method, but with considerably better numerical conditioning than the method in [7]. This advantage is achieved by using a frequency domain formulation. We also mention that, in the specific case of aperiodic streams of Diracs, there exist other reconstruction algorithms [10,11], but they require more computational time and can show convergence problems due to their stochastic nature.

The outline of the paper is as follows. In Section 2 the state space method is reviewed, as it is the basis for the methods proposed in [7] and in this work. In Section 3, the reconstruction of aperiodic FRI signals is analyzed; the disadvantage of the procedure introduced in [7] is justified and the new reconstruction procedure is explained. Although we first focus on streams of Diracs, we also extend the reconstruction method to nonuniform splines and piecewise polynomials. Finally, Section 4 shows simulation results for the proposed reconstruction method, and we end with a concluding discussion in Section 5.

2 Preliminaries: The State Space Method

Consider the following data sequence:

x_l = Σ_{k=1}^{K} |b_k| e^{(α_k + jω_k)l + jφ_k} = Σ_{k=1}^{K} b_k z_k^l,  (2)

with l = 0, . . . , N − 1. The aim of the state space method is to recover the parameters {b_k} and {z_k} from the data sequence {x_l}. The advantage of this estimation scheme is that when noise is added to the data sequence, i.e.,

y_l = x_l + n_l, the error in the estimated parameters is lower than that achieved by other estimation procedures such as the polynomial method [12] or the annihilating filter method [6]. A proof of the state space method can be found in [7]; in what follows, we give an alternative proof, based on the matrix pencil method [13], that we believe is more concise. To that end, let us define the Hankel matrix



      ⎡ x_0         x_1       ···   x_L     ⎤
X =   ⎢ x_1         x_2       ···   x_{L+1} ⎥
      ⎢  ⋮           ⋮         ⋱      ⋮     ⎥          (3)
      ⎣ x_{N−L−1}   x_{N−L}   ···   x_{N−1} ⎦

where L is an arbitrary parameter bounded by K < L < N/2. Notice that the rank of X is K, since it can be decomposed as the product X = LSR^H, where L is an (N − L) × K Vandermonde matrix with entries {L_{m,k} = z_k^{m−1}}, S = diag{b_1, b_2, . . . , b_K}, and R is the modified Vandermonde matrix¹ with L + 1 rows instead of N − L. Let X = UΛV^H be its singular value decomposition (SVD), where the singular values of the diagonal matrix Λ are placed in non-increasing order. Define U_p as the (N − L) × K matrix formed by selecting the first K columns of the matrix U, i.e., by keeping only the K left-singular vectors of X corresponding to its nonzero singular values.

Theorem 1. The values {z_i}_{i=1}^{K} can be calculated as the eigenvalues of the matrix U_{p,l}⁺ U_{p,f}, where U_{p,f} and U_{p,l} are obtained from U_p by deleting the first and the last row, respectively.

¹ The superscripts H and + indicate the conjugate transpose and the pseudo-inverse of a matrix, respectively.


PROOF. The rank of the matrices L and U_p is K (K ≤ N − L); consequently, there exists a non-singular K × K matrix P such that U_p = LP. Denoting by L_f and L_l the matrices obtained from L by deleting its first and last row, respectively, and using the fact that L_f = L_l · Z, with Z = diag{z_1, z_2, . . . , z_K}, U_{p,f} = L_f P and U_{p,l} = L_l P, one obtains

U_{p,f} = U_{p,l} · (P⁻¹ZP).  (4)

Based on this equality, the matrix² U_{p,f} − U_{p,l} z, where z ∈ C is an arbitrary parameter, can be rewritten as

U_{p,f} − U_{p,l} z = U_{p,l} (P⁻¹ZP − I_K z) = U_{p,l} (P⁻¹ZP − P⁻¹ I_K P z) = U_{p,l} · P⁻¹ (Z − I_K z) P.  (5)

This expression shows that when z takes the value z_i, i = 1, . . . , K, the rank of U_{p,f} − U_{p,l} z is reduced from K to K − 1. Therefore, the equation (U_{p,f} − U_{p,l} z)q = 0 will have a solution q ∈ C^K other than the null vector as long as z = z_i, i ∈ {1, . . . , K}. Left-multiplying this equation by the pseudo-inverse U_{p,l}⁺, one obtains U_{p,l}⁺ U_{p,f} q = U_{p,l}⁺ U_{p,l} z q = z q, showing that the eigenvalues of U_{p,l}⁺ U_{p,f} are {z_i}_{i=1}^{K}, as we wanted to prove.

Let us now consider the more realistic situation where the observation samples are distorted by additive white Gaussian noise (AWGN),

y_l = x_l + n_l,   l = 0, . . . , N − 1,  (6)

where {n_l} are i.i.d. Gaussian random variables with zero mean and variance σ². From the data sequence {y_l : l = 0, . . . , N − 1}, one obtains the matrix Y in the same way as X.

² The matrices defined as A − zB, where A, B ∈ R^{n×m} are any two matrices and z ∈ C, are a special case of matrix pencils.

However, due to the noisy observations, the rank of Y, say J (i.e., the number of its nonzero singular values), will in general be greater than K. Notice that if J, instead of K, principal left-singular vectors of Y were used to construct U_p, this would be tantamount to assuming that a noiseless x(t) realization with J Dirac pulses, rather than its actual K pulses, has been produced. Consequently, the algorithm would be misled and wrong estimates would likely be obtained. To overcome this problem, and assuming that K is known, only the K left-singular vectors corresponding to the K largest singular values of Y are kept to construct the above U_p matrix, denoted by U_t hereafter. Then, by the state space method, estimates {ẑ_i}_{i=1}^{K} of the {z_i}_{i=1}^{K} values are obtained by computing the K eigenvalues of the matrix U_{t,l}⁺ U_{t,f} (with U_{t,f} and U_{t,l} obtained from U_t by deleting its first and last row, respectively).

Notice that the above step is equivalent to starting the algorithm with the matrix Y_t instead of Y, where Y_t is the unique rank-K matrix that minimizes the matrix 2-norm ‖Y − Y_t‖₂. This is easily deduced by realizing that its solution [14] is given by Y_t = U_t · diag{Σ_t} · V_t^H, where U_t is the previously defined matrix, V_t is formed by the K principal right-singular vectors of Y, and diag{Σ_t} ∈ R^{K×K} is the diagonal matrix with the K largest singular values of Y.

Finally, the estimates of the {b_k} can be obtained as follows. The previously estimated values {ẑ_i}_{i=1}^{K} are substituted in expression (2). Then, from expression (6), one obtains the linear regression

y_l = Σ_{k=1}^{K} b_k ẑ_k^l + n_l,   l = 0, . . . , N − 1,

where the {n_l}_{l=0}^{N−1} are the observation errors. The most common approach to estimate {b_i}_{i=1}^{K} from the observations {y_l}_{l=0}^{N−1} is the least squares (LS) method, which gives

b̂ = (T^H T)⁻¹ T^H y,  (7)

where b̂ = [b̂_1, . . . , b̂_K]^T (b̂_i being the estimate of b_i), y = [y_0, . . . , y_{N−1}]^T, and T is the matrix with entries {T_{m,k} = ẑ_k^m}. This solution minimizes the error ‖y − Tb‖₂².
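As a concrete illustration, the whole procedure of this section, the Hankel construction (3), the SVD truncation, the pencil eigenvalue step of Theorem 1, and the LS step (7), can be sketched in a few lines of NumPy. This is our own minimal sketch, not code from the paper; the function name and the noiseless test signal are illustrative:

```python
import numpy as np

def state_space_estimate(x, K, L):
    """Estimate the poles {z_k} and weights {b_k} of x_l = sum_k b_k z_k^l
    via the SVD-based state space (matrix pencil) method."""
    N = len(x)
    assert K < L < N / 2
    # Hankel matrix X of size (N-L) x (L+1), as in (3)
    X = np.array([x[m:m + L + 1] for m in range(N - L)])
    # keep the K principal left-singular vectors (noiseless: rank is exactly K)
    U = np.linalg.svd(X)[0]
    Up = U[:, :K]
    # Theorem 1: z_k are the eigenvalues of pinv(Up minus last row) @ (Up minus first row)
    z = np.linalg.eigvals(np.linalg.pinv(Up[:-1]) @ Up[1:])
    # LS step (7): Vandermonde system with T_{m,k} = z_k^m
    T = np.vander(z, N, increasing=True).T
    b = np.linalg.lstsq(T, x, rcond=None)[0]
    return z, b

# Noiseless check with two complex exponentials (our test values)
z_true = np.exp(1j * np.array([0.3, 1.1]))
b_true = np.array([1.0, 2.0])
l = np.arange(40)
x = (b_true[None, :] * z_true[None, :] ** l[:, None]).sum(axis=1)
z_est, b_est = state_space_estimate(x, K=2, L=10)
```

In the noiseless case the recovered poles and weights match the true ones up to numerical precision; with noise added, the SVD truncation is what provides the robustness discussed above.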

3 Reconstruction of aperiodic signals with finite rate of innovation

In this section, first, the reconstruction method of aperiodic signals with finite rate of innovation proposed in [7] is described. The problem we look at is the following. Assume that the signal of interest x(t) is a stream of K Diracs,

x(t) = Σ_{k=1}^{K} c_k δ(t − t_k).  (8)

This signal is filtered by a kernel function before being uniformly sampled. The goal is to recover, from a noisy version of these samples, the amplitudes {c_k} and the time instants {t_k}. When the observation samples are noiseless, the unknown parameters {c_k}, {t_k} can be perfectly recovered by reducing the problem to one of the form (2). For this purpose, the given x(t) is filtered with a Gaussian kernel function,

h(t) = e^{−t²/(2σ²)}.  (9)

The noiseless samples can then be written as

z[n] = z(nT) = Σ_{k=1}^{K} c_k e^{−(t_k − nT)²/(2σ²)} = Σ_{k=1}^{K} c_k e^{−t_k²/(2σ²)} · e^{n t_k T/σ²} · e^{−n²T²/(2σ²)},  (10)

with n = 0, . . . , N − 1. Defining the variables z_k = e^{t_k T/σ²}, u_n = z[n] e^{n²T²/(2σ²)} and a_k = c_k e^{−t_k²/(2σ²)} yields

u_n = Σ_{k=1}^{K} a_k e^{n t_k T/σ²} = Σ_{k=1}^{K} a_k z_k^n,  (11)

that is, a data sequence with the structure defined in (2). The {z_k} values can now be obtained by the state space method. As described before, once the {z_k} are known, one can use the LS method to solve for the amplitudes {c_k}.

Notice that an ill-conditioned problem may arise in the state space method when N is relatively large. The reason is that the u_n values for n close to N are extremely large, since the samples z[n] are multiplied by e^{n²T²/(2σ²)}. For instance, for n = 100 and σ = 8T, e^{n²T²/(2σ²)} = 8.5 · 10³³. On the contrary, for small n, the corresponding u_n values are many orders of magnitude lower. This may result in a Hankel matrix that is ill-conditioned due to the disparity in the values of its entries. In order to mitigate these problems, the next section proposes a frequency domain based method which avoids the use of these e^{n²T²/(2σ²)} factors.

Another serious effect introduced by the e^{n²T²/(2σ²)} factors arises when the observation samples z[n] are noisy, since then the accuracy of the state space method may be degraded. Notice that in this case the noise samples are amplified by the values e^{n²T²/(2σ²)}, thus increasing their variance. This in turn may cause the dominant singular values not to belong to the signal space. The authors in [7] try to overcome this problem by a scaling technique, so that the resulting noise variance is as uniform as possible, and then use the SVD for noise suppression. The solution proposed in the next section avoids the use of these e^{n²T²/(2σ²)} factors and, consequently, does not require the cumbersome procedure proposed in [7] to scale the noise variables.
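The magnitude of the disparity can be checked numerically; the quick sanity check below (our own, not the paper's code) reproduces the σ = 8T example from the text:

```python
import numpy as np

# Growth of the correction factor e^{n^2 T^2 / (2 sigma^2)} for sigma = 8T.
# The samples z[n] get rescaled by factors spanning dozens of orders of
# magnitude, which is what ruins the conditioning of the time-domain Hankel matrix.
T = 1.0
sigma = 8 * T
n = np.arange(0, 101)
factors = np.exp(n**2 * T**2 / (2 * sigma**2))
print(f"factor at n=100: {factors[100]:.2e}")   # ~8.5e33, as in the text
print(f"dynamic range over n=0..100: {factors[100] / factors[0]:.1e}")
```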

3.1 The proposed reconstruction method

Consider the Discrete Time Fourier Transform (DTFT) of the infinite sample sequence {z[n] = z(nT)}_{n=−∞}^{∞} in (10), that is,

Z(e^{jω}) = Σ_{n=−∞}^{∞} z[n] e^{−jωn}.  (12)

It is well known that

Z(e^{jω}) = (1/T) Σ_{l=−∞}^{∞} Z_a(jω/T − j2πl/T),  (13)

where Z_a(jΩ) is the Continuous Time Fourier Transform (CTFT) of the filtered signal z(t),

Z_a(jΩ) = ∫_{−∞}^{∞} z(t) e^{−jΩt} dt = H_a(jΩ) · X_a(jΩ) = σ√(2π) · e^{−σ²Ω²/2} · Σ_{k=1}^{K} c_k e^{−jΩt_k}.  (14)

If Z_a(jΩ) = 0 for |Ω| > B and B ≤ π/T, no aliasing will occur and, consequently, Z(e^{jω}) = (1/T) Z_a(jω/T) in the interval −π/T ≤ ω ≤ π/T. By realizing that the Gaussian function is negligible beyond 3.5 times its standard deviation (only 2 · 10⁻⁵ % of its total energy lies beyond this point), and that Z_a(jΩ) in (14) has Gaussian form, the following B is considered:

σ²B²/2 = 3.5²/2  →  B = 3.5/σ.  (15)

On the other hand, if the support of z(t) is such that z(nT) = 0 for n < 0 and n > N − 1, then the Discrete Fourier Transform (DFT) of {z(nT)}_{n=0}^{N−1} coincides with the samples of the DTFT Z(e^{jω}),

Z_DFT[l] = Σ_{n=0}^{N−1} z(nT) e^{−j2πnl/N} = Z(e^{jω})|_{ω=2πl/N} = (1/T) · Z_a(j 2πl/(NT)),  (16)

where the last equality holds under the assumption of no aliasing.
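Relation (16) is easy to verify numerically for a single Gaussian pulse well inside the observation window. The check below is our own illustration (parameter values chosen by us, not taken from the paper):

```python
import numpy as np

# Numerical check of (16): the DFT of the samples of a Gaussian-filtered Dirac
# should match (1/T) * Z_a(j 2*pi*l / (N T)) when the pulse decays inside the window.
T, sigma, N, t1, c1 = 0.05, 0.5, 200, 4.0, 1.3
n = np.arange(N)
z = c1 * np.exp(-(n * T - t1)**2 / (2 * sigma**2))   # samples z(nT), see (10)
Zdft = np.fft.fft(z)                                  # Z_DFT[l]; negative l wraps mod N

l = np.arange(-20, 21)
Omega = 2 * np.pi * l / (N * T)
# Z_a(j Omega) from (14) for a single Dirac at t1 with amplitude c1
Za = sigma * np.sqrt(2 * np.pi) * np.exp(-sigma**2 * Omega**2 / 2) * c1 * np.exp(-1j * Omega * t1)

err = np.max(np.abs(Zdft[l % N] - Za / T)) / np.max(np.abs(Za / T))
print(f"max relative error: {err:.2e}")
```

With these values both the aliasing and the window-truncation errors are far below numerical precision of interest, so the two sides agree to many digits.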

Notice that the signal z(t) is the sum of scaled versions of the kernel function h(t) centered at the time instants t_i, i = 1, . . . , K (see (10)). Consequently, if the set {z(nT)} is to satisfy the above time constraint, then the first and last Dirac pulses of x(t) must be located at least t_a seconds after z(−T) and t_a seconds before z(NT) (t_1 > t_a − T and t_K < NT − t_a), respectively, where t_a should be large enough for the kernel to be negligible. Again, since the kernel is Gaussian and its values can be considered negligible beyond t = 3.5σ (values lower than 0.2% of its maximum value), a good compromise³ is to set t_a = 3.5σ, as corroborated in the results section.

Therefore, if both conditions 3.5/σ ≤ π/T and t_a ≥ 3.5σ are satisfied, expression (16) yields

Z_DFT[l] ≃ (1/T) Z_a(j 2πl/(NT)) = (1/T) σ√(2π) e^{−σ²2π²l²/(NT)²} · Σ_{k=1}^{K} c_k e^{−j2π l t_k/(NT)},  (17)

for l = −⌈N/2⌉ + 1, . . . , ⌊N/2⌋, where z_k = e^{−j2π t_k/(NT)} (so that e^{−j2π l t_k/(NT)} = z_k^l) and the ≃ sign should be understood as a tight approximation of the exact expression under the above conditions. By defining Z^M[l] as

Z^M[l] = (Z_DFT[l] / (σ√(2π))) · e^{+σ²2π²l²/(NT)²} · T,  (18)

expression (17) can be written as

Z^M[l] ≃ Σ_{k=1}^{K} c_k · e^{−j2π l t_k/(NT)} = Σ_{k=1}^{K} c_k · z_k^l.  (19)

Observe that this system of equations has the same form as (2). Therefore, the {z_k} values, and in turn the time instants {t_k}, can be obtained by the state space method, using in this case a Hankel matrix with entries Z^M[l]. The advantage of having used a frequency domain formulation is that this Hankel matrix is not ill-conditioned, as often happens when solving the scheme proposed in [7] and shown in (11). This statement is justified in the Appendix.

³ This is the reason why a Gaussian kernel is employed instead of a sinc kernel. The sinc function decreases only inversely proportionally to time and, consequently, higher t_a values would be required.

In case the samples are distorted by an additive noise, that is,

y[n] = z[n] + w[n],  (20)

then the entries of the Hankel matrix Y are given by

Y^M[l] = Z^M[l] + W^M[l] = Σ_{k=1}^{K} c_k · z_k^l + T · (W_DFT[l] / (σ√(2π))) · e^{+σ²2π²l²/(NT)²},  (21)

where W_DFT[l] denotes the DFT of the noise sequence {w[n]}_{n=0}^{N−1}. It will be assumed that the {w[n]} are i.i.d. Gaussian random variables, w[n] ∼ N(0, σ_w²). Let us define the instantaneous signal-to-noise ratio (SNR) of the Y^M[l] sequence as

SNR[l] = E[Z^M[l] Z^M[l]*] / E[W^M[l] W^M[l]*],   −⌈N/2⌉ ≤ l ≤ ⌊N/2⌋.  (22)

Obviously, in order to reduce the estimation error, one should use all the available data Y^M[l], l = −⌈N/2⌉ + 1, . . . , ⌊N/2⌋, in Y whenever their SNR[l] is high. However, if only a subset of the {Y^M[l]} has a significantly large SNR, then the Hankel matrix should be formed by considering only this subset. The inclusion of the Y^M[l] with small SNR[l] will not reduce the estimation error; on the contrary, they are likely to worsen it (this fact will be corroborated by simulation results later). This is what happens in our case since, as shown in the Appendix, SNR[l] decreases exponentially with |l|² and, consequently, only a subset of the Y^M[l], say {Y^M[l]}_{l=−l_lim}^{l_lim}, has a significantly large SNR[l]. In this case, the Hankel matrix should be formed by considering only this subset. That is, the matrix Y should be

      ⎡ Y^M[−l_lim]      Y^M[−l_lim + 1]   ···   Y^M[−L]    ⎤
Y =   ⎢ Y^M[−l_lim + 1]  Y^M[−l_lim + 2]   ···   Y^M[1 − L] ⎥
      ⎢      ⋮                 ⋮             ⋱       ⋮      ⎥          (23)
      ⎣ Y^M[L]           Y^M[L + 1]         ···   Y^M[l_lim]⎦

where l_lim ≤ N/2 and L ∈ Z can be arbitrarily chosen as long as l_lim − L + 1 ≥ K and l_lim + L + 1 ≥ K.

Notice that the optimum l_lim value depends on its corresponding SNR[l_lim], or equivalently on the value the Gaussian function e^{−σ²2π²l²/(NT)²} takes at l = l_lim. Consequently, the optimum l_lim will vary with the variance of the Gaussian function, namely with σ, N and T. To avoid this dependency, we define the value n_c by normalizing the l_lim index by the standard deviation of the Gaussian function:

n_c = 2σπ l_lim / (NT).  (24)

Given an experiment, the optimum n_c value is obtained empirically by simulation. We now summarize the proposed method:

(1) Compute the SVD of Y, i.e., Y = UΛV^H, with the singular values placed in non-increasing order.
(2) Construct the matrix U_t by keeping only the first K columns of U.
(3) Estimate the signal poles z_k = e^{−j2π t_k/(NT)} as the eigenvalues of the matrix U_{t,l}⁺ U_{t,f}, where U_{t,f} and U_{t,l} are obtained from U_t by deleting its first and last row, respectively.
(4) From the estimated {z_i}_{i=1}^{K} values, compute the estimates of the amplitudes {c_i}_{i=1}^{K} by the LS method applied to the Vandermonde system in (19), as described in Section 2.

The most computationally intensive step in the algorithm is the SVD calculation⁴, so the computational complexity of the proposed algorithm is O(N³), where g(N) = O(f(N)) means that there exist a constant a and an integer n₀ such that g(N) ≤ a·f(N) for all N > n₀.
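The four steps above can be sketched end-to-end in NumPy. The function below is our own illustration of the method (Gaussian kernel, noiseless samples, and parameter values chosen by us to satisfy the conditions 3.5/σ ≤ π/T and t_a ≥ 3.5σ of Section 3.1), not the authors' code:

```python
import numpy as np

def fri_reconstruct(y, T, sigma, K, l_lim, L=0):
    """Steps (1)-(4) of Section 3.1 for a Gaussian-filtered stream of Diracs.
    y holds the samples z(nT) for n = 0..N-1; returns estimated (t_k, c_k)."""
    N = len(y)
    Zdft = np.fft.fft(y)                               # Z_DFT[l]; negative l wraps mod N
    l = np.arange(-l_lim, l_lim + 1)
    # (18): undo the Gaussian envelope of the kernel to get Z^M[l]
    Zm = Zdft[l % N] * T / (sigma * np.sqrt(2 * np.pi)) \
         * np.exp(sigma**2 * 2 * np.pi**2 * l**2 / (N * T)**2)
    # Hankel matrix (23) with the chosen l_lim and L
    cols = l_lim - L + 1
    Y = np.array([Zm[m:m + cols] for m in range(l_lim + L + 1)])
    # steps (1)-(3): SVD, keep K left-singular vectors, pencil eigenvalues
    U = np.linalg.svd(Y)[0]
    Ut = U[:, :K]
    z = np.linalg.eigvals(np.linalg.pinv(Ut[:-1]) @ Ut[1:])
    t = np.mod(-np.angle(z) * N * T / (2 * np.pi), N * T)  # z_k = e^{-j2pi t_k/(NT)}
    # step (4): LS solve of the Vandermonde system (19)
    V = z[None, :] ** l[:, None]
    c = np.linalg.lstsq(V, Zm, rcond=None)[0]
    return t, c.real

# Two Diracs at t = 3.0 s and 7.5 s with amplitudes 1 and 2 (our test values)
T, sigma, N = 0.05, 0.5, 200
n = np.arange(N)
y = sum(ck * np.exp(-(n * T - tk)**2 / (2 * sigma**2))
        for ck, tk in [(1.0, 3.0), (2.0, 7.5)])
t_est, c_est = fri_reconstruct(y, T, sigma, K=2, l_lim=6)
```

In the noiseless setting the recovered locations and amplitudes match the true ones to high accuracy; with noise, l_lim would be tuned through n_c in (24) as described above.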

3.2 Other signals with finite rate of innovation

We now extend the proposed method to other FRI signals: nonuniform splines and piecewise polynomials.

3.2.1 Nonuniform Splines

A signal x(t) is a nonuniform spline of degree R if its (R+1)-th derivative is a stream of weighted Diracs, that is,

x^{(R+1)}(t) = Σ_{k=1}^{K} c_k δ(t − t_k).  (25)

In this case, filtering x(t) by h^{(R+1)}(t) instead of by h(t), where h(t) is the Gaussian kernel defined in (9), yields

z(t) = x(t) ∗ h^{(R+1)}(t) = Σ_{k=1}^{K} c_k e^{−(t−t_k)²/(2σ²)},  (26)

where we have used the fact that x(t) ∗ h^{(R+1)}(t) = x^{(R+1)}(t) ∗ h(t). Therefore, the CTFT of this filtered signal is again given by expression (14). Consequently, if we denote the DFT of {z(nT)}_{n=0}^{N−1} as {Z_DFT[l]}, and the conditions 3.5/σ ≤ π/T and t_a ≥ 3.5σ are satisfied (see Section 3.1 for the definition of t_a),

⁴ Notice that the pseudo-inverse of a matrix can also be computed by an SVD.

then the approximation in (17) still holds. That is,

Z^M[l] = Z_DFT[l] · (T/(σ√(2π))) · e^{+σ²2π²l²/(NT)²} ≃ Σ_{k=1}^{K} c_k z_k^l,   l = −⌈N/2⌉+1, . . . , ⌊N/2⌋,  (27)

where z_k = e^{−j2π t_k/(NT)}. Therefore, we will be able to estimate {c_k} and {t_k} by the state space method, constructing the Hankel matrix with the Z^M[l] values. In case the samples are distorted by noise, the same procedure as for the streams of Diracs remains valid.

3.2.2 Piecewise polynomials

A signal x(t) is a piecewise polynomial if its (R+1)-th derivative is a stream of differentiated Diracs, that is,

x^{(R+1)}(t) = Σ_{k=1}^{K} Σ_{r=0}^{R_k−1} c_{kr} δ^{(r)}(t − t_k),  (28)

where δ^{(r)}(t) denotes the r-th derivative⁵ of δ(t), and R_k ≤ R is the maximum order of the Dirac derivatives at time t_k. If x(t) is filtered by the (R+1)-th derivative of the Gaussian kernel h(t) defined in (9), again from the property x(t) ∗ h^{(R+1)}(t) = x^{(R+1)}(t) ∗ h(t), we get

z(t) = Σ_{k=1}^{K} Σ_{r=0}^{R_k−1} c_{kr} (−1)^r h^{(r)}(t − t_k).  (29)

In this case, if the samples of Z_a(jΩ) are to be approximated by the DFT of {z(nT)}_{n=0}^{N−1}, then the value of t_a has to be slightly higher than 3.5σ, and the maximum sampling period slightly lower than in the case of streams of Diracs. The reason is that the Gaussian functions are now multiplied by polynomials of degree R:

h^{(r)}(t) = (−1/(σ√2))^r · H_r(t/(σ√2)) · e^{−t²/(2σ²)},  (30)

⁵ The r-th derivative of the Dirac δ(t) is such that ∫ f(t) · δ^{(r)}(t − t₀) dt = (−1)^r f^{(r)}(t₀), where f(t) is an r times continuously differentiable function.

where H_n(x) denotes the Hermite polynomial of degree n [15, Chapter 4]. Consequently, the maximum value for T and the minimum value for t_a will depend on R. If these conditions are satisfied, the approximation in (17) still holds,

Z_DFT[l] ≃ (1/T) σ√(2π) e^{−σ²2π²l²/(NT)²} · Σ_{k=1}^{K} Σ_{r=0}^{R_k−1} c_{kr} · (j2πl/(NT))^r · e^{−j2π l t_k/(NT)}.  (31)

Once again, dividing the above expression by (1/T) σ√(2π) e^{−σ²2π²l²/(NT)²} yields

Z^M[l] = Σ_{k=1}^{K} Σ_{r=0}^{R_k−1} c̃_{kr} l^r e^{−j2π l t_k/(NT)},  (32)

where c̃_{kr} = c_{kr} (j2π/(NT))^r.

Using these values to construct the Hankel matrix Y, the {t_k} values can now be estimated by the state space method (see [7]). Notice that, due to the presence of the l^r factors in expression (32) for Z^M[l], the dispersion of the Z^M[l] values will be larger than in the case of streams of Diracs and, therefore, the problem may become ill-conditioned when R and the index l take large values. However, this dispersion is much smaller than the one that would have been obtained if the method in [7] had been used.

4 Results

The proposed reconstruction algorithm is analyzed by performing an experiment consisting of randomly generated streams of Diracs (8). The following assumptions have been made. The amplitudes of the pulses are i.i.d. random variables with uniform distribution c_k ∼ U[1, 2], and the time spacings between Dirac pulses are modeled as i.i.d. exponential random variables with mean one (second) plus a fixed minimum spacing time t_min = 0.2 s, i.e., (t_{k+1} − t_k) ∼ exp(λ = 1) + t_min. The number K of Dirac pulses is then given by the resulting number of time instants t_k located inside an interval of 20 seconds. Notice that in order to check the accuracy of the algorithm, which is the objective here, one has to know a priori the minimum distance t_min between pulses, so that σ in the kernel h(t) in (9) can be set small enough to obtain a good resolution when estimating the position of two close pulses. In our case, with t_min = 0.2 s, the chosen value is σ = 0.08 s. In case the minimum distance is not known, then σ must be set according to the required resolution. For example, if two particular pulses are very close together with respect to the σ chosen in the kernel (or equivalently with respect to the required resolution), the algorithm will give a unique pulse located in the vicinity of the two original pulses, with an amplitude close to the sum of their amplitudes. Furthermore, in order to satisfy the time constraint considered in Section 3.1, the first sample is taken (3.5σ − T) seconds before the occurrence of the first Dirac pulse, whereas the last sample is taken (3.5σ − T) seconds after the occurrence of the last Dirac pulse (i.e., t_a = 3.5σ). The sampling interval T is obtained as (N + 1)T = 20 + 2t_a, where N is the number of samples. Finally, the Hankel matrix in expression (23) is constructed by setting L to zero and the value l_lim obtained from expression (24) with n_c set to 2.
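The experiment setup just described can be sketched as follows. This is our own helper (the placement of the first pulse relative to t = 0 is our assumption, since only the spacing law is specified above):

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_experiment(N=1000, t_min=0.2, sigma=0.08, duration=20.0):
    """Random stream of Diracs as in Section 4: spacings ~ exp(1) + t_min,
    amplitudes ~ U[1, 2], K = number of pulses inside the 20 s interval."""
    gaps = t_min + rng.exponential(1.0, size=100)
    t = np.cumsum(gaps)
    t = t[t < duration]                       # pulses inside the interval
    c = rng.uniform(1.0, 2.0, size=len(t))
    ta = 3.5 * sigma                          # guard time of Section 3.1
    T = (duration + 2 * ta) / (N + 1)         # (N+1) T = 20 + 2 t_a
    # first sample taken (3.5*sigma - T) seconds before the first pulse
    grid = t[0] - (ta - T) + np.arange(N) * T
    # Gaussian-filtered samples z(nT), as in (10)
    y = sum(ck * np.exp(-(grid - tk)**2 / (2 * sigma**2)) for ck, tk in zip(c, t))
    return y, T, t, c

y, T, t, c = generate_experiment()
```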

4.1 Reconstruction of the signal

In what follows, it is assumed that the rate of innovation, i.e., the number of pulses K, is known. At the end of this section an estimation method of

this rate is proposed. Since the signal of interest is a stream of Dirac pulses, x(t) will be perfectly reconstructed once the time instants and the amplitudes of these pulses are known. Therefore, in order to assess the accuracy of the proposed algorithm, the following estimation errors are used:

E_t = (1/E[t_{k+1} − t_k]²) · (1/K) Σ_{k=1}^{K} (t̂_k − t_k)²,    E_c = Σ_{k=1}^{K} (ĉ_k − c_k)² / Σ_{k=1}^{K} c_k²,  (33)

where t̂_k and ĉ_k are the estimates of the time instants t_k and of the amplitudes c_k, respectively. We will also denote by Êt and Êc the errors E_t and E_c averaged over 1000 realizations of the experiment.

We begin by corroborating the statement made in Section 3.1 that the accuracy of the estimation improves as the size of the Hankel matrix Y increases (i.e., as l_lim increases), as long as the average SNR[l] in Y^M[l] is large for each index l. To show this, in Fig. 1(a) we plot Êt versus l_lim in the hypothetical case where the average SNR[l] in expression (27) is nearly constant with l. This is done by assuming that Y^M[l] is given by Y^M[l] = Z^M[l] + T · W_DFT[l] instead of by its actual value in expression (21). Since the expected values of Z^M[l] and W_DFT[l] stay nearly constant with respect to l, so does SNR[l]. However, we emphasize that this is done just for theoretical purposes, since the actual random variables {W_DFT[l]} are also multiplied by the factor (T/(σ√(2π))) · e^{+σ²2π²l²/(NT)²}.

On the other hand, in Fig. 1(b), Êt is plotted versus l_lim when the Hankel matrix is constructed with the actual Y^M[l] values in (21). As stated in Section 3.1, Fig. 1(b) clearly shows that there exists an optimum l_lim value at which the estimation error E_t is minimized.

Fig. 1. Êt (a) when the DFT of the noise W_DFT is directly added to Z^M, and (b) when W_DFT is scaled like Z_DFT to obtain Y^M.

The second set of simulations focuses on the performance of the proposed algorithm. To that end, Figs. 2(a) and (b) plot Êt and Êc versus the SNR in

the samples for different numbers of samples N, respectively. Notice that the logarithms of Êt and Êc decrease linearly with the SNR (in dB) for SNR > 10 dB.

Fig. 2. (a) Êt and (b) Êc versus the SNR in the samples for different sample sizes N.

Although the above results will vary if some of the parameters of the experiment are changed, the accuracy of the reconstruction algorithm can be kept roughly the same if the parameters of the algorithm are modified accordingly. For instance, if t_min is decreased, the σ in the Gaussian kernel should be decreased to keep the same resolution between pulses; at the same time, the number of samples N should be increased.

Fig. 3. (a) Êt vs t_a for a Gaussian kernel and a sinc kernel in the noiseless case, and (b) Êt vs the samples' SNR for the same two kernels.

In Section 3.1 a Gaussian kernel was chosen. However, looking at expression (14), nothing prevents one from using any other bandlimited kernel function, for instance the sinc function widely used for periodic signals. In this case, and in contrast to what happens with a Gaussian kernel, the SNR[l] of the Y^M[l] entries in matrix Y would not vary with l. However, as mentioned in Section 3.1, the sinc kernel has the drawback that it does not decrease as fast as the Gaussian kernel. This in turn affects the accuracy of the algorithm even if higher t_a values are selected. For the noiseless case and with N = 288 (the lowest N satisfying the condition 3.5/σ ≤ π/T), Fig. 3 compares the performance obtained with the two kernels.

4.2 Estimation of the rate of innovation

Let {λ_k} denote the singular values of the Hankel matrix Y, placed in non-increasing order. An estimate of K is given by the number of singular values that exceed a threshold T, which in the noiseless case can be defined as

T = (1/2) [ sup{ξ ∈ R⁺ : Σ_{k=1}^{∞} Pr{λ_k > ξ | K = k} · Pr{K = k} = 1} + inf{ξ ∈ R⁺ : Σ_{k=1}^{∞} Pr{λ_{k+1} ≤ ξ | K = k} · Pr{K = k} = 1} ].  (34)

This threshold will guarantee no errors. The larger T is, the more robust the above estimation will be in the presence of observation noise. This definition, however, does not take into consideration the effect of noise. A more appropriate definition for the threshold T in the presence of noise would be as follows. Given an arbitrarily small ε ≪ 1, find Λ0 and Λ1 as

Λ0(ε) = sup { ξ ∈ R⁺ : Σ_{k=1}^{∞} Pr{λ_k > ξ | K = k} · Pr{K = k} = 1 − ε },  (35)

Λ1(ε) = inf { ξ ∈ R⁺ : Σ_{k=1}^{∞} Pr{λ_{k+1} ≤ ξ | K = k} · Pr{K = k} = 1 − ε },  (36)

where λ_k and λ_{k+1} are the singular values located at positions k and k + 1, respectively. Then, define T as either the arithmetic mean (T(ε) = (Λ0(ε) + Λ1(ε))/2) or the geometric mean (T(ε) = √(Λ0(ε)Λ1(ε))) of Λ0(ε) and Λ1(ε). Notice that whenever Λ0(0) > Λ1(0), an estimator of K based on this threshold will render a probability of error P_e = Pr(K̂ ≠ K) equal to zero. Otherwise, P_e = Pr({λ_K < T(ε)} ∪ {λ_{K+1} > T(ε)}). Notice also that, in the absence of noise, the threshold in expression (34) and T(ε) = (Λ0(ε) + Λ1(ε))/2 with ε = 0 coincide.

Fig. 5. ζ versus the samples' SNR for different N values.

An analytical derivation of Λ0(ε) and Λ1(ε) is difficult; therefore, we have resorted to estimating Λ0(ε) and Λ1(ε) by computer simulation, based again on 1000 realizations of the defined experiment. From these results, the estimates Λ̂0 and Λ̂1 have been computed in such a way that the ε satisfying Λ̂0 = Λ0(ε) and Λ̂1 = Λ1(ε) lies in the confidence interval (0, 0.01) with a confidence level of 99%. Fig. 5 plots

ζ = 10 log10( Λ̂0 / Λ̂1 )   (37)

versus the average SNR of the samples for different N values. Obviously, the higher ζ is, the lower the error probability P_e will be.
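The Monte Carlo estimation of Λ̂0 and Λ̂1 can be sketched as follows. This is a sketch under assumed toy distributions and matrix sizes (the helper `singular_values_one_run`, the kernel-free exponential model, and all parameters are stand-ins, not the paper's actual experiment):

```python
import numpy as np

rng = np.random.default_rng(0)

def singular_values_one_run(rng, N=64, L=10, sigma_w=0.01, K_max=5):
    """One realization of a toy experiment: K random Diracs observed
    through noisy exponential samples; returns K and the singular
    values of the associated Hankel matrix."""
    K = int(rng.integers(1, K_max + 1))
    t = rng.random(K)                       # normalized locations
    c = 1.0 + rng.random(K)                 # amplitudes
    l = np.arange(N)
    y = np.zeros(N, dtype=complex)
    for ck, tk in zip(c, t):
        y += ck * np.exp(-2j * np.pi * l * tk)
    y += sigma_w * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    M = N - L + 1
    Y = np.array([y[i:i + M] for i in range(L)])   # Hankel matrix
    return K, np.linalg.svd(Y, compute_uv=False)   # descending order

# Empirical estimates with eps = 0: Lambda0 must stay below lambda_K in
# every run, Lambda1 must stay above lambda_{K+1} in every run.
lam_K, lam_K1 = [], []
for _ in range(1000):
    K, s = singular_values_one_run(rng)
    lam_K.append(s[K - 1])                  # lambda_K
    lam_K1.append(s[K])                     # lambda_{K+1}
Lambda0_hat, Lambda1_hat = np.min(lam_K), np.max(lam_K1)
zeta = 10 * np.log10(Lambda0_hat / Lambda1_hat)    # expression (37)
print(zeta)
```

A positive ζ means the smallest "signal" singular value stays above the largest "noise" singular value across realizations, so thresholding between Λ̂1 and Λ̂0 separates the two groups.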

5 Conclusions

A novel algorithm, based on the state space method, for the reconstruction of certain aperiodic finite rate of innovation signals from their filtered samples has been proposed. In contrast to previous reconstruction algorithms, which rely on a time-domain formulation, the proposed algorithm works in the frequency domain. In this way, it avoids the computational problems that may arise from the use of ill-conditioned Hankel matrices. Furthermore, it has also been shown that the state space method allows estimating the rate of innovation before applying the reconstruction algorithm, by examining the magnitudes of the singular values of the Hankel matrix involved in the algorithm.

A Appendix

To show that the Z^M[l] values are of similar order of magnitude, let us first calculate the mean and the variance of Z^M[l]:

E[Z^M[l]] = E[ Σ_{k=1}^{K} c_k e^{−j2πlt_k/(NT)} ] = E[c] Σ_{k=1}^{K} E[ e^{−j2πlt_k/(NT)} ],   (A.1)

Var[Z^M[l]] = E[Z^M[l] Z^M[l]*] − E[Z^M[l]] · E[Z^M[l]]*
            = K · E[c²] + E[c]² Σ_{k=1}^{K} Σ_{m=k+1}^{K} E[ 2 cos( 2πl(t_m − t_k)/(NT) ) ]
              − E[c]² Σ_{k=1}^{K} Σ_{m=1}^{K} E[ e^{−j2πlt_k/(NT)} ] E[ e^{+j2πlt_m/(NT)} ],   (A.2)

since

E[Z^M[l] Z^M[l]*] = Σ_{k=1}^{K} Σ_{m=1}^{K} E[ c_k c_m e^{j2πl(t_m − t_k)/(NT)} ]
                  = K · E[c²] + (E[c])² Σ_{k=1}^{K} Σ_{m=k+1}^{K} E[ 2 cos( 2πl(t_m − t_k)/(NT) ) ],   (A.3)

where it has been assumed that the {c_k} random variables (r.v.) are i.i.d. and independent of the time-instant r.v. {t_k}. Without any loss of generality, assume that t_m > t_k and t_a = 0 (see Section 3.1). By definition, the expectation of the cosine in the previous expression is

E[ 2 cos( 2πl(t_m − t_k)/(NT) ) ] = ∫_0^{(N−1)T} 2 cos( 2πlt_d/(NT) ) p_{k,m}(t_d) dt_d,   (A.4)

where t_d = t_m − t_k and p_{k,m}(t_d) is the probability density function of t_d.

Proposition 2. Let {p_{k,m}(t_d)} be a family of differentiable probability density functions with bounded first derivatives 8, that is, |(d/dt) p_{k,m}(t)| < D < ∞ for all t ∈ (0, (N − 1)T). Then, as |l| grows without bound, the expected value in (A.4) converges to zero. That is,

lim_{|l|→∞} E[ 2 cos( 2πl(t_m − t_k)/(NT) ) ] = 0.   (A.5)

PROOF. Intuitively, as the period of the cosine diminishes (l → ∞), its positive and negative half-cycles should cancel each other. Next, we prove this formally. Consider one cycle C of the cosine in the above integral. Its period is NT/l, and an upper bound on the contribution of this cycle to the integral can be derived as follows. Denote by p_m the maximum value of p_{k,m}(t_d) on this particular cycle C. Then

∫_C cos( 2πlt′/(NT) ) p_{k,m}(t′) dt′ = ∫_{−NT/(4l)}^{3NT/(4l)} cos( 2πlt/(NT) ) p_{k,m}(t) dt
≤ ∫_{−NT/(4l)}^{NT/(4l)} cos( 2πlt/(NT) ) p_m dt + ∫_{NT/(4l)}^{3NT/(4l)} cos( 2πlt/(NT) ) ( p_m − (NT/l) D ) dt
= ( (NT)² / (πl²) ) D,   (A.6)

where the inequality is obtained by assuming that p_{k,m}(t_d) takes its maximum value where the cosine is positive and its minimum value where the cosine is negative. Under the hypothesis that |(d/dt) p_{k,m}(t)| < D, and considering the least favorable case where p_{k,m}(t_d) decreases linearly with slope D on t ∈ C, this minimum value is p_m − (NT/l) D.

8 In the usual case where the r.v. t_k − t_{k−1} has a Gamma distribution (the exponential distribution considered in the simulations is a special case of these distributions), the pdf p_{k,m}(t) will also be a Gamma density, which is a differentiable function satisfying |(d/dt) p_{k,m}(t)| < D for all t ∈ (0, (N − 1)T).

On the other hand, for large |l|, the total number of cycles in the interval of length (N − 1)T is approximately l. Therefore,

∫_0^{(N−1)T} cos( 2πlt_d/(NT) ) p_{k,m}(t_d) dt_d ≤ ( (NT)² / (π|l|) ) D.   (A.7)

Similarly, a lower bound can be derived by the same procedure, yielding

−( (NT)² / (π|l|) ) D ≤ ∫_0^{(N−1)T} cos( 2πlt_d/(NT) ) p_{k,m}(t_d) dt_d ≤ ( (NT)² / (π|l|) ) D.   (A.8)

Therefore, since NT and D are finite constants,

lim_{|l|→∞} E[ 2 cos( 2πl(t_m − t_k)/(NT) ) ] = 0.   (A.9)

Since Proposition 2 also holds when the cosine is replaced by a sine, the complex exponential also converges to zero. Therefore, if the probability density functions of t_i (i = 1, …, K) and of (t_i − t_j) (j ≠ i) are differentiable with bounded first derivative on (0, (N − 1)T), the expectations of the cosine and of the complex exponentials in (A.1) and (A.2) converge to zero as |l| grows to infinity. Then, the mean and the variance of Z^M[l] converge to

lim_{|l|→∞} E[Z^M[l]] = 0   and   lim_{|l|→∞} Var[Z^M[l]] = K E[c²].

On the other hand, these statistics can be upper-bounded by

|E[Z^M[l]]| ≤ K E[c]   and   Var[Z^M[l]] ≤ K E[c²] + K(K − 1) E[c]².
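A quick Monte Carlo sketch (with assumed uniform distributions for the amplitudes c_k and locations t_k, and arbitrary parameters) illustrates that Var[Z^M[l]] indeed approaches K·E[c²] for large |l|:

```python
import numpy as np

# For Z^M[l] = sum_k c_k exp(-j 2 pi l t_k / (N T)), the variance should
# approach K * E[c^2] once |l| is large enough for the expectations of the
# complex exponentials to vanish.
rng = np.random.default_rng(2)
N, T, K, l = 256, 1.0, 4, 120
trials = 100_000
c = 1.0 + rng.random((trials, K))           # i.i.d. amplitudes in (1, 2)
t = rng.random((trials, K)) * (N - 1) * T   # i.i.d. locations in (0, (N-1)T)
Z = np.sum(c * np.exp(-2j * np.pi * l * t / (N * T)), axis=1)
var_Z = np.mean(np.abs(Z) ** 2) - np.abs(np.mean(Z)) ** 2
print(var_Z, K * np.mean(c ** 2))           # the two agree closely
```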

Next, we calculate the following two ratios:

max_l Var[Z^M[l]] / lim_{|l|→∞} Var[Z^M[l]] = ( E[c²] + (K − 1) E[c]² ) / E[c²] ≤ K,   (A.10)

where the inequality holds because E[c²] ≥ E[c]², and

max_l Var[Z^M[l]] / max_l |E[Z^M[l]]|² = E[c²] / ( K E[c]² ) + (K − 1)/K.   (A.11)

Let us consider the common case where the number of impulses K is not too large and the variance of the amplitudes is not high (i.e., E[c²] is of the order of E[c]²). Then the variance of Z^M[l] does not vary over many orders of magnitude. In addition, in spite of the large variation of E[Z^M[l]] with l, the ratio (A.11) shows that the maximum of the mean and the maximum of the variance are of similar order; consequently, the order of magnitude of Z^M[l] does not change significantly with l.

Proposition 3. Under the assumption that the distributions of (t_m − t_k) are differentiable with bounded first derivative, the signal-to-noise ratio (SNR) of the Y^M[l] variables,

SNR[l] = E[ Z^M[l] Z^M[l]* ] / E[ W^M[l] W^M[l]* ],   (A.12)

decreases exponentially with |l|².

PROOF. From the orthogonality of the DFT transformation and the i.i.d. assumption on the noise, the average energy of W_DFT[l] does not depend on l and is given by

E[ W_DFT[l] W_DFT[l]* ] = E[ ( Σ_{n=0}^{N−1} w[n] e^{−j2πnl/N} ) ( Σ_{m=0}^{N−1} w[m] e^{+j2πml/N} ) ]
= Σ_{n=0}^{N−1} Σ_{m=0}^{N−1} E[ w[n] w[m] ] e^{j2π(m−n)l/N} = Σ_{n=0}^{N−1} σ_w² = N σ_w².   (A.13)

Therefore, since W^M[l] = √T · W_DFT[l] · e^{+σ²2π²l²/(NT)²}, E[W^M[l] W^M[l]*] grows exponentially with |l|². On the other hand, the average energy of Z^M[l] is given in expression (A.3), and one can see that this energy oscillates around K E[c²]. Furthermore, under the assumption of the proposition, Proposition 2 shows that the amplitude of this oscillation decreases as |l| increases. Therefore, the SNR of Y^M[l] decreases exponentially with |l|².
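Expression (A.13) is easy to verify numerically. The following sketch (with arbitrary N and σ_w, chosen only for illustration) averages |W_DFT[l]|² over many noise realizations:

```python
import numpy as np

# Sanity check of (A.13): for i.i.d. noise w[n] with variance sigma_w^2,
# E[|W_DFT[l]|^2] = N * sigma_w^2, independently of l.
rng = np.random.default_rng(3)
N, sigma_w, trials = 64, 0.5, 50_000
w = sigma_w * rng.standard_normal((trials, N))
W = np.fft.fft(w, axis=1)                    # W[l] = sum_n w[n] e^{-j2pi nl/N}
energy = np.mean(np.abs(W) ** 2, axis=0)     # average over realizations
print(energy[0], energy[17], N * sigma_w**2) # each approximately 16
```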

References

[1] H. Nyquist, "Certain topics in telegraph transmission theory", Proceedings of the IEEE, vol. 90, no. 2, pp. 280-305, February 2002.

[2] C. E. Shannon, "A Mathematical Theory of Communication", Bell Syst. Tech. Journal, vol. 27, 1948.

[3] A. J. Jerri, "The Shannon Sampling Theorem - Its Various Extensions and Applications: A Tutorial Review", Proceedings of the IEEE, vol. 65, pp. 1565-1596, November 1977.

[4] M. Vetterli, P. Marziliano and T. Blu, "Sampling Signals With Finite Rate of Innovation", IEEE Transactions on Signal Processing, vol. 50, no. 6, pp. 1417-1428, June 2002.

[5] P. L. Dragotti, M. Vetterli and T. Blu, "Sampling Moments and Reconstructing Signals of Finite Rate of Innovation: Shannon Meets Strang-Fix", IEEE Transactions on Signal Processing, vol. 55, no. 5, pp. 1741-1757, 2007.

[6] P. Stoica and R. Moses, "Introduction to Spectral Analysis", Englewood Cliffs, NJ: Prentice-Hall, 2000.

[7] I. Maravić and M. Vetterli, "Sampling and Reconstruction of Signals with Finite Rate of Innovation in the Presence of Noise", IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2788-2805, August 2005.

[8] M. D. Zoltowski and D. Stavrinides, "Sensor array signal processing via a Procrustes rotations based eigen-analysis of the ESPRIT data pencil", IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, pp. 832-861, June 1989.

[9] S. Y. Kung, K. S. Arun and D. V. Bhaskar Rao, "State-space and singular value decomposition based approximation method for the harmonic retrieval problem", J. Opt. Soc. Amer., vol. 73, no. 12, December 1983.

[10] V. Y. F. Tan and V. K. Goyal, "Estimating Signals with Finite Rate of Innovation from Noisy Samples: A Stochastic Algorithm", IEEE Transactions on Signal Processing, vol. 56, no. 10, pp. 5135-5146, October 2008.

[11] A. Erdozain and P. M. Crespo, "A New Stochastic Algorithm Inspired on Genetic Algorithms to Estimate Signals with Finite Rate of Innovation from Noisy Samples", Signal Processing, vol. 90, pp. 134-144, January 2010.

[12] D. W. Tufts and R. Kumaresan, "Estimation of frequencies of multiple sinusoids: Making linear prediction perform like maximum likelihood", Proc. IEEE, vol. 70, pp. 975-989, September 1982.

[13] Y. Hua and T. K. Sarkar, "Matrix Pencil Method for Estimating Parameters of Exponentially Damped/Undamped Sinusoids in Noise", IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 38, no. 5, pp. 814-824, May 1990.

[14] G. H. Golub and C. F. Van Loan, "Matrix Computations", Johns Hopkins University Press, 1983.

[15] S. S. Bayin, "Mathematical Methods in Science and Engineering", Wiley, 2006.