Proceedings of the 19th International Conference on Digital Signal Processing

20-23 August 2014

An Analog Network Approach to Train RBF Networks Based on Sparse Recovery

Ruibin Feng and Chi-Sing Leung
Dept. of Electronic Engineering, City University of Hong Kong, Hong Kong
Email: [email protected]

A. G. Constantinides
Imperial College, UK

Abstract—The local competition algorithm (LCA) is an analog neural approach for compressed sensing. It is used to recover a sparse signal from a set of measurements. Unlike some traditional numerical methods, which produce many elements with small magnitude, the LCA automatically sets unimportant elements to zero. This paper formulates the training process of radial basis function (RBF) networks as a compressed sensing problem. We then apply the LCA to train RBF networks. The proposed LCA-RBF approach can select important RBF nodes during training. Since the proposed approach can limit the magnitude of the trained weights, it also has a certain ability to handle RBF networks with multiplicative weight noise.

Index Terms—Local Competition Algorithm, Fault Tolerance, RBF Networks

I. INTRODUCTION

The radial basis function (RBF) approach has been successfully applied in many domains [1]–[3]. One of the important issues in the RBF approach is the selection of RBF centers (nodes). The trivial approach is to use the inputs of all training samples as the RBF centers [4], but this often results in an ill-conditioned solution. Apart from the trivial solution, we can select the RBF centers randomly from the training set [5]. Another approach is to select the centers based on a clustering algorithm [1]. In the orthogonal least squares (OLS) approach [6], all training samples are taken as the RBF centers. The OLS algorithm then sorts the RBF centers based on their mean square error (MSE) performance. Afterwards, designers select the important centers from the sorted list. However, few RBF selection approaches consider the network fault situation. In the implementation of artificial neural networks, weight failure cannot be avoided [7]–[9]. In the digital implementation of a trained neural network, the loss of precision in the trained weights can be modelled as multiplicative noise [10], [11]. Since the network output is very sensitive to large weights [12], [13], one simple but effective approach is to limit the weight magnitude. However, the weight decay approach does not provide a mechanism to select the important RBF centers.

Sparse approximation has been an active research area in the last decade. There are many digital (numerical) algorithms, such as the greedy matching pursuit (MP) algorithm [14] and interior-point-type methods for solving basis pursuit denoising (BPDN) [15]. In [16], the authors proposed an analog method, namely the local competition algorithm (LCA). The LCA is inspired by a biological neural model, and the convergent behavior of this model was studied in [17]. In this paper, we put the learning problem and the RBF center selection problem together. We formulate the problem as a sparse approximation problem. Afterwards, we use the LCA to train the RBF network as well as to select the RBF centers. Since the LCA limits the weight magnitude in the l1-norm sense, the trained RBF networks also have a certain ability to handle weight failure.

The rest of this paper is organized as follows. In Section II, we briefly describe the background on the sparse approximation problem and the RBF approach. In Section III, the proposed LCA-RBF approach is presented. Section IV presents the simulation results. Finally, we draw the conclusion in Section V.


II. BACKGROUND

A. Sparse Approximation

In sparse approximation, we would like to estimate a sparse solution w ∈ ℝ^M of a system, given by

    b = Φw,   (1)

where b ∈ ℝ^N is the observation vector and Φ is an N × M matrix with rank N, where N < M. More precisely, the sparse approximation problem is defined as

    min ‖w‖₀   (2a)
    subject to b = Φw.   (2b)

Unfortunately, Problem (2) is NP-hard [18]. In [15], a practical and well-known approach, called basis pursuit denoising (BPDN), formulates the problem as an unconstrained optimization problem, given by

    min_w (1/2)‖y − Φw‖₂² + λ‖w‖₁,   (3)

where λ is a trade-off parameter. The most common solver is the interior-point-type method. In [16], the LCA was proposed to solve (3). It is inspired by the operation of neural systems. Using an analog neural circuit to solve an optimization problem has been studied over


many decades [19]–[22]. When a real-time solution is needed, the analog neural approach is more effective. In the analog neural approach, we do not solve the optimization problem on a digital computer. Instead, we set up an associated neural circuit for the optimization problem. After the neural circuit settles down at one of its equilibrium points, the solution is obtained by measuring the neuron output voltages at this stable equilibrium point. In the LCA, an energy function is defined as

    L = (1/2)‖y − Φw‖₂² + λ‖w‖₁.   (4)

A neural network with M neurons is used to store the variables w. Since the term ‖w‖₁ is non-differentiable with respect to w, a set of internal states u_i(t), i = 1, ..., M, is introduced. The dynamics of the internal states are given by

    u̇ = du/dt = Φᵀy − (ΦᵀΦ − I)w − u,   (5)

    w_i = T_λ(u_i) = { 0                    for |u_i| ≤ λ
                     { u_i − λ sign(u_i)    for |u_i| > λ.   (6)

In the LCA, w is the output state vector, u is the internal state vector, T_λ(·) is the activation function, and (ΦᵀΦ − I) is the interconnection matrix. In the LCA, the norms of the column vectors in Φ are equal to 1. Hence the diagonal elements of (ΦᵀΦ − I) are equal to zero. The dynamics of the neural circuit aim at minimizing L. The proof of convergence is provided in [15], [17].
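To make the dynamics (4)–(6) concrete, the following is a minimal numerical sketch of the LCA in Python (not code from the paper): it integrates (5) with a forward-Euler step and applies the soft-threshold activation (6). The step size, the number of iterations, and the problem sizes are illustrative assumptions.

    import numpy as np

    def soft_threshold(u, lam):
        # Activation T_lambda(.) in (6): zero inside [-lam, lam], shrink outside.
        return np.where(np.abs(u) <= lam, 0.0, u - lam * np.sign(u))

    def lca(Phi, y, lam, dt=0.01, n_steps=5000):
        # Forward-Euler integration of the LCA dynamics (5)-(6).
        # Phi: (N, M) dictionary with unit-norm columns (N < M); y: (N,) observations.
        M = Phi.shape[1]
        u = np.zeros(M)                    # internal states u_i(t)
        drive = Phi.T @ y                  # constant input term Phi^T y
        G = Phi.T @ Phi - np.eye(M)        # interconnection matrix (zero diagonal)
        for _ in range(n_steps):
            w = soft_threshold(u, lam)     # output states, (6)
            u += dt * (drive - G @ w - u)  # (5): du/dt = Phi^T y - (Phi^T Phi - I) w - u
        return soft_threshold(u, lam)

    # Illustrative use: recover a 5-sparse vector from 64 noiseless measurements.
    rng = np.random.default_rng(0)
    Phi = rng.standard_normal((64, 256))
    Phi /= np.linalg.norm(Phi, axis=0)     # unit-norm columns, as required by the LCA
    w_true = np.zeros(256)
    w_true[rng.choice(256, size=5, replace=False)] = 1.0
    w_hat = lca(Phi, Phi @ w_true, lam=0.05)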

B. RBF networks under multiplicative weight noise

In the RBF approach, we have a training set, denoted as D_t = {(x_j, y_j) : x_j ∈ ℝ^K, y_j ∈ ℝ, j = 1, ..., N}, where x_j and y_j are the input and output of the j-th sample, respectively. The input dimension is equal to K. The outputs are generated by an unknown stochastic system, given by

    y_j = f(x_j) + e_j,   (7)

where f(·) is a nonlinear function, and the e_j's are independent and identically distributed zero-mean Gaussian random variables with variance σ_e². In the RBF approach, the unknown system f(·) is approximated by

    f̂(x) = Σ_{i=1}^{M} w_i φ_i(x),   (8)

where the w_i's are the RBF weights, φ_i(x) = exp(−‖x − c_i‖²/Δ) is the i-th basis function, the c_i's are the RBF centers, and M is the number of RBF nodes. The parameter Δ controls the width of the RBF kernels. The training set error is given by

    E(D_t) = (1/2) Σ_{j=1}^{N} (y_j − Σ_{i=1}^{M} w_i φ_i(x_j))².   (9)

In the implementation of neural networks, weight faults take place unavoidably [9], [13]. For example, the finite precision of the trained weights in the implementation introduces multiplicative weight noise. In the weight noise situation, an implemented weight is given by

    w̃_i = w_i + β_i w_i,   ∀ i = 1, ..., N,   (10)

where the multiplicative fault factors β_i are used to model the behavior of the multiplicative weight noise. They are independent and identically distributed zero-mean random variables with variance σ_β². From (9) and (10), the training set error of a faulty network is given by

    E_β = (1/2)‖y − Ψw̃‖₂²,   (11)

where Ψ is the matrix of basis function values with entries Ψ_{ji} = φ_i(x_j) (cf. (16)). Taking the expectation over the β_i's, the average training set error of faulty networks is given by

    Ē_β = (1/2)‖y − Ψw‖₂² + (1/2)wᵀGw,   (12)

where

    G = σ_β² Σ_{j=1}^{N} diag(φ_1²(x_j), φ_2²(x_j), ..., φ_M²(x_j)).   (13)

From (10) and (12), the effect of the multiplicative weight noise depends on the magnitude of the w_i's. Hence the effect of the multiplicative weight noise can be suppressed by limiting the weight magnitude. However, the weight decay approach [13] does not provide a mechanism to select the important RBF centers.
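The following Python sketch (an illustration, not part of the paper) evaluates the closed-form average error (12)–(13) for a trained weight vector and checks it against a Monte-Carlo average of (11). The function names are assumptions of this sketch, as is drawing the β_i's from a Gaussian distribution: the paper only requires zero mean and variance σ_β².

    import numpy as np

    def avg_faulty_error(Psi, y, w, sigma_beta2):
        # Average training error of faulty networks, (12) with G from (13).
        # Psi: (N, M) matrix with entries phi_i(x_j); w: (M,) trained weights.
        residual = y - Psi @ w
        g_diag = sigma_beta2 * np.sum(Psi**2, axis=0)   # diagonal of G in (13)
        return 0.5 * residual @ residual + 0.5 * np.sum(g_diag * w**2)

    def monte_carlo_faulty_error(Psi, y, w, sigma_beta2, n_trials=2000, seed=0):
        # Empirical average of (11) under the multiplicative weight noise model (10).
        rng = np.random.default_rng(seed)
        errs = np.empty(n_trials)
        for k in range(n_trials):
            beta = rng.normal(0.0, np.sqrt(sigma_beta2), size=w.shape)  # assumed Gaussian
            w_tilde = w + beta * w                                      # (10)
            r = y - Psi @ w_tilde
            errs[k] = 0.5 * r @ r
        return errs.mean()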

III. LCA-RBF APPROACH

If we use all the training samples as the RBF centers, we can rewrite (8) as

    f(x) ≈ f̂(x, w) = Σ_{i=1}^{N} w_i φ_i(x),   (14a)

    φ_i(x) = exp(−‖x − x_i‖²/Δ).   (14b)

Then, the training objective becomes

    E = (1/2) Σ_{j=1}^{N} (y_j − Σ_{i=1}^{N} w_i φ_i(x_j))² = (1/2)‖y − Ψw‖₂²,   (15)

where

    Ψ = [ φ_1(x_1)  φ_2(x_1)  ···  φ_N(x_1) ]
        [ φ_1(x_2)  φ_2(x_2)  ···  φ_N(x_2) ]
        [    ⋮          ⋮       ⋱      ⋮    ]
        [ φ_1(x_N)  φ_2(x_N)  ···  φ_N(x_N) ].   (16)

Since using all the training samples as the RBF centers often results in an ill-conditioned solution, we need a training algorithm that automatically selects the RBF nodes. Borrowing the concept from the LCA [16], we can define a new energy function, given by

    L̃ = (1/2)‖y − Ψw‖₂² + λ‖w‖₁,   (17)


where λ‖w‖₁ is an additional term that penalizes the weight magnitude. One may suggest that we can set the dynamics of w as dw/dt = −∂L̃/∂w, so that the energy function L̃ is minimized. However, since ‖w‖₁ is not a smooth function, we cannot use an analog neural circuit to define the dynamics of w directly. Considering the concept of the subgradient, the gradient of L̃ is given by

    ∂L̃/∂w = −Ψᵀ(y − Ψw) + λ∂‖w‖₁
           = −Ψᵀ(y − Ψw) − w + w + λ∂‖w‖₁,   (18)

where ∂(·) is the subgradient operator [23]. Instead of directly implementing the dynamics of w, we introduce u as the internal state vector of the analog network, given by

    u = w + λ∂‖w‖₁.   (19)

Hence (18) becomes

    ∂L̃/∂w = −Ψᵀ(y − Ψw) − w + u.   (20)

To decrease the energy, the dynamics u̇ = du/dt is set to

    u̇ = −∂L̃/∂w = Ψᵀy − (ΨᵀΨ − I)w − u.   (21)

From (19), the mapping [16] from u to w is given by

    w_i = T_λ(u_i) = { 0                    for |u_i| ≤ λ
                     { u_i − λ sign(u_i)    for |u_i| > λ.   (22)

Equation (21) gives the dynamics to train an RBF network with the selection of RBF centers during training. The advantage of using (21) is that we do not need to implement ∂w_i. Note that "∂w_i" at w_i = 0 is not a single value but the set [−1, 1]. Hence it is difficult to implement ∂w_i in an analog circuit. In the LCA for recovering sparse signals, the matrix Φ is an N × M matrix (N < M) and the diagonal elements of (ΦᵀΦ − I) are equal to zero. In the LCA-RBF approach for training RBF networks, the matrix Ψ is a square matrix and the diagonal elements of (ΨᵀΨ − I) are not equal to zero. Despite this difference between the LCA and the LCA-RBF, one can follow the proof of the LCA convergence [15], [17] to prove the LCA-RBF convergence.
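As an illustration of (14)–(22) (not code from the paper), the sketch below builds the Gaussian design matrix Ψ of (16) with kernel (14b) and integrates the LCA-RBF dynamics (21)–(22) with a forward-Euler step. The function names, step size, and iteration count are assumptions.

    import numpy as np

    def rbf_design_matrix(X, delta):
        # Psi with entries Psi[j, i] = phi_i(x_j) = exp(-||x_j - x_i||^2 / delta), (14b)/(16).
        # X: (N, K) training inputs; every training sample is also an RBF center.
        d2 = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
        return np.exp(-d2 / delta)

    def soft_threshold(u, lam):
        # Mapping from internal states to output states, (22).
        return np.where(np.abs(u) <= lam, 0.0, u - lam * np.sign(u))

    def lca_rbf_train(X, y, lam, delta=1.0, dt=0.01, n_steps=20000):
        # Train an RBF network by integrating the LCA-RBF dynamics (21)-(22).
        Psi = rbf_design_matrix(X, delta)
        N = Psi.shape[0]
        u = np.zeros(N)
        drive = Psi.T @ y
        G = Psi.T @ Psi - np.eye(N)          # note: nonzero diagonal, unlike the LCA
        for _ in range(n_steps):
            w = soft_threshold(u, lam)
            u += dt * (drive - G @ w - u)    # (21): du/dt = Psi^T y - (Psi^T Psi - I) w - u
        return soft_threshold(u, lam), Psi

The entries of the returned weight vector that are exactly zero correspond to RBF centers pruned by the λ‖w‖₁ term; the remaining centers are the selected nodes.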

IV. SIMULATIONS

A. Data Sets and Network Setting

To verify our algorithm, we consider two data sets, the sinc function and the nonlinear autoregressive (NAR) time series [24], shown in Figure 1. In the sinc function example, the output is given by y = sinc(x) + e, where e is a zero-mean Gaussian noise with variance σ_e² = 0.01. The input x is distributed uniformly from −5 to 5. The training set and the test set both contain 200 samples. The RBF width Δ is set to 1.

Fig. 1. The two data sets (top: the sinc function example; bottom: the NAR example).

The NAR time series is generated by

    z(t) = (0.8 − 0.5 exp(−z²(t−1))) z(t−1) − (0.3 + 0.9 exp(−z²(t−1))) z(t−2) + 0.1 sin(πz(t−1)) + e(t),   (23)

where e(t) is a zero-mean Gaussian random variable with variance σ_e² = 0.01. The series is generated with z(−1) = z(0) = 0. The first 200 data points are used as the training set. The other 500 data points are used as the test set. Our RBF model is used to predict y_t = z(t) from the past observations x_t = [z(t−1), z(t−2)]ᵀ. The RBF width is equal to 0.1.
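For reference, a sketch of how the NAR series (23) and the training/test split described above might be generated; the function name and the exact indexing convention are assumptions of this sketch.

    import numpy as np

    def generate_nar(sigma_e2=0.01, seed=0):
        # NAR series of (23), started from z(-1) = z(0) = 0.
        rng = np.random.default_rng(seed)
        n = 702                                    # enough points for 200 + 500 samples
        z = np.zeros(n + 2)                        # z[0], z[1] stand for z(-1), z(0)
        for t in range(2, n + 2):
            z[t] = ((0.8 - 0.5 * np.exp(-z[t - 1]**2)) * z[t - 1]
                    - (0.3 + 0.9 * np.exp(-z[t - 1]**2)) * z[t - 2]
                    + 0.1 * np.sin(np.pi * z[t - 1])
                    + rng.normal(0.0, np.sqrt(sigma_e2)))
        z = z[2:]
        X = np.column_stack([z[1:-1], z[:-2]])     # x_t = [z(t-1), z(t-2)]^T
        y = z[2:]                                  # y_t = z(t)
        return (X[:200], y[:200]), (X[200:700], y[200:700])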

B. Fault-free Cases: Comparison with BPDN Solver

In this section, we compare our analog circuit approach with the digital BPDN algorithm (from L1Magic). The simulation results are shown in Figure 2 and Figure 4. It can be seen that the performances of the LCA-RBF and BPDN are quite similar. If we increase the value of λ, the MSE values increase. In addition, we record the magnitudes of the trained weights. The weight magnitudes of the two approaches are shown in Figure 3 and Figure 5, where the magnitude is calculated as log(|w_i| + 1). It can be seen that when a large λ is used, the trained RBF networks have fewer nonzero weights (i.e., fewer RBF nodes). Of course, if the value of λ is too large, the trained RBF networks have large MSE values, because they do not have enough approximation ability.
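A small sketch (an assumption-laden illustration, reusing the lca_rbf_train helper sketched earlier to obtain w) of the quantities reported in this subsection: training and test MSE, the number of selected RBF nodes, and the plotted weight magnitude log(|w_i| + 1). Taking the MSE as the per-sample mean of squared errors is an assumption of this sketch.

    import numpy as np

    def evaluate_lca_rbf(w, X_train, y_train, X_test, y_test, delta=1.0):
        # Fault-free MSE on training and test data plus the number of surviving RBF nodes.
        # The RBF centers are the training inputs, as in the LCA-RBF setting.
        def design(X, centers):
            d2 = np.sum((X[:, None, :] - centers[None, :, :])**2, axis=-1)
            return np.exp(-d2 / delta)              # kernel (14b)
        mse_train = np.mean((y_train - design(X_train, X_train) @ w)**2)
        mse_test = np.mean((y_test - design(X_test, X_train) @ w)**2)
        n_nodes = int(np.count_nonzero(w))          # RBF centers kept by the LCA-RBF
        magnitude = np.log(np.abs(w) + 1.0)         # quantity plotted in Figs. 3 and 5
        return mse_train, mse_test, n_nodes, magnitude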


Fig. 2. MSE versus λ for the sinc function example. (a) LCA-RBF algorithm. (b) BPDN algorithm. Note that λ is in the logarithmic scale.

Fig. 3. Weight magnitude when λ = 0.01 and 0.2 for the sinc function example. Magnitude is calculated by log(|w| + 1).

Fig. 4. MSE versus λ for the NAR example. (a) LCA-RBF algorithm. (b) BPDN algorithm. Note that λ is in the logarithmic scale.

Fig. 5. Weight magnitude when λ = 0.01 and 0.2 for the NAR example. Magnitude is calculated by log(|w| + 1).

Another observation is that the BPDN solver generates many non-zero weights. This phenomenon is a feature of numerical methods. In the sinc function example with λ = 0.01, BPDN generates more than 100 non-zero RBF weights. When the LCA-RBF is used, 30 RBF weights are enough for this problem. This is because in the LCA-RBF approach there is an activation function T_λ(·) applied to the u_i's. Although in the BPDN approach many weights are close to zero, we need a threshold to exclude "small" weights. This threshold value cannot be determined easily and has to be chosen in a case-by-case manner. On the other hand, the LCA-RBF approach does not generate many close-to-zero weights. It is clear that the LCA-RBF approach is an effective way to select important centers and train the network at the same time.

C. Faulty Case

In this section, we study the performance of the proposed LCA-RBF approach for faulty RBF networks. As a comparison, we also consider the OLS method [6], which is a traditional digital method for selecting RBF centers during training. The OLS method has a tuning parameter ρ. We compare the two methods under two different fault levels, σ_β² = {0.09, 0.25}. The two data sets, the sinc function and the NAR example, are considered. Figures 6 and 7 summarize the simulation results. From the figures, it can be seen that for small values of ρ, the OLS method cannot produce fault-tolerant RBF networks. Even for large values of ρ, the performance of the OLS method is still worse than that of the LCA-RBF approach. For instance, in the NAR example with the weight noise level σ_β² = 0.25, the MSE values of the OLS method are very large (> 10) for small values of ρ, and even for large values of ρ the MSE values are around 0.2028. For small values of λ, the LCA-RBF approach achieves an MSE value of only around 0.1. The OLS method does not have a mechanism to limit the weight magnitude, while the LCA-RBF approach does. Hence the performance of the LCA-RBF approach is better under the multiplicative weight noise situation.

V. CONCLUSION

This paper addressed the training of RBF networks based on the LCA concept. With the proposed LCA-RBF algorithm, the training process and the selection of RBF centers are merged into a single process. The proposed algorithm can select important RBF nodes during training. In addition, it has a certain ability to handle the multiplicative weight noise situation. Its performance is much better than that of the OLS algorithm.


Fig. 6. MSE of faulty networks for the sinc function example (LCA-RBF and OLS, σ_β² = 0.09 and 0.25).

Fig. 7. MSE of faulty networks for the NAR example (LCA-RBF and OLS, σ_β² = 0.09 and 0.25).


ACKNOWLEDGEMENT

The work was supported by an RGC General Research Fund from Hong Kong (Project No. CityU 115612).

REFERENCES

[1] S. Chen, "Nonlinear time series modelling and prediction using Gaussian RBF networks with enhanced clustering and RLS learning," Electronics Letters, vol. 31, no. 2, pp. 117–118, 1995.
[2] S. Fabri and V. Kadirkamanathan, "Dynamic structure neural networks for stable adaptive control of nonlinear systems," IEEE Transactions on Neural Networks, vol. 7, no. 5, pp. 1151–1167, 1996.
[3] A. Roy, S. Govil, and R. Miranda, "An algorithm to generate radial basis function (RBF)-like nets for classification problems," Neural Networks, vol. 8, no. 2, pp. 179–201, 1995. [Online]. Available: http://www.sciencedirect.com/science/article/pii/089360809400064S
[4] T. Poggio and F. Girosi, "Networks for approximation and learning," Proceedings of the IEEE, vol. 78, no. 9, pp. 1481–1497, 1990.
[5] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed. Upper Saddle River, NJ, USA: Prentice Hall PTR, 1998.
[6] S. Chen, C. F. N. Cowan, and P. Grant, "Orthogonal least squares learning algorithm for radial basis function networks," IEEE Transactions on Neural Networks, vol. 2, no. 2, pp. 302–309, 1991.
[7] J.-F. Sum, C. S. Leung, and K.-J. Ho, "On objective function, regularizer, and prediction error of a learning algorithm for dealing with multiplicative weight noise," IEEE Transactions on Neural Networks, vol. 20, no. 1, pp. 124–138, 2009.
[8] K.-J. Ho, C. S. Leung, and J. Sum, "Convergence and objective functions of some fault/noise-injection-based online learning algorithms for RBF networks," IEEE Transactions on Neural Networks, vol. 21, no. 6, pp. 938–947, Jun. 2010.
[9] C. S. Leung, H. J. Wang, and J. Sum, "On the selection of weight decay parameter for faulty networks," IEEE Transactions on Neural Networks, vol. 21, no. 8, pp. 1232–1244, 2010.
[10] T. Kaneko and B. Liu, "Effect of coefficient rounding in floating-point digital filters," IEEE Transactions on Aerospace and Electronic Systems, vol. AE-7, pp. 995–1003, 1970.
[11] B. Liu and T. Kaneko, "Error analysis of digital filters realized with floating-point arithmetic," Proceedings of the IEEE, vol. 57, pp. 1735–1747, 1969.
[12] J. Bernier, J. Ortega, M. Rodríguez, I. Rojas, and A. Prieto, "An accurate measure for multilayer perceptron tolerance to weight deviations," Neural Processing Letters, vol. 10, no. 2, pp. 121–130, 1999.
[13] C. S. Leung and J.-F. Sum, "RBF networks under the concurrent fault situation," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 7, pp. 1148–1155, Jul. 2012.
[14] S. Mallat and Z. Zhang, "Matching pursuits with time-frequency dictionaries," IEEE Transactions on Signal Processing, vol. 41, no. 12, pp. 3397–3415, Dec. 1993. [Online]. Available: http://dx.doi.org/10.1109/78.258082
[15] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM Journal on Scientific Computing, vol. 20, pp. 33–61, 1998.
[16] C. J. Rozell, D. H. Johnson, R. G. Baraniuk, and B. A. Olshausen, "Sparse coding via thresholding and local competition in neural circuits," Neural Computation, vol. 20, no. 10, pp. 2526–2563, Oct. 2008. [Online]. Available: http://dx.doi.org/10.1162/neco.2008.03-07-486
[17] A. Balavoine, J. Romberg, and C. Rozell, "Convergence and rate analysis of neural networks for sparse approximation," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 9, pp. 1377–1389, Sep. 2012.
[18] B. K. Natarajan, "Sparse approximate solutions to linear systems," SIAM Journal on Computing, vol. 24, no. 2, pp. 227–234, Apr. 1995. [Online]. Available: http://dx.doi.org/10.1137/S0097539792240406
[19] A. Cichocki and R. Unbehauen, Neural Networks for Optimization and Signal Processing. London, U.K.: Wiley, 1993.
[20] J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proceedings of the National Academy of Sciences, vol. 79, pp. 2554–2558, 1982.
[21] L. O. Chua and G. N. Lin, "Nonlinear programming without computation," IEEE Transactions on Circuits and Systems, vol. 31, pp. 182–188, Feb. 1984.
[22] S. Zhang and A. G. Constantinides, "Lagrange programming neural networks," IEEE Transactions on Circuits and Systems II, vol. 39, pp. 441–452, Jul. 1992.
[23] I. Lobel and A. Ozdaglar, "Distributed subgradient methods for convex optimization over random networks," IEEE Transactions on Automatic Control, vol. 56, no. 6, pp. 1291–1306, Jun. 2011.
[24] K. S. Narendra and K. Parthasarathy, "Neural networks and dynamical systems," International Journal of Approximate Reasoning, vol. 6, no. 2, pp. 109–131, 1992. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0888613X9290014Q
