A comparison of Extended Kalman Filter and Levenberg-Marquardt methods for neural network training

Pablo Deossa, Julian Patiño, Jairo Espinosa, Felipe Valencia
Facultad de Minas, Universidad Nacional de Colombia, Medellin, Antioquia
Cra. 80 No. 65-223 M8-108, GAUNAL
Email: [pablo.deossa, julian.patino, jairo.espinosa, felipe.valencia]@ieee.org
Abstract—This paper presents a performance comparison of the Levenberg-Marquardt and Extended Kalman Filter methods for neural network training. As a testbed, an indoor localization problem was solved by a neural network using RSSI data obtained through an experimental measurement campaign. Both methods were used to train the network, and the mean squared error (MSE) was employed as the performance metric.

Index Terms—Kalman filtering, RSSI, neural networks, optimization, Levenberg-Marquardt, localization.
I. INTRODUCTION

In the last decade, with improvements in neural networks at both the theoretical and hardware levels, considerable effort has been devoted to introducing them into practical applications. Since the 1980s, the back-propagation algorithm proposed by Rumelhart, Hinton and Williams has been one of the most important training methods for neural networks. However, the relative ineffectiveness of the gradient descent employed in this algorithm has motivated the development of several alternative training methods. Many of these proposed solutions are based on the use of second-order derivative information for updating the weights of the neural network; the most commonly used are the quasi-Newton, Levenberg-Marquardt, and conjugate gradient techniques [1]. Although these methods have shown promise, they are often plagued by convergence to poor local optima, which can be partially attributed to the lack of a stochastic component in the weight update procedures [1]. In order to tackle this problem, the Kalman filter can be used. The essence of this recursive procedure is that, during training, in addition to evolving the weights of a network architecture in a sequential fashion, an approximate error covariance matrix that encodes second-order information about the training problem is also maintained and evolved, capturing the stochastic component in the weight update procedure. Since the Levenberg-Marquardt algorithm (LMA) and Kalman-based methods have both been widely used for neural network training (see [2] [3] [4]), this paper proposes a comparison between these techniques in order to establish which one performs better
for neural network training. With this objective, a neural network applied to indoor localization is proposed as a testbed. This paper is organized as follows: in Section II, an introduction to neural network training and a description of the training methods to be compared are presented. Section III gives a brief description of the experiment used to test the training methods and a brief review of the role of neural networks in indoor localization. In the last sections, the results and conclusions are presented.

II. NEURAL NETWORK TRAINING

A neural network is a massively parallel distributed processor made up of simple processing units, with a natural propensity for storing experiential knowledge and making it available for use. The procedure used to perform the learning process is called a learning algorithm, and its function is to modify the synaptic weights of the network in an orderly fashion to attain a desired design objective [5]. Given a set of random initial weights, the training algorithm modifies the weights so as to reach the objective function. As an example, assume that the input of the neural network is an RSSI (Received Signal Strength Indicator) profile from a router, and the objective is to determine the location of the router. Then, given a table of points that relates each RSSI value to a distance, the neural network is trained [6] [7]; after training, it should be able to estimate the distance to the router from any RSSI value.

A. Levenberg-Marquardt Method

The Levenberg-Marquardt method is an iterative procedure for finding the minimum of a multivariate function, commonly defined as the square of a non-linear function [8]. It is widely used in practice because several real optimization problems can be expressed as non-linear least squares problems. Regarding neural network training, it is considered the most efficient algorithm for training medium-sized artificial neural networks (see [4] and the references therein).
Let W denote the vector of weights of a neural network, and let e(k) = y(k) − ŷ(k), where y(k) is the measured output at time step k and ŷ(k) is the neural network output at time step k. Then the weights of the neural network can be calculated as the solution of the minimization problem

min_x f(x) = (1/2) ẽ^T ẽ    (1)

where ẽ = [e(1), . . . , e(L)]^T and x = [W^T, b^T]^T, with L the number of samples used for the neural network training and b the vector of biases. Since f(x) is a quadratic function of the residuals, the minimization problem (1) can be formulated as a non-linear least squares problem: an unconstrained minimization problem with a quadratic cost function. An iterative solution of (1) is given by

x_{i+1} = x_i − H^{-1}(x) ∇f(x)    (2)

where ∇f(x) denotes the gradient of f(·), H(x) denotes the Hessian matrix of f(·), and i denotes the iteration number. Let J(x) denote the Jacobian matrix of ẽ with respect to x; then ∇f(x) = J^T(x) ẽ. Note that (2) requires the inverse of H(x), but computing H(x) exactly is computationally heavy, so the approximation based on the Jacobian is used [4]:

H(x) ≈ J^T(x) J(x)

However, this simplified Hessian matrix might not be invertible. To overcome this problem, the modified Hessian matrix (3) is used:

H(x) ≈ J^T(x) J(x) + μ_i I    (3)

where I is the identity matrix and μ_i is a value that makes H(x) positive definite and therefore invertible. So, (2) becomes

x_{i+1} = x_i − (J^T(x) J(x) + μ_i I)^{-1} ∇f(x)    (4)

The iterative solution (4) of (1) constitutes the Levenberg-Marquardt procedure for computing the weights of a neural network. Moreover, if ∇f(x) in (4) does not have an explicit expression, it must be approximated numerically, which increases the computational burden of the LMA. In addition, although the LMA was originally formulated in terms of the inverse of a scalar, (4) involves the inverse of the matrix J^T(x) J(x) + μ_i I; since the numerical computation of a matrix inverse is highly sensitive to the tolerance of the numerical procedure, this inverse may be a source of error in the LMA.
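As an illustration, the following minimal Python/NumPy sketch (not part of the original paper) implements one iteration of (4); residual and jac are hypothetical callables returning ẽ and its Jacobian. A linear solve is used instead of forming the explicit inverse, which mitigates the numerical sensitivity discussed above.

import numpy as np

def lm_step(x, residual, jac, mu):
    """One Levenberg-Marquardt iteration, eq. (4).
    residual and jac are user-supplied callables (hypothetical here)."""
    e = residual(x)                      # stacked residual vector
    J = jac(x)                           # Jacobian of the residuals w.r.t. x
    g = J.T @ e                          # gradient of f(x) = (1/2) e^T e
    H = J.T @ J + mu * np.eye(x.size)    # modified Hessian, eq. (3)
    return x - np.linalg.solve(H, g)     # linear solve, no explicit inverse

# Hypothetical usage: fit y = a*exp(b*t) to synthetic data.
t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * t)
residual = lambda x: x[0] * np.exp(x[1] * t) - y
jac = lambda x: np.column_stack([np.exp(x[1] * t),
                                 x[0] * t * np.exp(x[1] * t)])
x = np.array([1.0, -1.0])
for _ in range(50):
    x = lm_step(x, residual, jac, mu=1e-3)   # x approaches (2.0, -1.5)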
B. Extended Kalman Filter

The Kalman filter is an optimal estimator similar to the least squares method; in its extended form it can handle both linear and non-linear systems, and it can perform the estimation in the presence of noise in the system and in the sensors [9]. When the system is dynamic and non-linear, the Extended Kalman Filter (EKF) is applied by linearizing the system at each time step. The general equations of the Kalman filter are given by [10]:

State-space model
x_{k+1} = f(k, x_k) + w_k    (5)

y_k = h(k, x_k) + v_k    (6)

Definitions

F_{k+1,k} = ∂f(k, x)/∂x |_{x = x̂_k}    (7)

H_k = ∂h(k, x)/∂x |_{x = x̂_k^-}    (8)

where w_k and v_k are independent, zero-mean Gaussian noise processes with covariance matrices Q_k and R_k, respectively.

Initialization: for k = 0, set

x̂_0 = E[x_0]
P_0 = E[(x_0 − E[x_0])(x_0 − E[x_0])^T]

Computation: for k = 1, 2, . . . , compute:

State estimate propagation

x̂_k^- = f(k, x̂_{k−1})    (9)

Error covariance propagation

P_k^- = F_{k,k−1} P_{k−1} F_{k,k−1}^T + Q_{k−1}    (10)

Kalman gain matrix

G_k = P_k^- H_k^T [H_k P_k^- H_k^T + R_k]^{-1}    (11)

State estimate update

x̂_k = x̂_k^- + G_k (y_k − h(k, x̂_k^-))    (12)

Error covariance update

P_k = (I − G_k H_k) P_k^-    (13)
A disadvantage of the Kalman filter is that it requires initial conditions for the mean and covariance of the state vector to start the recursive algorithm, and there is currently no consensus about how to determine these initial conditions. On the other hand, it is also required to specify the measurement and process noise covariances (the R_k and Q_k matrices, respectively).
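For concreteness, here is a minimal Python/NumPy sketch (not from the original paper) of one recursion of (9)-(13); the callables f, h, F and H, standing for the state map, the measurement map and their Jacobians, are assumptions in place of a concrete model.

import numpy as np

def ekf_step(x_hat, P, y, f, h, F, H, Q, R):
    """One Extended Kalman Filter recursion, eqs. (9)-(13).
    f, h: state/measurement maps; F, H: their Jacobians (user-supplied)."""
    Fk = F(x_hat)                         # Jacobian of f at the last estimate, eq. (7)
    x_pred = f(x_hat)                     # state estimate propagation, eq. (9)
    P_pred = Fk @ P @ Fk.T + Q            # error covariance propagation, eq. (10)
    Hk = H(x_pred)                        # Jacobian of h at the prediction, eq. (8)
    S = Hk @ P_pred @ Hk.T + R            # innovation covariance
    G = P_pred @ Hk.T @ np.linalg.inv(S)  # Kalman gain matrix, eq. (11)
    x_new = x_pred + G @ (y - h(x_pred))  # state estimate update, eq. (12)
    P_new = (np.eye(len(x_hat)) - G @ Hk) @ P_pred  # covariance update, eq. (13)
    return x_new, P_new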
C. Neural network training with EKF

The training problem in Kalman filter terms can now be described as finding the minimum mean-squared error estimate of the state w (the network weights) using all the data observed so far. We assume a network architecture with M weights and N_0 output nodes and cost function components. The EKF solution to the training problem is given by the following recursion [10]:

A_k = [R_k + H_k^T P_k H_k]^{-1}    (14)

K_k = P_k H_k A_k    (15)

ŵ_{k+1} = ŵ_k + K_k e_k    (16)

P_{k+1} = P_k − K_k H_k^T P_k + Q_k    (17)
The vector ŵ_k represents the estimate of the state (i.e., the weights of the neural network) at update step k. This estimate is a function of the Kalman gain matrix K_k and the error vector e_k = y_k − ŷ_k, where y_k is the target output and ŷ_k is the network's output for the k-th presentation of a training pattern; P_k is the approximate error covariance matrix, H_k is the matrix of derivatives of the network's outputs with respect to all trainable weight parameters, and A_k is a global scaling matrix, a function of the measurement noise covariance matrix R_k, of H_k, and of P_k. The matrix P_k evolves recursively with the weight vector estimate; it encodes second-derivative information about the training problem and is augmented by the covariance matrix of the process noise Q_k. The algorithm attempts to find weight values that minimize the sum of squared errors e_k. Note that the algorithm requires the measurement and process noise covariance matrices R_k and Q_k to be specified for the whole training; the values of these matrices were extracted from the RSSI data set (see Section III-B).
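A minimal Python/NumPy sketch of one weight update of (14)-(17) follows (again, a sketch rather than the paper's implementation); the computation of H_k by backpropagation is omitted, and the shapes follow the conventions above.

import numpy as np

def ekf_train_step(w, P, e, H, Q, R):
    """One EKF weight update, eqs. (14)-(17). Sketch, not the paper's code.

    w: (M,) weight estimate    P: (M, M) approximate error covariance
    e: (N0,) output error      H: (M, N0) output derivatives w.r.t. weights
    Q: (M, M) process noise    R: (N0, N0) measurement noise
    """
    A = np.linalg.inv(R + H.T @ P @ H)   # global scaling matrix, eq. (14)
    K = P @ H @ A                        # Kalman gain matrix, eq. (15)
    w_new = w + K @ e                    # weight vector update, eq. (16)
    P_new = P - K @ H.T @ P + Q          # error covariance update, eq. (17)
    return w_new, P_new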
III. CASE DESCRIPTION

In this section, the case study used to compare the results of neural network training with the LMA and the EKF is presented. For this purpose, an application of neural networks to a real localization problem is shown.

A. Indoor localization with neural networks

Location information is an integral and crucial component of ubiquitous computing applications. Indoor localization has traditionally required costly infrastructure and special hardware devices mounted on the objects of interest. Usually, to estimate a location, traditional systems perform triangulation using one of three possible information parameters: time (Time of Arrival, ToA), power (Received Signal Strength, RSS), or angle (Angle of Arrival, AoA). Given the nature of our experimental data, we center these paragraphs on the power (RSS) related algorithms. For these techniques, the theoretical concept relies on the power lost by the signal as it travels: by measuring the received power and comparing it with the transmitted one, the total dissipated power can be estimated. Assuming that this power loss is due to path loss, it is proportional to the distance (to the first order or higher). Here again, the absence of line of sight and, more importantly, multipath propagation and the numerous reflections caused by the irregularities of the architecture make the power loss information inaccurate, resulting in large errors [11] [12] [13]. The constantly fluctuating RSS, and even the occasional absence of the wireless signal, leads to very unreliable distance estimations and, by extension, to bad location estimations. Estimation reliability is directly affected by how well the wireless signal samples taken at the target locations represent real-life situations; therefore, we collected a large number of RSS samples at each target location. To deal with these problems, we propose the use of a Multi-Layer Perceptron (MLP) to cope with this uncertainty effectively. The MLP has been employed by many researchers for pattern recognition problems [14]. The RSS data taken at each location are used as inputs, and the neural networks output the corresponding positions.

B. Experiment description

For the development of this paper, a real data set of RSSI values was collected over a rectangular area of 4 x 12 meters. There was a Wi-Fi access point (AP) in each corner (see Fig. 1), emitting the signal associated with its regular broadcast. The signals were acquired with a laptop following a uniform trajectory, taking measurements at every intersection of a grid. From these signals, an RSSI profile of the experimental space was obtained. At each point, about 200 samples were taken; 100 of them were used for training the neural networks, and the rest were left as validation data. From the data set, the variances for the four APs are 1.6478, 1.1611, 21.6571, and 4.4497. Based on this information, the values of R_k and Q_k were derived; in particular, R_k was calculated from the variance of the data set.
Fig. 1. Test bed.
To estimate the distances, four neural networks with a feed-forward architecture and a single input were built. Each neural network was matched to the RSSI signal emitted by one of the routers, and each was composed of four neurons in the hidden layer and one output that estimates the distance.
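A minimal Python/NumPy sketch of this 1-4-1 architecture is shown below; the tanh hidden activation and the initialization scheme are our assumptions, since the paper does not state them.

import numpy as np

def init_net(rng, n_hidden=4):
    """Random initial weights for a 1-input, 4-hidden, 1-output MLP."""
    return {"W1": rng.standard_normal((n_hidden, 1)),
            "b1": rng.standard_normal(n_hidden),
            "W2": rng.standard_normal((1, n_hidden)),
            "b2": rng.standard_normal(1)}

def forward(net, rssi):
    """Forward pass: one RSSI value in, one distance estimate out."""
    # tanh activation is an assumption; the paper does not specify it.
    hidden = np.tanh(net["W1"] @ np.atleast_1d(rssi) + net["b1"])
    return net["W2"] @ hidden + net["b2"]

rng = np.random.default_rng(0)
nets = [init_net(rng) for _ in range(4)]   # one network per access point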
IV. VALIDATION AND ANALYSIS

The data set was preprocessed before the neural network training: the mean value of the RSSI samples was removed, and each vector was normalized by dividing it by its maximum value, so that the output of the neural networks is also a normalized value between 0 and 1. As indicated in Section III-B, a total of 100 samples were prepared for training the networks, and a path with 20 samples per point was traced in order to test the capability of the neural networks for trajectory tracking. Figure 2 shows the training result of each neural network, which is able to track the objective signal with high accuracy.
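One plausible reading of this preprocessing, as a Python/NumPy sketch (the exact order and scaling are not fully specified in the text):

import numpy as np

def preprocess(rssi):
    """Center an RSSI sample vector and scale it by its maximum absolute
    value. One plausible reading of the preprocessing described above;
    as written this maps the samples into the interval [-1, 1]."""
    centered = rssi - rssi.mean()
    return centered / np.abs(centered).max()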
For further testing, after verifying that the network was following the training signal, the values of the Q_k and R_k matrices were changed in order to test the generality of the network's performance. As a result, the network was used to track different paths instead of focusing on single points of the space.
Fig. 2. EKF training result.
The training errors are shown in Figure 3, where the convergence of the neural networks can be observed. The error is larger for the first samples, but it soon converges to zero, indicating that the error between the approximation and the actual signal has been minimized.
Fig. 3. EKF training error.

Fig. 4. Neural network output with EKF training.
To compare the performance of the training methods, the same neural network was implemented, but the training function was replaced by an optimization routine based on the Levenberg-Marquardt algorithm (LMA). The initial values for both methods were randomly generated. The LMA approximation was not always successful; to deal with this, each neural network was trained separately: whenever an acceptable result was obtained, the weight values were saved and the process continued with the next network. The results of each network are plotted in the same graphic. Both methods achieve successful tracking of the reference signal. The MSE values of the two compared systems are shown in Table I.
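The per-network training with random restarts can be sketched as follows, here using SciPy's Levenberg-Marquardt least-squares routine as a stand-in for the paper's optimization routine; the residual callable and the mse_tol threshold are hypothetical.

import numpy as np
from scipy.optimize import least_squares

def train_lma(residual, n_params, mse_tol=0.05, max_restarts=10, seed=0):
    """Train one network with LMA, restarting from new random weights
    until an acceptable MSE is reached, as described above.
    mse_tol is a hypothetical acceptance threshold."""
    rng = np.random.default_rng(seed)
    best_w, best_mse = None, np.inf
    for _ in range(max_restarts):
        x0 = rng.standard_normal(n_params)            # random initial weights
        sol = least_squares(residual, x0, method="lm")
        mse = float(np.mean(sol.fun ** 2))            # MSE at the solution
        if mse < best_mse:
            best_w, best_mse = sol.x, mse
        if mse < mse_tol:                             # acceptable result: stop
            break
    return best_w, best_mse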
Fig. 5. Neural network output with LMA training.
Although the LMA gives a smaller MSE for reference tracking, the EKF result is also very acceptable. Even though the average values show a better response for the LMA, the EKF method captured the trajectory trend in each case more accurately than the LMA did, as can be seen in Figures 4 and 5.

TABLE I
MSE FOR EKF AND LMA

NN   LMA      EKF
1    0.0293   0.1305
2    0.0019   0.0632
3    0.0293   0.1305
4    0.0172   0.0326
V. CONCLUSIONS

In this paper, we presented a performance comparison of the Levenberg-Marquardt and Extended Kalman Filter methods for neural network training. As a case study, we applied these methods to the training of a Multi-Layer Perceptron intended to solve an indoor localization problem. Although the MSE values indicate a better overall result for the Levenberg-Marquardt algorithm, this method also carries an intensive computational load. For this reason, the Extended Kalman Filter could be more suitable for applications where the computational resources are restricted (e.g., Wireless Sensor Networks). As future work, a more rigorous comparison of methods for neural network training is planned in order to reach a more general result.

REFERENCES

[1] S. Haykin, Kalman Filtering and Neural Networks. Wiley-Interscience, 2001.
[2] F. Heimes, "Extended Kalman filter neural network training: experimental results and algorithm improvements," in Systems, Man, and Cybernetics, 1998 IEEE International Conference on, vol. 2, 1998, pp. 1639–1644.
[3] W. Wu and W. Min, "The mobile robot GPS position based on neural network adaptive Kalman filter," in Computational Intelligence and Natural Computing, 2009. CINC '09. International Conference on, vol. 1, 2009, pp. 26–29.
[4] H. Liu, "On the Levenberg-Marquardt training method for feed-forward neural networks," in Natural Computation (ICNC), 2010 Sixth International Conference on, vol. 1, 2010, pp. 456–460.
[5] S. Haykin, Neural Networks and Learning Machines, 3rd ed. Prentice Hall, 2008.
[6] A. Shareef, Y. Zhu, M. Musavi, and B. Shen, "Comparison of MLP neural network and Kalman filter for localization in wireless sensor networks," in PDCS '07: Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems. Anaheim, CA, USA: ACTA Press, 2007, pp. 323–330.
[7] A. Awad, T. Frunzke, and F. Dressler, "Adaptive distance estimation and localization in WSN using RSSI measures," in DSD '07: Proceedings of the 10th Euromicro Conference on Digital System Design Architectures, Methods and Tools. Washington, DC, USA: IEEE Computer Society, 2007, pp. 471–478.
[8] D. W. Marquardt, "An algorithm for least-squares estimation of nonlinear parameters," Journal of the Society for Industrial and Applied Mathematics, vol. 11, no. 2, pp. 431–441, 1963. [Online]. Available: http://www.jstor.org/stable/2098941
[9] J. Patino, J. Espinosa, and R. Correa, "A comparison of Kalman-based schemes for localization and tracking in sensor systems," in Communications (LATINCOM), 2010 IEEE Latin-American Conference on, Sept. 2010, pp. 1–5.
[10] M. S. Grewal and A. P. Andrews, Kalman Filtering: Theory and Practice Using MATLAB. Wiley-IEEE Press, 2008.
[11] M. Bocquet, C. Loyez, and A. Benlarbi-Delai, "Using enhanced-TDOA measurement for indoor positioning," IEEE Microwave and Wireless Components Letters, vol. 15, no. 10, pp. 612–614, Oct. 2005.
[12] E. Elnahrawy, X. Li, and R. Martin, "Using area-based presentations and metrics for localization systems in wireless LANs," in Local Computer Networks, 2004. 29th Annual IEEE International Conference on, 2004, pp. 650–657.
[13] P. Bahl and V. Padmanabhan, "RADAR: an in-building RF-based user location and tracking system," in INFOCOM 2000. Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings, vol. 2, 2000, pp. 775–784.
[14] J. Hightower and G. Borriello, "Location systems for ubiquitous computing," Computer, vol. 34, no. 8, pp. 57–66, Aug. 2001.