Noisy data fitting with B-splines using hierarchical genetic algorithm

C. H. Garcia-Capulin and G. Trejo-Caballero

H. Rostro-Gonzalez and J. G. Avina-Cervantes

Department of Mechatronics, Instituto Tecnológico Superior de Irapuato, Irapuato, México. Email: [email protected], [email protected]

Department of Electronics, DICIS - Universidad de Guanajuato, Salamanca, México. Email: [email protected], [email protected]

Abstract—Data fitting by splines in the presence of noise has been widely used in data analysis and engineering applications. In this regard, an important problem associated with data fitting by splines is the adequate selection of the number and location of the knots, as well as the calculation of the spline coefficients. Typically, these parameters are estimated separately in order to solve this non-linear problem. In this paper, we use a hierarchical genetic algorithm to tackle the data fitting problem by B-splines. The proposed approach is based on a novel hierarchical gene structure for the chromosomal representation, allowing us to determine the number and location of the knots and the B-spline coefficients automatically and simultaneously. The method is fully based on genetic algorithms and does not require subjective parameters such as a smoothing factor or knot locations to obtain a solution. In order to validate the efficacy of the proposed approach, numerical results from tests on smooth functions are included.

Index Terms—Genetic algorithm; regression; data fitting; B-splines.

I. INTRODUCTION

Data fitting is an important tool for data analysis and many engineering applications. The goal is to construct a curve representing the best estimate of the unknown functional relationship between input and output variables, given a data set of noisy observed values. Regression techniques are among the most widely used methods for data fitting. Existing regression techniques can be divided into two classes: parametric and nonparametric [1]. Nonparametric techniques include the popular methods of smoothing splines [2] and regression splines [3]. In regression spline methods, it is assumed that a function can be approximated by a continuous piecewise polynomial (spline). In these methods, the most important issue is the choice of the number and location of the joints (knots) of these polynomial pieces; the spline coefficients are then estimated by ordinary least squares. The selection of the best model is performed by minimizing some model selection criterion, such as Akaike's information criterion (AIC) [4], the Bayesian information criterion (BIC) [5], or the generalized cross-validation criterion (GCV) [6]. The problem thus becomes a hard continuous, multimodal and multivariate nonlinear optimization problem.

In this paper, we propose a method for solving the data fitting problem using a hierarchical genetic algorithm (HGA) [7]. We use a novel hierarchical structure to represent both the model structure and the model parameters; this structure combines a binary and a real codification. We search for the best B-spline model by minimizing a model selection criterion. Three fitness functions, derived from three different model selection methods (AIC, BIC and GCV), are constructed and compared in order to choose a "best" fitting curve model. As a result, our method can simultaneously determine the number and position of the knots as well as the B-spline coefficients. In addition, our approach is able to find optimal solutions with the fewest parameters within the B-spline basis functions.

This paper is organized as follows. In Section II we review existing methods related to our work. In Section III, we provide some basic concepts of B-splines, formulate our approach and present the main features of the proposed HGA. Numerical results and a comparison of the three fitness functions are presented in Section IV. Finally, Section V closes the paper with the main conclusions and perspectives.


II. RELATED WORK

Many approaches have been proposed to solve this optimization problem [8], [9], [10], all of them based on nonlinear optimization; other approaches use Bayesian model selection in combination with Markov chain Monte Carlo methods [11], [12], [13]. These techniques, unlike the previous ones, focus first on the way the spline coefficients are estimated and only secondly on the choice of the knots [3]. These methods have proven successful; however, they are computationally intensive and sometimes end in locally optimal solutions [14]. Due to the complicated nature of the optimization problem, several researchers have proposed intelligent search schemes based on genetic algorithms (GA) [15]. Typically, GA-based approaches employ a model selection criterion as fitness function to optimize the number and location of the knots. In [3], [16], the authors convert the search space into a discrete form, using a binary codification to represent the knots. In contrast, in [17], [18], a real codification is used to improve the search space and convergence performance. On the other hand, in [14], the knots are represented by an integer codification; the number of knots is fixed and only their locations are optimized, and the best model is then searched over several numbers of knots. In all GA-based approaches, the solution is obtained at each iteration as a linear system of equations, which allows the spline coefficients to be calculated by ordinary least squares.

III. B-SPLINE DATA FITTING USING HGA

A. Problem definition

Let us describe the data fitting problem as follows: given n pairs of measurements (x_j, y_j), j = 1, ..., n, the data can be expressed in the form

y_j = f(x_j) + ε_j \qquad (1)

where f is the unknown functional relationship to approximate, ε_j represents zero-mean random errors, and y_j is the jth observation at x_j. The goal is to find the best estimate of the function f(x_j). In this work, we assume that f is a smooth function, which can be properly approximated on the interval [a, b] by a B-spline curve. B-spline curves are constructed from polynomial pieces joined at certain locations, commonly called knots. The B-spline curve is modeled using the following considerations: let τ_1, ..., τ_m be a set of m points placed along the domain of the independent variable x, called interior knots, such that a = x_1 < τ_1 < ... < τ_m < x_n = b, and let t = {t_1, ..., t_{m+2k}} be the knot vector defined by

t_i = \begin{cases} a, & \text{if } i \le k \\ τ_{i-k}, & \text{if } k < i < k + m \\ b, & \text{if } i \ge k + m \end{cases} \qquad (2)

The ith B-spline basis function B_{i,k}(x) of order k is defined by the recurrence relation

B_{i,k}(x) = \frac{x - t_i}{t_{i+k-1} - t_i} B_{i,k-1}(x) + \frac{t_{i+k} - x}{t_{i+k} - t_{i+1}} B_{i+1,k-1}(x) \qquad (3)

and

B_{i,1}(x) = \begin{cases} 1, & \text{if } t_i \le x < t_{i+1} \\ 0, & \text{otherwise} \end{cases} \qquad (4)
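For concreteness, Equations (2)-(4) can be evaluated directly with the Cox-de Boor recursion. The following C++ sketch is our own illustration (0-based indexing, helper names not taken from the paper's implementation); zero-width knot spans are treated as contributing nothing:

```cpp
#include <vector>

// Clamped knot vector of Eq. (2): k copies of a, the m interior knots tau,
// then k copies of b (0-based indices, unlike the paper's 1-based ones).
std::vector<double> makeKnotVector(double a, double b,
                                   const std::vector<double>& tau, int k) {
    std::vector<double> t(k, a);
    t.insert(t.end(), tau.begin(), tau.end());
    t.insert(t.end(), k, b);
    return t;
}

// Cox-de Boor recurrence of Eqs. (3)-(4): basis function i of order `ord`
// over knot vector t, using the half-open convention of Eq. (4).
double bspline(int i, int ord, double x, const std::vector<double>& t) {
    if (ord == 1)
        return (t[i] <= x && x < t[i + 1]) ? 1.0 : 0.0;
    double left = 0.0, right = 0.0;
    double d1 = t[i + ord - 1] - t[i];
    if (d1 > 0.0)
        left = (x - t[i]) / d1 * bspline(i, ord - 1, x, t);
    double d2 = t[i + ord] - t[i + 1];
    if (d2 > 0.0)
        right = (t[i + ord] - x) / d2 * bspline(i + 1, ord - 1, x, t);
    return left + right;
}
```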

Under these assumptions, the function f can be written as the linear combination

f(x) = \sum_{i=1}^{m+k} α_i B_{i,k}(x) \qquad (5)

where α_i are the B-spline coefficients and B_{i,k}(x) are the B-spline basis functions of order k defined over the knot vector t, according to Equations (2), (3) and (4). If k is specified beforehand, we can define f completely by θ = {τ, α}, where τ = {τ_1, ..., τ_m} and α = {α_1, ..., α_{m+k}}. The problem is then to find the number and location of the knots τ_i, and to calculate the coefficients α_i. In several methods, the coefficients are calculated in an additional step by ordinary least squares or other intensive techniques, such as Bayesian methods or penalized least squares. A more detailed discussion of B-splines can be found in [19].

B. Our Approach

We use a HGA to simultaneously determine the number and positions of the knots (model structure) and the B-spline coefficients (model parameters) by minimizing a model selection criterion. In this approach, the main characteristics to consider are: (1) the chromosome encoding of potential solutions, (2) the fitness function, which evaluates the fitness of a chromosome, and (3) the genetic operators, which evolve the individuals.

1) Chromosome encoding: We use a fixed-length binary coding to represent the number and locations of the interior knots τ and a real coding to represent the B-spline coefficients α. We represent the chromosome of an individual as

θ = {b_1, ..., b_m, r_{1-k/2}, ..., r_0, ..., r_m, r_{m+1}, ..., r_{m+k/2}}

where each b_i is a control bit and each r_i is a real value (coefficient). Each control bit simultaneously enables or disables one of the interior knots and one of the coefficients; we establish a one-to-one correspondence between interior knots and coefficients, so that both are activated at the same time. The real values define the coefficients of the B-spline. The general structure of a chromosome is shown in Figure 1.

Fig. 1. General structure of a chromosome: control genes (binary), interior knots, and parametric genes (real-valued coefficients).

This representation scheme does not allow duplicate knots, because our particular interest is in smooth functions. However, it can be extended to handle discontinuous functions by introducing an additional type of gene.
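A minimal C++ sketch of this encoding may help. The layout below is our own simplification: it pairs each control bit with one candidate knot and one coefficient, and omits the k always-active boundary coefficients r_{1-k/2}, ..., r_0 and r_{m+1}, ..., r_{m+k/2}:

```cpp
#include <vector>

// Sketch of the hierarchical chromosome of Figure 1 (our layout, assuming
// the paper's one-to-one pairing of control bits, knots and coefficients).
struct Chromosome {
    std::vector<bool>   control; // b_1..b_m: enable/disable knot i and coeff i
    std::vector<double> tau;     // candidate interior knot locations
    std::vector<double> coeff;   // parametric genes: B-spline coefficients
};

// Decoding keeps only the knots and coefficients whose control bit is set,
// yielding the active model structure theta = {tau, alpha}.
void decode(const Chromosome& c,
            std::vector<double>& activeKnots,
            std::vector<double>& activeCoeffs) {
    activeKnots.clear();
    activeCoeffs.clear();
    for (std::size_t i = 0; i < c.control.size(); ++i) {
        if (c.control[i]) {
            activeKnots.push_back(c.tau[i]);
            activeCoeffs.push_back(c.coeff[i]);
        }
    }
}
```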


2) Fitness function: In order to determine the model structure as well as the model parameters, the following three model selection methods are proposed and compared as fitness functions.

Akaike's information criterion (AIC). The AIC, originally developed by Akaike [4], measures the goodness of fit of an estimation together with the number of parameters that must be estimated to achieve that degree of fit [9]. The AIC is given by

AIC(\hat{f}) = n \log \sum_{j=1}^{n} \{y_j - \hat{f}(x_j)\}^2 + 2p \qquad (6)

where \hat{f} is an estimate of f (see Equation (1)) given by Equation (5), and p is the number of model parameters. In the AIC, the residual sum of squares in the first term measures the deviation of the estimated function from f, and the second term is a penalty for increasing the number of parameters p. In this work the parameters comprise the m interior knots and the m + k coefficients, thus p = 2m + k.

Bayesian information criterion (BIC). Also called the Schwarz information criterion (SIC) or Schwarz-Bayes information criterion, it was proposed by Schwarz [5] and used by Yoshimoto et al. [18] as a model selection criterion for curve fitting. It is very similar to the AIC, but penalizes free parameters more strongly. It is defined as

BIC(\hat{f}) = n \log \sum_{i=1}^{n} \{y_i - \hat{f}(x_i)\}^2 + \log(n)\, p \qquad (7)

where the terms are defined as in the previous criterion, with the difference that the penalty factor is log(n) instead of 2.

Generalized cross-validation (GCV). The GCV criterion is applied by Friedman and Silverman [6] to choose a best curve estimate as

GCV(\hat{f}) = \frac{\frac{1}{n}\sum_{i=1}^{n} \{y_i - \hat{f}(x_i)\}^2}{\{1 - d(m)/n\}^2} \qquad (8)

Here d(m) is an increasing function of the number of knots m, suggested by the authors as d(m) = 3m + 1 instead of the conventional GCV choice. To account also for the number of coefficients, we used GCV with d(m) = 3(2m + k) + 1.

The question is now which of these methods, used as fitness function, gives the best performance for our approach. To answer it, we compare the three methods described above. As mentioned earlier, the fitness function is used to evaluate the individuals; here it is one of the model selection criteria (MSC) described above, applied to the estimate \hat{f} that is completely defined by an individual θ. Our objective is to minimize the MSC, because a smaller value implies a better model fitted with a minimum of parameters. The fitness function can be expressed as

F(θ) = MSC(\hat{f}_θ) \qquad (9)

where \hat{f}_θ is the estimated function specified by the individual θ and evaluated with the selected MSC. Finally, the fitness values are normalized between minimum and maximum values.
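As an illustration, the three criteria can be computed from the residuals of a candidate spline. The following C++ sketch is our own; it assumes the residuals y_j − f̂(x_j) have already been computed for the active model with m knots and order k:

```cpp
#include <cmath>
#include <vector>

// Residual sum of squares, shared by Eqs. (6)-(8).
double rss(const std::vector<double>& res) {
    double s = 0.0;
    for (double r : res) s += r * r;
    return s;
}

// Eq. (6): AIC with p = 2m + k parameters, as in the text.
double aic(const std::vector<double>& res, int m, int k) {
    int n = static_cast<int>(res.size());
    return n * std::log(rss(res)) + 2.0 * (2 * m + k);
}

// Eq. (7): BIC, identical to AIC except for the log(n) penalty factor.
double bic(const std::vector<double>& res, int m, int k) {
    int n = static_cast<int>(res.size());
    return n * std::log(rss(res)) + std::log(n) * (2 * m + k);
}

// Eq. (8): GCV with the modified d(m) = 3(2m + k) + 1 used in this work.
double gcv(const std::vector<double>& res, int m, int k) {
    int n = static_cast<int>(res.size());
    double d = 3.0 * (2 * m + k) + 1.0;
    double denom = 1.0 - d / n;
    return (rss(res) / n) / (denom * denom);
}
```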

3) Genetic operators and HGA parameter tuning: Our implementation uses three genetic operators: selection, crossover and mutation. The roulette wheel method is used as the selection operator: each individual is assigned a slice of the roulette wheel. This selection strategy favors the best fitted individuals but gives less fitted individuals a chance to survive. To prevent premature convergence, the sigma scaling method [15] is used. This method tries to keep the selection pressure relatively constant over the whole evolutionary process; the scaled fitness is calculated according to

F_{new} = \begin{cases} F_{act} - (\bar{F} - c\,σ), & \text{if } F_{act} > \bar{F} - c\,σ \\ 0, & \text{otherwise} \end{cases} \qquad (10)

where F_{new} is the new scaled fitness value, F_{act} is the current fitness value, \bar{F} is the average fitness, σ is the standard deviation of the population fitness, and c is a constant that controls the selection pressure. In addition, an elitism strategy is used: a number of elite individuals are kept in the next population.
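A C++ sketch of the scaling step follows (our own; it assumes fitness values have already been converted so that larger means better, as roulette wheel selection requires):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Sigma scaling of Eq. (10): shift fitness by (mean - c * sigma) and
// clamp negative results to zero; c controls the selection pressure.
std::vector<double> sigmaScale(const std::vector<double>& f, double c) {
    const double n = static_cast<double>(f.size());
    double mean = 0.0;
    for (double v : f) mean += v;
    mean /= n;
    double var = 0.0;
    for (double v : f) var += (v - mean) * (v - mean);
    const double sigma = std::sqrt(var / n);
    std::vector<double> scaled(f.size());
    for (std::size_t i = 0; i < f.size(); ++i)
        scaled[i] = std::max(0.0, f[i] - (mean - c * sigma));
    return scaled;
}
```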

The uniform crossover operator is used for the binary chromosome and the simulated binary crossover (SBX) [20] operator for the real chromosome; both are applied with the same crossover probability. In uniform crossover, two parents are chosen and recombined into a new individual, each bit of which is selected from one of the parents with a fixed probability. In SBX, two new individuals c_1 and c_2 are generated from the parents p_1 and p_2 using a probability distribution. The procedure is as follows: first, a random number u between 0 and 1 is generated, and the spread factor β is calculated as

β = \begin{cases} (2u)^{1/(η+1)}, & \text{if } u \le 0.5 \\ \left(\frac{1}{2(1-u)}\right)^{1/(η+1)}, & \text{otherwise} \end{cases} \qquad (11)

where η is a non-negative real number that controls the distance between the parents and the new individuals. After obtaining β, the new individuals are calculated according to

c_1 = 0.5[(1 + β)p_1 + (1 − β)p_2], \quad c_2 = 0.5[(1 − β)p_1 + (1 + β)p_2] \qquad (12)
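For a single pair of real genes, the SBX step of Equations (11)-(12) might look as follows in C++ (our sketch, using standard C++11 facilities):

```cpp
#include <cmath>
#include <random>
#include <utility>

// SBX for one pair of real genes; eta is the distribution index that
// controls how close the children stay to the parents.
std::pair<double, double> sbx(double p1, double p2, double eta,
                              std::mt19937& rng) {
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    const double u = uni(rng);
    const double beta = (u <= 0.5)
        ? std::pow(2.0 * u, 1.0 / (eta + 1.0))                  // Eq. (11), u <= 0.5
        : std::pow(1.0 / (2.0 * (1.0 - u)), 1.0 / (eta + 1.0)); // Eq. (11), otherwise
    const double c1 = 0.5 * ((1.0 + beta) * p1 + (1.0 - beta) * p2); // Eq. (12)
    const double c2 = 0.5 * ((1.0 - beta) * p1 + (1.0 + beta) * p2);
    return {c1, c2};
}
```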

For the binary chromosome, bit-flip mutation is used: a bit is inverted or not depending on a mutation probability. For the real-coded chromosome, each real value γ_i is changed, with the same mutation probability, according to

γ_i = γ_i + δ(rand − 0.5) \qquad (13)

where δ is the maximum increment or decrement of the real value and rand is a function that generates a random value between 0 and 1.
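Both mutation operators are straightforward; a possible C++ sketch (ours, with δ and the mutation probability pm passed in as parameters) is:

```cpp
#include <random>
#include <vector>

// Bit-flip mutation for the binary chromosome and the real-valued
// perturbation of Eq. (13), both applied with probability pm per gene.
void mutate(std::vector<bool>& bits, std::vector<double>& reals,
            double pm, double delta, std::mt19937& rng) {
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    for (std::size_t i = 0; i < bits.size(); ++i)
        if (uni(rng) < pm) bits[i] = !bits[i];            // bit-flip mutation
    for (double& g : reals)
        if (uni(rng) < pm) g += delta * (uni(rng) - 0.5); // Eq. (13)
}
```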

The HGA parameters were tuned experimentally and are shown in Table I.

TABLE I
PARAMETERS USED FOR THE HGA

Parameter                     Value
Population size               300
Crossover probability         0.9
Mutation probability          0.005
Number of elite individuals   3

TABLE II
TEST FUNCTIONS

No.   Test function
1     f(x) = sin^3(2πx^3)
2     f(x) = (4x − 2) + 2 exp(−16(4x − 2)^2)
3     f(x) = sin(x) + 2 exp(−30x^2)
4     f(x) = sin(2(4x − 2)) + 2 exp(−16(4x − 2)^2)
5     f(x) = 10(4x − 2)/(1 + 100(4x − 2)^2)

IV. SIMULATION RESULTS

We carried out numerical simulations in order to examine the performance of our approach; a comparison of the three model selection methods used as fitness functions is also presented. We consider the five smooth functions listed in Table II as our experimental set. These test functions have been used for evaluation and comparison purposes in previous work on curve fitting with splines [11], [13], [3], [14]. A collection of 100 noisy data sets is generated for each function. To generate one noisy data set, a specific function is evaluated at 201 design points x_i uniformly distributed in the interval [0, 1], and zero-mean normal noise with known σ is added. The signal-to-noise ratio (SNR), defined as SD(f)/σ, was 3 for all data sets. For test function 3, the design points x_i are uniformly distributed in [−2, 2] and then scaled to the interval [0, 1]. For each simulated data set, the three fitness functions were applied to estimate the test function. In these experiments, the proposed approach was configured as follows: we use a cubic B-spline, i.e., order k = 4, and τ is a subset of the design points. The population is randomly initialized: each control gene b_i is randomly selected from {0, 1} and each real gene r_i is a random real number over the range [min(y_j), max(y_j)] of the dependent variable y. The algorithm was evolved for 900 generations in all cases. In order to compare the results, we use the mean square error (MSE) given by

MSE = \frac{1}{n}\sum_{i=1}^{n} \{\hat{f}(x_i) - f(x_i)\}^2 \qquad (14)


where f is the true function and \hat{f} is the estimated function obtained with each fitness function. For each test function, the median and the average MSE are calculated over the collection of simulated data sets; the results are summarized in Table III. From the estimates shown in Table III we can conclude that the best objective function is the one based on HGA+AIC. For this reason, only numerical results based on the HGA+AIC method are shown (Figures 2 to 6). In these figures we can also observe that the results are qualitatively comparable, with a negligible error at the edges of the functions, where our approach presents certain fitting difficulties; this is because the extreme points are interpolated by the B-spline. Our algorithm was implemented in C++. All tests were run on a PC with an AMD Athlon X2 processor running at 1.9 GHz with 2 MB of RAM. The average computation time of our approach over all tests was 19 seconds.
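For reference, the error measure of Equation (14) amounts to a few lines of C++ (our sketch; f and fhat hold the true and estimated values at the design points):

```cpp
#include <vector>

// MSE of Eq. (14): mean squared deviation between the true function
// values f and the estimated spline values fhat at the design points.
double mse(const std::vector<double>& f, const std::vector<double>& fhat) {
    double s = 0.0;
    for (std::size_t i = 0; i < f.size(); ++i) {
        const double d = fhat[i] - f[i];
        s += d * d;
    }
    return s / static_cast<double>(f.size());
}
```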


TABLE III
MEDIAN AND AVERAGE MSE OF TEST FUNCTIONS (MEDIAN – AVERAGE)

Test function   HGA+AIC               HGA+BIC               HGA+GCV
1               0.002018 – 0.002138   0.002156 – 0.002362   0.002152 – 0.002306
2               0.012370 – 0.013799   0.014696 – 0.015693   0.012922 – 0.013769
3               0.006900 – 0.007568   0.007806 – 0.008439   0.007313 – 0.007582
4               0.006987 – 0.007337   0.007354 – 0.007871   0.007228 – 0.007367
5               0.000430 – 0.000469   0.000461 – 0.000549   0.000445 – 0.000512

Fig. 2. Numerical results on test function No. 1
Fig. 3. Numerical results on test function No. 2
Fig. 4. Numerical results on test function No. 3
Fig. 5. Numerical results on test function No. 4
Fig. 6. Numerical results on test function No. 5

V. CONCLUSION

In this paper, we have proposed an efficient hierarchical genetic algorithm to tackle the B-spline curve fitting problem. The method introduces a novel hierarchical gene structure for the chromosomal representation that permits the optimization of all degrees of freedom of the B-spline. More specifically, our algorithm simultaneously finds the best model with the fewest knots, optimal knot locations and the coefficients of the B-spline curve. The method does not require subjective parameters such as a smoothing factor or knot locations to obtain a solution. This work focused on smooth functions but can be extended to approximate non-smooth functions. The algorithm was tested on several benchmark functions and a comparison of three model selection methods was performed. Simulation results show that the best performance was achieved by HGA+AIC, even though it presents certain fitting difficulties at the edges of the functions. Given the performance characteristics of the proposed approach, our future work will apply this method to fit real experimental data. We are also interested in extending our approach to surface fitting, and in experimenting with variable-length chromosomes and other basis functions.

ACKNOWLEDGMENT

We acknowledge the support of the Consejo Nacional de Ciencia y Tecnología de México, the Consejo de Ciencia y Tecnología del Estado de Guanajuato and the Instituto Tecnológico Superior de Irapuato.

REFERENCES


[1] M. Zamani, "A simple piecewise cubic spline method for approximation of highly nonlinear data," Natural Science, vol. 4, pp. 79–83, 2012.
[2] G. Wahba, Spline Models for Observational Data. Society for Industrial and Applied Mathematics (SIAM), 1990.
[3] T. C. M. Lee, "On algorithms for ordinary least squares regression spline fitting: a comparative study," Journal of Statistical Computation and Simulation, vol. 72, no. 8, pp. 647–663, 2002.
[4] H. Akaike, "A new look at the statistical model identification," IEEE Transactions on Automatic Control, vol. 19, no. 6, pp. 716–723, 1974.
[5] G. Schwarz, "Estimating the dimension of a model," Annals of Statistics, vol. 6, no. 2, pp. 461–464, 1978.
[6] J. H. Friedman and B. W. Silverman, "Flexible parsimonious smoothing and additive modeling," Technometrics, vol. 31, pp. 3–39, 1989.
[7] K.-F. Man, K.-S. Tang, and S. Kwong, Genetic Algorithms: Concepts and Designs. Secaucus, NJ, USA: Springer-Verlag New York, 1999.
[8] J. H. Friedman, "Multivariate adaptive regression splines (with discussion)," Annals of Statistics, vol. 19, no. 1, pp. 1–141, 1991.
[9] J. Y. Koo, "Spline estimation of discontinuous regression functions," Journal of Computational and Graphical Statistics, vol. 6, pp. 266–284, 1997.
[10] T. C. M. Lee, "Regression spline smoothing using the minimum description length principle," Statistics & Probability Letters, vol. 48, no. 1, pp. 71–82, 2000.
[11] I. Dimatteo, C. R. Genovese, and R. E. Kass, "Bayesian curve-fitting with free-knot splines," Biometrika, vol. 88, no. 4, pp. 1055–1071, 2001.
[12] X. Zhao, C. Zhang, B. Yang, and P. Li, "Adaptive knot placement using a GMM-based continuous optimization algorithm in B-spline curve approximation," Computer-Aided Design, vol. 43, no. 6, pp. 598–604, Jun. 2011.
[13] D. G. T. Denison, B. K. Mallick, and A. F. M. Smith, "Automatic Bayesian curve fitting," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 60, pp. 333–350, 1998.
[14] J. Pittman, "Adaptive splines and genetic algorithms," Journal of Computational and Graphical Statistics, vol. 11, no. 3, pp. 615–638, Sep. 2002.
[15] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley Professional, 1989.
[16] F. Yoshimoto, M. Moriyama, and T. Harada, "Automatic knot placement by a genetic algorithm for data fitting with a spline," in Proceedings of the International Conference on Shape Modeling and Applications. IEEE Computer Society Press, 1999, pp. 162–169.
[17] L. Pingping, Z. Xiuyang, and Y. Bo, "Automatic knot adjustment by an improved genetic algorithm," in International Conference on Future Computer and Communication (ICFCC), vol. 3, May 2010, pp. 763–768.
[18] F. Yoshimoto, T. Harada, and Y. Yoshimoto, "Data fitting with a spline using a real-coded genetic algorithm," Computer-Aided Design, vol. 35, no. 8, pp. 751–760, Jul. 2003.
[19] C. de Boor, A Practical Guide to Splines. New York: Springer-Verlag, 1978.
[20] K. Deb and R. B. Agrawal, "Simulated binary crossover for continuous search space," Complex Systems, vol. 9, pp. 115–148, 1995.



