
An Elliptical Basis Function Network for Classification of Remote-Sensing Images

Jian-Cheng Luo, Qiu-Xiao Chen, Jiang Zheng, Yee Leung
LREIS, Institute of Geographical Science and Natural Resources Research, Chinese Academy of Sciences, Beijing, China, 100101
E-mail: [email protected]

Yee Leung
Department of Geography and Resource Management, The Chinese University of Hong Kong, Hong Kong

Jiang-Hong Ma
Faculty of Sciences, Xi'an Jiaotong University, Xi'an, China, 710049

Abstract -- An elliptical basis function (EBF) network is proposed in this study for the classification of remotely sensed images. Though similar in structure, the EBF network differs from the well-known radial basis function (RBF) network by incorporating full covariance matrices and by using the expectation-maximization (EM) algorithm to estimate the basis functions. Since remotely sensed data often take on mixture-density distributions in the feature space, the proposed network not only possesses the advantages of the RBF mechanism but also utilizes the EM algorithm to compute the maximum likelihood estimates of the mean vectors and covariance matrices of a Gaussian mixture distribution in the training phase. Experimental results show that the EM-based EBF network is faster in training, more accurate, and simpler in structure than the conventional RBF network and the maximum likelihood classifier.

Keywords -- artificial neural networks, classification, elliptical basis functions, EM algorithm, mixture densities, radial basis functions, remotely sensed image

I. INTRODUCTION

Thematic information extraction and classification of remotely sensed images is an important task in the quantitative analysis of remote-sensing data. Over the years, three major paradigms have been developed for remote-sensing classification: statistical classifiers based on decision functions of statistical density distribution models, nonlinear-mapping neural network classifiers, and symbolic knowledge-based reasoning classifiers (Richards and Jia, 1998; Mather, 1999; Bruzzone et al., 1999a). Remotely sensed data, being a synthetic reflection of the Earth's surface, have very complicated and uncertain characteristics and behave like mixture-density distributions in the feature space. Thus, a large bias may result if the parameters of the mixture densities are estimated by an inadequate approach. Artificial neural networks (ANN), on the other hand, have been successfully applied to information extraction and classification of remotely sensed data (Pao, 1989; Bischof et al., 1992; Kulkarni, 1994; Atkinson and Tatnall, 1997). Among the various kinds of ANN architectures, radial basis function (RBF) networks (Powell, 1987; Moody and Darken, 1989; Fu, 1994; Lippmann, 1994; Girosi, 1994; Scholkopf et al., 1997; Haykin, 1999) appear to be a more effective model for nonlinear function approximation and data classification in general, and remote-sensing classification in particular. Theoretically, while retaining a similarly powerful mapping ability, RBF networks can overcome some of the limitations of the BP-MLP (the multilayer perceptron with the back-propagation algorithm), offering rapid training, avoidance of chaotic behavior, and a simpler architecture. Such characteristics and their intrinsic simplicity render RBF networks an interesting alternative to classification based on simple MLP neural models (Chen and Chen, 1995; Bishop, 1995; Bruzzone and Prieto, 1999b; Sundararajan et al., 1999), with good performance in remote-sensing classification (Rollet et al., 1998; Bruzzone and Prieto, 1999b). It is well known that the classification error made by RBF networks depends strongly on the selection of the centers and widths of the kernel functions constituting the hidden layer (Bruzzone and Prieto, 1999b; Gomm and Yu, 2000). Conventional clustering algorithms, such as K-means and K-nearest neighbors, are ordinarily employed to select the centers of the kernel functions. Such techniques, however, do not exploit the statistical properties of the data. More importantly, since the response of each basis function in the RBF network is radially symmetric about its center, as measured by the spherical distance, mixed or overlapping kernel functions associated with different classes will inevitably appear in real-life data. A large number of centers are thus needed to represent the mixture density distributions, resulting in high computation costs. In recent years, the application of the EM algorithm to the estimation of the parameters of probability density functions has received a great deal of attention. The EM algorithm is an iterative method that can be used to numerically approximate the maximum likelihood (ML) estimates of the parameters of a mixture model (Dempster et al., 1977; McLachlan and Basford, 1988; McLachlan and Krishnan, 1997). The EM algorithm is able to compute the maximum likelihood estimates of the mean vectors and covariance matrices of a Gaussian mixture distribution.


It has been applied to multiple spatial feature extraction, multiple data fusion, and spatial data mining with an unsupervised approach (Bruzzone and Prieto, 2000, 2001; Tadjudin and Landgrebe, 2000). Usually, remote-sensing data may be described by a mixture with normally distributed components. Therefore, the EM algorithm can be used for the statistical analysis of remote-sensing data. Moreover, the EM algorithm estimates the covariance matrices of clusters based on the statistical properties of the data rather than using heuristic methods such as the K-nearest-neighbors algorithm. This leads to a better approximation of the data distribution. To take advantage of the efficient structure of the RBF network and the effective iterative mechanism of the EM algorithm, we propose in this paper a new model, called the EBF network, for the classification of remotely sensed images. It is, in a way, similar to the idea of the probabilistic neural network (Specht, 1990; Serpico et al., 1996; Mao et al., 2000). The proposed EBF network can be viewed as a non-normalized version of the alternative model of mixture experts (Xu, 1998). Briefly, the EBF network incorporates full covariance matrices into the RBF network and employs the EM algorithm to estimate the parameters of the basis functions. When full covariance matrices are incorporated into the kernel units, complex and mixed distributions in the feature space can be represented by the hidden centers without the need for a large number of basis functions. As a result, the EBF units are hyper-ellipsoidal and can enhance the approximation capability of conventional RBF networks. Thus, the EBF network can be considered an extension of the RBF network. The parameters of the EBF network are estimated by the EM algorithm, followed by a least-squares minimization to determine the output weights. In Section II, the EM algorithm for estimating the parameters of mixture distributions is presented. In Section III, the EBF network, an elliptical extension of the RBF network, is first described; the EM algorithm is then employed to estimate the Gaussian parameters of the EBF network, and the framework of the EM-based EBF network for the classification of remotely sensed images is subsequently outlined. In Section IV, the EM-based EBF network is applied to a real-life land-cover classification task and its performance is evaluated and compared with that of common conventional classifiers. The paper concludes with a summary and an outlook for further research.

II. ESTIMATION OF PARAMETERS OF MIXTURE DENSITY DISTRIBUTIONS IN REMOTELY SENSED DATA USING EM

Similar to pattern recognition in general, one of the key problems of information extraction and classification in remote sensing is the detection and description of the processes, phenomena, and objects hidden in data sets. Such information is often governed by finite mixtures of feature sets with different distribution properties. However, due to the complexity of the mixture density functions, it is very difficult to compute the maximum likelihood estimates (MLEs) directly by conventional methods.


To estimate the parameters in a mixture model, a typical approach is to apply the EM algorithm to numerically approximate the MLEs of the parameters in an iterative manner. In the EM algorithm, it is first assumed that the whole mixture distribution consists of a finite number of simple parametric density models. Then, by iterative correction, the maximum of the likelihood function under the mixture density is approximated step by step. It has been demonstrated that, by using the EM algorithm under the mixture model, better parameter estimates can be obtained by exploiting a large number of unlabeled samples in addition to the training samples (Tadjudin and Landgrebe, 2000). In essence, the EM algorithm first estimates the posterior probabilities of each sample belonging to each of the component distributions, and then computes the parameter estimates using these posterior probabilities as weights. Starting from an initial estimate $\theta^{(0)}$, the EM algorithm repeats the following two steps at each iteration ($t = 0, 1, 2, \ldots$):

(a) E-Step: based on the observed samples and the current solution $\theta^{(t)}$, compute the expectation

$$Q(\theta \mid \theta^{(t)}) = \sum_{i=1}^{n} E_Y\!\left[\log f(x_i, Y; \theta) \,\middle|\, x_i; \theta^{(t)}\right], \qquad (1)$$

where $f(x, y; \theta)$ is the PDF of the data $(x, y)$ from the mixture model, and $E_Y$ denotes the expectation with respect to the random variable $Y$.

(b) M-Step:

$$\theta^{(t+1)} = \arg\max_{\theta}\, Q(\theta \mid \theta^{(t)}). \qquad (2)$$

Theoretically, beginning with proper initial estimates, the EM algorithm converges to the final solution $\hat{\theta}_{MLE}$. In the iterative procedure of the EM algorithm, the estimated parameters move closer to the MLEs after each iteration. Thus, it is an effective method for estimating the parameters of finite mixture density distributions. However, to be useful and practical, it must be linked to the supervised labeling of categories for unknown patterns. In particular, after the overall distribution of the feature space has been parametrically represented by a finite mixture with ellipsoidal spreads, the spread of each component should correspond to a definite category. Such a task can be accomplished when the EM algorithm is utilized to estimate the parameters of the basis functions in the hidden layer of an EBF network.
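To make the link between (1), (2), and the mixture model concrete, the following standard expansion (our addition, under the usual assumption that the hidden variable Y is the component label of the mixture) shows why the E-step reduces to computing posterior probabilities:

```latex
% For a finite mixture f(x;\theta) = \sum_{i=1}^{g} \pi_i f_i(x;\theta_i),
% with latent component label Y, the Q-function in (1) expands to
Q(\theta \mid \theta^{(t)})
  = \sum_{j=1}^{n} \sum_{i=1}^{g}
    \tau_{ij}^{(t+1)} \left[ \log \pi_i + \log f_i(x_j;\theta_i) \right],
\qquad
\tau_{ij}^{(t+1)}
  = \frac{\pi_i^{(t)} f_i(x_j;\theta_i^{(t)})}
         {\sum_{k=1}^{g} \pi_k^{(t)} f_k(x_j;\theta_k^{(t)})}.
```

Maximizing this expression with respect to the mixing proportions, means, and covariances yields exactly the M-step updates used for the EBF network in Section III.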

III. EBF NETWORKS AND THEIR CLASSIFICATION MODEL

A. EBF Network: An Elliptical Extension of the RBF Network

Generally, it would be more reasonable and beneficial if full covariance matrices could be incorporated into the basis functions of an RBF network, so that complex distributions of mixture densities could be represented without having to use a large number of basis functions. Under such circumstances, the range of spread of a hidden unit is hyper-ellipsoidal, and the RBF network is extended into an elliptical basis function (EBF) network, with the basis functions taking the following form:

$$\phi_j(x_p) = \exp\left\{-\frac{1}{2\gamma_j}(x_p - \mu_j)^T \Sigma_j^{-1}(x_p - \mu_j)\right\}, \quad j = 1, \ldots, M, \qquad (3)$$

where $x_p$ is the pth input vector, $\mu_j$ and $\Sigma_j$ are the mean vector and covariance matrix of the jth basis function respectively, and $\gamma_j$ is a smoothing parameter controlling the spread of the jth basis function, which can be determined heuristically by

$$\gamma_j = \frac{1}{5}\sum_{k=1}^{5} \left\| \mu_k - \mu_j \right\|, \qquad (4)$$

where $\mu_k$ denotes the kth nearest neighbor of $\mu_j$. The hidden-to-output mapping is linear, and the output weights are trained by minimizing the sum-of-squares error $E = \frac{1}{2}\sum_j (T_j - O_j)^2$, where $T_j$ is the desired activation of output j, and $O_j$ is the actual activation at output unit j. The weights are adjusted iteratively as $w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}$, where $w_{ij}(t)$ is the weight from unit i in the hidden layer to unit j in the output layer at time t (or the tth iteration) and $\Delta w_{ij}$ is the weight adjustment at the current step. The weight adjustment may be computed by the delta rule $\Delta w_{ij} = \eta \delta_j o_j$, with $\delta_j = T_j - O_j$ the output error, where $\eta$ is a trial-independent learning rate $(0 < \eta < 1)$.
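As a concrete illustration of equations (3) and (4), here is a minimal NumPy sketch; the function names and array layout are our own and are not from the paper:

```python
import numpy as np

def smoothing_parameters(means, k=5):
    """Heuristic widths, eq. (4): mean distance from each center
    to its k nearest neighboring centers."""
    M = means.shape[0]
    gammas = np.empty(M)
    for j in range(M):
        dist = np.linalg.norm(means - means[j], axis=1)
        gammas[j] = np.sort(dist)[1:k + 1].mean()  # skip the zero self-distance
    return gammas

def ebf_activations(X, means, covs, gammas):
    """Elliptical basis function activations, eq. (3).

    X: (n, d) inputs; means: (M, d); covs: (M, d, d); gammas: (M,)
    """
    n = X.shape[0]
    M = means.shape[0]
    Phi = np.empty((n, M))
    for j in range(M):
        diff = X - means[j]                         # (n, d)
        inv = np.linalg.inv(covs[j])
        # Mahalanobis quadratic form (x_p - mu_j)^T Sigma_j^{-1} (x_p - mu_j)
        maha = np.einsum('nd,de,ne->n', diff, inv, diff)
        Phi[:, j] = np.exp(-maha / (2.0 * gammas[j]))
    return Phi
```

Setting each $\Sigma_j$ to $\sigma_j^2 I$ recovers the ordinary radially symmetric RBF unit, which is the sense in which the EBF network extends the RBF network.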

B. EM Algorithm for the EBF Network

Although it has been shown that EBF networks trained by straightforward methods such as the K-means and K-nearest-neighbors algorithms may perform better than RBF networks, such methods may also produce undesirable results if the estimate of a mean vector differs significantly from the true mean. Consequently, the covariance matrices will no longer be accurate estimates of the true covariance matrices. Due to the complexity and randomness of the distributions of spatial data, conventional clustering algorithms suffer from further limitations. Typical problems are: How should the initial conditions be determined? How should the optimal number of clusters be determined? How can the adverse effect of noisy data be eliminated? How can domain-specific knowledge be integrated? The EM algorithm, on the other hand, offers an alternative approach to these problems. In what follows, the EM algorithm for estimating the parameters of the EBF network is proposed. For the Gaussian density distribution

$$f_i(x; \theta_i) = (2\pi)^{-d/2} \, |\Sigma_i|^{-1/2} \exp\left\{-\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1}(x - \mu_i)\right\}, \quad i = 1, \ldots, g,$$

with mean vectors $\mu_i$ and covariance matrices $\Sigma_i$, the estimation of the parameters of each component of the mixture via the EM algorithm can be simplified as follows (McLachlan and Basford, 1988):

(a) E-Step:

$$\tau_{ij}^{(t+1)} = \tau_i(x_j; \theta^{(t)}) = \frac{\pi_i^{(t)} f_i(x_j; \theta_i^{(t)})}{\sum_{k=1}^{g} \pi_k^{(t)} f_k(x_j; \theta_k^{(t)})},$$

that is,

$$\tau_{ij}^{(t+1)} = \frac{\pi_i^{(t)} |\Sigma_i^{(t)}|^{-1/2} \exp\left\{-\frac{1}{2}(x_j - \mu_i^{(t)})^T (\Sigma_i^{(t)})^{-1}(x_j - \mu_i^{(t)})\right\}}{\sum_k \pi_k^{(t)} |\Sigma_k^{(t)}|^{-1/2} \exp\left\{-\frac{1}{2}(x_j - \mu_k^{(t)})^T (\Sigma_k^{(t)})^{-1}(x_j - \mu_k^{(t)})\right\}}, \qquad (8)$$

where $\tau_{ij}^{(t)}$ is the posterior probability of the jth data point $x_j$ belonging to the ith component at the tth step.

(b) M-Step: the parameters are updated with the posterior probabilities as weights:

$$\pi_i^{(t+1)} = \frac{1}{n}\sum_{j=1}^{n} \tau_{ij}^{(t+1)}, \quad \mu_i^{(t+1)} = \frac{\sum_{j=1}^{n} \tau_{ij}^{(t+1)} x_j}{\sum_{j=1}^{n} \tau_{ij}^{(t+1)}}, \quad \Sigma_i^{(t+1)} = \frac{\sum_{j=1}^{n} \tau_{ij}^{(t+1)} (x_j - \mu_i^{(t+1)})(x_j - \mu_i^{(t+1)})^T}{\sum_{j=1}^{n} \tau_{ij}^{(t+1)}}.$$

The procedure for fitting the hidden layer of the EBF network can be summarized as follows.

<Step 2>. Initialization. Determine the initial cluster number g and the mean vector μ_i and covariance matrix Σ_i of each class i. Generally, g can be a large number selected according to the practical situation; g vectors are randomly chosen from the sampled data sets as the initial values of the mean vectors μ_i, and the initial covariance matrices Σ_i can subsequently be determined.

<Step 3>. Estimation of the maximum likelihood (ML) parameters. Based on the EM algorithm, the parameters of each cluster center are estimated until the overall movement of the parameters reaches a stable condition.

<Step 4>. Determination of the optimal number of cluster centers of the EBF network. Once the iterative process arrives at a stable condition, cluster centers can be added or deleted according to two indices: the proportional value π_i and the number of samples contained within the elliptical range of class i.


Figure 1. Architecture of the EM-based EBF classification network

At first, the number of data contained within a current class is calculated on the basis of the ML probabilities. If the proportional value π_i of the class with the smallest number of contained samples is less than a given threshold, then the corresponding center is deleted from the current set of centers, and the procedure returns to Step 3 to adjust the parameters of the remaining centers. If it is above the threshold, the center is kept, and Step 3 is continued until a longer-term stable condition is attained. To recapitulate, based on the mixture density model, the EBF network can determine the degree of closeness between the input vector and the hidden centers through the elliptical responses of the corresponding basis functions. From this perspective, the EBF network is more suitable than the conventional RBF network for the classification of remotely sensed data.
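The E- and M-steps above translate directly into code. The following is a compact NumPy/SciPy sketch under our own naming; the convergence test and the small diagonal regularization are implementation conveniences rather than part of the paper's formulation:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gaussian_mixture(X, g, n_iter=100, tol=1e-5, seed=0):
    """EM estimation of a g-component Gaussian mixture (eqs. of Section III.B)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # <Step 2> initialization: g randomly chosen samples as means,
    # the overall sample covariance as every component's initial covariance
    mu = X[rng.choice(n, g, replace=False)]
    sigma = np.tile(np.cov(X.T) + 1e-6 * np.eye(d), (g, 1, 1))
    pi = np.full(g, 1.0 / g)
    ll_old = -np.inf
    for _ in range(n_iter):
        # E-step: posterior tau_ij of sample j belonging to component i, eq. (8)
        dens = np.stack([pi[i] * multivariate_normal.pdf(X, mu[i], sigma[i])
                         for i in range(g)])              # (g, n)
        tau = dens / dens.sum(axis=0, keepdims=True)
        # M-step: posterior-weighted updates of pi, mu, sigma
        Ni = tau.sum(axis=1)                              # effective counts
        pi = Ni / n
        mu = (tau @ X) / Ni[:, None]
        for i in range(g):
            diff = X - mu[i]
            sigma[i] = ((tau[i, None, :] * diff.T) @ diff / Ni[i]
                        + 1e-6 * np.eye(d))               # keep invertible
        # <Step 3> stability check via the log-likelihood before the update
        ll = np.log(dens.sum(axis=0)).sum()
        if abs(ll - ll_old) < tol:
            break
        ll_old = ll
    return pi, mu, sigma, tau
```

A Step 4-style pass could then delete components whose mixing proportion π_i falls below a chosen threshold and resume the iteration from the surviving centers.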

C. EBF Network with Embedded EM Algorithm for Image Classification

Based on the above analysis, information extraction and classification of remotely sensed images can be effectively carried out by an EBF network with the EM algorithm. The basic mechanism of the EBF classification network can be divided into two parts. First, using the EM algorithm, the mixture density distributions of remotely sensed data in the sparse feature space are decomposed into hidden centers represented by elliptical spreads of probability distribution functions. Second, using the linear perceptron, the approximating relationship between the cluster centers in the hidden layer and the categorical classes in the output layer is established. Basically, the proposed EBF network is a better model for solving classification problems, especially when data are distributed with complicated mixture densities. On the implementation level, the EM-based EBF network consists of three major components (Figure 1): the EM algorithm module (A), the EBF network training module (B), and the classification module (C). (1) In module A, the supervised sample data sets are first selected from the remotely sensed image, and the parameters of the hidden units in the kernel layer are then determined by the EM algorithm. (2) In module B, the connective weights between units in the hidden layer and the output layer are adjusted by the iterative delta rule.


Figure 2. Original SPOT image covering the study area (acquired on February 3, 1999)

(3) In module C, unknown vectors x read from the image are fed into the EM-based EBF network pixel by pixel to obtain the classification results.
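Putting modules B and C together, a minimal sketch might look as follows; the names are our own, ebf_activations is the helper sketched in Section III.A, and the paper's per-pattern delta rule is written here as a batch update for brevity:

```python
import numpy as np

def train_output_weights(Phi, T, eta=0.02, epochs=200):
    """Module B: delta-rule training of the hidden-to-output weights.

    Phi: (n, M) EBF activations of the training samples
    T:   (n, C) one-hot target activations
    eta: learning rate (0.02 mirrors the rate used in Section IV)
    """
    M, C = Phi.shape[1], T.shape[1]
    W = np.zeros((M, C))
    for _ in range(epochs):
        O = Phi @ W                        # linear output units
        delta = T - O                      # error signal (T_j - O_j)
        W += eta * Phi.T @ delta / len(Phi)  # batch delta-rule update
    return W

def classify_image(pixels, means, covs, gammas, W):
    """Module C: feed image pixels through the trained EBF network."""
    Phi = ebf_activations(pixels, means, covs, gammas)
    return np.argmax(Phi @ W, axis=1)      # class index per pixel
```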

IV. AN APPLICATION

A. The Study Area and Experimental Data

To evaluate the performance of the proposed EBF network, a real-life application to the classification of land covers with remotely sensed data was carried out. Experiments were conducted using SPOT-HRV data with all three spectral bands (CH1: green band, 0.50-0.59 µm; CH2: red band, 0.61-0.68 µm; CH3: near-infrared band, 0.79-0.89 µm). The remotely sensed image was acquired on February 3, 1999 and covers Hong Kong Island (Figure 2). The sub-image extracted from the full scene is 600 rows by 800 columns with a spatial resolution of 20 m by 20 m, covering an area of about 192 km². Land covers in this area are very complicated. Owing to the mountainous topography, the majority of the urban areas have been built along the harbor on reclaimed land. The major part of the island consists of rugged hills covered with different types of vegetation. According to knowledge acquired from supporting materials and surveys, and from visual interpretation of the corresponding remote-sensing data, nine main types of land cover were identified as the reference for classification: C1 -- Sea Water; C2 -- Inland Water; C3 -- Urban Area; C4 -- Concrete Land; C5 -- Barren Land; C6 -- Beach; C7 -- Grass Land; C8 -- Hilly Woodland; C9 -- Grass Land after Fire. For some pixels, land covers such as water bodies (C1, C2), built-up areas (C3, C4), vegetated areas (C7, C8, C9), and water and shadowed built-up areas (C1, C2, C3, C4) cannot be easily separated because of their closeness in spectral characteristics, due to mixture density distributions in the feature space. Typical sample data sets of each land cover were selected for supervised classification.


Figure 3. Land covers obtained by the EBF network

A total of 3500 supervised samples were selected, of which 2600 were used for training and 900 for testing.

B. Experimental Results and Comparison

With the 2600 training samples, the mixture densities of the three-dimensional feature space are decomposed into 62 clusters by the EM algorithm, and the ML parameters of each cluster are estimated at the same time. The size of the hidden layer of the EBF network is therefore 62 nodes. In the linear training phase from the hidden layer to the output layer, the training rate η is kept at 0.02, small enough to avoid oscillation. The test error matrix (Table 3) is obtained by running the test data sets through the trained EBF network. The training phase of the EBF network takes about 120 seconds, and the overall test accuracy is 76.11%. For comparison, two common classification models were also trained and tested with the same sample data sets: the maximum likelihood classifier (MLC), an elliptical but single-density model, and the conventional RBF network, a mixture-density but hyper-spherical model. The c-means clustering algorithm was used to determine the size of the hidden layer of the RBF network, resulting in 64 nodes (a scale similar to that of the EBF network). The test error matrices obtained from the MLC and the RBF model are listed in Table 1 and Table 2 respectively. The overall test accuracy of the MLC is 69.11%, and that of the RBF network is 70.33%; the training phase of the RBF network takes about 50 seconds. Comparing the three classifiers, we reach the following conclusions: (1) The EBF network keeps the major advantages of the RBF network. Even for freely distributed data, the EBF network is more capable than the conventional parametric statistical classifiers of separating categories with mixture distributions in the feature space, such as urban area (C3) and inland water (C2) in this experiment. (2) With a hidden layer of similar size, the EBF network attains a higher level of accuracy than the RBF network. Thus, the EBF network yields the most accurate classification result (Figure 3) in both the training and testing phases.


Figure 4. Comparison of average accuracy between the EBF and the RBF networks (the curve represents the relationship between the number of hidden nodes and overall accuracy)

The selection of a reasonable size for the hidden layer of the EBF network, according to the degree of mixture, is a key problem for successful classification. To test its relationship with the overall accuracy, different sizes of the hidden layer, including 20, 30, 40, 50, 60, 80, and 100 centers, were evaluated; in other words, the feature space was partitioned into hyper-ellipsoidal areas at different scales. Table 4 shows an increasing trend of overall classification accuracy with increasing size of the hidden layer. It should be noted that the computation time also increases, while the gain in accuracy levels off after a certain size. Since full covariance matrices are incorporated into the basis functions of the EBF network, more complex mixture density distributions of the feature space can be represented by the cluster centers without having to use a large number of basis functions. As depicted in Figure 4, the EBF network outperforms the RBF network under similar conditions.

V. CONCLUSION

Extending the structure of the RBF network, an elliptical basis function (EBF) network, which incorporates full covariance matrices into the radial basis functions and estimates their parameters by the EM algorithm, has been proposed for the classification of remotely sensed images. In addition to its advantages on the theoretical level, the EBF network has been demonstrated to be more accurate in real-life classification. Compared with the conventional MLC and RBF classifiers, the EBF network is simpler in structure, more accurate, and more interpretable. To make the proposed EBF network more effective, further research should be carried out to take advantage of the EM algorithm by integrating prior knowledge via Bayes theory. To improve the accuracy and reliability of the EBF network, robustness can be naturally integrated into the mixture density model for the EM algorithm. To better approximate reality, the complexity of distributions in the feature space should be further analyzed in future research.

REFERENCES

[1]. Atkinson, P.M., and A.R.L. Tatnall, "Neural networks in remote sensing," Int. J. of Remote Sensing, 18(4):699-709, 1997.
[2]. Bischof, H., W. Schneider, and A.J. Pinz, "Multi-spectral classification of Landsat images using neural network," IEEE Transactions on Geoscience and Remote Sensing, 30:482-490, 1992.
[3]. Bishop, C.M., "Radial basis functions," in Neural Networks for Pattern Recognition, Clarendon Press, Oxford, New York, 1995.
[4]. Bruzzone, L., D. Fernandez, and S.B. Serpico, "A neural-statistical approach to multi-temporal and multi-source remote-sensing image classification," IEEE Transactions on Geoscience and Remote Sensing, 37(3):1350-1359, 1999a.
[5]. Bruzzone, L., and D.F. Prieto, "A technique for the selection of kernel-function parameters in RBF neural networks for classification of remote-sensing images," IEEE Transactions on Geoscience and Remote Sensing, 37(2):551-559, 1999b.
[6]. Bruzzone, L., and D.F. Prieto, "Automatic analysis of the difference image for unsupervised change detection," IEEE Transactions on Geoscience and Remote Sensing, 38(3):1171-1182, 2000.
[7]. Chen, T., and H. Chen, "Approximation capability to functions of several variables, nonlinear functions, and operators by radial basis function neural networks," IEEE Transactions on Neural Networks, 6:904-910, 1995.
[8]. Dempster, A.P., N.M. Laird, and D.B. Rubin, "Maximum likelihood estimation from incomplete data via the EM algorithm," J. R. Statist. Soc., B39:1-38, 1977.
[9]. Fu, L., Neural Networks in Computer Intelligence, McGraw-Hill International Editions, 1994.
[10]. Girosi, F., "Regularization theory, radial basis functions, and networks," in From Statistics to Neural Networks: Theory and Pattern Recognition Applications (V. Cherkassky and J.H. Friedman, editors), Springer-Verlag, Germany, 1994, pp. 166-187.
[11]. Gomm, J.B., and D. Yu, "Selecting radial basis function network centers with recursive orthogonal least squares training," IEEE Transactions on Neural Networks, 11(2):306-314, 2000.
[12]. Haykin, S., Neural Networks: A Comprehensive Foundation, Prentice-Hall, 1999.
[13]. Kulkarni, A.D., Artificial Neural Networks for Image Understanding, Van Nostrand Reinhold, New York, 1994.
[14]. Lippmann, R.P., "Neural networks, Bayesian a posteriori probabilities, and pattern classification," in From Statistics to Neural Networks: Theory and Pattern Recognition Applications (V. Cherkassky and J.H. Friedman, editors), Springer-Verlag, Germany, 1994, pp. 83-104.
[15]. Mao, K.Z., K.C. Tan, and W. Ser, "Probabilistic neural-network structure determination for pattern classification," IEEE Transactions on Neural Networks, 11(4):1009-1016, 2000.
[16]. Mather, P.M., "Land cover classification revisited," in Advances in Remote Sensing and GIS Analysis (P.M. Atkinson and N.J. Tate, editors), Wiley, 1999, pp. 7-16.
[17]. McLachlan, G.J., and K.E. Basford, Mixture Models: Inference and Applications to Clustering, Marcel Dekker, New York, 1988.
[18]. McLachlan, G.J., and T. Krishnan, The EM Algorithm and Extensions, John Wiley, 1997.
[19]. Moody, J., and C.J. Darken, "Fast learning in networks of locally-tuned processing units," Neural Computation, 1:281-294, 1989.
[20]. Powell, M.J.D., "Radial basis functions for multivariable interpolation: a review," in Algorithms for Approximation of Functions and Data (J.C. Mason and M.G. Cox, editors), Oxford University Press, Oxford, 1987, pp. 143-167.
[21]. Richards, J.A., and X. Jia, Remote Sensing Digital Image Analysis: An Introduction, Springer-Verlag, 1998.
[22]. Rollet, R., G.B. Benie, W. Li, and S. Wang, "Image classification algorithm based on the RBF neural network and K-means," International Journal of Remote Sensing, 19(15):3003-3009, 1998.
[23]. Scholkopf, B., K. Sung, C.J.C. Burges, F. Girosi, P. Niyogi, T. Poggio, and V. Vapnik, "Comparing support vector machines with Gaussian kernels to radial basis function classifiers," IEEE Transactions on Signal Processing, 45(11):2758-2765, 1997.
[24]. Serpico, S.B., L. Bruzzone, and F. Roli, "An experimental comparison of neural and statistical non-parametric algorithms for supervised classification of remote-sensing images," Pattern Recognition Letters, 17:1331-1341, 1996.
[25]. Specht, D.F., "Probabilistic neural networks," Neural Networks, 3(1):109-118, 1990.
[26]. Sundararajan, N., P. Saratchandran, and Y. Lu, Radial Basis Function Neural Networks with Sequential Learning, World Scientific, 1999.


[27]. Tadjudin, S., and D.A. Landgrebe, "Robust parameter estimation for mixture model," IEEE Transactions on Geoscience and Remote Sensing, 38(1):439-445, 2000.
[28]. Xu, L., "RBF nets, mixture experts, and Bayesian Ying-Yang learning," Neurocomputing, 19:223-257, 1998.

Table 1. Error matrix of classification by the MLC

         C1    C2    C3    C4    C5    C6    C7    C8    C9   Total
C1       98    25     0     0     0     0     0     0     0    127
C2        1    23     5     0     0     0     0     0     1     30
C3        0    46    75     2     0     2     0     1     6    132
C4        0     0     7    64     4     6     0     0    13     94
C5        0     0     0    19    89    60     0     0     0    168
C6        1     5     1     5     7    24     1     0     0     44
C7        0     0     0     6     0     7    84     2     0     99
C8        0     0     2     1     0     0    10    96    11    120
C9        0     1     6     3     0     1     5     1    69     86
Total   100   100   100   100   100   100   100   100   100    900

(Accuracy = 69.11%, Kappa = 0.653)

Table 2. Error matrix of classification by the RBF network

         C1    C2    C3    C4    C5    C6    C7    C8    C9   Total
C1       91    22     1     0     0     0     0     0     0    114
C2        6    55     9     1     0     1     0     0     0     72
C3        2    22    76     4     0     2     0     1     7    114
C4        0     0     6    67     4     5     0     1    14     97
C5        0     0     0     9    18     3     0     0     0     30
C6        1     0     0     7    78    82     2     0     0    170
C7        0     0     0     4     0     5    81     0     0     90
C8        0     0     2     3     0     2    12    94     9    122
C9        0     1     6     5     0     0     5     4    70     91
Total   100   100   100   100   100   100   100   100   100    900

(Training time = 50 s, Accuracy = 70.33%, Kappa = 0.666)

Table 3. Error matrix of classification by the EBF network

         C1    C2    C3    C4    C5    C6    C7    C8    C9   Total
C1       97    18     2     0     0     0     0     0     0    117
C2        2    62     7     0     0     1     0     0     0     72
C3        1    20    76     4     0     0     3     1     6    111
C4        0     0    10    71     6     7     0     0    12    106
C5        0     0     0     7    49     8     0     0     0     64
C6        0     0     0     7    44    71     2     0     0    124
C7        0     0     0     4     1     9    88     1     0    103
C8        0     0     1     2     0     3     5    91     1    103
C9        0     0     4     5     0     1     2     7    81    100
Total   100   100   100   100   100   100   100   100   100    900

(Training time = 120 s, Accuracy = 76.11%, Kappa = 0.731)

Table 4. Relationship between accuracy and size of the hidden layer in the EBF network

Size of hidden layer         20     30     40     50     60     80    100
Training time (seconds)      20     40     70    100    120    170    250
Accuracy (%)   C1          96.0   97.0   96.0   97.0   97.0   97.0   98.0
               C2          39.0   40.0   52.0   64.0   62.0   62.0   63.0
               C3          78.0   78.0   82.0   76.0   76.0   80.0   79.0
               C4          71.0   74.0   72.0   72.0   71.0   77.0   77.0
               C5          35.0   39.0   37.0   34.0   49.0   56.0   56.0
               C6          71.0   71.0   70.0   78.0   71.0   69.0   69.0
               C7          90.0   90.0   88.0   89.0   88.0   86.0   86.0
               C8          89.0   91.0   91.0   89.0   91.0   92.0   92.0
               C9          74.0   78.0   80.0   80.0   81.0   78.0   78.0
               Average     71.3   73.0   74.1   75.3   76.1   77.2   77.3