
Neurocomputing 31 (2000) 1–13

Variable selection using neural-network models

Giovanna Castellano*, Anna Maria Fanelli

Dipartimento di Informatica, Università di Bari, Via E. Orabona 4, 70126 Bari, Italy

Received 3 April 1998; accepted 22 March 1999

Abstract

In this paper we propose an approach to variable selection that uses a neural-network model as the tool to determine which variables are to be discarded. The method performs a backward selection by successively removing input nodes in a network trained with the complete set of variables as inputs. Input nodes are removed, along with their connections, and the remaining weights are adjusted in such a way that the overall input–output behavior learnt by the network is kept approximately unchanged. A simple criterion to select the input nodes to be removed is developed. The proposed method is tested on a famous example of system identification. Experimental results show that the removal of input nodes from the neural-network model improves its generalization ability. In addition, the method compares favorably with other feature reduction methods. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Variable selection; Neural network pruning; Least-squares methods; Principal component analysis

1. Introduction

A crucial issue in many problems of pattern recognition or system identification is reducing the amount of data to process, which is often a key factor in determining the performance of the information processing system. The problem of data reduction, also termed "feature reduction", is defined as follows: given a set of available features, select a subset of features that retains most of the intrinsic information content of the data. There are two different approaches to achieving feature reduction: feature extraction and feature selection. Feature extraction, linearly or nonlinearly, transforms the original

* Corresponding author. E-mail address: [email protected] (G. Castellano)

0925-2312/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0925-2312(99)00146-0



set of features into a reduced one. Well-known feature extraction methods are Principal Component Analysis (PCA) and Discriminant Analysis [9]. On the other hand, feature selection, also referred to as variable selection, selects a subset of features from the initial set of available features. A number of different methods have been proposed to approach the optimal solution to feature selection [10]. Significant contributions have come from statisticians in the field of Pattern Recognition, ranging from techniques that find the optimal feature set (e.g. exhaustive search or the Branch and Bound algorithm [20]) to those that result in a sub-optimal feature set that is near to the optimal solution [13,22]. More recently, some variable selection methods for artificial neural networks have been developed [1,16,4]. However, no optimal and generally applicable solution to the feature selection problem exists: some methods are more suitable under certain conditions and some under others, depending on the degree of knowledge about the problem at hand. When the only source of available information is the training data, the feature selection task can be performed well using a neural approach. In fact, neural networks do not make any assumption about the probability distribution functions of the data, thus avoiding the restrictive formal conditions of the statistical approach.

This paper is concerned with the problem of variable (or feature) selection using artificial neural networks. In this context, variable selection can be seen as a special case of network pruning: pruning an input node is equivalent to removing the corresponding feature from the original feature set. Several pruning procedures for neural networks have been proposed [23], but most of them focus on removing hidden nodes or connections, and they are not directly applicable to pruning irrelevant input nodes.
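To make the distinction between the two approaches concrete, the following toy sketch (an illustrative example, not taken from the paper; the variance-ranking criterion is only a simple stand-in for a real selection method) contrasts PCA-style feature extraction, which produces new features that mix all original ones, with feature selection, which keeps a subset of the original features unchanged:

```python
import numpy as np

# Toy data: 100 samples, 5 original features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Feature EXTRACTION (PCA): project onto the top-k principal directions,
# producing k new features that are linear mixtures of ALL originals.
k = 2
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X_extracted = Xc @ Vt[:k].T          # shape (100, 2): transformed features

# Feature SELECTION: keep a subset of the ORIGINAL features unchanged,
# here ranked by variance (a crude stand-in criterion for illustration).
selected = np.argsort(X.var(axis=0))[-k:]
X_selected = X[:, selected]          # shape (100, 2): original columns

print(X_extracted.shape, X_selected.shape)
```

Both reduced sets have the same shape, but only the selected one preserves the meaning of the original variables, which is why selection is preferable when interpretability of the inputs matters.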
Pruning procedures extended to the removal of input nodes were proposed in [8,12,14,17,18,24], where the variable selection process is typically based on a measure of the relevance of an input node, so that the less relevant features are removed. However, most of these techniques evaluate the relevance of input nodes during the training process, and thus depend strictly on the adopted learning algorithm.

We propose a variable selection method based on an algorithm that we developed for pruning hidden nodes in neural networks [5–7]. The method performs a backward feature selection by successively removing input nodes (along with their connections) from a satisfactorily trained network and adjusting the remaining weights in such a way that the overall input–output behavior learnt by the network is kept approximately unchanged. This condition leads to the formulation of a linear system that is solved in the least-squares sense by means of a very efficient preconditioned conjugate gradient procedure. The criterion for choosing the features to be removed is derived from a property of the particular least-squares method employed. This procedure is repeated until the desired trade-off between accuracy and parsimony of the network is achieved.

Unlike most variable selection methods, which remove all useless features in one step, our algorithm removes features iteratively, thus enabling a systematic evaluation of the reduced network models produced during the progressive elimination of features. Therefore, the number of input nodes (i.e. the final number of features) is determined



just according to the performance required of the network, without making a priori assumptions or evaluations about the importance of the input variables. This gives more flexibility to the variable selection algorithm, which can be iterated until either a predetermined number of features has been eliminated or the performance of the current reduced network falls below specified requirements. Moreover, the method does not depend on the learning procedure, since it removes input nodes after the training phase.

The paper is organized as follows: Section 2 introduces notation and definitions for the neural network; Section 3 describes the proposed variable selection algorithm; Section 4 reports experimental results; finally, Section 5 draws some conclusions about the method.
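The core idea of the backward scheme — remove an input node, then re-fit the remaining weights so the network's behavior on the training data is approximately preserved — can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: it assumes a single weight layer, preserves the hidden pre-activations H = XW rather than the full input–output map, uses NumPy's plain `lstsq` in place of the paper's preconditioned conjugate gradient solver, and uses the smallest re-fit residual as a stand-in for the paper's removal criterion:

```python
import numpy as np

def prune_one_input(X, W):
    """Remove the input whose deletion, after a least-squares re-fit of the
    remaining first-layer weights, best preserves the hidden pre-activations
    H = X @ W on the training data X (n_samples x n_inputs)."""
    H = X @ W                                   # behavior to preserve
    best = None
    for i in range(X.shape[1]):
        keep = [j for j in range(X.shape[1]) if j != i]
        Xr = X[:, keep]
        # Adjust remaining weights: minimize ||Xr @ Wr - H|| in the
        # least-squares sense (solved jointly for all hidden units).
        Wr, _, _, _ = np.linalg.lstsq(Xr, H, rcond=None)
        err = np.linalg.norm(Xr @ Wr - H)
        if best is None or err < best[0]:
            best = (err, i, keep, Wr)
    err, i, keep, Wr = best
    return i, keep, Wr, err

# Demo: 4 independent inputs plus a 5th that nearly duplicates the first,
# so removing one of the redundant pair should cost almost nothing.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
X = np.column_stack([X, X[:, 0] + 1e-3 * rng.normal(size=200)])
W = rng.normal(size=(5, 3))                     # weights to 3 hidden units

removed, keep, W_new, err = prune_one_input(X, W)
print("removed input", removed, "residual", round(err, 4))
```

In this toy run the redundant input (column 0 or 4) is the one removed, with a near-zero residual, mirroring the paper's intuition that inputs whose information is carried by the remaining ones can be pruned without changing the learnt behavior. The iterative method would repeat this step until accuracy on a validation set degrades.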

2. The neural network

A neural network of arbitrary topology can be represented by a directed graph N = (