Neural Comput & Applic (2008) 17:509–519 DOI 10.1007/s00521-007-0143-5
ORIGINAL ARTICLE
Neural network committee-based sensitivity analysis strategy for geotechnical engineering problems

Maosen Cao · Pizhong Qiao
Received: 17 May 2006 / Accepted: 8 August 2007 / Published online: 13 September 2007
© Springer-Verlag London Limited 2007
Abstract  A neural network usually acts as a "black box" in diverse fields to perform prediction, classification, and regression. Different from these conventional usages, a neural network is herein employed to handle factor sensitivity analysis in a geotechnical engineering system. After systematically investigating the instability of employing a single neural network in factor sensitivity analysis, a neural network committee (NNC)-based sensitivity analysis strategy is first algorithmically presented, based on the mathematical ideas of the weak law of large numbers in probability and of optimization. Significantly, this study emphasizes the practical application of the NNC-based sensitivity analysis strategy to highlight the mechanism underlying strata movement. The principal goal is to reveal the relationships among the influential factors on strata movement by estimating the relative contribution of each explicative (input) variable to the dependent (output) variables of strata movement. It is demonstrated that the NNC-based sensitivity analysis strategy not only rationally reveals the relative contribution of each explicative variable to the dependent variables but also indicates the predictability of each dependent variable. In addition, an improved prediction model results from integrating the sensitivity analysis results into the neural network modeling; it facilitates the convergence of the neural network training and improves the prediction precision of the strata movement angles. These outcomes indicate that the NNC-based sensitivity analysis strategy provides a new paradigm for applying neural networks to complex geotechnical engineering problems.

Keywords  Neural network · Neural network committee (NNC) · Sensitivity analysis · Predictability · Geotechnical engineering · Strata movement · Prediction model

M. Cao · P. Qiao
Department of Engineering Mechanics, College of Civil Engineering, Hohai University, Nanjing 210098, People's Republic of China
e-mail: [email protected]
P. Qiao
e-mail: [email protected]

M. Cao
College of Water-Conservancy and Civil Engineering, Shandong Agricultural University, Tai'an 271018, People's Republic of China

M. Cao · P. Qiao (&)
Department of Civil and Environmental Engineering and Wood Materials and Engineering Laboratory, Washington State University, Pullman, WA 99164-2910, USA
e-mail: [email protected]
1 Introduction

Sensitivity analysis is a feasible method for extracting the cause-and-effect relationship between the explicative (input) and dependent (output) variables of a system. The basic idea is that, by offsetting each explicative variable slightly and recording the corresponding change in the dependent variable(s), the explicative variables that produce high sensitivity values are identified as considerably significant. Sensitivity analysis of influential variables or factors can explicitly improve the domain interpretability of an engineering system model. This is especially valuable for analyzing complicated geotechnical engineering systems. Investigation of the mechanism underlying the strata movement of underground metal mines, particularly
sensitivity analysis of the influential or critical factors on strata movement, is an important issue in geotechnical engineering. For example, the movement angles of the upper and lower wall rocks and the avalanche angle are key parameters for designing protective coal pillars, which can affect the safety of structures above underground metal mines. There are many influential factors on strata movement, and most of them possess stochastic, uncertain, and fuzzy characteristics. It has been proven that complex and highly nonlinear relationships exist among these influential factors, and it is almost impossible to interpret these relations explicitly by conventional statistical methods [1, 2]. Multiple linear regression is one of the most popular methods used in statistical analysis due to its simplicity and its capability of producing predictive and explanatory results. However, its inability to reflect nonlinear relationships between the dependent variable and each explicative variable makes it of very little value in dealing with strata movement. Recently, artificial neural networks have been increasingly employed as an effective tool for modeling complicated problems in which the governing equations are difficult to define [3–7]. Many studies have explored the application of artificial neural networks to complex geotechnical engineering systems. In particular, several studies employ neural networks to predict strata movement angles and produce prediction results superior to those of traditional methods [1, 2]. However, in these applications the neural network has commonly been treated as a "black box" to perform the tasks of classification, prediction and regression. With increasing interest in understanding the mechanism of strata movement, it is valuable to explore neural network-based methods for analyzing the sensitivity of the influential or critical factors on strata movement. Neural network-based sensitivity analysis, in its general application to prediction, classification and regression, is becoming a research focus in artificial intelligence [8–13]. In the existing studies, a single optimal neural network obtained from a model selection procedure is commonly employed to perform the modeling and then the sensitivity analysis. Nevertheless, it is comparatively difficult to select the most optimal neural network model due to multiple undetermined factors such as diversity in model types, indeterminacy in model structure, random initialization of connection weights, etc. Despite the employment of advanced resampling methods (e.g., cross validation and the bootstrap), the truly optimal neural network model is rarely obtained. The uncertainty in neural network model selection necessarily results in instability of the sensitivity analysis, yet systematic investigations of this problem are not available so far. The neural network ensemble, originating from Hansen and Salamon's work [14], is a learning paradigm in which a
collection of a finite number of neural networks is trained to perform the same task [15]. It has become a popular paradigm for enhancing the precision of prediction, classification and regression. In neural network ensembles, Bagging and Boosting are usually used to train the component neural networks. In [16], an important finding showed that it may be better to ensemble many instead of all of the available neural networks for regression and classification, i.e., "many better than all" for short; moreover, a genetic algorithm-based selective ensemble was proposed. At present, various forms of neural network ensembles have been extensively examined in the existing studies. However, most applications of neural network ensembles have focused on the tasks of prediction, regression and classification, and only a few works have employed the neural network ensemble to perform sensitivity analysis. A study of the underlying theoretical foundation and an algorithmic presentation of neural network ensemble-based factor sensitivity analysis are still not available. Among the existing neural network software packages offering neural network ensemble-based factor sensitivity analysis, Trajan's Demonstrator (http://www.trajan-software.demon.co.uk) was probably the one with the most prevailing performance; however, the underlying theoretical foundation and algorithmic presentation of neural network ensemble-based factor sensitivity analysis remain relatively limited. Derived from the basic idea of "many better than all" of the neural network ensemble in [16] and inspired by the software Trajan's Demonstrator, the theoretical aspects of neural network ensemble-based factor sensitivity analysis are explored in this study, mainly including the reason for its origination (i.e., why to use it) and its fundamental mathematical background. Moreover, a strategy for neural network ensemble-based factor sensitivity analysis is algorithmically presented. To emphasize the particular meaning of voting by excellent component neural networks for factor sensitivity analysis, the presented strategy is termed "neural network committee (NNC)-based sensitivity analysis". The application of the NNC-based sensitivity analysis to reveal the intrinsic mechanism underlying a geotechnical engineering problem (i.e., strata movement) is emphasized in this study. The primary goal is to reveal the influential or critical factors on strata movement by illuminating the conventional "black box" of the neural network. To investigate the strategy and provide an explicit observation of the sensitivity analysis of the influential factors on strata movement, a computer program for the NNC-based sensitivity analysis is implemented in Matlab. The paper is organized as follows. The database of strata movement (Sect. 2) is first described, followed by the presentation of two representative algorithms for
conventional neural network-based sensitivity analysis (Sect. 3). In Sect. 4, the instability of neural network-based sensitivity analysis is systematically investigated using the database of strata movement. To overcome the instability of single neural network-based sensitivity analysis, a new NNC-based sensitivity analysis strategy is algorithmically presented in Sect. 5. In Sect. 6, the procedure for applying the NNC-based sensitivity analysis strategy to the influential factors on strata movement is given and discussed. Finally, concluding remarks are drawn in the last section (Sect. 7).
2 Database

The raw data used in this study are collected partly from available technical reports and engineering surveys of various research and design institutes in China. To avoid insufficiency of the raw data, a sample reconstruction technique is applied to expand them, followed by a correlation analysis preprocessing step to eliminate redundant samples. This results in 168 representative samples from multiple typical observation stations monitoring the surface movement above underground metal mines. Each sample consists of nine physical characteristic variables, i.e., six explicative variables and three dependent variables (see Table 1). These variables are used to describe the behavior of strata movement in underground metal mines.
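The database itself is not distributed with the paper, so any code working with it has to assume a layout. A minimal sketch of such an assumed layout (nine comma-separated columns in the Table 1 order; file name and format are hypothetical):

```python
# Hypothetical illustration only: the 168-sample strata-movement database is not
# publicly available, so the file name and column layout below are assumptions.
import numpy as np

EXPLICATIVE = ["MCU", "LCL", "SAO", "TO", "LO", "DE"]   # input variables (Table 1)
DEPENDENT = ["MAU", "MAL", "AA"]                        # output variables (Table 1)

def load_strata_data(path="strata_movement.csv"):
    """Load a 168 x 9 array: six explicative columns followed by three dependent ones."""
    data = np.loadtxt(path, delimiter=",", skiprows=1)
    X, Y = data[:, :6], data[:, 6:]
    return X, Y
```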
3 Available neural network-based sensitivity analysis

3.1 Neural network modeling

The foundation of neural network-based sensitivity analysis is first-hand neural network modeling. One of the principal aims of neural network modeling is to achieve the optimal network performance, at which the network is capable of correctly capturing the intrinsic relationship between the explicative and dependent variables. The optimal network performance is achieved by progressively adjusting the connection weights of the network structure according to a certain training algorithm and sampling method. The advanced resampling method of cross validation is adopted in this study to facilitate the attainment of the optimal neural network performance. The core module of neural network modeling involves the following steps: (1) the entire data set on strata movement is randomly split into two subsets, a training subset (4/5 of the samples) and a test subset (1/5 of the samples); (2) the connection weights of the model are adjusted with the training subset, and the performance of the model is tested with the test subset; (3) the above process is repeated many times so as to determine the best configuration of the artificial neural network, which captures the intrinsic mechanism of strata movement and transfers the observed data into implicit knowledge carried by the successfully trained neural network model. Through neural network modeling, model selection can be performed: for a large number of candidate neural network models of a given kind, modeling is independently implemented on each candidate until its optimal performance is met, and the subsequent model selection is performed by selecting the one or several candidates with the best performance. To illustrate the neural network modeling process, a network framework for strata movement modeling constructed with the simplest form of multilayer perceptron (MLP), i.e., a three-layer perceptron, is illustrated in Fig. 1; it consists of one input layer of six neurons (one for each explicative variable), one hidden layer of eight neurons, whose number is empirically determined, and one output layer of three neurons (one for each dependent variable).
Table 1 Measured variables on strata movement

Variable   Characteristics                         Variable type
MCU        Mean consistency of upper wall rock     Explicative
LCL        Mean consistency of lower wall rock     Explicative
SAO        Slope angle of orebody                  Explicative
TO         Thickness of orebody                    Explicative
LO         Length of orebody                       Explicative
DE         Depth of excavation                     Explicative
MAU        Movement angle of upper wall rock       Dependent
MAL        Movement angle of lower wall rock       Dependent
AA         Avalanche angle                         Dependent
Fig. 1 Neural network modeling of strata movement using a three-layer perceptron (F1: input layer consisting of as many neurons as explicative variables on strata movement; F2: hidden layer whose number of neurons is determined empirically; F3: output layer consisting of as many neurons as dependent variables on strata movement)
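The modeling loop of Sect. 3.1 can be made concrete with a short sketch. Here scikit-learn's MLPRegressor stands in for the paper's Matlab MLP/BP implementation; the 6-8-3 architecture follows Fig. 1 and the 4/5 to 1/5 split follows the text, while the number of repetitions and the training settings are illustrative assumptions.

```python
# Minimal sketch of the repeated-split model selection described in Sect. 3.1.
# scikit-learn's MLPRegressor is a stand-in for the paper's Matlab MLP/BP model.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

def select_best_mlp(X, Y, n_trials=20, hidden=8, seed=0):
    """Repeat random 4/5-1/5 splits and keep the candidate with the lowest test MSE."""
    rng = np.random.RandomState(seed)
    best_model, best_err = None, np.inf
    for _ in range(n_trials):
        X_tr, X_te, Y_tr, Y_te = train_test_split(
            X, Y, test_size=0.2, random_state=rng.randint(1_000_000))
        model = MLPRegressor(hidden_layer_sizes=(hidden,), activation="logistic",
                             max_iter=2000, random_state=rng.randint(1_000_000))
        model.fit(X_tr, Y_tr)
        err = mean_squared_error(Y_te, model.predict(X_te))
        if err < best_err:                      # keep the best-generalizing candidate
            best_model, best_err = model, err
    return best_model, best_err
```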
3.2 Typical neural network-based sensitivity analysis algorithms

Recently, the development of neural network-based sensitivity analysis algorithms has attracted increasing interest [8–13]. In particular, a comprehensive review and comparison of neural network-based sensitivity analysis algorithms was provided in [11], in which it was concluded that the partial derivative algorithm [9] and the input perturbation algorithm [10] performed relatively better, particularly better than the algorithms based on the magnitude of the weights [8, 12, 13]. Hence, only these two algorithms are addressed here and later employed in the analysis of strata movement.
3.2.1 Partial derivative algorithm

The partial derivative algorithm [9, 17] is one of the most popular artificial neural network-based sensitivity analysis algorithms, and it possesses an explicit analytical expression. This algorithm is suitable for neural networks with differentiable activation functions, e.g., multilayer perceptron neural networks trained with the back-propagation algorithm (MLP/BP) and radial basis function (RBF) neural networks. The analytical version of the partial derivative algorithm consists of obtaining the Jacobian matrix by calculating the partial derivatives of the outputs with respect to the inputs [9, 17]. For the most popular MLP/BP model, the partial derivative algorithm is summarized as follows. For a successfully trained MLP/BP model with inputs $x_i$ and outputs $y_k$, the Jacobian matrix $\partial y_k / \partial x_i$ can be derived by applying the chain rule as

$$\frac{\partial y_k}{\partial x_i} = \frac{\partial y_k}{\partial net_k}\,\frac{\partial net_k}{\partial y_{j_n}} \cdots \frac{\partial y_{j_1}}{\partial net_{j_1}}\,\frac{\partial net_{j_1}}{\partial x_i}
= \sum_{j_n}\sum_{j_{n-1}}\cdots\sum_{j_1} W_{j_n k}\, f'(net_k)\, W_{j_{n-1} j_n}\, f'(net_{j_n}) \cdots W_{i j_1}\, f'(net_{j_1}) \qquad (1)$$

where $net_k$, $net_{j_n}$ and $net_{j_1}$ denote the weighted sums of the $k$th output neuron and of the hidden neurons $j_n$ and $j_1$, respectively; $f'$ denotes the derivative of the activation function $f$; $x_i$ is the $i$th input variable; $y_k$, $y_{j_n}$ and $y_{j_1}$ denote the computed outputs of output neuron $k$ and of hidden neurons $j_n$ and $j_1$ in the $n$th and first hidden layers, respectively; $j_n$, $j_{n-1}$, ..., $j_1$ denote the hidden neurons from the $n$th to the first hidden layer; $W_{j_n k}$ denotes the connection weight between the $k$th output neuron and hidden neuron $j_n$; $W_{j_{n-1} j_n}$ denotes the connection weight between hidden neurons $j_{n-1}$ and $j_n$; and $W_{i j_1}$ denotes the connection weight between the $i$th input neuron and hidden neuron $j_1$.
The quantities $y_k$, $y_{j_n}$ and $y_{j_1}$ are expressed as

$$\begin{cases}
y_k = f(net_k), & net_k = \displaystyle\sum_{j_n} y_{j_n} W_{j_n,k} + \theta_k \\[4pt]
y_{j_n} = f(net_{j_n}), & net_{j_n} = \displaystyle\sum_{j_{n-1}} y_{j_{n-1}} W_{j_{n-1},j_n} + \theta_{j_n} \\[4pt]
\quad\vdots \\[4pt]
y_{j_1} = f(net_{j_1}), & net_{j_1} = \displaystyle\sum_{i} x_i W_{i,j_1} + \theta_{j_1}
\end{cases} \qquad (2)$$
The total variation over the training samples, expressed as a temporary variable $T_{ik}$ for $x_i$ and $y_k$, is given by

$$T_{ik} = \sum_{p} \left( \frac{\partial y_k}{\partial x_i} \right)_p \qquad (3)$$

where $p$ indexes the training samples. $T$ represents the relative contribution of an input to an output with respect to the data set. In terms of $T$, a classification of the variables can be generated, representing their degree of contribution to the output variable in the model; the input variable with the highest $T$ value is the variable which most influences the output variable. For the RBF network, the mathematical formulas of the partial derivative algorithm are omitted here owing to the simpler form of its training algorithm.

3.2.2 Input perturbation algorithm

The input perturbation algorithm is another of the most popular artificial neural network-based sensitivity analysis algorithms [11, 12]. It produces sensitivity analysis results by assessing the effect of a small perturbation of each input on the neural network output. By properly adjusting the values of each explicative variable while keeping all the others unchanged, the effect on the output variables corresponding to each perturbation of the input variable is recorded. The result of the sensitivity analysis is yielded by ranking the effect on the neural network output induced by the same perturbation applied to every input variable; the input variable whose perturbation influences the output most possesses the highest sensitivity or importance. In principle, the mean squared error (MSE) of the neural network output increases with the magnitude of the perturbation, which is introduced by adding noise to the selected input variable. The change of the input variable takes the form $x_i = x_i + \delta$, where $x_i$ is the selected input variable and $\delta$ is the noise; $\delta$ increases in steps of 5% of the mean magnitude of the input value, up to 50%. According to the increase in MSE caused by each input variable change, the input variables can be ranked, resulting in the sensitivity analysis outcome.
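The two algorithms of Sect. 3.2 can be sketched for a single-hidden-layer network with logistic activations as follows. The weight matrices, toy data and NumPy implementation are illustrative assumptions rather than the paper's Matlab code; with random placeholder weights the rankings themselves are meaningless, and only the computation is of interest.

```python
# Illustrative sketch of the partial derivative algorithm (Eqs. 1-3) and the
# input perturbation algorithm for a one-hidden-layer MLP with logistic units.
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    """X: (p, 6) inputs -> hidden (p, h) -> outputs (p, 3)."""
    H = logistic(X @ W1 + b1)
    return logistic(H @ W2 + b2), H

def partial_derivative_sensitivity(X, W1, b1, W2, b2):
    """Eqs. (1)-(3): T[i, k] = sum over samples of d y_k / d x_i."""
    Yhat, H = forward(X, W1, b1, W2, b2)
    dY = Yhat * (1 - Yhat)          # f'(net_k) for the logistic activation
    dH = H * (1 - H)                # f'(net_j)
    # per-sample Jacobian: dy_k/dx_i = sum_j W1[i, j] * dH[j] * W2[j, k] * dY[k]
    jac = np.einsum("ij,pj,jk,pk->pik", W1, dH, W2, dY)
    return jac.sum(axis=0)          # T, shape (6 inputs, 3 outputs)

def input_perturbation_sensitivity(X, W1, b1, W2, b2,
                                   steps=np.arange(0.05, 0.55, 0.05)):
    """MSE change when each input is shifted by 5%..50% of its mean magnitude."""
    Y0, _ = forward(X, W1, b1, W2, b2)
    scores = np.zeros(X.shape[1])
    for i in range(X.shape[1]):
        base = np.mean(np.abs(X[:, i]))
        for frac in steps:
            Xp = X.copy()
            Xp[:, i] += frac * base
            Yp, _ = forward(Xp, W1, b1, W2, b2)
            scores[i] += np.mean((Yp - Y0) ** 2)
    return scores                    # larger score -> more influential input

# Toy usage with random placeholder weights (6 inputs, 8 hidden, 3 outputs)
rng = np.random.default_rng(0)
X = rng.random((168, 6))
W1, b1 = rng.normal(size=(6, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)
T = partial_derivative_sensitivity(X, W1, b1, W2, b2)
print(np.argsort(-np.abs(T[:, 0])))  # PD ordering of inputs for the first output
print(np.argsort(-input_perturbation_sensitivity(X, W1, b1, W2, b2)))
```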
In view of the fact that no single sensitivity analysis algorithm or measure is appropriate for all applications [18, 19], combining the results of the better-performing sensitivity analysis algorithms is a promising way to produce a reliable result.
4 Instability of neural network-based sensitivity analysis

Not many efforts have been devoted to developing neural network-based sensitivity analysis methods [8]. Almost all the existing studies on this aspect address the advantages of neural network-based sensitivity analysis over traditional methods such as statistical analysis. However, almost no attention has been paid to the limitation of this kind of method, namely the instability of its results, which hinders the further development of neural network-based sensitivity analysis. In essence, the performance of neural network-based sensitivity analysis is determined mainly by the trained network structure and the sensitivity analysis algorithm, while the sensitivity analysis algorithm itself operates on the trained network structure. However, it is comparatively difficult to determine explicitly a most optimal and uniform network structure [20]. First, there is still no reliable theoretical limit or convention for optimizing the number of hidden layers or the number of neurons in each hidden layer [21], though some empirical rules have been discussed in the literature [22, 23]. Moreover, the common criteria for ceasing training are fuzzy, meaning that it is impossible to provide a concrete and quantitative procedure for when to stop training; ceasing training requires more or less human intervention [24]. Finally, the random initialization values are a noticeable factor influencing the convergence behavior, e.g., the convergence speed and generalization performance [25, 26]. The uncertainty in the network structure necessarily induces instability in the sensitivity analysis results. In this study, an MLP/BP is experimentally used to model the database on strata movement using an error back-propagation training algorithm, and the training is stopped when the error on the test set stops decreasing or starts to rise. Employing the partial derivative algorithm to perform sensitivity analysis of the explicative variables with respect to the dependent variables, three typical instability situations induced by the network structure are illustrated in Fig. 2. Each includes five different neural network models (the ith model) and the corresponding sensitivity analysis rankings (the ranking numbers of the explicative variables). Figure 2a shows that different initializations of the connection weights can lead successfully trained models to non-identical sensitivity analysis rankings; in Fig. 2b,
the indeterminacy in the number of hidden neurons prompts non-identical sensitivity analysis rankings; and Fig. 2c shows that the random initialization of the connection weights for a two-hidden-layer architecture can also produce non-identical sensitivity analysis rankings. Comparing Fig. 2a with Fig. 2c, it can be seen that a difference in the number of hidden layers can likewise yield non-identical sensitivity analysis rankings. In addition, different numbers of hidden neurons in multiple-hidden-layer architectures and different kinds of neural network models may lead to non-identical sensitivity analysis results. The experimental results show that it is difficult to obtain a consistent result using single neural network-based sensitivity analysis. The reason lies mainly in the uncertainty of the network structure, in addition to the inherent complexity of the engineering system. This limitation of single neural network-based sensitivity analysis was also recognized in [11, 27], where the basic measure of repeating the training multiple times was taken to deal with the instability induced by the random initialization.
5 Algorithmic presentation of NNC-based sensitivity analysis

5.1 Mathematical foundation

Suppose the sensitivity analysis ranking of the input variables resulting from a successfully trained neural network model is A = [x1, x2, …, xi, …, xn] and the genuine ranking of the input variables is A0 = [r1, r2, …, ri, …, rn], where xi or ri is the ranking number of the ith input variable. The goal of input variable sensitivity analysis is to approach A0 with a rationally calculated A. Since the instability of single neural network-based sensitivity analysis is unavoidable, it is almost impossible to obtain a reliable estimate of A approaching A0 from a single neural network-based model. This remains the case even if an excellent sensitivity analysis algorithm is integrated into an optimized neural network model. Therefore, a more effective and reliable neural network-based sensitivity analysis paradigm is clearly desired. Implementations of the few available neural network ensemble-based sensitivity analyses show promise of producing improved sensitivity analysis results, but the mathematical foundation underlying such implementations has not been considered and a complete algorithmic presentation has still not been reported. In this study, a strategy termed "neural network committee (NNC)"-based sensitivity analysis is algorithmically presented by exploring the theoretical background of neural network ensemble-based sensitivity analysis.
Fig. 2 Illustration of the instability of single neural network-based sensitivity analysis. a The situation of random initialization of connection weights for the same network architecture (6-8-3); b the situation of 4, 5, 6, 7 and 8 hidden neurons, respectively, in the hidden layer; c the situation of random initialization of connection weights for the two-hidden-layer network architecture (6-4-4-3)
The study of the theoretical background naturally starts from the instability of single neural network-based sensitivity analysis, as discussed in Sect. 4. In this section, the exploration resumes with how to overcome the instability of single neural network-based sensitivity analysis in order to achieve a relatively precise and reliable sensitivity analysis result for the input variables. Regarding each element of A as a random variable, the fundamental mathematical idea of NNC-based sensitivity analysis originates from the weak law of large numbers in probability. The weak law of large numbers is stated as follows. If Y1, Y2, Y3, … is an infinite sequence of random variables, any two of which are uncorrelated (i.e., the correlation between any two of the variables is zero), and each of which has the same expected value $\mu$ and variance $\sigma^2$, then the sample average converges in probability to $\mu$. The sample average is expressed as [28]

$$\bar{Y}_n = (Y_1 + Y_2 + \cdots + Y_n)/n \qquad (4)$$

or, somewhat more precisely, for any positive number $\varepsilon$, no matter how small, the following holds:

$$\lim_{n \to \infty} P\left( \left| \bar{Y}_n - \mu \right| < \varepsilon \right) = 1 \qquad (5)$$

With respect to the input variable ranking A = [x1, x2, …, xi, …, xn] generated by a single neural network model, each element xi in A is regarded as a random variable, and thus A consists of n random variables. In the situation of neural network ensemble-based sensitivity analysis, a sequence of random variables $x_i^1, x_i^2, x_i^3, \ldots, x_i^k$ associated with xi can be obtained. According to the weak law of large numbers, for sufficiently large k, the average of $x_i^1, x_i^2, x_i^3, \ldots, x_i^k$ approximately converges to ri in A0, because ri is the expected genuine value of that element in A. Thus, on the basis of neural network ensemble-based sensitivity analysis, it is feasible to obtain a ranking A which approaches the genuine ranking A0 in a probabilistic sense. It should be noted that the weak law of large numbers is established under the condition that Y1, Y2, Y3, … is an infinite sequence of random variables. However, this condition cannot be fully met in a practical operation of neural network ensemble-based sensitivity analysis. First, xi is not an absolutely random variable, owing to the different restriction conditions generated by the different neural network models; second, the number of neural networks in a neural network ensemble is definitely limited. In this situation, the mathematical idea of optimization is valuable for prompting a better sensitivity analysis ranking, which can be achieved by picking out the excellent neural networks and eliminating the poorer ones by a specific procedure, such as cross validation. By eliminating the adverse effect induced by the poorer neural network elements, the optimization can effectively guarantee an unbiased estimate of the sensitivity analysis ranking. In principle, optimization is the mathematical support underlying the "many better than all" paradigm of the neural network ensemble. Since only the excellent elements of the global neural network ensemble are used to vote in a sensitivity analysis task, the neural network ensemble-based sensitivity analysis is termed NNC-based sensitivity analysis in this study.
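A toy simulation (not from the paper) illustrates the averaging argument above: rankings perturbed around an assumed genuine ranking A0 are averaged element-wise, and the averaged ranking usually recovers A0 even for a modest committee size. All values here are synthetic.

```python
# Toy illustration of Sect. 5.1: averaging noisy committee rankings recovers A0.
import numpy as np

rng = np.random.default_rng(1)
A0 = np.array([3, 4, 2, 5, 6, 1])                     # assumed "genuine" ranking
K = 30                                                # committee size
noisy = np.array([np.argsort(np.argsort(A0 + rng.normal(0, 1.5, 6))) + 1
                  for _ in range(K)])                 # K perturbed rankings A
estimate = noisy.mean(axis=0)                         # element-wise averaged ranks
print(np.argsort(np.argsort(estimate)) + 1)           # usually reproduces A0
```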
5.2 Algorithmic procedure of the strategy

Derived from the above mathematical foundation, the NNC-based sensitivity analysis consists of three basic elements: (1) an NNC is employed to conduct a group of neural network modeling runs, with each neural network independently performing modeling following the basic steps described in Sect. 3.1 (this is a time-consuming process in the present code); (2) an optimal NNC, formed by choosing a group of better-performing neural network models, is employed to conduct ensemble neural network sensitivity analysis, with each neural network performing sensitivity analysis independently, resulting in a large number of rankings A; and (3) the mean of the rankings A approximately forms an unbiased estimate of A0. A schematic of the specific procedure of the NNC-based sensitivity analysis strategy is
presented in Fig. 3, using the sensitivity analysis of the influential factors on strata movement as an engineering implementation example, i.e., six input variables and three output variables. A step-by-step procedure for implementing the proposed NNC-based sensitivity analysis strategy is shown in Fig. 4, and it involves the following four basic steps:

Step 1 Empirically choose excellent types of neural network as the seeds of the neural network models from the available popular types. The seeds should be particularly specialized in handling nonlinear problems, owing to the extreme complexity of strata movement;

Step 2 Each neural network model seed produces m different neural network models by rationally choosing the number of hidden layers and the number of neurons in each hidden layer within their empirical ranges, forming a candidate neural network model group;

Step 3 From each candidate model group, pick the k (k = 3m/10, an optimal value experimentally determined in practical operations on strata movement) neural network models with better performance, as indicated by a smaller generalization error, to form a superior neural network model group (committee); the sensitivity analysis is then implemented with every element of the superior neural network model group on the strata movement problem, from which a group of sensitivity analysis rankings, i.e., a set of A, is generated;

Step 4 For each explicative variable, calculate the mean of the corresponding ranking numbers in the set of A to form its final ranking number as an estimate of A0, which is formulated as

$$\hat{r}_i = \frac{1}{NK}\sum_{s=1}^{N}\sum_{t=1}^{K} x_i^{st}, \qquad i = 1, 2, \ldots, 6 \qquad (6)$$
where $\hat{r}_i$ denotes the estimate of $r_i$ in A0 obtained from the rankings of variable $x_i$ in the set of A, $x_i^{st}$ is the ranking number of $x_i$ given by the $t$th member of the $s$th committee, $K$ denotes the number of elements in the superior neural network model groups (committees), and $N$ denotes the number of kinds of neural network model seeds.

Fig. 4 Step-by-step implementation procedure of the NNC-based sensitivity analysis strategy (A–N: the neural network model seeds; a1–am, b1–bm and n1–nm: the candidate neural network model groups; aa1–aak, bb1–bbk and nn1–nnk: the superior neural network model groups; each ellipse denotes a sensitivity analysis ranking of the input variables)

Since it has been shown that the input perturbation algorithm and the partial derivative algorithm possess better capability and are suitable for several good kinds of neural networks [8, 11], only these two algorithms are implemented in the code of the NNC-based sensitivity analysis for strata movement.
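A condensed sketch of Steps 1–4 might look as follows. Scikit-learn MLP regressors stand in for the MLP/BP and RBF seeds of the paper, a simple input-perturbation ranking stands in for the two algorithms above, and the k = 3m/10 selection ratio follows Sect. 5.2; all other settings are illustrative assumptions.

```python
# Sketch of Steps 1-4 of the NNC strategy (Fig. 4) with a single model family.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

def perturbation_ranking(model, X, frac=0.1):
    """Rank inputs (1 = most influential) by the MSE change from a small perturbation."""
    base = model.predict(X)
    effects = []
    for i in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, i] += frac * np.mean(np.abs(X[:, i]))
        effects.append(np.mean((model.predict(Xp) - base) ** 2))
    order = np.argsort(-np.array(effects))            # most influential first
    ranks = np.empty(X.shape[1], dtype=int)
    ranks[order] = np.arange(1, X.shape[1] + 1)
    return ranks

def nnc_sensitivity(X, Y, m=50, keep_ratio=0.3, seed=0):
    rng = np.random.RandomState(seed)
    candidates = []
    for _ in range(m):                                 # Step 2: m candidate models
        X_tr, X_te, Y_tr, Y_te = train_test_split(
            X, Y, test_size=0.2, random_state=rng.randint(1_000_000))
        net = MLPRegressor(hidden_layer_sizes=(rng.randint(4, 9),),
                           max_iter=2000, random_state=rng.randint(1_000_000))
        net.fit(X_tr, Y_tr)
        candidates.append((mean_squared_error(Y_te, net.predict(X_te)), net))
    candidates.sort(key=lambda c: c[0])                # Step 3: keep k = 3m/10 best
    committee = [net for _, net in candidates[: int(keep_ratio * m)]]
    rankings = np.array([perturbation_ranking(net, X) for net in committee])
    return rankings.mean(axis=0)                       # Step 4: averaged ranking, cf. Eq. (6)
```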
6 NNC-based sensitivity analysis of influential factors on strata movement

6.1 Sensitivity analysis

Strata movement analysis is a difficult issue in geotechnical engineering due to its stochastic, uncertain, and fuzzy characteristics. The aforementioned NNC-based sensitivity analysis strategy is used to study the sensitivity of the influential factors on strata movement. The principal goal is to reveal the contribution of each explicative variable to the dependent variables (i.e., the underlying relationships among the influential factors and strata movement).

Fig. 3 A schematic of the "neural network committee (NNC)"-based sensitivity analysis strategy. Each neural network model with identical input and output variables is used to reflect strata movement. The "×" symbolizes the elimination of a neural network model with worse performance
Four scenarios are considered in terms of the dependent variables (see Table 1): (1) all dependent variables, (2) only MAU, (3) only MAL, and (4) only AA. First, by an overall comparison, MLP/BP and RBF are chosen as the neural network model seeds due to their robustness and power in dealing with complex nonlinearity; second, each seed independently produces 50 candidate neural network models, forming a candidate neural network model group; third, by performance comparison, the 15 better-performing neural network models are picked out from each candidate model group to construct a superior neural network model group (committee), and the sensitivity analysis is then performed on each member independently using both the input perturbation algorithm and the partial derivative algorithm, resulting in a set of input variable rankings A; finally, the sum of the ranking numbers of each variable in the set of A is calculated to yield a score for each input variable. The scores provide the final ranking, which approaches A0. The reason for using the sum instead of the mean to yield the final ranking is to avoid similar values for different variables due to the relatively small number of neural networks used to produce the final ranking. The superior neural network model groups and the resulting sensitivity analysis rankings for Scenario (1) are exemplified in Table 2. As described in [11], a series of interrelated profiles of the variation of each dependent variable due to small changes in each explicative variable is also obtained in this study from the partial derivative algorithm; these profiles depict the detailed interaction between each dependent variable and each explicative variable. The profiles offer a better understanding of how the explicative variable affects the dependent variable in a step-by-step way, and they thus provide a feasible method for determining effective bounds on the explicative variables in protective designs related to strata movement. This is a unique merit of the partial derivative algorithm in factor sensitivity analysis. All the final sensitivity analysis results for the four scenarios are given in Table 3. They reveal that, for Scenario (1), DE is the highest contributing variable, the second is SAO, followed by MCU and LCL, and the last two variables, TO and LO, are not significantly different. For Scenario (2), the result is similar to that of Scenario (1), except that TO is ranked after LO and the difference between them is clear. For Scenario (3), TO is the variable with the largest contribution, followed by LCL, SAO and MCU with no significant difference among them, and then DE and LO, the most insignificant ones. For Scenario (4), LO is the most important variable, followed by DE, TO, MCU, SAO, and finally LCL; however, the differences between DE and TO and among MCU, SAO and LCL in Scenario (4) are not significant.
Table 2 Sensitivity analysis rankings produced by superior neural network model groups

Superior neural network model   MCU   LCL   SAO   TO   LO   DE
RBF_1                             5     6     1    4    3    2
RBF_2                             2     1     6    5    4    3
RBF_3                             3     6     5    2    4    1
RBF_4                             5     6     3    1    4    2
RBF_5                             4     6     3    2    5    1
RBF_6                             5     4     2    3    6    1
RBF_7                             4     6     5    2    3    1
RBF_8                             2     6     3    5    4    1
RBF_9                             4     6     5    2    3    1
RBF_10                            2     5     6    3    4    1
RBF_11                            3     4     1    6    5    2
RBF_12                            3     4     2    6    5    1
RBF_13                            2     4     3    6    5    1
RBF_14                            2     4     3    6    5    1
RBF_15                            4     2     3    5    6    1
MLP/BP_1                          1     4     3    6    5    2
MLP/BP_2                          2     4     3    6    5    1
MLP/BP_3                          4     2     3    5    6    1
MLP/BP_4                          4     3     2    6    5    1
MLP/BP_5                          2     3     4    6    5    1
MLP/BP_6                          4     2     1    5    6    3
MLP/BP_7                          5     3     2    6    4    1
MLP/BP_8                          4     3     1    5    6    2
MLP/BP_9                          4     3     2    6    5    1
MLP/BP_10                         4     3     1    5    6    2
MLP/BP_11                         3     4     2    5    6    1
MLP/BP_12                         3     5     1    4    6    2
MLP/BP_13                         4     3     2    5    6    1
MLP/BP_14                         4     3     1    6    5    2
MLP/BP_15                         3     5     1    4    6    2
Table 3 Sensitivity analysis results of influential factors on strata movement

Scenario               MCU   LCL   SAO   TO    LO    DE
(1)   Score            101   120    80   138   148    43
      Ranking            3     4     2     5     6     1
(2)   Score             96   114    64   162   145    49
      Ranking            3     4     2     6     5     1
(3)   Score             96    81    87    75   155   136
      Ranking            4     2     3     1     6     5
(4)   Score             109   121   113   103    86    98
      Ranking             4     6     5     3     1     2
In engineering applications, understanding the specific status of strata movement is a basic precondition for designing protective coal pillars, which can affect the safety of structures above underground metal mines. The specific status of strata movement can be reflected by the activity of the dependent variables on strata movement and further indicated by the predictability of the dependent variables. For the three single-output sensitivity analysis scenarios, the score comparisons of all the explicative variables corresponding to the different dependent variables are illustrated in Fig. 5, where the degree of vertical spread of the scores for a scenario implies the sensitivity of the dependent variable to the different explicative variables. The dependent variable with the highest sensitivity to the explicative variables possesses the strongest predictability, and this sensitivity can be measured by the variance of the score vector of the explicative variables. For Scenarios (2), (3) and (4), the variances of the score vectors of the explicative variables corresponding to the dependent variables MAU, MAL and AA are 1,965.6, 1,068.4 and 150, respectively. It can be clearly concluded that MAU possesses the strongest predictability, with the highest sensitivity to the explicative variables, followed by MAL and then AA. The activity of the three dependent variables is thus ranked in the order MAU, MAL and AA. Accordingly, the movement angle of the upper wall rock is the most important factor for designing protective coal pillars, followed by that of the lower wall rock, and then the avalanche angle.
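The predictability measure can be checked directly from the Table 3 scores; the following snippet reproduces the reported variances when the sample variance (denominator n − 1) is used.

```python
# Sample variance of each scenario's score vector (Table 3) ranks predictability:
# MAU > MAL > AA, matching the values quoted in Sect. 6.1.
import numpy as np

scores = {
    "MAU (Scenario 2)": [96, 114, 64, 162, 145, 49],
    "MAL (Scenario 3)": [96, 81, 87, 75, 155, 136],
    "AA  (Scenario 4)": [109, 121, 113, 103, 86, 98],
}
for name, s in scores.items():
    print(name, np.var(s, ddof=1))   # 1965.6, 1068.4, 150.0
```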
Fig. 5 Activity analysis of the dependent variables on strata movement based on the NNC-based sensitivity analysis results. Three scenarios are exhibited in light of the dependent variables: (2) only MAU, (3) only MAL and (4) only AA, respectively, while Scenario (1) covers all dependent variables

6.2 Improved prediction model due to the strategy

Establishing a prediction model of the strata movement angles is of significant importance in practical geotechnical engineering applications. In view of the strong complexity and nonlinearity of strata movement, the neural network is a better tool for establishing the prediction model of strata movement angles than conventional statistical methods. However, how to select the appropriate input variables, or influential factors, on strata movement is equally important. When insignificant explicative variables serve as input variables, they introduce a negative effect on the performance of the neural network model; whereas when some useful or critical explicative variables are missed, the performance of the neural network model is degraded by its inability to reflect the comprehensive raw data. Derived from the aforementioned NNC-based sensitivity analysis strategy, an alternative tactical scheme for selecting the input variables is to preprocess the values of the input variables by assigning weighting coefficients according to their specific contributions to the outputs. The scheme is demonstrated on the two aforementioned Scenarios (1) and (2). For Scenario (1), when the least significant contributing variables LO and TO are weighted by coefficients of 0.75 and 0.6, respectively, the typical trained sum squared error (SSE) curve for neural network modeling of the underlying relationship of strata movement is illustrated in Fig. 6a. Compared to the SSE curve shown in Fig. 6b, generated with all intact input values, it is clear that preprocessing the input values according to the prior knowledge obtained from the NNC-based sensitivity analysis better facilitates the convergence of the neural network and remarkably improves the precision of the model. Similarly, for Scenario (2), when the most insignificant contributing variables TO and LO are weighted by coefficients of 0.70 and 0.5, respectively, the typical SSE curve generated during neural network modeling is illustrated in Fig. 6c. Compared to the SSE curve shown in Fig. 6d, with all intact input values involved, the same conclusion can be drawn as for Scenario (1). Incidentally, it can be seen that the magnitudes of the SSE curves associated with Scenario (2) are obviously lower than those of Scenario (1), implying that modeling Scenario (2) is easier than modeling Scenario (1) owing to the elimination of the dependent variables MAL and AA, which hold relatively lower predictability. This validates that the NNC-based sensitivity analysis strategy effectively reveals the underlying relationships between the explicative and dependent variables, i.e., the underlying relationships among the influential factors of strata movement.
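A minimal sketch of this input weighting, assuming the Table 1 column order and the Scenario (1) coefficients quoted above (all other coefficients are set to 1 purely for illustration):

```python
# Sketch of the input-preprocessing scheme in Sect. 6.2: the least significant
# explicative variables are down-weighted before training. Column order assumed
# to follow Table 1 (MCU, LCL, SAO, TO, LO, DE).
import numpy as np

def weight_inputs(X, weights):
    """Multiply each explicative-variable column by its weighting coefficient."""
    return X * np.asarray(weights)

scenario1_weights = [1.0, 1.0, 1.0, 0.6, 0.75, 1.0]   # TO weighted 0.6, LO weighted 0.75
# X_weighted = weight_inputs(X, scenario1_weights)    # then train the MLP on X_weighted
```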
Fig. 6 Illustration of the positive effect of the NNC-based sensitivity analysis strategy on modeling the underlying relationship for Scenarios (1) and (2). a SSE curve generated by neural network modeling for Scenario (1) using the weighted input variables LO and TO with coefficients of 0.75 and 0.6, respectively; b SSE curve generated by neural network modeling for Scenario (1) using the intact input variables; c SSE curve generated by neural network modeling for Scenario (2) using the weighted input variables TO and LO with coefficients of 0.7 and 0.5, respectively; d SSE curve generated by neural network modeling for Scenario (2) using the intact input variables
7 Concluding remarks

The investigation of the mechanism underlying strata movement attracts increasing interest in geotechnical engineering. Traditional statistical methods, such as multiple linear regression, are not capable of handling the nonlinear relations inherent in strata movement. In this study, a novel use of neural networks to reveal the mechanism underlying strata movement is explored. Differing from the conventional usage of neural networks, which usually act as a "black box" to perform prediction, classification, regression, etc., a strategy of NNC-based factor sensitivity analysis is algorithmically presented, based on the "many better than all" paradigm of the neural network ensemble and inspired by the related software system. In particular, the theoretical aspects of the NNC-based factor sensitivity analysis strategy are explored by introducing the ideas of probability and optimization. Significantly, the NNC-based factor sensitivity analysis, instead of conventional single neural network analysis, is employed to reveal the underlying relationships among the influential factors on strata movement. The results demonstrate that the strategy is able to produce relatively reliable estimates of the relative contribution of each explicative variable to the dependent variables, presents the predictability of each dependent variable, and is helpful for creating an improved neural network prediction model of strata movement. These preliminary outcomes are valuable for interpreting the mechanism of strata movement; meanwhile, a more thorough investigation with more sufficient experimental data is necessary owing to the extreme complexity of strata movement. This study suggests that the rational application of neural networks has the potential to handle complicated geotechnical engineering problems.
Acknowledgments The first author (MC) gratefully acknowledges the support provided by Wood Materials and Engineering Laboratory at Washington State University; while the second author (PQ) thanks the Changjiang Scholarship provided by Hohai University and Ministry of Education of the People’s Republic of China. The authors would like to acknowledge M. M. Gevrey for providing the related code on neural network sensitivity analysis. This study is partially supported by the National Natural Science Foundation (NSFC) of China under Grant No. 50608027, the Jiangsu Planned Projects for Postdoctoral Research Funds under Grant No. 0601037B, and the Science and Technology Innovation Foundation of Shandong Agricultural University under Grant No. 20060315.
References

1. Guo WB, Deng KZ, Zou YF (2003) Study on artificial neural network method for calculation of displacement angle of strata. J China Saf Sci 13(9):69–73
2. Wang YH, Cai SJ, Song WD (2003) Research on strata movement in underground metal mines based on the artificial neural network. J Univ Sci Technol Beijing 25(2):106–109
3. Yilmaz M, Ertunc HM (2005) The prediction of mechanical behavior for steel wires and cord materials using neural networks. Mater Des 28(2):599–608
4. Carpinteiro OAS, Lima I, Leme RC, Zambroni de Souza AC, Moreira EM, Pinheiro CAM (2007) A hierarchical neural model with time windows in long-term electrical load forecasting. Neural Comput Appl 16:465–470
5. Korosec M (2007) Technological information extraction of free form surfaces using neural networks. Neural Comput Appl 16:453–463
6. Postalcioglu S, Becerikli Y (2007) Wavelet networks for nonlinear system modeling. Neural Comput Appl 16:433–441
7. Scheffer C, Engelbrecht H, Heyns PS (2005) A comparative evaluation of neural networks and hidden Markov models for monitoring turning tool wear. Neural Comput Appl 14:325–336
8. Montaño JJ, Palmer A (2003) Numeric sensitivity analysis applied to feedforward neural networks. Neural Comput Appl 12:119–125
9. Dimopoulos Y, Bourret P, Lek S (1995) Use of some sensitivity criteria for choosing networks with good generalization ability. Neural Process Lett 2:1–4
10. Zeng XQ, Yeung DS (2003) A quantified sensitivity measure for multilayer perceptron to input perturbation. Neural Comput 15:183–212
11. Gevrey M, Dimopoulos I, Lek S (2003) Review and comparison of methods to study the contribution of variables in artificial neural network models. Ecol Model 160:249–264
12. Wang W, Jones P, Partridge D (2000) Assessing the impact of input features in a feedforward neural network. Neural Comput Appl 9:101–112
13. Gedeon TD (1997) Data mining of inputs: analyzing magnitude and functional measures. Int J Neural Syst 8:209–218
14. Hansen LK, Salamon P (1990) Neural network ensembles. IEEE Trans Pattern Anal Mach Intell 12(10):993–1001
15. Sollich P, Krogh A (1996) Learning with ensembles: how overfitting can be useful. In: Touretzky DS, Mozer MC, Hasselmo ME (eds) Advances in neural information processing systems 8, Denver, CO. MIT Press, Cambridge, pp 190–196
16. Zhou ZH, Wu J, Tang W (2002) Ensembling neural networks: many could be better than all. Artif Intell 137(1–2):239–263
17. Yang Y, Zhang Q (1997) A hierarchical analysis for rock engineering using artificial neural networks. Rock Mech Rock Eng 30(4):207–222
18. Masters T (1994) Practical neural network recipes in C++. Academic, San Diego
19. Warren SS, Cary NC (2000) How to measure importance of inputs? ftp://ftp.sas.com/pub/neural/importance.html
20. Aires F, Prigent C, Rossow WB (2004) Neural network uncertainty assessment using Bayesian statistics: a remote sensing application. Neural Comput 16:2415–2458
21. Benaouda D, Wadge G, Whitmarsh RB, Rothwell RG, Macleod C (1999) Inferring the lithology of borehole rocks by applying neural network classification to downhole logs: an example from the Ocean Drilling Program. Geophys J Int 136:477–491
22. Pulli JJ, Dysart PS (1990) An experiment in the use of trained neural networks for regional seismic event classification. Geophys Res Lett 17:977–980
23. Maiti S, Tiwari RK, Kümpel HJ (2007) Neural network modeling and classification of lithofacies using well log data: a case study from KTB borehole site. Geophys J Int 169:733–746
24. Natarajan S, Rhinehart RR (1997) Automated stopping criteria for neural network training. In: Proceedings of the American Control Conference, Albuquerque, New Mexico
25. Atiya A, Ji C (1997) How initial conditions affect generalization performance in large networks. IEEE Trans Neural Netw 8(2):448–451
26. Thimm G, Fiesler E (1997) High-order and multilayer perceptron initialization. IEEE Trans Neural Netw 8(2):349–359
27. Olden JD, Joy MK, Death RG (2004) An accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data. Ecol Model 178:389–397
28. Durrett R (2004) Probability: theory and examples, 3rd edn. Duxbury, Pacific Grove