Expert Systems with Applications 38 (2011) 5958–5966
Expert Systems with Applications journal homepage: www.elsevier.com/locate/eswa
Multiple regression, ANN (RBF, MLP) and ANFIS models for prediction of swell potential of clayey soils
Işık Yilmaz a,*, Oguz Kaynar b
a Cumhuriyet University, Faculty of Engineering, Department of Geological Engineering, 58140 Sivas, Turkey
b Cumhuriyet University, Faculty of Economics and Administrative Sciences, Department of Management Information Systems, 58140 Sivas, Turkey
Article info
Keywords: ANN, ANFIS, Multiple regression, Soft computing, Clayey soil, Swell potential
Abstract
In recent years, new techniques such as artificial neural networks and fuzzy inference systems have been employed to develop predictive models for estimating needed parameters. Soft computing techniques are now being used as alternative statistical tools. Determination of the swell potential of soil is difficult, expensive, time consuming and involves destructive tests. In this paper, the use of the MLP and RBF functions of ANN (artificial neural networks) and of ANFIS (adaptive neuro-fuzzy inference system) for prediction of S% (swell percent) of soil is described and compared with the traditional statistical model of MR (multiple regression). The accuracies of the ANN and ANFIS models may be regarded as relatively similar. However, it was found that the constructed RBF model exhibited a higher performance than MLP, ANFIS and MR for predicting S%. The performance comparison showed that soft computing is a good tool for minimizing uncertainties in soil engineering projects. The use of soft computing may also provide new approaches and methodologies, and minimize the potential inconsistency of correlations. © 2010 Elsevier Ltd. All rights reserved.
1. Introduction
Many buildings are constructed with foundations that are inadequate for existing soil conditions. Because of the lack of suitable land, homes are often built on marginal land that has insufficient bearing capacity to support the substantial weight of a structure. Land becomes scarce with city growth, and it often becomes necessary to construct buildings and other structures on sites with unfavorable conditions. The most important characteristic of clayey soils is their susceptibility to volume change from swelling and shrinkage. Such volume changes can give rise to ground movements which may result in damage to buildings (Bell & Jermy, 1994; Bell & Maud, 1995). The clays most prone to swelling and shrinkage are over-consolidated clays (Dhowian, Ruwiah, & Erol, 1985) and Tertiary and Quaternary alluvial/colluvial soils (Donaldson, 1969). The swelling potential of expansive clayey soils is due to reductions of overburden stress, unloading conditions, or exposure to water and increase in moisture content. Bell and Maud (1995) suggest that low-rise buildings are particularly vulnerable to ground movements as they generally do not have sufficient weight or strength to resist such movement. Geotechnical engineers have long recognized that swelling of expansive soils caused by moisture variation may result in considerable distress and consequently in severe damage to the overlying structures (Basma, 1991). If the substrata are not heavily loaded, structures on the surface will be affected by heave. As reported by Bell, Cripps, Culshaw, and Entwisle (1993), based on the catalogue of Burland (1984), the annual cost of the problem in the USA and Sudan in the mid-1980s was $6–$8 billion and $6 million, respectively (Yilmaz, 2008).
A great deal of structural movement has been unduly blamed on expansive soils. Many floor slabs constructed in an expansive soil area crack and sometimes heave due to improperly designed concrete. It is well known that improper curing of concrete, in addition to a lack of expansion joints, will cause cracking (Chen, 1975).
In order to classify swelling soils and to design structures either upon or inside a clayey soil, the swell potential of the soil is of vital importance. It is mainly used in numerical and analytical design methods for estimating surface heave and the swelling pressure acting on a building. Correlations have been a significant part of soil mechanics from the earliest days. In some cases they are essential because it is difficult to measure a quantity directly, and in other cases they are desirable for checking results against other tests. The correlations are generally semi-empirical, based on mechanics, or purely empirical, based on statistical analysis. However, determination of the swell potential of a soil material is time consuming, expensive and involves destructive tests. If reliable predictive models could be obtained to correlate swell percent (S%) with quick, cheap and nondestructive test results, they would be very valuable for at least the preliminary stage of designing a structure. The use of empirically obtained parameters from index test results may not be reliable for engineering projects. However, these data would be very valuable for at least the preliminary stage of designing a structure, when they are combined with interpretation based on engineering experience. The literature contains a considerable number of empirical equations obtained from conventional statistical techniques for assessing the swell potential of soils.
In recent years, some new soft computing techniques such as artificial neural networks, fuzzy inference systems, evolutionary computation and their hybrids have been successfully employed for developing predictive models to estimate the needed parameters. These techniques have attracted attention in many research fields because they can tolerate a wide range of uncertainty, and soft computing techniques are now being used as alternative statistical tools.
This study aims to determine empirical relationships for estimating the swell percent of soils by using multiple regression (MR), the MLP and RBF functions of ANN (artificial neural network) and ANFIS (adaptive neuro-fuzzy inference system) models, and to compare the prediction capabilities of the models. A total of 215 soil samples were tested for determination of swell percent (S%), liquid limit (LL), activity (A) and cation exchange capacity (CEC). Predictive models were then established using multiple regression, artificial neural networks (MLP and RBF) and adaptive neuro-fuzzy inference system models, and their prediction performances were analyzed.
* Corresponding author. Tel.: +90 346 219 1010x1305; fax: +90 346 219 1171. E-mail address: [email protected] (I. Yilmaz).
0957-4174/$ - see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2010.11.027
It was found that the relationships developed in this study allow LL, A and CEC to be used as a rapid, easy-to-determine and low-cost means of estimating the swelling potential with sufficient accuracy for adequate foundation design in situations where urgency and/or lack of money prevents a thorough geotechnical investigation from being conducted. Moreover, the comparison of performance indices and coefficients of correlation for predicting swell percent revealed that the prediction performance of the RBF function of the artificial neural network model is higher than those of the multiple regression equations, the MLP function of artificial neural networks and the adaptive neuro-fuzzy inference system.
2. Experimental framework
In this study, the data were obtained from extensive field studies, and our database was constructed over 15 years. A sufficient number of high-quality data is required in order to construct a reliable predictive model; for this reason 215 samples were used in the analyses, although data for 350 or more soil samples were available. Soils were tested for determination of swell percent, Atterberg limits, cation exchange capacity and grain size distribution according to the procedures suggested by international standards. In order to determine the swell percent of the soil samples, swelling tests were carried out in accordance with ASTM D-4546 (1994). A pre-loading pressure of 0.07 kgf/cm2 and samples with a radius of 5.0 cm were used in our tests. When clay minerals are present in a fine-grained soil, it can be remoulded in the presence of some moisture without crumbling. This cohesive nature is caused by the adsorbed water surrounding the clay particles. LL increases with increasing quantity of expansive clay minerals such as montmorillonite. The liquid limit and plastic limit values of the samples were determined according to the procedure outlined in British Standard (BS) 1377 (BS, 1975). Swelling properties of soils are affected by CEC; in other words, the swelling capacity is closely related to the CEC. The amount of swelling increases with increasing CEC (Christidis,
1998). Al-Rawas (1998) has also reported that cations are the factors controlling the expansive nature of soils. One of the fundamental differences between clay minerals lies in the amount and kind of exchangeable cations present on their surfaces and the excess negative charge of the crystal lattice which these cations neutralize. The property of ion exchange is of great fundamental and practical importance in the investigation of clay minerals. The CEC of a soil is the number of moles of adsorbed cation charge that can be desorbed from unit mass of soil under given conditions of temperature, pressure, soil solution composition and soil-solution mass ratio (Sposito, 1989). For soils in which the readily exchangeable cations are solely monovalent or bivalent, the "index" cation can be Na+, whereas for soils also bearing trivalent readily exchangeable cations, Ba2+ is the "index" cation of choice. Often NH4+ has been used as an "index" cation. In this study NH4+ was used as the index cation (Yilmaz, 2006). In the last stage of the laboratory experiments, the CEC of the soils was measured using the ammonium acetate (NH4OAc) method. The basis of this method is the replacement of sodium (Na+) ions with ammonium (NH4+) ions. In the tests, the soils were first saturated with sodium ions, and the sodium ions were then replaced with ammonium ions by adding a solution containing ammonium at a pH of 7 (Bache, 1976). At the end of the CEC tests, the amount of sodium in the solution was determined by the atomic absorption method. The results obtained and their basic test statistics are tabulated in Table 1. The swell percent of the soils ranged between 1.1 and 15.2, with an average value of 6.75. While the average value of liquid limit was 56.5%, values varied from 4% to 112%. The respective average values of activity and cation exchange capacity were 0.85% (0.11–1.84%) and 47.1 meq/100 g (5.1–94.9 meq/100 g).
Particular attention was paid to selecting a data set having a normal distribution. In order to characterize the variation of S%, used as the dependent variable, descriptive statistics such as minimum, maximum, mean, mode, median, variance, standard deviation, skewness and kurtosis were calculated using the SPSS Version 10.0.1 (1999) package. Table 2 shows that S% has an almost normal distribution. Although it is close to the normal distribution, the data are skewed left and show some kurtosis (Fig. 1). The respective skewness and kurtosis values of 0.207 and 0.471 are very low. In conclusion, it was evident that the analyses would work well in this case.
3. Data processing and analyses
In order to establish predictive models among the parameters obtained in this study, simple regression analysis was performed in the first stage of the analysis. The relations between S and the other parameters were analyzed employing linear, power, logarithmic and exponential functions. The statistically significant and strongest correlations were found to be linear, and regression equations were established between the index parameters and S (Table 3). All obtained relationships were found to be statistically significant according to Student's t-test at the 95% level of confidence.
Table 1
Basic statistics of the results obtained from tests.

Statistic    S (%)    LL (%)    A (%)    CEC (meq/100 g)
Minimum      1.1      4         0.11     5.1
Maximum      15.2     112       1.84     94.9
Average      6.75     56.5      0.85     47.1
Std. Dev.    3.435    26.576    0.356    24.312

S, swell percent; LL, liquid limit; A, activity; CEC, cation exchange capacity.
Table 2
Descriptive statistics for S% as the dependent variable.

N (valid)                 215
N (missing)               0
Mean                      6.7474
Std. error of mean        0.2343
Median                    6.5000
Mode                      6.00
Std. deviation            3.4351
Variance                  11.8002
Skewness                  0.207
Std. error of skewness    0.106
Kurtosis                  0.471
Std. error of kurtosis    0.230
Range                     14.10
Minimum                   1.10
Maximum                   15.20
Sum                       1450.70

Fig. 1. Frequency distribution of S% values of samples used in analyses.

Table 3
Predictive models for assessing the S%.

Relation    Predictive model            R2
S–LL        S = 0.1239 LL + 0.2584      0.91
S–A         S = 7.6766 A + 0.2174       0.68
S–CEC       S = 0.1335 CEC + 0.457      0.89

Fig. 2 shows the plots of swell percent versus liquid limit, activity and cation exchange capacity.

3.1. Multiple regression model
Multiple regression, a time-honored technique going back to Pearson's 1908 use of it, is employed to account for (predict) the variance in an interval dependent variable, based on linear combinations of interval, dichotomous, or dummy independent variables. The general purpose of multiple regression is to learn more about the relationship between several independent or predictor variables and a dependent or criterion variable. The multiple regression equation takes the form $y = b_1 x_1 + b_2 x_2 + \dots + b_n x_n + c$, where $b_1, b_2, \dots, b_n$ are the regression coefficients, representing the amount by which the dependent variable $y$ changes when the corresponding independent variable changes by 1 unit, and $c$ is a constant, the point where the regression line intercepts the y axis, representing the value of the dependent variable $y$ when all the independent variables are 0. The standardized versions of the b coefficients are the beta weights, and the ratio of the beta coefficients is the ratio of the relative predictive power of the independent variables. The major conceptual limitation of all regression techniques is that one can only ascertain relationships, but never be sure about the underlying causal mechanism.
Multiple regression analysis was carried out to correlate the measured swell percent with three soil indices, namely liquid limit, activity and cation exchange capacity (Table 4). The multiple regression model to predict swell percent is given below.

$$S\% = (9.223 \times 10^{-2})\,LL + (2.401 \times 10^{-2})\,A + (5.535 \times 10^{-2})\,CEC - 0.153 \tag{1}$$

The coefficient of correlation between the measured and predicted values is a good indicator of the prediction performance of the model. Fig. 3 shows the relationship between measured and predicted values obtained from the MR model for S%, with a good correlation coefficient. In this study, the values account for (VAF) (Eq. (2)) and root mean square error (RMSE) (Eq. (3)) indices were calculated to check the prediction capacity of the predictive models developed in the study, as employed by Alvarez and Babuska (1999), Finol, Guo, and Jing (2001), Gokceoglu (2002), and Yilmaz and Yüksek (2008, 2009):

$$\mathrm{VAF} = \left[1 - \frac{\operatorname{var}(y - y')}{\operatorname{var}(y)}\right] \times 100 \tag{2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y - y')^2} \tag{3}$$

where $y$ and $y'$ are the measured and predicted values, respectively. The calculated indices are given in Table 5. If the VAF is 100 and the RMSE is 0, the model is excellent. Mean absolute percentage error (MAPE), a measure of the accuracy of fitted values in statistics, was also used for comparison of the prediction performances of the models. MAPE usually expresses accuracy as a percentage (Eq. (4)):

$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{A_i - P_i}{A_i}\right| \times 100 \tag{4}$$
where $A_i$ is the actual value and $P_i$ is the predicted value. The obtained values of RMSE, VAF and MAPE, given in Table 5, indicated high prediction performance.

Fig. 2. Swell percent versus liquid limit, activity and cation exchange capacity.

Table 4
Model summaries of multiple regressions for prediction of S%.

Independent variable    Coefficient       Std. error    t-Value    Sig. level
Constant                −0.153            0.178         −0.859     0.392
LL                      9.223 × 10^-2     0.011         8.727      0.000
A                       2.401 × 10^-2     0.331         0.073      0.942
CEC                     5.535 × 10^-2     0.012         3.064      0.002

Fig. 3. Cross-correlation of predicted and observed values of S% for multiple regression model.

3.2. ANN (artificial neural networks) models – MLP and RBF
Neural networks may be used as a direct substitute for auto correlation, multivariable regression, linear regression, trigonometric and other statistical analyses and techniques (Singh, Kanchan, Verma, & Singh, 2003). Neural networks, with their remarkable ability to derive a general solution from complicated or imprecise data, can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A trained neural network can be thought of as an "expert" in the category of information it has been given to analyze. This expert can then be used to provide projections for new situations of interest and to answer "what if" questions. When a data stream is analyzed using a neural network, it is possible to detect important predictive patterns that were not previously apparent to a non-expert. Thus, the neural network can act as an expert. A particular network can be defined by three fundamental components: transfer function, network architecture and learning law (Simpson, 1990). It is essential to define these components in order to solve the problem satisfactorily.
Neural networks comprise a large class of different architectures. Multi Layer Perceptron (MLP) and Radial Basis Function (RBF) are two of the most widely used neural network architectures in the literature for classification or regression problems (Cohen & Intrator, 2002, 2003; Kenneth, Wernter, & MacInyre, 2001; Loh & Tim, 2000). Both types of neural network structure are good in pattern classification problems. They are robust classifiers with the ability to generalize from imprecise input data. The general difference between MLP and RBF is that RBF is a localist type of learning which is responsive only to a limited section of the input space, whereas MLP is a more distributed approach. The output of an MLP is produced by linear combinations of the outputs of hidden layer nodes, in which every neuron maps a weighted average of the inputs through a sigmoid function. In one hidden
layer RBF network, the hidden nodes map distances between input vectors and center vectors to outputs through a nonlinear kernel or radial function.
In this study, the two different ANN architectures (MLP and RBF) were also used to estimate the swell percent of the soils. All data were first normalized and divided into three data sets: training (60% of all data), test (20% of all data) and verification (20% of all data). Matlab 7.1 (2005) software was used for the neural network analyses, with a three-layer feed-forward network consisting of an input layer (3 neurons), one hidden layer (2 neurons for MLP, 16 neurons for RBF) and one output layer (Fig. 4). The number of neurons in the hidden layer was selected from a series of trial runs of networks having 1 to 20 neurons, in order to find the network with the minimum error. In the analyses, the network parameters of learning rate and momentum were set to 0.01 and 0.9, respectively. Variable learning rate with momentum (trainLm) was used as the network training function and tansig as the activation (transfer) function for all layers.

Fig. 4. MLP and RBF neural network structure used in the study.

Table 5
Performance indices (RMSE, VAF and R2) for models.
RMSE, root mean square error; VAF, value account for; MAPE, mean absolute percentage error.

3.3. Multi Layer Perceptron (MLP) model
Multi Layer Perceptron (MLP) network models are popular network architectures used in most research applications in medicine, engineering, mathematical modeling, etc. In an MLP, the weighted sum of the inputs and a bias term is passed through a transfer function to produce the output, and the units are arranged in a layered feed-forward topology called a Feed Forward Neural Network (Venkatesan & Anitha, 2006). MLP networks consist of an input layer, one or more hidden layers and an output layer. Each layer has a number of processing units, and each unit is fully interconnected with weighted connections to units in the subsequent layer. The MLP transforms n inputs to l outputs through some nonlinear functions. The output of the network is determined by the activation of the units in the output layer as follows:

$$x_o = f\left(\sum_h x_h w_{ho}\right) \tag{5}$$

where $f(\cdot)$ is the activation function, $x_h$ is the activation of the hth hidden layer node and $w_{ho}$ is the interconnection between the hth hidden layer node and the oth output layer node. The most used activation function is the sigmoid, given as follows:

$$x_o = \frac{1}{1 + \exp\left(-\sum_h x_h w_{ho}\right)} \tag{6}$$

The activation level of the nodes in the hidden layer is determined in a similar fashion. Based on the differences between the calculated output and the target value, an error is defined as follows:

$$E = \frac{1}{2}\sum_{s}^{N}\sum_{o}^{L}\left(t_o^{(s)} - x_o^{(s)}\right)^2 \tag{7}$$
where N is the number of patterns in the data set and L is the number of output nodes. The aim is to reduce the error by adjusting the interconnections between layers. The weights are adjusted using the gradient descent back propagation (BP) algorithm. The algorithm requires training data consisting of a set of corresponding input and target pattern values $t_o$. During the training process, the MLP starts with a random set of initial weights, and training continues until the sets of $w_{ih}$ and $w_{ho}$ are optimized so that a predefined error threshold is met between $x_o$ and $t_o$ (after Altun & Gelen, 2004). According to the BP algorithm, each interconnection between the nodes is adjusted by the weight update value as follows:
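$$\Delta w_{ho} = -\eta\,\frac{\partial E}{\partial w_{ho}} = \eta\,\delta_o x_h \tag{8}$$

$$\Delta w_{ih} = -\eta\,\frac{\partial E}{\partial w_{ih}} = \eta\,\delta_h x_i \tag{9}$$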
where E is the error cost function given in Eq. (7), $\delta_o = x'_o (t_o - x_o)$ and $\delta_h = x'_h \sum_o \delta_o w_{ho}$, with $x'_o = x_o (1 - x_o)$ and $x'_h = x_h (1 - x_h)$ when a sigmoid activation function is used (Altun & Gelen, 2004). The cross-correlation between predicted and observed values (Fig. 5) indicated that the MLP type of ANN model is highly acceptable for prediction of S%. RMSE, VAF, MAPE and R2 values are tabulated in Table 5.
3.4. Radial Basis Function (RBF) model
The Radial Basis Function (RBF) neural network is based on supervised learning. RBF networks were independently proposed by many researchers and are a popular alternative to the MLP. RBF networks are also good at modeling nonlinear data, can be trained in one stage rather than through an iterative process as in MLP, and learn the given application quickly (Venkatesan & Anitha, 2006). The structure of an RBF neural network is similar to that of an MLP: it consists of layers of neurons. The main distinction is that the RBF network has a hidden layer containing nodes called RBF units. Each RBF has two key parameters that describe the location of the function's center and its deviation or width. The hidden unit measures the distance between an input data vector and the center of its RBF. The RBF has its peak when the distance between its center and the input data vector is zero, and declines gradually as this distance increases. Because there is only a single hidden layer in an RBF network, there are only two sets of weights, one connecting the hidden layer to the input layer and the other connecting the hidden layer to the output layer. The weights connecting to the input layer contain the parameters of the basis functions. The weights connecting the hidden layer to the output layer are used to form linear combinations of the activations of the basis functions (hidden units) to generate the network outputs. Since the hidden units are nonlinear, the outputs of the hidden layer may be combined linearly, and so processing is rapid.
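The behaviour described above can be illustrated with a minimal sketch, assuming a Gaussian basis function (a common choice for RBF units). The centers, widths, weights and input values are hypothetical and are not taken from the network trained in this study; the function names are also illustrative only.

```python
import math

def gaussian_rbf(x, center, width):
    """Activation of one RBF unit: equals 1 at the center and decays with distance."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))

def rbf_network_output(x, centers, widths, weights, bias):
    """Single output: linear combination of the hidden-unit activations plus a bias."""
    activations = [gaussian_rbf(x, c, s) for c, s in zip(centers, widths)]
    return sum(w * a for w, a in zip(weights, activations)) + bias

# Hypothetical two-unit network with three normalized inputs (e.g. LL, A, CEC).
centers = [[0.2, 0.3, 0.2], [0.8, 0.7, 0.8]]
widths = [0.5, 0.5]
weights = [0.1, 0.9]
bias = 0.0

at_center = gaussian_rbf(centers[0], centers[0], widths[0])   # input equals the center
far_away = gaussian_rbf([1.0, 1.0, 1.0], centers[0], widths[0])  # input far from the center
output = rbf_network_output([0.8, 0.7, 0.8], centers, widths, weights, bias)
print(at_center, far_away, output)
```

Because the output layer is linear in the hidden activations, the output weights can be fitted by a linear least-squares type procedure once the centers and widths are fixed, which is why training is described as rapid.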
Fig. 5. Cross-correlation of predicted and observed values of S% for ANN (MLP and RBF) models.

The output of the network is derived from (Foody, 2004):

$$y_k(x) = \sum_{j=1}^{M} w_{kj}\,\phi_j(x) + w_{k0} \tag{10}$$
where M is the number of basis functions, x is the input data vector, $w_{kj}$ represents a weighted connection between the basis function and the output layer, and $\phi_j$ is the nonlinear function of unit j, which is typically a Gaussian of the form

$$\phi_j(x) = \exp\left(-\frac{\lVert x - \mu_j \rVert^2}{2\sigma_j^2}\right) \tag{11}$$
where x and $\mu_j$ are the input and the center of the RBF unit, respectively, and $\sigma_j$ is the spread of the Gaussian basis function (Foody, 2004). The weights are optimized using the least mean square (LMS) algorithm once the centers of the RBF units are determined. The centers can be chosen randomly or by using clustering algorithms. In this study, centers were randomly selected from the data set. As seen from Table 5 and the cross-correlation between predicted and observed values (Fig. 5), the RBF type of ANN model is highly acceptable for prediction of S%. RMSE, VAF, MAPE and R2 values are also tabulated in Table 5.
3.5. Adaptive neuro-fuzzy inference system (ANFIS) model
In ANFIS, the learning capabilities of a neural network and the reasoning capabilities of fuzzy logic are combined in order to give enhanced prediction capabilities compared with using a single methodology alone. The goal of ANFIS is to find a model or mapping that will correctly associate the input values with the target values. The fuzzy inference system (FIS) is a knowledge representation in which each fuzzy rule describes a local behavior of the system. The network structure that implements FIS and employs hybrid-learning rules for training is called ANFIS. Let X be a space of objects and x be a generic element of X. A classical set A ⊆ X is defined as a collection of elements or objects x ∈ X such that each x can either belong or not belong to the set A. By defining a characteristic function for each element x in X, we can represent a classical set A by a set of ordered pairs (x, 0) or (x, 1), which indicate x ∉ A or x ∈ A, respectively. On the other
hand, a fuzzy set expresses the degree to which an element belongs to a set. Hence the characteristic function of a fuzzy set is allowed to have values between 0 and 1, which denote the degree of membership of an element in a given set. So a fuzzy set A in X is defined as a set of ordered pairs:

$$A = \{(x, \mu_A(x)) \mid x \in X\} \tag{12}$$
where $\mu_A(x)$ is called the membership function (MF) for the fuzzy set A. The MF maps each element of X to a membership grade (or a value) between 0 and 1. Usually X is referred to as the universe of discourse or simply the universe. The most widely used MF is the generalized bell MF (or the bell MF), which is specified by three parameters {a, b, c} and defined as (Jang & Chuen-Tsai, 1995)
$$\mathrm{bell}(x; a, b, c) = \frac{1}{1 + \left|\dfrac{x - c}{a}\right|^{2b}} \tag{13}$$
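As a small illustration of how the three parameters shape the generalized bell MF, the following sketch evaluates it at a few points. The parameter values are hypothetical (they are not the fitted ANFIS parameters of this study) and were chosen only so that the grades are easy to check by hand.

```python
def bell_mf(x, a, b, c):
    """Generalized bell membership function: 1 / (1 + |(x - c) / a|^(2b))."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))

# Hypothetical parameters: a controls the width, b the steepness, c the center.
a, b, c = 20.0, 2.0, 50.0

# Membership grades at the center, at a distance a from it, and far away.
grades = {x: bell_mf(x, a, b, c) for x in (30.0, 50.0, 70.0, 110.0)}
for x, grade in grades.items():
    print(x, grade)
```

The grade is 1 at x = c, falls to 0.5 at x = c ± a regardless of b, and approaches 0 away from the center, with larger b giving a steeper transition.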
Parameter b is usually positive. A desired bell MF can be obtained by a proper selection of the parameter set {a, b, c}. During the learning phase of ANFIS, these parameters change continuously in order to minimize the error function between the target output values and the calculated ones (Lee, 1990a,b). The proposed neuro-fuzzy model of ANFIS is a multilayer neural network-based fuzzy system. Its topology is shown in Fig. 6, and the system has a total of five layers. In this connected structure, the input and output nodes represent the training values and the predicted values, respectively, and in the hidden layers there are nodes functioning as membership functions (MFs) and rules. This architecture has the benefit of eliminating the disadvantage of a normal feed-forward multilayer network, in which it is difficult for an observer to understand or modify the network. For simplicity, we assume that the examined fuzzy inference system has two inputs x and y and one output. For a first-order Sugeno fuzzy model, a common rule set with two fuzzy if–then rules is defined as

$$\text{Rule 1: If } x \text{ is } A_1 \text{ and } y \text{ is } B_1, \text{ then } f_1 = p_1 x + q_1 y + r_1 \tag{14}$$

$$\text{Rule 2: If } x \text{ is } A_2 \text{ and } y \text{ is } B_2, \text{ then } f_2 = p_2 x + q_2 y + r_2 \tag{15}$$
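The two rules above can be evaluated numerically as in the following sketch. It assumes bell membership functions for A_i and B_i, the product to combine the two membership grades of a rule, and a weighted average of the rule outputs using normalized firing strengths, as is usual for first-order Sugeno systems. All parameter values and function names are hypothetical.

```python
def bell_mf(x, a, b, c):
    """Generalized bell membership function."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))

def sugeno_two_rule_output(x, y, premise, consequent):
    """Evaluate a first-order Sugeno system.

    premise: per rule, ((a, b, c) for A_i, (a, b, c) for B_i)
    consequent: per rule, (p_i, q_i, r_i)
    """
    # Firing strength of each rule: product of the two membership grades.
    strengths = [bell_mf(x, *mf_a) * bell_mf(y, *mf_b) for mf_a, mf_b in premise]
    # Normalized firing strengths.
    total = sum(strengths)
    normalized = [w / total for w in strengths]
    # Rule outputs f_i = p_i * x + q_i * y + r_i.
    rule_outputs = [p * x + q * y + r for p, q, r in consequent]
    # Overall output: sum of the normalized strengths times the rule outputs.
    overall = sum(w * f for w, f in zip(normalized, rule_outputs))
    return overall, normalized

# Hypothetical premise and consequent parameters for the two rules.
premise = [((2.0, 2.0, 0.0), (2.0, 2.0, 0.0)),
           ((2.0, 2.0, 4.0), (2.0, 2.0, 4.0))]
consequent = [(1.0, 1.0, 0.0), (2.0, 0.0, 1.0)]

overall, normalized = sugeno_two_rule_output(2.0, 2.0, premise, consequent)
print(overall, normalized)
```

With the inputs placed midway between the two rule centers, both rules fire equally, so the overall output is the plain average of f_1 and f_2.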
Fig. 6. Type-3 fuzzy reasoning (a) and equivalent ANFIS (b) (after Jang, 1993).
As seen from Fig. 6b, different layers of ANFIS have different nodes. Each node in a layer is either fixed or adaptive (Jang, 1993). The layers and their associated nodes are described below:
Layer 1. Every node i in this layer is an adaptive node. Parameters in this layer are called premise parameters.
Layer 2. Every node in this layer is a fixed node labeled Π, whose output is the product of all the incoming signals. Each node output represents the firing strength of a rule.
Layer 3. Every node in this layer is a fixed node labeled N. The ith node calculates the ratio of the ith rule's firing strength to the sum of all rules' firing strengths. Thus the outputs of this layer are called normalized firing strengths.
Layer 4. Every node i in this layer is an adaptive node. Parameters in this layer are referred to as consequent parameters.
Layer 5. The single node in this layer is a fixed node labeled Σ, which computes the overall output as the summation of all incoming signals.
The learning algorithm for ANFIS is a hybrid algorithm, which is a combination of gradient descent and the least-squares method. More specifically, in the forward pass of the hybrid learning algorithm, node outputs go forward until layer 4 and the consequent parameters are identified by the least-squares method (Jang, 1993). In the backward pass, the error signals propagate backwards and the premise parameters are updated by gradient descent. Table 6 summarizes the activities in each pass. The consequent parameters are optimized under the condition that the premise parameters are fixed. The main benefit of the hybrid approach is that it converges much faster, since it reduces the search space dimensions of the original pure back propagation method used in neural networks. The overall output can be expressed as a linear combination of the consequent parameters.

Table 6
Forward and backward pass for ANFIS.

                         Forward pass               Backward pass
Premise parameters       Fixed                      Gradient descent
Consequent parameters    Least-squares estimator    Fixed
Signals                  Node outputs               Error signals

The error measure to train the above-mentioned ANFIS is defined as (Loukas, 2001):

$$E = \sum_{k=1}^{n} (f_k - f'_k)^2 \tag{16}$$

where $f_k$ and $f'_k$ are the kth desired and estimated outputs, respectively, and n is the total number of pairs (inputs–outputs) of data in the training set. In this study, a hybrid intelligent system called ANFIS (adaptive neuro-fuzzy inference system) was also applied for predicting S%. ANFIS was trained with the help of Matlab version 7.1 (2005), and the SPSS 10.0.1 (1999) package was used for RMSE and statistical calculations. The parameter types and their values used in the ANFIS model are given in Table 7. According to the RMSE, VAF, MAPE and R2 values (Table 5) and the cross-correlation between predicted and observed values (Fig. 7), the ANFIS model constructed to predict S% has a high prediction performance.

Table 7
Different parameter types and their values used for training ANFIS.

ANFIS parameter type              Value
MF type                           Gauss function
Number of MFs                     9
Output function                   Linear
Number of linear parameters       12
Number of nonlinear parameters    18
Total number of parameters        30
Number of training data pairs     129
Number of checking data pairs     43
Number of testing data pairs      43

Fig. 7. Cross-correlation of predicted and observed values of S% for ANFIS model.

Fig. 8. The variation of the values predicted by MR, MLP, RBF and ANFIS models, from the observed values.

4. Results
In this paper, the use of multiple regression (MR), artificial neural network (ANN) and adaptive neuro-fuzzy inference system (ANFIS) models for the prediction of the swell percent of soils was described and compared. According to the results of the simple regression analyses, there are statistically meaningful relationships between swell percent and liquid limit, activity and cation exchange capacity. The multiple regression, MLP and RBF types of artificial neural network, and adaptive neuro-fuzzy inference system models for the prediction of swell percent were then constructed using three inputs and one output. The results of the present paper can be summarized as follows:
a. The multiple regression model showed that the equation obtained for prediction of the swell percent has a high prediction performance.
b. The ANFIS model for prediction of swell percent revealed a more reliable prediction when compared with the multiple regression model. c. In order to predict the swell percent, ANN models (MLP and RBF), particularly RBF, having three inputs and one output was applied successfully, and exhibited the more reliable predictions than the regression and ANFIS models. As a result of the comparison of VAF, RMSE and MAPE indices and coefficient of correlations (R2) for predicting S%, it was obtained that prediction performance of the ANN-RBF model is higher than those of ANN-MLP, ANFIS and multiple regression. In order to show the deviations from the observed values of S%, the distances of the predicted values from the models constructed from the observed values were also calculated and graphics were drawn (Fig. 8). These graphics indicated that the deviation interval (1.240 to +1.304) of the predicted values from ANN-RBF is smaller than the deviation interval of ANN-MLP (1.535 to +2.117), ANFIS (2.137 to +2.107) and multiple regression (2.721 to +1.754).
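The performance indices and deviation intervals used in this comparison can be sketched as follows. This is a minimal illustration only: it assumes the commonly used definitions of RMSE, VAF (in %), MAPE (in %) and R² (taken here as the squared Pearson correlation), and it assumes that deviations are computed as predicted minus observed. The function names and the sample values are hypothetical and are not data from this study.

```python
import math


def mean(values):
    return sum(values) / len(values)


def variance(values):
    # Population variance; the normalisation cancels in the VAF ratio.
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def rmse(observed, predicted):
    # Root mean square error between observed and predicted values.
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def vaf(observed, predicted):
    # Variance accounted for, in percent (assumed definition).
    residuals = [o - p for o, p in zip(observed, predicted)]
    return (1.0 - variance(residuals) / variance(observed)) * 100.0


def mape(observed, predicted):
    # Mean absolute percentage error, in percent; observed values must be non-zero.
    n = len(observed)
    return sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n * 100.0


def r_squared(observed, predicted):
    # Squared Pearson correlation between observed and predicted values.
    mo, mp = mean(observed), mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = sum((o - mo) ** 2 for o in observed)
    sp = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (so * sp)


def deviation_interval(observed, predicted):
    # Smallest and largest deviation (predicted minus observed, an assumption).
    deviations = [p - o for o, p in zip(observed, predicted)]
    return min(deviations), max(deviations)


# Hypothetical S% values, for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

for name, fn in [("RMSE", rmse), ("VAF", vaf), ("MAPE", mape), ("R2", r_squared)]:
    print(name, round(fn(observed, predicted), 3))
print("deviation interval", deviation_interval(observed, predicted))
```

Under these assumptions, a model is judged better when its RMSE and MAPE are lower, its VAF and R² are higher, and its deviation interval is narrower.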
5. Conclusions

The accuracies of the ANN and ANFIS models may, however, be considered relatively similar. The constructed ANN models (RBF and MLP) exhibited a higher performance than ANFIS and multiple regression for predicting S%. The performance comparison showed that soft computing is a good tool for minimizing the uncertainties in soil engineering projects. The use of soft computing may also provide new approaches and methodologies, and minimize the potential inconsistency of correlations.
As is known, the potential benefits of soft computing models extend beyond high computation rates. The higher performance of the soft computing models stems from their greater degree of robustness and fault tolerance compared with traditional statistical models, because they contain many more processing neurons, each with primarily local connections. The comparison of the RBF and MLP network models indicates the good predictive capability of the RBF model, although their accuracies are almost the same. The time taken by RBF was less than that of MLP in this study. A limitation of the RBF model, however, is that it is more sensitive to dimensionality and has greater difficulties if the number of units is large. It appears that the swell percent of soils can be estimated by using the proposed empirical relationships and soft computing models. The population of the analyzed data is relatively limited in this study. Therefore, the proposed equations and models could be used, with acceptable accuracy, at the preliminary stage of design.
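The remarks on RBF training time and on the number of units can be illustrated with a minimal sketch. It assumes a Gaussian RBF network with one hidden unit centred on each training sample and a common width, so that the output weights follow from a single linear solve rather than from iterative weight updates. This is an assumed textbook variant, not necessarily the configuration used in this study. All function names, the width value and the sample data are hypothetical.

```python
import math


def gaussian(x, center, sigma):
    # Gaussian basis function of the squared Euclidean distance to the centre.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve_linear(a, b):
    # Solve a * w = b by Gaussian elimination with partial pivoting.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w


def fit_rbf(inputs, targets, sigma):
    # One hidden unit per training sample: the design matrix is square,
    # and training reduces to solving one linear system.
    phi = [[gaussian(x, c, sigma) for c in inputs] for x in inputs]
    return solve_linear(phi, targets)


def predict_rbf(x, centers, weights, sigma):
    # Network output: weighted sum of the hidden-unit activations.
    return sum(w * gaussian(x, c, sigma) for w, c in zip(weights, centers))


# Hypothetical, already scaled inputs (three per sample) and targets.
inputs = [[0.2, 0.1, 0.3], [0.5, 0.4, 0.6], [0.9, 0.8, 0.7], [0.4, 0.9, 0.1]]
targets = [1.0, 2.5, 4.0, 3.0]
sigma = 0.5  # assumed common width

weights = fit_rbf(inputs, targets, sigma)
print([round(predict_rbf(x, inputs, weights, sigma), 6) for x in inputs])
```

In this variant the linear system grows with the number of hidden units, which is one way to read the difficulty noted above when the number of units is large.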
References

Al-Rawas, A. A. (1998). The factors controlling the expansive nature of the soils and rocks of northern Oman. Engineering Geology, 53(3–4), 327–350.
Altun, H., & Gelen, G. (2004). Enhancing performance of MLP/RBF neural classifiers via an multivariate data distribution scheme. In International conference on computational intelligence (ICCI2004), Nicosia, North Cyprus, 24–29, May 2004 (pp. 1–6).
Alvarez, G. M., & Babuska, R. (1999). Fuzzy model for the prediction of unconfined compressive strength of rock samples. International Journal of Rock Mechanics and Mining Sciences, 36, 339–349.
ASTM (1994). Annual book of ASTM standards (ASTM, D-4546), Soil and rock (I): D420–D4914, V. 04.08 (pp. 693–699).
Bache, B. W. (1976). The measurement of cation exchange capacity of soils. Journal of Science Food Agriculture, 27(3), 273–280.
Basma, A. A. (1991). Estimating uplift of foundations due to expansion: A case history. Geotechnical Engineering, 22, 217–231.
Bell, F. G., Cripps, J. C., Culshaw, M. G., & Entwisle, D. (1993). Volume changes in weak rocks: predictions and measurement. In Anagnostopoulos et al. (Eds.), Geotechnical engineering of hard soils-soft rocks, Balkema, Rotterdam (pp. 925–932).
Bell, F. G., & Jermy, C. A. (1994). Building on clay soils which undergo volume changes. Architectural Science Review, 37, 35–43.
BS 1377. (1975). Methods of test for soils for civil engineering purposes. London: British Standards Institution.
Bell, F. G., & Maud, R. R. (1995). Expansive clays and construction, especially of low-rise structures: A viewpoint from Natal, South Africa. Environmental and Engineering Geoscience, 1(1), 41–59.
Burland, J. B. (1984). Building on expansive soils. In 1st national conf. on the science and technology of buildings with special reference to buildings in hot climates, Khartoum, Sudan, Theme lecture (pp. 925–931).
Chen, F. H. (1975). Foundations of expansive soils 280p. Amsterdam, The Netherlands: Elsevier.
Christidis, G. E. (1998). Physical and chemical properties of some bentonite deposits of Kimolos Island, Greece. Applied Clay Science, 13(2), 79–98.
Cohen, S., & Intrator, N. (2002). Automatic model selection in a hybrid perceptron/radial network. Information Fusion: Special Issue on Multiple Experts, 3(4), 259–266.
Cohen, S., & Intrator, N. (2003). A study of ensemble of hybrid networks with strong regularization. Multiple Classifier Systems, 227–235.
Dhowian, A., Ruwiah, I., & Erol, A. (1985). The distribution and evaluation of expansive soils in Saudi Arabia. Proceedings of second Saudi engineering conference (Vol. 4, pp. 1969–1990). Dhahran: King Fahd University of Petroleum and Minerals.
Donaldson, G. W. (1969). The occurrence of problem heave and the factors affecting its nature. Proceedings of second international research and engineering conference on expansive clay soils, Texas (pp. 1969). College Station, TX: A&M Press.
Finol, J., Guo, Y. K., & Jing, X. D. (2001). A rule based fuzzy model for the prediction of petrophysical rock parameters. Journal of Petroleum Science and Engineering, 29, 97–113.
Foody, G. M. (2004). Supervised image classification by MLP and RBF neural networks with and without an exhaustively defined set of classes. International Journal of Remote Sensing, 25(15), 3091–3104.
Gokceoglu, C. (2002). A fuzzy triangular chart to predict the uniaxial compressive strength of Ankara agglomerates from their petrographic composition. Engineering Geology, 66, 39–51.
Jang, J. R. (1993). ANFIS: Adaptive-Network-Based Fuzzy Inference System. IEEE Transactions on Systems, Man, and Cybernetics, 23, 665–685.
Jang, J. S. R., & Chuen-Tsai, S. (1995). Neuro-fuzzy modeling and control. Proceeding of IEEE, 83, 378–406.
Kenneth, J., Wernter, S., & MacInyre, J. (2001). Knowledge extraction from Radial Basis Function networks and Multi Layer Perceptrons. International Journal of Computational Intelligence and Applications, 1(3), 369–382.
Lee, C. C. (1990a). Fuzzy logic in control systems: Fuzzy logic controller I. IEEE Transactions on Systems, Man, and Cybernetics, 20, 404–418.
Lee, C. C. (1990b). Fuzzy logic in control systems: Fuzzy logic controller II. IEEE Transactions on Systems, Man, and Cybernetics, 20, 419–435.
Loh, W., & Tim, L. (2000). A comparison of prediction accuracy, complexity, and training time of thirty three old and new classification algorithm. Machine Learning, 40(3), 203–238.
Loukas, Y. L. (2001). Adaptive neuro-fuzzy inference system: an instant and architecture-free predictor for improved QSAR studies. Journal of Medical Chemistry, 44, 2772–2783.
Matlab 7.1 (2005). Software for technical computing and Model-Based Design. The MathWorks Inc.
Simpson, P. K. (1990). Artificial neural system-foundation, paradigm, application and implementation. New York: Pergamon Press.
Singh, T. N., Kanchan, R., Verma, A. K., & Singh, S. (2003). An intelligent approach for prediction of triaxial properties using unconfined uniaxial strength. Mining Engineering Journal, 5, 12–16.
Sposito, G. (1989). The chemistry of soils 277p. Oxford University Press.
SPSS 10.0.1 (1999). Statistical analysis software (Standard Version). SPSS Inc.
Venkatesan, P., & Anitha, S. (2006). Application of a radial basis function neural network for diagnosis of diabetes mellitus. Current Science, 91(9), 1195–1199.
Yilmaz, I. (2006). Indirect estimation of the swelling percent and a new classification of soils depending on liquid limit and cation exchange capacity. Engineering Geology, 85(3–4), 295–301.
Yilmaz, I. (2008). A case study for mapping of spatial distribution of free surface heave in alluvial soils (Yalova, Turkey) by using GIS software. Computers and Geosciences, 34(8), 993–1004.
Yilmaz, I., & Yüksek, A. G. (2008). An example of artificial neural network application for indirect estimation of rock parameters. Rock Mechanics and Rock Engineering, 41(5), 781–795.
Yilmaz, I., & Yüksek, A. G. (2009). Prediction of the strength and elasticity modulus of gypsum using multiple regression, ANN, ANFIS models and their comparison. International Journal of Rock Mechanics and Mining Sciences, 46(4), 803–810.