
International Journal of Applied Earth Observation and Geoinformation 12 (2010) 340–350


Landslide Susceptibility Zonation through ratings derived from Artificial Neural Network

Shivani Chauhan a, Mukta Sharma b, M.K. Arora a,∗, N.K. Gupta c

a Department of Civil Engineering, Indian Institute of Technology Roorkee, Roorkee 247667, Uttarakhand, India
b Department of Geology, Delhi University, Delhi, India
c Institute Computer Center, Indian Institute of Technology Roorkee, Roorkee, India

Article info

Article history: Received 6 August 2009; Accepted 19 April 2010

Keywords: Artificial Neural Network; Landslide susceptibility; Remote sensing

Abstract

In the present study, an Artificial Neural Network (ANN) has been implemented to derive ratings of categories of causative factors, which are then integrated to produce a landslide susceptibility zonation map in an objective manner. The results have been evaluated against an ANN based black box approach for Landslide Susceptibility Zonation (LSZ) proposed earlier by the authors. Seven causative factors, namely, slope, slope aspect, relative relief, lithology, structural features (e.g., thrusts and faults), landuse landcover, and drainage density, were placed in 42 categories for which ratings were determined. The results indicate that the LSZ map based on ratings derived from ANN performs considerably better than that produced from the earlier ANN based approach. The landslide density analysis clearly showed that susceptibility zones were in close agreement with actual landslide areas in the field. © 2010 Elsevier B.V. All rights reserved.

∗ Corresponding author. 0303-2434/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jag.2010.04.006

1. Introduction

Landslides in the Himalayas are among the major and most widespread natural disasters; they often strike life and property and are a cause of major concern. LSZ is, therefore, important for taking quick and safe mitigation measures. LSZ maps categorize a region according to its potential stability or instability based on a number of factors, grouped into preparatory factors such as lithology and geomorphology, and triggering factors such as seismicity and rainfall. The need for LSZ maps at different scales has increased in the recent past to support decision makers at various levels of the territorial planning management.

Preparation of LSZ maps requires evaluation of the relationships between various terrain conditions and instances of landslide occurrence. An experienced earth scientist assesses the overall terrain conditions so as to identify the causative factors affecting the occurrence of landslides in a region; this assessment serves as knowledge that can be input to an LSZ process in several ways (Gong, 1996; Pachauri and Pant, 1992; Anbalagan, 1992). Because such expert input is subjective, an objective procedure is often desired to quantitatively support landslide studies. Aiming at a higher degree of objectivity and better reproducibility of the susceptibility zonation, a number of studies (Saha et al., 2005; Garrett, 1994) have been carried out in recent years for LSZ mapping in the Himalaya and other parts of the world. Most of these studies are based on establishing the relationships between the categories of causative factors and the incidences of landslides in a given region through spatial data analyses. These relationships are defined in terms of weights and ratings. Thus, a number of data driven approaches have been proposed, which include logistic regression and multivariate statistical methods (Dai et al., 2000; Ohlmacher and Davis, 2003; Mathew et al., 2005), artificial neural network (Ermini et al., 2005; Gomez and Kavzoglu, 2005), fuzzy relations (Ayalew and Yamagishi, 2005), and neuro-fuzzy approaches (Arora et al., 2004; Kanungo et al., 2006). Each approach is based on a different logic with the ultimate aim to produce an LSZ map in an objective manner, thereby reducing the subjective involvement of the experts.

ANN is a useful approach for problems such as regression and classification, since it has the capability of analyzing complex data at varied scales such as continuous, categorical and binary data. The concept of ANN is based on learning from data with known characteristics to derive a set of weighting parameters which are used subsequently to recognize the unseen data (Horton, 1945). Arora et al. (2004), Ermini et al. (2005) and Gomez and Kavzoglu (2005) proposed the use of ANN as a black box model for classification to produce an LSZ map. The three layer (input, hidden and output) ANN consisted of input neurons corresponding to the seven causative factors. The input data to these neurons were a set of normalized attributes of each factor, as obtained from the expert's opinions. This approach may thus be termed a semi-objective approach, in which the connection weights between units in one layer and those in other layers remained hidden. These connection weights may carry key information about the relationship between various causative factors, which can be explored.

Kanungo et al. (2006) proposed a combined neuro-fuzzy approach and suggested opening the ANN black box by capturing the connection weights between various neurons in the input, hidden and output layers, and organizing them to determine the weight of each causative factor. These weights were then combined with the ratings of the categories of each causative factor determined through a fuzzy relation concept to produce an LSZ map.

In this paper, we propose a modification in the design of ANN, wherein each input neuron corresponds to a category of a causative factor (e.g., a slope category of 30–45° pertaining to the causative factor slope). Thus, the connection weights in this ANN design will depict the relationship between different categories of various causative factors. These connection weights for each category are defined as the ratings of each category, which are then integrated to produce an LSZ map. Thus, each category of the causative factor has been considered as an independent variable for the determination of individual rating through ANN.

2. Study area and data

The study area belongs to parts of Chamoli and Rudraprayag districts of the State of Uttarakhand, India and covers about 600 km² (Fig. 1). These districts are well connected by roads and are located on the way to a number of tourist destinations like Badrinath, Joshimath and Kedarnath. The topography of the area is highly rugged and dissected with altitudes ranging from 1000 m to 4600 m. The region comprises high hills and mountains with very narrow valleys and deep gorges having very high gradients. A major river, the Alaknanda, passes through the study area.

Fig. 1. Location map of study area.

Several thrusts and faults pass through the area and have rendered the rock mass weak. The Main Central Thrust (MCT) or Vaikrita Thrust passes in the close vicinity of the northern part of the study area, separating the rocks of the Lesser Himalayas from those of the Higher Himalayas. The MCT is a major tectonic feature of the Himalayas and has brought the crystalline rocks of the Higher Himalayas over the younger sedimentary rocks. The Lesser Himalayas are composed of tectonically compressed blocks of Paleozoic and Mesozoic crystallines, metamorphics, and sedimentary rocks belonging to the Almora, Ramgarh and Jaunsar Groups of rocks. The area is also characterized by fragile geology and complex tectonics. Several irregularly oriented faults are present. The rock groups in the area are highly folded. The region has also witnessed two major earthquakes in the recent past, one in 1991 in Uttarkashi and the other in 1999 in Chamoli, which caused extensive damage to life and property. These seismic ground movements make the lithology fragile and cause landslides. All these factors along with torrential rainfall make the slopes inherently unstable, and bring about occurrences of landslides in the region.

Several remote sensing and GIS-based landslide studies (Saha et al., 2002; Pachauri and Pant, 1992) have been conducted in the past with the aim to produce accurate LSZ maps. Based on

Table 1
Data sources and specific use.

Data types | Description (satellite/sensor, spatial resolution, year of acquisition) | Specific use
Remote sensing data | IRS PAN, 5.8 m, 1999; IRS LISS-III, 23.5 m, 2001; IRS LISS-IV, 5.8 m, 2008 | Land use/land cover, structural features, landslide distribution
Google Earth Data | Various images of study area | Landslide distribution
Topographical maps (1963) | Scale: 1:50,000 | DEM: slope, aspect, relative relief; drainage network
Geological map (1980) | Scale: 1:250,000 | Lithology, structural features
Field data | GPS surveys | Landslide distribution, land use/land cover


these studies, seven key causative factors, namely, slope, aspect, relative relief, lithology, structural features, land use land cover and drainage density, have been identified. Spatial data pertaining to these causative factors were collected from satellite remote sensing images (IRS-1C & P6 satellite sensors PAN, LISS-III and LISS-IV), Survey of India (SOI) topographic maps, geological map (Valdiya, 1980) and field campaigns. Table 1 provides the details of these data along with their usage in the study. These spatial data were appropriately processed and analyzed to prepare seven thematic data layers, as described below.

2.1. Digital Elevation Model (DEM) and its derivatives

Surface topography largely controls the flow sources and runoff direction, and so limits the density and spatial extent of landslides. Therefore, key terrain attributes such as slope gradient, slope aspect and relative relief were derived from the DEM, which represents the spatial variation of elevation (i.e., altitude or height) over an area. The DEM has been generated by digitization of contours from Survey of India topographic maps at 40 m contour interval. After digitization, the vector layer has been converted to a raster DEM at a cell size of 23.5 m to match the spatial resolution of IRS LISS-III sensor data.

Slope is the most substantial cause of landsliding. On a slope of uniform isotropic material, increased slope correlates with increased likelihood of failure. In order to assess the contribution of various slope gradients to landslides, it is necessary to know the spatial distribution of the slope categories, which can be obtained from the DEM (Dai et al., 2000). Five slope categories, as defined by Anbalagan (1992) and Gong (1996), namely (a) 0–15°, (b) 15–24°, (c) 25–34°, (d) 35–44° and (e) ≥45°, were considered and represented in the form of a slope thematic data layer (Fig. 2a).

Aspect also plays a significant role in slope stability assessment in the Himalayan terrain, where most of the south-facing slopes are devoid of vegetation or are scantily vegetated, and experience more orographic rainfall resulting in rapid mass wasting on moderate to steep scarps. Similar to slope, an aspect thematic data layer was also derived from the DEM. It represents nine aspect categories (Mathew et al., 2005): (a) north, (b) northeast, (c) east, (d) southeast, (e) south, (f) southwest, (g) west, (h) northwest and (i) flat (Fig. 2b).

Relative relief is also an important causative factor, since landslides occur frequently in high relative relief areas (Gong, 1996). Relative relief is defined as the difference between the maximum and minimum elevation values within an area. It has been computed using the DEM as input. A 5 × 5 pixel moving window at 23.5 m cell size has been run over the DEM, and the difference between the maximum and minimum elevation values in the 5 × 5 neighborhood has been assigned to the central pixel as the relative relief value. The relative relief data layer thus prepared consists of six categories defined by relative relief intervals, namely, (a) 0–25 m, (b) 25–50 m, (c) 50–75 m, (d) 75–100 m, (e) 100–125 m, and (f) 125–150 m (Fig. 3a).
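The moving-window computation of relative relief described above can be sketched as follows. This is a minimal illustration, not the authors' GIS workflow: the function names are hypothetical, the DEM is synthetic, and the treatment of edge cells (repeating border values) and of values falling exactly on a class limit are assumptions, since the text does not specify them.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def relative_relief(dem, window=5):
    """Maximum minus minimum elevation in a window x window neighbourhood of each cell."""
    pad = window // 2
    # Assumption: border cells are handled by repeating the edge values.
    padded = np.pad(dem, pad, mode="edge")
    windows = sliding_window_view(padded, (window, window))
    return windows.max(axis=(-2, -1)) - windows.min(axis=(-2, -1))


def relief_category(relief, width=25.0, n_classes=6):
    """Class index 0..5 for the intervals 0-25 m, 25-50 m, ..., 125-150 m.

    Assumption: a value on a class limit goes to the higher class, and values
    above the last interval are clipped into the last class.
    """
    return np.clip((relief // width).astype(int), 0, n_classes - 1)


# Synthetic 7 x 7 DEM (elevations in metres) used only for illustration.
dem = np.arange(49, dtype=float).reshape(7, 7) * 10.0
relief = relative_relief(dem)
classes = relief_category(relief)
```

On a real raster the same two calls would be applied to the 23.5 m DEM grid, with the class indices then written out as the relative relief thematic layer.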

2.2. Lithology

It has been widely recognized that lithology greatly influences the occurrence of landslides, because lithological and structural variations often lead to a difference in strength and permeability of rocks and soils. The lithological units depicted in the geological map by Valdiya (1980) were digitized to produce the lithology thematic data layer, which shows three broad litho-tectonic units: (a) granite–granodiorite–gneiss, (b) quartzite with intercalated slate and (c) granite (Fig. 3b).

Fig. 2. Thematic data layers: (a) slope and (b) aspect.

2.3. Structural features buffer data layer

The study area is traversed by three major thrusts, i.e. the Munsiari thrust, Ramgarh thrust and Berinag thrust, and several irregularly oriented faults and folds. These structural features were extracted from the structural geology map of Valdiya (1980) and from a PAN sharpened LISS-III image in the form of lineaments, which were further updated from a LISS-IV (MX) image acquired in April 2008. A total of 38 lineaments were visually interpreted through delineation of abrupt relief changes, sharp tonal contrasts in valleys and cliffs, lithological variations, and straight drainage courses (Fig. 4). Six buffers at 500 m class interval around lineaments were created, and a lineament buffer data layer was formed. The lineament buffer categories were thus defined as (a) 0–500 m, (b) 500–1000 m, (c) 1000–1500 m, (d) 1500–2000 m, (e) 2000–2500 m, and (f) >2500 m (Fig. 5a).

2.4. Drainage density

Landslides in hilly areas may also occur due to the erosional activity associated with drainage. Drainage density is defined as total stream length per unit area of a river basin, and has often


Fig. 3. Thematic data layers: (a) relative relief and (b) lithology.

Fig. 5. Thematic data layers: (a) structural features and (b) drainage density.

been used to express the degree of fluvial dissection (Horton, 1945). In this study, the drainage streams were digitized from the topographic maps to create a drainage data layer, which was then used to compute the drainage density value at each pixel. These drainage density values were further grouped into five equal-interval categories, represented in the form of a drainage density data layer. The categories in this layer were (a) 0–3.72 m⁻¹, (b) 3.72–7.45 m⁻¹, (c) 7.45–11.17 m⁻¹, (d) 11.17–14.90 m⁻¹, and (e) 14.90–18.62 m⁻¹ (Fig. 5b).

2.5. Landuse and landcover

Fig. 4. LISS-IV image draped with lineaments in the study area.

A land use land cover map depicts the spatial distribution of vegetative and non-vegetative cover and types of land use practices. Vegetated areas, especially dense vegetation with strong and large root systems, help in improving the stability of slopes (Greenway, 1987). Vegetation provides both hydrological and mechanical effects that generally are beneficial to the stability of slopes. In contrast, barren areas and fallow lands destabilize the slopes. Therefore, a land use land cover data layer was prepared using an IRS-1C LISS-III remote sensing image as the primary data input, along with DEM and NDVI images as additional data inputs to the supervised image classification process, typically known as multi-source classification. Eight dominant land use land cover classes in the area were identified and mapped. These include (a) sparse vegetation, (b) settlement, (c) barren land, (d) fallow land, (e) agriculture, (f) dense forest, (g) snow, and (h) water (Fig. 6a).

Fig. 6. Thematic data layers: (a) land use land cover layer and (b) existing landslide distribution layer.

2.6. Landslide distribution layer

The identification and mapping of existing landslides constitute a prerequisite for developing any data driven model for LSZ. In this study, existing landslides were interpreted visually and mapped from a high-resolution LISS-IV (MX) image and a PAN sharpened multi-spectral image. The interpretation and recognition of landslides were based on basic image interpretation elements, such as tone/color, shape, landform, drainage and vegetation. A total of 154 landslides of varying dimensions were mapped, which were subsequently digitized and rasterized to create a landslide distribution data layer (Fig. 6b). The mapped landslides in this layer were validated through recent Google Earth Data and a field campaign conducted during April 2008.

3. Artificial neural network (ANN): concepts

ANNs are generic non-linear function approximators that have been extensively used for problems like pattern recognition and classification. The categorization of a terrain into ordinal zones of landslide susceptibility may also be regarded as a classification problem. Thus, the ANN outputs may be considered as the degree of the membership of each terrain unit with regard to the occurrence of landslide (Ermini et al., 2005). The higher the membership value, the more susceptible is the terrain unit to the occurrence of landslide and vice versa. Further, since ANN can process input data at varied measurement scales and units, such as continuous, categorical and binary data, it appears to be an appropriate approach for LSZ mapping (Garrett, 1994).

An ANN may be defined as a collection of basic units, called neurons, interconnected with one another. The Multi-Layer Perceptron (MLP) is perhaps the most popular and most widely used ANN. It consists of two layers, input and output, and one or more hidden layers between these two layers. The hidden layers are introduced to increase the network's ability to model complex functions (Paola and Schowengerdt, 1995). Each layer in a network contains a sufficient number of neurons depending upon the application. The input layer is passive and merely receives the data (e.g., the data pertaining to various causative factors). Unlike the input layer, both hidden and output layers actively process the data. The output layer produces the neural network's results. Thus, the number of neurons in the input and output layers is typically fixed by the application. The number of hidden layers and their neurons are typically determined by trial and error (Gong, 1996).

There are three stages involved in ANN data processing for a classification problem: the training stage, the weight determination stage, and the classification stage. The training process is initiated by assigning arbitrary initial connection weights, which are constantly updated until an acceptable training accuracy is reached. While developing an ANN, the data are commonly partitioned into at least two subsets, such as training and testing datasets. The training data are expected to represent the whole problem domain. This subset is used in the training stage of the model development to update the weights of the network. The adjusted weights obtained from the trained network are subsequently used to process the testing data in order to evaluate the generalization capability and accuracy of the network. The performance of the networks is evaluated by determining both training and testing data accuracies in terms of percent correct or overall classification accuracy (Congalton, 1991).

Training data from input neurons are processed through hidden neurons to generate an output in the output neuron. The input that a single neuron j in the 1st hidden layer (HA) receives from the neurons (i) in its preceding input layer may be expressed as:

$$\mathrm{net}_j = \sum_{i=1}^{t} w_{ij}\, p_i \qquad (1)$$

where $w_{ij}$ represents the connection weight between input neuron i and hidden neuron j, $p_i$ is the data at the input neuron i and t is the number of input neurons. The output value produced at the hidden neuron j, $p_j$, is the transfer function, f, evaluated at the sum produced within neuron j, $\mathrm{net}_j$. So, the transfer function f can be expressed as:

$$p_j = f(\mathrm{net}_j) = \frac{1}{1 + e^{-\mathrm{net}_j}} \qquad (2)$$
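Eqs. (1) and (2) amount to a weighted sum followed by a sigmoid at every hidden and output neuron. The sketch below illustrates this forward pass under stated assumptions: the layer sizes and random weights are arbitrary placeholders, bias terms are omitted because Eq. (1) has none, and the function names are hypothetical.

```python
import numpy as np


def sigmoid(net):
    """Transfer function of Eq. (2)."""
    return 1.0 / (1.0 + np.exp(-net))


def forward(p, weights):
    """Propagate an input vector p through a list of weight matrices.

    Each matrix has shape (n_in, n_out). Returns the activations of every
    layer, starting with the input itself.
    """
    activations = [np.asarray(p, dtype=float)]
    for w in weights:
        net = activations[-1] @ w          # Eq. (1): weighted sum of inputs
        activations.append(sigmoid(net))   # Eq. (2): sigmoid of that sum
    return activations


# Arbitrary illustrative network: 7 inputs, two hidden layers, 1 output.
rng = np.random.default_rng(0)
shapes = [(7, 4), (4, 3), (3, 1)]
weights = [rng.normal(scale=0.5, size=s) for s in shapes]
p = np.array([0.6, 0.9, 0.5, 1.0, 0.8, 0.75, 0.4])  # placeholder inputs in [0, 1]
output = forward(p, weights)[-1]
```

Because the sigmoid maps any real sum into the interval (0, 1), the single output value can be read directly as a degree of membership.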

The function f is usually a non-linear sigmoid function that is applied to the weighted sum of input data before the data are passed to the next layer. Similarly, the neural network output value, $p_o$, at the output neuron o, is obtained using Eqs. (1) and (2).

The training of the ANN may be performed in both supervised and unsupervised modes via a number of algorithms. Among many supervised algorithms, the back-propagation algorithm has been widely used in remote sensing studies, and is therefore used here. In this algorithm, an error function (E), determined from the known outputs of the training data, called target outputs, and the network-derived outputs, is minimized iteratively. The process continues until E, given by Eq. (3), converges to some minimum value and the adjusted weights are obtained:

$$E = 0.5 \sum_{i=1}^{n} (T_i - O_i)^2 \qquad (3)$$

where $T_i$ is the target output vector, $O_i$ is the network output vector, and n is the number of training samples. The factor of 0.5 is included for arithmetic convenience. The target values of the output neurons are generally kept between 0 and 1 for every input pixel. The error is back propagated through the neural network, and is minimized by adjusting the weights between layers. The weight adjustment is performed using Eq. (4):

$$\Delta w_{ij}(n+1) = \eta\,\delta_j O_i + \alpha\,\Delta w_{ij}(n) \qquad (4)$$

where $\eta$ is the learning rate, $\delta_j$ is an index of the rate of change of the error, and $\alpha$ is the momentum. The learning rate is a positive constant that controls the amount of adjustment of the connection weights. The momentum factor is used to accelerate the convergence during the search for the minimum value on the error surface. Once the appropriate weights of all the connections are found, the network is assumed to be trained. These weights are used to determine the class outputs of a testing (known) dataset to evaluate the performance of the network. If the accuracy on the testing dataset is low, the network has fitted the training data but has not attained the generalization capability, and, therefore, retraining is required. After the network is trained and tested to the desired accuracy, the adjusted weights are used to determine the output of the entire dataset. In this study, the ANN has been used for LSZ mapping, as described in the next section.

4. ANN for LSZ mapping

The ANN was implemented in two different modes to produce LSZ maps:

(i) The conventional ANN black box implementation
(ii) The proposed ANN derived ratings implementation

In both cases, a feed forward back-propagation multi-layer artificial neural network with one input layer, two hidden layers and one output layer was considered.

4.1. LSZ mapping through ANN black box implementation

The input layer of the ANN consists of seven neurons representing the seven causative factors pertaining to the study area. The thematic data layer pertaining to each factor depicts the categories of that factor (Table 2). Each category is assigned an attribute value subjectively, depending upon its relative significance in causing landslides. These attribute values were normalized with regard to the highest attribute within the corresponding causative factor and form the input data for the ANN.
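The normalization step is a division of each subjective attribute by the largest attribute within the same causative factor. A minimal sketch, using the aspect attributes listed in Table 2 (the function name is hypothetical):

```python
def normalize_attributes(attributes):
    """Divide each category attribute by the highest attribute of its factor."""
    highest = max(attributes.values())
    return {category: value / highest for category, value in attributes.items()}


# Subjective aspect attributes as listed in Table 2.
aspect_attributes = {
    "North": 3, "Northeast": 4, "East": 8, "Southeast": 9, "South": 7,
    "Southwest": 6, "West": 5, "Northwest": 2, "Flat": 1,
}
aspect_inputs = normalize_attributes(aspect_attributes)
```

The most significant category of each factor therefore always enters the network as 1.0, and all inputs share the range (0, 1] regardless of how many categories a factor has.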
The output layer of the ANN contains a single neuron that represents the presence or absence of existing landslide locations (i.e., a target output of 1 denotes presence and 0 denotes absence). Various ANN architectures were designed by varying the number of neurons in hidden layers. The dataset consisted of a total of 2621 pixels denoting the presence of landslide and an equal number of pixels denoting the absence of landslide. This complete dataset was divided into two mutually exclusive datasets. There are several heuristics to decide on the size of training and testing datasets. In this study, based on earlier studies (e.g., Nelson and Illingworth, 1990; Haykin, 1994; Masters, 1994; Dowla and Rogers, 1995; Looney, 1996; Swingler, 1996), 80% of the total dataset was kept for training, and 20% for testing.

The back-propagation learning algorithm, with a learning rate of 0.01 and a momentum factor of 0.2, was implemented to train various ANN architectures. The training process was initiated by assigning arbitrary initial connection weights, which were constantly updated until an acceptable training accuracy was achieved. The adjusted weights obtained from the trained network were subsequently used to process the testing data in order to evaluate the generalization capability and accuracy of each network. The training and testing data accuracies of each network are listed in Table 3. From this table, it may be observed that as the neural network architecture changes, the training and testing data accuracies increase up to a certain neural network design, after which a decrease in accuracies occurs. This shows that there is an optimum ANN design for this dataset. Moreover, the training data accuracies differ from testing data accuracies for various architectures. The larger the difference between training and testing data accuracies, the lower the generalization capability of an ANN. Keeping this in view, an ANN with 7 × 9 × 5 × 1 architecture, providing a training accuracy of 75.2% and a testing accuracy of 71.7%, which not only showed a small difference between the training and testing data accuracies but also attained high absolute values, was considered as the most appropriate one for the current dataset. Thus, this ANN was used to determine the network output at all the pixels of the image; the network outputs varied from 0.01 to 0.998 across pixels. These network outputs were considered to express the landslide susceptibility index (LSI) values of pixels. The higher the value of LSI, the more susceptible is that pixel to the occurrence of landslide.

Fig. 7. LSZ map produced from ANN black box approach.
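The training procedure described above can be sketched as follows. This is a minimal illustration under several assumptions: weights are updated after every sample, bias terms are omitted, and synthetic data stand in for the landslide and non-landslide pixels. The learning rate (0.01), momentum factor (0.2), 80/20 split and 7 × 9 × 5 × 1 layout come from the text; the function names, epoch count and initial weights are illustrative only.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train(X, t, sizes, lr=0.01, momentum=0.2, epochs=20, seed=0):
    """Back-propagation with momentum for a sigmoid feed-forward network."""
    rng = np.random.default_rng(seed)
    W = [rng.normal(scale=0.5, size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
    previous = [np.zeros_like(w) for w in W]
    for _ in range(epochs):
        for x, target in zip(X, t):
            acts = [x]
            for w in W:
                acts.append(sigmoid(acts[-1] @ w))
            # Output delta for E = 0.5 * (T - O)^2 with a sigmoid output unit.
            delta = (target - acts[-1]) * acts[-1] * (1.0 - acts[-1])
            for k in reversed(range(len(W))):
                grad = np.outer(acts[k], delta)
                if k > 0:
                    # Back-propagate through W[k] before it is modified.
                    delta = (W[k] @ delta) * acts[k] * (1.0 - acts[k])
                step = lr * grad + momentum * previous[k]   # Eq. (4)
                W[k] += step
                previous[k] = step
    return W


def percent_correct(W, X, t):
    """Overall accuracy (%) with the output thresholded at 0.5."""
    out = X
    for w in W:
        out = sigmoid(out @ w)
    return 100.0 * np.mean((out[:, 0] > 0.5) == (t > 0.5))


# Synthetic stand-in data: 200 samples with 7 inputs in [0, 1].
rng = np.random.default_rng(1)
X = rng.random((200, 7))
t = (X.mean(axis=1) > 0.5).astype(float)
order = rng.permutation(len(X))
cut = int(0.8 * len(X))                      # 80% training, 20% testing
train_idx, test_idx = order[:cut], order[cut:]

W = train(X[train_idx], t[train_idx], sizes=[7, 9, 5, 1])
train_acc = percent_correct(W, X[train_idx], t[train_idx])
test_acc = percent_correct(W, X[test_idx], t[test_idx])
```

Comparing `train_acc` with `test_acc` for several choices of `sizes` mirrors the architecture search summarized in Table 3, where a small gap between the two accuracies is read as good generalization.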
These LSI values were further categorized arbitrarily into five landslide susceptibility zones in an ordinal fashion (Table 4) to produce an LSZ map (Fig. 7).

4.2. LSZ mapping through ANN derived ratings implementation

One of the major limitations of ANN black box implementation was that the weights remained hidden. Since the fundamental premise of LSZ mapping is based on the weights of causative factors and ratings of the categories of those causative factors, it was hypothesized that ANN connection weights depict the importance of a category of a causative factor in the form of ratings. In this ANN implementation, a feed forward back-propagation multi-layer artificial neural network with one input layer, two hidden layers and one output layer was considered. Unlike ANN black


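Table 2
Normalized attributes of categories of thematic layers used as input data to ANN black box approach.

Causative factor | Category | Attribute | Normalized attribute
Slope | 0–15° | 1 | 0.200
Slope | 15–25° | 2 | 0.400
Slope | 25–35° | 3 | 0.600
Slope | 35–45° | 4 | 0.800
Slope | >45° | 5 | 1.000
Aspect | North | 3 | 0.333
Aspect | Northeast | 4 | 0.444
Aspect | East | 8 | 0.889
Aspect | Southeast | 9 | 1.000
Aspect | South | 7 | 0.778
Aspect | Southwest | 6 | 0.667
Aspect | West | 5 | 0.555
Aspect | Northwest | 2 | 0.222
Aspect | Flat | 1 | 0.111
Relative relief | 0–25 m | 1 | 0.170
Relative relief | 25–50 m | 2 | 0.330
Relative relief | 50–75 m | 3 | 0.500
Relative relief | 75–100 m | 4 | 0.660
Relative relief | 100–125 m | 5 | 0.830
Relative relief | 125–150 m | 6 | 1.000
Lithology | Granite–granodiorite–gneiss | 1 | 0.667
Lithology | Granite | 2 | 0.333
Lithology | Quartzite with slate | 3 | 1.000
Structural features buffer | 0–500 m | 6 | 1.000
Structural features buffer | 500–1000 m | 5 | 0.830
Structural features buffer | 1000–1500 m | 4 | 0.660
Structural features buffer | 1500–2000 m | 3 | 0.500
Structural features buffer | 2000–2500 m | 2 | 0.330
Structural features buffer | >2500 m | 1 | 0.170
Land use/land cover | Sparse vegetation | 5 | 0.625
Land use/land cover | Dense forest | 2 | 0.250
Land use/land cover | Snow | 1 | 0.125
Land use/land cover | Water | 3 | 0.375
Land use/land cover | Settlement | 7 | 0.875
Land use/land cover | Barren land | 8 | 1.000
Land use/land cover | Fallow land | 6 | 0.750
Land use/land cover | Agriculture | 4 | 0.500
Drainage density | 0–3.72 m/m² | 1 | 0.200
Drainage density | 3.72–7.45 m/m² | 2 | 0.400
Drainage density | 7.45–11.17 m/m² | 3 | 0.600
Drainage density | 11.17–14.90 m/m² | 4 | 0.800
Drainage density | 14.90–18.62 m/m² | 5 | 1.000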

box implementation, where each causative factor is considered as an independent variable, here each category of the causative factor was considered as an independent variable. There were a total of 42 categories. Therefore, 42 new thematic data layers in binary form were derived. In each of these binary data layers, 1 denotes the presence of a category at a pixel and 0 denotes its absence. The input layer of the ANN thus consisted of 42 neurons pertaining to the categories of each of the causative factors considered in this study. The output layer of the ANN consisted of a single neuron representing the presence or absence of existing landslide locations (i.e., a target output of 1 denotes presence and 0 denotes absence). Various ANN architectures were designed by varying the number of neurons in hidden layers. The training and testing datasets, and the values of learning rate and momentum factor, as used in the ANN black box implementation, also formed the basis of training and testing of various networks in this implementation. This was found necessary for effective comparison of the two implementations. The training and testing accuracies of the various ANN architectures selected here are given in Table 5. Maintaining the same logic, an ANN with 42 × 14 × 5 × 1

Table 3
Training and testing data accuracies (I: input layer, H1: first hidden layer, H2: second hidden layer and O: output layer). Bold values indicate the best suitable architecture.

ANN architecture (I × H1 × H2 × O) | Training accuracy (%) | Testing accuracy (%) | Difference in training and testing data accuracies
7 × 4 × 1 × 1 | 64.9 | 63.2 | 1.7
7 × 5 × 2 × 1 | 67.4 | 65.0 | 2.4
7 × 6 × 3 × 1 | 68.0 | 65.3 | 2.7
7 × 7 × 4 × 1 | 71.0 | 69.3 | 1.7
7 × 8 × 5 × 1 | 73.9 | 70.0 | 3.9
7 × 9 × 5 × 1 | 75.2 | 71.7 | 3.5
7 × 9 × 6 × 1 | 75.1 | 68.0 | 8.1
7 × 10 × 5 × 1 | 74.9 | 67.6 | 8.3
7 × 10 × 6 × 1 | 73.3 | 67.1 | 10.2
7 × 11 × 5 × 1 | 72.4 | 66.8 | 11.6

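Table 4
Classification of LSI values obtained from ANN black box approach into landslide susceptible zones.

Range of LSI values | Landslide susceptibility zones
≤0.1 | VLS
>0.1 and ≤0.25 | LS
>0.25 and ≤0.8 | MS
>0.8 and ≤0.9 | HS
>0.9 | VHS

Table 6
Weights of categories corresponding to respective causative factors derived through proposed ANN approach.

Causative factors | Categories | Ratings from ANN
Slope | 0–15° | −105.1
Slope | 15–25° | −29.7
Slope | 25–35° | −25.4
Slope | 35–45° | 68.6
Slope | >45° | −66.7
Aspect | North | −27.4
Aspect | Northeast | −77.2
Aspect | East | 81.3
Aspect | Southeast | 85.1
Aspect | South | 68.2
Aspect | Southwest | 80.5
Aspect | West | −46.1
Aspect | Northwest | −253.1
Aspect | Flat | −58.3
Relative relief | 0–25 m | −94.2
Relative relief | 25–50 m | 64.5
Relative relief | 50–75 m | −56.1
Relative relief | 75–100 m | −53.2
Relative relief | 100–125 m | −104.5
Relative relief | 125–150 m | 101.5
Lithology | Granite–granodiorite–gneiss | −150.7
Lithology | Granite | 20.8
Lithology | Quartzite with slate | 38.4
Structural features (buffer) | 0–500 m | 376.5
Structural features (buffer) | 500–1000 m | 93.4
Structural features (buffer) | 1000–1500 m | 45.9
Structural features (buffer) | 1500–2000 m | 5.45
Structural features (buffer) | 2000–2500 m | −241.2
Structural features (buffer) | >2500 m | −311.8
Land use/land cover | Sparse vegetation | −210.4
Land use/land cover | Dense forest | −284.1
Land use/land cover | Snow | 6.53
Land use/land cover | Water | −67.0
Land use/land cover | Settlement | −20.5
Land use/land cover | Barren land | 185.1
Land use/land cover | Fallow land | 174.3
Land use/land cover | Agriculture | 79.0
Drainage density | 0–3.72 m/m² | −15.2
Drainage density | 3.72–7.45 m/m² | −112.8
Drainage density | 7.45–11.17 m/m² | −92.5
Drainage density | 11.17–14.90 m/m² | 79.3
Drainage density | 14.90–18.62 m/m² | −91.0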

architecture, providing training accuracy of 72% and testing accuracy of 69.2%, was considered the most appropriate one for the current dataset. The updated weight matrices obtained for input–hidden connections (42 × 14), hidden–hidden connections (14 × 5) and hidden–output connections (5 × 1) were captured for further analysis. These weight matrices were multiplied in sequence to obtain a 42 × 1 weight matrix, which yields 42 different weight values. These weights were assumed to depict the importance of each category to the occurrence of landslides, and were treated as ratings of each of the 42 categories (as shown in Table 6). The higher the rating of a category, the more influence it has on the occurrence of landslides. From Table 6, it can be seen that both positive and negative ratings were obtained. Positive ratings were assumed to imply that those categories had significant impact on the occurrence of landslides, whereas negative ratings were assumed to imply that the occurrence of landslides was negatively related to those categories. Rating layers were generated by assigning the ratings of the 42 categories to the corresponding binary category layers. These layers were integrated by a simple arithmetic overlay operation, using Eq. (5), to find landslide susceptibility index (LSI) values:

$$\mathrm{LSI} = \sum_{l=1}^{c} R_l \tag{5}$$

where $R_l$ represents the rating layers of the c categories of thematic data layers. The LSI values were found to range between −3803.4 and 2105.2. The observed mean ($\mu_0$) and standard deviation ($\sigma_0$), obtained from the probability distribution curve of these LSI values, were −559.418 and 884.183, respectively. The success rate curve approach, as suggested by Saha et al. (2005), was used to classify the LSI values into five different landslide susceptible zones. The success rate curve method defines the boundaries of a dataset based on the mean ($\mu_0$) and standard deviation ($\sigma_0$) of the data. These boundaries are determined by ($\mu_0 - 1.5m\sigma_0$), ($\mu_0 - 0.5m\sigma_0$), ($\mu_0 + 0.5m\sigma_0$) and ($\mu_0 + 1.5m\sigma_0$), where m is a positive, non-zero value. Representative success rate curves corresponding to m = 1, 1.1 and 1.2 were plotted (Fig. 8). It may be observed that for the first 20% of the area in the VHS and HS zones, the curves corresponding to m = 1, 1.1 and 1.2 show landslide occurrences of 46.3%, 50% and 47%, respectively. Based on this analysis, the LSZ map corresponding to m = 1.1, which captured the highest share of landslides (50%), appeared to be the most appropriate one for the study area. Accordingly, the boundaries of landslide susceptible zones were fixed at LSI values of −2018.32, −1045.72, −73.12, and 899.49. The LSZ map thus produced is given in Fig. 9.
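The boundary arithmetic above can be checked with a short sketch. It uses the mean and standard deviation reported in the text; the function names, the zone ordering (higher LSI taken to mean higher susceptibility) and the choice to assign a value lying exactly on a boundary to the upper zone are assumptions made for illustration, not details given in the paper.

```python
# Values reported in the text for the LSI distribution.
MU_0 = -559.418     # observed mean of the LSI values
SIGMA_0 = 884.183   # observed standard deviation of the LSI values

# Assumed ordering: lowest LSI -> very low susceptibility (VLS).
ZONES = ["VLS", "LS", "MS", "HS", "VHS"]


def zone_boundaries(mu, sigma, m):
    """Return the four boundaries mu + k*m*sigma for k = -1.5, -0.5, 0.5, 1.5."""
    return [mu + k * m * sigma for k in (-1.5, -0.5, 0.5, 1.5)]


def classify(lsi, boundaries):
    """Map an LSI value to a zone by counting the boundaries it meets or exceeds."""
    index = sum(lsi >= b for b in boundaries)
    return ZONES[index]


if __name__ == "__main__":
    bounds = zone_boundaries(MU_0, SIGMA_0, m=1.1)
    print([round(b, 2) for b in bounds])
    # The reported minimum, a value near the mean, and the reported maximum.
    for lsi in (-3803.4, -559.4, 2105.2):
        print(lsi, classify(lsi, bounds))
```

With m = 1.1 the sketch reproduces the first three reported boundaries to two decimals and the fourth to within rounding.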

Table 5 Training and testing data accuracies (I: input layer, H1: first hidden layer, H2: second hidden layer and O: output layer). Bold values indicate the best suitable architecture.

ANN architecture (I × H1 × H2 × O)    Training (%)    Testing (%)    Difference in training and testing data accuracies
42 × 10 × 3 × 1                       63.2            57.0           6.2
42 × 11 × 4 × 1                       66.4            58.1           8.3
42 × 11 × 5 × 1                       68.0            60.0           8.0
42 × 12 × 4 × 1                       71.0            62.3           8.7
42 × 13 × 5 × 1                       71.3            65.6           5.7
42 × 14 × 5 × 1                       72.0            69.2           2.8
42 × 14 × 6 × 1                       72.2            69.0           3.2
42 × 15 × 5 × 1                       73.0            68.6           3.2
42 × 15 × 6 × 1                       73.5            67.7           5.8
42 × 16 × 5 × 1                       74.4            66.1           8.3
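The rating extraction described in the text chains the input–hidden (42 × 14), hidden–hidden (14 × 5) and hidden–output (5 × 1) weight matrices of the selected architecture into a single 42 × 1 vector. A minimal sketch of that chaining follows. The random weights are placeholders standing in for the trained network, and the helper names (`matmul`, `category_ratings`, `lsi_for_pixel`) are hypothetical; only the matrix shapes and the summation of Eq. (5) come from the paper.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    assert len(a[0]) == len(b), "inner dimensions must agree"
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def random_matrix(rows, cols, rng):
    """Placeholder weights; real values would come from the trained ANN."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def category_ratings(w_ih, w_hh, w_ho):
    """Chain (42 x 14)(14 x 5)(5 x 1) into one rating per category."""
    combined = matmul(matmul(w_ih, w_hh), w_ho)
    return [row[0] for row in combined]


def lsi_for_pixel(ratings, active_categories):
    """Sum the ratings of the categories present at a pixel, as in Eq. (5)."""
    return sum(ratings[c] for c in active_categories)


rng = random.Random(0)
w_ih = random_matrix(42, 14, rng)   # input-hidden weights
w_hh = random_matrix(14, 5, rng)    # hidden-hidden weights
w_ho = random_matrix(5, 1, rng)     # hidden-output weights

ratings = category_ratings(w_ih, w_hh, w_ho)
print(len(ratings))  # 42
```

Because each pixel belongs to exactly one category per factor, `lsi_for_pixel` would typically receive seven indices, one per causative factor.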


Table 7 Landslide distribution in various landslide susceptible zones. (a): % area of identified zones; (b): % area of observed landslides per class; (b/a): landslide density.

ANN black box approach
Landslide susceptible zone    (a)      (b)     (b/a)
VHS                           11.60    29.3    2.52
HS                            27.40    42.7    1.56
MS                            46.15    27.4    0.59
LS                            8.80     0.26    0.29
VLS                           6.05     0.34    0.05

Ratings derived from proposed ANN approach
Landslide susceptible zone    (a)      (b)     (b/a)
VHS                           4.5      24.8    5.51
HS                            26.9     40.8    1.52
MS                            40.7     29.4    0.72
LS                            22.0     5.0     0.23
VLS                           5.9      0       0.0

Fig. 8. Success rate curves for selecting the appropriate value of m in proposed ANN approach.

Fig. 9. LSZ map produced from the proposed ANN approach.

5. Results and discussions

The two LSZ maps produced from the two ANN approaches have been evaluated comparatively with respect to the distribution of existing landslides in the area, on the basis of landslide density, by plotting the best-fit curves, and finally through receiver operating characteristic (ROC) curves. Landslide density is defined as the ratio of the existing landslide area in percent to the area of each landslide hazard zone in percent, and is computed here on the basis of the number of pixels in the image. However, before this evaluation is discussed, it is useful to analyse the ratings of categories derived from the ANN approach, which provide additional insight into LSZ mapping.

The structural buffer category 0–500 m (i.e., the minimum distance from the structural features, namely thrusts and faults marked as lineaments) showed the highest positive rating of 376.5 among all 42 categories, and was thus considered the most significant category influencing landslides. In the other structural buffer categories, ratings decreased with increasing distance from the structural feature (Table 6). The high negative ratings of the structural buffer categories 2000–2500 m and >2500 m indicate that these categories do not have any influence, and thus buffer categories around structural features need not be defined beyond 2000 m in studies on LSZ mapping.

The land use land cover categories barren land and fallow land showed high positive ratings of 185.1 and 174.3, respectively. This may be explained by the fact that barren or fallow lands are covered with loose soil material which tends to erode, making them more susceptible to landslides. In the slope map, the slope category 35–55° was found to have a high rating of 68.6, illustrating a strong positive relationship with the occurrence of landslides (Table 6). Among the aspect categories, the ratings of north-facing slopes were relatively low and increased with the orientation angle, reaching the highest values for south- and SE-facing slopes. The south-, SE- and SW-facing slopes are usually sun-facing and thus devoid of vegetation growth. These slopes are, therefore, more susceptible to landslides and hence received high ratings from the ANN approach.
The relative relief category greater than 125 m had the highest rating among the categories of this factor. Among the lithology categories, quartzite, with a rating of 38.4, was found to be the most susceptible to landslides. This category belongs to the Nagthat-Berinag Formation of the Jaunsar Group of rocks, and lies exposed in the central part of the study area. The joint planes of these rock formations are planes of weakness that facilitate the movement of rocks. Lastly, four out of five drainage density categories received negative ratings, which shows their insignificant relationship with the occurrence of landslides. Only the category with drainage density 11.17–14.90 m⁻¹ showed a positive correlation with landslides. This category corresponds to the areas near river channels and drainage lines. The above analysis illustrates that the ANN has been able to produce realistic values of the ratings of categories of various factors in an objective manner, thereby eliminating the expert's subjective opinions. This information cannot be obtained through the ANN black box approach.

The LSZ maps produced from both ANN approaches were evaluated with regard to the distribution of existing landslides in the area, as obtained from satellite data, Google Earth data and field data verifications. The areal distribution of landslide susceptibility zones and the landslide densities for both approaches are given in Table 7. It may be seen that in the case of the ANN black box approach, 72% of the observed landslides fall in the 39% of the total area categorized into very high and high susceptibility zones, whereas, in the case of the proposed approach, 65.6% of observed landslides fall in the 31.4% of the area identified as very high and high susceptibility zones, which, in fact, should be the case (i.e., areas belonging to very high and high susceptibility zones were further refined). This may also be corroborated from landslide density values. An ideal LSZ map should have the highest landslide density for the VHS zone as compared to other zones, and landslide density values ought to decrease successively from the VHS to the VLS zone. It may be observed from the LSZ maps (Figs. 7 and 9) that the landslide density for the VHS zone is higher than that obtained for other susceptibility zones. There is also a decreasing trend of landslide density values from the VHS zone to the VLS zone. Thus, based on the landslide density values of the different landslide susceptibility zones and their trend from VHS to VLS zones for the two LSZ maps, it may be inferred that the proposed approach is significantly better than the ANN black box approach for LSZ mapping. In the black box approach, 27.4% of landslides fall in the moderately susceptible zone, which covers 46.15% of the area, and 0.6% of landslides fall in the 14.85% of the area classified into low and very low susceptible zones. In the proposed ANN approach, 29.4% of observed landslides fall in the moderately susceptible zone, which covers 40.7% of the area, and 5% of observed landslides fall in the low susceptible zone, which covers 22% of the area. The remaining 5.9% of the total area is classified into the very low susceptible zone, which has no observed landslides.

Fig. 10. Best-fit density curve for LSZ map from ANN black box approach.
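The density figures discussed above follow directly from the two percentage columns of Table 7. The sketch below recomputes the landslide density (b/a) for the proposed approach and the combined share of the very high and high susceptibility zones for both approaches. The percentages are those tabulated in Table 7; the function names are hypothetical.

```python
# Percentages as tabulated in Table 7, zones ordered VHS, HS, MS, LS, VLS.
ZONES = ["VHS", "HS", "MS", "LS", "VLS"]
BLACK_BOX_AREA = [11.60, 27.40, 46.15, 8.80, 6.05]       # (a)
BLACK_BOX_LANDSLIDES = [29.3, 42.7, 27.4, 0.26, 0.34]    # (b)
PROPOSED_AREA = [4.5, 26.9, 40.7, 22.0, 5.9]             # (a)
PROPOSED_LANDSLIDES = [24.8, 40.8, 29.4, 5.0, 0.0]       # (b)


def landslide_density(area_pct, landslide_pct):
    """Landslide density b/a for each zone."""
    return [b / a for a, b in zip(area_pct, landslide_pct)]


def top_two_share(area_pct, landslide_pct):
    """Combined (% area, % landslides) of the VHS and HS zones."""
    return round(sum(area_pct[:2]), 1), round(sum(landslide_pct[:2]), 1)


def is_strictly_decreasing(values):
    """True if every value is larger than the one that follows it."""
    return all(x > y for x, y in zip(values, values[1:]))


if __name__ == "__main__":
    density = landslide_density(PROPOSED_AREA, PROPOSED_LANDSLIDES)
    for zone, d in zip(ZONES, density):
        print(zone, round(d, 2))
    print(is_strictly_decreasing(density))
    print(top_two_share(BLACK_BOX_AREA, BLACK_BOX_LANDSLIDES))
    print(top_two_share(PROPOSED_AREA, PROPOSED_LANDSLIDES))
```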
A comparatively large area of about 11.6% (Table 7) is classified as very high susceptible zone in the ANN black box model; this zone does not show any defined pattern and is scattered across the map, whereas the LSZ map obtained from ANN derived ratings confines the very high susceptible zone largely to the area around Chamoli. Further, best-fit quadratic regression curves of landslide density versus landslide susceptible zones were drawn for the two approaches, as shown in Figs. 10 and 11. These curves should normally show the general trend of the results by indicating a gradual and smooth decrease in density from the very high susceptible zone to the very low susceptible zone. The best-fit curve produced from the proposed ANN approach shows this trend, while in the case of the ANN black box approach a sudden increase in density in the very low susceptibility zone may be observed.
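A best-fit quadratic of landslide density against susceptibility zone can be sketched with ordinary least squares. The example below solves the normal equations directly so that it needs no external library. Coding the zones as 1 (VHS) to 5 (VLS) is an assumption, since the paper does not state how the zones were placed on the axis, and `fit_quadratic` is a hypothetical helper rather than the authors' procedure.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c via the normal equations."""
    # Sums of powers of x (s[k] = sum x^k) and of x^k * y (t[k]).
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented normal-equation matrix, unknowns ordered (a, b, c).
    m = [[s[4], s[3], s[2], t[2]],
         [s[3], s[2], s[1], t[1]],
         [s[2], s[1], s[0], t[0]]]
    n = 3
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(n))


if __name__ == "__main__":
    # Assumed coding: 1 = VHS, 2 = HS, 3 = MS, 4 = LS, 5 = VLS.
    zone_codes = [1, 2, 3, 4, 5]
    # Landslide densities of the proposed approach as tabulated in Table 7.
    densities = [5.51, 1.52, 0.72, 0.23, 0.0]
    a, b, c = fit_quadratic(zone_codes, densities)
    print(a, b, c)
    print([a * x * x + b * x + c for x in zone_codes])
```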


Fig. 11. Best-fit density curve for LSZ map from the proposed ANN approach.
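As noted at the start of this section, the two maps are also compared through ROC curves. A minimal sketch of how an ROC curve and its area (AUC) can be computed from per-pixel scores is given here. It uses the standard definitions (true positive rate = correctly flagged landslide pixels over all actual landslide pixels; false positive rate = wrongly flagged no-landslide pixels over all actual no-landslide pixels). The sample scores and labels are invented for illustration and are not data from the study.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by lowering the threshold step by step.

    scores: susceptibility values (e.g. LSI), higher = more susceptible.
    labels: 1 for a landslide pixel, 0 for a no-landslide pixel.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Pixels sharing the same score are handled together (ties).
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    # Hypothetical test pixels: LSI-like scores and observed labels.
    scores = [900.0, 400.0, -100.0, -700.0, -1500.0, -2500.0]
    labels = [1, 1, 0, 1, 0, 0]
    print(auc(roc_points(scores, labels)))
```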

Finally, the acceptability of the LSZ map produced from the proposed approach may be further strengthened through ROC curves (Swets, 1988). An ROC curve is obtained by plotting the true positive rate along the Y-axis against the false positive rate along the X-axis. The true positive rate is the number of landslide pixels correctly classified as landslide over the total number of actual landslide pixels. The false positive rate is the number of no-landslide pixels incorrectly classified as landslide over the total number of actual no-landslide pixels. The area under the ROC curve (AUC) is one of the most commonly used accuracy statistics for prediction models in natural hazard assessments. An AUC of 0.5 denotes a model that predicts the occurrence of landslides no better than chance, while the maximum value of 1 denotes perfect prediction. To assess the performance of the two ANN approaches, ROC curves were prepared from a test data set of randomly selected pixels from landslide bodies and no-landslide areas. The ROC curves for the map derived from the ANN black box approach (AUC = 0.84) and the map derived from ratings extracted from ANN (AUC = 0.88) are shown in Fig. 12. These curves show that although both models have been successful in predicting landslide susceptibility for the study area, the AUC for the LSZ map derived from the proposed ANN approach is higher than that obtained from the ANN black box approach. These results demonstrate that the proposed ANN approach for deriving ratings of the categories of factors affecting landslides portrays the actual scenario of landslide occurrences in a given region.

Fig. 12. ROC curves for LSZ maps produced by ANN black box and the proposed ANN approach.

6. Conclusions

In this paper, a new approach for LSZ mapping based on ratings derived from an ANN model was proposed. The study was conducted in a landslide prone area in the Himalayan Region. These ratings depicted the specific individual influence of each category on landslide occurrences. The structural buffer category 0–500 m, with the highest rating of 376.5, was found to be the most influential among the 42 categories. This supports the observation that landslides are common in this region, which is characterized by a number of active faults. The evaluation of LSZ mapping through landslide density analysis, best-fit quadratic curves and ROC curves demonstrated the efficacy of the proposed approach, which provided an accurate representation of the actual scenario of landslide occurrences in the region.

Acknowledgements

This paper is an outcome of a landslide based study under a research project sponsored by the Department of Science and Technology (DST), Govt. of India. The authors acknowledge with thanks the contribution of Dr. Smita Jha, faculty in English at IIT Roorkee, in vetting this manuscript.

References

Anbalagan, R., 1992. Landslide susceptibility evaluation and zonation mapping in mountainous terrain. Eng. Geol. 32, 269–277.
Arora, M.K., Das Gupta, A.S., Gupta, R.P., 2004. An artificial neural network approach for landslide hazard zonation in the Bhagirathi (Ganga) Valley, Himalayas. Int. J. Rem. Sens. 25, 559–572.
Ayalew, L., Yamagishi, H., 2005. The application of GIS-based logistic regression for landslide susceptibility mapping in the Kakuda-Yahiko Mountains, Central Japan. Geomorphology 65, 15–31.
Congalton, R.G., 1991. A review of assessing the accuracy of classifications of remotely sensed data. Rem. Sens. Environ. 37, 35–46.
Dai, F.C., Lee, C.F., Li, J., Xu, Z.W., 2000. Assessment of landslide hazard on the natural terrain of Lantau Island, Hong Kong. Environ. Geol. 40, 381–391.
Dowla, F.U., Rogers, L.L., 1995. Solving Problems in Environmental Engineering and Geosciences with Artificial Neural Networks. MIT Press, Cambridge, MA.
Ermini, L., Catani, F., Casagli, N., 2005. Artificial neural networks applied to landslide susceptibility assessment. Geomorphology 66, 327–343.
Garrett, J., 1994. Where and why artificial neural networks are applicable in civil engineering. J. Comput. Civil Eng. 8, 129–130.
Gomez, H., Kavzoglu, T., 2005. Assessment of shallow landslide susceptibility using artificial neural networks in Jabonosa River Basin, Venezuela. Eng. Geol. 78, 11–27.
Gong, P., 1996. Integrated analysis of spatial data for multiple sources: using evidential reasoning and artificial neural network techniques for geological mapping. Photogramm. Eng. Rem. Sens. 62, 513–523.
Greenway, D.R., 1987. Vegetation and slope stability. In: Anderson, M.G., Richards, K.S. (Eds.), Slope Stability. Wiley, Chichester, UK, pp. 187–230.
Haykin, S., 1994. Neural Networks: A Comprehensive Foundation. Macmillan, New York.
Horton, R.E., 1945. Erosional development of streams and their drainage basins: hydrophysical approach to quantitative morphology. Bull. Geol. Soc. Am. 56, 275–370.
Kanungo, D.P., Arora, M.K., Sarkar, S., Gupta, R.P., 2006. A comparative study of conventional, ANN black box, fuzzy and combined neural and fuzzy weighting procedures for landslide susceptibility zonation in Darjeeling Himalayas. Eng. Geol. 85 (3–4), 347–366.
Looney, C.G., 1996. Advances in feedforward neural networks: demystifying knowledge acquiring black boxes. IEEE Transactions on Knowledge and Data Engineering 8 (2), 211–226.
Masters, T., 1994. Practical Neural Network Recipes in C++. Academic Press, Boston, MA.
Mathew, J., Jha, V.K., Rawat, G.S., 2005. Application of binary logistic regression analysis and its validation for landslide hazard mapping in part of Garhwal Himalaya, India. Int. J. Rem. Sens. 28, 2257–2275.
Nelson, M., Illingworth, W.T., 1990. A Practical Guide to Neural Nets. Addison-Wesley, Reading, MA.
Ohlmacher, C.G., Davis, J.C., 2003. Using multiple logistic regression and GIS technology to predict landslide susceptibility in northeast Kansas, USA. Eng. Geol. 69, 331–343.
Pachauri, A.K., Pant, M., 1992. Landslide hazard mapping based on geological attributes. Eng. Geol. 32, 81–100.
Paola, J.D., Schowengerdt, R.A., 1995. A review and analysis of back propagation neural networks for classification of remotely sensed multi-spectral imagery. Int. J. Rem. Sens. 16, 3033–3058.
Saha, A.K., Gupta, R.P., Arora, M.K., 2002. GIS-based landslide hazard zonation in the Bhagirathi (Ganga) Valley, Himalayas. Int. J. Rem. Sens. 23 (2), 357–369.
Saha, A.K., Gupta, R.P., Sarkar, I., Arora, M.K., Csaplovics, E., 2005. An approach for GIS based statistical landslide susceptibility zonation – with a case study in the Himalayas. Landslides 2, 61–69.
Swets, J.A., 1988. Measuring the accuracy of diagnostic systems. Science 240, 1285–1293.
Swingler, K., 1996. Applying Neural Networks: A Practical Guide. Academic Press, New York.
Valdiya, K.S., 1980. Geology of Kumaon Lesser Himalayas. Wadia Institute of Himalayan Geology, Dehradun, India, p. 291.