On-site earthquake early warning with multiple ... - Semantic Scholar

Report 2 Downloads 86 Views
Computers & Geosciences 58 (2013) 1–7

Contents lists available at SciVerse ScienceDirect

Computers & Geosciences journal homepage: www.elsevier.com/locate/cageo

On-site earthquake early warning with multiple regression analysis: Featuring two user-friendly applications for Excel J.P. Wang a,n, Duruo Huang a, Su-Chin Chang b, Logan Brant c a

Department of Civil and Environmental Engineering, The Hong Kong University of Science and Technology, Kowloon, Hong Kong Department of Earth Sciences, The University of Hong Kong, Hong Kong c Department of Civil Engineering and Engineering Mechanics, Columbia University, NYC, USA b

art ic l e i nf o

a b s t r a c t

Article history: Received 6 December 2012 Received in revised form 8 April 2013 Accepted 22 April 2013 Available online 27 April 2013

A variety of user-friendly spreadsheet templates have been developed for geoscience studies. However, the use of the built-in matrix functions within spreadsheet programs, such as Excel, is not particularly straightforward, lowering the value of spreadsheet programs for matrix-based computations, such as multiple regression analyses. Therefore, this study first developed two applications for Excel to perform multiple regression analyses in a much more user-friendly manner. Then using earthquake time histories from a reputable database, a series of regression analyses were performed. A new framework for on-site earthquake early warning based on multiple regression analyses is presented as an alternative to conventional models which were developed with single regression analyses. & 2013 Elsevier Ltd. All rights reserved.

Keywords: Excel Multiple regression analysis Earthquake early warning

1. Introduction Regression analysis is commonly used in a variety of research applications. Single regression analysis, usually referred to as simple regression analysis, is used to characterize the correlation between a single independent variable (X) and a dependent variable (Y). In contrast, multiple regression analysis is used to find the correlation between many Xis and a single Y. Understandably, multiple regression analysis results in a more tedious calculation, although fundamentally similar to single regression analysis. Excel is a widely used tool for data reductions and quantitative analyses. Its user-friendly interface and low cost contribute to its general acceptance (Wang and Huang, 2012). With this in mind, it is not surprising that a variety of Excel applications and spreadsheet templates have been developed for geoscience analyses (Biddle et al., 1995; Grossenbacher et al., 1996; Keskin, 2002; Martin, 1996; Mayborn and Lesher, 2011; Ozcep, 2010; Tindle and Webb, 1994; Wang et al., 2012a,2013). Excel contains several built-in functions helpful for performing single regression analysis (e.g., SLOPE, INTERCEPT). However, there are no similar functions available for multiple regression analysis. One reason for this is because the built-in matrix functions (e.g., MMULT) are designed to operate very differently from the single regression analysis functions. Unlike most functions which use only a single cell, matrix functions occupy a range of cells. To n

Corresponding author. Tel.: +852 2358 8482; fax: +852 2358 1534. E-mail address: [email protected] (J.P. Wang).

0098-3004/$ - see front matter & 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.cageo.2013.04.018

perform matrix functions three buttons, “Shift, Control and Enter” must be pressed simultaneously, rather than simply pressing “Enter”, which is common when executing most other functions in Excel. This study presents two spreadsheet applications for Excel developed to effectively perform matrix calculations and multiple regression analysis. These applications are later used to perform a series of regression analyses on earthquake data, from which a new framework for on-site earthquake early warning is suggested. The details of these applications and an overview of the new earthquake early warning framework are presented in this paper.

2. On-site earthquake early warning Earthquake early warning is considered a practical solution to earthquake hazard mitigation (Allen and Kanamori, 2003; Kanamori et al., 1997; Satriano et al., 2008; Wu et al., 2007; Wu and Kanamori, 2005a, 2005b, 2008a, 2008b; Hsiao et al., 2011). It works by sending out warning messages and exercising timely preventive measures, such as immediately stopping moving trains, before the arrival of the destructive ground motion. On-site systems and regional systems are the two main types of earthquake early warning. As shown in Fig. 1, on-site systems utilize the first few seconds of an earthquakes motion to forecast the peak motion for that specific location. Alternatively, regional systems analyze motions detected by the stations close to the epicenter of occurring earthquakes, then send out warning messages to other distant locations.

J.P. Wang et al. / Computers & Geosciences 58 (2013) 1–7

Displacement (cm)

Velocity (cm/sec)

Acceleration (g)

2

0.5

1994 Northridge Earthquake, Mw = 6.69 Canoga Park Station, Rrup = 14.7 km

PGA

Precursors

0.0

-0.5 10 0 -10

PGV

6

0 PGD

-6 -3

0

3

6

9

12

15

18

21

24

Time after P-wave arrival (sec) Fig. 1. Schematic diagrams illustrating the concept of on-site earthquake early warning: utilizing the correlation between early motion and peak motion in earthquake time histories. Note both precursors and peaks can be in ground displacement, velocity, acceleration, etc.

The key to an on-site early warning system is a robust correlation between early motions and peak motions in an earthquake time history. A useful relationship must interpret the motion as early as possible in order to deliver a warning as quickly as possible. In most scenarios the warning message is received only seconds before the destructive shaking begins. Following these criteria, several empirical correlations relating early motions to peak motions have been proposed for on-site early warning, such as the correlation between early ground displacement and peak ground velocity developed from as many as 800 earthquake time histories recorded at locations such as Taiwan, Japan, and Southern California (Wu and Kanamori, 2008b). These models were developed using single regression analysis utilizing the correlation between a single precursor and the peak motion. It is understood that the no empirical relationship can perfectly model actual earthquake data, which means that there will be some uncertainty associated with any model. In order to account for such uncertainty, Wang et al. (2012b) developed a risk-based framework for deciding whether or not an early warning should be activated. This decision-making calculates the respective risk of both alternatives (i.e., activating or not activating the warning) and selects the alternative with the lowest risk.

3. Multiple-regression applications for excel 3.1. Option one

Fig. 2. Screenshot of the Option One application for performing multiple regression analysis on Excel; the upper and lower parts of the spreadsheet are outputs and inputs, respectively.

Option one involved developing an application from scratch, without using any of the built-in functions contained within Excel. Fig. 2 shows the interface of the spreadsheet application. Users simply need to input data and select the “Multiple Regression” button to run the analysis. Once the analysis is complete, the outputs containing model parameters and R2 values are provided. This application is created with two subroutines coded with Visual Basic Editor integrated in Excel. Subroutine “Start Analysis” clears any values from the input cells and “Multiple Regression” performs the multiple regression analysis. Due to the dimension of

the spreadsheet, the application is capable of analyzing up to around 100 dependent variables (Xis) and 50,000 samples, quantities which should be adequate for most studies. Subroutine “Multiple Regression” contains computer codes executing each step of the calculation for multiple regression analysis. The algorithms and corresponding codes are given in the Appendix. The key to this development is using Excel's matrix functions in a more effective manner. For example, function

J.P. Wang et al. / Computers & Geosciences 58 (2013) 1–7

MMULT, can be used to perform matrix multiplication with the following scripts: Dim X(n1, n2) as double Dim Y(n2, n3) as double Dim Z(n1, n3) as double Z¼Application.WorksheetFunction.MMult(X, Y) where n1, n2, and n3 denote the dimension of the matrix. X and Y are the input and Z is the output. Similar programming can be applied to other matrix operations, such as inverse matrix calculation, for completing the development of this application. With such techniques, matrix operations can be accomplished in a user-friendly manner without pre-selecting the cell range manually. 3.2. Option two Another alternative is to create an application using Excel's existing built-in functions, but modified to create a more userfriendly application. After the modification, users only need select the “Easy Excel Multiple Regression” button on the spreadsheet, rather than follow the conventional procedure to activate the built-in functions embedded within Excel. The modification is assisted with subroutine “Easy Excel Multiple Regression,” along with two simple user-defined functions which are used to convert an integer into text, e.g., “1” becoming “A.” Fig. 3 shows the spreadsheet interface for this modified application. Note that this modification aims to create a more user-friendly operation, but does not actually modify the computation executed by the built-in function. That approach would not be possible since the source code of Excel is not made available for modification. As a result, the output given by this application is in the same format as the built-in function. The complete code of this application is provided in the Appendix. 3.3. Verification Both applications were verified with a benchmark example given in a statistics and probability textbook (Chang, 1994). The results presented in Figs. 2 and 3 show that the calculations are

3

identical to the benchmark example, demonstrating the application's reliability. 3.4. Option one or option two? Although both applications perform the same task we, as the developers, prefer the format of Option One over Option Two, simply because Option One is nothing but a usual programming task to code the given algorithms step by step. In contrast, Option Two was created to work with a specific Excel function “ATPVBAEN.XLA!Regress” and it is just not straightforward to apply. Although the coding in Option Two is shorter than Option One, more effort was required to develop Option Two than Option One, largely because of the time spent searching for the specific command in Excel and then learned how to correctly apply it. To the best of our knowledge, Option One is compatible with all versions of Excel, whereas Option Two is only compatible with Excel 2003, and possibly earlier versions. This is because “ATPVBAEN.XLA!Regress” is no longer available in Excel 2007 and Excel 2010 for creating applications such as Option Two.

4. Earthquake time histories The next task in this study involved developing regression analyses from earthquake time histories. For this task we downloaded 90 earthquake time histories, summarized in Table 1, from the NGA (Next Generation Attenuation) database managed by the Pacific Earthquake Engineering Research Center (PEER, 2011). This database contains a total of around 3000 earthquake time histories and has been used for previous earthquake studies (Campbell and Bozorgnia, 2010; Chiou et al., 2008; Wang et al., 2012a).

5. On-site relationships with multiple regression analysis Following a suggested correlation between early ground displacement (GD) and peak ground velocity (PGV), this study evaluated the levels of correlation between GD1 and PGV,…, up to GD4 and PGV. (For example, GD1 denotes the ground displacement after one second.) Next, with multiple regression analysis,

Fig. 3. Screenshot of the Option Two application for multiple regression analysis; the right and left parts of the spreadsheet are inputs and outputs, respectively.

4

J.P. Wang et al. / Computers & Geosciences 58 (2013) 1–7

Table 1 Summary of earthquake time histories used in this study; in the first column from the left, “A” denotes the M5.5 Baja earthquake in 1987, “B” is the M4.92 Big Bear City earthquake in 2003, “C” is the M6.63 Borrego Mountain earthquake in 1968, “D” is the M5.77 Chalfant Valley earthquake in 1986, “E” is the M6.36 Coalinga earthquake in 1983, “F” is the M5.74 Coyote Lake earthquake in 1979, and “G” is the M5.7 California Gulf earthquake in 2001. All these events occurred in California. Earthquake

Station name

Source-to-site dist. (km)

A1 B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 B11 B12 B13 B14 B15 B16 B17 B18 B19 B20 B21 B22 B23 B24 B25 C1 C2 D1 D2 D3 D4 D5 E1 E2 E3 E4 E5 E6 E7 E8 E9 E10 E11 E12 E13 E14 E15 E16 E17 E18 E19 E20 E21 E22 E23 E24 E25 E26 E27 E28 E29 E30 E31 E32 E33 E34 E35 E36

Cerro Prieto Anza—Tripp Flats Training Cabazon Colton—Kaiser Medical Clinic Devore—Devore Water Company Forest Falls Post Office Highland Fire Station Indio-Jackson Road Leona Valley-Fire Station #1 Los Angeles-Acosta Residence Mentone Fire Station #9 Mill Creek Ranger Station Morongo Valley North Palm Springs Fire Sta #36 Palmdale Fire Station Paradise Springs-Camp Office Pasadena—USGS/NSMP Office San Bernardino—Co Service-Freefield San Bernardino—Del Rosa Wk Sta San Bernardino—Fire Sta. #10 San Bernardino—Fire Sta. #7 San Bernardino—Fire Sta. #9 San Bernardino—Lincoln School San Bernardino—Mont. Mem Pk San Bernardino—N Verdemont Sch San Bernardino—Serrano School LA—Hollywood Stor FF San Onofre-So Cal Edison Benton Bishop—LADWP South St Bishop-Paradise Lodge Lake Crowley—Shehorn Res. Zack Brothers Ranch Cantua Creek School Parkfield—Cholame 12W Parkfield—Cholame 1E Parkfield—Cholame 2E Parkfield—Cholame 2WA Parkfield—Cholame 3E Parkfield—Cholame 3W Parkfield—Cholame 4AW Parkfield—Cholame 4W Parkfield—Cholame 5W Parkfield—Cholame 6W Parkfield—Cholame 8W Parkfield—Fault Zone 1 Parkfield—Fault Zone 10 Parkfield—Fault Zone 11 Parkfield—Fault Zone 12 Parkfield—Fault Zone 14 Parkfield—Fault Zone 15 Parkfield—Fault Zone 16 Parkfield—Fault Zone 2 Parkfield—Fault Zone 3 Parkfield—Fault Zone 4 Parkfield—Fault Zone 6 Parkfield—Fault Zone 7 Parkfield—Fault Zone 8 Parkfield—Fault Zone 9 Parkfield—Gold Hill 1W Parkfield—Gold Hill 2E Parkfield—Gold Hill 2W Parkfield—Gold Hill 3E Parkfield—Gold Hill 3W Parkfield—Gold Hill 4W Parkfield—Gold Hill 5W Parkfield—Gold Hill 6W Parkfield—Stone Corral 2E Parkfield—Stone Corral 3E

4.5 78 43.1 50.6 51.7 25.5 38.9 84.1 135.1 119.1 36.9 31.7 36.9 49.7 118.7 87.5 118.7 46.4 40.4 47.4 43.5 51.4 45.4 49.3 48.7 38.9 222.4 129.1 24.3 23.5 15.1 24.4 6.4 24 55.8 43.7 42.9 44.7 41 45.7 47.6 46.4 48.7 50.2 51.8 42 31.6 28.5 29.3 29.5 29.4 27.7 39 37.2 34.6 32.9 31.2 29.9 31.7 36.1 33.1 37 30.1 39.1 41.1 43.6 47.9 36.4 34

Table 1 (continued ) Earthquake

Station name

E37 E38 E39 E40 E41 E42 E43 E44 E45 F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 G1 G2

Parkfield—Stone Corral 4E Parkfield—Vineyard Cany 1E Parkfield—Vineyard Cany 1W Parkfield—Vineyard Cany 2W Parkfield—Vineyard Cany 3W Parkfield—Vineyard Cany 4W Parkfield—Vineyard Cany 6W Pleasant Valley P.P.-yard Slack Canyon Coyote Lake Dam (SW Abut) Gilroy Array #1 Gilroy Array #2 Gilroy Array #3 Gilroy Array #4 Gilroy Array #6 Halls Valley SJB Overpass—Bent 3 g.l. SJB Overpass—Bent 5 g.l. San Juan Bautista—24 Polk St Bonds Corner Calexico Fire Station

Source-to-site dist. (km) 31.6 26.4 29.1 30.4 32.2 34.5 40.9 8.4 27.5 6.1 10.7 9 7.4 5.7 3.1 33.8 20.7 20.7 19.7 79.7 85.6

Table 2 Summary of regression models with different combinations of precursors. SD denotes the model's standard deviation. Regression models PGV ¼ 1:55−6:20  GD1 þ 24:46  GD2−7:27 GD3 þ 16:95  GD4 PGV ¼ 1:53 þ 22:19  GD2−7:56  GD3 þ 16:90  GD4 PGV ¼ 3:59−5:61  GD3 þ 20:98  GD4 PGV ¼ 3:88 þ 15:99  GD4 PGV ¼ 2:52−5:64  GD1 þ 28:45  GD2 þ 8:86  GD3 PGV ¼ 2:50 þ 26:38  GD2 þ 8:55  GD3 PGV ¼ 5:35 þ 15:88  GD3 PGV ¼ 2:34 þ 3:37  GD1 þ 40:25  GD2 PGV ¼ 5:35 þ 15:88  GD2 PGV ¼ 5:35 þ 15:88  GD1

SD

R2

6.682 0.713 6.667 7.460 7.478 7.310 7.286 8.344 8.056 8.016 10.00

0.711 0.634 0.628 0.652 0.651 0.536 0.573 0.572 0.334

we calculated the correlation between PGV and many possible combinations of GD1 through GD4. The results of all the regression analyses are summarized in Table 2. Fig. 4(a) shows the model's R2 for four single regression analyses. It was found that the correlation between GD4 and PGV has the highest R2 value. But note that with this pool of data the R2 for the GD2–PGV correlation was higher than the GD3–PGV correlation. Fig. 4(b) shows the model's R2 for regression analyses all including GD4. The results show that the more precursors were used, the higher the R2 was achieved in the regression analysis. Fig. 4(c) shows similar calculations for GD3, which also supports the conclusion of improved R2 with more precursors in a multiple regression analysis. With four seconds selected as the “time threshold”, we suggest the following relationship to predict PGV, because it is associated with the highest R2 value among other relationships containing GD4 PGV ¼ 1:55−6:20  GD1 þ 24:46  GD2 −7:27  GD3 þ 16:95  GD4 7 6:68; R2 ¼ 0:71

ð1Þ

where the term 76.68 (cm/s) is the model error. For the time threshold of three seconds, the best relationship to predict PGV

J.P. Wang et al. / Computers & Geosciences 58 (2013) 1–7

5

6. Discussions 6.1. Model development With these findings, we expect that R2 would increase further with additional information from GD5, GD6, … incorporated together to predict PGV. But as mentioned previously, there is a cost associated with the loss of warning time when more information in the earthquake time history is required for the correlation. Therefore, more studies should be conducted to find an optimum solution between the model's reliability (i.e., R2) and how many seconds of an earthquake time history should be used in the correlation. Nevertheless, one suggestion from this study is that all information before a given time threshold should be utilized altogether, most likely resulting in a correlation with improved R2 values. 6.2. Model error or uncertainty No matter what kinds of precursor combinations are used, the regression model will never be perfect (i.e., R2 ¼1) owing to natural randomness of earthquakes. In other words, the model's error is attributed to the inevitable scattered data that is generated by earthquakes. But it must be noted that the variable's dependency quantified by regression analysis is judged by R2, not by model uncertainty. On the other hand, model uncertainly can be addressed by calculating the exceedance probability of peak motions given the model's error, as demonstrated in a study associated with a conventional on-site system (Wang et al., 2012b). This should be also viable for the new systems utilizing multiple regression analysis, because the same fundamentals involve in both single and multiple regression analyses. Fig. 5 shows an example for integrating this option with the new system for on-site earthquake early warning. Given GD1 to GD4 being 0.64, 1.04, 3.46, and 3.46 cm in an earthquake motion, the mean value of PGV is equal to 56.4 cm/s with the new empirical equation (i.e., Eq. (1)), but note there is a 10% probability, for example, that the actual PGV can exceed 65 cm/s based on the fundamentals of regression analysis. 6.3. Sample size and future work The suggested relationships for on-site earthquake early warning should include a sufficient quantity of earthquake samples. Being a statistical analysis, the correlation should become more convincing with more data. With this in mind, we hope this paper

Fig. 4. The distribution of the regression model’s R2 on a variety of conditions: (a) regression analyses between GD1 and PGV,…, GD4 and PGV, (b) regression analyses containing GD4 to predict PGV, and (c) regression analyses containing GD3 to predict PGV. (The integer in the parenthesis denotes the number of precursors.). (a) Single precursor to predict PGV. (b) Precursors containing GD4 to predict PGV. (c) Precursors containing GD3 to predict PGV.

based on this pool of data was found to be PGV ¼ 2:52−5:64  GD1 þ 28:45  GD2 þ8:86  GD3 7 7:31; R2 ¼ 0:65

ð2Þ

Fig. 5. Probability distribution of PGV based on the new empirical function, given GD1 to GD4 equal to 0.64, 1.04, 3.46, and 3.46 cm.

6

J.P. Wang et al. / Computers & Geosciences 58 (2013) 1–7

can motivate more studies to refine empirical relationships for onsite earthquake early warning, which is considered a practical solution to earthquake hazard mitigation.

7. Conclusion Excel has been increasingly used for geoscience studies because of its user-friendly interface and low cost. However, matrix functions in Excel are not as user-friendly as others, possibly lowering the value of spreadsheets to multiple regression analysis governed by matrix-based algorithms. As a result, this paper presents two spreadsheet applications for Excel which improve the user-friendly operation of multiple regression analyses. More importantly, based on a series of regression analyses that were performed, this study suggested a new framework for on-site earthquake early warning based on multiple regression analysis.

Acknowledgments We appreciate the comments from the editor and reviewer which have helped to improve many aspects of this paper.

Appendix A. Multiple regression algorithms and subroutine “Multiple Regression” ^ of a With the least-square algorithm, the parameters (β) regression function can be obtained with the following governing equation (Chang, 1994). (Notation in bold denotes a matrix.) β^ ¼ ðX0 XÞ−1 X0 y

ðA:1Þ 0

where X and y are the two sample matrices, and X is the transpose matrix of X. Take the benchmark calculation (Figs. 2 and 3) for example, X and y are as follows: 2 3 2 3 1 −4 16 2 6 1 −2 4 7 6 −1 7 6 7 6 7 6 7 6 7 6 1 −1 1 7 6 −1 7 6 7 6 7 X¼6 ; y ¼ 6 4 7 1 7 61 1 7 6 7 6 7 6 7 41 2 4 9 5 4 5 1

4

6

19

After β^ is calculated with Eq. (A.1), the difference (denoted as e) between observations and estimates in Y can be expressed as follows: e ¼ y−y^ ¼ y−Xβ^

ðA:2Þ

and the model's standard deviation (s) is equal to s2 ¼

e′e SSE ¼ n−p−1 n−p−1

ðA:3Þ

where n and p are the sample size and the number of dependent variables (i.e., Xis). The model's R2 related to Sum Square Total (SST) and Sum Square Error (SSE) is R2 ¼ 1−

SSE SST

ðA:4Þ

where SSE was given in Eq. (A.3), and SST is calculated as follows: n

SST ¼ ∑ ðyi −yÞ2 i¼1

ðA:5Þ

where y is the mean value of the series of Y's observations. With the algorithms, the codes in the subroutine “Multiple Regression” for executing the calculation step-by-step are given in the following.

Line index/script 1 2 3 4 5 6 7 8 9 10 11 12

Sub multiple_regression() ' this macro is to perform multiple regression ' summary of variables: ' n¼sample size ' m ¼no. of dependent variable (X)+1 ' xmat¼ X matrix ' xmat_tp ¼X transpose matrix ' xxmat_inv¼inverse matrix of X'X, or (X'X)-1 ' ymat ¼Y matrix ' xtp_y ¼X'Y ' v¼variance ' programming start nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn

13 n ¼Application.WorksheetFunction.Count(Range(''k21: k1000'')) 14 m ¼Application.WorksheetFunction.Count(Range(''k21: az21'')) 15 16 ReDim xmat(n, m) 17 For i ¼1 To n 18 For j¼1 To m 19 xmat(i, j)¼ Range(''k21'').Cells(i, j).Value 20 Next 21 xmat(i, 1) ¼ 1 22 Next 23 24 ReDim xmat_tp(m, n) 25 xmat_tp ¼ Application.WorksheetFunction.Transpose(xmat) 26 27 ReDim xxmat(m, m) 28 xxmat ¼Application.WorksheetFunction.MMult(xmat_tp, xmat) 29 30 ReDim xxmat_inv(m, m) 31 xxmat_inv¼Application.WorksheetFunction.MInverse (xxmat) 32 33 ReDim ymat(n, 1) 34 For i ¼1 To n 35 ymat(i, 1) ¼Range(''k21'').Cells(i, 1).Value 36 Next 37 38 ReDim xtp_y(m, 1) 39 xtp_y ¼Application.WorksheetFunction.MMult(xmat_tp, ymat) 40 41 ReDim beta(m, 1) 42 beta ¼ Application.WorksheetFunction.MMult(xxmat_inv, xtp_y) 43 For i ¼1 To m 44 Range(''k10'').Cells(1, i).Value¼ beta(i, 1) 45 Next 46 47 ReDim yest(n, 1) 48 yest¼Application.WorksheetFunction.MMult(xmat, beta) 49 50 ReDim e(n, 1) 51 For i ¼1 To n 52 e(i, 1) ¼ymat(i, 1)-yest(i, 1) 53 Next 54 55 ReDim etp(1, n) 56 For i ¼1 To n

J.P. Wang et al. / Computers & Geosciences 58 (2013) 1–7

57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77

etp(1, i) ¼e(i, 1) Next sse ¼0 For i ¼1 To n sse ¼sse+(e(i, 1))^2 Next v¼sse/(n-m) Range(''l13'').Value ¼ v^0.5 ymean ¼ Application.WorksheetFunction.Average(ymat) sst¼0 For i ¼1 To n sst¼sst+(ymat(i, 1)-ymean)^2 Next r2¼(sst-sse)/sst Range(''k13'').Value ¼r2 End Sub

Appendix B. Subroutine “Easy Excel Multiple Regression”

Line index/ Script 1 2 3 4 5 6

Sub easy_excel_multiple_regression() ' this macro is to modify the existing toolkit ' summary of variable: ' n¼ sample size ' k ¼column name in the right-most column ' programming start nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn

7 8

Range(''a20:j40'').ClearContents n¼Application.WorksheetFunction.Count(Range(''k21: k65000'')) 9 k ¼Application.WorksheetFunction.Count(Range(''k21: dz21'')) 10 k ¼add_col(k+10) 11 12 Application.Run ''ATPVBAEN.XLA!Regress'', Range(''k21:k'' & 20+n), _ 13 Range(''l21:'' & k & n+20), False, False, , Range(''a20''), _ 14 False, False, False, False, , False 15 16 End Sub

Appendix C. Supporting information Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.cageo.2013.04.018.

7

References Allen, R.M., Kanamori, H., 2003. The potential for earthquake early warning in Southern California. Science 300, 786–789. Biddle, D.L., Percival, H.J., Chittleborough, D.J., 1995. An interactive spreadsheet for graphing miniral stability diagrams. Computers & Geosciences 21, 175–185. Campbell, K.W., Bozorgnia, Y., 2010. A ground motion prediction equation for the horizontal component of cumulative absolute velocity (CAV) based on the PEER-NGA strong motion database. Earthquake Spectra 26 (3), 635–650. Chang, J.B., 1994. Statistics (in Chinese). San Min Book Co., Ltd., Taipei445–470. Chiou, B., Darragh, R., Gregor, N., Silva, W., 2008. NGA project strong-motion database. Earthquake Spectra 24 (1), 23–44. Grossenbacher, K., Bahat, D., Karasaki, K., 1996. Triangulator: excel spreadsheets for converting relative bearings to XYZ coordinates, with applications to scaling photographs and orienting surfaces. Computers and Geosciences 22 (10), 1053–1059. Hsiao, N.C., Wu, Y.M., Zhao, L., Chen, D.Y., Huang, W.T., Kuo, K.H., Shin, T.C., Leu, P.L., 2011. A new prototype system for earthquake early warning in Taiwan. Soil Dynamics and Earthquake Engineering 31, 201–208. Kanamori, H., Hauksson, E., Heaton, T., 1997. Real-time seismology and earthquake hazard mitigation. Nature 390, 461–464. Keskin, M., 2002. FC-modeler: a Microsoft Excel spreadsheet program for modeling Rayleigh fractionation vectors in closed magmatic systems. Computer and Geosciences 28, 919–928. Martin, J.D., 1996. EQMIN, a Microsoft Excel spreadsheet to perform thermodynamic calculations: a didactic approach. Computers and Geosciences 22 (6), 639–650. Mayborn, K.R., Lesher, C.E., 2011. MagPath: an excel-based visual basic program for forward modeling of mafic magma crystallization. Computers and Geosciences 37, 1900–1903. Ozcep, F., 2010. Soil Engineering: a Microsoft Excel spreadsheet program for geotechnical and geophysical analysis of soils. Computers and Geosciences 36, 1355–1361. PEER, 2011. Users Manual for the PEER Ground Motion Database Web Application, Pacific Earthquake Engineering Research Center. University of California, Berkeley, CA. Satriano, C., Lomax, A., Zollo, A., 2008. Real-time evolutionary earthquake location for seismic early warning. Bulletin of the Seismological Society of America 98 (3), 1482–1494. Tindle, A.G., Webb, P.C., 1994. PROBE-AMPH – a spreadsheet program to classify microprobe-derived amphibole analysis. Computers and Geosciences 20, 1201–1228. Wang, J.P., Huang, D.R., 2012. Rosenpoint: A Microsoft Excel-based program for the Rosenblueth point estimate method and an application in slope stability analysis. Computers and Geosciences 48, 239–243. Wang, J.P., Huang, D.R., Yang, Z.J., 2012a. Deterministic seismic hazard map for Taiwan developed using an in-house Excel-based program. Computers and Geosciences 48, 111–116. Wang, J.P., Wu, Y.M., Lin, T.L., Brant, L., 2012b. The uncertainty of a Pd3-PGV onsite earthquake early warning system. Soil Dynamics and Earthquake Engineering 36, 32–37. Wang, J.P., Huang, D., Cheng, C.T., Shao, K.S., Wu, Y.C., Chang, C.W., 2013. Seismic hazard analysis for Taipei City including deaggregation, design spectra, and time history with Excel applications. Computers and Geosciences 52, 146–154. Wu, Y.M., Kanamori, H., 2005a. Experiment on an onsite early warning method for the Taiwan early system. Bulletin of the Seismological Society of America 95, 347–353. Wu, Y.M., Kanamori, H., 2005b. Rapid assessment of damaging potential of earthquakes in Taiwan from the beginning of P waves. Bulletin of the Seismological Society of America 95, 1181–1185. Wu, Y.M., Kanamori, H., 2008a. Exploring the feasibility of on-site earthquake early warning using close-in records of the 2007 Noto Hanto earthquake. Earth Planets and Space 60, 155–160. Wu, Y.M., Kanamori, H., 2008b. Development of an earthquake early warning system using real-time strong motion signals. Sensors 8, 1–9. Wu, Y.M., Kanamori, H., Allen, R.M., Hauksson, E., 2007. Determination of earthquake early warning parameters, τc and Pd, for southern California. Geophysical Journal International 170, 711–717.