Business Analytics Predictive Modeling using Linear Regression


© Pristine – www.edupristine.com

Agenda
▪ Introduction
▪ Data
▪ Basic Statistics
▪ Predictive modeling using Linear Regression


4. Correlation and Regression
I. Covariance and Correlation coefficient
II. Regression


4a. Correlation
I. Covariance and Correlation coefficient
i. Definition
ii. Sample and population correlation
iii. Illustrative example
iv. Statistical significance test for sample correlation coefficient


4a. Covariance and Correlation Coefficient
▪ Covariance is a statistical measure of the degree to which two variables move together.
▪ The sample covariance is calculated as

$$\mathrm{cov}_{xy} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

▪ Correlation coefficient
• It is a measure of the strength of the linear relationship between two variables
• The correlation coefficient is given by:

$$\rho_{xy} = \frac{\mathrm{cov}_{xy}}{\sigma_x \sigma_y}$$

• Population correlation is denoted by ρ (rho)
• Sample correlation is denoted by r. It is an estimate of ρ in the same way as
– S² (sample variance) is an estimate of σ² (population variance) and
– X̄ (sample mean) is an estimate of μ (population mean)
• Features of ρ and r
– Unit free and ranges between -1 and 1
– The closer to -1, the stronger the negative linear relationship
– The closer to 1, the stronger the positive linear relationship
– The closer to 0, the weaker the linear relationship

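The two formulas on this slide can be sketched in a few lines of Python using only the standard library. The function names and the toy data below are illustrative assumptions, not part of the slides. The second pair of calls shows the unit-free property: rescaling one variable changes the covariance but leaves r unchanged.

```python
from math import sqrt


def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 denominator."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """Sample correlation r = cov_xy / (s_x * s_y)."""
    s_x = sqrt(sample_covariance(xs, xs))  # cov(x, x) is the sample variance
    s_y = sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


# Toy data (hypothetical, for illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

cov_xy = sample_covariance(x, y)
r_xy = sample_correlation(x, y)

# Rescaling x (for example, changing its units) scales the covariance but not r.
x_scaled = [100.0 * v for v in x]
cov_scaled = sample_covariance(x_scaled, y)
r_scaled = sample_correlation(x_scaled, y)

print(f"cov = {cov_xy:.4f}, r = {r_xy:.4f}")
print(f"cov (x * 100) = {cov_scaled:.4f}, r (x * 100) = {r_scaled:.4f}")
```

Computing the standard deviations as the square root of cov(x, x) keeps the n - 1 convention identical in the numerator and the denominator of r.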

4a. Example: Covariance and Correlation of the S&P 500 and NASDAQ Returns given a sample

Closing Index Value

Date          S&P 500     NASDAQ
12/2/2011     1,244.28    2,626.93
12/5/2011     1,257.08    2,655.76
12/7/2011     1,261.01    2,649.21
12/8/2011     1,234.35    2,596.38
12/9/2011     1,255.19    2,646.85
12/12/2011    1,236.47    2,612.26


4a. Solution: Covariance and Correlation of the S&P 500 and NASDAQ Returns given a sample

              Closing Index Value     Returns             Deviation
Date          S&P 500     NASDAQ      S&P 500   NASDAQ    S&P 500   NASDAQ
                                      Xi        Yi        Xi - X̄    Yi - Ȳ    (Xi - X̄)*(Yi - Ȳ)
12/2/2011     1,244.28    2,626.93
12/5/2011     1,257.08    2,655.76    1.03%     1.10%     1.14%     1.20%     0.0137%
12/7/2011     1,261.01    2,649.21    0.31%     -0.25%    0.43%     -0.15%    -0.0006%
12/8/2011     1,234.35    2,596.38    -2.11%    -1.99%    -2.00%    -1.89%    0.0378%
12/9/2011     1,255.19    2,646.85    1.69%     1.94%     1.80%     2.05%     0.0369%
12/12/2011    1,236.47    2,612.26    -1.49%    -1.31%    -1.38%    -1.21%    0.0166%

Mean: X̄ = -0.12%, Ȳ = -0.10%
Total of (Xi - X̄)*(Yi - Ȳ): 0.1044%
Standard Deviation: sx = 0.01630504, sy = 0.01633798
Covariance: 0.000261013
Correlation: 0.979811179

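The solution table can be reproduced with a short Python sketch. It assumes the returns are simple returns between consecutive listed closing values, which matches the 1.03% figure in the first return row. The variable and function names are illustrative.

```python
from statistics import mean, stdev

# Closing index values from the example table (12/2/2011 to 12/12/2011).
sp500 = [1244.28, 1257.08, 1261.01, 1234.35, 1255.19, 1236.47]
nasdaq = [2626.93, 2655.76, 2649.21, 2596.38, 2646.85, 2612.26]


def simple_returns(prices):
    """Simple return between consecutive observations: P_t / P_(t-1) - 1."""
    return [curr / prev - 1 for prev, curr in zip(prices, prices[1:])]


x = simple_returns(sp500)   # X_i
y = simple_returns(nasdaq)  # Y_i
n = len(x)                  # 6 closing values give 5 returns

x_bar, y_bar = mean(x), mean(y)
products = [(xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)]

cov_xy = sum(products) / (n - 1)   # sample covariance
s_x, s_y = stdev(x), stdev(y)      # sample standard deviations (n - 1)
r = cov_xy / (s_x * s_y)           # sample correlation

print(f"mean X = {x_bar:.2%}, mean Y = {y_bar:.2%}")
print(f"s_x = {s_x:.8f}, s_y = {s_y:.8f}")
print(f"covariance = {cov_xy:.9f}")
print(f"correlation = {r:.6f}")
```

Six closing values yield only five returns, so the covariance divides the total of the products by 4, not by 5 or 6.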

4a. Examples of Approximate r Values
[Figure: five scatter plots of y against x, illustrating r = -1, r = -0.6, r = 0, r = +0.3 and r = +1]

4b. Case: Multivariate Linear Regression (Revisited)
Adam, an analytics consultant, works with First Auto Insurance company. His manager gave him data containing the "Loss" amount and policy-related information and asked him to "identify" and "quantify" the factors responsible for losses in a multivariate fashion. Adam has no knowledge of running a multivariate regression. Now suppose he approaches you and requests your help to complete the assignment. Let's help Adam carry out the multivariate regression.


4a. Testing the significance of the correlation coefficient
▪ Test whether the population correlation between two variables is equal to zero
• Null hypothesis, H0: ρ = 0
▪ Assuming that the two populations are normally distributed, we can use a t-test to determine whether the null hypothesis should be rejected.
▪ The test statistic is computed using the sample correlation, r, with n - 2 degrees of freedom (df)
• $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$
▪ The calculated test statistic is compared with the critical t-value for the appropriate degrees of freedom and level of significance
▪ Reject H0 if t > tcritical or t < -tcritical
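As a hedged illustration of this test, the sketch below applies the t-statistic to the S&P 500 / NASDAQ example (r = 0.979811179 from n = 5 return observations). The function name is an assumption for illustration. The critical value 3.182 is the two-tailed 5% value for 3 degrees of freedom taken from a standard t-table; the choice of a two-tailed 5% test is also an assumption, since the slides do not state a significance level.

```python
from math import sqrt


def correlation_t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need n > 2 observations")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


r = 0.979811179  # sample correlation from the S&P 500 / NASDAQ example
n = 5            # number of return observations
t_stat = correlation_t_statistic(r, n)

# Two-tailed critical value at the 5% level for df = 3 (standard t-table).
t_critical = 3.182

# Reject H0 when the statistic falls in either tail.
reject_h0 = abs(t_stat) > t_critical
print(f"t = {t_stat:.3f}, df = {n - 2}, reject H0: {reject_h0}")
```

With so few observations the critical value is large, yet the sample correlation is strong enough that the statistic still falls well inside the rejection region.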
