Simple Linear Regression


In this Module

In this module, you will learn:
- Simple Linear Regression (revisited)
- Decomposing the sum-of-squares
- Computations
  - Variance of Estimate
  - Standard Error of Estimate (RMSE)
  - Coefficient of Determination
  - Coefficient of Alienation
- Tests of Significance for Regression: r² and b
- Assumptions
- Outlier Diagnosis

Introduction

In this module, we revisit Simple Linear Regression from Statistics I.

Remember, regression is used when we want to predict an outcome variable (or dependent variable) from a single predictor (or independent variable).

Case 1: Predicting International Baccalaureate GPA

A school district in the State of Florida is trying to determine the best predictor for success in an International Baccalaureate (IB) program. The school currently requires a Middle School GPA of 2.5. The school district is interested in determining how well Middle School GPA could be used to predict IB-GPA for high school students. In order to determine whether Middle School GPA will predict IB-GPA, the school district has admitted all 100 applicants who meet the minimum requirements. The data set is available for download from the Attachments tab.

Case 1: Data Set

Middle School GPA (GPA) and IB-GPA for the 100 admitted students:

GPA   IB-GPA    GPA   IB-GPA    GPA   IB-GPA    GPA   IB-GPA    GPA   IB-GPA
2.90  2.33      2.87  2.35      3.23  3.35      3.67  3.86      2.97  2.80
3.16  2.86      3.06  3.31      3.07  3.42      3.42  3.74      2.86  3.47
2.65  2.40      2.78  1.92      3.60  3.38      3.55  3.43      3.60  2.92
3.39  3.49      2.87  2.69      3.61  3.61      2.82  3.00      3.19  2.83
3.14  2.79      2.52  2.82      3.88  3.82      2.87  2.88      3.31  3.63
3.28  2.94      2.50  2.78      3.28  3.71      3.57  3.26      3.22  3.12
2.65  2.56      3.13  2.89      2.82  2.74      3.43  3.94      4.00  4.00
3.18  2.95      3.16  2.77      3.65  3.32      3.37  2.63      3.81  4.00
2.53  2.14      4.00  4.00      3.45  2.45      3.61  3.44      3.61  3.90
2.90  2.36      2.80  3.20      3.40  3.59      3.77  3.74      3.46  3.14
4.00  4.00      3.65  2.38      3.50  4.00      3.15  3.70      4.00  4.00
3.35  3.02      2.70  2.88      3.85  3.54      3.50  3.80      3.33  2.91
2.67  2.53      4.00  3.25      3.73  3.33      3.34  2.83      4.00  4.00
2.65  2.79      2.93  2.32      2.75  2.50      3.45  3.19      3.81  3.93
2.98  3.41      2.77  3.68      3.57  3.77      3.97  2.92      3.58  3.07
3.58  4.00      3.41  3.55      2.84  2.61      3.38  4.00      4.00  3.60
3.50  2.97      3.28  2.72      3.51  3.51      2.90  1.91      3.59  3.28
3.08  2.98      3.04  2.77      2.85  2.29      3.08  3.35      3.95  3.31
3.76  4.00      3.09  3.81      2.91  2.55      4.00  3.94      3.96  4.00
2.55  2.13      3.47  2.77      3.62  4.00      3.69  3.34      2.91  3.01

Regression Equation

Remember, a regression equation looks like this:

Ŷ = a + bX

where Ŷ is the predicted value of the criterion variable, "a" and "b" are constants ("a" is the intercept and "b" is the slope), and "X" is the predictor or regressor.

Case 1: Regression Equation for GPA

In Case 1, the regression equation with Middle School GPA as the predictor is:

Predicted IB-GPA = 0.17473 + 0.91110 × GPA

where IB-GPA is the criterion variable, 0.17473 is the intercept, 0.91110 is the slope, and GPA is the predictor or regressor.
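
As a quick arithmetic check, a student entering with the minimum required Middle School GPA of 2.5 would have a predicted IB-GPA of 0.17473 + 0.91110 × 2.5 ≈ 2.45.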

How We Arrived at the Regression Equation
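
The intercept and slope are the least-squares estimates: the values of "a" and "b" that minimize the residual sum-of-squares. Below is a minimal Python sketch of the standard formulas; the function name is ours, and the five data pairs are just the first row of the Case 1 table (the full 100-pair data set reproduces the coefficients above).

# Least-squares estimates for Y-hat = a + b*X:
#   b = sum((X - mean_X) * (Y - mean_Y)) / sum((X - mean_X)^2)
#   a = mean_Y - b * mean_X
def fit_simple_regression(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    b = sp_xy / ss_x           # slope
    a = mean_y - b * mean_x    # intercept
    return a, b

# Illustrative subset: the first five (GPA, IB-GPA) pairs from Case 1
gpa = [2.90, 2.87, 3.23, 3.67, 2.97]
ib_gpa = [2.33, 2.35, 3.35, 3.86, 2.80]
a, b = fit_simple_regression(gpa, ib_gpa)
# Run on all 100 pairs, this returns a ≈ 0.17473 and b ≈ 0.91110.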

Decomposition of Observed Values of Y

The observed values of Y can be decomposed into two parts, one predictable and one not predictable. The same is true for the variability in Y:

(Y − Ȳ)  =  (Ŷ − Ȳ)  +  (Y − Ŷ)
Deviation from the mean  =  Regression deviation  +  Residual deviation

Decomposition of IB-GPA

Using our mean score and predicted scores for IB-GPA, we can calculate and sum the total deviation scores, the regression deviation scores, and the residual deviation scores.

What do we do when our deviation scores sum to zero?

Decomposition of IB-GPA

Squaring and summing across observations:

Σ(Y − Ȳ)²  =  Σ(Ŷ − Ȳ)²  +  Σ(Y − Ŷ)²
Total Sum-of-Squares for Y (SS_total)  =  Regression Sum-of-Squares (SS_reg)  +  Residual Sum-of-Squares (SS_resid)
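
The identity is easy to verify numerically. A short sketch, again on the illustrative five-pair subset (numpy's polyfit stands in for the hand formulas above; the same identity holds on the full data set):

import numpy as np

# Illustrative subset of the Case 1 data (the full set has 100 pairs)
x = np.array([2.90, 2.87, 3.23, 3.67, 2.97])   # Middle School GPA
y = np.array([2.33, 2.35, 3.35, 3.86, 2.80])   # IB-GPA

b, a = np.polyfit(x, y, 1)    # slope and intercept
y_hat = a + b * x             # predicted IB-GPA

ss_total = np.sum((y - y.mean()) ** 2)     # total sum-of-squares
ss_reg = np.sum((y_hat - y.mean()) ** 2)   # regression sum-of-squares
ss_resid = np.sum((y - y_hat) ** 2)        # residual sum-of-squares

# SS_total = SS_reg + SS_resid (up to floating-point error)
assert np.isclose(ss_total, ss_reg + ss_resid)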

Decomposition of IB-GPA

The same decomposition can be computed for the sample sum-of-squares in the Case 1 data.

Variance of Estimate

The Variance of Estimate (MS_residual) is the residual sum-of-squares divided by its degrees of freedom:

MS_resid = SS_resid / (N − 2)

The variance of estimate is the variance of the residuals, that is, the variance of the data points around the regression line. The more precise our prediction is, the smaller this variance will be.

Standard Error of Estimate

The Standard Error of Estimate (RMSE) is the square root of the variance of estimate:

RMSE = √MS_resid = √(SS_resid / (N − 2))

The Standard Error of Estimate (or Root Mean Square Error) indicates the size of a typical prediction error, or how far observations tend to fall from the regression line. In our case, the predicted IB-GPA is typically 0.41531 points away from the actual IB-GPA.
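
Both quantities are one step beyond the residual sum-of-squares. A sketch on the illustrative subset (on the full 100-pair data set, rmse comes out to the 0.41531 quoted above):

import numpy as np

x = np.array([2.90, 2.87, 3.23, 3.67, 2.97])   # illustrative subset
y = np.array([2.33, 2.35, 3.35, 3.86, 2.80])
b, a = np.polyfit(x, y, 1)

n = len(x)
ss_resid = np.sum((y - (a + b * x)) ** 2)   # residual sum-of-squares
ms_resid = ss_resid / (n - 2)               # variance of estimate
rmse = ms_resid ** 0.5                      # standard error of estimate (RMSE)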

Coefficient of Determination

The Coefficient of Determination is equal to the regression sum-of-squares divided by the total sum-of-squares:

r² = SS_reg / SS_total

This is the proportion of the variance of the criterion variable that is predictable from the predictor; in our case, the proportion of variability in IB-GPA that is predicted by Middle School GPA. It allows us to gauge how certain we can be in making predictions based on our results.

Coefficient of Alienation

The Coefficient of Alienation is equal to one minus the Coefficient of Determination, or the residual sum-of-squares divided by the total sum-of-squares:

1 − r² = SS_resid / SS_total

The Coefficient of Alienation represents the proportion of variability in the criterion variable that is NOT accounted for by the predictor. In our case, it is the proportion of variability in IB-GPA that is not predicted by Middle School GPA.
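
Both coefficients fall directly out of the sums-of-squares; a minimal sketch on the illustrative subset:

import numpy as np

x = np.array([2.90, 2.87, 3.23, 3.67, 2.97])   # illustrative subset
y = np.array([2.33, 2.35, 3.35, 3.86, 2.80])
b, a = np.polyfit(x, y, 1)
y_hat = a + b * x

ss_total = np.sum((y - y.mean()) ** 2)
ss_reg = np.sum((y_hat - y.mean()) ** 2)

r_squared = ss_reg / ss_total   # coefficient of determination
alienation = 1.0 - r_squared    # coefficient of alienation (= SS_resid / SS_total)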

Test of Significance: ρ²

We use an F-statistic to test whether we are accounting for a statistically significant portion of the variance in the outcome. That is, we wish to test the null hypothesis:

H₀: ρ² = 0

F = MS_reg / MS_resid = (SS_reg / df_reg) / (SS_resid / df_resid) = (SS_reg / 1) / (SS_resid / (N − 2))
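
A sketch of the test itself, using scipy's F distribution for the p-value (still on the illustrative five-pair subset):

import numpy as np
from scipy import stats

x = np.array([2.90, 2.87, 3.23, 3.67, 2.97])   # illustrative subset
y = np.array([2.33, 2.35, 3.35, 3.86, 2.80])
b, a = np.polyfit(x, y, 1)
y_hat = a + b * x
n = len(x)

ss_reg = np.sum((y_hat - y.mean()) ** 2)
ss_resid = np.sum((y - y_hat) ** 2)

f_stat = (ss_reg / 1) / (ss_resid / (n - 2))   # F = MS_reg / MS_resid
p_value = stats.f.sf(f_stat, 1, n - 2)         # upper-tail p-value
# Reject H0: rho^2 = 0 when p_value is below the chosen alpha (e.g., 0.05).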

Test of Significance: ρ²

Let's try this:

Since the p-value for this F Value is