In this Module
In this module, you will learn:
- Simple Linear Regression (revisited)
- Decomposing the sum-of-squares
- Computations
  - Variance of Estimate
  - Standard Error of Estimate (RMSE)
  - Coefficient of Determination
  - Coefficient of Alienation
- Tests of Significance for Regression: r² and b
- Assumptions
- Outlier Diagnosis
Simple Linear Regression
Introduction
In this module, we revisit Simple Linear Regression from Statistics I.
Remember, regression is used when we want to make a prediction about an outcome (or dependent) variable from a single predictor (or independent) variable.
Case 1: Predicting International Baccalaureate GPA
A school district in the State of Florida is trying to determine the best predictor of success in an International Baccalaureate (IB) program. The school currently requires a Middle School GPA of 2.5. The school district is interested in determining how well Middle School GPA can be used to predict IB-GPA for high school students. In order to determine whether Middle School GPA predicts IB-GPA, the school district has admitted all 100 applicants who meet the minimum requirements. The data set is available for download from the Attachments tab.
Case 1: Data Set
Each pair of columns lists one student's Middle School GPA and IB-GPA (N = 100).

GPA     IB-GPA  GPA     IB-GPA  GPA     IB-GPA  GPA     IB-GPA  GPA     IB-GPA
2.9     2.33    2.87    2.35    3.23    3.35    3.67    3.86    2.97    2.8
3.16    2.86    3.06    3.31    3.07    3.42    3.42    3.74    2.86    3.47
2.65    2.4     2.78    1.92    3.6     3.38    3.55    3.43    3.6     2.92
3.39    3.49    2.87    2.69    3.61    3.61    2.82    3       3.19    2.83
3.14    2.79    2.52    2.82    3.88    3.82    2.87    2.88    3.31    3.63
3.28    2.94    2.5     2.78    3.28    3.71    3.57    3.26    3.22    3.12
2.65    2.56    3.13    2.89    2.82    2.74    3.43    3.94    4       4
3.18    2.95    3.16    2.77    3.65    3.32    3.37    2.63    3.81    4
2.53    2.14    4       4       3.45    2.45    3.61    3.44    3.61    3.9
2.9     2.36    2.8     3.2     3.4     3.59    3.77    3.74    3.46    3.14
4       4       3.65    2.38    3.5     4       3.15    3.7     4       4
3.35    3.02    2.7     2.88    3.85    3.54    3.5     3.8     3.33    2.91
2.67    2.53    4       3.25    3.73    3.33    3.34    2.83    4       4
2.65    2.79    2.93    2.32    2.75    2.5     3.45    3.19    3.81    3.93
2.98    3.41    2.77    3.68    3.57    3.77    3.97    2.92    3.58    3.07
3.58    4       3.41    3.55    2.84    2.61    3.38    4       4       3.6
3.5     2.97    3.28    2.72    3.51    3.51    2.9     1.91    3.59    3.28
3.08    2.98    3.04    2.77    2.85    2.29    3.08    3.35    3.95    3.31
3.76    4       3.09    3.81    2.91    2.55    4       3.94    3.96    4
2.55    2.13    3.47    2.77    3.62    4       3.69    3.34    2.91    3.01
Regression Equation
Remember, a regression equation looks like this:

Ŷ = a + bX

where:
- Ŷ is the predicted value of the criterion variable
- "a" and "b" are constants: "a" is the intercept and "b" is the slope
- "X" is a predictor or regressor
Case 1: Regression Equation for GPA
In Case 1, the regression equation with Middle School GPA as the predictor is:

Predicted IB-GPA = 0.17473 + 0.91110(GPA)

where IB-GPA is the criterion variable, 0.17473 is the intercept, 0.91110 is the slope, and GPA is the predictor or regressor.
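For illustration, an applicant with a Middle School GPA of 3.0 would have a predicted IB-GPA of 0.17473 + 0.91110(3.0) ≈ 2.91.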
How We Arrived at the Regression Equation
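In simple linear regression, the least-squares slope and intercept are b = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)² and a = Ȳ - bX̄. Below is a minimal Python sketch of that computation; the array names gpa and ib_gpa and the file name are assumptions standing in for the Case 1 data set from the Attachments tab.

```python
import numpy as np

def least_squares_line(x, y):
    """Return (intercept, slope) for the simple linear regression of y on x."""
    x_bar, y_bar = x.mean(), y.mean()
    b = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # slope
    a = y_bar - b * x_bar                                             # intercept
    return a, b

# Assumed setup: the Case 1 data loaded as two arrays (file name is hypothetical).
# gpa, ib_gpa = np.loadtxt("case1_gpa.csv", delimiter=",", unpack=True)
# a, b = least_squares_line(gpa, ib_gpa)
# print(a, b)  # expected to be close to 0.17473 and 0.91110 for the full data set
```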
Decomposition of Observed Values of Y
The observed values of Y can be decomposed into two parts, one predictable and one not predictable. The same is true for the variability in Y:

Y - Ȳ = (Ŷ - Ȳ) + (Y - Ŷ)
Deviation from the mean = Regression deviation + Residual deviation
Decomposition of IB-GPA
Using our mean score and predicted scores for IB-GPA, we can calculate and sum the total deviation scores, the regression deviation scores, and the residual deviation scores.
What do we do when our deviation scores sum to zero?
Decomposition of IB-GPA
Squaring and summing across observations:

Σ(Y - Ȳ)² = Σ(Ŷ - Ȳ)² + Σ(Y - Ŷ)²
Total Sum-of-Squares for Y (SStotal) = Regression Sum-of-Squares (SSreg) + Residual Sum-of-Squares (SSresid)
Decomposition of IB-GPA
Decomposing the sample sum-of-squares:
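A minimal sketch of that decomposition, continuing the arrays and fitted coefficients assumed in the previous snippet:

```python
# Assumes gpa, ib_gpa, a, and b from the earlier sketch.
y_hat = a + b * gpa                               # predicted IB-GPA for each student
ss_total = np.sum((ib_gpa - ib_gpa.mean()) ** 2)  # total sum-of-squares
ss_reg = np.sum((y_hat - ib_gpa.mean()) ** 2)     # regression sum-of-squares
ss_resid = np.sum((ib_gpa - y_hat) ** 2)          # residual sum-of-squares
print(np.isclose(ss_total, ss_reg + ss_resid))    # the identity should hold (True)
```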
Variance of Estimate (MSresidual)
The variance of estimate is the variance of the residuals, or the variance of the data points around the regression line. The more precise our prediction is, the smaller this variance will be.
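In simple linear regression, the variance of estimate is the residual sum-of-squares divided by its degrees of freedom, MSresidual = SSresid / (N - 2); with the 100 students in Case 1, the denominator is 98.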
Standard Error of Estimate (RMSE)
The Standard Error of Estimate (or Root Mean Square Error) indicates the size of a typical prediction error, or how far observations tend to fall from the regression line. In our case, the predicted IB-GPA is typically 0.41531 points away from the actual IB-GPA.
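Equivalently, RMSE = √MSresidual = √(SSresid / (N - 2)). Continuing the earlier sketch (and its assumed ib_gpa and ss_resid), both quantities can be checked:

```python
n = len(ib_gpa)                  # 100 students in Case 1
ms_resid = ss_resid / (n - 2)    # variance of estimate
rmse = np.sqrt(ms_resid)         # standard error of estimate
print(rmse)                      # expected to be roughly 0.41531 for these data
```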
Coefficient of Determination
The Coefficient of Determination (r²) is equal to the regression sum-of-squares divided by the total sum-of-squares:

r² = SSreg / SStotal

This is the proportion of the variance of the criterion variable that is predictable from the predictor; in our case, the amount of variability in IB-GPA that is predicted by Middle School GPA. It allows us to determine how certain we can be in making predictions based on our results.
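Continuing the sketch, with the assumed ss_reg and ss_total from the decomposition snippet:

```python
r_squared = ss_reg / ss_total    # proportion of IB-GPA variance predicted by GPA
print(r_squared)
```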
Coefficient of Alienation
The Coefficient of Alienation is equal to one minus the Coefficient of Determination, or the residual sum-of-squares divided by the total sum-of-squares:

1 - r² = SSresid / SStotal
The Coefficient of Alienation represents the proportion of variability in the criterion variable that is NOT accounted for by the predictor. In our case, it represents the amount of variability in IB-GPA that is not predicted by Middle School GPA.
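Under the same assumptions as the previous snippet:

```python
alienation = 1 - r_squared                           # coefficient of alienation
print(np.isclose(alienation, ss_resid / ss_total))   # the two forms agree (True)
```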
Test of Significance: r²
We use an F-statistic to test whether we are accounting for a statistically significant portion of the variance in the outcome.
That is, we wish to test the null hypothesis H0: ρ² = 0.
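As a sketch of how that F-test could be carried out on the Case 1 sums-of-squares (continuing the variables assumed in the earlier snippets, and using SciPy's F distribution for the p-value):

```python
from scipy import stats

df_reg, df_resid = 1, n - 2                     # one predictor; N - 2 residual df
f_stat = (ss_reg / df_reg) / (ss_resid / df_resid)
p_value = stats.f.sf(f_stat, df_reg, df_resid)  # upper-tail probability
print(f_stat, p_value)                          # reject H0 (rho^2 = 0) if p < .05
```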