Regression Models Course Project

George Lwevoola
July 21, 2016

Executive Summary

This project assignment examines the use of exploratory tools in kick-starting the process of identifying relevant variables to include in a model, given an outcome and a number of possible regressors or explanatory variables. The "strength" of the effect of these explanatory variables on the outcome is progressively tested until the most appropriate set of variables is selected. Residual plots and diagnostics are used to establish the "goodness" of fit of the selected model. An attempt is also made to quantify the uncertainty through the use of inference tools.

1. Introduction

We are interested in exploring the relationship between a set of variables and miles per gallon (MPG), the outcome, and in particular in the following two questions:

i. "Is an automatic or manual transmission better for MPG"
ii. "Quantify the MPG difference between automatic and manual transmissions"

This assignment seeks to establish whether an automatic or manual transmission offers more miles per gallon (MPG) and then to quantify the MPG difference between automatic and manual vehicles. We start by assuming that all the variables have an effect on mpg and try to determine the extent of this effect.

2. Exploratory Data Analysis

Visualizing the data with exploratory graphs helps us understand the data better and unearth patterns that may be crucial in developing the regression model. General observations can be made when mpg is plotted against each of the other variables in turn. These plots can be seen in the appendix at the end of this report. When we examine the unadjusted estimate, with mpg as outcome regressed against transmission type (am = 0 for automatic, am = 1 for manual), we see the results below:

##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 17.147368   1.124603 15.247492 1.133983e-15
## factor(am)1  7.244939   1.764422  4.106127 2.850207e-04

Note that the t-test of the null hypothesis that the coefficient of the transmission type variable equals 0, against the alternative hypothesis that it is not equal to 0, is significant at the 5% level, since the p-value of 0.000285 is less than 0.05.
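The unadjusted comparison above can be reproduced with a short sketch. It assumes the data are the built-in `mtcars` data set (the report does not name it, but the variables match); the object names `fit_am`, `group_means` and `tt` are illustrative only.

```r
# Assumption: the report uses the built-in mtcars data set.
data(mtcars)

# Unadjusted model: mpg regressed on transmission type only.
fit_am <- lm(mpg ~ factor(am), data = mtcars)
coefs <- summary(fit_am)$coef
print(coefs)

# With a single two-level factor, the slope is the difference in group means.
group_means <- tapply(mtcars$mpg, mtcars$am, mean)
diff_means <- unname(group_means["1"] - group_means["0"])
print(diff_means)

# The regression t-test matches a pooled-variance two-sample t-test.
tt <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)
print(tt$p.value)
```

The intercept is the mean mpg of the automatic cars, and the `factor(am)1` row is the mean difference between manual and automatic cars before any adjustment for other variables.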

3. Linear Model Selection

Using the analysis of variance function anova, we try to select the best suited model as follows:

## Analysis of Variance Table
##
## Model  1: mpg ~ factor(am) - 1
## Model  2: mpg ~ cyl + factor(am) - 1
## Model  3: mpg ~ disp + cyl + factor(am) - 1
## Model  4: mpg ~ hp + disp + cyl + factor(am) - 1
## Model  5: mpg ~ drat + hp + disp + cyl + factor(am) - 1
## Model  6: mpg ~ wt + drat + hp + disp + cyl + factor(am) - 1
## Model  7: mpg ~ qsec + wt + drat + hp + disp + cyl + factor(am) - 1
## Model  8: mpg ~ vs + qsec + wt + drat + hp + disp + cyl + factor(am) - 1
## Model  9: mpg ~ gear + vs + qsec + wt + drat + hp + disp + cyl + factor(am) - 1
## Model 10: mpg ~ carb + gear + vs + qsec + wt + drat + hp + disp + cyl + factor(am) - 1
##    Res.Df    RSS Df Sum of Sq       F    Pr(>F)
## 1      30 720.90
## 2      29 271.36  1    449.53 64.0039 8.231e-08 ***
## 3      28 252.08  1     19.28  2.7452   0.11241
## 4      27 216.37  1     35.71  5.0849   0.03493 *
## 5      26 214.50  1      1.87  0.2663   0.61121
## 6      25 162.43  1     52.06  7.4127   0.01275 *
## 7      24 149.09  1     13.34  1.8999   0.18260
## 8      23 148.87  1      0.22  0.0309   0.86214
## 9      22 147.90  1      0.97  0.1384   0.71365
## 10     21 147.49  1      0.41  0.0579   0.81218
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
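The report does not show the code that fitted the ten nested models, so the following is only a sketch of one way the comparison might be produced. It assumes the built-in `mtcars` data set; the names `added`, `fits` and `tab` are hypothetical.

```r
# Assumption: the report uses the built-in mtcars data set.
data(mtcars)

# Regressors in the order in which the table above adds them.
added <- c("cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "gear", "carb")

# Model 1 contains only the transmission factor (no intercept).
fits <- vector("list", length(added) + 1)
fits[[1]] <- lm(mpg ~ factor(am) - 1, data = mtcars)

# Models 2 to 10 each add one more regressor to the previous model.
for (k in seq_along(added)) {
  rhs <- paste(c(rev(added[1:k]), "factor(am)"), collapse = " + ")
  fits[[k + 1]] <- lm(as.formula(paste("mpg ~", rhs, "- 1")), data = mtcars)
}

# Sequential F-tests: each row compares a model with the one before it.
tab <- do.call(anova, fits)
print(tab)
```

Because the tests are sequential, each p-value depends on the order in which the variables enter; a different ordering of `added` would generally give a different table.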

Model 6 appears to offer the best fit for predicting mpg: it is the last model in the sequence whose added term (wt) is significant at the 5% level (p = 0.01275), and none of the terms added after it (qsec, vs, gear, carb) significantly reduces the residual sum of squares. Each p-value in the table indicates whether the variable added at that step is necessary, given the variables already in the model. Below we look at the model more closely and interpret the coefficients.

Coefficients:

summary(fit6)$coef
##                Estimate Std. Error    t value     Pr(>|t|)
## wt          -3.27472256 1.15684830 -2.8307277 9.033089e-03
## drat         0.48585981 1.49494905  0.3250009 7.478845e-01
## hp          -0.02887012 0.01444162 -1.9990916 5.658028e-02
## disp         0.01256722 0.01195130  1.0515352 3.030714e-01
## cyl         -1.03334930 0.72404890 -1.4271816 1.658993e-01
## factor(am)0 36.04938339 7.60552882  4.7398918 7.311525e-05
## factor(am)1 37.42444735 7.77506288  4.8133948 6.043025e-05

For the first coefficient we observe that every additional 1000 lbs of weight leads to a decline in mpg of 3.27, holding all the other variables constant. The second coefficient indicates an increase in mpg of 0.486 for every unit increase in rear axle ratio, holding all the other variables constant. The third coefficient indicates a decrease in mpg of 0.029 for every unit increase in horsepower, holding all the other variables constant. The fourth coefficient indicates an increase in mpg of 0.013 for every cubic inch increase in displacement, holding all the other variables constant. The fifth coefficient indicates a decrease in mpg of 1.033 for every additional cylinder, holding all the other variables constant. The sixth and seventh coefficients are the fitted intercepts for automatic and manual transmissions respectively; their difference (37.4 - 36.0 = 1.4) indicates that a manual transmission is associated with about 1.4 more mpg than an automatic one, holding all the other variables constant. Of the slope coefficients, only weight is significant at the 5% level.

A residual plot for this model is displayed below.

par(mfrow = c(2, 2))
plot(fit6)
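The plots themselves are not reproduced in this copy of the report, so a few numerical diagnostics may be a useful complement. This is a minimal sketch that assumes `fit6` is Model 6 fitted to the built-in `mtcars` data set; the report does not show how `fit6` was created, and `res`, `lev` and `cook` are illustrative names.

```r
# Assumption: fit6 is Model 6 from the table, fitted to mtcars.
data(mtcars)
fit6 <- lm(mpg ~ wt + drat + hp + disp + cyl + factor(am) - 1, data = mtcars)

# Residuals, leverage (hat values) and Cook's distance per car.
res <- resid(fit6)
lev <- hatvalues(fit6)
cook <- cooks.distance(fit6)

# Cars with the highest leverage and the largest influence on the fit.
print(head(sort(lev, decreasing = TRUE), 3))
print(head(sort(cook, decreasing = TRUE), 3))

# A formal check of residual normality to go with the Q-Q plot.
print(shapiro.test(res))
```

Points that combine high leverage with a large Cook's distance are the ones worth inspecting in the residual plots, since they can pull the fitted coefficients noticeably.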

The residual plots seem to show a good fit for the model described above. In addition, the confidence intervals for our coefficients are given below.

confint(fit6)
##                   2.5 %        97.5 %
## wt          -5.65729623 -0.8921488900
## drat        -2.59304539  3.5647650123
## hp          -0.05861318  0.0008729516
## disp        -0.01204695  0.0371813863
## cyl         -2.52455592  0.4578573192
## factor(am)0 20.38550357 51.7132632199
## factor(am)1 21.41140560 53.4374891072

4. Conclusion

As indicated by the exploratory graphs (see the appendix), the selected linear model, the residual plots and the confidence intervals, we can generally deduce that manual transmissions offer better miles per gallon (mpg) than automatic transmissions: by about 7.24 mpg before adjustment, and by about 1.4 mpg once weight, rear axle ratio, horsepower, displacement and number of cylinders are taken into account.
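To attach an uncertainty estimate to the adjusted difference, one option (not used in the report) is to refit Model 6 with an intercept, so that the manual-versus-automatic difference appears as a single coefficient with its own standard error and confidence interval. The sketch below assumes the built-in `mtcars` data set; `fit6_int`, `am_row` and `am_ci` are hypothetical names.

```r
# Assumption: the report uses the built-in mtcars data set.
data(mtcars)

# Same regressors as Model 6, but with an intercept: the factor(am)1
# coefficient is then the manual-minus-automatic difference directly.
fit6_int <- lm(mpg ~ wt + drat + hp + disp + cyl + factor(am), data = mtcars)

# Estimate, standard error, t value and p-value of the adjusted difference.
am_row <- summary(fit6_int)$coef["factor(am)1", ]
print(am_row)

# 95% confidence interval for the adjusted difference.
am_ci <- confint(fit6_int)["factor(am)1", ]
print(am_ci)
```

The estimate equals the difference between the two `factor(am)` coefficients of the no-intercept fit (37.42 - 36.05, about 1.375); whether the interval excludes zero shows how firmly the adjusted comparison supports the conclusion.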

5. Appendix - Exploratory Graphs
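The original graphs are not included in this copy of the report. The following sketch shows one way comparable exploratory plots might be drawn; it assumes the built-in `mtcars` data set, and the choice of plots and the name `mpg_cor` are illustrative rather than the author's own.

```r
# Assumption: the report uses the built-in mtcars data set.
data(mtcars)

# Correlation of every variable with mpg, sorted from most negative.
mpg_cor <- sort(cor(mtcars)[, "mpg"])
print(round(mpg_cor, 2))

# mpg by transmission type (am: 0 = automatic, 1 = manual).
boxplot(mpg ~ factor(am, labels = c("automatic", "manual")),
        data = mtcars, xlab = "transmission", ylab = "mpg")

# Pairwise scatterplots of mpg against the regressors in Model 6.
pairs(mtcars[, c("mpg", "wt", "drat", "hp", "disp", "cyl")],
      panel = panel.smooth)
```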