Automatic and manual transmission in fuel consumption
Fernando López
12 June 2016
Synopsis
Motor Trend magazine has data on 32 automobiles and wants to know which type of transmission, automatic or manual, is better for fuel economy, and to quantify the difference. The results suggest that manual transmission may be better, but the conclusion is not definitive because the transmission variable is not significant in this sample.
Introduction
The data was extracted from the 1974 Motor Trend US magazine and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models):
- mpg: Miles/(US) gallon
- cyl: Number of cylinders
- disp: Displacement (cubic inches)
- hp: Gross horsepower
- drat: Rear axle ratio
- wt: Weight (1000 lbs)
- qsec: 1/4 mile time
- vs: Engine (0 = V engine, 1 = straight engine)
- am: Transmission (0 = automatic, 1 = manual)
- gear: Number of forward gears
- carb: Number of carburetors
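These data are available in R as the built-in mtcars data set. A minimal sketch of loading it and recoding am as a labelled factor follows; the copy mt and the helper column am_label are hypothetical names introduced only for this illustration.

```r
# mtcars ships with base R in the "datasets" package, so nothing is downloaded
data(mtcars)
mt <- mtcars

# Hypothetical helper column: the 0/1 am indicator as a labelled factor
# (0 = automatic, 1 = manual, as in the variable list above)
mt$am_label <- factor(mt$am, levels = c(0, 1),
                      labels = c("automatic", "manual"))

dim(mt)             # 32 cars, 11 original variables + the helper column
table(mt$am_label)  # number of cars with each transmission type
```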
Modeling
There is no good reason to put all the variables in a linear regression, but omitting a correlated regressor can bias the estimates of the other coefficients, so the model must be chosen with care. Consider what theory suggests for each variable:
1. More cylinders burn more fuel per engine revolution, which means higher fuel consumption.
2. Engine capacity (displacement) is related to fuel consumption: larger engines generally use more fuel, that is, they have worse fuel economy.
3. Horsepower is an important factor in an automobile's fuel consumption. In simple terms, horsepower describes how quickly the engine's work can be done. Compared with a tractor, a car achieves the same power with comparatively low torque but significantly higher revolutions per minute (RPM).
4. The optional rear axle ratios offered by just about every manufacturer don't have as much of an adverse effect on fuel economy as one might expect (see http://www.hardworkingtrucks.com/understanding-axle-ratios/#sthash.yu8GR0I9.dpuf).
5. More weight implies more fuel consumption.
6. A lower qsec (a shorter 1/4 mile time) means more acceleration, which tends to consume more fuel.
7. There is no direct relation between engine type (V or straight) and fuel consumption.
8. The gearbox is basically the intermediate mechanism that allows the engine and the wheels to run at different RPM. More forward gears are not necessarily related to fuel consumption.
9. Carburetor mechanisms control the flow of air being pulled into the engine. The speed of this flow, and therefore its pressure, determines the amount of fuel drawn into the airstream, so there is no direct relation with fuel consumption.
So the model can include cyl, disp, hp, wt, qsec and am. We are interested in the influence of am on mpg. There is no reason to include drat, vs, gear and carb.
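The variable selection above amounts to keeping seven columns of mtcars. A short sketch, assuming Pearson correlations (the default of cor()), computes the rounded correlation matrix that the data exploration below summarises; the names keep, candidates and cor_mat are illustrative only.

```r
data(mtcars)

# Keep the response (mpg), the variable of interest (am) and the five
# covariates retained above; drat, vs, gear and carb are left out
keep <- c("mpg", "am", "cyl", "disp", "hp", "wt", "qsec")
candidates <- mtcars[, keep]

# Pairwise Pearson correlations, rounded to two decimals
cor_mat <- round(cor(candidates), 2)
cor_mat
```

The corrplot package listed in the Annex can draw such a matrix graphically with corrplot::corrplot(cor_mat).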
Data exploration

[Figure: correlation plot of the candidate variables; values shown in the plot (lower triangle):]

         wt     hp    cyl   disp   qsec    mpg
hp     0.66
cyl    0.78   0.83
disp   0.89   0.79   0.90
qsec  -0.17  -0.71  -0.59  -0.43
mpg   -0.87  -0.78  -0.85  -0.85   0.42
am    -0.69  -0.24  -0.52  -0.59  -0.23   0.60
am and mpg have a positive correlation (0.60). There is a strong positive correlation among wt, disp, cyl and hp (0.66 to 0.90), and each of them is negatively correlated with mpg (-0.78 to -0.87).
[Figure: scatter plot of wt, disp, cyl and hp]
The graph shows the positive correlation among wt, disp, cyl and hp. Omitting a correlated regressor can bias the estimates of the other coefficients, while including several highly correlated regressors inflates the variance of their estimates. Factor analysis or principal component analysis can convert the regressors into an equivalent uncorrelated set, but this may make interpretation difficult. In any case, I will leave the four variables as they are, because we are not interested in their individual effects, only in the effect of am.
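The principal component option mentioned above can be sketched with base R's prcomp(). This is an illustration of the idea, not part of the original analysis: it assumes the four variables are standardised first (scale. = TRUE), because they are measured in different units.

```r
data(mtcars)

# The four strongly correlated "engine size" regressors
size_vars <- mtcars[, c("wt", "disp", "cyl", "hp")]

# Principal components of the standardised variables
pca <- prcomp(size_vars, center = TRUE, scale. = TRUE)

summary(pca)     # share of variance explained by each component
scores <- pca$x  # the uncorrelated replacement regressors

# Off-diagonal correlations of the scores are zero up to rounding error
round(cor(scores), 10)

# The loadings show why interpretation gets harder:
# each component mixes all four original variables
pca$rotation
```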
Fitting the model
In an analysis of variance of nested models, the null hypothesis at each step is that the added regressor does not improve the fit. I therefore fit several nested linear regressions in order to select an adequate model:
Analysis of Variance Table

Model 1: mpg ~ factor(am)
Model 2: mpg ~ factor(am) + cyl
Model 3: mpg ~ factor(am) + cyl + hp
Model 4: mpg ~ am + cyl + hp + wt
Model 5: mpg ~ factor(am) + cyl + hp + wt + disp
Model 6: mpg ~ factor(am) + cyl + hp + wt + disp + qsec
  Res.Df    RSS Df Sum of Sq       F    Pr(>F)
1     30 720.90
2     29 271.36  1    449.53 74.4306 5.774e-09 ***
3     28 220.55  1     50.81  8.4126  0.007660 **
4     27 170.00  1     50.56  8.3706  0.007793 **
5     26 163.12  1      6.88  1.1388  0.296104
6     25 150.99  1     12.13  2.0082  0.168795
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
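The code that produced this table is not shown in the report, so the following is a sketch that should reproduce it. The object names fit1 to fit6 are assumptions, except fit4, which the text names. Here fit4 uses factor(am) like the other models; because am is already coded 0/1, this gives the same fit as the plain am in Model 4 of the table.

```r
data(mtcars)

# Nested models, adding one regressor at a time in the order of the table
fit1 <- lm(mpg ~ factor(am), data = mtcars)
fit2 <- update(fit1, . ~ . + cyl)
fit3 <- update(fit2, . ~ . + hp)
fit4 <- update(fit3, . ~ . + wt)
fit5 <- update(fit4, . ~ . + disp)
fit6 <- update(fit5, . ~ . + qsec)

# Sequential F-tests: each row tests the regressor added at that step,
# using the residual variance of the largest model (fit6)
tab <- anova(fit1, fit2, fit3, fit4, fit5, fit6)
tab
```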
Variables disp and qsec are not significant, so the useful model is fit4: lm(mpg ~ am + cyl + hp + wt, data = mtcars). Its summary is shown in the "Interpretation of the coefficients" section. Initially, transmission type and weight were significant at the 10% level (90% confidence) with the five covariates, but once disp and qsec are removed, transmission is no longer significant.

Before relying on these results, note that the F-tests of the analysis of variance assume approximately normal residuals; if they are not, the p-values can be misleading. The Shapiro-Wilk test checks the normality of the fit4 residuals:

Shapiro-Wilk normality test

data:  fit4$residuals
W = 0.94042, p-value = 0.07695

The null hypothesis of this test is that the residuals come from a normal distribution. The p-value is greater than 0.05, so I fail to reject the null hypothesis at the 5% level, although it is close to that threshold.
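A minimal sketch of the normality check, using the fit4 formula given in the text and base R's shapiro.test(); the 5% threshold in the decision step is the one used above, and the variable name sw is illustrative.

```r
data(mtcars)
fit4 <- lm(mpg ~ am + cyl + hp + wt, data = mtcars)

# H0: the residuals are a sample from a normal distribution
sw <- shapiro.test(fit4$residuals)
sw

# Decision at the 5% level used in the text
if (sw$p.value > 0.05) {
  message("Fail to reject normality of the residuals")
} else {
  message("Reject normality of the residuals")
}
```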
Residual plot and some diagnostics

[Figure: Residuals vs Fitted plot for lm(mpg ~ am + cyl + hp + wt); labelled points: Toyota Corolla, Fiat 128, Chrysler Imperial]
The plot shows three values that could be outliers: Toyota Corolla, Fiat 128 and Chrysler Imperial. Without them the residuals look homoskedastic, but keeping Toyota Corolla and Fiat 128 may introduce some heteroskedasticity. It is therefore important to evaluate the leverage of these points.
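The quantities behind the diagnostic plots can also be inspected numerically. This sketch lists residuals, leverages (hat values) and Cook's distances for fit4; the 2p/n leverage cut-off is a common rule of thumb assumed here for illustration, not something the original analysis uses, and diag_tab and top3 are hypothetical names.

```r
data(mtcars)
fit4 <- lm(mpg ~ am + cyl + hp + wt, data = mtcars)

diag_tab <- data.frame(
  residual = residuals(fit4),
  leverage = hatvalues(fit4),      # diagonal of the hat matrix
  cooks_d  = cooks.distance(fit4)
)

# The three largest absolute residuals (the points labelled in the plot)
top3 <- head(diag_tab[order(-abs(diag_tab$residual)), ], 3)
top3

# Assumed rule of thumb: flag leverage above 2 * p / n,
# where p is the number of coefficients and n the number of cars
p <- length(coef(fit4))
n <- nrow(mtcars)
diag_tab[diag_tab$leverage > 2 * p / n, ]
```

Calling plot(fit4) draws the standard diagnostic panels, including the Residuals vs Fitted, Scale-Location and Residuals vs Leverage plots discussed here.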
[Figure: Residuals vs Leverage (standardized residuals against leverage, with Cook's distance contours) and Scale-Location plots for lm(mpg ~ am + cyl + hp + wt); labelled points: Toyota Corolla, Fiat 128, Chrysler Imperial, Toyota Corona]
The Residuals vs Leverage plot shows no influential points that distort the regression. The Scale-Location plot shows slight heteroskedasticity. Some type of correction, such as heteroskedasticity-consistent standard errors, could improve the inference, but the heteroskedasticity is mild.
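One possible correction is to replace the classical standard errors with heteroskedasticity-consistent ones. The sketch below computes the HC3 sandwich estimator by hand in base R; choosing HC3 is an assumption made here, the original analysis does not apply any correction.

```r
data(mtcars)
fit4 <- lm(mpg ~ am + cyl + hp + wt, data = mtcars)

X <- model.matrix(fit4)  # n x p design matrix, including the intercept
e <- residuals(fit4)
h <- hatvalues(fit4)     # leverages h_i

# HC3 sandwich: (X'X)^-1 [sum_i e_i^2 / (1 - h_i)^2 x_i x_i'] (X'X)^-1
bread  <- solve(crossprod(X))
meat   <- crossprod(X * (e / (1 - h)))  # row i of X scaled by e_i / (1 - h_i)
vc_hc3 <- bread %*% meat %*% bread

se <- cbind(classical = sqrt(diag(vcov(fit4))),
            HC3       = sqrt(diag(vc_hc3)))
robust_t <- coef(fit4) / se[, "HC3"]
round(cbind(se, t_HC3 = robust_t), 4)
```

The same matrix should also be available from sandwich::vcovHC(fit4, type = "HC3") or car::hccm(fit4, type = "hc3"); the manual version is shown only to make the formula explicit.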
Interpretation of the coefficients
            Estimate     Std. Error  t value    Pr(>|t|)
(Intercept) 36.14653575  3.10478079  11.642218  4.944804e-12
am           1.47804771  1.44114927   1.025603  3.141799e-01
cyl         -0.74515702  0.58278741  -1.278609  2.119166e-01
hp          -0.02495106  0.01364614  -1.828433  7.855337e-02
wt          -2.60648071  0.91983749  -2.833632  8.603218e-03
Since am is coded 0 = automatic and 1 = manual, its coefficient, 1.48, is the estimated difference in mpg between a manual and an automatic car, holding cyl, hp and wt constant. Because it is positive, the point estimate favours manual transmission: on average, a manual transmission adds about 1.48 miles per gallon. However, the transmission coefficient is not significant in the model (p = 0.31), so we only have a hint that manual transmission is better; a larger sample would be needed to confirm it.
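The lack of significance can also be read from a confidence interval for the am coefficient. This sketch uses base R's confint() and repeats the calculation by hand (estimate plus or minus the 97.5% t quantile with 27 residual degrees of freedom times the standard error); the 95% level is an assumption, and est, ci and manual are illustrative names.

```r
data(mtcars)
fit4 <- lm(mpg ~ am + cyl + hp + wt, data = mtcars)

est <- coef(summary(fit4))["am", ]
ci  <- confint(fit4, "am", level = 0.95)

# Same interval by hand: estimate +/- t(0.975, df) * standard error
df_res <- df.residual(fit4)  # 32 observations - 5 coefficients = 27
manual <- est["Estimate"] + c(-1, 1) * qt(0.975, df_res) * est["Std. Error"]

ci
manual
```

Because the interval contains zero, the data are consistent both with a small loss and with a gain in mpg for manual cars, which matches the non-significant p-value.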
Annex
R version 3.2.4 (2016-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 10586)

locale:
[1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252
[3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
[5] LC_TIME=Spanish_Spain.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] car_2.1-2     ggplot2_2.1.0 corrplot_0.77

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.4        knitr_1.12.3       magrittr_1.5
 [4] splines_3.2.4      MASS_7.3-45        munsell_0.4.3
 [7] lattice_0.20-33    colorspace_1.2-6   minqa_1.2.4
[10] stringr_1.0.0      plyr_1.8.3         tools_3.2.4
[13] parallel_3.2.4     nnet_7.3-11        pbkrtest_0.4-6
[16] grid_3.2.4         nlme_3.1-124       gtable_0.2.0
[19] mgcv_1.8-11        quantreg_5.21      MatrixModels_0.4-1
[22] htmltools_0.3      yaml_2.1.13        lme4_1.1-12
[25] digest_0.6.9       Matrix_1.2-3       nloptr_1.0.4
[28] formatR_1.2.1      evaluate_0.8       rmarkdown_0.9.2
[31] labeling_0.3       stringi_1.0-1      scales_0.4.0
[34] SparseM_1.7