Inference on transformed variables

DataCamp

Inference for Linear Regression

Jo Hardin, Professor, Pomona College

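Interpreting coefficients - linear

$Y = \beta_0 + \beta_1 \cdot X + \epsilon$, where $\epsilon \sim N(0, \sigma_\epsilon)$
$E[Y \mid X] = \beta_0 + \beta_1 \cdot X$
$E[Y \mid X + 1] = \beta_0 + \beta_1 \cdot (X + 1)$
$\beta_1 = E[Y \mid X + 1] - E[Y \mid X]$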
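The identity for the slope can be checked numerically. The sketch below uses simulated data (the true intercept 2, slope 0.5, and the evaluation points 3 and 4 are assumptions chosen for illustration, not values from the course) and base R only, rather than the `tidy()` pipeline used elsewhere in these slides.

```r
# Simulated data; the true values (2, 0.5) are assumptions for illustration only.
set.seed(1)
x <- runif(100, 0, 10)
y <- 2 + 0.5 * x + rnorm(100, sd = 1)

fit <- lm(y ~ x)

# Predicted mean response at X = 3 and at X = 4 (one unit apart).
pred <- predict(fit, newdata = data.frame(x = c(3, 4)))

# The difference in predictions equals the fitted slope
# (up to floating-point error).
slope_from_pred <- unname(pred[2] - pred[1])
slope_from_fit  <- unname(coef(fit)["x"])
all.equal(slope_from_pred, slope_from_fit)
```

Any two values of X that are one unit apart would give the same difference, which is what makes the slope a single summary of the relationship.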

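Interpreting coefficients - nonlinear X

$Y = \beta_0 + \beta_1 \cdot \ln(X) + \epsilon$, where $\epsilon \sim N(0, \sigma_\epsilon)$
$E[Y \mid \ln(X)] = \beta_0 + \beta_1 \cdot \ln(X)$
$E[Y \mid \ln(X) + 1] = \beta_0 + \beta_1 \cdot (\ln(X) + 1)$
$\beta_1 = E[Y \mid \ln(X) + 1] - E[Y \mid \ln(X)]$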
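One way to read this result on the original scale of X: adding 1 to ln(X) is the same as multiplying X by e. The derivation below is a sketch that follows from the model as written. The 1% comparison in the last line is an added illustration rather than something stated on the slide, and here the conditioning is written in terms of the value of X itself.

```latex
\begin{align*}
\ln(X) + 1 &= \ln(X) + \ln(e) = \ln(e \cdot X) \\
\beta_1 &= E[Y \mid \ln(X) + 1] - E[Y \mid \ln(X)]
         = E[Y \mid e \cdot X] - E[Y \mid X] \\
E[Y \mid 1.01 \cdot X] - E[Y \mid X] &= \beta_1 \cdot \ln(1.01) \approx 0.01 \cdot \beta_1
\end{align*}
```

So a 1% increase in X is associated with a change of roughly $\beta_1 / 100$ units in the expected value of Y.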

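Interpreting coefficients - nonlinear Y

$\ln(Y) = \beta_0 + \beta_1 \cdot X + \epsilon$, where $\epsilon \sim N(0, \sigma_\epsilon)$
$E[\ln(Y) \mid X] = \beta_0 + \beta_1 \cdot X$
$E[\ln(Y) \mid X + 1] = \beta_0 + \beta_1 \cdot (X + 1)$
$\beta_1 = E[\ln(Y) \mid X + 1] - E[\ln(Y) \mid X]$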
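Because the slope is a difference on the ln(Y) scale, it becomes a multiplicative factor once predictions are exponentiated back to the Y scale. The sketch below illustrates this with simulated data (the true values 1, 0.2, and the error standard deviation 0.3 are assumptions for illustration). Note that the back-transformed prediction estimates the median of Y under the normal-error model, not its mean.

```r
# Simulated data; the true values (1, 0.2, sd = 0.3) are illustrative assumptions.
set.seed(2)
x <- runif(200, 0, 10)
y <- exp(1 + 0.2 * x + rnorm(200, sd = 0.3))

fit <- lm(log(y) ~ x)
b1 <- unname(coef(fit)["x"])

# Predictions are on the ln(Y) scale; exponentiate to return to the Y scale.
pred_log <- predict(fit, newdata = data.frame(x = c(3, 4)))
ratio <- unname(exp(pred_log[2]) / exp(pred_log[1]))

# A one-unit increase in X multiplies the back-transformed prediction by exp(b1).
all.equal(ratio, exp(b1))
```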

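Interpreting coefficients - both nonlinear

$\ln(Y) = \beta_0 + \beta_1 \cdot \ln(X) + \epsilon$, where $\epsilon \sim N(0, \sigma_\epsilon)$
$E[\ln(Y) \mid \ln(X)] = \beta_0 + \beta_1 \cdot \ln(X)$
$E[\ln(Y) \mid \ln(X) + 1] = \beta_0 + \beta_1 \cdot (\ln(X) + 1)$
$\beta_1 = E[\ln(Y) \mid \ln(X) + 1] - E[\ln(Y) \mid \ln(X)]$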

Interpreting coefficients - both natural log (special case)

$\ln(Y) = \beta_0 + \beta_1 \cdot \ln(X) + \epsilon$, where $\epsilon \sim N(0, \sigma_\epsilon)$
$E[\ln(Y) \mid \ln(X)] = \beta_0 + \beta_1 \cdot \ln(X)$
$E[\ln(Y) \mid \ln(X) + 1] = \beta_0 + \beta_1 \cdot (\ln(X) + 1)$
$\beta_1 = E[\ln(Y) \mid \ln(X) + 1] - E[\ln(Y) \mid \ln(X)]$

OR (when X and Y are both transformed using the natural log):
$\beta_1 \approx$ the percent change in $Y$ associated with each 1% change in $X$
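The percent-change reading is an approximation, and a small simulation shows how close it is. The sketch below uses simulated data (the true slope 1.5, the intercept 0.5, the error standard deviation 0.2, and the comparison point X = 50 are all assumptions for illustration) and compares the back-transformed prediction at X with the one at a 1% larger X.

```r
# Simulated data; the true values (0.5, 1.5, sd = 0.2) are illustrative assumptions.
set.seed(3)
x <- runif(200, 1, 100)
y <- exp(0.5 + 1.5 * log(x) + rnorm(200, sd = 0.2))

fit <- lm(log(y) ~ log(x))
b1 <- unname(coef(fit)["log(x)"])

# Compare back-transformed predictions at X = 50 and at a 1% larger X.
pred_log <- predict(fit, newdata = data.frame(x = c(50, 50 * 1.01)))
ratio <- unname(exp(pred_log[2] - pred_log[1]))

# Exact factor is 1.01^b1; as a percent change it is close to b1 itself.
pct_change <- 100 * (ratio - 1)
c(slope = b1, pct_change = pct_change)
```

The exact multiplicative factor is `1.01^b1`, so the shortcut "b1 percent" works well for slopes of moderate size and drifts as the slope or the percent change in X gets large.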

Let's practice!

Multicollinearity

Regressing dollar amount on coins

head(change)
# A tibble: 6 x 7
#   Coins  Qrts Dimes Nickels Pennies Small Amount
# 1     2     1     1       0       0     1   0.35
# 2     3     3     0       0       0     0   0.75
# 3     2     0     0       2       0     2   0.10
# 4     4     4     0       0       0     0   1.00
# 5     2     2     0       0       0     0   0.50
# 6    13     3     4       2       4    10   1.29

Amount vs. coins - plot

Amount vs. coins - linear model

lm(Amount ~ Coins, data = change) %>% tidy()
#          term estimate std.error statistic  p.value
# 1 (Intercept)   0.1449    0.0902      1.61 1.13e-01
# 2       Coins   0.0945    0.0063     14.99 6.01e-22

Amount vs. small coins - plot

Amount vs. small coins - linear model

lm(Amount ~ Small, data = change) %>% tidy()
#          term estimate std.error statistic  p.value
# 1 (Intercept)   0.4225    0.1244      3.40 1.22e-03
# 2       Small   0.0989    0.0118      8.38 1.10e-11

Amount vs. coins and small coins

$\widehat{\text{Amount}} = -0.00554 + 0.25862 \cdot \text{Coins} - 0.21611 \cdot \text{Small Coins}$

lm(Amount ~ Coins + Small, data = change) %>% tidy()
#          term estimate std.error statistic  p.value
# 1 (Intercept) -0.00554   0.02735    -0.202 8.40e-01
# 2       Coins  0.25862   0.00682    37.917 3.95e-43
# 3       Small -0.21611   0.00864   -25.021 4.17e-33
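The sign flip on the small-coin coefficient can be reproduced with made-up data. The sketch below is a hypothetical simulation, not the `change` data from the slides: the Poisson rates, the sample size, and the data frame name `sim` are all assumptions, and `Small` is taken to be the count of dimes, nickels, and pennies, which matches the rows shown earlier. It uses base R rather than `tidy()`.

```r
# Hypothetical simulated pocket change; this is NOT the `change` data from the slides.
set.seed(4)
n <- 1000
Qrts    <- rpois(n, 2)
Dimes   <- rpois(n, 1)
Nickels <- rpois(n, 1)
Pennies <- rpois(n, 1)

sim <- data.frame(
  Coins  = Qrts + Dimes + Nickels + Pennies,
  Small  = Dimes + Nickels + Pennies,
  Amount = 0.25 * Qrts + 0.10 * Dimes + 0.05 * Nickels + 0.01 * Pennies
)

# The two explanatory variables overlap heavily, so they are strongly correlated.
r <- cor(sim$Coins, sim$Small)

# Small on its own: more small coins go with a larger amount.
small_alone <- unname(coef(lm(Amount ~ Small, data = sim))["Small"])

# Small alongside Coins: holding the coin count fixed, each extra small coin
# takes the place of a quarter, so the coefficient turns negative.
small_joint <- unname(coef(lm(Amount ~ Coins + Small, data = sim))["Small"])

c(correlation = r, small_alone = small_alone, small_joint = small_joint)
```

The coefficient on a variable describes its association with the response after the other variables in the model are held fixed, which is why it can change sign when a correlated variable is added.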

Multiple linear regression

Bathrooms negative coefficient

lm(log(price) ~ log(bath), data = LAhomes) %>% tidy()
#          term estimate std.error statistic   p.value
# 1 (Intercept)    12.23    0.0280     437.2  0.00e+00
# 2   log(bath)     1.43    0.0306      46.6 9.66e-300

lm(log(price) ~ log(sqft) + log(bath), data = LAhomes) %>% tidy()
#          term estimate std.error statistic   p.value
# 1 (Intercept)    2.514    0.2619     9.601  2.96e-21
# 2   log(sqft)    1.471    0.0395    37.221 1.19e-218
# 3   log(bath)   -0.039    0.0453    -0.862  3.89e-01

Bathrooms non-significant coefficient

lm(log(price) ~ log(bath), data = LAhomes) %>% tidy()
#          term estimate std.error statistic   p.value
# 1 (Intercept)    12.23    0.0280     437.2  0.00e+00
# 2   log(bath)     1.43    0.0306      46.6 9.66e-300

lm(log(price) ~ log(sqft) + log(bath), data = LAhomes) %>% tidy()
#          term estimate std.error statistic   p.value
# 1 (Intercept)    2.514    0.2619     9.601  2.96e-21
# 2   log(sqft)    1.471    0.0395    37.221 1.19e-218
# 3   log(bath)   -0.039    0.0453    -0.862  3.89e-01

Price on bed and bath

lm(log(price) ~ log(bath) + bed, data = LAhomes) %>% tidy()
#          term estimate std.error statistic   p.value
# 1 (Intercept)   11.965    0.0384    311.67  0.00e+00
# 2   log(bath)    1.076    0.0465     23.14 2.38e-102
# 3         bed    0.189    0.0193      9.82  4.01e-22

Large model on price

lm(log(price) ~ log(sqft) + log(bath) + bed, data = LAhomes) %>% tidy()
#          term estimate std.error statistic   p.value
# 1 (Intercept)   1.5364    0.2894     5.310  1.25e-07
# 2   log(sqft)   1.6456    0.0454    36.215 6.27e-210
# 3   log(bath)   0.0165    0.0452     0.365  7.15e-01
# 4         bed  -0.1236    0.0167    -7.411  2.03e-13

Summary

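Linear regression as a model

- it estimates an underlying population model
- the relationship might be linear or might need variable transformations
- all of the LINE conditions (linearity, independence, normality, equal variance) should be checked
- relationships among the other variables (such as multicollinearity) should be carefully considered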

Linear regression as an inferential technique

- hypothesis testing using a mathematical model (t-tests)
- hypothesis testing using randomization tests
- confidence intervals using a mathematical model
- confidence intervals using bootstrapping
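The four approaches listed above can be sketched side by side for a single slope. This is a minimal base-R illustration on simulated data (the true slope 0.5, the sample size, and the 1000 resamples are assumptions, and the course itself may use different tooling for the randomization and bootstrap steps).

```r
# Simulated data; the true slope (0.5) is an illustrative assumption.
set.seed(5)
n <- 100
d <- data.frame(x = runif(n, 0, 10))
d$y <- 2 + 0.5 * d$x + rnorm(n, sd = 1)

fit <- lm(y ~ x, data = d)
obs_slope <- unname(coef(fit)["x"])

# 1. Mathematical model: t-test p-value and t-based 95% confidence interval.
t_pvalue <- summary(fit)$coefficients["x", "Pr(>|t|)"]
t_ci <- confint(fit, "x", level = 0.95)

# 2. Randomization test: shuffle y to break any link with x, then refit.
perm_slopes <- replicate(1000, coef(lm(sample(d$y) ~ d$x))[2])
perm_pvalue <- mean(abs(perm_slopes) >= abs(obs_slope))

# 3. Bootstrap: resample rows with replacement, refit, take percentiles.
boot_slopes <- replicate(1000, {
  idx <- sample(n, replace = TRUE)
  coef(lm(y ~ x, data = d[idx, ]))["x"]
})
boot_ci <- quantile(boot_slopes, c(0.025, 0.975))

t_ci
boot_ci
c(t_pvalue = t_pvalue, perm_pvalue = perm_pvalue)
```

When the LINE conditions hold reasonably well, the t-based and resampling-based results tend to agree closely. Larger disagreement is a hint to look again at the model conditions.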
