Logistic Regression

Path Analysis

Logistic Regression: Modeling a Categorical Outcome Variable


In This Module

In this module, you will learn:
– Descriptive statistics to summarize binary data
– Logistic transformation or logit
– Logistic regression model for a binary outcome
– Computation of probabilities from logistic regression equations
– Interpretation of the logistic regression model using the odds and odds ratio


Examples of Binary Responses

• Head or tail from a coin flip
• Correct/incorrect for a math test item
• High school dropout (yes/no)
• Drug use (yes/no)
• Illness or cancer (yes/no)
• Smoking (yes/no)


How to summarize the binary responses

• Probability of success: $p(y=1)$
  e.g., probability of being a smoker = .75
• Odds: $\frac{p(y=1)}{1 - p(y=1)}$
  e.g., odds of being a smoker = .75/.25 = 3
• Odds ratio for a measure of association:
  $\frac{p(y=1 \mid \text{male}) \,/\, [1 - p(y=1 \mid \text{male})]}{p(y=1 \mid \text{female}) \,/\, [1 - p(y=1 \mid \text{female})]}$ = odds of male / odds of female
  e.g., probability of a male being a smoker = .75
  probability of a female being a smoker = .50
  OR = (.75/.25)/(.50/.50) = 3
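The smoker example can be checked by hand with a few lines of code. This is a minimal sketch in Python (not the SAS used later in this module); the function name `odds` and the variable names are illustrative only, and the probabilities are the ones given in the example.

```python
def odds(p):
    """Convert a probability of success into odds, p / (1 - p)."""
    return p / (1.0 - p)

# Probabilities taken from the example above.
p_male = 0.75    # probability of a male being a smoker
p_female = 0.50  # probability of a female being a smoker

odds_male = odds(p_male)      # .75 / .25
odds_female = odds(p_female)  # .50 / .50

# Odds ratio: odds of male divided by odds of female.
odds_ratio = odds_male / odds_female

print("odds (male):  ", odds_male)
print("odds (female):", odds_female)
print("odds ratio:   ", odds_ratio)
```

An odds ratio of 1 would mean the two groups have the same odds; values above 1 mean higher odds for the group in the numerator.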


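Logit

• Log of odds: $\log\left(\frac{p(y=1)}{1 - p(y=1)}\right)$
• Logit transformation: $\operatorname{logit}[p(y=1)] = a + bx$

[Figure: two panels with x on the horizontal axis; vertical axes are p(y = 1) and logit]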
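The logit and its inverse can be sketched as two small functions. This is an illustrative Python sketch, not part of the original slides; the names `logit` and `inv_logit` are chosen here for convenience.

```python
import math

def logit(p):
    """Log of the odds: log(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Inverse of the logit: maps any real number back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

p = 0.75
z = logit(p)         # log of odds = log(3)
back = inv_logit(z)  # recovers the original probability

print("logit(0.75) =", z)
print("inv_logit(logit(0.75)) =", back)
```

Probabilities are bounded between 0 and 1, while the logit ranges over the whole real line, which is why the linear model is written on the logit scale.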


Logistic Regression

$\operatorname{logit}[p(y=1)] = a + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$

• Modeling a linear relationship between log odds or logit of being a success with a set of predictors

$\operatorname{logit}[p(y=1)] = 6.7441 - 0.2654\,\text{age} + 1.6549\,\text{male}$

where y=1 means being a smoker and females are the reference.

• How do we interpret the intercept and regression coefficients?


Interpretation of Logistic Regression Parameters

$\operatorname{logit}[p(y=1)] = 6.7441 - 0.2654\,\text{age} + 1.6549\,\text{male}$

• Antilog of intercept
  $e^{6.74} = 845.56$: the odds of being a smoker for a female at age = 0 (not meaningful and not interpretable; the minimum age of the sample is 20)
• Antilog of regression coefficients: odds ratios
  $e^{-0.2654} = 0.767$: the odds of being a smoker are multiplied by 0.767 (about 23% lower) for each one-year increase in age, holding sex constant
  $e^{1.6549} = 5.233$: the odds of being a smoker are 5.233 times as high for males as for females, holding age constant
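The antilogs above can be reproduced from the rounded coefficients. This is a minimal Python sketch for checking the arithmetic; the dictionary layout and variable names are assumptions made for illustration.

```python
import math

# Coefficients as reported in the fitted equation above (rounded to 4 decimals).
coef = {"Intercept": 6.7441, "age": -0.2654, "male": 1.6549}

# Exponentiating a slope gives the odds ratio for a one-unit increase
# in that predictor, with the other predictor held fixed.
odds_ratios = {name: math.exp(b) for name, b in coef.items() if name != "Intercept"}

# Percent change in the odds per additional year of age.
pct_change_age = (odds_ratios["age"] - 1.0) * 100.0

for name, value in odds_ratios.items():
    print(f"odds ratio for {name}: {value:.3f}")
print(f"percent change in odds per year of age: {pct_change_age:.1f}%")
```

An odds ratio below 1 corresponds to a negative coefficient (odds shrink as the predictor increases), and an odds ratio above 1 corresponds to a positive coefficient.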


Predicted Probability

$p(y=1) = \frac{e^{a + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k}}{1 + e^{a + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k}}$

$\operatorname{logit}[p(y=1)] = 6.7441 - 0.2654\,\text{age} + 1.6549\,\text{male}$

• The predicted probability of being a smoker for a male of age 30:

$p(y=1) = \frac{e^{6.7441 + (-0.2654 \times 30) + (1.6549 \times 1)}}{1 + e^{6.7441 + (-0.2654 \times 30) + (1.6549 \times 1)}} = .607$
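The same calculation can be sketched in Python for any combination of age and sex. The function name `predicted_probability` and its argument names are hypothetical; the default coefficients are the rounded estimates from the fitted equation, so the result for a 30-year-old male comes out at about .61, in line with the value reported above.

```python
import math

def predicted_probability(age, male, a=6.7441, b_age=-0.2654, b_male=1.6549):
    """Predicted probability of being a smoker from the fitted logit equation.

    Defaults are the rounded estimates shown above; male is coded 1 for
    males and 0 for females (the reference).
    """
    eta = a + b_age * age + b_male * male          # linear predictor (logit scale)
    return math.exp(eta) / (1.0 + math.exp(eta))   # back-transform to a probability

p_male_30 = predicted_probability(age=30, male=1)
p_female_30 = predicted_probability(age=30, male=0)

print(f"male, age 30:   {p_male_30:.3f}")
print(f"female, age 30: {p_female_30:.3f}")
```

Because the model is linear on the logit scale rather than the probability scale, the difference between male and female predicted probabilities depends on the age at which it is evaluated.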


SAS Input:

    data smoker;
      input smoke male age;
      cards;
    0 0 20
    0 0 24
    0 0 36
    …
    1 1 28
    1 1 27
    1 1 30
    ;
    proc logistic data = smoker;
      model smoke(event='1') = age male;
      output out = predict p = pi_hat;
    run;

Notes:
• If male is treated as categorical, use the CLASS statement. By default, the last category of a categorical predictor is the reference. In this example, male is treated as continuous.
• By default, the first category of the outcome (0 in this case) is the event of success. Change the default using (event='1').
• The OUTPUT statement saves the predicted probabilities (pi_hat) in the data set predict.


SAS Output:

The LOGISTIC Procedure

Model Information
Data Set                     WORK.SMOKER
Response Variable            smoke
Number of Response Levels    2
Model                        binary logit
Optimization Technique       Fisher's scoring

Number of Observations Read  41
Number of Observations Used  41

Response Profile
Ordered Value    smoke    Total Frequency
1                0        23
2                1        18

Probability modeled is smoke=1.


SAS Output:

Analysis of Maximum Likelihood Estimates
Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept    1      6.7441     2.6565            6.4449             0.0111
age          1     -0.2654     0.0914            8.4311             0.0037
male         1      1.6549     0.8325            3.9517             0.0468

Odds Ratio Estimates
Effect    Point Estimate    95% Wald Confidence Limits
age       0.767             0.641     0.917
male      5.233             1.024    26.751

If confidence limits do not include 1, that indicates statistical significance at alpha = .05.
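The Wald chi-square and the 95% Wald confidence limits in the output can be reproduced, up to rounding, from each estimate and its standard error. This is a hedged Python sketch: the helper name `wald_summary` is hypothetical, and it assumes the usual Wald formulas, chi-square = (estimate / standard error)^2 and limits = exp(estimate ± 1.96 × standard error).

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

def wald_summary(estimate, std_error):
    """Return (Wald chi-square, odds ratio, lower limit, upper limit)."""
    chi_square = (estimate / std_error) ** 2
    odds_ratio = math.exp(estimate)
    lower = math.exp(estimate - Z_975 * std_error)
    upper = math.exp(estimate + Z_975 * std_error)
    return chi_square, odds_ratio, lower, upper

# Estimates and standard errors copied from the SAS output above.
results = {}
for name, est, se in [("age", -0.2654, 0.0914), ("male", 1.6549, 0.8325)]:
    chi_square, odds_ratio, lower, upper = wald_summary(est, se)
    significant = lower > 1.0 or upper < 1.0  # interval excludes 1
    results[name] = (chi_square, odds_ratio, lower, upper, significant)
    print(f"{name}: chi-square={chi_square:.4f}, OR={odds_ratio:.3f}, "
          f"95% CI=({lower:.3f}, {upper:.3f}), excludes 1: {significant}")
```

Small differences in the last decimal place are expected because the inputs here are the rounded values printed in the output.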