Logistic Regression: Modeling a Categorical Outcome Variable
Path Analysis
In This Module
In this module, you will learn:
– Descriptive statistics to summarize binary data
– Logistic transformation or logit
– Logistic regression model for a binary outcome
– Computation of probabilities from logistic regression equations
– Interpretation of the logistic regression model using the odds and odds ratio
Examples of Binary Responses
• Heads or tails from a coin flip
• Correct/incorrect for a math test item
• High school dropout (yes/no)
• Drug use (yes/no)
• Illness or cancer (yes/no)
• Smoking (yes/no)
How to summarize the binary responses
• Probability of success: $p(y=1)$
  e.g., probability of being a smoker = .75
• Odds: $\frac{p(y=1)}{1 - p(y=1)}$
  e.g., odds of being a smoker = .75/.25 = 3
• Odds ratio, a measure of association:
  $\mathrm{OR} = \frac{p(y=1 \mid \text{male}) \,/\, [1 - p(y=1 \mid \text{male})]}{p(y=1 \mid \text{female}) \,/\, [1 - p(y=1 \mid \text{female})]} = \frac{\text{odds for males}}{\text{odds for females}}$
  e.g., probability of a male being a smoker = .75
  probability of a female being a smoker = .50
  OR = (.75/.25)/(.50/.50) = 3
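The arithmetic in the smoking example can be checked with a short DATA step. This is a minimal sketch: the variable names (p_male, p_female, and so on) are made up for illustration, and the two probabilities are the ones quoted in the example.

```sas
/* Odds and odds ratio from the two probabilities in the example */
data _null_;
  p_male   = 0.75;   /* probability that a male is a smoker   */
  p_female = 0.50;   /* probability that a female is a smoker */

  odds_male   = p_male   / (1 - p_male);     /* .75/.25 */
  odds_female = p_female / (1 - p_female);   /* .50/.50 */
  odds_ratio  = odds_male / odds_female;     /* males relative to females */

  put odds_male= odds_female= odds_ratio=;   /* written to the SAS log */
run;
```

The log should show odds of 3 for males and 1 for females, giving an odds ratio of 3.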
Logistic regression models the log odds (logit) of success as a linear function of a set of predictors:
$\operatorname{logit}[p(y=1)] = 6.7441 - 0.2654\,\text{age} + 1.6549\,\text{male}$
where y=1 means being a smoker and females are the reference.
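For reference, the logit and its inverse can be written out explicitly. The symbol $\eta$ for the linear predictor is introduced here only for illustration; it does not appear in the slides.

```latex
\operatorname{logit}(p) = \ln\!\left(\frac{p}{1-p}\right) = \eta
\qquad\Longleftrightarrow\qquad
p = \frac{e^{\eta}}{1 + e^{\eta}},
\qquad \text{with } \eta = 6.7441 - 0.2654\,\text{age} + 1.6549\,\text{male}
```

The left-hand form is what the model estimates; the right-hand form converts a fitted logit back into a probability.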
How do we interpret the intercept and regression coefficients?
Interpretation of Logistic Regression Parameters
$\operatorname{logit}[p(y=1)] = 6.7441 - 0.2654\,\text{age} + 1.6549\,\text{male}$
• Antilog of the intercept: $e^{6.74} = 845.56$, the odds of being a smoker for a female at age = 0 (not meaningful and not interpretable; the minimum age in the sample is 20)
• Antilog of a regression coefficient: odds ratio
  – $e^{-0.2654} = 0.767$: each one-year increase in age multiplies the odds of being a smoker by 0.767 (a decrease of about 23%), holding sex constant
  – $e^{1.6549} = 5.233$: the odds of being a smoker are 5.233 times as high for males as for females of the same age
• Predicted probability of being a smoker for a 30-year-old male:
  $p(y=1) = \frac{e^{6.7441 + (-0.2654 \times 30) + (1.6549 \times 1)}}{1 + e^{6.7441 + (-0.2654 \times 30) + (1.6549 \times 1)}} = .607$
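These quantities can be reproduced from the fitted coefficients with a DATA step. The sketch below assumes the three estimates shown in the equation; the variable names (b0, b_age, b_male, eta, p_hat) are illustrative and not part of the original example.

```sas
/* Odds ratios and a predicted probability from the fitted equation */
data _null_;
  b0     =  6.7441;   /* intercept                          */
  b_age  = -0.2654;   /* coefficient for age                */
  b_male =  1.6549;   /* coefficient for male (1 = male)    */

  or_age  = exp(b_age);    /* odds ratio per one-year increase in age */
  or_male = exp(b_male);   /* odds ratio, males versus females        */

  age  = 30;
  male = 1;
  eta   = b0 + b_age*age + b_male*male;   /* logit for a 30-year-old male */
  p_hat = exp(eta) / (1 + exp(eta));      /* inverse logit                */

  put or_age= or_male= eta= p_hat=;       /* written to the SAS log */
run;
```

The odds ratios should come out near 0.767 and 5.233, the logit is 0.437, and the predicted probability is roughly .61.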
SAS Input:
data smoker;
input smoke male age;
cards;
0 0 20
0 0 24
0 0 36
…
1 1 28
1 1 27
1 1 30
;
proc logistic data = smoker;
model smoke(event='1') = age male;
output out = predict p = pi_hat;
run;

Notes:
• If male is treated as categorical, use the CLASS statement. By default, the last category of a categorical predictor is the reference. In this example, male is treated as continuous.
• By default, the first category of the outcome (0 in this case) is the event of success. Change the default using (event='1').
• The OUTPUT statement saves the predicted probabilities (pi_hat) in the data set predict.
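If male were declared categorical instead, the call might look like the sketch below. It assumes the smoker data set created above. The REF= and PARAM= options are added here as assumptions, not taken from the slides, to make females (male = 0) the reference level with reference-cell coding, so that the male coefficient stays comparable to the continuous-coded model.

```sas
/* Sketch: treat male as a categorical predictor with females as reference */
proc logistic data = smoker;
  class male (ref = '0') / param = ref;   /* reference-cell (dummy) coding */
  model smoke(event = '1') = age male;    /* model the probability smoke=1 */
run;
```

Without PARAM=REF, PROC LOGISTIC uses effect coding for CLASS variables, and the estimate for male would not equal the log odds ratio directly.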
SAS Output:
The LOGISTIC Procedure

Model Information
Data Set                      WORK.SMOKER
Response Variable             smoke
Number of Response Levels     2
Model                         binary logit
Optimization Technique        Fisher's scoring

Number of Observations Read   41
Number of Observations Used   41

Response Profile
Ordered Value   smoke   Total Frequency
1               0       23
2               1       18

Probability modeled is smoke=1.
SAS Output:
Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept   1     6.7441    2.6565           6.4449            0.0111
age         1    -0.2654    0.0914           8.4311            0.0037
male        1     1.6549    0.8325           3.9517            0.0468

Odds Ratio Estimates
Effect   Point Estimate   95% Wald Confidence Limits
age      0.767            0.641    0.917
male     5.233            1.024   26.751
If the 95% confidence limits for an odds ratio do not include 1, the effect is statistically significant at alpha = .05.
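The Wald statistics and confidence limits in the output follow directly from each estimate and its standard error. The sketch below recomputes them for age and male. The data set name wald_check and its variable names are hypothetical; the estimates and standard errors are copied from the output.

```sas
/* Recompute Wald chi-squares, p-values, and 95% Wald limits for the odds ratios */
data wald_check;
  input effect $ estimate stderr;
  z          = quantile('normal', 0.975);    /* about 1.96 for a 95% interval */
  wald_chisq = (estimate / stderr)**2;       /* Wald chi-square with 1 df     */
  p_value    = 1 - probchi(wald_chisq, 1);   /* Pr > ChiSq                    */
  odds_ratio = exp(estimate);                /* point estimate                */
  or_lower   = exp(estimate - z*stderr);     /* lower 95% Wald limit          */
  or_upper   = exp(estimate + z*stderr);     /* upper 95% Wald limit          */
  cards;
age  -0.2654 0.0914
male  1.6549 0.8325
;
run;

proc print data = wald_check noobs;
run;
```

The printed values should agree with the tables above up to rounding, since the inputs are themselves rounded to four decimals. Both intervals exclude 1, which matches the two p-values below .05.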