Copyright 2013 The Analysis Factor http://TheAnalysisFactor.com
Data Analysis Brown Bag: September 2013
Influential Cases in Regression
Leverage (hij): the influence of any given observed value (Yi) on any specific predicted value (Pj)
Cook's distance: the combined change in all of the regression coefficients, scaled by the residual mean square, attributable to the deletion of case j
Dfbeta: the change in individual coefficients that occurs when a case is deleted
Mahalanobis's distance: the distance between an observation's values on the predictor variables and the mean of all cases; a multivariate statistic
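The four diagnostics above can be computed directly for a one-predictor regression. This is a minimal pure-Python sketch, assuming a single predictor and ordinary least squares; the function names `fit_line` and `influence` and the example data are hypothetical, not from the slides. Leverage is the hat-matrix diagonal hii, Cook's D uses the standard closed form, DFBETA is the unscaled difference b - b(-i) obtained by refitting without each case, and the Mahalanobis distance is the squared distance of each x from its mean in sample-variance units. The "dfBetas" plotted in software output are often the scaled version (DFBETAS, divided by a standard error), which is not computed here.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = b0 + b1*x; returns (b0, b1)."""
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def influence(xs, ys):
    """Per-case leverage, Cook's D, unscaled DFBETA and squared Mahalanobis distance.
    xs and ys must be lists (slicing is used to drop one case at a time)."""
    n, p = len(xs), 2  # p = number of coefficients (intercept and slope)
    b0, b1 = fit_line(xs, ys)
    xbar = mean(xs)
    sxx = sum((x - xbar) ** 2 for x in xs)
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    mse = sum(e ** 2 for e in resid) / (n - p)  # residual mean square
    rows = []
    for i in range(n):
        # Leverage: diagonal element h_ii of the hat matrix for a one-predictor model
        h = 1 / n + (xs[i] - xbar) ** 2 / sxx
        # Cook's D from the closed form e_i^2 / (p * MSE) * h_ii / (1 - h_ii)^2
        cooks_d = resid[i] ** 2 / (p * mse) * h / (1 - h) ** 2
        # DFBETA: refit without case i and record how much each coefficient moves
        c0, c1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        # Squared Mahalanobis distance of x_i from the predictor mean (sample variance)
        md2 = (xs[i] - xbar) ** 2 / (sxx / (n - 1))
        rows.append({"leverage": h, "cooks_d": cooks_d,
                     "dfbeta": (b0 - c0, b1 - c1), "mahalanobis2": md2})
    return rows


# Hypothetical data: the last case has an extreme x value
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 25.0]
for i, r in enumerate(influence(xs, ys)):
    print(i, round(r["leverage"], 3), round(r["cooks_d"], 3),
          [round(d, 3) for d in r["dfbeta"]], round(r["mahalanobis2"], 3))
```

With one predictor, leverage and Mahalanobis distance carry the same information (hii = 1/n + MD²/(n - 1)), while Cook's D also depends on the residual, so a high-leverage case that sits on the fitted line has a small Cook's D.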
Influential Cases
Cook’s D
dfBetas for β0 and β1
Techniques for dealing with outliers
1. Keep it and treat it as any other point
2. Trimming (dropping the extreme values)
3. Winsorizing
   Type I: Assign it a value closer to the center, often the 95th percentile or 2 standard deviations from the mean
   Type II: Assign it a lesser weight
4. Transformations
5. Robust Statistics
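Trimming and Type I winsorizing can be sketched in a few lines. This is a minimal illustration, assuming a symmetric cut of k = floor(prop × n) observations per tail for the percentile versions and a mean ± 2 SD clamp for the distance version; the function names and data are hypothetical, and Type II (down-weighting) is not shown.

```python
from statistics import mean, stdev


def trim(values, prop=0.05):
    """Trimming: sort and drop k = floor(prop * n) observations from each tail."""
    xs = sorted(values)
    k = int(prop * len(xs))
    return xs[k:len(xs) - k]


def winsorize(values, prop=0.05):
    """Type I winsorizing by percentile: pull the k lowest values up to the (k+1)-th
    smallest and the k highest down to the (k+1)-th largest, keeping the original order."""
    xs = sorted(values)
    k = int(prop * len(xs))
    lo, hi = xs[k], xs[len(xs) - 1 - k]
    return [min(max(v, lo), hi) for v in values]


def winsorize_sd(values, n_sd=2.0):
    """Type I winsorizing by distance: clamp values to mean +/- n_sd standard deviations."""
    m, s = mean(values), stdev(values)
    lo, hi = m - n_sd * s, m + n_sd * s
    return [min(max(v, lo), hi) for v in values]


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical sample with one outlier
print(mean(data))                      # 14.5: raw mean, pulled up by the outlier
print(mean(trim(data, 0.10)))          # 5.5: 10% trimmed mean
print(mean(winsorize(data, 0.10)))     # 5.5: 10% winsorized mean
print(winsorize_sd(data))              # 100 is clamped to mean + 2 SD
```

Trimming shrinks the sample (here from 10 to 8 values), while winsorizing keeps n fixed. In a sample this small the outlier also inflates the SD, so the 2 SD clamp still leaves the replaced value far from the bulk of the data.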
Transformations
Robust Statistics
1. Rank-based L-estimators
   Median
   Median Absolute Deviation: $\mathrm{MAD}\{k_i\} = \operatorname{median}_i\left\{\left|k_i - \operatorname{median}_j\{k_j\}\right|\right\}$
   Quantile Regression
2. Trimmed statistics
   k% trimmed mean and standard deviation
3. Maximum-likelihood-based M-estimators
   Huber weighting, fit by iteratively reweighted least squares (IRLS)
4. Resampling techniques
   Bootstrap and jackknife
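Several of the robust statistics listed above can be sketched for a single variable. This is a minimal pure-Python illustration under stated assumptions: all function names and data are hypothetical; the Huber estimate is of location only, uses the common tuning constant c = 1.345, and holds the scale fixed at the normalized MAD (a simplification of joint location and scale estimation); the trimmed SD is simply the SD of the retained values. Quantile regression is not sketched because it needs a linear-programming solver.

```python
import random
from statistics import mean, median, stdev


def mad(values, scale=1.0):
    """Median absolute deviation: median_i |k_i - median(k)|.
    scale=1.4826 rescales it to estimate the SD under normality."""
    m = median(values)
    return scale * median([abs(v - m) for v in values])


def trimmed_mean_sd(values, prop=0.10):
    """k% trimmed mean and SD: drop floor(prop * n) values from each tail first."""
    xs = sorted(values)
    k = int(prop * len(xs))
    kept = xs[k:len(xs) - k]
    return mean(kept), stdev(kept)


def huber_location(values, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location by iteratively reweighted least squares.
    Scale is held fixed at the normalized MAD (a simplifying assumption)."""
    mu = median(values)
    s = mad(values, scale=1.4826)
    if s == 0:
        return mu
    for _ in range(max_iter):
        # Huber weights: 1 within c scaled units of mu, c / |scaled residual| beyond
        w = [1.0 if abs(v - mu) / s <= c else c * s / abs(v - mu) for v in values]
        new_mu = sum(wi * v for wi, v in zip(w, values)) / sum(w)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


def bootstrap_se(values, stat=median, reps=2000, seed=1):
    """Bootstrap SE: SD of the statistic over resamples drawn with replacement."""
    rng = random.Random(seed)
    draws = [stat(rng.choices(values, k=len(values))) for _ in range(reps)]
    return stdev(draws)


def jackknife_se(values, stat=mean):
    """Jackknife SE from the n leave-one-out estimates of the statistic."""
    n = len(values)
    loo = [stat(values[:i] + values[i + 1:]) for i in range(n)]
    centre = mean(loo)
    return ((n - 1) / n * sum((t - centre) ** 2 for t in loo)) ** 0.5


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical sample with one outlier
print(median(data), mad(data))          # 5.5 2.5
print(trimmed_mean_sd(data, 0.10))
print(huber_location(data))             # near the median, far from the mean of 14.5
print(bootstrap_se(data, median), bootstrap_se(data, mean))
print(jackknife_se(data))               # for the mean this equals stdev(data) / sqrt(n)
```

The Huber weights leave typical points at full weight and shrink the outlier's weight in proportion to its distance, which is the "lesser weight" idea rather than outright removal.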
Advantages and Disadvantages
• Trimming and winsorizing both bias parameter estimates and standard errors, and both undervalue the outlier
• Winsorizing puts more weight on the full distribution and works better in symmetric distributions
• Effects present in the full data set can appear or disappear in winsorized or trimmed data
• In trimmed data, new outliers can appear
• Transformations can make interpretation difficult, but the full data are retained
• Many robust statistics are still being researched and/or may not be available for all methods