Model Selection


In This Module

In this module, you will learn:
- Model selection in regression analysis: substantive knowledge and sequential methods (forward, backward, stepwise)
- Examples of model-building algorithms
- The phenomenon of R2 shrinkage and methods to estimate it
- Methods to describe the accuracy of predictions

Introduction

As soon as a researcher employs two or more predictor variables, the model expands to what is generally referred to as multiple regression. As we have seen, regression analysis generates a new variable Ŷ, which is simply a linear combination or composite of the predictor variables, a composite that is maximally correlated with the criterion Y. Multiple regression applications can be divided into those with goals of prediction and those with goals of explanation.

Introduction

The rationale behind the multiple-predictor approach is intuitively appealing: more detailed information about individuals or sampling units typically permits a researcher to make more precise predictions about performance than could be made with one piece of information. It is assumed that several predictors are correlated with the criterion in such a manner that more efficient prediction can be obtained by using all or some of the predictors as a set rather than any one of them alone. Statistical criteria for evaluating this assumption are the topic of this module.

Distinguishing Prediction from Explanation: In Regression

Prediction:
- Focus on R2
- Does not directly require theory
- More conceptual closure: one knows (more or less) how well one has predicted or will predict
- Pragmatic intentions

Explanation:
- Focus on regression weights (b)
- Intimately involved with theory
- Little conceptual closure: the current explanation is only good until a better theory comes along
- Practical applications are not necessary (enhanced understanding for its own sake)

Distinguishing Prediction from Explanation: In Regression

Prediction goal: obtain the largest value of R2 with the smallest number of predictors.

Explanation goal: identify the important variables and estimate the magnitude of the processes related to an outcome.

Variable Selection

Research has amply demonstrated that some variables, alone or in combination, have greater influence on regression estimates than others. The degree to which we can improve prediction depends on the intercorrelations among the predictors themselves, in addition to the correlation between each predictor and the criterion of interest.

Variable Selection

When a number of variables in a regression analysis do not appear to contribute significantly to the predictive power of the model, we can try to find a suitable subset of important or useful variables; that is, those variables that produce the minimum error sum of squares or, equivalently, the maximum R2. This could be accomplished by examining all possible subsets of predictors. However, for k independent variables this procedure would require computing statistics for 2^k equations!

Prediction Problem

Take, for instance, the following example of predicting achievement scores in 4th grade.

Special psychological assessments:
- Attitude toward school in grade 3
- Locus of control in grade 3
- Self-concept in grade 3

1024 possible regression equations! (N of equations = 2^k for finding optimum subsets over all subset sizes; 2^10 = 1024.)
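To make the 2^k explosion concrete, here is a minimal sketch (with hypothetical predictor names, not those of the example) that enumerates every candidate subset of ten predictors:

```python
from itertools import combinations

def all_subsets(predictors):
    """Enumerate every subset of the predictor list, including the empty
    (intercept-only) set -- 2**k candidate models in total."""
    subsets = []
    for r in range(len(predictors) + 1):
        subsets.extend(combinations(predictors, r))
    return subsets

preds = [f"x{i}" for i in range(1, 11)]  # 10 hypothetical predictors
print(len(all_subsets(preds)))  # 2**10 = 1024 candidate models
```

Even at ten predictors, fitting 1024 regressions is wasteful; the sequential methods below examine at most k models instead.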

Multiple Regression in Predictive Research

Several procedures have been developed for assessing predictor variables' contributions to the regression equation:
- Substantive knowledge (theory)
- Sequential methods (empirical): forward selection, backward selection, stepwise selection

Substantive Knowledge

The researcher's knowledge of the area under study is the most important tool when selecting a subset of variables for use in a model (Weisberg, 1985). The researcher needs to be judicious in the selection of predictors. Many researchers have abused selection procedures by "throwing everything in the hopper," often merely because the variables are available.

Substantive Knowledge: Working with a Small Number of Predictors

Cohen (1990) cautions against the indiscriminate use of variables. There are several good reasons for generally preferring to work with a small number of predictors:
- The principle of scientific parsimony
- Reducing the number of predictors improves the n/k ratio, providing a more stable equation
- After a certain point, the incremental validity of new variables is usually very low

Sequential Methods

The usefulness of variables in a predictive study may be determined empirically. Forward, stepwise, and backward selection procedures involve a partialling-out process; that is, they look at the contribution of a predictor with the effects of the other predictors partialled out or held constant.

These are step-type procedures that add or delete variables one at a time until, by some criterion, a reasonable stopping point is reached.

Semi-partial Correlation

Recall the semi-partial correlation from the last module. In multiple regression, we wish to partial the independent variables (predictors) from one another, but not from the dependent variable. In other words, we leave the dependent variable intact and do not partial out any variance attributable to the predictors.

[The slide's Venn diagram of Y, X1, and X2 labeled three quantities: r²_y(1.2), the squared semi-partial correlation between y and X1 with X2 partialled from X1 only; R²_y.12, the squared multiple correlation between y and the predictors X1 and X2; and r²_y2, the squared correlation between y and X2.]
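A minimal numeric sketch of the squared semi-partial correlation, using simulated data (the variable names and coefficients are illustrative, not taken from the module's example):

```python
import numpy as np

def squared_semipartial(y, x1, x2):
    """Squared semi-partial r^2_y(1.2): correlate y with the part of x1
    left over after x2 is partialled from x1 (y itself is left intact)."""
    b = np.polyfit(x2, x1, 1)          # regress x1 on x2
    x1_resid = x1 - np.polyval(b, x2)  # the part of x1 independent of x2
    return np.corrcoef(y, x1_resid)[0, 1] ** 2

# Simulated data: x1 and x2 are intercorrelated, and both relate to y.
rng = np.random.default_rng(0)
x2 = rng.normal(size=200)
x1 = 0.6 * x2 + rng.normal(size=200)
y = 0.5 * x1 + 0.3 * x2 + rng.normal(size=200)
print(squared_semipartial(y, x1, x2))
```

In the sample, the identity R²_y.12 = r²_y2 + r²_y(1.2) holds exactly, which is the decomposition the Venn diagram depicts.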

Semi-partial Correlation

R²_y.12…k denotes the squared multiple correlation for the k predictors. Consider the case of one dependent variable and three predictors. If the predictors correlate exactly zero with each other, the squared multiple correlation is the sum of the squared Pearson product-moment correlation coefficients (PPMCCs) between each predictor and the criterion:

R²_y.123 = r²_y1 + r²_y2 + r²_y3

[The slide's Venn diagram showed Y overlapping X1, X2, and X3, with the three predictors not overlapping one another.]

Semi-partial Correlation

If the predictors are correlated with each other, the squared multiple correlation is obtained as the sum of a series of sequentially higher-order squared semi-partial correlations:

R²_y.123 = r²_y1 + r²_y(2.1) + r²_y(3.12)

where r²_y1 is the squared correlation between y and variable 1; r²_y(2.1) is the squared semi-partial correlation between y and variable 2, with variable 1 partialled only from variable 2; r²_y(3.12) is the squared semi-partial correlation between y and variable 3, with variables 1 and 2 partialled only from variable 3; and R²_y.123 is the squared multiple correlation between y and variables 1, 2, and 3.
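This decomposition can be verified numerically. The sketch below simulates three intercorrelated predictors (all names and coefficients are illustrative) and checks that the sequential squared semi-partials sum to the squared multiple correlation:

```python
import numpy as np

def ols_r2(y, cols):
    """R^2 from OLS of y on the given predictor columns (intercept added)."""
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

def sq_semipartial(y, x, partialled):
    """Squared semi-partial: r^2 between y and the residual of x after the
    columns in `partialled` are removed from x (not from y)."""
    Z = np.column_stack([np.ones(len(x))] + partialled)
    beta, *_ = np.linalg.lstsq(Z, x, rcond=None)
    return np.corrcoef(y, x - Z @ beta)[0, 1] ** 2

rng = np.random.default_rng(1)
x1 = rng.normal(size=300)
x2 = 0.5 * x1 + rng.normal(size=300)   # predictors are intercorrelated
x3 = 0.4 * x2 + rng.normal(size=300)
y = x1 + 0.5 * x2 + 0.25 * x3 + rng.normal(size=300)

total = ols_r2(y, [x1, x2, x3])
parts = (np.corrcoef(y, x1)[0, 1] ** 2       # r^2_y1
         + sq_semipartial(y, x2, [x1])       # r^2_y(2.1)
         + sq_semipartial(y, x3, [x1, x2]))  # r^2_y(3.12)
print(total, parts)  # the two quantities agree to floating-point precision
```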

Forward Regression

Forward selection begins by finding the variable that produces the optimum one-variable model: start with the single best predictor. In the second step, the procedure finds the variable that, when added to the one already chosen, results in the largest reduction in the residual sum of squares (the largest increase in R2): add the best available predictor given what is already in the equation. The third step finds the variable that, when added to the two already chosen, gives the minimum residual sum of squares (maximum R2).

At each step the procedure asks: is the change in R2 significant? If yes, keep the variable and continue; if no, quit, leaving the non-significant predictor out of the model.

The process continues until no variable considered for addition provides a reduction in the sum of squares that is statistically significant at the level specified by the researcher. When the change in R2 is not statistically significant, the last variable is removed because it did not contribute statistically to the prediction. With this method, once a variable contributes significantly to R2, it stays in the model.
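The loop above can be sketched in code. This is a simplified illustration, not the exact SAS procedure: significance of the change in R2 is judged against a fixed F-to-enter threshold (here 4.0) rather than an exact p-value, and the function names are invented for illustration:

```python
import numpy as np

def ols_r2(y, cols):
    """R^2 from OLS of y on the given predictor columns (intercept added)."""
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

def forward_select(y, predictors, f_to_enter=4.0):
    """Forward selection: at each step add the candidate that raises R^2 the
    most; stop when its partial F for the change in R^2 falls below
    f_to_enter. `predictors` maps name -> column; returns names in entry order."""
    selected, r2_cur, n = [], 0.0, len(y)
    remaining = dict(predictors)
    while remaining:
        name, r2_new = max(
            ((m, ols_r2(y, [predictors[s] for s in selected] + [c]))
             for m, c in remaining.items()),
            key=lambda t: t[1])
        k = len(selected) + 1                  # predictors if we add this one
        f = (r2_new - r2_cur) / ((1 - r2_new) / (n - k - 1))
        if f < f_to_enter:
            break                              # change in R^2 not significant: quit
        selected.append(name)
        del remaining[name]
        r2_cur = r2_new
    return selected

rng = np.random.default_rng(2)
x1, x2, junk = rng.normal(size=(3, 200))
y = 2 * x1 + x2 + rng.normal(size=200)
print(forward_select(y, {"x1": x1, "x2": x2, "junk": junk}))
```

With ten candidate predictors this fits at most ten models, exactly the economy the slides describe.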

Forward Regression Algorithm: Step 1

Select the predictor with the largest squared PPMCC with the criterion.

Forward Regression Algorithm: Step 2

Select the predictor with the largest squared first-order semi-partial correlation, partialling the predictor already present. We now have a two-predictor equation, predicting y from X1 and X2.

Forward Regression Algorithm: Step 3

Select the predictor with the largest squared second-order semi-partial correlation, partialling the predictors already present.

Continue adding variables until you fail to reject on the ΔR2 test. At most k regression models will be estimated and examined; with 10 predictors this leads to at most 10 equations, versus 1024 equations if all possible subsets were considered.

Backward Regression

Backward selection begins by computing the regression with all independent variables specified in the model statement (the full regression equation): start with all predictors. The procedure then deletes from that model the variable whose coefficient has the largest p-value (smallest partial F value), i.e., the predictor with the smallest (k-1)th-order squared semi-partial correlation (based on the number of predictors in the equation).

At each step the procedure asks: is the change in R2 significant? If no, delete the variable; the resulting equation is then examined for the variable now contributing the least, which is deleted, and so on. If yes, quit, putting the significant predictor back into the model.

The procedure stops when all coefficients remaining in the model are statistically significant at the level specified by the researcher. With this method, once a non-significant variable has been deleted, it is deleted permanently.
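A matching backward-elimination sketch, under the same simplifying assumptions as before (a fixed F-to-stay threshold stands in for the researcher-chosen significance level, and the names are illustrative):

```python
import numpy as np

def ols_r2(y, cols):
    """R^2 from OLS of y on the given predictor columns (intercept added)."""
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

def backward_eliminate(y, predictors, f_to_stay=4.0):
    """Backward elimination: start from the full equation and repeatedly drop
    the variable whose removal costs the least R^2, as long as its partial F
    is below f_to_stay; once dropped, a variable never returns."""
    selected, n = list(predictors), len(y)
    while selected:
        r2_full = ols_r2(y, [predictors[m] for m in selected])
        k = len(selected)
        # R^2 of each model with one variable removed.
        reduced = {m: ols_r2(y, [predictors[s] for s in selected if s != m])
                   for m in selected}
        weakest = max(reduced, key=reduced.get)  # smallest loss when removed
        f = (r2_full - reduced[weakest]) / ((1 - r2_full) / (n - k - 1))
        if f >= f_to_stay:
            break        # every remaining coefficient is significant: quit
        selected.remove(weakest)
    return selected

rng = np.random.default_rng(3)
x1, x2, junk = rng.normal(size=(3, 200))
y = 2 * x1 + x2 + rng.normal(size=200)
print(backward_eliminate(y, {"x1": x1, "x2": x2, "junk": junk}))
```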

Backward Regression Algorithm: Step 0

Begin with all k regressors in the regression equation.

Backward Regression Algorithm: Step 1

Select the smallest of the k different (k-1)th-order squared semi-partial correlations (i.e., for 10 regressors, there are 10 different 9th-order semi-partials).

Backward Regression Algorithm: Step 2

Select the smallest of the k-1 different (k-2)th-order squared semi-partial correlations (i.e., for 9 remaining regressors, there are 9 different 8th-order semi-partials).

Continue removing variables until you reject on the ΔR2 test. Rejecting this test means there is no variable that can be removed without a statistically significant decrease in the ability to predict the criterion variable. At most k regression models will be estimated and examined: 10 equations (at most) vs. 1024.

Stepwise Regression

Stepwise selection begins like forward selection: start with the single best predictor, then add the best available predictor given what is already in the equation (at each step, the predictor with the largest squared semi-partial correlation, partialling the predictors already present). After a variable has been added, however, the resulting equation is examined to see whether any coefficient now has a sufficiently large p-value to suggest that a variable should be dropped.

At each step the procedure asks two questions. Is the change in R2 significant? If no, quit, leaving the non-significant predictor out of the model. If yes, are all other predictors still significant? If no, remove the non-contributing predictors before continuing.

This procedure continues until no additions or deletions are indicated according to the significance levels chosen by the researcher.
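The add-then-re-test cycle can be sketched as follows. As with the earlier sketches, fixed F-to-enter and F-to-remove thresholds stand in for exact significance tests, and all names are illustrative:

```python
import numpy as np

def ols_r2(y, cols):
    """R^2 from OLS of y on the given predictor columns (intercept added)."""
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

def stepwise_select(y, predictors, f_enter=4.0, f_remove=3.9):
    """Stepwise selection: forward entry, but after each entry every variable
    already in the model is re-tested and dropped if its partial F has fallen
    below f_remove (kept below f_enter so a just-entered variable cannot be
    removed immediately, which avoids cycling)."""
    selected, n = [], len(y)
    while True:
        # --- entry phase: best remaining candidate ---
        r2_cur = ols_r2(y, [predictors[m] for m in selected]) if selected else 0.0
        candidates = [m for m in predictors if m not in selected]
        if not candidates:
            break
        best = max(candidates,
                   key=lambda m: ols_r2(y, [predictors[s] for s in selected]
                                        + [predictors[m]]))
        r2_new = ols_r2(y, [predictors[s] for s in selected] + [predictors[best]])
        k = len(selected) + 1
        if (r2_new - r2_cur) / ((1 - r2_new) / (n - k - 1)) < f_enter:
            break                      # change in R^2 not significant: quit
        selected.append(best)
        # --- removal phase: re-check everything already entered ---
        dropped = True
        while dropped and len(selected) > 1:
            dropped = False
            r2_full = ols_r2(y, [predictors[m] for m in selected])
            k = len(selected)
            for m in list(selected):
                r2_wo = ols_r2(y, [predictors[s] for s in selected if s != m])
                if (r2_full - r2_wo) / ((1 - r2_full) / (n - k - 1)) < f_remove:
                    selected.remove(m)
                    dropped = True
                    break
    return selected

rng = np.random.default_rng(4)
x1, x2, junk = rng.normal(size=(3, 200))
y = 2 * x1 + x2 + rng.normal(size=200)
print(stepwise_select(y, {"x1": x1, "x2": x2, "junk": junk}))
```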

Stepwise Regression Algorithm: Step 1

Select the predictor with the largest squared PPMCC with the criterion.

Stepwise Regression Algorithm: Step 2

Select the predictor with the largest squared first-order semi-partial correlation, partialling the predictor already present.

Stepwise Regression Algorithm: Step 3

Select the predictor with the largest squared second-order semi-partial correlation, partialling the predictors already present. Continue adding (and, where indicated, deleting) variables until you fail to reject on the ΔR2 test.

Case 1: Prediction of Self-Concept Scores

A researcher is interested in predicting student self-concept scores in 7th grade, based on information gathered while the child is in kindergarten. The researcher obtained a sample of 200 kindergarten children and measured each child on a variety of variables:
- Self-concept in kindergarten
- Family cohesion
- Pre-reading inventory
- IQ
- Fundamental math concepts
- Fundamental science concepts
- Physical dexterity

When these 200 children were in the 7th grade, the researcher administered a self-concept inventory to obtain values of the criterion variable.

Case 1: Analysis

Use the forward, backward, and stepwise regression algorithms to compute the "best" equation predicting 7th-grade self-concept scores. In actual research, only one method is typically used, but we will use all three for practice.

Research questions:
1. Which variables should be used to predict 7th-grade self-concept?
2. What is the regression equation that should be used for prediction?
3. What is the value of R2 in this sample?

Case 1: Analysis

Since the computations are complex, we will not attempt to illustrate the computational algorithms by hand as we have done previously. Instead, we will rely on computer printouts for the analysis and interpretation. We will run these analyses in SAS, implementing the selection procedures as options in PROC REG. Let's start with the full regression model.

SAS Code: Full Regression

data one;
  input Concept7 6-9 conceptK 15-18 fam_coh 24-27 Pre_read 33-36
        IQ 42-45 math 51-54 science 60-63 pd 69-71;
  label concept7 = '7th Grade Self Concept'
        conceptk = 'Kindergarten Self Concept'
        fam_coh  = 'Family Cohesion'
        pre_read = 'Pre-Reading Score'
        IQ       = 'Estimated IQ'
        math     = 'Pre Math Score'
        science  = 'Early Science Thinking'
        pd       = 'Physical Dexterity';
cards;
proc reg simple;
  model Concept7 = conceptK fam_coh Pre_read IQ math science pd / stb;
run;

simple — requests simple descriptive statistics for all variables in the analysis
model — criterion variable = independent variables / options
stb — requests that the standardized regression coefficients be printed


SAS Output: Descriptive Statistics

Before interpreting the results of PROC REG, it is important to review descriptive statistics to help verify that no errors were made in keying data or writing the input statement.

The simple statistics table at the top of the output provides means and standard deviations for the eight variables analyzed.

SAS Output: Full Regression for All Variables

The 7 predictors account for about 56% of the variance of Concept7 (R2 = .5588); the overall test gives F(7, 192) = 34.73, p < .0001. Several individual predictors, however, have p > .05; IQ has the largest p-value and is the variable identified to be removed.

SAS Output: Backward Elimination Step 1

Based on the criterion used, IQ is deleted first, resulting in R2 = .5585. The process is repeated for each of the remaining predictors that do not meet the criterion for retention (pre_read and pd); each is analyzed to determine which would lead to the smallest reduction in R2 when deleted from the equation.

SAS Output: Backward Elimination Step 2

Based on the criterion used, pre_read is deleted from the equation; note that R2 = .5556. The process is repeated to identify an additional predictor that may be deleted. Note that pd does not meet the criterion for retention and will be deleted next. The deletion process continues as long as deleting the identified predictor results in a loss in predictability deemed not statistically significant.

SAS Output: Backward Method Step 3

Again, based on the criterion used, pd is removed from the equation; note that R2 = .5516. The remaining predictors all have probabilities small enough to meet the criterion for retention, so no further variables are deleted and the procedure stops.