What is stepAIC in R?

Ashutosh Tripathi
Jun 16, 2019

In R, stepAIC (from the MASS package) is one of the most commonly used search methods for feature selection. It keeps adding or removing features to minimize the AIC value and so arrives at a final set of features. stepAIC is not necessarily meant to improve model performance; rather, it is used to simplify the model without hurting performance much. AIC estimates the relative amount of information lost through such a simplification. AIC stands for Akaike Information Criterion.
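As a concrete anchor for the definition: AIC is 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood of the model. The minimal R sketch below computes it by hand and checks it against base R's AIC(). The model (mpg on wt and hp from the built-in mtcars) is an illustrative assumption, not the model used in this article.

```r
# Illustrative model (assumption): mpg explained by wt and hp.
fit <- lm(mpg ~ wt + hp, data = mtcars)

ll <- logLik(fit)          # maximized log-likelihood
k  <- attr(ll, "df")       # for lm: coefficients plus the residual variance
aic_manual <- 2 * k - 2 * as.numeric(ll)

aic_manual                 # hand-computed AIC
AIC(fit)                   # base R's AIC() for comparison
```

For a linear model, k counts the intercept, each slope, and the error variance, so here k is 4.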

If we are given two models, we prefer the one with the lower AIC value. Hence AIC provides a means for model selection. AIC is only a relative measure: it ranks candidate models fitted to the same data, but says nothing about how good any single model is in absolute terms.
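In R, passing several fitted models to AIC() returns one row per model, which makes the "lower AIC wins" rule easy to apply. The two models below are hypothetical examples on the built-in mtcars, chosen only for illustration:

```r
# Two illustrative candidate models (assumptions, not from the article).
m_small <- lm(mpg ~ wt, data = mtcars)
m_large <- lm(mpg ~ wt + qsec + am, data = mtcars)

# AIC() on several models returns a data frame with columns df and AIC.
scores <- AIC(m_small, m_large)
scores

# Name of the preferred (lowest-AIC) model.
rownames(scores)[which.min(scores$AIC)]
```

Both models use the same 32 rows and the same response, which is what makes their AIC values comparable.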

AIC is similar to adjusted R-squared in that it also penalizes adding more variables to the model. The absolute value of AIC has no significance on its own; we only look at whether AIC increases or decreases as variables are added. Likewise, among multiple models, the one with the lower AIC value is preferred.
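One way to see why only differences matter: R itself uses two AIC conventions. AIC() uses the full Gaussian log-likelihood, while extractAIC() (the values stepAIC prints) leaves out terms that are constant for a given data set. The sketch below, using two illustrative models on the built-in mtcars, shows that the absolute numbers differ but the change in AIC from adding qsec is the same under both:

```r
fit1 <- lm(mpg ~ wt, data = mtcars)          # illustrative smaller model
fit2 <- lm(mpg ~ wt + qsec, data = mtcars)   # illustrative larger model

# Absolute values differ between the two conventions...
c(AIC = AIC(fit1), extractAIC = extractAIC(fit1)[2])

# ...but the change caused by adding qsec is identical, because the two
# conventions differ only by a constant for a fixed data set.
delta_full    <- AIC(fit2) - AIC(fit1)
delta_extract <- extractAIC(fit2)[2] - extractAIC(fit1)[2]
c(AIC = delta_full, extractAIC = delta_extract)
```

A negative change means adding the variable was worth its penalty.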

So let's see how stepAIC works in R. We will use the mtcars data set. First, remove the feature "x" by setting it to NULL, since it contains only the car model names, which carry little meaning here. (The built-in mtcars keeps the car names as row names, so this extra column appears only when the data is read from a CSV file.) Then remove the rows that contain missing (NA) values in any column using the na.omit function. This step is required: if different candidate models end up using different rows because of missing values, stepAIC stops with an error. Finally, build the model and run stepAIC. For this, we need the MASS package (which provides stepAIC) and the car package.
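The steps above might look like the sketch below. The original code is not shown, so several things here are assumptions: the CSV round trip (used only to recreate a file whose first column holds the car names; read.csv() names that unnamed column X, presumably the column called "x" above), the variable names, and mpg as the response. The car package is not needed by stepAIC itself, so it is left out.

```r
library(MASS)   # provides stepAIC()

# Assumption: the data arrive as a CSV whose first, unnamed column holds the
# car names. Recreate such a file from the built-in mtcars.
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path)
car_data <- read.csv(csv_path)   # the car-name column is read in as "X"

car_data$X <- NULL               # drop the car-name column
car_data <- na.omit(car_data)    # drop rows with any NA so all models use the same rows

# Assumption: mpg is the response and every other column is a candidate feature.
full_model <- lm(mpg ~ ., data = car_data)

# trace = FALSE suppresses the step-by-step printout.
step_model <- stepAIC(full_model, direction = "both", trace = FALSE)
summary(step_model)
```

Setting trace = TRUE instead prints the AIC table at every step, which is useful for following the search.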

The first argument to stepAIC is the fitted model object, and the direction argument specifies which feature selection technique to use. It can take the following values:

  • “both” (for stepwise regression, both forward and backward selection);
  • “backward” (for backward selection) and
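  • “forward” (for forward selection; this also needs a scope argument describing the largest model to search, since by default no new terms can be added).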
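The three settings of direction can be compared side by side. This is a sketch assuming mpg as the response on the built-in mtcars; search_scope is an illustrative name for the list giving the smallest and largest models the search may visit, which "forward" and "both" need when they start from an intercept-only model:

```r
library(MASS)

full_model <- lm(mpg ~ ., data = mtcars)   # all candidate features
null_model <- lm(mpg ~ 1, data = mtcars)   # intercept only

# Largest model to search: every column except the response.
upper <- reformulate(setdiff(names(mtcars), "mpg"))
search_scope <- list(lower = ~ 1, upper = upper)

# backward: start from the full model and drop terms.
back <- stepAIC(full_model, direction = "backward", trace = FALSE)

# forward: start from the null model and add terms allowed by the scope.
fwd <- stepAIC(null_model, direction = "forward",
               scope = search_scope, trace = FALSE)

# both: at each step, consider adding or dropping a term.
both <- stepAIC(null_model, direction = "both",
                scope = search_scope, trace = FALSE)

lapply(list(backward = back, forward = fwd, both = both), formula)
```

The three searches are greedy and start from different places, so they can end at different feature sets.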

At the final step, stepAIC produced the final set of features {drat, wt, gear, carb}. Because it discards features that add little information, stepAIC can also drop redundant, highly correlated predictors and so reduce multicollinearity, which I will explain in an upcoming article.
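To see how stepAIC arrived at its final feature set, inspect the anova component of the returned model, which MASS documents as a record of the steps taken in the search, and pull the selected terms out of the model's terms. This sketch assumes mpg as the response on the built-in mtcars; the features it selects depend on the data and response, so they need not match the set reported above.

```r
library(MASS)

full_model <- lm(mpg ~ ., data = mtcars)   # assumption: mpg as the response
step_model <- stepAIC(full_model, direction = "both", trace = FALSE)

# Each row records one step: the term added (+) or dropped (-)
# and the AIC after that step.
step_model$anova

# The selected feature set as a character vector.
selected <- attr(terms(step_model), "term.labels")
selected
```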

In the previous post, Feature Selection Techniques in Regression Model, we learned how to perform stepwise regression, forward selection, and backward elimination in detail. stepAIC automates these techniques and returns the set of features with the lowest AIC found by its search.
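To make "automated" concrete, here is a hand-rolled backward elimination using base R's drop1(), which reports the AIC after dropping each term in turn. It is an illustrative sketch (the loop and variable names are hypothetical, and mpg on the built-in mtcars is an assumed example); it should end at the same terms as stepAIC(direction = "backward"), which repeats this kind of step for you.

```r
library(MASS)

full_model <- lm(mpg ~ ., data = mtcars)

# Manual backward elimination: drop1() gives the AIC after removing each
# term; the "<none>" row is the current model with nothing removed.
manual <- full_model
repeat {
  d <- drop1(manual)
  best <- which.min(d$AIC)
  if (rownames(d)[best] == "<none>") break   # no removal lowers AIC: stop
  manual <- update(manual, as.formula(paste(". ~ . -", rownames(d)[best])))
}

# The same search, automated.
auto <- stepAIC(full_model, direction = "backward", trace = FALSE)

sort(attr(terms(manual), "term.labels"))
sort(attr(terms(auto), "term.labels"))
```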

This article first appeared on the “Tech Tunnel” blog at https://ashutoshtripathi.com/2019/06/07/feature-selection-techniques-in-regression-model/
