# What is Multicollinearity?

• What is Multicollinearity?
• How is Multicollinearity related to correlation?
• Problems with Multicollinearity.
• Best way to detect multicollinearity in the model.
• How to handle/remove Multicollinearity from the model?

# Multicollinearity

Multicollinearity can arise in a multiple linear regression model, that is, a model with more than one predictor variable. It exists when one predictor variable (note: not the target variable) can be linearly predicted from the other predictor variables with a high degree of accuracy. High correlation between two or more predictor variables therefore implies multicollinearity. The converse does not hold: multicollinearity may exist even when every pairwise correlation among the predictors is low, because one predictor can still be close to a linear combination of several others.
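The last point, that multicollinearity can hide behind low pairwise correlations, is easier to see with numbers. The following is a minimal sketch on synthetic data, not a real data set: the seed, the sample size, the choice of ten predictors and the noise level are all arbitrary assumptions made for illustration. The tenth predictor is built as the sum of nine independent ones plus a little noise, so each pairwise correlation should stay modest (about 1/3 in theory) while the tenth predictor is almost perfectly predictable from the rest.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary fixed seed, for reproducibility
n = 2000                        # arbitrary sample size

# Nine independent predictors plus a tenth that is almost their sum.
independent = rng.normal(size=(n, 9))
x10 = independent.sum(axis=1) + rng.normal(scale=0.1, size=n)
X = np.column_stack([independent, x10])

# Largest absolute pairwise correlation, ignoring the diagonal of ones.
corr = np.corrcoef(X, rowvar=False)
max_pairwise = np.abs(corr - np.eye(X.shape[1])).max()

# Regress x10 on the other nine predictors (with an intercept) and get R^2.
A = np.column_stack([np.ones(n), independent])
coef, *_ = np.linalg.lstsq(A, x10, rcond=None)
residuals = x10 - A @ coef
r_squared = 1.0 - residuals.var() / x10.var()

print(f"max pairwise |correlation|: {max_pairwise:.2f}")
print(f"R^2 of x10 on the other predictors: {r_squared:.4f}")
```

No single pair of predictors looks alarming here, yet the R^2 close to 1 shows that the predictors, taken together, are strongly collinear.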

# Multicollinearity vs Correlation

The correlation coefficient tells us how strongly two variables vary together and whether they move in the same direction or in opposite directions. In other words, its sign gives the direction of the linear relationship between two variables, and its absolute value tells how strong that linear relationship is. A correlation coefficient of zero means there is no linear relationship; the variables may still be related nonlinearly. Correlation is a pairwise measure, whereas multicollinearity concerns a linear relationship among two or more predictors taken together.
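A small sketch can make the "zero correlation does not mean no relationship" point concrete. The data below are made up for illustration: `x` is a grid symmetric around zero, one variable depends on it linearly, one linearly with a negative slope, and one quadratically. It uses NumPy's `np.corrcoef` to compute the Pearson correlation.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 201)  # grid symmetric around zero

y_increasing = 3.0 * x + 1.0     # exact linear relationship, positive slope
y_decreasing = -2.0 * x + 5.0    # exact linear relationship, negative slope
y_quadratic = x ** 2             # fully determined by x, but not linearly

r_increasing = np.corrcoef(x, y_increasing)[0, 1]
r_decreasing = np.corrcoef(x, y_decreasing)[0, 1]
r_quadratic = np.corrcoef(x, y_quadratic)[0, 1]

print(f"increasing line: r = {r_increasing:+.3f}")
print(f"decreasing line: r = {r_decreasing:+.3f}")
print(f"quadratic curve: r = {r_quadratic:+.3f}")
```

The two straight lines give correlations of +1 and -1, while the quadratic gives a correlation of essentially zero even though `y_quadratic` is completely determined by `x`.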

# Problems with Multicollinearity

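When multicollinearity exists, the model cannot estimate the regression coefficients with confidence. Many different sets of coefficients fit the data almost equally well, so the individual coefficient values carry little statistical meaning. For example, if two predictors x1 and x2 are (nearly) identical, all of the following describe the data equally well: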

• y = x1 + x2 ?
• y = 2x1 ?
• y = 2x2 ?
• y = 2.5x1 - 0.5x2 ?
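The four candidate models listed above can be checked directly. This sketch assumes the list is meant for the case where x1 and x2 are exactly identical; the data values are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical data in which the two predictors are identical (x1 == x2).
x1 = [1.0, 2.0, 3.5, 4.0]
x2 = list(x1)

# The four candidate coefficient pairs (b1, b2) from the list above.
candidates = [(1.0, 1.0), (2.0, 0.0), (0.0, 2.0), (2.5, -0.5)]

# Predictions y = b1 * x1 + b2 * x2 for every candidate.
predictions = [
    [b1 * a + b2 * b for a, b in zip(x1, x2)]
    for b1, b2 in candidates
]

for (b1, b2), pred in zip(candidates, predictions):
    print(f"b1 = {b1:+.1f}, b2 = {b2:+.1f} -> predictions {pred}")

# True when every candidate yields the same predictions as the first one.
all_same = all(
    math.isclose(p, q)
    for pred in predictions
    for p, q in zip(pred, predictions[0])
)
print("identical predictions:", all_same)
```

Because every candidate predicts the same values, the data alone cannot tell the coefficient pairs apart, which is why the fitted coefficients become unreliable.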

# Best way to detect multicollinearity

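Stepwise regression can reduce the multicollinearity problem to a great extent. However, the most reliable way to check whether multicollinearity exists is to calculate the variance inflation factor (VIF) of each predictor. The VIF of a predictor is 1 / (1 - R²), where R² is obtained by regressing that predictor on all the other predictors: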

• x1 = b1 + b2x2 + b3x3
• x2 = b1 + b2x1 + b3x3
• x3 = b1 + b2x1 + b3x2
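The auxiliary regressions above translate into code fairly directly. Below is a minimal sketch using only NumPy, based on the standard definition VIF_j = 1 / (1 - R_j^2). The helper name `vif`, the seed and the synthetic predictors are assumptions made for this example; `x3` is deliberately constructed as roughly `x1 + x2` so that the predictors are collinear.

```python
import numpy as np

def vif(X):
    """Return VIF_j = 1 / (1 - R_j^2) for each column j of X (illustrative helper)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Auxiliary regression: predictor j on all other predictors plus an intercept.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        residuals = target - A @ coef
        r_squared = 1.0 - residuals.var() / target.var()
        out[j] = 1.0 / (1.0 - r_squared)
    return out

rng = np.random.default_rng(1)  # arbitrary fixed seed
n = 1000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + rng.normal(scale=0.2, size=n)  # nearly a linear combination

vifs = vif(np.column_stack([x1, x2, x3]))
for name, value in zip(["x1", "x2", "x3"], vifs):
    print(f"VIF({name}) = {value:.1f}")
```

A VIF close to 1 means the predictor is barely explained by the others. Values above roughly 5 or 10 are commonly cited rules of thumb for problematic multicollinearity, though the exact cutoff is a judgment call. Note that this sketch divides by zero when a predictor is an exact linear combination of the others, since R² is then 1.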
