What is Multicollinearity?

Definition: Multicollinearity is a statistical phenomenon in which multiple independent variables show high correlation between each other. In other words, the variables used to predict the independent one are too inter-related.

What Does Multicollinearity Mean?

What is the definition of multicollinearity? Collinearity is an undesired situation for any statistical regression model since it diminishes the reliability of the model itself. If two or more independent variables are too correlated, the data obtained from the regression will be disturbed because the independent variables are actually dependent between each other.

Multicollinearity has different causes: one of the most common is the inclusion of variables that result from mathematical operations between two or more of the other variables in the model, e.g. net profit, which is computed by deducting total expenses from total revenues. Also, if the same kind of variable is used for the model, collinearity will always appear e.g. if you are measuring sales in both units and monetary figures the variable has the same kind.

Here’s a practical example of the concept.

Example

Medical Researchers LLC is a firm that provides research services for the health care industry. Most of its services are statistical; their goal is to identify patterns and trends in diseases by designing statistical models that allows them to understand the impact that different variables have in the occurrence of the disease.

The company is currently investigating skin cancer and as independent variables for the study they used current age, weight, height, profession, and age of appearance. After running the model, they identified some collinearity issues between the variables. The company that paid for the research project is asking the Project Manager to explain this phenomenon?

As we stated above, collinearity is the situation in which two or more dependent (predictor) variables are too related to each other, this phenomenon will reduce the reliability of the statistic model and it should be addressed before the model is put to use for real life situations. Thus, the studies should adjust the variables used in order to get a more accurate result.

Summary Definition

Define Multicollinearity: Multi-collinearity means a correlation between two variables that causes confusion in a study because the variables are too closely related.


error: Content is protected !!