This post presents the assumptions of ordinary least squares (OLS) regression. These assumptions are important for understanding when OLS will and will not give useful results. The objective of this post is to define the assumptions of ordinary least squares; a follow-up post will address methods for identifying violations of these assumptions and potential solutions for dealing with them.
ASSUMPTION #1: The conditional distribution of the error term, given a value of the independent variable X, has a mean of zero: E(u|X) = 0.
This assumption states that the regression errors are, on average, equal to zero. Individual predictions may still over- or underestimate Y, but the OLS estimates fluctuate around the true value of Y.
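As a quick illustration, here is a minimal NumPy sketch (the data-generating process and all variable names are invented for this example) that simulates data satisfying E(u|X) = 0 and fits OLS via the normal equations. Note that when the regression includes an intercept, the fitted residuals average to zero by construction; the assumption itself concerns the unobserved errors u.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate data satisfying E(u|X) = 0: errors are drawn independently of x.
n = 1000
x = rng.uniform(0, 10, n)
u = rng.normal(0, 2, n)                 # mean-zero errors, unrelated to x
y = 1.5 + 0.8 * x + u                   # true intercept 1.5, true slope 0.8

# Fit OLS (intercept + slope) via least squares.
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
residuals = y - X @ beta

print(beta)              # estimates close to the true [1.5, 0.8]
print(residuals.mean())  # ~0 to machine precision (a property of the fit)
```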
ASSUMPTION #2: (Xi, Yi), i = 1, ..., n, are independently and identically distributed.
The second assumption requires that the observations of X and Y are not systematically chosen in a way that is biased. Randomly selected samples of X and Y are typically considered to be independently and identically distributed. This assumption is important when the aim of the regression analysis is to estimate the effect of a treatment X on an outcome Y. If the treatment isn't randomly assigned, there is no guarantee that the outcome Y is caused by X. Suppose one is evaluating a program that provides job training to prisoners and would like to evaluate its success. If the program is voluntary, then the treatment X is likely not randomly assigned with respect to other factors that affect the success of prisoners after the program. If married first-time offenders with children are more likely to participate in the program and are also more likely to succeed in the job market after prison, then the observations are not independently and identically distributed, which violates this assumption.
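The self-selection problem above can be sketched with a short simulation (all names and numbers here are hypothetical, chosen only to illustrate the mechanism): the true treatment effect is zero, but a confounder raises both the chance of participating and the outcome, so OLS on the self-selected sample reports a large spurious "effect."

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Hypothetical confounder (e.g., pre-existing advantages) that raises both
# program participation and post-release job-market success.
advantage = rng.normal(0, 1, n)
treatment = (advantage + rng.normal(0, 1, n) > 0).astype(float)  # self-selected
outcome = 0.0 * treatment + 2.0 * advantage + rng.normal(0, 1, n)
# True treatment effect is 0, but treated units have higher `advantage`.

X = np.column_stack([np.ones(n), treatment])
beta = np.linalg.lstsq(X, outcome, rcond=None)[0]
print(beta[1])  # well above 0: bias from non-random assignment
```

Randomly assigning `treatment` instead (e.g., `treatment = rng.integers(0, 2, n)`) would drive the estimated coefficient back toward the true value of zero.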
ASSUMPTION #3: Large outliers are unlikely.
Outliers are values in the data that lie far outside the general range of the data. The presence of large outliers can make the regression results misleading. OLS weighs each (X, Y) pair equally, so a single outlier can greatly affect the slope and intercept of the regression line.
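A small sketch of this sensitivity (the data here are simulated purely for illustration): fitting the same line with and without one extreme point shows how much a single observation can move the OLS slope.

```python
import numpy as np

rng = np.random.default_rng(2)

# Clean data generated with a true slope of 1.
x = np.arange(20, dtype=float)
y = x + rng.normal(0, 0.5, 20)

def ols_slope(x, y):
    """Return the OLS slope from a regression of y on x with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

slope_clean = ols_slope(x, y)

# Add a single large outlier at a high-leverage x value.
x_out = np.append(x, 19.0)
y_out = np.append(y, 100.0)
slope_out = ols_slope(x_out, y_out)

print(slope_clean, slope_out)  # one point noticeably changes the slope
```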
ASSUMPTION #4: No perfect multicollinearity.
Perfect multicollinearity occurs in a multiple regression analysis when one of the independent variables is a linear combination of the others. This makes estimation of the individual regression parameters impossible. Fundamentally, one is asking the regression analysis to answer an unanswerable question: what is the effect of a variable X on an outcome Y, holding constant a third variable Z that is itself a linear combination of X?
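This can be seen directly in the design matrix. In the hypothetical sketch below, a regressor z is constructed as an exact linear combination of x (and the intercept), so the matrix of regressors loses a rank and the normal equations no longer pin down unique coefficients.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.normal(0, 1, n)
z = 2 * x + 3  # z is an exact linear combination of x and the constant
y = 1 + x + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x, z])

# The design matrix has 3 columns but only rank 2, so X'X is singular and
# the normal equations X'X b = X'y have infinitely many solutions: the
# separate coefficients on x and z are not identified.
rank = np.linalg.matrix_rank(X)
print(rank)  # 2, not 3
# np.linalg.inv(X.T @ X) would fail (or return numerically meaningless values)
```

Dropping one of the collinear columns (here, either x or z) restores a full-rank design matrix and uniquely determined estimates.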