# The Least Squares Assumptions

This post presents the ordinary least squares (OLS) assumptions. These assumptions are critical to understanding when OLS will and will not give useful results. A follow-up post will address methods for identifying violations of these assumptions and potential remedies.

ASSUMPTION #1: The conditional distribution of the error term, given a value of the independent variable X, has a mean of zero.

This assumption states that the OLS regression errors are, on average, equal to zero. Individual predictions may still over- or underestimate Y, but the estimates fluctuate around Y's actual value.
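A minimal sketch of this idea, using hypothetical simulated data: when the errors have conditional mean zero, the fitted OLS line recovers the true coefficients, and the residuals average out to (numerically exact) zero whenever an intercept is included.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
x = rng.uniform(0, 10, n)
u = rng.normal(0, 1, n)          # error term with mean zero at every x
y = 2.0 + 0.5 * x + u            # true intercept 2.0, true slope 0.5

# Fit y = b0 + b1*x by ordinary least squares.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

print(beta)               # estimates close to (2.0, 0.5)
print(residuals.mean())   # essentially zero
```

Note that a residual mean of zero is guaranteed mechanically once an intercept is in the model; the substantive content of the assumption is that the error mean is zero at *every* level of X, not just overall.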

ASSUMPTION #2: (Xi, Yi), i = 1, ..., n are independently and identically distributed.

The second assumption requires that the observations of X and Y are not systematically biased. Simple random samples of X and Y are typically considered independent and identically distributed (i.i.d.). This assumption is essential when the regression aims to estimate the effect of a treatment X on an outcome Y: if the treatment is not randomly assigned, there is no guarantee that X is causing Y. Imagine evaluating a program that provides job training to prisoners. If participation is voluntary, treatment X is likely not randomly assigned. If married first-offenders with children are more likely to participate in the program and are also more likely to succeed in the job market after prison, then the sample is not i.i.d., which violates this assumption.
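The job-training story can be sketched as a toy simulation (all numbers here are hypothetical): an unobserved trait drives both self-selection into treatment and the outcome, so the naive OLS slope overstates the true effect, while random assignment recovers it.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
trait = rng.normal(0, 1, n)  # unobserved trait (e.g., motivation)

# Voluntary participation: people with a higher trait self-select in.
treated = (trait + rng.normal(0, 1, n) > 0).astype(float)
# True treatment effect is 1.0, but the trait also raises the outcome.
outcome = 1.0 * treated + 2.0 * trait + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), treated])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
print(beta[1])   # well above the true effect of 1.0

# Random assignment breaks the link between treatment and the trait.
random_t = rng.integers(0, 2, n).astype(float)
outcome_r = 1.0 * random_t + 2.0 * trait + rng.normal(0, 1, n)
Xr = np.column_stack([np.ones(n), random_t])
beta_r, *_ = np.linalg.lstsq(Xr, outcome_r, rcond=None)
print(beta_r[1])  # close to 1.0
```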

ASSUMPTION #3: Large outliers are unlikely.

Outliers are observations that lie far outside the range of the rest of the data. The presence of large outliers can make regression results misleading. OLS weighs each (X, Y) pair equally, so a single outlier can substantially change the slope and intercept of the regression line.
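A quick illustration with hypothetical data: adding one extreme point to an otherwise well-behaved sample pulls the fitted slope well away from the slope estimated on the clean data alone.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x = np.linspace(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, n)  # true slope 2.0

def ols_slope(x, y):
    """Slope from an OLS fit of y on x with an intercept."""
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

clean_slope = ols_slope(x, y)              # near 2.0

# One extreme observation far above the rest of the data.
x_out = np.append(x, 10.0)
y_out = np.append(y, 200.0)
outlier_slope = ols_slope(x_out, y_out)

print(clean_slope, outlier_slope)  # the outlier drags the slope upward
```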

ASSUMPTION #4: No perfect multicollinearity.

Multicollinearity occurs in multiple regression analysis when one of the independent variables is a linear combination of the others. This perfect correlation between inputs makes estimating the individual regression parameters impossible. Fundamentally, one is asking the regression analysis to answer an unanswerable question: the effect of a variable X on another variable Y after holding constant a third variable Z that is itself a linear combination of X.
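This failure is visible directly in the linear algebra. In the hypothetical sketch below, a regressor Z is constructed as exactly X1 + X2, so the design matrix loses full column rank and X'X becomes singular, meaning the usual OLS formula has no unique solution.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
z = x1 + x2                      # exact linear combination of x1 and x2

# Design matrix: intercept, x1, x2, and the redundant z.
X = np.column_stack([np.ones(n), x1, x2, z])
print(np.linalg.matrix_rank(X))  # 3, not 4: rank-deficient

# X'X is therefore singular; its inverse does not exist, so the normal
# equations cannot pin down a unique coefficient vector.
XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))
```

In practice, dropping the redundant column restores a full-rank design matrix and a unique OLS solution.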
