**Introduction**

In regression analysis, variables can be endogenous for several reasons, including omitted variable bias, measurement error, and simultaneity or reverse causation. One example from the previous post was unobserved ability in the determination of wages: when ability is omitted from an analysis of education’s impact on wages, the estimated return to education is overstated.
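To make the direction of this bias concrete, here is a minimal simulation sketch. The data are synthetic and every coefficient (a true return of 0.06, the strength of the ability effect, the sample size) is an assumption chosen for illustration, not a value from the wage data discussed in this post.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

# Synthetic data: all coefficients below are illustrative assumptions.
ability = rng.normal(size=n)                      # unobserved in practice
educ = 12 + 1.5 * ability + rng.normal(size=n)    # ability raises schooling
lwage = 0.5 + 0.06 * educ + 0.3 * ability + rng.normal(scale=0.5, size=n)

const = np.ones(n)

# "Long" regression: ability is observed and included.
X_long = np.column_stack([const, educ, ability])
b_long, *_ = np.linalg.lstsq(X_long, lwage, rcond=None)

# "Short" regression: ability is omitted, as in real wage data.
X_short = np.column_stack([const, educ])
b_short, *_ = np.linalg.lstsq(X_short, lwage, rcond=None)

print(f"return to educ, ability included: {b_long[1]:.3f}")
print(f"return to educ, ability omitted:  {b_short[1]:.3f}")
```

Because ability raises both schooling and wages in this setup, the short regression attributes part of the ability effect to education and the estimated return comes out too high.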

**Useful books for understanding material:**

- Mostly Harmless Econometrics
- Introductory Econometrics: A Modern Approach
- Applied Econometrics with R

**The Hausman Test**

The Hausman Test for endogeneity can help us determine whether an explanatory variable in a regression, here education in a wage regression, is endogenous, for example because of omitted variable bias.

Since there is a suspicion that education (educ) suffers from omitted variable bias in the form of unobserved ability, we choose fathers’ and mothers’ schooling as instrumental variables. Parents’ education is unlikely to affect the wages of their children directly, but it is a good predictor of their children’s education and of the genetic transmission of intellectual ability. Thus parents’ education is potentially a useful instrumental variable. We can test the assumption that fathers’ and mothers’ education are relevant instruments by running a reduced form regression, with educ as the dependent variable, on all exogenous variables: the instruments and the exogenous explanatory variables from the wage equation.
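The reduced form regression and the joint F-test on the instruments can be sketched as follows. This is an illustration on synthetic data, not the post's data set: the variable names `fatheduc`, `motheduc`, and `exper` (a stand-in exogenous control) and all coefficients are assumptions.

```python
import numpy as np

def ols_resid(y, X):
    """Return the OLS residuals from regressing y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

rng = np.random.default_rng(0)
n = 5000

# Synthetic data: names and coefficients are illustrative assumptions.
ability = rng.normal(size=n)
fatheduc = rng.normal(12, 3, size=n)
motheduc = rng.normal(12, 3, size=n)
exper = rng.uniform(0, 30, size=n)   # hypothetical exogenous control
educ = 2 + 0.3 * fatheduc + 0.3 * motheduc + ability + rng.normal(size=n)

const = np.ones(n)

# Unrestricted reduced form: educ on all exogenous variables.
X_unres = np.column_stack([const, exper, fatheduc, motheduc])
# Restricted reduced form: the two instruments are dropped.
X_res = np.column_stack([const, exper])

e_unres = ols_resid(educ, X_unres)
e_res = ols_resid(educ, X_res)
ssr_unres = e_unres @ e_unres
ssr_res = e_res @ e_res

q = 2                              # number of restrictions (instruments)
df = n - X_unres.shape[1]          # residual degrees of freedom
f_stat = ((ssr_res - ssr_unres) / q) / (ssr_unres / df)

print(f"F-statistic for joint significance of the instruments: {f_stat:.1f}")
```

A large F-statistic indicates that the instruments jointly help explain education; a common rule of thumb treats values above roughly 10 as evidence that the instruments are not weak.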

The F-test from the reduced form regression shows that fathers’ and mothers’ education are jointly statistically significant in determining their offspring’s educational attainment. The next step is to take the residuals of the reduced form equation and insert them back into the structural equation as an additional regressor. The structural equation is the original relationship that we care about, here the wage equation. Testing the statistical significance of the coefficient on the residuals in the structural equation is the Hausman Test.
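The two steps just described can be sketched in a few lines. Again, this uses synthetic data in which education is endogenous by construction; the variable names, the `exper` control, and all coefficients are assumptions, and the standard errors are the conventional (homoskedastic) OLS ones.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients, conventional standard errors, and residuals."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(xtx_inv))
    return beta, se, resid

rng = np.random.default_rng(1)
n = 5000

# Synthetic data: ability is unobserved and drives both educ and lwage.
ability = rng.normal(size=n)
fatheduc = rng.normal(12, 3, size=n)
motheduc = rng.normal(12, 3, size=n)
exper = rng.uniform(0, 30, size=n)   # hypothetical exogenous control
educ = 2 + 0.3 * fatheduc + 0.3 * motheduc + ability + rng.normal(size=n)
lwage = (0.5 + 0.06 * educ + 0.01 * exper + 0.3 * ability
         + rng.normal(scale=0.5, size=n))

const = np.ones(n)

# Step 1: reduced form for educ on all exogenous variables; keep residuals.
Z = np.column_stack([const, exper, fatheduc, motheduc])
_, _, resid = ols(educ, Z)

# Step 2: structural equation augmented with the reduced form residuals.
X_aug = np.column_stack([const, educ, exper, resid])
beta, se, _ = ols(lwage, X_aug)

# The Hausman test is the t-test on the coefficient of 'resid'.
t_resid = beta[3] / se[3]
print(f"coefficient on resid: {beta[3]:.3f}, t-statistic: {t_resid:.2f}")
```

In this simulated setting the t-statistic is far from zero because endogeneity was built in; with real data the result may be much closer to the borderline.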

**Interpreting the Hausman Test**

The null hypothesis is that the coefficient on ‘resid’ is zero and that education is therefore exogenous. This hypothesis is rejected at the 10% level, but not at the 5% level. This outcome is a borderline case, but for the sake of completeness, we will use the 10% significance level to reject the null hypothesis that the coefficient on ‘resid’ is zero and thus that education is exogenous. In other words, there is evidence that education is endogenous.
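The decision rule at the two significance levels can be illustrated with a short sketch. The t-statistic below is a hypothetical value picked to produce a borderline case; it is not the statistic from the regression in this post, and the p-value uses the standard normal approximation, which is reasonable only in large samples.

```python
import math

def two_sided_p(t_stat):
    """Two-sided p-value under the standard normal approximation."""
    return math.erfc(abs(t_stat) / math.sqrt(2))

t_resid = 1.80   # hypothetical t-statistic on 'resid', for illustration only
p_value = two_sided_p(t_resid)
print(f"p-value: {p_value:.4f}")

for alpha in (0.10, 0.05):
    decision = "reject" if p_value < alpha else "fail to reject"
    print(f"at the {alpha:.0%} level: {decision} exogeneity of educ")
```

A p-value between 0.05 and 0.10 is exactly the situation described above: the conclusion depends on the significance level chosen in advance.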

Recall the two reasons we believe parents’ education to be a useful instrument:

- Parents’ education is related to their offspring’s education.
- Parents’ education is unlikely to be directly related to their offspring’s wages.

**Estimating the Instrumental Variable Regression Model**

Since we rejected the null hypothesis that the coefficient on ‘resid’ is zero at the 10% level in the previous regression, the next step is to estimate the model for people in the sample who earn wages, using parents’ education as instruments for education.
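One way to compute the instrumental variable estimate is two-stage least squares (2SLS) done by hand, sketched below on synthetic data. The variable names, the `exper` control, and the coefficients (including a true return of 0.06) are assumptions for illustration; a dedicated IV routine would normally be used in practice, partly because the standard errors from a manual second stage are not valid.

```python
import numpy as np

def ols_coef(y, X):
    """Return the OLS coefficient vector from regressing y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(2)
n = 5000

# Synthetic data: ability is unobserved, so educ is endogenous in lwage.
ability = rng.normal(size=n)
fatheduc = rng.normal(12, 3, size=n)
motheduc = rng.normal(12, 3, size=n)
exper = rng.uniform(0, 30, size=n)   # hypothetical exogenous control
educ = 2 + 0.3 * fatheduc + 0.3 * motheduc + ability + rng.normal(size=n)
lwage = (0.5 + 0.06 * educ + 0.01 * exper + 0.3 * ability
         + rng.normal(scale=0.5, size=n))

const = np.ones(n)

# First stage: fitted values of educ from all exogenous variables.
Z = np.column_stack([const, exper, fatheduc, motheduc])
educ_hat = Z @ ols_coef(educ, Z)

# Second stage: replace educ with its fitted values.
# Note: only the coefficients are valid here, not naive second-stage SEs.
b_iv = ols_coef(lwage, np.column_stack([const, educ_hat, exper]))

# OLS on the structural equation, for comparison.
b_ols = ols_coef(lwage, np.column_stack([const, educ, exper]))

print(f"OLS  return to educ: {b_ols[1]:.3f}")
print(f"2SLS return to educ: {b_iv[1]:.3f}")
```

In this simulation the OLS estimate is biased upward by the omitted ability term, while the 2SLS estimate lands near the assumed true return, mirroring the gap between the OLS and IV estimates discussed in this post.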

**Concluding Remarks**

The Hausman Test can help determine whether one of the explanatory variables in a regression suffers from endogeneity (omitted variable bias, measurement error, or reverse causality). At the 10% level, the Hausman Test found evidence of such endogeneity in education, which we attribute to omitted variable bias from unobserved ability.

If you reject the null hypothesis at the 10% level, the correct regression to run is the instrumental variable regression. Running the IV regression, one finds that each additional year of education increases wages by 6%.

If one believes that the 10% level is too generous and uses the 5% significance level instead, we would not reject the null hypothesis that the coefficient on ‘resid’ is zero; thus, we would not reject the hypothesis that education is exogenous. This alternative conclusion would lead us to use the original OLS estimate of an 11% return to each year of schooling.

**About the Author**

JJ Espinoza is a Senior Full Stack Data Scientist, Macroeconomist, and Real Estate Investor. He has over ten years of experience working in the world’s most admired technology and entertainment companies. JJ is highly skilled in data science, computer programming, marketing, and leading teams of data scientists. He double-majored in math and economics at UCLA before going on to earn his master’s in economics, focusing on macro econometrics and international finance.
