How Is Education Related to How Many Children a Woman Bears? Instrumental Variables with Pooled Cross-Sectional Data

A woman’s earning potential increases with her education, and higher wages increase the opportunity cost of having children. Obviously, taking a year off to raise children when you are a brain surgeon earning $250,000 requires a greater monetary sacrifice than if you were earning $20,000. Thus, theoretically at least, higher educational attainment should be correlated with fewer children.

There is an econometric issue that must be dealt with to ensure the robustness of these estimates: reverse causality. More children might be causing lower levels of education, and thus lower wages, whereas our hypothesis is that higher education causes higher wages and thus fewer births per woman. To deal with this potential bias, instrumental variables (IV) will be used: by using the education of the surveyed women’s parents as instruments in an instrumental variable regression, we can address this reverse causality. The goal of this post is to quantify the decrease in births per woman from an extra year of education. My estimates find that if 100 women each earn an extra year of education, as a group they would be expected to have 12 fewer children over their lifetimes.

Using 1,129 observations spanning 12 years from the Fertil1 data set in Wooldridge’s Introductory Econometrics: A Modern Approach, this post estimates the impact of education on the number of children a woman bears while controlling for age, race, and differences across time. The regressions also control for region, rural versus urban residence, and other city characteristics.

The model will first be estimated by ordinary least squares (OLS) with year dummy variables. Then the potential endogeneity of education will be explored and addressed by using the parents’ education as instrumental variables.
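As a rough sketch of the first step, the snippet below fits OLS with year dummies using NumPy (a tooling choice for illustration; the post does not say which software produced its tables). It runs on simulated data, not on Fertil1: the single `age` control, the year trend, and the "true" education coefficient of -0.12 are assumptions chosen only to make the mechanics visible. One survey year is dropped as the base category to avoid perfect collinearity with the constant.

```python
import numpy as np

# Simulated stand-in for Fertil1 (hypothetical values, not the real survey).
rng = np.random.default_rng(0)
n = 1129
years = np.array([72, 74, 76, 78, 80, 82, 84])   # even survey years, 1972-1984
year = rng.choice(years, size=n)
age = rng.uniform(18, 50, size=n)
educ = rng.normal(12.0, 2.5, size=n)
# Assumed data-generating process: a downward year trend in the intercept,
# plus age and education effects (educ coefficient set to -0.12).
year_shift = -0.05 * np.searchsorted(years, year)
kids = 1.0 + 0.05 * age - 0.12 * educ + year_shift + rng.normal(0.0, 1.0, size=n)

def design_matrix(educ, age, year, base_year=72):
    """Constant, educ, age, and one dummy per survey year except the base year."""
    cols = [np.ones_like(educ), educ, age]
    for y in np.unique(year):
        if y != base_year:
            cols.append((year == y).astype(float))
    return np.column_stack(cols)

X = design_matrix(educ, age, year)
beta, *_ = np.linalg.lstsq(X, kids, rcond=None)
print(f"OLS coefficient on educ: {beta[1]:.3f}")
```

The year dummies let the average fertility level shift freely from one survey year to the next, so the education coefficient is identified from variation within years rather than from time trends.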

The OLS estimate of the relationship between education and the number of children a woman has is negative, as expected, and statistically significant. The coefficient is interpreted as follows: if 100 women each earn an extra year of education, as a group they would be expected to have 12 fewer children over their lifetimes. This is a large effect, but the reverse causality problem lingers, so the next regression uses parents’ education as instruments to potentially address it.
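The "12 fewer children per 100 women" statement is the per-woman coefficient rescaled. A minimal sketch, assuming a coefficient of about -0.12, which is inferred here from that statement rather than read from the regression output; the function name is hypothetical:

```python
def group_change_in_children(educ_coef, n_women=100, extra_years=1):
    """Expected change in total children for a group of women.

    In a linear model the coefficient is a per-woman, per-year effect, so the
    group effect is coefficient * extra years of schooling * number of women.
    """
    return educ_coef * extra_years * n_women

# A per-woman coefficient of about -0.12 implies roughly 12 fewer children
# for a group of 100 women who each gain one more year of education.
print(round(group_change_in_children(-0.12), 1))
```

Because the model is linear in education, every woman is assumed to have the same marginal effect, which is what justifies simply multiplying by the group size.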

To estimate an IV regression using the education of a woman’s parents as instruments, we need to establish instrument exogeneity and relevance. Exogeneity requires that parents’ education be uncorrelated with the unobserved factors that affect their daughter’s fertility; we can argue that it affects the number of children she bears only through her own education. By this logic, the parental education variables are exogenous in the equation for their daughters’ childbearing. To establish relevance, the potentially endogenous variable (the woman’s education) is regressed on all exogenous variables in the equation, including the instruments, and the instruments must be significantly related to it.
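The relevance check can be sketched as a first-stage regression. This is a hypothetical example on simulated data using NumPy: the coefficients 0.4 and 0.3 on `meduc` and `feduc` are made up, and `age` stands in for the full set of controls. It regresses education on every exogenous variable, instruments included, and reports conventional t statistics.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1129
age = rng.uniform(18, 50, size=n)
meduc = rng.normal(10.0, 3.0, size=n)   # mother's education (hypothetical)
feduc = rng.normal(10.0, 3.0, size=n)   # father's education (hypothetical)
ability = rng.normal(0.0, 1.0, size=n)  # unobserved; the source of endogeneity
educ = 6.0 + 0.4 * meduc + 0.3 * feduc + ability + rng.normal(0.0, 1.5, size=n)

def ols_with_se(X, y):
    """OLS coefficients and conventional (homoskedastic) standard errors."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))

# First stage: educ on ALL exogenous variables, instruments included.
Z = np.column_stack([np.ones(n), age, meduc, feduc])
beta, se = ols_with_se(Z, educ)
t_stats = beta / se
for name, b, t in zip(["const", "age", "meduc", "feduc"], beta, t_stats):
    print(f"{name:>6}: coef={b:7.3f}  t={t:6.2f}")
```

Relevance is the only IV condition the data can check directly; exogeneity of `meduc` and `feduc` has to be argued, as the paragraph above does.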

The regression table above shows that, individually, mother’s education and father’s education are positively related to a woman’s education, and each is statistically significant. The next step is to test the joint significance of the parents’ education variables to fully establish instrument relevance.
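The joint test compares the first-stage regression with and without the instruments. The sketch below computes the usual F statistic, F = [(SSR_r - SSR_ur) / q] / [SSR_ur / (n - k - 1)], where q is the number of instruments tested and k the number of regressors (excluding the constant) in the unrestricted model. It uses NumPy on simulated data; the data-generating values are assumptions for illustration only.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def joint_f_test(X_restricted, X_unrestricted, y):
    """F statistic for the columns in X_unrestricted that X_restricted omits."""
    n, k_ur = X_unrestricted.shape           # k_ur counts the constant too
    q = k_ur - X_restricted.shape[1]         # number of exclusion restrictions
    ssr_r = ssr(X_restricted, y)
    ssr_ur = ssr(X_unrestricted, y)
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k_ur))

rng = np.random.default_rng(2)
n = 1129
age = rng.uniform(18, 50, size=n)
meduc = rng.normal(10.0, 3.0, size=n)
feduc = rng.normal(10.0, 3.0, size=n)
educ = 6.0 + 0.4 * meduc + 0.3 * feduc + rng.normal(0.0, 1.8, size=n)

X_ur = np.column_stack([np.ones(n), age, meduc, feduc])  # with instruments
X_r = np.column_stack([np.ones(n), age])                 # instruments dropped
F = joint_f_test(X_r, X_ur, educ)
print(f"F({X_ur.shape[1] - X_r.shape[1]}, {n - X_ur.shape[1]}) = {F:.1f}")
```

A common rule of thumb treats a first-stage F statistic below about 10 as a warning sign of weak instruments, which is a stricter bar than ordinary statistical significance.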

The F statistic above shows that we can reject, at any reasonable significance level, the null hypothesis that the parents’ education variables are jointly unrelated to their daughter’s education. The next step is to run the IV regression and see how the estimates change relative to the first OLS regression.

Using instrumental variables to address reverse causality increased the education coefficient by about 25% in absolute value. The interpretation is now: if 100 women each earned one more year of education, as a group they would be expected to have 15 fewer children. This is 3 more than the 12 estimated by the OLS regression above. Which of these models is better? This can be answered with a Hausman test for endogeneity: first, we estimate the reduced form and save its residuals; then we add these residuals to the structural equation.
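Two-stage least squares can be sketched by running the two stages explicitly with NumPy. The example is hypothetical: it simulates an unobserved "ability" term that raises both education and fertility, which makes education endogenous, while `meduc` and `feduc` are valid instruments by construction; `age` stands in for the full control list. Only the point estimates are meaningful here, because standard errors from a manual second stage are not valid 2SLS standard errors.

```python
import numpy as np

def two_sls(y, X_exog, x_endog, Z_instr):
    """2SLS point estimates with one endogenous regressor (listed last).

    Stage 1: regress x_endog on all exogenous variables (X_exog + instruments).
    Stage 2: regress y on X_exog and the stage-1 fitted values.
    Standard errors from this manual stage 2 are NOT valid 2SLS standard errors.
    """
    Z = np.column_stack([X_exog, Z_instr])
    pi, *_ = np.linalg.lstsq(Z, x_endog, rcond=None)
    x_hat = Z @ pi
    beta, *_ = np.linalg.lstsq(np.column_stack([X_exog, x_hat]), y, rcond=None)
    return beta

# Hypothetical simulation: unobserved ability raises both education and
# fertility, so OLS on educ is biased; meduc/feduc are valid by construction.
rng = np.random.default_rng(3)
n = 1129
age = rng.uniform(18, 50, size=n)
meduc = rng.normal(10.0, 3.0, size=n)
feduc = rng.normal(10.0, 3.0, size=n)
ability = rng.normal(0.0, 1.0, size=n)
educ = 6.0 + 0.4 * meduc + 0.3 * feduc + ability + rng.normal(0.0, 1.5, size=n)
kids = 1.0 + 0.05 * age - 0.15 * educ + 0.4 * ability + rng.normal(0.0, 1.0, size=n)

X_exog = np.column_stack([np.ones(n), age])
b_ols, *_ = np.linalg.lstsq(np.column_stack([X_exog, educ]), kids, rcond=None)
b_2sls = two_sls(kids, X_exog, educ, np.column_stack([meduc, feduc]))
print(f"OLS educ: {b_ols[-1]:.3f}   2SLS educ: {b_2sls[-1]:.3f}")
```

With the post’s own figures, the relative change in magnitude is (0.15 - 0.12) / 0.12 = 0.25, the 25% quoted above. For inference, a dedicated IV routine that computes proper 2SLS standard errors is the safer choice.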

First, the reduced form: the potentially endogenous variable (education) regressed on all exogenous variables, including the instruments.

Second, take the residuals of the regression above…

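Finally, add these residuals as an extra regressor in the structural equation (the original OLS equation); a statistically significant coefficient on the residuals indicates that education is endogenous…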
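The three steps above can be sketched as a regression-based (Durbin-Wu-Hausman) test for one endogenous regressor. This minimal NumPy version runs on simulated data in which education is endogenous by construction, so the test should reject; the variable names and data-generating values are assumptions, and `age` stands in for the full control list.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS coefficients and conventional (homoskedastic) standard errors."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    return beta, np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

def hausman_regression_test(y, X_exog, x_endog, Z_instr):
    """Regression-based (Durbin-Wu-Hausman) test, one endogenous regressor."""
    Z = np.column_stack([X_exog, Z_instr])
    # Step 1: reduced form of the suspect regressor on all exogenous variables.
    pi, *_ = np.linalg.lstsq(Z, x_endog, rcond=None)
    # Step 2: keep the reduced-form residuals.
    v_hat = x_endog - Z @ pi
    # Step 3: add them to the structural equation and t-test their coefficient.
    beta, se = ols_with_se(np.column_stack([X_exog, x_endog, v_hat]), y)
    return beta, beta[-1] / se[-1]

# Hypothetical data in which educ is endogenous by construction.
rng = np.random.default_rng(4)
n = 1129
age = rng.uniform(18, 50, size=n)
meduc = rng.normal(10.0, 3.0, size=n)
feduc = rng.normal(10.0, 3.0, size=n)
ability = rng.normal(0.0, 1.0, size=n)
educ = 6.0 + 0.4 * meduc + 0.3 * feduc + ability + rng.normal(0.0, 1.5, size=n)
kids = 1.0 + 0.05 * age - 0.15 * educ + 0.8 * ability + rng.normal(0.0, 1.0, size=n)

X_exog = np.column_stack([np.ones(n), age])
beta, t_v = hausman_regression_test(kids, X_exog, educ, np.column_stack([meduc, feduc]))
print(f"t statistic on reduced-form residuals: {t_v:.2f}")
print("reject exogeneity at 5%" if abs(t_v) > 1.96 else "cannot reject exogeneity at 5%")
```

If |t| falls below about 1.96, exogeneity cannot be rejected at the 5% level and OLS, being more efficient, is preferred. As a side check, the coefficient on education in this augmented regression equals the 2SLS estimate exactly.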

Under the null hypothesis, education is exogenous: OLS and 2SLS are then both consistent, but OLS is more efficient. We cannot reject this null hypothesis of no endogeneity, so the data provide little evidence of reverse causality between education and childbearing, and OLS is the preferred model. Thus an extra year of education for 100 women would reduce the number of children they would have by about 12.