Marriage and Its Impact on Wages: An Application of Arellano-Bond Dynamic Panel-Data Estimation

Previous research has found a wage premium associated with marriage and/or belonging to a union.  Marriage might lead a man to take work more seriously, increasing his desire for higher productivity and wages.  Children might come into the picture and push a man to work harder for promotions and titles in order to provide for them.  Both of these arguments imply that marriage might cause a man to earn higher wages, but what if higher wages made a man more likely to get married in the first place?  This reverse causality problem arises because financially established men are very likely to be more attractive husbands.  The correlation between marriage and wages may therefore arise because successful men are more likely to get married than men who are struggling with their finances.  Another concern is unobserved or unmeasurable characteristics that lead to both a higher chance of being married and higher wages.  For example, previous research indicates that people rated as above-average looking tend to make more money.  The methodology used in this post addresses these econometric and statistical issues and provides estimates of the marriage premium that are not biased by the concerns expressed above.  Dynamic panel regression methods were created specifically to deal with these kinds of problems.

According to my regression estimates, being married increases a man’s wages by about 16.4%.  The methodology used to derive this estimate is free from the reverse causality discussed earlier, serial correlation in the wage equation that might bias our estimates of the marriage premium, and time-invariant differences between men that might cause both higher wages and a higher probability of marriage.

Using data on 651 individuals over the course of 7 years, found in Introductory Econometrics:  A Modern Approach, this post uses dynamic panel methods to analyze the impact that marriage has on wages.  The model controls for some of the problems expressed above by using lags as instruments in the well-known Arellano-Bond estimator.

Even though we have panel data, we can’t rely on standard fixed or random effects because wages are highly autocorrelated:  their past values are highly correlated with future values.  Since wages are our dependent variable, we can include their lag as a regressor to correct for this problem.  Technically, the model regresses the first difference of the dependent variable on the first differences of the explanatory variables AND the first difference of the lagged dependent variable.
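Written out, the specification described above might look like the following. This is a sketch in assumed notation: $w_{it}$ is the log wage of individual $i$ in year $t$, $x_{it}$ collects the explanatory variables, $\alpha_i$ is the time-invariant individual effect, and $\varepsilon_{it}$ is the error term. None of these symbols come from the original output.

$$
w_{it} = \rho\, w_{i,t-1} + x_{it}'\beta + \alpha_i + \varepsilon_{it}
$$

$$
\Delta w_{it} = \rho\, \Delta w_{i,t-1} + \Delta x_{it}'\beta + \Delta \varepsilon_{it}
$$

First differencing removes $\alpha_i$, but $\Delta w_{i,t-1}$ contains $\varepsilon_{i,t-1}$, which also appears in $\Delta \varepsilon_{it}$. That correlation is why lagged levels such as $w_{i,t-2}$ and earlier are used as instruments for $\Delta w_{i,t-1}$.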

The idea is that we can keep adding lags as instruments to deal with heterogeneity, but we need a mechanism that limits the number of lags used as instrumental variables.  A two-step Generalized Method of Moments (GMM) procedure is used to weight the lagged instruments; here are the steps:

  1. Model the variance-covariance matrix under homoskedasticity, while also incorporating the serial correlation that first differencing induces in the error term.
  2. Use the residuals from the first step to construct the optimal weighting matrix, which is then used to weight the second step of the regression.
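The two steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the routine that produced the output in this post: the data are simulated from a pure AR(1) panel with an individual effect, the function names (`simulate_panel`, `build_blocks`, `arellano_bond`) are hypothetical, and no standard errors are computed.

```python
import numpy as np

def simulate_panel(n=2000, t=7, rho=0.5, seed=0):
    """Simulated (hypothetical) panel: y_it = rho * y_i,t-1 + alpha_i + e_it."""
    rng = np.random.default_rng(seed)
    burn = 20                                  # burn-in periods, discarded below
    alpha = rng.normal(size=n)                 # time-invariant individual effect
    y = np.zeros((n, t + burn))
    for s in range(1, t + burn):
        y[:, s] = rho * y[:, s - 1] + alpha + rng.normal(size=n)
    return y[:, burn:]

def build_blocks(y_i):
    """Differenced equations and instrument matrix for one individual."""
    t = len(y_i)
    dy = np.diff(y_i)
    lhs = dy[1:]                               # first difference of y
    rhs = dy[:-1]                              # first difference of lagged y
    n_eq = t - 2
    z = np.zeros((n_eq, n_eq * (n_eq + 1) // 2))
    col = 0
    for k in range(n_eq):
        # Equation k is instrumented with the levels dated two or more
        # periods before its left-hand side: y_0, ..., y_k.
        z[k, col:col + k + 1] = y_i[:k + 1]
        col += k + 1
    return lhs, rhs, z

def arellano_bond(y):
    """Return (one-step, two-step) GMM estimates of rho."""
    n, t = y.shape
    blocks = [build_blocks(y[i]) for i in range(n)]
    n_eq = t - 2
    # Step 1: under homoskedasticity, the differenced error has variance
    # proportional to 2 and first-order covariance proportional to -1.
    h = 2 * np.eye(n_eq) - np.eye(n_eq, k=1) - np.eye(n_eq, k=-1)
    zx = sum(z.T @ x for _, x, z in blocks)
    zy = sum(z.T @ l for l, _, z in blocks)
    w1 = np.linalg.pinv(sum(z.T @ h @ z for _, _, z in blocks))
    rho1 = (zx @ w1 @ zy) / (zx @ w1 @ zx)
    # Step 2: rebuild the weighting matrix from the step-1 residuals.
    a2 = np.zeros_like(w1)
    for l, x, z in blocks:
        g = z.T @ (l - rho1 * x)
        a2 += np.outer(g, g)
    w2 = np.linalg.pinv(a2)
    rho2 = (zx @ w2 @ zy) / (zx @ w2 @ zx)
    return rho1, rho2

y = simulate_panel()
rho_one_step, rho_two_step = arellano_bond(y)
print(rho_one_step, rho_two_step)              # true value in the simulation is 0.5
```

Adding union and marriage dummies would mean appending their first differences as extra regressors (and as their own instruments); that is left out here to keep the sketch short.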

To begin, let’s estimate a pure time series model:

Pure Time Series Model


Notice that the output lists 22 instruments; these are the lagged values of the dependent variable across 6 years.  There are only 6 years’ worth of data because first differencing eliminates one of the observations.  At time 7 we can use all the previous 5 lags (in first difference) as instrumental variables.  Similarly, at time 6 we can use 4 lagged values (in first difference) as instruments.  The number 21 comes from adding up all the lagged terms used as instruments at each time t-1:  6+5+4+3+2+1=21 (the count of 22 reported in the output presumably also includes the constant).  Also notice that the model uses robust standard errors to account for the serial correlation in the estimates.
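The instrument arithmetic above can be checked with a few lines of Python. The helper name `count_lag_instruments` is hypothetical, and the sketch only assumes, as the paragraph does, that the differenced equations have 1, 2, ..., 6 lagged values available respectively.

```python
def count_lag_instruments(n_equations):
    """Total lagged instruments when equation k (1-indexed) can use k lags."""
    return sum(range(1, n_equations + 1))

print(count_lag_instruments(6))  # 1+2+3+4+5+6 = 21
```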

Two-Step Generalized Method of Moments Estimator

Notice that with the two-step estimation the coefficient on lagged wages became smaller and its statistical significance changed.  Now for the interesting part:  we repeat the regression above but add binary variables for union membership and marriage as strictly exogenous variables.  This gives us a better estimate of how marriage impacts wages after controlling for the premium earned by union members.  This methodology also accounts for serial correlation in the error term and unobserved heterogeneity.

The regression above uses a two-step GMM method called Arellano-Bond dynamic panel-data estimation.  The maximum number of lags of the dependent variable was restricted to 2, so with 7 years of data the regression has 14 instruments in the form of lagged dependent variables.  The standard errors are clustered by individual to provide a further buffer against heterogeneity or remaining serial correlation.

According to my regression estimates, being married increases a man’s wages by about 16.4%.  The methodology used to derive this estimate is free from reverse causality (higher-earning men are also more likely to be married), serial correlation in the wage equation that might bias our estimates of the marriage premium, and time-invariant differences between men that might cause both higher wages and a higher probability of marriage.

Returns to Education: IV Regression to Correct for Omitted Variable Bias

Education enables many people to earn more money and obtain employment that is reserved for college graduates.  University graduates normally obtain higher salaries upon graduation and typically enjoy higher lifetime wages and lower rates of unemployment than people who didn’t attend college.  If one looks at the correlation between education and wages, a fairly strong statistical relationship can be established.  The returns to education can be estimated from a simple regression, but there is a problem with this simplistic approach:  a simple log-linear regression of the logarithm of wages on education suffers from omitted variable bias.
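The simple log-linear regression described above can be written as follows. The notation is an assumption ($\text{educ}_i$ is years of education and $u_i$ the error term), and the equation is numbered (1) only to match the later reference to equation (1).

$$
\log(\text{wage}_i) = \beta_0 + \beta_1\, \text{educ}_i + u_i \qquad (1)
$$

Under this form, $100 \cdot \beta_1$ is approximately the percentage increase in wages associated with one more year of education.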

The omitted variable bias takes the form of unobserved ability.  Unobserved ability is correlated with the level of education that an individual attains;  higher intellectual ability lowers the cost of education and thus increases the probability of graduating from college.  Individuals with more ability also tend to earn higher wages.  Because ability raises both education and wages, leaving it out of the regression attributes part of its effect to education.
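A more formal version of this argument can be sketched with the standard omitted variable bias formula. The notation is an assumption: $\text{abil}_i$ is unobserved ability, $e_i$ is an error term uncorrelated with education and ability, and $\hat\beta_1$ is the OLS slope from regressing log wages on education alone.

$$
\log(\text{wage}_i) = \beta_0 + \beta_1\, \text{educ}_i + \beta_2\, \text{abil}_i + e_i
$$

$$
\operatorname{plim} \hat\beta_1 = \beta_1 + \beta_2\, \frac{\operatorname{Cov}(\text{educ}_i, \text{abil}_i)}{\operatorname{Var}(\text{educ}_i)}
$$

If ability raises wages ($\beta_2 > 0$) and is positively correlated with education, the second term is positive and OLS overstates the return to education.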

Empirical Analysis (Data Source-Introduction to Econometrics)

Descriptions of the variables and summary statistics are provided in Figure 1 and Figure 2.

Figure 1:  Description of Variables

Figure 2:  Summary Statistics

Ordinary Least Squares Regression of biased equation (1)

Figure 3:  Biased OLS Regression on percentage increase in wages per year of education

Choosing an Instrumental Variable

An instrumental variable needs to be correlated with the endogenous explanatory variable, in this case education, but must not be correlated with the error term of the wage equation; in other words, it may affect wages only through education.  A decent instrument for education, given the unobserved-ability problem, would be a person’s father’s education.  A father’s education could be related to his offspring’s education, since unobserved ability may be partly genetic, AND one can argue that a father’s education has no direct impact on his offspring’s wages.  These are the two covariance conditions the instrument must satisfy.
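The two conditions can be stated compactly. In this sketch the notation is an assumption: $z_i$ is father's education, $\text{educ}_i$ is the person's own education, and $u_i$ is the error term of the wage equation, which contains unobserved ability.

$$
\operatorname{Cov}(z_i, \text{educ}_i) \neq 0 \qquad \text{(relevance)}
$$

$$
\operatorname{Cov}(z_i, u_i) = 0 \qquad \text{(exogeneity)}
$$

When both hold, the simple IV estimator of the return to education is

$$
\hat\beta_1^{IV} = \frac{\widehat{\operatorname{Cov}}(z_i, \log(\text{wage}_i))}{\widehat{\operatorname{Cov}}(z_i, \text{educ}_i)}
$$

Only the first condition can be checked directly in the data, by regressing education on the instrument; the second has to be argued.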

Figure 4:  Regression of Explanatory Variable (X) on Instrumental Variable (Z) and test for their zero correlation


Figure 5:  Instrumental Regression to correct for Omitted Variable Bias.
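The mechanics of the correction can be illustrated with a small simulation in Python with NumPy. This is a sketch on made-up data, not a reproduction of the regressions in the figures: every coefficient below is an assumption, the function names are hypothetical, and the simulation makes father's education independent of ability by construction, which real data need not satisfy.

```python
import numpy as np

def simulate_returns(n=20000, beta=0.08, seed=0):
    """Simulated (hypothetical) data in which ability is unobserved."""
    rng = np.random.default_rng(seed)
    ability = rng.normal(size=n)                 # omitted variable
    father_educ = rng.normal(12, 2, size=n)      # instrument, unrelated to ability here
    educ = 6 + 0.5 * father_educ + ability + rng.normal(size=n)
    log_wage = 1.0 + beta * educ + 0.3 * ability + rng.normal(scale=0.4, size=n)
    return educ, father_educ, log_wage

def ols_slope(x, y):
    """OLS slope of y on x: Cov(x, y) / Var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

def iv_slope(x, z, y):
    """Simple IV slope with one instrument: Cov(z, y) / Cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

educ, father_educ, log_wage = simulate_returns()
print("OLS:", ols_slope(educ, log_wage))               # pushed upward by omitted ability
print("IV: ", iv_slope(educ, father_educ, log_wage))   # true value in the simulation is 0.08
```

In this simulation the OLS slope absorbs part of the ability effect, while the IV slope uses only the variation in education that comes from father's education.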

Major Revelation from IV Regression to Correct for Omitted Variable Bias

The return to education in the form of wages was positive and statistically significant in the biased OLS estimate, but after using the wage earner’s father’s education as an instrumental variable to correct for omitted variable bias in the form of unobserved ability, the return to education became statistically insignificant.