Education and Experience’s Impact on Wages: Heckman Correction for Non-Random Sampling

There are many problems in economics where a randomized control group cannot be constructed, or where a seemingly random assignment is fraught with bias introduced by the researcher. A famous example is the Gallup poll for the 1948 Truman vs. Dewey election.  The poll showed Dewey winning comfortably, and the statisticians at Gallup treated his victory as virtually guaranteed.  Instead, Truman won.  What caused such a huge miss?  A commonly told explanation is that the pollsters, well versed in random sampling techniques, used the telephone to call voters at random and gather the sample they required for hypothesis testing. The problem was that in 1948 there was still a high correlation between income and telephone ownership.  Wealthier households, which were more likely to own telephones, leaned toward Dewey, so the sample was heavily biased in his favor, and the voters who actually turned out were not like the telephone owners.  Random sampling is crucial to hypothesis testing, but even under the most apparently random assignment one must be wary that some kind of selection bias has crept in, unbeknownst to the researcher.

To deal with this estimation bias, James Heckman of the University of Chicago devised an ingenious method.  His approach, which won him the Nobel Prize in 2000, uses two-stage estimation to correct for a known selection bias.  The objective of this post is to explain the Heckman error correction model in the context of a labor economics problem that plagued researchers until the method was formulated.  Before the econometric estimates, the background of the problem and the theoretical construct of Heckman’s model are introduced.

Background on the Selection Problem in the Labor Market

Understanding how education and experience are related to wages is complicated by non-random selection in the labor market.  People work because the wages they are offered are greater than what economists call their reservation wage.  The reservation wage is the lowest wage at which a person is willing to work; if offered wages fall below this amount, the person leaves the labor market.  This leaves researchers with wage data only for people whose offered wages exceed their reservation wages, which can introduce non-random selection bias.  Since education and experience are related to the wages people are offered, the people selected into the labor market have more education and experience than the population as a whole.  People with less experience and education make up a larger share of those who are unemployed or out of the labor force entirely.  This causes problems when estimating the impact of education and experience on wages: in theory, the estimated effects of education and experience on wages would be biased upward.  This assertion will be tested in the empirical section that follows the theory.

Theory of Sample Selection Bias and Heckman Error Correction Model
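
As a point of reference, the standard textbook form of the selection model can be sketched as follows. The notation is an assumption made for this sketch (z for the selection regressors, x for the wage regressors, s for the indicator that a wage is observed) and may differ from the notation originally intended for this section.

```latex
\begin{aligned}
\text{Selection:}\quad & s_i = \mathbf{1}\left[\, z_i \gamma + v_i > 0 \,\right] \\
\text{Outcome:}\quad & \log(\text{wage}_i) = x_i \beta + u_i
    \quad \text{(observed only when } s_i = 1\text{)} \\
\text{Errors:}\quad & (u_i, v_i) \sim \text{bivariate normal with zero means},\quad
    \operatorname{Var}(v_i) = 1,\quad \operatorname{sd}(u_i) = \sigma_u,\quad
    \operatorname{Corr}(u_i, v_i) = \rho \\
\text{Selected mean:}\quad & E\left[\, \log(\text{wage}_i) \mid x_i, z_i, s_i = 1 \,\right]
    = x_i \beta + \rho \, \sigma_u \, \lambda(z_i \gamma) \\
\text{Inverse Mills ratio:}\quad & \lambda(c) = \frac{\phi(c)}{\Phi(c)}
\end{aligned}
```

Under these assumptions, the two-step estimator first fits a probit of s on z to obtain an estimate of gamma and the implied lambda for each observation, then runs OLS of log(wage) on x and the estimated lambda using only the observations with observed wages. The coefficient on lambda estimates rho times sigma_u, so a test of whether it equals zero is a test for selection bias.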

Empirical Example in the Labor Market

OLS regression shows that the return to one additional year of education is approximately 10%.

Estimation with the Heckman two-step estimator shows a different picture…

  • Notice the upper right hand side of the output; 325 of the women out of 753 are out of the labor force.
  • The “Select” section is a probit estimate of the probability of observing wages, given a set of explanatory variables.
  • The “lwage” section is the estimate using Heckman; education’s estimated impact on wages has increased by almost 0.2 percentage points.
  • Notice that the regressors of the “lwage” equation are a strict subset of those in the “select” equation.
  • Lambda is positive as expected, indicating a positive correlation between the error term in the log(wage) equation and the error term in the selection equation for observing wages.
  • The coefficient on lambda is not statistically different from zero (z = 0.31); hence the sample selection bias that was expected did not materialize in this data set.  OLS regression would seem appropriate in this case.
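
The lambda term and its z statistic can be made concrete with a minimal sketch. It assumes the usual definition of lambda as the inverse Mills ratio (the standard normal density divided by the standard normal CDF, evaluated at the fitted probit index). The function names and the probit index values are hypothetical and are not taken from the regression output; only z = 0.31 comes from the post.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def inverse_mills_ratio(index: float) -> float:
    """Return lambda(c) = pdf(c) / cdf(c) for a fitted probit index c."""
    return STD_NORMAL.pdf(index) / STD_NORMAL.cdf(index)


def two_sided_p_value(z_stat: float) -> float:
    """Two-sided p-value for a z statistic under the standard normal."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z_stat)))


# HYPOTHETICAL probit indexes, chosen only to show how lambda behaves.
for index in (-1.0, 0.0, 1.0):
    print(f"index={index:+.1f}  lambda={inverse_mills_ratio(index):.4f}")

# z = 0.31 is the statistic reported for lambda in the post.
print(f"two-sided p-value for z = 0.31: {two_sided_p_value(0.31):.3f}")
```

Lambda shrinks as the probit index rises, so observations that are very likely to be in the labor force receive only a small correction. The p-value for z = 0.31 is far above conventional significance levels, which is consistent with the conclusion that selection bias is not detected in this data set.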

Who is More Likely to Get Called for a Job Interview – Lakisha or Emily?: Quasi-Experimental Design and Analysis

Discrimination in the labor market is a problem that economists are interested in and have researched substantially.  The inefficiencies created by discriminating against prospective employees based on factors unrelated to productivity damage the very businesses that engage in the practice, not to mention the undue burden of longer unemployment, fewer interview offers, and potentially lower pay for the groups who are discriminated against.  Dr. Bertrand of the University of Chicago and Dr. Mullainathan, then at MIT, conducted a randomized experiment in which they sent close to 5,000 resumes to potential employers.  The resumes were virtually identical, except that some had “black” sounding names like Lakisha or Jamal, while others had “white” sounding names like Emily and Gregory.  They sent the resumes and waited to see how many interview offers went to applicants with black sounding names compared to those with white sounding names.  They published their findings in the American Economic Review in 2004, in a paper titled “Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination”.  I obtained the data from a popular introductory econometrics textbook; this post replicates pieces of their study for instructional purposes.

As mentioned earlier, the data come from the companion website for Stock and Watson’s Introduction to Econometrics, http://wps.aw.com/aw_stock_ie_2/50/13016/3332253.cw/index.html, and a detailed description of the research methodology and variables can be found at this link: http://wps.aw.com/wps/media/objects/3254/3332253/datasets2e/datasets/Names_Description.pdf.

The data include 4,870 resume submissions, with qualitative variables about the resumes and the employer’s ad, and whether or not the employer made a follow-up call on the resume.  This post uses these data to answer the following three questions:

  1. What is the call-back rate for whites?  For African Americans?  Is the difference statistically significant?  Is it large in a real world sense?
  2. Is the African American/white call-back rate differential different for men than for women?
  3. What is the difference in call-back rates for high-quality versus low-quality resumes?  What is the high-quality/low-quality difference for white applicants? For African American applicants? Are these differences statistically significant?

What is the call-back rate for whites?  For African Americans?  Is the difference statistically significant?  Is it large in a real world sense?

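Applicants with white sounding names were called back about 9.65% of the time, with a standard deviation of 29.53%.  The mean plus or minus 1.96 standard deviations runs from -48.22% to 67.52%, or essentially from 0 to 67.52%, since a negative probability has no mathematical interpretation.  Note that this range reflects the spread of individual outcomes; a 95% confidence interval for the average call-back rate would instead use the standard error (the standard deviation divided by the square root of the group’s sample size) and would be much narrower.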
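
A confidence interval for the average call-back rate can be sketched as below. The mean and standard deviation are the figures reported above; the group size of 2,435 is an assumption (an even split of the 4,870 resumes between the two name groups), and the function name is hypothetical.

```python
import math

Z_95 = 1.96  # critical value for a two-sided 95% interval


def mean_confidence_interval(mean: float, std_dev: float, n: int) -> tuple:
    """95% confidence interval for a sample mean, using the standard error."""
    std_error = std_dev / math.sqrt(n)
    return mean - Z_95 * std_error, mean + Z_95 * std_error


# Mean and standard deviation are the figures reported in the post.
# ASSUMPTION: 2,435 resumes per group (an even split of the 4,870 total).
low, high = mean_confidence_interval(mean=0.0965, std_dev=0.2953, n=2435)
print(f"95% CI for the white-name call-back rate: {low:.2%} to {high:.2%}")
```

Because the standard error divides the standard deviation by the square root of the sample size, the resulting interval is only a couple of percentage points wide under the assumed group size.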

 

Applicants with black sounding names were called back less often, only 6.45% of the time, with a standard deviation of 24.56%.  The corresponding range of the mean plus or minus 1.96 standard deviations, truncated at zero, is 0 to 54.57%.  The difference between the call-back rates for applicants with white and black sounding names is statistically significant even though these ranges overlap, because significance depends on the standard error of the difference in means rather than on the spread of individual outcomes.  In a real world sense, the probability of being called back is nearly 50% greater for an applicant with a white sounding name than for an applicant with a black sounding name.
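
The significance claim and the "nearly 50%" figure can be checked with a minimal sketch of a two-proportion z-test. The call-back rates are those reported above; the group size of 2,435 per group is an assumption (an even split of the 4,870 resumes), and the pooled z-test is one common choice that may differ from the test used to produce the original output.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(p1: float, p2: float, n1: int, n2: int) -> tuple:
    """Return (z statistic, two-sided p-value) for H0: the two rates are equal."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_error = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z_stat = (p1 - p2) / std_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z_stat)))
    return z_stat, p_value


# Call-back rates are the figures reported in the post.
# ASSUMPTION: 2,435 resumes per group (an even split of the 4,870 total).
white_rate, black_rate, group_size = 0.0965, 0.0645, 2435

z_stat, p_value = two_proportion_z_test(white_rate, black_rate, group_size, group_size)
relative_gap = white_rate / black_rate - 1.0

print(f"z = {z_stat:.2f}, two-sided p = {p_value:.5f}")
print(f"white-name rate exceeds black-name rate by {relative_gap:.1%} in relative terms")
```

Under the assumed group sizes the z statistic is well above the usual 1.96 cutoff, and the relative gap comes out just under 50%, in line with the text.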

 

Is the African American/white call-back rate differential different for men than for women?

 

The table above has the call-back rates for males with white sounding names and for those with black sounding names.  The difference in call-back rates favors males with white sounding names by about 3.04 percentage points.

 

The table above compares the call-back rate for females with white sounding names to the rate for those with black sounding names.  The difference favors the white sounding names by about 3.26 percentage points, compared to 3.04 percentage points in the previous table for men.  This offers only slight evidence that the penalty for having a black sounding name is greater for women than for men, since the 0.22 percentage-point gap is small and is not formally tested here.

 

What is the difference in call-back rates for high-quality versus low-quality resumes?  What is the high-quality/low-quality difference for white applicants? For African American applicants? Are these differences statistically significant?

 

Applicants with high-quality resumes and white sounding names were called for an interview 10.79% of the time, but the same high-quality resumes were passed over more often when they carried a black sounding name, as evidenced by the 6.70% call-back rate for this group. The difference is statistically significant.

 

Regression Results

Finally, a probit regression demonstrates similar patterns and adds a couple more dimensions.  After controlling for the number of jobs on a resume, years of experience, special skills listed, college education, and special items on a resume such as military service, volunteer work, and honors, people with black sounding names were about 3 percentage points less likely to be called for a job interview than similar applicants with white sounding names.  The result is statistically significant and supports the findings of Bertrand and Mullainathan about racial discrimination in the labor market based on the ethnicity of an applicant’s name.
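
Probit coefficients are not changes in probability themselves, so an effect such as the one above is read from the marginal effect of the black-name indicator. The sketch below shows one way that conversion works for a 0/1 regressor. The coefficient of -0.20 is hypothetical and is not taken from the regression output; the baseline uses the 9.65% call-back rate reported earlier for white sounding names.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def dummy_marginal_effect(baseline_index: float, dummy_coef: float) -> float:
    """Change in predicted probability when a 0/1 regressor switches from 0 to 1."""
    return STD_NORMAL.cdf(baseline_index + dummy_coef) - STD_NORMAL.cdf(baseline_index)


# Baseline index chosen so the predicted call-back rate equals the 9.65%
# reported in the post for white sounding names.
baseline_index = STD_NORMAL.inv_cdf(0.0965)

# HYPOTHETICAL probit coefficient on the black-name indicator; it is not
# taken from the post's regression output.
black_name_coef = -0.20

effect = dummy_marginal_effect(baseline_index, black_name_coef)
print(f"change in call-back probability: {effect:+.4f}")
```

With these illustrative numbers the change in probability is close to three percentage points, which shows how a modest probit coefficient maps to an effect of the size discussed above when the baseline call-back rate is low.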

 

Discrimination appears to be most pronounced in business services and least pronounced in manufacturing and transportation.  Below are the marginal effects after a probit regression restricted to applicants who applied for jobs in the business services sector, where having a black sounding name, after controlling for resume-specific factors, reduces the chance of being called for an interview by 3.6 percentage points: