SAS Programming: Multiple Regression Analysis and Racial Discrimination in Soda Prices

The objective of this post is to quantify whether soda prices differ in areas with a high concentration of black residents. The analysis uses data from the paper “Do Fast-Food Chains Price Discriminate on the Race and Income Characteristics of an Area?”, which appeared in the Journal of Business and Economic Statistics. The data set consists of 409 zip-code-level observations in New Jersey and Pennsylvania. It includes variables for race, income, competition, and marginal costs, which are used to control for socioeconomic conditions that may be driving price differences. The multiple regression is similar to the one found in the paper.

The results show that fast-food soda prices in New Jersey and Pennsylvania are higher in zip codes with a larger proportion of black residents, even after controlling for income, competition, and cost-structure differences across zip codes. These differences may be attributable to factors such as price elasticity, competition, and heterogeneous cost structures that are correlated with the proportion of black residents in a zip code. However, according to the analysis presented in the paper, one cannot rule out systematic racial discrimination in the price of sodas in fast-food restaurants. The statistics and regression analysis that follow are representative of those found in the research paper, although simplified a bit for the sake of brevity.


The following SAS code uses the “proc means” procedure to summarize two of the important socioeconomic variables used in this analysis: the percentage of black residents and the median family income per zip code.
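A minimal sketch of such a summary step is shown here. It assumes the data have already been loaded into a SAS data set called work.discrim (a hypothetical name) and that the variables carry the names listed later in this post; the statistics requested are a choice made for illustration.

```sas
/* Assumption: the zip-code-level data are already in work.discrim */
/* (hypothetical data set name; adjust to your own library).       */
proc means data=work.discrim n mean std min max maxdec=3;
    var prpblck income;   /* proportion black, median income */
run;
```

The n, mean, std, min, and max keywords request the number of observations, the mean, the standard deviation, and the range for each variable, which is enough to describe how both variables are distributed across zip codes.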

The mean percentage of black residents in the data set is 11.3%, ranging from a low of 0% to a high of 98.2%. Median family income averages about $47K a year, with a standard deviation of $13K.


The regression equation controls for median income, the proportion under the poverty line, density, and the crime rate per zip code. All variables in the model are measured at the zip-code level, and their descriptions are below:

  • lsoda = natural logarithm of the price of soda
  • prpblck = proportion of the population that is black
  • income = median income
  • propov = proportion under the poverty line
  • ldensity = natural logarithm of density
  • crmrte = crime rate
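Written out with these variable names, the model can be sketched as the linear regression below. The coefficient symbols and the error term u are notation introduced here for illustration, not taken from the paper.

```latex
\[
\text{lsoda} = \beta_0 + \beta_1\,\text{prpblck} + \beta_2\,\text{income}
             + \beta_3\,\text{propov} + \beta_4\,\text{ldensity}
             + \beta_5\,\text{crmrte} + u
\]
\[
\%\Delta\,\text{soda price} \approx 100 \cdot \beta_1 \cdot \Delta\,\text{prpblck}
\]
```

Because the dependent variable is in logs, the second line gives the usual approximation for reading the coefficient on prpblck as a percentage change in price, holding the other variables fixed.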

Using “proc reg” in SAS, this model can be estimated easily:
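A sketch of the estimation step follows, again assuming the hypothetical data set name work.discrim and the variable names given in the list above. It is a plain ordinary least squares fit with no extra options.

```sas
/* Assumption: same hypothetical data set as before (work.discrim). */
/* The MODEL statement regresses log soda price on the controls.    */
proc reg data=work.discrim;
    model lsoda = prpblck income propov ldensity crmrte;
run;
quit;   /* PROC REG is interactive, so QUIT ends the procedure */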

The SAS code above produces the tables below that describe the model and estimate the coefficients of the regression:

The “Parameter Estimates” table shows that the coefficient estimate for “prpblck” is 0.065 and is statistically significant after controlling for the other socioeconomic variables. Because the dependent variable is in logs, this means that if the proportion of black residents in a zip code increases by 0.50 (50 percentage points), one can expect soda prices to increase by approximately 3.25% (0.065 × 0.50 = 0.0325). This is consistent with the results found in “Do Fast-Food Chains Price Discriminate on the Race and Income Characteristics of an Area?”. The paper's conclusion ends with this statement, which warns of possible shortcomings of the analysis and points to directions for further research:

“These results need not be evidence of discrimination but may reflect unmeasured cost differences across areas that are correlated with the proportion of the population that is black.  Explanations based on price discrimination, either due to differences in elasticities, differences in competition, or a taste for discrimination, cannot be excluded however.  More research into the incidence, cause and effects of price differences that are correlated with race would appear to be warranted.”