What Deters Crimes?: Dealing with Arbitrary Forms of Serial Correlation and Heteroskedasticity

Crime imposes several different costs on society.  Besides the untold human cost of changed lives and damaged property, the resources used to deter crime and enforce laws add up to a substantial cost.  Understanding which factors reduce crime can help channel society's resources into their most effective use.  This post analyzes the impact that the probability of arrest, the probability of conviction, the probability of receiving a prison sentence, the average prison sentence, and the number of police per capita have on crime rates.  All of these variables affect the expected cost to criminals of engaging in their activities, and as a society we try to increase these costs while still holding on to a sense of just conviction and punishment.  So, which of these factors has the greatest impact on reducing crime?

Using panel data and differencing with more than two time periods, this post estimates a regression model with standard errors that are robust to serial correlation and arbitrary forms of heteroskedasticity.  The data come from an introductory econometrics textbook and are a real data set that was used by researchers.  The data set covers over 90 counties across seven years, from 1981 to 1987, and contains several variables useful for a statistical analysis of crime deterrence.

Differencing with More Than Two Periods: STATA-Based Econometric Analysis

1) Declare the panel structure in STATA by specifying the individual and time variables

2) Estimate the elasticity of the crime rate with respect to the deterrence variables, including year dummy variables with 1982 as the base year
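As a minimal sketch of step 1, assuming the crime data set is already loaded and that the county identifier and year variables are named county and year (both names are assumptions, not taken from the post), the panel could be declared like this:

```stata
* Assumed variable names: county (individual id) and year (time).
* xtset declares the panel so that time-series operators such as
* d. (difference) and l. (lag) work within each county.
xtset county year

* Optional check of the panel structure (balanced or not, time span)
xtdescribe
```

Once the panel is declared, STATA computes differences and lags county by county rather than across the stacked rows of the file.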

Description of Variables

  • prbarr = probability of arrest
  • prbconv = probability of conviction
  • prbpris = probability of prison sentence
  • avse = average prison sentence in days
  • polpc = number of police per capita

First Difference Model as OLS Regression

An “l” in front of the variables above means that the variable has been transformed by taking the natural logarithm.  Placing a “d.” in front of a variable tells STATA to take the first difference of this variable.
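A command along the following lines could produce the first-difference regression. This is only a sketch: the dependent variable name lcrmrte is hypothetical, the regressor names simply apply the “l” prefix to the list above, and the panel is assumed to have been declared with xtset already. Check `describe` for the actual names in your copy of the data.

```stata
* First-difference model estimated by OLS.
* lcrmrte (log crime rate) is an assumed name; the regressors follow
* the "l" prefix convention applied to the variable list above.
* ib1982.year adds year dummies with 1982 as the base year; 1981 drops
* out of the sample because its first difference is missing.
regress d.lcrmrte ib1982.year ///
    d.lprbarr d.lprbconv d.lprbpris d.lavse d.lpolpc
```

Because both sides of the equation are in logs, each slope coefficient can be read as an elasticity.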


Notice that a 10% increase in the probability of arrest decreases the per capita crime rate by about 3.27%, since in a log-log model the coefficient is an elasticity.  Ranked by the size of their effect in reducing crime, in descending order, the deterrence variables are the probability of arrest, the probability of conviction, the probability of receiving a prison sentence, and the length of the average sentence.  Notice also that the coefficient on “lpolpc” is positive.  This result is perplexing because it seems to say that an increase in the police force would increase crime rates.  Reverse causation is a likely explanation: one would suspect that the causality runs the other way, with higher crime rates prompting the community to put more police on the streets.  Another problem that could be influencing the calculations is serial correlation, because this data set is a panel with more than two time periods.
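The elasticity reading can be checked with a line of arithmetic. The sketch below assumes a coefficient of about -0.327 on the differenced log probability of arrest; the symbol names are illustrative only.

```latex
\[
\%\Delta\,\text{crime rate}
  \;\approx\; \hat{\beta}_{\text{prbarr}} \times \%\Delta\,\text{prbarr}
  \;=\; (-0.327) \times 10\%
  \;\approx\; -3.27\%
\]
```

With both variables in logs, the coefficient itself is the percentage change in the crime rate for a 1% change in the probability of arrest.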

Test for Serial Correlation in the Error Term

Testing for serial correlation is important when dealing with panel data with more than two time periods.
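One simple way to run such a test, sketched here under the assumption that the first-difference regression has just been estimated and the panel is declared with xtset, is to save the residuals and regress them on their own lag. The name uhat is a hypothetical choice.

```stata
* Save the residuals from the first-difference regression.
* uhat is a hypothetical variable name.
predict uhat, residuals

* Regress the residuals on their one-period lag within each county.
* A statistically significant coefficient on l.uhat is evidence of
* serial correlation in the differenced errors.
regress uhat l.uhat
```

The t statistic on the lagged residual is the quantity of interest here.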

The coefficient on the lagged residuals is highly statistically significant.  This makes a strong case for serial correlation in the errors of the OLS estimates.  One way to correct inference for this serial correlation, and for heteroskedasticity, is to use clustered standard errors.  These errors are clustered by individual, in this case by county.


Using Clustered Standard Errors to Account for Serial Correlation and Heteroskedasticity

The estimates using clustered standard errors to control for serial correlation and heteroskedasticity have the same signs as the OLS estimates, but the coefficients and the standard errors have changed slightly.
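A sketch of how clustered standard errors could be requested in STATA follows. It reuses the same assumed variable names as the first-difference regression (lcrmrte and the “l”-prefixed regressors are not confirmed by the post) and clusters on an assumed county identifier named county.

```stata
* Same first-difference specification, with standard errors clustered
* by county. Variable names are assumptions; adjust them to your data.
* vce(cluster county) allows arbitrary serial correlation and
* heteroskedasticity within each county.
regress d.lcrmrte ib1982.year ///
    d.lprbarr d.lprbconv d.lprbpris d.lavse d.lpolpc, ///
    vce(cluster county)
```

Clustering at the county level treats all observations from the same county as potentially correlated while assuming independence across counties.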

According to the estimates above, the probability of arrest seems to have the greatest impact on reducing crime.  Surprisingly, the length of the average prison sentence is not statistically significant as a deterrent for crimes per capita.  As expected, the probability of conviction and the probability of getting a prison sentence both reduce the per capita crime rate.  The final estimate is that increasing the probability of arrest by 10% is expected to reduce the crime rate by 3.2%.  An extension of this analysis would be to account for the cost of each deterrence variable and choose the level of deterrence at which the marginal cost of deterrence equals the marginal benefit of reducing the per capita crime rate.








Does a Strong Presence of College Students Affect Rental Rates?: Pooling Cross Sections Across Time

In a competitive environment, rents are determined by the intersection of supply and demand.  This post analyzes how, after controlling for average income and population in a city, the percentage of college students in a city affects rental rates.  There are several ways one can think about how the relationship between college students and rental rates can manifest itself.  College students tend to be tough on the housing stock; you can imagine how maintenance differs in an apartment rented to college students relative to one rented to retirees.  This higher level of maintenance can be partially passed on to the students themselves in the form of higher rent, or absorbed by the property owner in the form of lower profit.  On the other hand, college students tend to be out of the labor force and thus have lower earnings.  This analysis controls for that by using the average income per city as a control variable.

Low income can still manifest itself in another form, through the political process.  Students can join together and use the political process to enact rent control provisions in their city’s regulations.  This has happened in several college towns, including those that are home to UC Berkeley and UCLA, and is another way in which college students can influence rental rates in a city.  One can think of several other arguments for the direction of the relationship between college students and rental rates in a city, so the question cannot be settled from a theoretical standpoint and lends itself to empirical analysis.  The analysis conducted here suggests that a 10% increase in the percentage of college students increases rent in a city by about 11%.

The first regression estimated was a simple pooled OLS regression.  The model was then improved by removing serial correlation in the error term: all variables were first differenced before running OLS.  After first differencing, heteroskedasticity was addressed by using heteroskedasticity-robust standard errors in the first-differenced OLS model.
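One way to see why differencing helps: suppose the error in the levels equation contains a time-constant city component. That component makes the composite error correlated across the two years, and subtracting the 1980 equation from the 1990 equation removes it. The notation below (a generic regressor x, a city effect a_i, and a 1990 dummy y90) is an illustrative assumption, not the post's own.

```latex
\begin{align*}
\log(\text{rent}_{it}) &= \beta_0 + \delta_0\, y90_t + \beta_1 x_{it} + a_i + u_{it},
  \qquad t \in \{1980, 1990\} \\
\Delta \log(\text{rent}_{i}) &= \delta_0 + \beta_1 \Delta x_{i} + \Delta u_{i}
\end{align*}
```

The city effect a_i appears in both years with the same value, so it cancels in the difference, and the intercept of the differenced equation picks up the common change in rents between 1980 and 1990.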

Pooled independent cross sections are data from different time periods placed in the same data set and used to analyze a problem.  A cross-sectional data set is a sample from the population at a point in time, and when several of these are pooled together we have pooled cross sections over time.  The figure below shows how pooled cross sections should be organized for analysis in STATA, using a data set on rental rates from Introductory Econometrics:  A Modern Approach.

The variable “city” gives the identifying number of the city under observation.  The variable “year” indicates the year the observation was made, and the variable “pop”, for example, shows the population in that year.  One can see that the population of city one went from 75,211 in 1980 to 77,759 in 1990.  This way of organizing the data is typical and allows most statistical packages to handle calculations fairly easily.
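A minimal sketch of how one might inspect this layout and declare the panel in STATA, assuming the rental data set is already in memory with the variables city, year, and pop described above:

```stata
* Show the first two cities, one row per city-year
list city year pop in 1/4, sepby(city)

* Declare the panel. delta(10) tells STATA that consecutive periods
* are ten years apart (1980 and 1990), so that d. and l. operators
* pair each 1990 row with the 1980 row of the same city.
xtset city year, delta(10)
```

Without the delta(10) option, STATA would treat 1980 and 1990 as nine periods apart and the difference operator would return missing values.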

Simple Panel Data Regression Methods

The regression above suggests that the relationship between college students and rental rates is positive.  A 10% increase in college students is expected to increase the rents in a city by about 5%, even after controlling for average income, population, and a time trend in rents between 1980 and 1990.  There can be a problem with this regression in the form of autocorrelation in the error term.  First differencing the data addresses this issue and gives the regression result below.
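For readers who want to reproduce a pooled regression of this kind, a hedged sketch follows. All variable names here (lrent, y90, lpop, lavginc, pctstu) are assumptions about how the rental data set is labeled and should be checked against your copy of the data.

```stata
* Pooled OLS on the level data. Assumed variable names:
*   lrent   = log of rent
*   y90     = dummy equal to 1 for 1990 observations
*   lpop    = log of city population
*   lavginc = log of average income
*   pctstu  = percentage of the population who are college students
regress lrent y90 lpop lavginc pctstu
```

The year dummy captures the general change in rents between 1980 and 1990, so the remaining coefficients are identified from differences across cities and within cities over time.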

Notice that after regressing using first differences, one can see that OLS on the level data underestimated how large the relationship between college students and rents is.  In our new model, after controlling for income and population, a 10% increase in college students increases rent by about 10% in a city.  The final issue could be heteroskedasticity in the regression, so the final model uses heteroskedasticity-robust standard errors and finds that a 10% increase in the percentage of college students increases rental rates by about 11%.  These results are shown in the final regression below:
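A minimal sketch of a command that could produce a regression of this form is given here. It assumes the panel has been declared with a ten-year spacing between periods and uses hypothetical variable names (lrent, lpop, lavginc, pctstu) that may differ from those in the actual data set.

```stata
* First-differenced model with heteroskedasticity-robust standard errors.
* Variable names are assumptions. The constant in this regression plays
* the role of the 1990 year effect from the levels equation.
xtset city year, delta(10)
regress d.lrent d.lpop d.lavginc d.pctstu, vce(robust)
```

The vce(robust) option changes how the standard errors are computed, so it is the inference on each coefficient, rather than the fitted model itself, that is made robust to heteroskedasticity.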