The choice to smoke is a personal one for many smokers, but some factors are useful in predicting how likely a person is to smoke. Probit and logit regressions are well suited to this question because the dependent variable is binary: smoker or non-smoker. The explanatory variables are income (in units of $10,000), age, age squared (to account for diminishing returns), education, and whether or not a person is white.
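As a sketch of the setup described above, both models can be written as one index equation passed through a cumulative distribution function. The symbols here ($\beta_0$ through $\beta_5$, $F$, and the variable names) are notation assumed for illustration and do not come from the original regression output.

```latex
\[
P(\text{smoker}_i = 1 \mid x_i)
  = F\!\left(\beta_0 + \beta_1\,\text{income}_i + \beta_2\,\text{age}_i
      + \beta_3\,\text{age}_i^{2} + \beta_4\,\text{educ}_i
      + \beta_5\,\text{white}_i\right)
\]
\[
\text{Probit: } F(z) = \Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-t^{2}/2}\, dt
\qquad
\text{Logit: } F(z) = \Lambda(z) = \frac{e^{z}}{1 + e^{z}}
\]
```

The two models differ only in the choice of $F$: the cumulative normal for probit and the cumulative logistic for logit.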

In these models, the tentative conclusion is that age and education are the most statistically significant predictors of the probability that a person smokes. Each additional year of education beyond high school reduces the probability of being a smoker by 3.16 percentage points. The probability of smoking rises with age, but at a decreasing rate, and reaches its maximum at age 34. Past that point it falls: for the average person in the sample, who is 43, an extra year of age reduces the probability of being a smoker by 2.3 percentage points. Younger people are more likely to smoke, and as people age, smokers are removed from the population. Higher income and being white both reduce the probability of being a smoker, but the effects are small and statistically insignificant.
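The age at which the probability peaks follows from the quadratic age term: the index is highest where its derivative with respect to age, `b_age + 2 * b_age_sq * age`, equals zero. The sketch below shows that arithmetic. The coefficient values are hypothetical, chosen only so that the turning point lands near the age of 34 mentioned above; they are not the estimates from the regressions.

```python
def age_turning_point(b_age, b_age_sq):
    """Age at which b_age*age + b_age_sq*age**2 reaches its maximum."""
    if b_age_sq >= 0:
        raise ValueError("an interior maximum needs a negative squared-age coefficient")
    return -b_age / (2.0 * b_age_sq)


def age_index_slope(b_age, b_age_sq, age):
    """Derivative of the index with respect to age, evaluated at a given age."""
    return b_age + 2.0 * b_age_sq * age


# Hypothetical coefficients (assumptions for illustration, not the fitted values).
B_AGE = 0.068
B_AGE_SQ = -0.001

print("turning point:", round(age_turning_point(B_AGE, B_AGE_SQ), 1))
print("slope at age 25:", age_index_slope(B_AGE, B_AGE_SQ, 25))
print("slope at age 43:", age_index_slope(B_AGE, B_AGE_SQ, 43))
```

Because the probit and logit link functions are both increasing, the sign of this slope is also the sign of the marginal effect of age on the probability: positive before the turning point and negative after it, which is why an extra year lowers the probability for a 43-year-old.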

**Probit Model (Based on the Cumulative Normal Distribution)**

**Figure 1:** Probit regression in non-linear form

**Figure 2:** Marginal changes in probability for the average person, given the explanatory variables, using the probit model. Note that the average person's values for each variable are reported in the x column at the end.

**Logit Model (Based on the Cumulative Logistic Distribution)**

**Figure 3:** Logit regression in non-linear form

**Figure 4:** Marginal probability estimated for the average person, based on explanatory values.

**Concluding Remarks**

The marginal effects reported for the probit and logit models are evaluated at a single point on the curve. That point is the mean of every explanatory variable, in other words the average person in the sample. The two models are similar in their coefficient estimates and in their determination of statistical significance. The most important variables in determining the likelihood that a person is a smoker are education and age.
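To make "evaluated at the mean" concrete, the sketch below computes marginal effects at the average person for both link functions: the coefficient is scaled by the normal density at the index for probit, and by `p * (1 - p)` for logit. All coefficient and mean values are hypothetical placeholders, not the figures from the regressions above, and the function treats each regressor separately, so a full age effect would need the age and age-squared terms combined.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def logistic_cdf(z):
    """Cumulative logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))


def marginal_effects_at_mean(coefs, means, model):
    """Marginal effect of each regressor, evaluated at the sample means.

    coefs: dict of coefficients, including an intercept under "const".
    means: dict of sample means for every regressor except the intercept.
    model: "probit" or "logit".
    """
    index = coefs["const"] + sum(coefs[name] * means[name] for name in means)
    if model == "probit":
        scale = normal_pdf(index)
    elif model == "logit":
        p = logistic_cdf(index)
        scale = p * (1.0 - p)
    else:
        raise ValueError("model must be 'probit' or 'logit'")
    return {name: scale * coefs[name] for name in means}


# Hypothetical values (assumptions for illustration only).
MEANS = {"educ": 1.0, "income": 3.0}
PROBIT_COEFS = {"const": -0.20, "educ": -0.10, "income": -0.010}
LOGIT_COEFS = {"const": -0.32, "educ": -0.16, "income": -0.016}

print("probit:", marginal_effects_at_mean(PROBIT_COEFS, MEANS, "probit"))
print("logit: ", marginal_effects_at_mean(LOGIT_COEFS, MEANS, "logit"))
```

In this illustration the logit coefficients are set to 1.6 times the probit ones, a common rule of thumb, and the resulting marginal effects come out close to each other. That is one reason marginal effects are often easier to compare across the two models than raw coefficients.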