Optimizing Marketing Investment to Reach Communication Goals


The code and data for the marketing optimization below are available on my GitHub account.



There are different ways to answer the question of how to optimize marketing budgets. The goal of this post is to explain how to minimize advertising investment, given a minimum communication goal for a given set of target populations. This post will leverage a constrained optimization framework to answer a common marketing problem, namely: how can we minimize the marketing investment required and still reach our communication goals? Linear Programming and the Simplex Algorithm will be used to answer these marketing questions.


[Table: reach per advertising unit, unit cost, and audience targets for Television and Magazines]

The data above describe the two media channels available for the marketing campaign, Television and Magazines. They give the reach of one unit of advertising in each channel (e.g., one unit of TV reaches 5 million Boys, 1 million Women, and 3 million Men), the unit cost of each channel (TV 600 and Magazine 500), and the marketing targets for the advertised product in millions of people (e.g., 24 million Boys).

R reads these data from a Google sheet.

Optimization Model

The following equations represent a standard linear programming model specification, which is similar to the specification used in the empirical calculations in this post:

Linear Function to be maximized

[Equation image: standard-form linear program]
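The figure with the model specification is not reproduced in this text. For reference, a standard-form linear program is usually written as below. This is the common textbook convention and not necessarily the exact notation of the original figure; the symbols c (unit costs), A (reach per unit), b (targets), and x (units purchased) are labels chosen here. The marketing problem in this post is the minimization counterpart, shown second.

```latex
% Standard form (maximization)
\begin{aligned}
\text{maximize}\quad   & c^{\top} x \\
\text{subject to}\quad & A x \le b, \\
                       & x \ge 0 .
\end{aligned}

% Marketing problem in this post (minimization with minimum-reach targets)
\begin{aligned}
\text{minimize}\quad   & c^{\top} x = 600\, x_{\mathrm{TV}} + 500\, x_{\mathrm{Magazine}} \\
\text{subject to}\quad & A x \ge b, \\
                       & x \ge 0 .
\end{aligned}
```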

The code is also in the GitHub repository. What follows is an explanation of this code, which solves the marketing problem described above. We import the data into R using the RCurl library and process it using the foreign library.

[Image: R code importing the data]

Objective function and Constraints

[Image: R code defining the objective function and constraints]

Solving the Linear Programming Problem

[Image: R code solving the linear program]
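Since the original code is only available as images, here is a rough sketch of how the problem could be set up. It uses the lpSolve package, which is an assumption; the original code may rely on a different solver. The TV reach figures, unit costs, and targets come from the text of this post. The Magazine reach figures (2, 6, and 3 million) are not stated in the text and are assumed values, chosen because they lead to an allocation of roughly 2.7 and 5.3 units.

```r
library(lpSolve)  # assumed solver; not confirmed as the package used in the post

# Rows are audiences, columns are channels (reach in millions per unit).
# The TV column comes from the post; the Magazine column is ASSUMED.
reach <- matrix(c(5, 2,
                  1, 6,
                  3, 3),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("Boys", "Women", "Men"),
                                c("TV", "Magazine")))

cost    <- c(TV = 600, Magazine = 500)          # unit cost per channel
targets <- c(Boys = 24, Women = 18, Men = 24)   # minimum reach in millions

# Minimize total cost subject to reach >= target for every audience.
# lp() treats all decision variables as non-negative by default.
sol <- lp(direction    = "min",
          objective.in = cost,
          const.mat    = reach,
          const.dir    = rep(">=", 3),
          const.rhs    = targets)

units      <- setNames(sol$solution, colnames(reach))  # units to buy
total_cost <- sol$objval                               # minimized cost
achieved   <- drop(reach %*% units)                    # reach per audience

print(units)
print(total_cost)
print(achieved)
```

With these inputs the solver should return about 2.67 units of Television and 5.33 units of Magazine at a total cost of about 4,267. The `achieved` vector shows which targets are met exactly and which are exceeded.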

Marketing Recommendations

The optimal solution is the one that hits the target audience at the lowest cost. The algorithm recommends investing in 2.7 units of Television and 5.3 units of Magazine advertisement to hit the marketing goals of reaching at least 24 million Boys, 18 million Women, and 24 million Men. The total cost of this marketing campaign is $4,266, which is the minimum cost at which all three targets can be met.

[Images: R output for the optimal allocation and total cost]

How many people did this marketing campaign reach? Recall that the targets were a minimum requirement per target audience, so what is the real reach?

[Image: R output for the audience reached by the campaign]

The minimum communication goals for Boys and Men were met exactly, at 24 million each. However, this marketing plan reached 34 million Women, when the minimum communication goal was only 18 million. The reason is that the optimization has to reach all the targets simultaneously and do so at minimum cost. Even so, one can rest assured that this is the cheapest way of reaching the communication goals.

Ideas for Extending this Analysis

Many more women were reached by the campaign than the communication goal intended. Adding additional marketing tactics besides Television and Magazines, especially ones that are exceptionally efficient at targeting women, will most likely hit all the targets at a lower price.

Time is undoubtedly a factor in marketing effectiveness; here is a previous post on measuring marketing effectiveness. Understanding not only how many people were reached but also how effective Television and Magazines are at different time horizons would likely improve this analysis.

This optimization does not take into account the non-linear or synergistic effects of marketing, which again adds complexity but is undoubtedly worth exploring.

The marketing goals could be expanded to include not only reach but also frequency, since one is likely to hit the same people multiple times via Television and Magazines. Higher rates of exposure to advertising have been associated with increased ad recall and brand awareness in several academic studies.

Despite all the limitations of this approach, it still provides a mathematically precise way of creating a marketing budget that meets a set of specific goals. It is an excellent place to start introducing rigorous and proven algorithms to answer some fundamental marketing questions.



Screening Stocks Based on Value & Optimizing Portfolio to Minimize Variance


The goal of this post is to introduce Fundamental Stock Analysis. Specifically, it introduces key financial, operational, and equity-based measures for selecting a handful of stocks out of thousands. The selection process aims to find a small group of stocks that can be considered investable based on their fundamental performance.

We identify healthy companies whose stock price is consistent and offers potential for security and growth by using the rules outlined in the book “Computational Finance” by Argimiro Arratia, which build on Graham’s work from 1973. Graham’s rules have been adjusted for today’s financial climate (e.g., adjusted for inflation).

1) Adequate size of enterprise: The recommendation is to exclude companies with low revenues; consider only companies with more than $1.5 billion in revenue.

2) Strong financial condition: Use the current ratio (current assets/current liabilities) to eliminate companies that are in a weak short-term financial condition; consider only companies with a current ratio of 2 or greater.

3) Earnings stability: Consider only companies with positive earnings in each of the past 10 years.

4) Dividend record: Consider only companies with uninterrupted payments of dividends for at least the past 20 years.

5) Earnings growth: Invest in companies that have growth rates of 3% or higher in earnings per share (EPS) over the past 10 years.

6) Price-to-Earnings ratio: Purchase the stock only if it is adequately priced. A good range for a P/E ratio is 10-15; beware of stocks priced too cheap or too expensive relative to earnings.

7) Price-to-Book ratio: The price-to-book ratio should be no more than 1.5.
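The seven rules above amount to a row filter on a table of fundamentals. The sketch below applies them in base R to a small made-up data frame; the tickers, the column names, and every number in it are hypothetical and exist only to show the mechanics. It also assumes the 3% EPS growth rule is stored as a decimal growth rate, which is one possible reading of rule 5.

```r
# Hypothetical fundamentals for four made-up tickers (illustrative values only)
stocks <- data.frame(
  ticker           = c("AAA", "BBB", "CCC", "DDD"),
  revenue_bn       = c(5.2, 0.8, 12.0, 3.1),      # revenue in $ billions
  current_ratio    = c(2.4, 3.0, 1.6, 2.1),       # current assets / liabilities
  yrs_pos_earnings = c(10, 10, 10, 10),           # consecutive years of positive earnings
  yrs_dividends    = c(25, 30, 40, 22),           # consecutive years of dividends
  eps_growth_10y   = c(0.05, 0.08, 0.04, 0.03),   # EPS growth rate (assumed decimal)
  pe               = c(12, 11, 14, 15),           # price-to-earnings ratio
  pb               = c(1.2, 1.0, 1.4, 1.5),       # price-to-book ratio
  stringsAsFactors = FALSE
)

# One logical condition per rule, combined with AND
passes <- with(stocks,
  revenue_bn       >  1.5  &   # 1) adequate size
  current_ratio    >= 2    &   # 2) strong financial condition
  yrs_pos_earnings >= 10   &   # 3) earnings stability
  yrs_dividends    >= 20   &   # 4) dividend record
  eps_growth_10y   >= 0.03 &   # 5) earnings growth
  pe >= 10 & pe <= 15      &   # 6) P/E between 10 and 15
  pb               <= 1.5      # 7) P/B no more than 1.5
)

screened <- stocks$ticker[passes]
print(screened)
```

In this toy table BBB fails the revenue rule and CCC fails the current ratio rule, so only AAA and DDD survive the screen.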

Using these criteria at the time of this post, and leveraging Google Stock Screener as the filtering mechanism, only five stocks meet these strict criteria for investable equities:

FCX, HP, HFC, RS, and TS

Once we have narrowed down our choices to these strong companies, we must allocate our funds in a way that makes the most sense. One way to allocate funds across these stocks is to purchase the portfolio that minimizes variance (risk). This is called the Minimum Variance Portfolio and was the subject of a previous post.


Updating the code in that post to include the ticker symbols for the five strong companies and running the algorithm yields the optimal allocation for an investor who wants to minimize risk while investing in a strong portfolio of stocks:

[Images: minimum variance allocation for the five stocks]

Minimizing Risk in a Portfolio of Assets



There are many instances in business where a portfolio of assets must be evaluated in terms of risk and rewards.  The key questions may be:

“How much should we invest?”

“What should we not invest in?”

“What is the risk of different budget allocations and what are the expected rewards?”

“What is the optimum allocation if we want to minimize risk?”


Similarly the concept of an asset portfolio can take the form of:

Assortment of clothing for a retailer

Chargebacks for a credit card processor

Scholarship recipients

Movies for a Hollywood studio

Collection of stocks


The objective of this post is to introduce the concept of the Minimum Variance Portfolio. The Minimum Variance Portfolio is an allocation of funds across risky assets chosen so that the risk (variance) of the portfolio is minimized. The simplest example is a two-asset portfolio, such as one consisting of an ice cream shop business and a coffee shop business. In this scenario, people buy more ice cream and less coffee during the warm summer months, and the opposite is true in winter. If the mix of stores in this portfolio is chosen to reduce the variance in revenue due to weather, it is theoretically possible to hedge against the risk of weather. In other words, if one chooses the right number of coffee and ice cream stores to minimize revenue risk, the weather risk is also minimized or reduced, so that revenue is roughly the same regardless of weather.

In order to understand the risks of a portfolio of ice cream and coffee stores, we need to understand the variability of each business individually. In a portfolio, however, understanding the individual risks isn’t enough; we also need to understand how ice cream sales and coffee sales are correlated with each other. This is the concept of the variance of a portfolio: it is the big-picture view of the variability of a collection of assets. Once the portfolio variance is understood and quantified, the next step may be to minimize it to hedge against risk. One can extend the concept of portfolio variance beyond two assets to any number of assets.


The following example consists of two assets to illustrate the concept of minimizing the variance of a portfolio. Here are the formulas that describe the expected portfolio return and the portfolio variance that we are trying to minimize.

[Equation image: expected portfolio return and portfolio variance]
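The formula image is not reproduced in this text. The standard two-asset expressions are given below as a reference; the notation is a common textbook choice and may differ from the original figure. Here w_1 and w_2 are the portfolio weights, R_1 and R_2 the asset returns, sigma_1^2 and sigma_2^2 their variances, and sigma_12 their covariance.

```latex
\begin{aligned}
E[R_p]     &= w_1\, E[R_1] + w_2\, E[R_2], \\
\sigma_p^2 &= w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + 2\, w_1 w_2\, \sigma_{12},
\qquad w_1 + w_2 = 1 .
\end{aligned}
```

A negative covariance, as in the ice cream and coffee example, makes the last term negative and so lowers the portfolio variance.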

To minimize the portfolio variance one would need to set up a Lagrangian optimization problem with the constraint that the investment weights sum to 1, so that the solution gives the percentage of funds allocated to each asset. This is a normalization; if necessary one could rescale the weights to represent a budget constraint of, say, $1,000, and the results would be the same in terms of % allocated, so we will stick to this convention.

[Equation image: Lagrangian for the minimum variance problem]
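For intuition, when the only constraint is that the weights sum to 1 (so short positions are allowed), the Lagrangian problem has a closed-form solution: the weights are proportional to the inverse covariance matrix times a vector of ones. The base R sketch below applies that formula to a hypothetical covariance matrix for the ice cream and coffee example. The numbers are made up, and this is not the quadratic programming approach this post relies on.

```r
# Hypothetical covariance matrix of annual revenue growth (illustrative values).
# The negative off-diagonal term encodes that the two businesses move oppositely.
sigma <- matrix(c( 0.04, -0.01,
                  -0.01,  0.09),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("ice_cream", "coffee"),
                                c("ice_cream", "coffee")))

# Closed-form minimum variance weights: w = Sigma^{-1} 1 / (1' Sigma^{-1} 1)
min_var_weights <- function(sigma) {
  ones <- rep(1, nrow(sigma))
  raw  <- solve(sigma, ones)   # Sigma^{-1} 1 without forming the inverse
  raw / sum(raw)               # normalize so the weights sum to 1
}

w        <- min_var_weights(sigma)
port_var <- drop(t(w) %*% sigma %*% w)   # variance of the resulting portfolio

print(w)
print(port_var)
```

With these inputs about two thirds of the funds go to ice cream and one third to coffee, and the portfolio variance (about 0.023) is lower than the variance of either business on its own (0.04 and 0.09).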


The following R code solves the problem above by downloading data from the web and running a quadratic program that solves the Lagrangian optimization problem, returning the Minimum Variance Portfolio. The code is also available for download.

[Images: R code for the Minimum Variance Portfolio]


The optimum allocation based on the minimum variance portfolio is:

[Images: optimal allocation output]

The expected annual return of this allocation is 22%.  Note that past performance does not guarantee future performance.

Measuring Marketing Effectiveness: Cobb-Douglas Production Functions


Introduction, Data, and Program

Measuring the effectiveness of a marketing channel is difficult due to the large number of variables and other confounding factors. The field of Marketing Mix Modelling was first developed by econometricians to accurately estimate the impact of marketing on consumer packaged goods, since manufacturers of those goods had access to good data on sales and marketing support.

This post uses concepts from microeconomics and econometrics to understand the effectiveness of Television (TV), Newspaper, and Radio advertising on the sales of a good. These data come from the textbook “An Introduction to Statistical Learning with Applications in R”. I have provided these data along with the R program used to derive the marketing estimates in this post; please see the links below:


Market Mix Modelling R Program


Marketing Production Function

Production functions are used in economics to model the relationship between inputs and outputs. They are very flexible and have been used in various branches of economics. Agricultural economists use production functions to model how different inputs affect crop yields, educational production functions have been used to model how different classroom inputs affect children’s learning, and macroeconomists have used production functions to understand how labor and capital inputs affect total national output. I’m going to use a production function to model how different marketing inputs affect sales, per the following equation:


[Equation image: marketing production function]
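The equation image is not reproduced in this text. A Cobb-Douglas production function for the three channels in this post would typically be written as below, together with the form obtained by taking natural logarithms of both sides. The symbols (alpha for the baseline, beta_1 to beta_3 for the channel exponents, epsilon for the error term) are a conventional choice and may not match the original figure exactly.

```latex
\begin{aligned}
\text{Sales} &= \alpha \cdot \text{TV}^{\beta_1} \cdot \text{Radio}^{\beta_2} \cdot \text{Newspaper}^{\beta_3}, \\
\ln(\text{Sales}) &= \ln(\alpha) + \beta_1 \ln(\text{TV}) + \beta_2 \ln(\text{Radio}) + \beta_3 \ln(\text{Newspaper}) + \varepsilon .
\end{aligned}
```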


The majority of inputs that go into production experience diminishing marginal returns, so I take the multiplicative form of the production function and take natural logarithms of both sides of the equation. This gives the log-log form of the Cobb-Douglas function, which is a special case of the well-known translog equation without its second-order terms. The log-log form has the nice property of converting the multiplicative production function into a linear model that can be estimated using Ordinary Least Squares (OLS) regression. Another nice property is that the coefficients (betas) of the regression can be interpreted as elasticities. Elasticities measure the % change in the outcome variable (sales) that results from a 1% change in one of the marketing input variables. The stand-alone variable ‘alpha’ captures all non-marketing variables that affect sales; this is called the baseline in Marketing Mix Modelling (MMM). To keep things simple, in this post I will not use other explanatory variables (store traffic, seasonality, other promotions, etc.), but a robust analysis of the effectiveness of marketing should include additional variables to control for these factors.
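To make the elasticity interpretation concrete, the sketch below simulates data from a multiplicative production function with known exponents and then recovers them with an OLS regression in logs. The data are simulated, not the textbook data used in this post, and the chosen exponents (0.30, 0.20, 0.05) are arbitrary illustrative values.

```r
set.seed(42)
n <- 200

# Simulated advertising budgets (strictly positive so that logs are defined)
tv        <- runif(n, 1, 300)
radio     <- runif(n, 1, 50)
newspaper <- runif(n, 1, 100)

# Simulated sales from a Cobb-Douglas function with KNOWN elasticities
# and multiplicative noise; alpha = 2 plays the role of the baseline
sales <- 2 * tv^0.30 * radio^0.20 * newspaper^0.05 * exp(rnorm(n, sd = 0.05))

# Taking logs of both sides turns the model into a linear regression
fit <- lm(log(sales) ~ log(tv) + log(radio) + log(newspaper))

# Slope coefficients are the estimated elasticities
elasticities <- coef(fit)[-1]
print(round(elasticities, 3))

# The intercept estimates log(alpha); exponentiate to get the baseline
baseline <- exp(coef(fit)[["(Intercept)"]])
print(baseline)
```

The estimated slopes should land close to the exponents used in the simulation, which is the sense in which each coefficient reads as "a 1% increase in this input changes sales by beta %".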


Statistical Estimates


The simple set of scatter plots shows that television appears to have the strongest impact on sales. Radio has a modest effect on sales, while newspaper appears to be only weakly correlated with sales. The data also support the notion of diminishing marginal returns, which further motivates the logarithmic transformation of the production function.


[Figure: scatter plots of sales against Newspaper, Radio, and TV advertising]


Scatter plots can only reveal so much. To do a proper analysis, economists use econometric estimates of the logged production function described above. This ensures that we are controlling for the other marketing variables when measuring the impact of each one, and is shown below:

[Image: regression output]


The results show that for every 1% increase in TV advertising you’d expect a 0.34% increase in sales. A 1% increase in the Newspaper budget only increases sales by 0.01%, the smallest of all the elasticity estimates. A 1% increase in the Radio budget accounts for a modest 0.17% increase in sales.

How much should the company spend on each form of advertising? The data in this example don’t show how many impressions were delivered or how many people were reached with the money spent. In order to provide a proper optimal allocation one would need to know the cost per impression, but assuming the cost per thousand impressions (CPM) is constant, one can simply take the ratio of each elasticity to the sum of all elasticities to come up with the optimal marketing mix.
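The ratio rule described above is a one-line calculation. The sketch below applies it to the three elasticity estimates reported in this post (0.34, 0.17, and 0.01); the variable names are chosen here for illustration.

```r
# Elasticity estimates reported in this post
elasticity <- c(TV = 0.34, Radio = 0.17, Newspaper = 0.01)

# Share of budget per channel = elasticity / sum of all elasticities
mix <- elasticity / sum(elasticity)

# Express as percentages of the total budget
print(round(100 * mix, 1))
```

Under this rule roughly 65% of the budget goes to TV, about 33% to Radio, and about 2% to Newspaper.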


[Image: optimal marketing mix based on elasticity shares]

A future post will do a proper optimization of the production function using Lagrangian optimization, which will take into account the total cost of advertising on each channel.

2013 in review

The WordPress.com stats helper monkeys prepared a 2013 annual report for this blog.

Here’s an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 50,000 times in 2013. If it were a concert at Sydney Opera House, it would take about 19 sold-out performances for that many people to see it.


Outliers: Statistically detecting influential observations in R.


One of the difficulties in assessing the quality of an econometric or regression model is determining whether any of the key regression assumptions have been violated. Regression analysis relies on several key assumptions for its results to be in accordance with reality. In regression analysis one is trying to measure the impact of certain variables on an outcome that we are interested in understanding or influencing. In order to determine with a fair degree of accuracy how strong these relationships are, a few assumptions must be made. If any of these assumptions are violated, the precision of the estimates can come into question. The goal of this post is to explain what these assumptions are and, most importantly, how to test for and potentially correct violated regression assumptions to obtain the most accurate measure of the phenomenon we are trying to measure. In particular, this post will focus on outliers; subsequent posts will address other issues that can arise in regression analysis.


Outliers are observations that have a particularly large influence on the mean or average of a set of numbers. Regression, after all, is just an algorithm for estimating the conditional mean, or the average impact of one variable on another. Typically, when people speak of outliers they are talking about a one-dimensional outlier, for example a really high-priced home. However, regression analysis is multidimensional in nature, so a really high-priced home might not be an issue given the number of bedrooms, bathrooms, location, neighborhood amenities, etc. An economist talking about an outlier is referring to a value that, even after accounting for major driving factors, is still inexplicably large or small. It could be because of a data error or simply an anomaly. Here is how one can test for outliers.

#Imports libraries necessary for analysis along with data on public school expenditures.
#The PublicSchools data set is assumed here to come from the sandwich package.

library(sandwich)
data("PublicSchools")
ps <- na.omit(PublicSchools)

#One way of quickly examining outliers is to draw a scatter plot.

plot(Expenditure ~ Income, data = ps, ylim = c(230, 830))
ps_lm <- lm(Expenditure ~ Income, data = ps)

Outlier Scatter Plot

#This method may work fine with a simple regression, but with a multiple regression plotting is less useful. The alternative is to plot standardized residuals against a statistic called leverage, which measures how far an observation’s predictor values lie from the rest of the data and therefore its potential to influence the regression line. One popular measure of influence is Cook’s Distance, which is defined as:

Cooks Distance

This can be thought of as the sum of the squared differences in the predicted outcomes Y when observation i is deleted, divided by the mean squared error of the regression multiplied by the number of parameters estimated in the model. In other words, the higher this number, the more influential an observation is, and this is based on how much a model’s estimates change relative to how variable the regression estimates are naturally. Here is Cook’s distance graphed with the residual error term of a regression. Researchers have suggested several cutoff levels, or upper limits, for the influence an observation can have before being considered an outlier.
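The formula image is not reproduced in this text. The verbal description above corresponds to the usual definition of Cook’s distance, written out below. The notation is a standard one and may differ from the original figure: the fitted value for observation j from the full model is \hat{y}_j, the fitted value when observation i is left out is \hat{y}_{j(i)}, p is the number of estimated parameters, and MSE is the mean squared error of the regression.

```latex
D_i = \frac{\sum_{j=1}^{n} \left( \hat{y}_j - \hat{y}_{j(i)} \right)^2}{p \cdot \mathrm{MSE}}
```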


#Graphing is nice, but what if there are millions of observations or you’d like to measure outliers in a different way? There are multiple ways of defining outliers and quantifying their influence. Lucky for us, R has built-in functions that can help us identify influential points using various statistics with one simple command.
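The exact command used in the original post is not shown in this text. As a sketch of what such a check can look like, the block below uses two functions from base R's stats package, `cooks.distance()` and `influence.measures()`, on simulated data with one planted outlier. The data and the 4/n cutoff are illustrative assumptions; 4/n is only one of several rules of thumb.

```r
set.seed(1)

# Simulated data: 30 ordinary points plus one planted influential point
x <- c(runif(30, 5, 10), 20)       # the last x value is far from the rest
y <- 2 + 3 * x + rnorm(31)
y[31] <- y[31] - 25                # and its y value is far off the line

fit <- lm(y ~ x)

# Cook's distance for every observation
d <- cooks.distance(fit)

# Flag observations above a common rule-of-thumb cutoff (an assumption here)
cutoff  <- 4 / length(d)
flagged <- which(d > cutoff)
print(flagged)

# Several influence statistics at once (DFBETAS, DFFITS, Cook's D, hat values);
# summary() lists only the observations flagged by at least one measure
infl <- influence.measures(fit)
summary(infl)
```

With the model from this post, the same calls would be `cooks.distance(ps_lm)` and `influence.measures(ps_lm)`.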



#Using several measures one can see that Alaska is an outlier. It may skew the interpretation of the relationship between expenditure on education and income in a state. A common practice is to present these statistics and report two regressions, one with all data included and one with the outliers removed.