Clustering to Improve Merchandise Allocation, Testing, and Forecasting: An Application of the K-Medians Algorithm


Localization of merchandise assortment has become popular among retailers in recent years.  The recognition that the heterogeneity of a store population is an important consideration in the testing, allocation, and pricing of merchandise has boosted profits when the theory is correctly put into practice.  The issues surrounding localization revolve around two questions: which stores should be grouped together, and which variables and methodology should be used to group them so that the differences among stores within each group are minimized?

Dr. Fisher of the University of Pennsylvania and Dr. Rajaram of UCLA found that a k-medians clustering methodology based on sales minimized testing and stockout costs and improved the accuracy of chain-wide forecasts derived from merchandise tests run in a small sample of stores.   Their paper, “Accurate Retail Testing of Fashion Merchandise:  Methodology and Application,” appeared in Marketing Science, Vol. 19, No. 3, Summer 2000. In their research they found that when it came to merchandise testing, clustering based strictly on sales was superior to clustering based on sales, location, average temperature, and the ethnicity of the neighborhood in each store’s general area.  Fewer variables proved better when clustering for merchandise testing and forecasting.   Of all these descriptor variables, climate proved to be an important factor in assessing merchandise tests, but not important enough to beat out clustering based strictly on sales.  The analysis was conducted on a national retail chain specializing in women’s specialty apparel and on two other national shoe retailers with over 1,000 stores; one of the shoe retailers was Nine West.

The objective of this post is to use k-medians clustering to segment stores into volume groups: groups that divide the store population based on actual or projected sales data to drive the allocation of merchandise.  The type of k-medians clustering used in this post differs slightly from that used by Fisher and Rajaram.  The objective of their clustering was to find representative stores in which to conduct tests, so they clustered stores into fairly even groups, much like the picture above.  Clustering used for allocation should not force an equal number of stores per group; instead it should create clusters that capture the differences and similarities among stores, with an emphasis on the top-performing stores.    If a high-performing store is a large outlier, it should sit in a group by itself; luckily, STATA has an option for k-medians clustering that produces this kind of grouping after some simple sorting.


The objective of k-medians clustering is to partition n observations into k clusters (with k less than or equal to n) so as to minimize the within-cluster sum of squares across all k clusters.  In the retail example, the n observations of d-dimensional vectors correspond to a fleet of n (say, 1,000) stores, each described by d store descriptors (climate, sales, ethnicity of the location, and so on).
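In symbols, with S = {S₁, …, S_k} denoting the clusters and μ_i the median of cluster S_i, this criterion is commonly written as follows (a standard formulation using the squared Euclidean distance, matching the sum-of-squares objective described here; in k-medians, μ_i is the cluster median rather than the mean):

```latex
\underset{S}{\arg\min} \; \sum_{i=1}^{k} \sum_{x \in S_i} \left\| x - \mu_i \right\|^{2}
```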


The Greek letter “mu” above represents the median of each cluster.  The inner sum represents the sum of squared differences between each observation x in cluster S and the median of cluster S.  The outer sum totals these quantities over all clusters from i to k, producing the single number that is to be minimized.


1. Assignment Step

The initial step is to assign the initial k medians to the data.  First the sales data are ranked from highest to lowest, and the initial k medians are set to the top k stores.  In the assignment step, each observation is assigned to the cluster whose median at iteration t is closest to it; in the first iteration, every store is therefore grouped to whichever of the k medians it is nearest:
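The assignment rule just described can be sketched as follows (a standard formulation: observation x joins cluster S_i at iteration t if the current median μ_i is at least as close as every other median):

```latex
S_i^{(t)} = \left\{ x : \left\| x - \mu_i^{(t)} \right\|^{2} \le \left\| x - \mu_j^{(t)} \right\|^{2} \;\; \forall \, 1 \le j \le k \right\}
```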

2.  Update Median Step

Once every observation x has been assigned to the median that minimizes the distance between the observation and the medians of each potential cluster, the next step is to calculate an updated median from each newly formed cluster, which mathematically is equivalent to:
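One standard way to write this update is shown below (the averaging form described in the next paragraph; for k-medians proper, the component-wise median of the cluster's observations can replace the average):

```latex
\mu_i^{(t+1)} = \frac{1}{\left| S_i^{(t)} \right|} \sum_{x \in S_i^{(t)}} x
```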

The bars around the cluster S indexed by i at iteration t represent the cardinality of the cluster, that is, the number of observations it contains.  In the case of a one-dimensional clustering variable, the update is analogous to summing the variable over the cluster and dividing by the number of observations within the cluster.  The algorithm then evaluates the objective function and repeats these two steps until the minimization problem is solved.  This problem only has a solution if the within-cluster sum of squares converges to a minimum value. In other words, there should be no cycling or saddle points that throw a wrench into the optimization algorithm.
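The two steps above can be sketched in a short script (a minimal one-dimensional illustration, not the STATA implementation; the data are made up). It seeds the medians with the first k values of the descending-sorted data, then alternates the assignment and update steps, here using the true median update, until the medians stop changing:

```python
from statistics import median

def k_medians_1d(values, k, max_iter=100):
    """Cluster 1-D data into k groups by alternating assignment and
    median-update steps.  Assumes values are sorted in descending order
    so the first k observations seed the medians ("First K" start)."""
    medians = list(values[:k])                      # initial k medians
    for _ in range(max_iter):
        # Assignment step: each value joins the nearest current median.
        clusters = [[] for _ in range(k)]
        for v in values:
            i = min(range(k), key=lambda j: abs(v - medians[j]))
            clusters[i].append(v)
        # Update step: recompute each cluster's median.
        new_medians = [median(c) if c else medians[i]
                       for i, c in enumerate(clusters)]
        if new_medians == medians:                  # converged: no change
            break
        medians = new_medians
    return clusters, medians

# Hypothetical annualized sales figures, sorted largest to smallest.
sales = [980, 610, 590, 220, 210, 205, 90, 85, 80, 20]
clusters, meds = k_medians_1d(sales, k=4)
```

With this sample data the three largest stores each end up in their own cluster and the remaining stores fall into a fourth, which is exactly the outlier-isolating behavior the allocation use case calls for.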


Using an ARIMA forecast of a national retailer’s projected unit sales, a k-medians clustering will be devised that minimizes the differences among the stores in each group. This clustering can be used to allocate inventories across the nation, reducing the markdown costs related to overstocking and the missed sales from stockouts.

1) First, copy and paste the store-level data (sorted from largest to smallest) into STATA, then go to the following drop-down menus: Statistics->Multivariate Analysis->Cluster Analysis->Cluster data->k-medians (not shown)

2) You will then be presented with a screen like the one shown below.  Select the sales variable, which in this example is named annualized retail 2009.  Next choose “k,” which corresponds to the number of clusters required; in this case the data will be clustered into 7 groups. Leave the (dis)similarity measure as Euclidean, since that is what the researchers used in “Accurate Retail Testing of Fashion Merchandise:  Methodology and Application.” Hit the Options button…

3) In the options tab, select “First K observations.”  Since the data are sorted in descending order, the natural breaks and groupings of stores will emerge with this option. Then hit “OK”…

STATA then breaks the stores into neat clusters by minimizing the dissimilarity within groups while adhering to the natural breaks in the data.  The table below summarizes, for each cluster, the number of stores and the mean, standard deviation, minimum, and maximum annualized retail volume of the stores within it.
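The kind of per-cluster summary that table reports can be reproduced with a few lines of code (a sketch with made-up sales figures standing in for STATA's k-medians output; the real values come from the clustering above):

```python
from statistics import mean, stdev

# Hypothetical annualized retail volumes, already grouped into clusters
# (made-up numbers standing in for STATA's k-medians assignments).
clusters = {
    1: [980],
    2: [610, 590],
    3: [220, 210, 205],
    4: [90, 85, 80, 20],
}

# Print a summary table: store count, mean, sd, min, max per cluster.
print(f"{'cluster':>7} {'n':>3} {'mean':>8} {'sd':>8} {'min':>6} {'max':>6}")
for c, vals in clusters.items():
    sd = stdev(vals) if len(vals) > 1 else 0.0
    print(f"{c:>7} {len(vals):>3} {mean(vals):>8.1f} {sd:>8.1f} "
          f"{min(vals):>6} {max(vals):>6}")
```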