SAS Programming: Generating Data with DO Loops


“Do loops” in SAS let you execute a group of statements repeatedly without writing out each repetition by hand.

Consider the following code, which calculates the interest earned after one year on a thousand-dollar investment paying a fixed interest rate:
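
A minimal sketch of such a data step, assuming a 7.5% annual rate compounded monthly. The rate and the names `earnings`, `capital`, `rate`, and `interest` are illustrative assumptions:

```sas
data earnings;
    capital = 1000;                         /* initial investment */
    rate = 0.075;                           /* assumed annual interest rate */
    do month = 1 to 12;
        interest = capital * (rate / 12);   /* interest earned this month */
        capital + interest;                 /* sum statement: add it to the balance */
    end;
run;                                        /* one row is written when the step ends */
```

Without the loop, the two statements inside it would have to be written out twelve times.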

The do loop saved several lines of code and generated this dataset:

Note that the month variable has a value of thirteen. At the end of the twelfth iteration SAS increases month by one and then checks it against the range in “do month = 1 to 12”, which is no longer satisfied once month is thirteen. The loop therefore does not execute a thirteenth time, but the incremented value is still written to the final dataset. This can be taken care of by using a loop variable called “counter” and dropping it at the end of the data step.
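
One way this might look, again assuming a 7.5% annual rate compounded monthly and illustrative dataset and variable names:

```sas
data earnings;
    capital = 1000;                          /* initial investment */
    do counter = 1 to 12;                    /* counter only tracks the iterations */
        interest = capital * (0.075 / 12);   /* assumed 7.5% annual rate */
        capital + interest;
    end;
    drop counter;                            /* keep the index out of the output dataset */
run;
```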

The loop now just generates the variable of interest and uses the variable “counter” merely as an internal tracking variable that is dropped at the end of the data step.

Notice that the code writes only the final result: only the values from the last iteration of the “do loop” reach the dataset. One can also have SAS write out the result of each “do loop” iteration, showing how the values change along the way. To do this, simply add an “output” statement before the end of the loop.
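
A sketch of the loop with an explicit “output” statement, under the same assumptions as before (7.5% annual rate compounded monthly, illustrative names):

```sas
data earnings;
    capital = 1000;                          /* initial investment */
    do month = 1 to 12;
        interest = capital * (0.075 / 12);   /* assumed 7.5% annual rate */
        capital + interest;
        output;                              /* write one row per iteration */
    end;
run;
```

Because the step contains an explicit “output” statement, SAS no longer writes the automatic row at the end of the data step, so the dataset holds one row for each of the twelve months.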

This code generates the following output:

One can also nest “do loops” to handle more involved calculations. Suppose that the same amount of capital is added each year for 20 years. The following code, with its nested loop, accounts for both the yearly reinvestment and the monthly compounding of interest, and generates the total capital at the end of each year.
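
A minimal sketch of such a nested loop, assuming $1,000 is added at the start of each year and a 7.5% annual rate compounded monthly. These amounts and the names are illustrative assumptions:

```sas
data invest;
    do year = 1 to 20;
        capital + 1000;                          /* assumed yearly contribution */
        do month = 1 to 12;
            interest = capital * (0.075 / 12);   /* assumed 7.5% annual rate */
            capital + interest;                  /* monthly compounding */
        end;
        output;                                  /* one row per year */
    end;
    drop month;                                  /* inner index is not needed in the output */
run;
```

The sum statement `capital + 1000;` starts capital at zero and carries the balance over from one year to the next.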

There are many instances when one needs to execute a loop an unknown number of times until a condition is met. If someone wanted to know how long it would take their investment to grow to $50,000 given a certain annual investment and interest rate, then a “do until” loop is recommended.
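
A sketch of a “do until” loop for this question, assuming a $2,000 annual investment and a 10% annual rate (both values are illustrative assumptions):

```sas
data invest;
    do until (capital >= 50000);     /* condition is checked at the bottom of the loop */
        year + 1;                    /* count the years elapsed */
        capital + 2000;              /* assumed annual investment */
        capital + capital * 0.10;    /* assumed 10% annual interest */
    end;
run;
```

Because the condition is evaluated at the bottom, a “do until” loop always executes at least once. The single output row holds the number of years needed and the capital reached.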

One can take it one step further and specify an ‘either or’ condition to execute the loop. Suppose a person wanted to invest for 10 years or until their capital was greater than or equal to $50,000. This is how one would code this pair of conditions.
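
A sketch combining the two conditions in one statement. The $25,000 annual investment and 10% rate are assumptions, chosen so that the capital condition is met early:

```sas
data invest;
    do year = 1 to 10 until (capital >= 50000);   /* stop at 10 years or at $50,000 */
        capital + 25000;                          /* assumed annual investment */
        capital + capital * 0.10;                 /* assumed 10% annual interest */
    end;
run;
```

With these assumed values, capital is 27,500 after the first iteration and 57,750 after the second, so the until condition ends the loop well before year 10.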

The code above stops executing after only two iterations because the capital condition was met.

The final application of ‘do loops’ featured in this post generates a random sample from a dataset. Using the “point=” option of the “set” statement inside a ‘do loop’ draws a sample from a master dataset.
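
A minimal sketch of this technique, assuming a hypothetical master dataset named `work.master` and a sample size of 100. Rows are drawn with replacement, so the same observation can appear more than once:

```sas
data sample;
    do i = 1 to 100;                             /* assumed sample size */
        pick = ceil(rand('uniform') * total);    /* random row number from 1 to total */
        set work.master point=pick nobs=total;   /* read that row directly */
        output;
    end;
    stop;                                        /* point= never reaches end of file */
    drop i;
run;
```

The `nobs=` variable is set when the step is compiled, so `total` is already available on the first iteration. The `stop` statement is needed because direct access with `point=` never hits an end-of-file condition that would end the data step on its own.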