SAS Programming: Generating Data with DO Loops



“Do Loops” in SAS allow one to complete repeated commands in a more efficient way.

Consider the following code that calculates the amount of interest earned on a thousand dollar investment paying a fixed interest rate after one year:

The do loop saved several lines of code and generated this dataset:

Note that the month variable has a value of thirteen.  This is because at the end of the loop the month variable is increased by one in the twelfth month, then the execution step checks the condition “do month = 1 to 12” which ceases to be true when month is thirteen.  Therefore the loop doesn’t execute on the thirteenth month but the variable is still read to the final dataset.  This can be taken care of by adding a counter variable called “counter” and dropping it at the end of the datastep.

loop now just generates the variable of interest and uses the variable “counter” as merely an internal tracking variable that is dropped at the end of the data execution step.

Notice how the code only generates the final output, that is the final iteration of the “do loop” is read.  One can also explicitly have SAS display the output of each “Do Loop” execution and the subsequent change in values. To do this you simply add an “output” statement before the end of the loop.

This code generates the following output:

One can also “nest do loops” to create even more interesting calculations.  Suppose that each year for 20 years the same amount of capital is added.  The following code ensures that the total value in the end accounts for the reinvestment amount and the monthly compounding of interest.   This code with its nested loop generates the total capital at the end of each year which includes reinvestment and interest earnings.

There are many instances when one needs to execute a loop an unknown number of times until a condition is meet.  If someone wanted to know how long it would take their investment to earn $50,000 given a certain annual investment and interest rate, then a “do until loop” is recommended.

One can take it one step further and specify and ‘either or’ condition to execute the loop.  Suppose a person wanted to invest for 10 years or until their capital was greater than or equal to $50,000.  This is how one would code these pair of conditions.

The code above stops executing after only two loops because the capital condition was meet.

The final application of ‘do loops’ featured in this post generates a random sample from a dataset. Using the “point” command along with a ‘do loop’ generates a sample of a master dataset.

SAS Programming: Summarizing Categorical Data with “Proc Freq”

The freq procedure summarizes categorical data.

Typically one wants to restrict the variables that appear in the proc freq to categorical data by using the “table” command.

In addition to listing variables separately on can used a numbered range of variables to summarize data.

In addition to single variable frequency tables SAS also generates two way tables to compare two values.  The following program formats data and then creates a frequency table for two variables.

Notice how in the output there is a legend box contains information about the fields within the table.

One can also create n-way tables that table more than two variables.  The following code creates two two-way tables by gender by using the command “tables sex*weight*height;” command.

One way to make the data more manageable is to use the list option.  This produces list that are more compact and may be more clear.

One can also limit the way the output looks by specifying the “crosslist” option

The output for only the female sex is shown here:

Finally, tables can be cleaned up by  restricting some of the table information with options like “nofreq”, “nopercent”, “nocol” or “norow”.  The following code only displays the percentage by restricting all other output.

SAS Programming: Computing Statistics Using “Proc Means”

The “proc means” procedure can be used to generate summary statistics for numerical data.   Typing the following code generates the count, mean, standard deviation, minimum and maximum values for the diabetes data set.

Note:  This dataset can be downloaded by clicking on “help” and selecting “learning sas programming” from the SAS dropdown menu.

The following code limits the output to the minimum and maximum values for the data in the “diabetes” data set.   There are many other keywords that specify statistics that can be displayed by the proc means procedure, but we will limit this example to a simple minimum and maximum.  The last option also limits the number of decimals that the output displays which cleans up the output.

There may be instances when one wants to select only specific variables to view.  The proc means command can restrict the number of variables through the “var” option.

In addition to listing variables you can use a numbered range of variables.

There are other times when one needs statistics for a group of observations instead of the entire dataset.  This can be accomplished by using the “class” statement in the means procedure.

Like the “class” statement, the “by” statement also groups statistics categorically. The only difference is that the “by” statement requires that the dataset be sorted with the sort procedure before running the means procedure.

The “class” statement is easier to use when categories contain only a few levels; the “by” command has the advantage when categorical data with many levels are to be summarized.

One an also created summarized datasets using the means procedure using the “output out” statement and specifying the statistics/names of the summarized variables in the data set.

Programming Excel VBA: Event Triggered Programs

Excel VBA is capable of delivering event triggered programs.  There are many different events that a developer can specify that will trigger a program:  opening a workbook, activating a sheet, saving a workbook, closing a workbook, day of the week, etc.  Writing and using event triggered programs can save much time an energy.  The objective of this post is to write an event driven program that opens up other related workbooks.  In my experience I have found that several related Excel workbooks are often opened up together to get a task accomplished.  For example a bookkeeper may want to open up the general ledger, income statement, balance sheet, and cash flow statement every month to generate the next set of financial records.  This repetitive task can involve digging through multiple files/folders.  One can quickly and easily program Excel to complete this tasks.  Obviously the possibilities are endless. One can use event triggered programs to:  print after closing a file, update certain fields on given days, format before printing, etc.


Open up Excel VBA by pressing Alt+F11.  In the drop down window select “Workbook” and “Open” to create a program that runs one the file is opened.

Note that you must write the Excel VBA program in a module under the file you want this program to appear.


It is often good practice to make sure you inform/ask the user of the program that is about to be run.  You might now want to open up a whole collection of financial spreadsheets, you may just want to take a peak at a particular transaction or report.  This can be accomplished by writing the program with an If-then statement where the condition tested is selected by the user through a message box.

If the user selects “yes” then the following code runs…

This is a quick and easy program that can be built upon to automate many of the repetitive tasks analyst working on Excel.

R programming: Getting GDP by State Data into R

R is a popular statistical package that is both powerful and free.  The objective of this post is to examine the different ways that one can import and store data in R.  Specifically this post will cover how to read in numeric and text data by 1) manually imputing the data, 2) reading the data as a vector with the “scan” function, 3) importing CSV files with the “read.table” function, and 4) exporting data in R into a text file.

The data will come from the Bureau of Economic Analysis and consist of total state level GDP for 2010.


For obvious reasons inputting the data manually would not be preferred, but there may be cases were the manual entry of data may be required.  The following R code inputs the data for the first few states (in alphabetical order) into R:

The first line of code with the “#” is ignored by R and merely comments on what the code is doing.  Comments can be useful reminders of why things were done in a program a certain way. The next line of code creates a vector called “gdp” that is used in conjunction with the scan function.  The scan function tells R that data is about to be imported into the “gdp” vector that was created.  Then the values are manually typed in for each observation.  Once we reach the final observation and hit return twice R gives use a message, “Read 3 items”, this shows that the data was read in correctly.  One can know use the vector “gdp” in a program.  The simplest command is to type “gdp” and hit enter; this command prints the data in the vector onto the screen. One can do the same for the names of the states with the following command:

In this case the data is not numeric so the (what=” “) function inside the scan function specifies that the vector “state” only contains character data.


In the code above a data set is created called “stategdp.csv” that is imported using the “read.csv” function.  This function is especially designed to read in comma separated variable files into R.  The parameter “header = T” indicates that the data contains headers for the variable names which in this case are “State” and “Year_2010”.


The code above exports the data state GDP data into a text file.  The default option for the “write.table” function is to export with the name of the rows and quotes for string characters.  It is very common not to want these either of these in the export.  The following code at the end of the file path accomplishes that task.

Quick PasteSpecial: Recording, Examining, and Testing VBA Macro Code

The objective of this post is to introduce the powerful programming language used in Microsoft application called VBA.  More specifically this post will introduce the “macro recorder” that can replay steps taken while the recorder is on.  The recorder is a great way to familiarize oneself with VBA code.  However, further post will build on the understanding of the VBA language which can be used in conjunction with the recorder to create useful programs.  These programs can quickly and efficiently complete repetitive task without the risk of human error, besides the risk that is present in writing a program with errors.  Once a developer has an error free program it can be leveraged time and time again to do anything that a person can do with a mouse and keyboard but with all the advantages of having a program execute the steps instead of a person.

The steps below will convert selected formulas to their current values.


The following formulas were typed into an excel spreadsheet,

which yield the following values.

One useful set of procedures that Excel users is to select a range of cells with formulas and then use the past special command to change the selected range to values.  The following steps outline how to record these steps in an Excel macro.

1.  Select the range of cells that contain the formulas.

2. View – > Macro – > Record New Macro.

3. Enter the name for the macro, enter shift+C as the shortcut key combo, and click OK.

4. Do a “copy and paste special” on the values like you normally would on Excel.

5. View – > Macros – > Stop Recording.

The steps above created an Excel VBA macro.  One can test the macros by creating new formulas on the sheet, selecting them, and pressing the Ctrl+Shift+C shortcut specified in the steps above.


The steps above actually recorded some Excel VBA code in the form of a macros.  One can see and modify this code by following these steps:

1)  Hit Alt+F11 to bring up the Visual Basic Editor (VBE).

2) Select the project that corresponds to the workbook in which you recorded the macro, if you haven’s saved the workbook it should be in this location called Module 1:

3. Click on the “Module 1” icon above to get the see the Excel VBA macros.

The code created a macro called “ConvertFormulas” which takes no inputs.  The text in green is commented out with an apostrophe.  Comments can include some useful information, but for the sake of the program they are ignored.  Next, the second line of code copies whatever cells are selected.  The following three lines of code executes  the paste special commands with the numerous options.  Finally the last line of code before the “End Sub” line deselects the selected data.

Next, a line of code will be inserted that will prompt the user to confirm that they in fact want to run this macros, before the program executes and steps.  This is a great way to ensure macros are not accidentally run or that an incorrect program is not called by the user.  Finally, the last step will be to add the name of the author and date to the macros was created to the program.

The code prompting the user is inserted right before the program starts and outputs this message box before executing the ConvertFormulas subroutine.

Subsequent post will examine the syntax more thoroughly and develop more complex problems.