SAS Programming: Summary Statistics and Correlations




The objective of this post is to introduce SAS code for summarizing data. These summaries use the “proc univariate”, “proc means”, and “proc corr” commands. The data imported for this post are the monthly civilian unemployment rate and the percentage change in the money stock, both obtained from the St. Louis Federal Reserve Bank’s research database (http://research.stlouisfed.org/fred2/graph/?id=M1SL#). This post adds to the program generated in the previous post, both to bring some continuity to these writings and to serve as a reference as the exercises get harder. The code in bold is what will be discussed in this post.

IMPORTING DATA

The code above imports a CSV file that contains the monthly percentage change in the monetary base and the unemployment rate. An explanation of the import code can be found in this previous post: https://espin086.wordpress.com/2011/08/26/sas-programming-importing-csv-data/
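As a rough sketch, an import step of this kind might look like the following. The file path is a hypothetical placeholder, and the data set name “fed” is the one referred to later in this post; adjust both to match your own setup.

```sas
/* Hypothetical path: point this at your own copy of the CSV file. */
proc import datafile="C:\data\fed.csv"
    out=fed        /* temporary (WORK) data set used in the rest of the post */
    dbms=csv
    replace;       /* overwrite "fed" if it already exists */
    getnames=yes;  /* take variable names from the first row of the file */
run;
```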

SUMMARY STATISTICS


The “proc univariate” command generates a wide range of summary statistics. In the command above, the temporary data set “fed” is called to generate summary statistics for the % change in the monetary base and the unemployment rate. The final statement before the “run” statement asks that the title of the output reflect the variable names. This simple code generates a host of statistical measures about the data. For the sake of brevity, the tables below show only the summary statistics for the unemployment rate, but the code above also generates statistics for the % change in the monetary base.

The statistics include the moments, some basic measures of central tendency and dispersion, hypothesis tests for a zero mean, quantiles, and some information that is useful in identifying outliers and missing data.
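A minimal sketch of a “proc univariate” step like the one described here is shown below. The variable names pct_chg_mb and unrate are assumptions made for illustration and may not match the names in the actual data set; the title text is likewise only an example.

```sas
proc univariate data=fed;
    /* Hypothetical variable names: percentage change in the monetary
       base and the civilian unemployment rate. */
    var pct_chg_mb unrate;
    /* Example title so the output identifies the variables summarized. */
    title "Summary Statistics: Percent Change in Monetary Base and Unemployment Rate";
run;
```

Listing both variables on the “var” statement produces a separate set of tables for each one.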

 

A second command, “proc means”, produces a hand-picked subset of the statistics in the table above and offers other options as well.

The code above generates the number of observations, mean, standard deviation, minimum, and maximum for both the monetary base and the unemployment rate, as seen in the table below.
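A step that requests exactly these five statistics could be sketched as follows. As before, pct_chg_mb and unrate are assumed variable names used only for illustration.

```sas
/* The keywords after data=fed select which statistics are printed. */
proc means data=fed n mean std min max;
    var pct_chg_mb unrate;  /* hypothetical variable names */
run;
```

Other statistic keywords, such as median or range, can be added to the same list when more detail is needed.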

CORRELATIONS

The next block of code generates statistics about the correlation between two variables. In addition to the correlation statistics, summary statistics similar to those generated by the “proc means” procedure are also part of the output.
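A minimal sketch of such a correlation step, again assuming the hypothetical variable names pct_chg_mb and unrate, might be:

```sas
/* Prints simple statistics for each variable, followed by the
   Pearson correlation matrix for the variables listed. */
proc corr data=fed;
    var pct_chg_mb unrate;  /* hypothetical variable names */
run;
```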


CONCLUSION

The “proc univariate” and “proc means” commands are helpful in summarizing data. The more comprehensive of the two, “proc univariate”, can be useful in identifying missing values and outliers, which can cause problems in statistical analysis.

