### SAS Programming: Summary Statistics and Correlations

The objective of this post is to introduce the SAS code for summarizing data. These summaries use the “proc univariate”, “proc means”, and “proc corr” procedures. The data imported for this post are the monthly civilian unemployment rate and the percentage change in the money stock, both obtained from the St. Louis Federal Reserve Bank’s research database (http://research.stlouisfed.org/fred2/graph/?id=M1SL#). This post adds to the program written in the previous post, both to bring some continuity to these writings and to serve as a reference as the exercises get harder. The code in bold is what will be discussed in this post.

IMPORTING DATA

The code above imports a CSV file that contains the monthly percentage change in the monetary base and the unemployment rate. An explanation of the code used for the import can be found in this previous post: https://espin086.wordpress.com/2011/08/26/sas-programming-importing-csv-data/
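For readers who do not have the earlier post at hand, an import step of this kind might look roughly like the sketch below. This is only one common way to read a CSV file in SAS and is not necessarily the code used in the original program: the file path and the check with “proc print” are assumptions made for illustration, while the data set name “fed” is the one used later in this post.

```sas
/* Sketch only: the file path below is hypothetical; point it at your own CSV file. */
proc import datafile="C:\data\fed.csv"
    out=fed          /* temporary data set used in the rest of the program */
    dbms=csv
    replace;
    getnames=yes;    /* take variable names from the first row of the file */
run;

/* Optional check (an addition for illustration): print the first five rows. */
proc print data=fed(obs=5);
run;
```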

SUMMARY STATISTICS

The “proc univariate” procedure generates a range of summary statistics. In the code above, the temporary data set “fed” is used to generate summary statistics for the % change in the monetary base and the unemployment rate. The final statement before the “run” statement asks that the title of the output reflect the variable names. This simple code generates a host of statistical measures about the data. For the sake of brevity, the tables below show the summary statistics for the unemployment rate only, but the code above also generates statistics for the % change in the monetary base.
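A call matching this description might look like the sketch below. The variable names “m1_pctchg” and “unrate” and the wording of the title are assumptions made for illustration; substitute the names that appear in your own “fed” data set.

```sas
/* Sketch only: m1_pctchg and unrate are hypothetical variable names. */
proc univariate data=fed;
    var m1_pctchg unrate;   /* one set of output tables is produced per variable */
    title "Summary Statistics: % Change in Monetary Base and Unemployment Rate";
run;
```

Listing both variables in a single “var” statement keeps the program short; each variable still gets its own complete set of tables in the output.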

The statistics include the moments, basic measures of central tendency and dispersion, tests of the hypothesis that the mean is zero, quantiles, and information that is useful in identifying outliers and missing data.

The next set of code generates statistics about the correlation between the two variables. In addition to the correlation statistics, the output includes summary statistics similar to those generated by “proc means”.
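The two steps might be written along the lines of the sketch below. As before, “m1_pctchg” and “unrate” are hypothetical variable names, and the list of statistics requested from “proc means” is an illustrative choice rather than the original program's.

```sas
/* Sketch only: m1_pctchg and unrate are hypothetical variable names. */

/* Compact summary: count, mean, standard deviation, minimum, and maximum. */
proc means data=fed n mean std min max;
    var m1_pctchg unrate;
run;

/* Pearson correlations between the two variables; by default the output */
/* also includes a table of simple statistics for each variable.         */
proc corr data=fed;
    var m1_pctchg unrate;
run;
```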

CONCLUSION

The “proc univariate” and “proc means” procedures are helpful in summarizing data. The more comprehensive of the two, “proc univariate”, can be useful in identifying missing values and outliers, which can cause problems in statistical analysis.