SAS Programming: Combining Data Sets and Subsetting with the “IN=” Option

There are times when a merger or two data sets requires further filtering to get to the information that is required.  The “IN=” option creates a tag for a data set that can be used to further filter after merging two data sets together.  This filtering is done with an “IF” statement.  This can be very useful when drilling down to investigate or summarize a set of data.

This post will merge together two data sets: 1) artificial derivatives sales data and 2) customer information.  Then the “IN=” and “IF” command combination will be used to compile a list of clients that have no derivatives sales data for the period under observation.  This can be used by a broker to identify which clients may need a little nudging or special attention since they have not purchased any derivatives recently. The data set is artificial and used for instructional purchases only.


The code above imports a CSV file with customer records and formats the names and phone numbers as strings with a field range of 100.  Then the “sort” command is used to sort by customer number which is the common variable between this data set and one that is imported with the code below.

Now that both data sets have been imported into SAS and sorted by “CustomerNumber”.  The following commands will create a new data set by merging the customer and sales files by the common variable “CustomerNumber”.  However, when the sales file is identified in the “merge” command the “IN=” option is added.  The command “( IN = recent)” tags the sales data with an indicator variable.  Then the “if” command tells SAS to merge the data sets and select only observations where the value for “recent” equals zero.

Armed with this list the broker can begin calling her clients to inquired on their drop off in derivatives purchases.