A SAS program is a collection of SAS statements that may include keywords, various names (e.g., data set and variable names), special characters, and operators. A SAS statement may be used in a DATA step, PROC (procedure) steps, or anywhere in a SAS program.
A SAS program consists of DATA steps and PROC (procedure) steps. DATA steps handle data sets, while PROC steps actually conduct analyses.
A DATA step is used to create or modify data sets by creating and modifying variables; checking and correcting errors in data sets; and writing programs (for simulations).
SAS has the following basic rules.
SAS statements used in a DATA step are either executable (e.g., DO, INPUT, INFILE, OUTPUT) or declarative (e.g., ARRAY, DATALINES, DROP, RETAIN).
SAS has arithmetic, relational, logical, and concatenation (||) operators. SAS does not have a modulus operator, however; the MOD function is used instead.
SAS has various functions for mathematics, statistics, string, date/time, probability, and randomization.
The OPTIONS statement changes the value of SAS system options that affect SAS system initialization, hardware and software interfacing, and the input, processing, and output of jobs and SAS files. See Chapter 8 of SAS Language Reference: Dictionary (1451-1647).
Let us consider a typical DATA Step example that reads an ASCII text file "tiger.dat".
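A minimal sketch of such a DATA step is shown here. The file location and the variable names (id, name, weight) are assumptions made for illustration; the actual layout of "tiger.dat" is not given in the text.

```sas
/* Sketch only: the path and the variable names are assumptions */
DATA tiger;
   INFILE 'c:\temp\tiger.dat' MISSOVER;  /* MISSOVER: do not jump to the next line for short records */
   INPUT id name $ weight;               /* list input; $ marks a character variable */
RUN;
```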
The following example reads three variables directly from the data stream in the DATA step.
The next example illustrates the DATALINES4 statement, which is needed when data lines contain a semicolon (;).
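Reading in-stream data might look like the sketch below. The data set name, the three variable names, and the data values are made up for illustration.

```sas
/* Sketch only: names and values are illustrative */
DATA score;
   INPUT id name $ score;   /* three variables read with list input */
   DATALINES;
1 Kim 85
2 Lee 92
3 Park 78
;
RUN;
```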
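A possible DATALINES4 sketch follows. The variable names and the text values are made up; the key point is that the data block ends with four semicolons instead of one, so a single semicolon inside a data line is treated as data.

```sas
/* Sketch only: names and values are illustrative */
DATA memo;
   INPUT id text $ 3-30;    /* column input keeps embedded blanks and semicolons */
   DATALINES4;
1 Stop; look; listen
2 No semicolon here
;;;;
RUN;
```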
What is a data set in SAS? A SAS data set is a group of data values that SAS creates and processes. It contains a table of observations (rows) and variables (columns) as well as descriptor information (e.g., variable names and formats). A SAS data set is often referred to as a SAS data file. A SAS data view is a virtual data set of descriptor information that points to data from other sources.
SAS has a powerful feature of data manipulation that can handle various data sources such as ASCII text, database, and spreadsheet. You may type in data and directly read them using the DATALINES (or CARDS) statement.
SAS can read ASCII text files delimited with spaces, commas (CSV), tabs, and other characters using the INPUT/INFILE/DATALINES statements in a DATA step. The INFILE statement can also read remote data files when it is paired with a FILENAME statement that uses an access method such as FTP, URL, or SOCKET (TCP/IP).
The IMPORT procedure can read these ASCII text files, but it can also import database (dBASE III, FoxPro, Access) and spreadsheet (Excel and Lotus 1-2-3) files. SAS/SQL (PROC SQL) allows you to connect to those database and spreadsheet files through ODBC (Open Database Connectivity).
SAS data sets may be generated by PROC steps. For example, the MEANS procedure can produce a data set with aggregate statistics and matrices may be transformed into data sets in SAS/IML. The following PROC REG saves the residuals and predicted values to "pew_work" that includes original variables in "jeeshim.pew2004" as well.
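The PROC REG step described above could be sketched as follows. The model variables (y, x1, x2) and the names of the new variables (resid, pred) are hypothetical, since the text does not list them.

```sas
/* Sketch only: y, x1, x2, resid, and pred are hypothetical names */
PROC REG DATA=jeeshim.pew2004;
   MODEL y = x1 x2;
   OUTPUT OUT=pew_work R=resid P=pred;  /* original variables plus residuals and predicted values */
RUN;
QUIT;
```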
Finally, you can generate data using functions, in particular, random number generators in a DATA step.
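As a rough illustration, the DATA step below generates 100 observations with the RAND function. The data set name, the variable names, the seed, and the sample size are arbitrary choices.

```sas
/* Sketch only: names, seed, and sample size are arbitrary */
DATA random;
   CALL STREAMINIT(12345);        /* set the seed for reproducibility */
   DO i = 1 TO 100;
      u = RAND('UNIFORM');        /* uniform on (0,1) */
      z = RAND('NORMAL', 0, 1);   /* standard normal */
      OUTPUT;                     /* write one observation per iteration */
   END;
RUN;
```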
A SAS data library is an alias of the collection of data sets, thereby making data management more convenient and efficient. Like a directory or folder, a library tells SAS the place where data sets exist. Unlike a directory or folder, a library is not physical but logical in the sense that the library itself does not exist on any secondary storage device.
Every data set should be referred to using a library in SAS, although the default library, WORK, is often omitted. If you want to retrieve a data set by a point-and-click method, use the SAS Enterprise Guide.
The LIBNAME statement associates a library reference (libref) with a SAS data library (a specific directory or folder). It declares which directory the specified library refers to. Libraries should be declared before the DATA steps and PROC steps that use them.
If you want to use the default WORK library, you do not need to declare any library. However, you should know that data sets in the WORK library are temporary: SAS keeps them in a temporary location and deletes them when the session ends. If you want to keep data sets as permanent files (e.g., on hard disks or memory sticks), you must use your own libraries.
The following LIBNAME statement declares a library "jeeshim" that is associated with c:\temp. A specific SAS data file is referred using a library name and a file name divided by a period. The "jeeshim.nes2004" indicates the "nes2004.sas7bdat" in the "jeeshim" library (c:\temp).
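The declaration and the two-level name described above can be sketched as follows. The PROC PRINT step and its OBS= option are added only to show the library in use.

```sas
LIBNAME jeeshim 'c:\temp';                 /* associate the libref with the folder */

PROC PRINT DATA=jeeshim.nes2004 (OBS=5);   /* refers to c:\temp\nes2004.sas7bdat */
RUN;
```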
How do you know which data sets are included in a library? Use the CONTENTS or DATASETS procedure with the special keyword _ALL_.
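Both approaches might be written as below, assuming the "jeeshim" library has already been declared. The NODS option is an optional addition that suppresses the details of individual data sets and prints only the directory listing.

```sas
PROC CONTENTS DATA=jeeshim._ALL_ NODS;
RUN;

PROC DATASETS LIBRARY=jeeshim;
   CONTENTS DATA=_ALL_ NODS;
QUIT;
```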
If you need to use specific libraries frequently, declare them in the autoexec.sas, an ASCII text file in the SAS root directory. SAS automatically executes statements in the file immediately after SAS is launched. Consider the following example.
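An autoexec.sas file along these lines would declare the library at start-up. The OPTIONS line is an illustrative addition and is not required.

```sas
/* autoexec.sas: executed automatically when SAS starts */
LIBNAME jeeshim 'c:\temp';
OPTIONS NODATE NONUMBER;   /* optional: drop the date and page number from output */
```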
The DATA statement begins a DATA step and provides data set names. The output of a DATA step is stored into the data set specified.
A SAS DATA step can create more than one data set. The following example creates two data sets "WORK.egov1" and "WORK.egov2" from "jeeshim.egov." The "egov1" and "egov2" in the WORK library are identical except that "egov1" does not include the variables "state" and "msa," and has a variable "id" whose name is changed from "respid."
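One way to write the step described above uses the DROP= and RENAME= data set options on the first output data set:

```sas
DATA egov1 (DROP=state msa RENAME=(respid=id)) egov2;
   SET jeeshim.egov;   /* every observation is written to both data sets */
RUN;
```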
If a data set name is omitted, SAS will automatically name each successive data set as WORK.data1, WORK.data2, WORK.data3, and so on. These data sets, however, may consume computing resources and slow down the access and response speed.
If you want to use a DATA step only for transactions, you may use _NULL_ in the DATA statement to enhance memory management efficiency. The _NULL_ keyword tells SAS not to create any data set when it executes the DATA step.
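A small sketch of a DATA _NULL_ step is given here. It assumes the "jeeshim.egov" data set and its "respid" variable mentioned earlier, and it only writes messages to the log without creating a data set.

```sas
/* Sketch only: checks for missing IDs and reports them in the log */
DATA _NULL_;
   SET jeeshim.egov;
   IF MISSING(respid) THEN PUT 'Missing respid at observation ' _N_;
RUN;
```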
How do you select and delete some observations in a data set? The IF statement can do that for you.
The following example retrieves observations from the data set "jeeshim.pew2004"; selects only male observations (male=1) and discards female observations; and stores the result into the data set "WORK.pew_work." An explicit OUTPUT statement gives the identical result (IF male EQ 1 THEN OUTPUT;); note that KEEP selects variables, not observations.
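The selection described above can be sketched with a subsetting IF statement:

```sas
DATA pew_work;
   SET jeeshim.pew2004;
   IF male EQ 1;   /* subsetting IF: only observations meeting the condition are kept */
RUN;
```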
You may use the DELETE statement that works in the reverse way. This statement removes observations that meet the conditions provided.
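A sketch of the DELETE approach, using the same data set and variable as the previous example:

```sas
DATA pew_work;
   SET jeeshim.pew2004;
   IF male EQ 0 THEN DELETE;   /* drop female observations */
RUN;
```

Unlike the subsetting IF on male EQ 1, this version keeps observations whose value of male is missing.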
The REMOVE statement following the MODIFY statement in a DATA step also deletes observations.
You may also select observations by specifying a range of record numbers. Use _N_, a SAS automatic variable that counts the iterations of the DATA step and thus, when a single data set is read sequentially, corresponds to the record number of each observation.
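A minimal sketch of MODIFY with REMOVE, assuming that "WORK.pew_work" already exists and contains the variable "male". MODIFY changes the data set in place rather than creating a new copy.

```sas
/* Sketch only: assumes WORK.pew_work already exists */
DATA pew_work;
   MODIFY pew_work;
   IF male EQ 0 THEN REMOVE;   /* delete the current observation */
RUN;
```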
The first 500 observations are saved into "WORK.pew_user," while remaining observations are put into "WORK.pew_nonuser."
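The split described above might be written as follows. The input data set is assumed to be "jeeshim.pew2004", which the text does not state explicitly for this example.

```sas
/* Sketch only: the input data set is an assumption */
DATA pew_user pew_nonuser;
   SET jeeshim.pew2004;
   IF _N_ <= 500 THEN OUTPUT pew_user;
   ELSE OUTPUT pew_nonuser;
RUN;
```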
You may try the WHERE statement, which selects observations from an existing data set without physically removing observations that do not meet a condition.
The above data step reads only female observations (male=0) from jeeshim.pew2004 and then stores them into pew_female. Note that SAS checks if observations meet the condition when executing SET, MERGE, MODIFY, and UPDATE statements.
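A WHERE statement in a DATA step can be sketched as:

```sas
DATA pew_female;
   SET jeeshim.pew2004;
   WHERE male = 0;   /* the condition is applied as observations are read */
RUN;
```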
In a DATA step, WHERE cannot be used together with INFILE and DATALINES. In a PROC step, this statement limits the observations used in the analysis.
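In a PROC step the statement might be used as in this sketch. The choice of PROC MEANS is arbitrary, and "it_use" is assumed to be a numeric variable in "jeeshim.pew2004".

```sas
/* Sketch only: the procedure and the data set are illustrative choices */
PROC MEANS DATA=jeeshim.pew2004;
   WHERE it_use;
RUN;
```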
"WHERE it_use" selects observations whose value of it_use is neither missing nor zero.
You can select variables using the KEEP and DROP statements. The following example reads observations from "jeeshim.gss2004"; selects only four variables; and then stores them into "WORK.gss_work1."
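A sketch with the KEEP statement follows. The four variable names are hypothetical, since the text does not name them.

```sas
/* Sketch only: id, age, educ, and income are hypothetical variable names */
DATA gss_work1;
   SET jeeshim.gss2004;
   KEEP id age educ income;
RUN;
```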
Alternatively, you may add the KEEP option in the DATA statement to make it simple.
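The same selection with the KEEP= data set option, again using hypothetical variable names:

```sas
/* Sketch only: id, age, educ, and income are hypothetical variable names */
DATA gss_work1 (KEEP=id age educ income);
   SET jeeshim.gss2004;
RUN;
```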
The following two examples exclude three variables "state", "msa" and "vote" from "WORK.gss_work2."
Keep in mind that the KEEP and DROP statements should not both be used in the same DATA step. However, you may use both the KEEP= and DROP= options in a DATA statement.
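The two forms might look like this, assuming the input data set is "jeeshim.gss2004" as in the KEEP example:

```sas
/* Form 1: DROP statement */
DATA gss_work2;
   SET jeeshim.gss2004;
   DROP state msa vote;
RUN;

/* Form 2: DROP= data set option */
DATA gss_work2 (DROP=state msa vote);
   SET jeeshim.gss2004;
RUN;
```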
If you want to append observations, use the SET statement to add observations in secondary data sets (jeeshim.nes2002) to the master data set (jeeshim.nes2004).
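A sketch of appending with SET. The result is written to a new data set with the hypothetical name "nes_all" so that the master file is not overwritten while experimenting; writing to jeeshim.nes2004 instead would update the master itself.

```sas
/* Sketch only: nes_all is a hypothetical output name */
DATA nes_all;
   SET jeeshim.nes2004 jeeshim.nes2002;   /* nes2002 observations follow those of nes2004 */
RUN;
```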
The APPEND procedure and the DATASETS procedure also append the observations from one SAS data set to the end of the master data set. These procedures are useful when the master data set is huge.
If the master and secondary data sets have different data structures, the FORCE option is necessary. This option, however, does not append the variables that exist only in the secondary data set.
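The APPEND procedure could be used as sketched below. The equivalent DATASETS form is shown in a comment so that the observations are not appended twice if the block is run as is.

```sas
PROC APPEND BASE=jeeshim.nes2004 DATA=jeeshim.nes2002 FORCE;
RUN;

/* Equivalent form with the DATASETS procedure:
PROC DATASETS LIBRARY=jeeshim;
   APPEND BASE=nes2004 DATA=nes2002 FORCE;
QUIT;
*/
```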
SAS MERGE and UPDATE statements can merge SAS data sets. There are two types of merging: one-to-one merging and match-merging.
The one-to-one merging mechanically puts data sets together without distinguishing one observation from others. It looks like putting a new sheet of paper over an existing paper.
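A one-to-one merge is simply a MERGE statement without a BY statement. The data set names "demo" and "survey" here are hypothetical.

```sas
/* Sketch only: demo and survey are hypothetical data sets */
DATA both;
   MERGE demo survey;   /* first observation with first, second with second, ... */
RUN;
```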
The match-merging distinguishes individual observations using identification variables (e.g., id and name). Thus, it requires the BY statement that specifies the common denominators.
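A match-merge sketch follows, with hypothetical data sets "demo" and "survey" that share the identification variable "id". Both data sets need to be sorted (or indexed) by the BY variable first.

```sas
/* Sketch only: demo and survey are hypothetical data sets */
PROC SORT DATA=demo;   BY id; RUN;
PROC SORT DATA=survey; BY id; RUN;

DATA both;
   MERGE demo survey;
   BY id;               /* observations are paired by matching id values */
RUN;
```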
You may also use the UPDATE statement with the UPDATEMODE=NOMISSINGCHECK option. Since this statement supports only match-merging, the BY statement is required.
See the merge.pdf for actual examples of the MERGE statement and the UPDATE statement. For complicated merging, use SAS/SQL to take advantage of SQL statements.
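An UPDATE step might be sketched as follows. The data set names "master" and "trans" and the variable "id" are hypothetical; both data sets are assumed to be sorted by id, with unique id values in the master.

```sas
/* Sketch only: master, trans, and id are hypothetical names */
DATA master_new;
   UPDATE master trans UPDATEMODE=NOMISSINGCHECK;  /* missing values in trans also overwrite master values */
   BY id;
RUN;
```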
Variables are created, modified, recoded, and/or deleted in DATA steps.
The RETAIN statement is useful for various tasks. For example, you can compute the cumulative sum of a variable.
Recoding: You may recode a variable using the IF statement.
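A cumulative sum with RETAIN can be sketched as below. The data set "sales" and the variable "amount" are hypothetical.

```sas
/* Sketch only: sales and amount are hypothetical names */
DATA running;
   SET sales;
   RETAIN cumsum 0;                /* keep the value across iterations, starting at 0 */
   cumsum = SUM(cumsum, amount);   /* SUM ignores missing values of amount */
RUN;
```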
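A recoding sketch with IF... THEN/ELSE follows. The variable "age", the cut points, and the new variable "agegroup" are assumptions made for illustration.

```sas
/* Sketch only: age, agegroup, and the cut points are hypothetical */
DATA recoded;
   SET jeeshim.gss2004;
   IF age = . THEN agegroup = .;          /* handle missing first: missing is less than any number */
   ELSE IF age < 30 THEN agegroup = 1;
   ELSE IF age < 50 THEN agegroup = 2;
   ELSE agegroup = 3;
RUN;
```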
The following usage is very convenient despite its complexity.
This usage is equivalent to the following.
You may recode a variable in a reverse order using an array.
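Reverse coding with an array might look like this sketch, which assumes three items (q1 through q3) measured on a five-point scale in a hypothetical data set "survey".

```sas
/* Sketch only: survey, q1-q3, and the 5-point scale are assumptions */
DATA reversed;
   SET survey;
   ARRAY q{3} q1-q3;
   DO i = 1 TO 3;
      IF q{i} NE . THEN q{i} = 6 - q{i};   /* 1<->5, 2<->4, 3 stays 3 */
   END;
   DROP i;
RUN;
```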
SAS arrays can also conduct more complicated tasks as follows.
Renaming: You may change variable names using the RENAME statement.
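A RENAME statement sketch, reusing the "respid" to "id" change mentioned earlier for "jeeshim.egov":

```sas
DATA egov1;
   SET jeeshim.egov;
   RENAME respid = id;   /* the new name takes effect in the output data set */
RUN;
```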
If you need to handle multiple response questions, stack up the data set using the OUTPUT statement.
Suppose that respondents are asked to pick three choices out of ten regardless of order in choices (equal weight). The choices are coded into three variables x1 through x3.
The OUTPUT statement is executed three times to generate three observations per subject.
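The stacking step could be sketched as follows. The input data set "survey", the subject identifier "id", and the new variable "choice" are hypothetical names; x1 through x3 come from the description above.

```sas
/* Sketch only: survey, id, and choice are hypothetical names */
DATA stacked;
   SET survey;
   ARRAY x{3} x1-x3;
   DO i = 1 TO 3;
      choice = x{i};
      OUTPUT;            /* executed three times per subject */
   END;
   KEEP id choice;
RUN;
```

With the data in this long form, a frequency table of "choice" counts each of the ten options across all three responses.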