The INPUT statement tells SAS how to read data by describing the arrangement of the target data. Depending on the input style, the INPUT statement may provide variable names, $ (indicating a character value), pointer controls, column specifications, informats, and line-hold specifiers (@ and @@).
The DATALINES statement (replacing the old CARDS statement) indicates that data lines follow in a SAS DATA step. In order to read external data files, you have to use the INFILE statement.
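As a minimal sketch of how INPUT and DATALINES fit together, the DATA step below reads two in-stream records. The data set name, variable names, and values are made up for illustration.

```sas
/* Read two in-stream records; "demo", "name", and "age" are illustrative names. */
data demo;
  input name $ age;   /* $ marks name as a character variable */
  datalines;
Park 35
Lindblom 62
;
run;

proc print data=demo;
run;
```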
There are six input styles used in the INPUT statement: list input, column input, formatted input, modified list input, named input, and mixed input.
| Feature | List (Free) | Modified List | Column | Formatted |
| --- | --- | --- | --- | --- |
| Length | Standard | Flexible | Nonstandard | Nonstandard |
| Delimiter | blank | Yes | n/a | n/a |
| Missing values | . | . or delimiter | blank | blank |
| Variable order, DSD | Yes | Yes | n/a | n/a |
| Re-reading | No | No | Yes | Yes |
| Format modifier | No | :, &, ~ | No | No |
| Pointer control | n/a | n/a | @n, +n | @n, +n, #, / |
| Informat | No | Yes (after :, &, ~) | No | $w., w.d |
Which input style is best? It depends on your skills and on the characteristics of the data. If your data set has just a few observations with several variables, list input or named input will generally be more convenient than column input or formatted input. When data values are not separated by a blank or another delimiter, you cannot use list input. When data are neatly aligned in columns, column input or formatted input is the better choice. Therefore, examine the data structure carefully before choosing an input style, and ideally take this issue into account at the data coding stage.
The list input style simply lists the variables, separated by blanks. This style is also called the free format.
A character variable should be followed by $. Missing values should be marked with a period (.); a blank does not mean a missing value in the list input style. Do not use more than one "." for a value. The default length of a character variable is 8 characters (standard); that is, a fixed 8 bytes of memory are assigned to each variable. Therefore, a character value longer than 8 characters will be truncated. If you want to read character values longer than 8 characters, use a LENGTH, INFORMAT, or ATTRIB statement. Alternatively, you may use a different input style such as column input or formatted input.
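The rules above can be sketched as follows. The data are invented; the LENGTH statement is one of the options the text mentions for reading character values longer than 8 characters, and the period in the second record marks a missing age.

```sas
/* List input with a missing value and a character value longer than 8. */
data members;
  length city $ 20;          /* without this, city would be cut to 8 characters */
  input name $ age city $;   /* name keeps the default length of 8 */
  datalines;
Park 35 Bloomington
Lindblom . Stockholm
;
run;
```

Because LENGTH appears before INPUT, city becomes the first variable in the data set; an ATTRIB or INFORMAT statement would have the same side effect.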
To read a comma-delimited ASCII text file, specify the delimiter in the INFILE statement, because the default delimiter is a blank.
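A minimal sketch of reading a comma-delimited file is shown below. To keep it self-contained, the first step writes a small temporary file; the fileref csv, the data set name, and the values are all assumptions made for illustration.

```sas
/* Write a small comma-delimited file to a temporary location (made-up data). */
filename csv temp;

data _null_;
  file csv;
  put "Park,35,72.5";
  put "Lindblom,62,80.1";
run;

/* Read it back: DLM=',' replaces the default blank delimiter. */
data people;
  infile csv dlm=',';
  input name $ age weight;
run;
```

In practice the FILENAME statement would point to an existing file instead of a temporary one.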
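The modified list style is a mixture of list input and formatted input. This style can deal with delimited data that are not neatly structured. Three format modifiers make it possible to read complex data: the colon (:) reads a value up to the next delimiter using the informat that follows it, the ampersand (&) allows single embedded blanks in a character value, and the tilde (~) keeps quotation marks and embedded delimiters when used with DSD.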
The first example illustrates how : and & work in INPUT. The "Lindblom80" in the first row was truncated since it exceeds 8 characters; only the first 8 characters, as specified in the INPUT statement, were read and the last two characters were ignored. In the second row, SAS reads the first four characters "Park", which are shorter than 8 characters, and then encounters a comma (delimiter); SAS stops reading data for the variable "name" and moves on to the next variable. The variable "title" is defined by & with a maximum of 50 characters. The delimiter, a comma, in the first and third rows was treated as a character value. Two consecutive double quotation marks were read as a single double quotation mark. Characters exceeding the maximum, 50 characters in this case, will be ignored.
DSD removes the double quotation marks enclosing a character value when reading data. If you omit DSD, SAS will treat a comma in a character value as a delimiter and read the enclosing double quotation marks as part of the value. For example, the first row will be '"Still Muddling' and the second '"Reading ""Small Is Beautiful"""'
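The two steps below are a hedged sketch of these modifiers, not the original example: the records are invented to resemble the values quoted above. The first step uses blank-delimited data so that & ends a value at two consecutive blanks; the second uses comma-delimited data with DSD and the colon modifier.

```sas
/* Step 1: blank-delimited data. & lets title contain single blanks; */
/* two consecutive blanks end the value.                             */
data books1;
  input name : $8. title & $50. year;
  datalines;
Lindblom80 Still Muddling Not Yet Through  1979
Park Reading Small Is Beautiful  1973
;
run;

/* Step 2: comma-delimited data. DSD strips the enclosing quotation */
/* marks and keeps the comma inside the quoted title.                */
data books2;
  infile datalines dlm=',' dsd;
  input name : $8. title : $50. year;
  datalines;
Lindblom80,"Still Muddling, Not Yet Through",1979
Park,"Reading ""Small Is Beautiful""",1973
;
run;
```

In both steps name is stored with a length of 8, so "Lindblom80" is kept as "Lindblom".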
The second example shows how ~ (tilde) and DSD work together to read a string that contains the delimiter. SAS reads a comma in the string as a character value but does not remove the double quotation marks enclosing the string. If you omit DSD, the title of the second row will be '"Still Muddling' because SAS treats the comma in the string as the delimiter and stops reading the character value for the variable "title."
Notice that you must not omit : after "year" in INPUT even when the data are in the same fixed format; without it, SAS reads "year" with formatted input, which ignores delimiters. When the variable "year" is the last one listed in INPUT, : is not necessary, although it is generally recommended.
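A minimal sketch of the tilde modifier follows, again with invented records rather than the original data. The variable year sits in the middle of the list, so it keeps its colon.

```sas
/* ~ with DSD keeps the enclosing quotation marks and the embedded comma. */
data books3;
  infile datalines dlm=',' dsd;
  input name : $8. year : 4. title ~ $50.;
  datalines;
Lindblom80,1979,"Still Muddling, Not Yet Through"
Park,1973,"Reading Small Is Beautiful"
;
run;
```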
The column input style reads input values from specified columns. A variable name is followed by the starting and ending columns.
This input style works well for well-structured data.
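As a sketch of column input, the step below reads three fields from fixed column ranges. The layout (name in columns 1-10, dept in 12-14, salary in 16-20) and the values are assumptions chosen for illustration.

```sas
/* Column input: each variable is followed by its start and end columns. */
data staff;
  input name $ 1-10 dept $ 12-14 salary 16-20;
  datalines;
Lindblom   POL 52000
Park       ECO 48000
;
run;
```

Because the columns are given explicitly, name can hold up to 10 characters here without a LENGTH statement.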
The named input style reads data values that appear after a variable name. Variable names and data values are separated by an equal sign. String data are not enclosed in double quotation marks. Like the list style, the named style supports only the standard length of variables. This style offers some flexibility, but it is not appropriate for a large data set.
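A minimal sketch of named input, with made-up variables and values, is shown below. Each value is tagged with its variable name, so the order of the pairs can differ from record to record.

```sas
/* Named input: variable=value pairs; $ follows = for character variables. */
data people;
  input name=$ age= city=$;
  datalines;
name=Park age=35 city=Seoul
city=Oslo name=Lindblom age=62
;
run;
```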
With mixed input, a single INPUT statement can combine list input, column input, formatted input, and/or named input.
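The step below is a small sketch of mixing styles in one INPUT statement: id is read with column input, name with formatted input at a fixed position, and score with list input. Names, layout, and values are illustrative assumptions.

```sas
/* Mixed input: column (id), formatted (name), and list (score) styles. */
data mixed;
  input id 1-3 @5 name $8. score;
  datalines;
101 Lindblom 87.5
102 Park     91
;
run;
```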
The formatted input style reads input values with informats specified after variable names. Informats provide the data type and the width of an input value. Numeric variables are expressed in the w.d form, where w represents the total width of the input field and d the number of digits after the decimal point. You may omit d when d = 0, but you cannot omit the period, which is what distinguishes an informat from a column specification. The $CHARw. or $w. informat is used for character variables, while the DATEw. or DDMMYYw. informat is used for dates.
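A hedged sketch of formatted input follows, using an assumed fixed layout and made-up values. The +1 pointer controls skip the blank between fields. In the second record the amount has no decimal point, so the 8.2 informat implies two decimal places.

```sas
/* Formatted input: each variable is followed by an informat. */
data sales;
  input id 3. +1 name $10. +1 amount 8.2 +1 saledate date9.;
  format saledate date9.;   /* display the stored date value as a date */
  datalines;
101 Lindblom    1234.56 15MAR2005
102 Park          98700 02JAN2006
;
run;
```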
You can group variables and their informats in parentheses to shorten the INPUT statement.
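As a sketch, the grouped form below applies the same informat to three variables; the variable names and values are invented. It is equivalent to writing q1 2. q2 2. q3 2. one by one.

```sas
/* The informat list (2.) is reused for every variable in (q1-q3). */
data survey;
  input id 3. (q1-q3) (2.);
  datalines;
101 4 5 3
102 2 1 5
;
run;
```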
Formatted input works effectively with column pointer controls, informats (e.g., COMMAn., DOLLARn., PERCENTn., and MMDDYY10.), and parentheses.
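The sketch below combines column pointer controls with several of these informats. The column positions, variable names, and values are assumptions for illustration only.

```sas
/* @n moves the pointer to column n before each informat is applied. */
data accounts;
  input @1 id 3.
        @5 balance comma10.     /* removes the embedded commas */
        @16 rate percent6.      /* reads 12.5% as a proportion */
        @23 opened mmddyy10.;
  format balance comma12. rate percent8.1 opened mmddyy10.;
  datalines;
101  1,234,567  12.5% 03/15/2005
102     45,000   7.0% 11/02/2006
;
run;
```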
Formatted input can use both column and line pointer controls. See the next section for reading multiple observations from the same line or reading an observation from multiple lines.
SAS can read multiple observations from a single line using the formatted input style and the double trailing @ (@@), which holds the line across iterations of the DATA step.
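A minimal sketch with made-up data is shown below. It uses the colon modifier with informats rather than pure formatted input, as a simplifying assumption, so that the end of each data line is detected by list-style scanning.

```sas
/* @@ holds the line so the next iteration keeps reading from it. */
data scores;
  input id : 3. score : 3. @@;
  datalines;
101 87 102 91 103 78
104 65 105 99
;
run;
```

Each id and score pair becomes one observation.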
Suppose individual observations have different numbers of repetitions.
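One common way to handle this, sketched here under the assumption that each record stores its own repetition count (the hypothetical variable n), is to hold the line with a single trailing @ and loop over the repeated values.

```sas
/* Each record: patient id, number of readings, then the readings. */
data visits;
  input patient $ n @;     /* @ holds the line for the loop below */
  do i = 1 to n;
    input bp @;            /* read one reading and keep holding the line */
    output;                /* write one observation per reading */
  end;
  drop i n;
  datalines;
A 3 120 118 125
B 1 130
C 2 110 112
;
run;
```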
SAS can read observations whose data are spread over multiple lines. The #n pointer control moves to the nth line of the observation, and / advances to the next line.
Note that an INPUT statement written with #n may be replaced by "INPUT No 7.0 / Name $15. / Address $50. / Phone $12.;" which produces the identical result.
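The #n form can be sketched as follows, with each observation occupying four lines. The variable names follow the statement quoted above; the records themselves are invented.

```sas
/* #n selects the line within each four-line group of records. */
data contacts;
  input #1 No 7.0
        #2 Name $15.
        #3 Address $50.
        #4 Phone $12.;
  datalines;
1
Park
100 Example Street
555-0100
2
Lindblom
200 Sample Avenue
555-0199
;
run;
```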