This page briefly explains the input styles available in the SAS INPUT statement: list input, modified list input, column input, formatted input, named input, and mixed input.
Note that the INPUT statement must be used in a DATA step.
This web document may not be used for any commercial purposes. This page
may contain mistakes and errors.
INPUT OVERVIEW
The INPUT statement tells SAS how to read data by describing how the target data are arranged.
Depending on the input style, the INPUT statement may specify variable names, $ (indicating a character value), pointer controls, column specifications, informats, and line-hold specifiers (@ and @@).
- Column pointer controls such as @n and +n move the input pointer to a specified column in the input buffer.
- Line pointer controls such as #n and / move the input pointer to a specified line in the input buffer.
- Column specifications specify the columns of the input record that contain the value to read.
- An informat tells SAS how to read the variable value.
- @, a single trailing @, holds an input record for the execution of the next INPUT statement within the same iteration of the DATA step. Thus, the next INPUT statement reads from the same record (line).
- @@, a double trailing @, holds the input record for the execution of the next INPUT statement across iterations of the DATA step. Thus, the INPUT statement for the next iteration of the DATA step continues to read the same record (line).
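As a minimal sketch of the single trailing @ described above, the following step reads a record-type field, holds the line, and reads the remaining values only for records of type A. The data set name held, the variable names, and the data lines are hypothetical and chosen only for illustration.

```sas
DATA held;
INPUT type $ @;   /* read the first field and hold the record */
IF type = 'A';    /* subsetting IF: other record types are skipped */
INPUT x y;        /* continues reading the same held record */
DATALINES;
A 1 2
B 9 9
A 3 4
;
RUN;
```

Under these assumptions, only the two type A lines should become observations. The held record is released when the current iteration of the DATA step ends.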
The DATALINES statement (replacing the old CARDS statement) indicates that data lines follow in a SAS DATA step.
In order to read external data files, you have to use the INFILE statement.
There are six input styles used in the INPUT statement: list input, column input, formatted input, modified list input, named input, and mixed input.
| Feature | List (Free) | Modified List | Column | Formatted |
| Length | Standard | Flexible | Nonstandard | Nonstandard |
| Delimiter | blank | Yes | n/a | n/a |
| Missing values | . | . or delimiter | blank | blank |
| Variable order | Yes | Yes | n/a | n/a |
| Re-reading | No | No | Yes | Yes |
| Format modifier | No | :, &, ~ | No | No |
| Pointer control | No | No | @n, +n | @n, +n, #n, / |
| Informat | No | Yes (after :, &, or ~) | No | $w., w.d |
Which input style is the best? It depends on your skills and the characteristics of the data set.
If your data set has only a few observations and several variables, list input or named input is generally more convenient than column input or formatted input.
When data elements are not separated by a blank or another delimiter, you cannot use the list input style.
When data are aligned in fixed columns, column input or formatted input works better than list input.
Therefore, you need to examine the data structure carefully to choose the best input style. Ideally, you should take this issue into account from the data coding stage.

LIST INPUT
This input style simply lists the variable names, separated by blanks, in the order in which the values appear in the data.
This style is also called the free format.
DATA listed;
INPUT name $ id score;
DATALINES /*--1----+----2---*/;
Park 8740031 87.5
Hwang . 94.3
...
RUN;
A character variable name must be followed by $.
Missing values must be marked with a period (.); a blank does not mean a missing value in the list input style. Do not use more than one "." for a value.
By default, a character variable read with list input has a length of 8 bytes; that is, a fixed 8 bytes of memory is assigned to each variable. Therefore, a character value longer than 8 characters is truncated.
To read character values longer than 8 characters, declare the variable with a LENGTH, INFORMAT, or ATTRIB statement before the INPUT statement. Alternatively, use a different input style such as column input or formatted input.
DATA _NULL_;
LENGTH analysis $15;
INPUT analysis $;
CARDS /*--1----+----2---*/;
Regression
ANOVA
Time-Series
...
RUN;
The following example reads a comma-delimited ASCII text file. Note that the default delimiter is a blank.
DATA _NULL_;
INFILE 'a:\tiger.dat' DELIMITER=',' STOPOVER;
INPUT name $ id score;
...
RUN;
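
The same idea can be tried without an external file by pointing INFILE at in-stream data. This is only a sketch: the data set name delimited and the data lines are made up for illustration.

```sas
DATA delimited;
INFILE DATALINES DELIMITER=',';   /* read in-stream data, comma as delimiter */
INPUT name $ id score;            /* plain list input */
DATALINES;
Park,8740031,87.5
Hwang,.,94.3
;
RUN;
```

Because the DSD option is not used here, the missing id still has to be written as a period; two consecutive commas would be treated as a single delimiter.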

MODIFIED LIST INPUT
The modified list style is a mixture of list input and formatted input. This style can handle irregularly structured delimited data.
Three format modifiers make it possible to read complex data.
- The ampersand (&) format modifier reads character values that contain single embedded blanks; reading continues until two or more consecutive blanks are encountered.
- The colon (:) format modifier applies an informat to values longer than the standard 8 characters or to numbers; reading stops at the next delimiter or when the informat width is reached, whichever comes first.
- The tilde (~) format modifier reads delimiters within quoted character values as characters instead of as delimiters and retains the quotation marks. It requires the DSD option in the INFILE statement.
DATA modified;
INFILE DATALINES DELIMITER=',' DSD;
INPUT name : $20. year : 4.0 title ~ $50. journal : $10.;
DATALINES;
Meyer and Rowan,1977,"Institutionalized Organization",ASR
Lindblom,1979,"Still Muddling, Not Yet Through",PAR
...
RUN;
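- The DSD option treats a delimiter inside a quoted value as data, treats two consecutive delimiters as a missing value, and removes the quotation marks from values read without the ~ modifier.
- : (colon) reads values of varying length with the given informat.
- ~ (tilde) keeps the quotation marks around the title; together with DSD, the comma in (... Muddling, Not...) is read as a character.
- Do not omit the : after year, even when the values have a fixed width; an informat without : is applied as formatted input, which reads a fixed number of columns and ignores delimiters.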
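
The example above does not use the & modifier. A minimal sketch of it, assuming blank-delimited data in which each title is followed by at least two blanks, might look like this. The data set name embedded and the $40. width are assumptions for illustration.

```sas
DATA embedded;
INPUT title & $40. year;   /* & allows single blanks inside title */
DATALINES;
Still Muddling  1979
Institutionalized Organization  1977
;
RUN;
```

Here the single blank inside each title is kept as part of the value, and the two consecutive blanks mark the end of the title.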

COLUMN INPUT
The column input style reads input values from specified columns.
A variable name is followed by the starting and ending columns.
DATA columned;
INPUT name $ 1-5 id 6-12 score 14-17;
CARDS /*--1----+----2---*/;
Park 8740031 87.5
Hwang9301020 94.3
...
RUN;
This input style works well for well-structured data.
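Because each variable names its own columns, column input can read fields in any order and can read the same columns more than once. The following sketch illustrates this with the same data lines; the data set name reread and the variable year are hypothetical.

```sas
DATA reread;
INPUT id 6-12 name $ 1-5 year 6-7;   /* columns 6-7 are read twice */
DATALINES;
Park 8740031
Hwang9301020
;
RUN;
```

In this sketch, year takes the first two digits of id (columns 6-7), while id still holds all seven digits.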
NAMED INPUT
The named input style reads data values that appear after a variable name.
Variable names and data values are separated by an equal sign.
String data are not enclosed in double quotation marks.
Like the list style, the named style supports only the standard length of variables.
This style provides some flexibility, but it is not appropriate for a large data set.
DATA named;
INPUT name=$ id= grade=;
CARDS;
name=Park id=8740031 grade=89
name=Hwang id=9301020 grade=100
...
RUN;
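Part of the flexibility of named input is that values are matched to variables by name rather than by position. The sketch below assumes that behavior: it changes the order of the fields in the first line and leaves out id in the second. The data set name named2 is hypothetical.

```sas
DATA named2;
INPUT name=$ id= grade=;
DATALINES;
id=8740031 name=Park grade=89
name=Hwang grade=100
;
RUN;
```

Under this assumption, the second observation should have a missing id, since no id= value appears on that line.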
MIXED INPUT
A single INPUT statement can combine list input, column input, formatted input, and named input.
DATA mixed;
INPUT name $ 1-5 @7 id $7. +1 grade1 3. grade2 18-22;
CARDS /*--1----+----2---*/;
Park  8740031  89  95.1
Hwang 9301020 100 93.9
...
RUN;

FORMATTED INPUT
The formatted input style reads input values with informats specified after the variable names.
Informats provide the data type and the width of an input value.
Numeric variables are read with the w.d informat, where w is the total width of the input field and d is the number of digits to the right of the decimal point.
You may omit d when d = 0, but you cannot omit the period.
The $CHARw. or $w. informat is used for character variables, while the DATEw. or DDMMYYw. informat is used for dates.
DATA formatted;
INPUT name $5. id 7. score 4.1;
DATALINES /*--+----2---*/;
Park 8740031 875 /* score=87.5 */
Hwang9301020 943 /* score=94.3 */
...
RUN;
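Date informats work the same way as numeric and character informats. As a minimal sketch, the step below reads a date with DATE9. and displays it with the YYMMDD10. format; the data set name dated, the variable joined, and the dates are made up for illustration.

```sas
DATA dated;
INPUT name $5. +1 joined DATE9.;   /* columns 1-5, skip one, columns 7-15 */
FORMAT joined YYMMDD10.;           /* display the stored date as yyyy-mm-dd */
DATALINES;
Park  01JAN2005
Hwang 15MAR2006
;
RUN;
```

SAS stores the date as a number, so the FORMAT statement affects only how the value is displayed, not how it is read.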
You can use parentheses to simplify expressions.
DATA formatted;
INPUT name $5. id 7. (grade1-grade3) (3.);
DATALINES /*--+----2---*/;
Park 8740031 89 95100
Hwang9301020100 93 99
...
RUN;
The following example illustrates how formatted input can combine column pointer controls, informats (e.g., COMMAw., DOLLARw., PERCENTw., and MMDDYY10.), and parentheses.
DATA formatted;
INPUT (x1-x5) ($CHAR5. 7. 3*3.0) +1 income COMMA7.;
DATALINES /*--+----2----+----3*/;
Park 8740031 89 95100 84,895
Hwang9301020100 93 99 168,579
...
RUN;
The formatted input style can use both column and line pointer controls.
See the next section for reading multiple observations from the same line or reading an observation from multiple lines.
- @n, a column pointer control, moves the input pointer to the nth column.
- @@, a line-hold specifier, keeps the pointer on the current line so that the next INPUT statement continues reading from it.
- +n, a column pointer control, moves the pointer to the right by n columns.
- #n, a line pointer control, moves the pointer to the nth line.
- /, a line pointer control, moves the pointer to the first column of the next line.
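
The column pointer controls listed above can be sketched as follows. The step jumps to column 7 to read id, returns to column 1 for name, and then skips ahead nine columns to score. The data set name pointed and the column layout are assumptions chosen for this illustration.

```sas
DATA pointed;
INPUT @7 id 7.      /* jump to column 7 and read columns 7-13 */
      @1 name $5.   /* go back to column 1 and read columns 1-5 */
      +9 score 4.;  /* pointer is at column 6; skip to 15, read 15-18 */
DATALINES;
Park  8740031 87.5
Hwang 9301020 94.3
;
RUN;
```

Because the score values contain an explicit decimal point, the 4. informat reads them as written.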

READING MULTIPLE OBSERVATIONS
SAS can read multiple observations from a single line by holding the line with a double trailing @ (@@).
DATA formatted;
INPUT name $ id $ (x1-x3)(3.) @@;
CARDS /*--1----+----2----+----3----+----4----+----5-*/;
Park 8740031  89 95100 Choi 9730625 100100 95
Hwang 9301020 100 93 99 ...
RUN;
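The next example reads one treatment name followed by three income values per line and writes one observation per block.
DATA rbd_block;
INPUT treat $ @@;
LENGTH block $6; /* otherwise block takes length 4 from 'High' and 'Medium' is truncated */
DO block='High', 'Medium', 'Low'; /* DO block=1 TO 3;*/
INPUT income @@; OUTPUT;
END;
DATALINES;
Drug1 34 55 34
Drug2 45 56 32
Drug3 45 56 32
RUN;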
Suppose individual observations have different numbers of repetitions.
DATA repeat;
INPUT crop $ no @;
DROP no;
IF no GT 0 THEN DO;
DO trial=1 TO no;
INPUT cost benefit @;
OUTPUT;
END;
END;
DATALINES;
rice 3 54 87 98 87 57 87
bean 2 65 87 96 54
RUN;

READING MULTIPLE LINES
SAS can read observations whose data are spread over multiple lines.
The #n and / line pointer controls indicate the line from which the following variables are read.
DATA spanned;
INPUT #1 No 7.0 #2 Name $CHAR15. / Address $CHAR50. #4 Phone $CHAR12.;
DATALINES;
000001
Park
2451 E. 10th St. APT 311
812-857-9425
000002
...
RUN;
Note that the INPUT statement may be replaced by "INPUT No 7.0 / Name $15. / Address $50. / Phone $12.;", which produces the same result for these data. The period after each width is required.
