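MISSOVER: SAS does not move on to the next data line when a line runs out of values; any variables that have not been assigned a value are set to missing.
TRUNCOVER: When reading with column or formatted input, some data lines may be shorter than others. The TRUNCOVER option prevents SAS from moving on to the next data line in that case; variables for which no data remain are set to missing.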
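The difference between the two options is easiest to see with column input. The following is a minimal sketch, assuming a hypothetical file scores.txt whose lines have no trailing blanks; the file name, variable names, and column positions are illustrative only.

```sas
/* Hypothetical raw file scores.txt (no trailing blanks):
   Alice    95
   Bob      7
   Carol
*/
DATA scores;
   /* TRUNCOVER: when a line ends before the last column of a field,
      read whatever is there and do not go on to the next line. */
   INFILE 'scores.txt' TRUNCOVER;
   INPUT name $ 1-8 score 10-12;   /* column input */
RUN;

PROC PRINT DATA=scores;
RUN;
```

With MISSOVER in place of TRUNCOVER, a value that ends before the last column of its field (such as 95 in columns 10-11 of a 10-12 field) would be set to missing rather than read, which is why TRUNCOVER is usually the safer choice for column and formatted input.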
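Reading delimited files with the DATA step
The DLM= option or the DSD option in the INFILE statement lets you read raw data files whose values are separated by a particular character.
(1) The DLM= option (e.g. DLM='&')
If the delimiter is a tab, use DLM='09'X.
(2) The DSD option: it does three main things
It ignores delimiters that appear inside quotation marks;
It does not read the quotation marks as part of the data value;
It treats two consecutive delimiters on a line as a missing value.
DSD assumes that the delimiter is a comma; for any other delimiter, combine it with DLM=.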
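As a rough illustration of the two options, the sketch below reads a comma-delimited file with DSD and a tab-delimited file with DLM= plus DSD. The file names, variable names, and informat widths are assumptions made for this example.

```sas
/* Hypothetical file sales.csv:
   East,"Widgets, large",120
   West,,75
*/
DATA sales;
   /* DSD: comma is the default delimiter, the comma inside the quoted
      value is kept as data, the quotation marks are dropped, and two
      commas in a row give a missing value. */
   INFILE 'sales.csv' DSD;
   INPUT region :$10. product :$20. amount;
RUN;

/* Hypothetical tab-delimited file sales_tab.txt with the same three fields. */
DATA sales_tab;
   /* '09'X is the hexadecimal code of the Tab character in ASCII;
      adding DSD makes two tabs in a row count as a missing value. */
   INFILE 'sales_tab.txt' DLM='09'X DSD;
   INPUT region :$10. product :$20. amount;
RUN;
```

The colon modifier (:$20.) is used here so that character values longer than the default 8 bytes are not cut off while the delimiter still ends the field.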
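Reading delimited files with PROC IMPORT
What PROC IMPORT does
(1) It scans the data file automatically and determines the type of each variable (numeric or character);
(2) It sets the length of character variables automatically;
(3) It recognizes some date formats;
(4) It reads two consecutive delimiters as a missing value;
(5) It reads values that are enclosed in quotation marks;
(6) It assigns missing values to variables for which a data line contains no value.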
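PROC IMPORT DATAFILE='filename' OUT=data-set;
SAS determines the file type from the file extension. If the file does not have the proper extension, or if it is a DLM file, you must specify the DBMS= option in the PROC IMPORT statement. If a data set with the same name already exists in the SAS library, the REPLACE option overwrites it.
PROC IMPORT DATAFILE='filename' OUT=data-set DBMS=identifier REPLACE;
By default, PROC IMPORT takes the variable names from the first line of the data file. If the first line does not contain variable names, add a GETNAMES=NO statement after the PROC IMPORT statement.
When PROC IMPORT reads a delimited file, the default delimiter is a space. If the file uses a different delimiter, specify it with the DELIMITER= statement.
PROC IMPORT DATAFILE='filename' OUT=data-set
DBMS=DLM REPLACE;
GETNAMES=NO;
DELIMITER='delimiter-character';
RUN;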
Reading PC files with PROC IMPORT
PROC IMPORT DATAFILE='filename' OUT=data-set
DBMS=identifier REPLACE;
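For example, an Excel workbook might be read as follows. This is only a sketch: the path, worksheet name, and output data set name are hypothetical, and DBMS=XLSX assumes that SAS/ACCESS Interface to PC Files is licensed at your site.

```sas
PROC IMPORT DATAFILE='c:\mydata\sales.xlsx'   /* hypothetical path */
            OUT=work.sales
            DBMS=XLSX REPLACE;
   SHEET='Sheet1';      /* hypothetical worksheet name */
   GETNAMES=YES;        /* first row holds the variable names */
RUN;
```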
Listing the contents of a SAS data set
PROC CONTENTS DATA=data-set;
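PROC CONTENTS displays the descriptive information that SAS stores about a data set. The main items are:
(1) Description of the data set
Name of the data set;
Number of observations;
Number of variables;
Creation date.
(2) Description of the variables
Variable type;
Variable length;
Variable format (used for output);
Variable informat (used for input);
Variable label.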
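To see such a description without importing anything first, PROC CONTENTS can be run on one of the sample data sets that ship with SAS. The sketch below assumes SASHELP.CLASS is available in your installation; the VARNUM option is an optional addition that lists the variables in creation order rather than alphabetically.

```sas
PROC CONTENTS DATA=sashelp.class VARNUM;
RUN;
```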
Examples:
1. Reading comma-delimited data: cars_novname.csv
Acura,MDX,SUV,Asia,All,"$36,945 ","$33,337 ",3.5,6,265,17,23,4451,106,189
Acura,RSX Type S 2dr,Sedan,Asia,Front,"$23,820 ","$21,761 ",2,4,200,24,31,2778,101,172
Acura,TSX 4dr,Sedan,Asia,Front,"$26,990 ","$24,647 ",2.4,4,200,22,29,3230,105,183
Acura,TL 4dr,Sedan,Asia,Front,"$33,195 ","$30,299 ",3.2,6,270,20,28,3575,108,186
Acura,3.5 RL 4dr,Sedan,Asia,Front,"$43,755 ","$39,014 ",3.5,6,225,18,24,3880,115,197
proc import datafile="cars_novname.csv" out=mydata dbms=csv replace;
getnames=no;
run;
proc contents data=mydata;
run;
SAS creates the default variable names VAR1-VARn when variable names are not present in the raw data file.
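Because the default names VAR1-VARn carry no meaning, it is often convenient to rename them after the import. A minimal sketch, assuming the MYDATA data set created above; the new names (make, model, type, origin, drivetrain) are guesses at what the first five columns represent, not names taken from the file.

```sas
DATA cars;
   /* RENAME= is applied as the data set is read, so the new names
      are the ones stored in CARS. */
   SET mydata (RENAME=(VAR1=make VAR2=model VAR3=type
                       VAR4=origin VAR5=drivetrain));
RUN;

PROC CONTENTS DATA=cars;
RUN;
```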
2. Reading tab-delimited data:
proc import datafile="cars.txt" out=mydata dbms=tab replace;
getnames=no;
run;
3. Saving data sets permanently in the folder that belongs to each task:
libname dis "c:\dissertation";
proc import datafile="cars.txt" out=dis.mydata dbms=dlm replace;
delimiter='09'x;
getnames=yes;
run;
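To confirm that the data set was written to the permanent library, the members of the library can be listed. A small sketch, reusing the DIS libref defined above; _ALL_ together with NODS lists the library members without printing the details of each data set.

```sas
PROC CONTENTS DATA=dis._ALL_ NODS;
RUN;
```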
4. Reading space-delimited data:
proc import datafile="cars_sp.txt" out=mydata dbms=dlm replace;
getnames=no;
run;
5. A final, more general example of delimiters:
Other kinds of delimiters
You can use delimiter= on the infile statement to tell SAS what delimiter you are using to separate
variables in your raw data file. For example, below we have a raw data file that uses exclamation points ! to separate the variables in the file.
22!2930!4099
17!3350!4749
22!2640!3799
20!3250!4816
15!4080!7827
The example below shows how to read this file by using delimiter='!' on the infile statement.
DATA cars;
INFILE 'readdel1.txt' DELIMITER='!' ;
INPUT mpg weight price;
RUN;
PROC PRINT DATA=cars;
RUN;
As you can see in the output below, the data was read properly.
OBS MPG WEIGHT PRICE
1 22 2930 4099
2 17 3350 4749
3 22 2640 3799
4 20 3250 4816
5 15 4080 7827
It is possible to use multiple delimiters. The example file below uses either exclamation points or plus signs as delimiters.
22!2930!4099
17+3350+4749
22!2640!3799
20+3250+4816
15+4080!7827
By using delimiter='!+' on the infile statement, SAS will recognize both of these as valid delimiters.
DATA cars;
INFILE 'readdel2.txt' DELIMITER='!+' ;
INPUT mpg weight price;
RUN;
PROC PRINT DATA=cars;
RUN;
As you can see in the output below, the data was read properly.
OBS MPG WEIGHT PRICE
1 22 2930 4099
2 17 3350 4749