A Complete Guide to Reading Data into SAS (Part 3)

2021-09-24 15:32

3 22 2640 3799

4 20 3250 4816

5 15 4080 7827

Limitations of PROC IMPORT and points to note:

PROC IMPORT does not know the formats of your variables; it guesses them from what the beginning of your dataset looks like. Most of the time, this guess is fine. But if a variable's values are longer further down the file than in the rows it examined, you might end up with some truncated values.
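One common way to reduce the truncation risk described above is to have PROC IMPORT examine more rows before it settles on types and lengths. The sketch below uses the GUESSINGROWS statement for that purpose; the file name cars.csv and the output dataset name are hypothetical, and 1000 is an arbitrary value chosen for illustration.

```sas
/* Sketch only: 'cars.csv' and work.cars are hypothetical names. */
PROC IMPORT DATAFILE='cars.csv'
            OUT=work.cars
            DBMS=CSV
            REPLACE;
    GETNAMES=YES;        /* take variable names from the first line */
    GUESSINGROWS=1000;   /* examine 1000 rows instead of the default 20 */
RUN;
```

A larger value makes the import slower on big files, so it is a trade-off rather than a setting to maximize blindly.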

Key syntax: INFILE options

For more complicated file layouts, refer to the infile options described below.

DLM=

The dlm= option specifies the delimiter that separates the variables in your raw data file. For example, dlm=',' indicates that a comma is the delimiter (e.g., a comma separated file, .csv file), and dlm='09'x indicates that tabs separate your variables (e.g., a tab separated file).
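As a small illustration of dlm= with a delimiter other than a comma or tab, the sketch below reads pipe-separated values from inline data. The dataset name, variable names, and data values are made up for the example.

```sas
/* Sketch only: dataset, variables, and values are hypothetical. */
DATA pipe_demo;
    INFILE DATALINES DLM='|';   /* the pipe character separates the values */
    INPUT id mpg weight;
    DATALINES;
1|22|2930
2|17|3350
;
RUN;

PROC PRINT DATA=pipe_demo;
RUN;
```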

DSD

The dsd option has two functions. First, it recognizes two consecutive delimiters as a missing value. For example, if your file contained the line 20,30,,50, SAS would by default treat the two consecutive commas as a single delimiter and read 20 30 50, but with the dsd option SAS reads it as 20 30 . 50, which is probably what you intended. Second, it allows you to include the delimiter within quoted strings, and it strips the surrounding quotes from the value. For example, you would want to use the dsd option if you had a comma separated file and your data included values like "George Bush, Jr.". With the dsd option, SAS will recognize that the comma in "George Bush, Jr." is part of the name, and not a separator indicating a new variable.
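The effect of dsd on consecutive delimiters can be checked with a pair of tiny data steps. This is a sketch using inline data; the dataset and variable names are hypothetical, and missover is added to both steps only so that the short first case still produces an observation.

```sas
/* Sketch only: dataset and variable names are hypothetical. */

/* Without DSD: the two commas count as one delimiter,        */
/* so the values shift left and d should end up missing.      */
DATA without_dsd;
    INFILE DATALINES DLM=',' MISSOVER;
    INPUT a b c d;
    DATALINES;
20,30,,50
;
RUN;

/* With DSD: the empty field between the commas is read as    */
/* a missing value, so c should be missing and d should be 50. */
DATA with_dsd;
    INFILE DATALINES DLM=',' DSD MISSOVER;
    INPUT a b c d;
    DATALINES;
20,30,,50
;
RUN;

PROC PRINT DATA=without_dsd; RUN;
PROC PRINT DATA=with_dsd;    RUN;
```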

FIRSTOBS=

This option tells SAS on which line to start reading your raw data file. If the first record(s) contain header information such as variable names, set firstobs=n, where n is the record number at which the data actually begin. For example, if you are reading a comma separated file or a tab separated file that has the variable names on the first line, use firstobs=2 to tell SAS to begin reading at the second line (so it will ignore the first line with the names of the variables).

MISSOVER

This option prevents SAS from going to a new input line if it does not find values for all of the variables in the current line of data. For example, you may be reading a space delimited file that is supposed to have 10 values per line, but one of the lines has only 9 values. Without the missover option, SAS will look for the 10th value on the next line of data. If your data are supposed to have only one observation per line of raw data, this could cause errors throughout the rest of your data file. If you have a raw data file that has one record per line, this option is a prudent way to keep such errors from cascading through the rest of your data file.
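The cascading behavior described above can be seen on a three-line example in which the second line is one value short. This is a sketch with made-up data; the dataset and variable names are hypothetical.

```sas
/* Sketch only: dataset and variable names are hypothetical. */

/* Default behavior: for the short second line, SAS takes z   */
/* from the third line, so that line is consumed and only     */
/* two observations should result.                            */
DATA without_missover;
    INFILE DATALINES;
    INPUT x y z;
    DATALINES;
1 2 3
4 5
6 7 8
;
RUN;

/* With MISSOVER: z is set to missing for the short line and  */
/* the third line is read as its own observation.             */
DATA with_missover;
    INFILE DATALINES MISSOVER;
    INPUT x y z;
    DATALINES;
1 2 3
4 5
6 7 8
;
RUN;

PROC PRINT DATA=without_missover; RUN;
PROC PRINT DATA=with_missover;    RUN;
```

In the first step the SAS log normally carries a note that SAS went to a new line when the INPUT statement reached past the end of a line, which is a useful warning sign to look for.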

OBS=

Indicates which line in your raw data file should be treated as the last record to be read by SAS. This is a good option to use for testing your program. For example, you might use obs=100 to just read in the first 100 lines of data while you are testing your program. When you want to read the entire file, you can remove the obs= option entirely.
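One detail worth keeping in mind when combining obs= with firstobs=: obs= names the last line of the raw file to read, counted from the top of the file, not from the firstobs= line. The sketch below assumes a hypothetical comma separated file with a header line, and hypothetical variable names.

```sas
/* Sketch only: 'test.txt' and the variable names are hypothetical. */
DATA test_sample;
    /* Skip the header on line 1 and stop after line 101,     */
    /* which gives at most 100 data lines (lines 2 to 101).   */
    INFILE 'test.txt' DLM=',' DSD MISSOVER FIRSTOBS=2 OBS=101;
    INPUT id mpg weight price;
RUN;
```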

A typical infile statement for reading a comma delimited file that contains the variable names in the first line of data would be:

INFILE "test.txt" DLM=',' DSD MISSOVER FIRSTOBS=2 ;

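Reading data with missing values, or values that contain the delimiter

DATA cars2;
    LENGTH make $ 20;                        /* allow up to 20 characters for make */
    INFILE 'readdsd.txt' DELIMITER=',' DSD;  /* DSD: ",," is read as a missing value */
    INPUT make mpg weight price;
RUN;

PROC PRINT DATA=cars2;
RUN;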

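Sample raw data in the age, name, weight layout that the next program reads:

48,'Bill Clinton',210
50,'George Bush, Jr.',180

DATA guys2;
    LENGTH name $ 20;                         /* allow up to 20 characters for name */
    INFILE 'readdsd2.txt' DELIMITER=',' DSD;  /* DSD keeps the comma inside the quoted name */
    INPUT age name weight;
RUN;

PROC PRINT DATA=guys2;
RUN;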

The classic example: start reading data from a given line

DATA cars2;
    LENGTH nf 8;
    /* FIRSTOBS=2 skips the header line. Adding OBS=20 would stop reading */
    /* at line 20 of the file (19 data lines here, since FIRSTOBS=2).     */
    INFILE 'F:\cars1.csv' DELIMITER=',' DSD MISSOVER FIRSTOBS=2;
    INPUT nf zh hh xb cs IHA fj;
RUN;

PROC PRINT DATA=cars2;
RUN;

Reading data via FTP

How do I read raw data via FTP in SAS?

SAS has the ability to read raw data directly from FTP servers. Normally, you would use FTP to download the data to your local computer and then use SAS to read the data stored on your local computer. SAS allows you to bypass the FTP step and read the data directly from the other computer via FTP without the intermediate step of downloading the raw data file to your computer. Of course, this assumes that you can reach the computer via the internet at the time you run your SAS program.

This is done with a filename statement. After the fileref (here, in) you put ftp to tell SAS to access the data via FTP. After that, you supply the name of the file (in this case 'gpa.txt'). lrecl= is used to specify the width of your data; be sure to choose a value that is at least as wide as your widest record. cd= is used to specify the directory in which the file is stored. host= is used to specify the name of the site to which you want to FTP. user= is used to provide your userid (or anonymous if connecting via anonymous FTP). pass= is used to supply your password (or your email address if connecting via anonymous FTP).
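A minimal sketch of such a program follows, using the options described above. Only the fileref in and the file name gpa.txt come from the text; the host, directory, record length, credentials, and the variables on the INPUT statement are placeholders that you would replace with your own.

```sas
/* Sketch only: host, directory, user, password, LRECL value, */
/* and the INPUT variables are hypothetical placeholders.     */
FILENAME in FTP 'gpa.txt'
         LRECL=80                    /* at least as wide as the widest record */
         CD='/pub/data/'             /* directory on the remote server        */
         HOST='ftp.example.com'      /* site to connect to                    */
         USER='anonymous'            /* userid, or anonymous                  */
         PASS='you@example.com';     /* password, or email for anonymous FTP  */

DATA gpa;
    INFILE in;                       /* read through the FTP fileref          */
    INPUT id gpa;
RUN;

PROC PRINT DATA=gpa;
RUN;
```

Because the password appears in plain text, a program like this is better kept out of shared locations.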
