DataStage Experience Summary (Part 2)

2019-08-03 11:10

DataStage Experience Summary

2.9 Row/Column Transposition: Horizontal Pivot (Pivot Stage)

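Columns become rows: a wide table becomes a narrow one, with fewer fields and more records, so the number of columns changes. Note that in the Derivation of the Pivot stage's Output link you must list the source columns that feed the pivoted column, separated by commas. Example: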

PIVOT input records:

Id  col1      col2      col3
2   Rootpath  Workdate  EdsDbname
3   Rootpath  Workdate  AsdmDbname

PIVOT output records:

Id  colum
2   Rootpath
2   Workdate
2   EdsDbname
3   Rootpath
3   Workdate
3   AsdmDbname
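The wide-to-narrow transformation above can be illustrated outside DataStage. The following is a minimal Python sketch of the same logic, not the Pivot stage itself; the function name horizontal_pivot and its parameters are hypothetical, and the list of source columns plays the role of the comma-separated Derivation.

```python
def horizontal_pivot(rows, key, source_columns, output_column):
    """Turn each wide row into one narrow row per source column."""
    narrow = []
    for row in rows:
        for col in source_columns:
            # One output record per (input record, source column) pair.
            narrow.append({key: row[key], output_column: row[col]})
    return narrow


wide_rows = [
    {"Id": 2, "col1": "Rootpath", "col2": "Workdate", "col3": "EdsDbname"},
    {"Id": 3, "col1": "Rootpath", "col2": "Workdate", "col3": "AsdmDbname"},
]

# The list of source columns corresponds to the Derivation "col1,col2,col3".
narrow_rows = horizontal_pivot(wide_rows, "Id", ["col1", "col2", "col3"], "colum")

for r in narrow_rows:
    print(r["Id"], r["colum"])
```

Two input records with three source columns yield six output records, which matches the example tables.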

2.10 Row/Column Transposition: Vertical Pivot

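The Pivot stage turns a wide table into a narrow one, that is, a Horizontal Pivot. Real projects also need the reverse, narrow table to wide table, which is a Vertical Pivot. For example, the input records are:

Id  Column
2   Rootpath
2   Workdate
2   EdsDbname
3   Rootpath
3   Workdate
3   AsdmDbname

The output records we want look like this:

Id  NewCol
2   Rootpath,Workdate,EdsDbname
3   Rootpath,Workdate,AsdmDbname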

The solution is as follows.

Server Job approach:

Sequence File --> Transform --> Hash File

Source table structure:

Id      varchar 10
Column  varchar 10

Define Transform as follows

Individual 2007-10 6 / 64


Stage Variables:

currentKey
  Initial value = \
  Derivation = L1.Id

newRecord
  Initial value = \
  Derivation = if currentKey=lastKey Then newRecord:\ [rest of the expression truncated in the source]

lastKey
  Initial value = \
  Derivation = currentKey

L2 Derivations:

L2.key = L1.Id
L2.line = newRecord

Target table structure:

Id      varchar 10   (marked as the key)
Newcol  varchar 200

(Note: stage variables are evaluated in the order in which they are listed, so lastKey must come after newRecord.)
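The stage-variable logic above can be traced in ordinary code. The following is a minimal Python sketch, not DataStage code. Because the newRecord derivation is incomplete in the source, the append branch (running value, separator, current Column) and the restart branch are assumptions inferred from the desired output, as is the "," separator. The dictionary stands in for the hashed file, where a later write with the same key overwrites the earlier one.

```python
def vertical_pivot(rows, separator=","):
    """Emulate the currentKey / newRecord / lastKey stage variables.

    Assumes the rows arrive grouped by Id, as in the example input.
    """
    last_key = ""
    new_record = ""
    hashed_file = {}  # stand-in for the Hash File keyed on Id
    for row_id, column in rows:
        current_key = row_id
        # Assumed derivation: extend the running string while the key is
        # unchanged, otherwise start again from the current value.
        if current_key == last_key:
            new_record = new_record + separator + column
        else:
            new_record = column
        # lastKey is updated after newRecord, which is why the order of
        # the stage variables matters.
        last_key = current_key
        hashed_file[row_id] = new_record  # same key overwrites shorter value
    return hashed_file


rows = [
    ("2", "Rootpath"), ("2", "Workdate"), ("2", "EdsDbname"),
    ("3", "Rootpath"), ("3", "Workdate"), ("3", "AsdmDbname"),
]
result = vertical_pivot(rows)
for key, line in result.items():
    print(key, line)
```

If last_key were updated before new_record, the comparison would always succeed and values from different Ids would run together, which is the failure the ordering note guards against.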

If the values in Newcol should instead go into separate columns, in the following format:

Id  Col1      Col2      Col3
2   Rootpath  Workdate  EdsDbname
3   Rootpath  Workdate  AsdmDbname

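The solution is to read the Newcol value into a stage variable, then call Field(NewCord, \ [remaining arguments truncated in the source] once for each target column and assign each extracted value to its column.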
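To make the splitting step concrete, here is a minimal Python sketch of a Field-style lookup that returns the n-th delimited substring. The helper field is a hypothetical stand-in written for illustration, not the DataStage function, and the "," delimiter and 1-based occurrence numbering are assumptions, since the original arguments are not preserved.

```python
def field(text, delimiter, occurrence):
    """Return the occurrence-th (1-based) delimited substring, or "" if absent."""
    parts = text.split(delimiter)
    if 1 <= occurrence <= len(parts):
        return parts[occurrence - 1]
    return ""


new_col = "Rootpath,Workdate,EdsDbname"

# One call per target column.
col1 = field(new_col, ",", 1)
col2 = field(new_col, ",", 2)
col3 = field(new_col, ",", 3)
print(col1, col2, col3)
```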

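Parallel Job approach (following the Server Job approach and then switching the stage to sequential mode also works):

1. Use a Sort stage to partition and sort on the key column Id, and set Create Key Change Column=True (the first record of each Id group is flagged 1 and the others 0), which produces the KeyChange column.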

The result of the run is as follows:

Id  Column      KeyChange
--  ----------  ----------
2   Rootpath    1
2   Workdate    0
2   EdsDbname   0
3   Rootpath    1
3   Workdate    0
3   AsdmDbname  0

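2. In the Transform stage, create a stage variable and set its value according to KeyChange. For example, create the variable svBuildColum with this derivation:

if DSLink12.keyChange=1 then DSLink12.Column else svBuildColum : \ [rest of the expression truncated in the source]

The result of the run is as follows: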


Id  Column      KeyChange  svBuildColum
--  ----------  ---------  ------------------------------
2   Rootpath    1          Rootpath
2   Workdate    0          Rootpath $Workdate
2   EdsDbname   0          Rootpath $Workdate $EdsDbname
3   Rootpath    1          Rootpath
3   Workdate    0          Rootpath $Workdate
3   AsdmDbname  0          Rootpath $Workdate $AsdmDbname

3. Use a Remove_Duplicates stage to remove duplicate rows on the key column Id, with Retain Last. The result of the run is as follows:

Id  svBuildColum
--  ------------------------------
2   Rootpath $Workdate $EdsDbname
3   Rootpath $Workdate $AsdmDbname
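Steps 1 to 3 above can be followed end to end in ordinary code. The following is a minimal Python sketch of the same three stages, not DataStage code. All function names are hypothetical, and the " $" separator is an assumption read off the result tables, because the derivation text is truncated in the source.

```python
def add_key_change(rows):
    """Step 1: sort by Id and flag the first row of each Id group with 1."""
    # sorted() is stable, so rows keep their input order within each Id.
    ordered = sorted(rows, key=lambda r: r[0])
    flagged = []
    previous_id = None
    for row_id, column in ordered:
        flagged.append((row_id, column, 1 if row_id != previous_id else 0))
        previous_id = row_id
    return flagged


def build_column(flagged, separator=" $"):
    """Step 2: running concatenation that restarts whenever KeyChange is 1."""
    out = []
    sv_build_colum = ""
    for row_id, column, key_change in flagged:
        if key_change == 1:
            sv_build_colum = column
        else:
            sv_build_colum = sv_build_colum + separator + column
        out.append((row_id, sv_build_colum))
    return out


def retain_last(rows):
    """Step 3: keep only the last row seen for each Id."""
    last = {}
    for row_id, value in rows:
        last[row_id] = value
    return list(last.items())


rows = [
    ("2", "Rootpath"), ("2", "Workdate"), ("2", "EdsDbname"),
    ("3", "Rootpath"), ("3", "Workdate"), ("3", "AsdmDbname"),
]
result = retain_last(build_column(add_key_change(rows)))
for row_id, value in result:
    print(row_id, value)
```

The sketch runs in a single process. In a parallel job the same result depends on all rows for an Id landing in the same partition, which is why step 1 partitions on Id as well as sorting on it.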

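4. To put the svBuildColum value into separate columns, call Field(NewCord, \ [remaining arguments truncated in the source] once for each target column and assign each extracted value to its column. The final result is as follows: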

Id  Col1      Col2      Col3
2   Rootpath  Workdate  EdsDbname
3   Rootpath  Workdate  AsdmDbname

2.11 Errors When Viewing Data with the Oracle EE Stage, and How to Fix Them

The error messages are as follows:

##I TOSH 000002 04:05:22(001) orchgeneral: loaded
##I TOSH 000002 04:05:22(002) orchsort: loaded
##I TOSH 000002 04:05:22(003) orchstats: loaded

>##E TOSH 000205 04:05:22(004) PATH search failure:

>##E TOSH 000000 04:05:22(005) Error loading \Could not load \: The specified module could not be found.

>##E TOSH 000000 04:05:22(006) Could not locate operator definition, wrapper, or Unix command for \please check that all needed libraries are preloaded, and check the PATH for the wrappers

>##E TCOS 000029 04:05:22(007) Creation of step finished with status = FAILED

Solution (when running DataStage 7.5x2 EE on the Windows platform):

1. cd to your C:\Ascential\DataStage\PXEngine\install
2. type sh
3. ORACLE_HOME=\ [value truncated in the source]
4. export ORACLE_HOME
5. APT_ORCHHOME=\ [value truncated in the source]
6. export APT_ORCHHOME
7. sh install.liborchoracle


You will then see the following messages on the screen:

Installing Oracle Driver
Using C:/Your_Oracle_Client as ORACLE_HOME
Installing driver for Oracle version 9i or 10g
Oracle installation is complete.

Reboot the machine once the steps above are done.

2.12 Using the DataStage SAP Stage

See the attachment:

E:\个人学习\DataStage SAP Stage的

2.13 Using the Column Import Stage

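This stage writes the data held in a single field out to several fields, that is, it splits one field's data across multiple fields.

The input data must either be fixed-length or contain recognizable boundaries at which it can be split, and it must be of type String or Binary. The output data can be of any data type.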
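The idea of splitting one string field into several typed fields can be shown in a few lines. The following is a minimal Python sketch of that idea for delimited input, not the Column Import stage itself. The function import_columns, the "|" delimiter, the column names, and the sample value are all hypothetical.

```python
from datetime import date, datetime


def import_columns(raw, delimiter, schema):
    """Split one string field into several typed output fields.

    schema is a list of (output_name, converter) pairs, one per piece.
    """
    parts = raw.split(delimiter)
    if len(parts) != len(schema):
        raise ValueError(f"expected {len(schema)} fields, got {len(parts)}")
    # Each piece is converted to the type declared for its output column.
    return {name: convert(part) for (name, convert), part in zip(schema, parts)}


# Hypothetical output layout: an integer, a string, and a date.
schema = [
    ("id", int),
    ("name", str),
    ("workdate", lambda s: datetime.strptime(s, "%Y-%m-%d").date()),
]

record = import_columns("2|Rootpath|2007-10-01", "|", schema)
print(record)
```

The input is always a string, while each output column takes whatever type its converter produces, which mirrors the String-in, any-type-out rule described above.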


After the field is split: [figure not preserved in the source]
