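In terms of its mechanism, the PNN-DS multi-source information fusion model has the following advantages:
1) It reduces the dimensionality of the data samples handled by each neural network and exploits both the fast convergence of probabilistic neural networks and the parallel processing capability of computers. This shortens network training and diagnostic decision time, which addresses the slow training convergence and long diagnosis time of neural networks with high-dimensional inputs. Because the networks that classify the individual informative gene subsets work independently of one another, new feature gene information is easy to add, so the classification system is highly extensible;
2) Fusing the outputs of the networks for the different feature subsets with DS evidence theory combines different types of information, which can overcome the misjudgments caused by extracting and recognizing only a single type of feature information.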
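The fusion step in advantage 2) can be illustrated with Dempster's rule of combination. The following is only a minimal sketch under stated assumptions, not the model's actual implementation: it assumes that each network's output has already been converted into a basic probability assignment over the classes {normal, cancer} plus the whole frame Theta (unassigned mass), and every numeric value and variable name (`m1`, `m2`, `K`, `decision`) is hypothetical.

```matlab
% Dempster's rule for two bodies of evidence over the classes {normal, cancer}.
% Each vector is [m(normal), m(cancer), m(Theta)], where Theta is the whole
% frame (mass left unassigned). All numbers are made up for illustration.
m1 = [0.6, 0.3, 0.1];   % hypothetical evidence from the network on feature subset 1
m2 = [0.7, 0.1, 0.2];   % hypothetical evidence from the network on feature subset 2

n  = numel(m1) - 1;     % number of singleton classes
s1 = m1(1:n);  t1 = m1(end);
s2 = m2(1:n);  t2 = m2(end);

% Conflict K: total mass assigned to pairs of different singleton classes.
K = sum(s1) * sum(s2) - sum(s1 .* s2);

% Unnormalized combined masses: agreement on a class, or a class meeting Theta.
single = s1 .* s2 + s1 * t2 + t1 * s2;
theta  = t1 * t2;

% Normalize by 1-K; this requires K < 1 (the evidence is not fully contradictory).
m = [single, theta] / (1 - K);

% Decide for the class with the largest fused mass.
[~, decision] = max(m(1:n));
```

With more than two networks, the same combination can be applied repeatedly, feeding the fused vector `m` back in as one of the two inputs, since Dempster's rule is associative.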
9. References
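[1] Li Yingxin, Liu Quanjin, Ruan Xiaogang. Gene expression profile analysis of acute leukemia and identification of subtype classification features [J]. Chinese Journal of Biomedical Engineering, 2005, 24(2): 240-244.
[2] Li Yingxin, Ruan Xiaogang. Tumor subtype recognition and selection of classification feature genes based on gene expression profiles [J]. Acta Electronica Sinica, 2005, 33(4): 651-655.
[3] Wang Shulin. Research on frequency distributions of biological subsequences and tumor subtype classification models [D]. Changsha: National University of Defense Technology, 2007.
[4] Wang Shulin, Wang Ji, Chen Huowang, Zhang Boyun. Research on a tumor classification and detection algorithm based on principal component analysis [J]. Computer Engineering & Science, 2007, 29(9): 84-90.
[5] Liu Quanjin, Li Yingxin, Ruan Xiaogang. Selection of feature genes for tumor subtype classification based on BP network sensitivity analysis [J]. Chinese Journal of Biomedical Engineering, 2008, 27(5): 710-715.
[6] Liu Quanjin, Li Yingxin, Zhu Yunhua, Ruan Xiaogang. Tumor feature gene selection based on BP neural networks [J]. Computer Engineering and Applications, 2005, 34: 184-186.
[7] Huang Desheng. Research on algorithms for gene identification and microarray data recognition [D]. Beijing: China Medical University, 2009.
[8] Cui Guangzhao, Cao Xianghong, Zhang Hua. Denoising and cluster analysis of gene expression data based on wavelet transform [J]. Signal Processing, 2005, 21(4A): 463-466.
[9] Zhang Haiping, He Zhengyou, Zhang Jun. A fault line selection method for small-current grounding systems based on quantum neural networks and evidence fusion [J]. Transactions of China Electrotechnical Society, 2009, 24(12): 171-178.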
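Appendices
List of appendices
Appendix 1: Feature gene extraction by pairwise redundancy removal and principal component analysis
Appendix 2: Weighted scoring program
Appendix 1: Feature gene extraction by pairwise redundancy removal and principal component analysis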
clear; clc;

% Read the preprocessed expression data and transpose it to
% 1909 genes (rows) x 62 samples (columns).
fid1=fopen('pre_pro.txt','r');
data1=fscanf(fid1,'%g',[62,1909]);
fclose(fid1);
data=data1';

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Data normalization: global z-score over all entries of the matrix.
% The scale is named std_all (not std) so that the built-in std() is not shadowed.
ave=sum(data(:))/(62*1909);
std_all=sqrt(sum((data(:)-ave).^2)/(62*1909-1));
data_guiyihua=(data-ave)/std_all;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Mean and standard deviation of each gene over the normal samples.
% Fix: the standard deviation is taken over all 12 normal samples of gene i;
% the original inner loop over j overwrote the value on every pass and so
% kept only the deviation of the last sample.
for i=1:1909
    normal_ave(i)=sum(data_guiyihua(i,1:12))/12;
    normal_biaozhuncha(i)=sqrt(sum((data_guiyihua(i,1:12)-normal_ave(i)).^2)/(12-1));
end
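% Mean and standard deviation of each gene over the cancer samples.
% Fix: the standard deviation is taken over all cancer samples of gene i;
% the original inner loop over j kept only the deviation of the last sample.
% NOTE: columns 13:62 hold 50 samples, but the original code divided by 40
% and (40-1). The divisors below follow the column range; the normal/cancer
% split should be checked against the data set.
n_cancer=numel(13:62);
for i=1:1909
    cancer_ave(i)=sum(data_guiyihua(i,13:62))/n_cancer;
    cancer_biaozhuncha(i)=sqrt(sum((data_guiyihua(i,13:62)-cancer_ave(i)).^2)/(n_cancer-1));
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Method 1 (disabled): d is the signal-to-noise ratio of each gene
% for i=1:1909
%     d(i)=abs((normal_ave(i)-cancer_ave(i))/(normal_biaozhuncha(i)+cancer_biaozhuncha(i)));
% end
% dd=d';
% [A,ind]=sort(d,'descend');
% d_juli=zeros(1909,2);
% for i=1:1909
%     d_juli(i,1)=ind(i);
%     d_juli(i,2)=A(i);
% end
% for i=1:300
%     choose_300(i,1)=d_juli(i,1);
%     choose_300(i,2)=d_juli(i,2);
% end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Method 2: b is the Bhattacharyya distance between the two classes
for i=1:1909
    b(i)=1/4*(normal_ave(i)-cancer_ave(i))^2/(normal_biaozhuncha(i)^2+cancer_biaozhuncha(i)^2)...
        +1/2*log((normal_biaozhuncha(i)^2+cancer_biaozhuncha(i)^2)/(2*normal_biaozhuncha(i)*cancer_biaozhuncha(i)));
end
bb=b';

% Rank the genes by decreasing distance.
[B,ind2]=sort(b,'descend');
b_juli=zeros(1909,2);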
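% Column 1: gene index, column 2: Bhattacharyya distance.
for i=1:1909
    b_juli(i,1)=ind2(i);
    b_juli(i,2)=B(i);
end

% Keep the nn top-ranked genes. nn is 200 here, although the variable name
% choose_300 refers to 300.
nn=200;
for i=1:nn
    choose_300(i,1)=b_juli(i,1);
    choose_300(i,2)=b_juli(i,2);
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
index_huanyuan=zeros(1909,1);

% Normalized data of the selected genes
for i=1:nn
    temp=choose_300(i,1);
    index_huanyuan(i)=temp;
    for j=1:62
        data_tiqu(i,j)=data_guiyihua(temp,j);
    end
end

% Unnormalized data of the selected genes
for i=1:nn
    temp=choose_300(i,1);
    index_huanyuan(i)=temp;   % original index of each selected gene
    for j=1:62
        data_tiqu1(i,j)=data(temp,j);
    end
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Redundancy removal starts here.
% Mean expression of each selected gene over the 62 samples
for i=1:nn
    ave_yangben(i)=sum(data_tiqu(i,:))/62;
end

% coef1(i,j): sum of products of deviations of genes i and j (covariance numerator)
% coef2(i):   sum of squared deviations of gene i
coef1=zeros(nn,nn);
coef2=zeros(nn,1);
for i=1:nn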
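    for j=1:nn
        for k=1:62
            coef1(i,j)=coef1(i,j)+(data_tiqu(i,k)-ave_yangben(i))*(data_tiqu(j,k)-ave_yangben(j));
        end
    end
end

for i=1:nn
    for j=1:62
        coef2(i)=coef2(i)+(data_tiqu(i,j)-ave_yangben(i))^2;
    end
end

% Pearson correlation coefficient between every pair of selected genes
for i=1:nn
    for j=1:nn
        coef(i,j)=coef1(i,j)/sqrt(coef2(i)*coef2(j));
    end
end

newB=choose_300(:,2);
newindex11=choose_300(:,1);

% Mark redundant genes by setting their score and index to 0: genes j>i are
% dropped while their correlation with gene i exceeds 0.5, and the scan for
% gene i stops at the first j that does not.
for i=1:nn
    for j=i+1:nn
        if (coef(i,j)>0.5)
            newB(j)=0;
            newindex11(j)=0;
        else
            break;
        end
    end
end

% Collect the genes that were not marked as redundant
j=1;
for i=1:nn
    if newB(i)~=0
        newBB(j)=newB(i);
        newindex22(j)=newindex11(i);
        j=j+1;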