ID conversion using the KEGG database

2020-03-27 02:22

clusterProfiler can convert biological IDs using an OrgDb object via the bitr function. I have now implemented another function, bitr_kegg, for converting IDs through the KEGG API.

library(clusterProfiler)
data(gcSample)
hg <- ...
head(hg)
## [1] '4597' '7111' '5266' '2175' '755' '23046'

eg2np <- bitr_kegg(hg, fromType = 'kegg', toType = 'ncbi-proteinid', organism = 'hsa')
## Warning in bitr_kegg(hg, fromType = 'kegg', toType = 'ncbi-proteinid',
## organism = 'hsa'): 3.7% of input gene IDs are fail to map...

head(eg2np)
##     kegg ncbi-proteinid
## 1   8326      NP_003499
## 2  58487   NP_001034707
## 3 139081      NP_619647
## 4  59272      NP_068576
## 5    993      NP_001780
## 6   2676      NP_001487

np2up <- ...
head(np2up)
##   ncbi-proteinid uniprot
## 1      NP_005457  O75586
## 2      NP_005792  P41567
## 3      NP_005792  Q6IAV3
## 4      NP_037536  Q13421
## 5      NP_006054  O60662
## 6   NP_001092002  O95398
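The warning raised by the first conversion reports the share of input IDs that could not be mapped. A comparable figure can be computed from any mapping table in base R. The following is a minimal sketch, not clusterProfiler's own implementation: the helper name unmapped_percent and the placeholder ID 'not_a_gene' are made up for illustration, and the two mapping rows are copied from the head(eg2np) output.

```r
# Hypothetical helper (not part of clusterProfiler): percentage of distinct
# input IDs that do not appear in the first column of a mapping table.
unmapped_percent <- function(input_ids, mapping) {
  ids <- unique(as.character(input_ids))
  missing <- !(ids %in% as.character(mapping[[1]]))
  round(100 * mean(missing), 1)
}

# Two rows taken from the head(eg2np) output in the text.
mapping <- data.frame(
  kegg = c("8326", "58487"),
  "ncbi-proteinid" = c("NP_003499", "NP_001034707"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# 'not_a_gene' is a made-up ID used only to trigger a failed lookup;
# the duplicated "58487" is counted once.
input_ids <- c("8326", "58487", "58487", "not_a_gene")

pct <- unmapped_percent(input_ids, mapping)
print(pct)
```

Whether bitr_kegg counts duplicated inputs once or several times is an assumption here, so the figure may differ slightly from the one in its warning.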

The ID type (both fromType and toType) should be one of 'kegg', 'ncbi-geneid', 'ncbi-proteinid' or 'uniprot'. 'kegg' is the primary ID used in the KEGG database. KEGG's data source is NCBI. As a rule of thumb, the 'kegg' ID is the Entrez Gene ID for eukaryotic species and the locus ID for prokaryotes.

Many prokaryotic species have no Entrez Gene ID available. For example, we can check the gene information of ece:Z5100 at http://www.genome.jp/dbget-bin/www_bget?ece:Z5100, which has NCBI-ProteinID and UniProt links in the Other DBs entry, but no NCBI-GeneID.

If we try to convert Z5100 to ncbi-geneid, bitr_kegg throws an error saying that ncbi-geneid is not supported:

bitr_kegg('Z5100', fromType='kegg', toType='ncbi-geneid', organism='ece')
## Error in KEGG_convert(fromType, toType, organism) :
## ncbi-geneid is not supported for ece ...

We can of course convert it to ncbi-proteinid and uniprot:

bitr_kegg('Z5100', fromType='kegg', toType='ncbi-proteinid', organism='ece')
##    kegg ncbi-proteinid
## 1 Z5100       AAG58814

bitr_kegg('Z5100', fromType='kegg', toType='uniprot', organism='ece')
##    kegg uniprot
## 1 Z5100  Q7DB85
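Because an unsupported target type raises an error instead of returning an empty result, a script that handles many organisms may want to fall back to the next ID type automatically. The sketch below shows one way to do that with tryCatch. The helper convert_first_supported and the stand-in fake_convert are hypothetical names, not clusterProfiler functions; fake_convert only mimics the behaviour shown for ece:Z5100 so that the example runs offline.

```r
# Hypothetical helper: try each target ID type in turn and return the first
# conversion that succeeds, or NULL if every attempt raises an error.
convert_first_supported <- function(convert, to_types) {
  for (to in to_types) {
    result <- tryCatch(convert(to), error = function(e) NULL)
    if (!is.null(result)) {
      attr(result, "toType") <- to  # record which type worked
      return(result)
    }
  }
  NULL
}

# Offline stand-in for bitr_kegg(); the values are the ones shown in the
# text for ece:Z5100, and 'ncbi-geneid' fails as it does there.
fake_convert <- function(to) {
  known <- c("ncbi-proteinid" = "AAG58814", "uniprot" = "Q7DB85")
  if (!(to %in% names(known))) stop(to, " is not supported for ece ...")
  out <- data.frame(kegg = "Z5100", id = unname(known[to]),
                    stringsAsFactors = FALSE)
  names(out)[2] <- to
  out
}

res <- convert_first_supported(
  fake_convert,
  c("ncbi-geneid", "ncbi-proteinid", "uniprot")
)
print(attr(res, "toType"))
print(res)
```

With clusterProfiler loaded and network access available, the stand-in could be replaced by function(to) bitr_kegg('Z5100', fromType='kegg', toType=to, organism='ece').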

search_kegg_organism

clusterProfiler supports more than 4,000 species listed at http://www.genome.jp/kegg/catalog/org_list.html for the hypergeometric test (enrichKEGG and enrichMKEGG) and GSEA (gseKEGG and gseMKEGG). We can use bitr_kegg to convert IDs for all of these species. To make it easier to look up the scientific-name abbreviation used in the organism parameter of these functions, I implemented the search_kegg_organism function. We can search by kegg_code, scientific_name or common_name (which is not available for prokaryotes).

search_kegg_organism('ece', by='kegg_code')
##     kegg_code                        scientific_name common_name
## 334       ece Escherichia coli O157:H7 EDL933 (EHEC)

ecoli <- ...

dim(ecoli)
## [1] 64 3

head(ecoli)
##     kegg_code                        scientific_name common_name
## 329       eco           Escherichia coli K-12 MG1655
## 330       ecj            Escherichia coli K-12 W3110
## 331       ecd            Escherichia coli K-12 DH10B
## 332       ebw                Escherichia coli BW2952
## 333      ecok            Escherichia coli K-12 MDS42
## 334       ece Escherichia coli O157:H7 EDL933 (EHEC)
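Conceptually, a search of this kind is a substring filter over one column of the organism table. The sketch below reproduces that idea in base R on the six rows shown in the head(ecoli) output. It is an illustration under assumptions, not the implementation of search_kegg_organism: the helper filter_organisms is a hypothetical name, and literal (non-regex) matching is a choice made here.

```r
# Rows copied from the head(ecoli) output in the text.
organisms <- data.frame(
  kegg_code = c("eco", "ecj", "ecd", "ebw", "ecok", "ece"),
  scientific_name = c(
    "Escherichia coli K-12 MG1655",
    "Escherichia coli K-12 W3110",
    "Escherichia coli K-12 DH10B",
    "Escherichia coli BW2952",
    "Escherichia coli K-12 MDS42",
    "Escherichia coli O157:H7 EDL933 (EHEC)"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose chosen column contains `pattern`
# as a literal substring (fixed = TRUE avoids regex surprises with "(").
filter_organisms <- function(tbl, pattern, by = "scientific_name") {
  tbl[grepl(pattern, tbl[[by]], fixed = TRUE), , drop = FALSE]
}

k12 <- filter_organisms(organisms, "K-12")
print(k12$kegg_code)

ehec <- filter_organisms(organisms, "ece", by = "kegg_code")
print(ehec$scientific_name)
```

Matching on scientific_name is usually the more convenient route when the KEGG code is not yet known, since one strain name fragment such as "K-12" narrows the list quickly.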

keyType parameter

With the ID conversion utilities built into clusterProfiler, I added a keyType parameter to enrichKEGG, enrichMKEGG, gseKEGG and gseMKEGG. We can now use an ID type that is not the primary ID in the KEGG database.

x <- ...

head(summary(x))
##                ID                            Description GeneRatio
## hsa04072 hsa04072      Phospholipase D signaling pathway    11/133
## hsa04060 hsa04060 Cytokine-cytokine receptor interaction    14/133
## hsa04390 hsa04390                Hippo signaling pathway    10/133
## hsa04975 hsa04975           Fat digestion and absorption     5/133
## hsa05221 hsa05221                 Acute myeloid leukemia     6/133
##           BgRatio       pvalue   p.adjust     qvalue
## hsa04072 216/9275 0.0002654190 0.03901659 0.03240905
## hsa04060 354/9275 0.0005349245 0.03931695 0.03265855
## hsa04390 213/9275 0.0009536247 0.04199404 0.03488227
## hsa04975  58/9275 0.0014014886 0.04199404

