NIH Public Access
Author Manuscript
Sci Signal. Author manuscript; available in PMC 2014 September 10.
Published in final edited form as: Sci Signal. ; 6(269): pl1. doi:10.1126/scisignal.2004088.

Integrative Analysis of Complex Cancer Genomics and Clinical Profiles Using the cBioPortal

Jianjiong Gao1, Bülent Arman Aksoy1, Ugur Dogrusoz2, Gideon Dresdner1, Benjamin Gross1, S. Onur Sumer1, Yichao Sun1, Anders Jacobsen1, Rileen Sinha1, Erik Larsson3, Ethan Cerami1,4, Chris Sander1, and Nikolaus Schultz1

1Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York, NY 10065, USA
2Computer Engineering Department, Bilkent University, 06800 Ankara, Turkey
3Institute of Biomedicine, Department of Medical Biochemistry and Cell Biology, University of Gothenburg, S-405 30 Gothenburg, Sweden
4Blueprint Medicines, Cambridge, MA 02142, USA

Correspondence should be addressed to cbioportal@cbio.mskcc.org; user support is available at cbioportal@googlegroups.com.
Competing interests: The authors declare that they have no competing interests.

Abstract
The cBioPortal for Cancer Genomics (http://cbioportal.org) provides a Web resource for exploring, visualizing, and analyzing multidimensional cancer genomics data. The portal reduces molecular profiling data from cancer tissues and cell lines into readily understandable genetic, epigenetic, gene expression, and proteomic events. The query interface combined with customized data storage enables researchers to interactively explore genetic alterations across samples, genes, and pathways and, when available in the underlying data, to link these to clinical outcomes. The portal provides graphical summaries of gene-level data from multiple platforms, network visualization and analysis, survival analysis, patient-centric queries, and software programmatic access. The intuitive Web interface of the portal makes complex cancer genomics profiles accessible to researchers and clinicians without requiring bioinformatics expertise, thus facilitating biological discoveries. Here, we provide a practical guide to the analysis and visualization features of the cBioPortal for Cancer Genomics.

Introduction
Large-scale cancer genomics projects, such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) (1), are generating an overwhelming amount of cancer genomics data from multiple different technical platforms, making it increasingly challenging to perform data integration, exploration, and analytics, especially for scientists without a computational background. The cBioPortal for Cancer Genomics (http://cbioportal.org) (2) was specifically designed to lower the barriers of access to the complex data sets and thereby accelerate the translation of genomic data into new biological insights, therapies, and clinical trials.
The portal facilitates the exploration of multidimensional cancer genomics data by allowing visualization and analysis across genes, samples, and data types. Users can visualize patterns of gene alterations across samples in a cancer study, compare gene alteration frequencies across multiple cancer studies, or summarize all relevant genomic alterations in an individual tumor sample. The portal also supports biological pathway exploration, survival analysis, analysis of mutual exclusivity between genomic alterations, selective data download, programmatic access, and publication-quality summary visualization.
Genomic data types integrated by cBioPortal include somatic mutations, DNA copy-number alterations (CNAs), mRNA and microRNA (miRNA) expression, DNA methylation, protein abundance, and phosphoprotein abundance. Currently, the portal contains data sets from 10 published cancer studies (3–10), including the Cancer Cell Line Encyclopedia (CCLE) (10), and more than 20 studies that are currently in the TCGA pipeline (table S1). For each tumor sample, data may be available from multiple genomic analysis platforms. The portal's simplifying concept is to integrate multiple data types at the gene level and then query for the presence of specific biological events in each sample (for example, genetic mutation, gene homozygous deletion, gene amplification, increased or decreased mRNA or miRNA expression, and increased or decreased protein abundance). This allows users to query genetic alterations per gene and sample and test hypotheses regarding recurrence and genomic context of gene alteration events in specific cancers.
Equipment
A personal computer or computing device with an Internet browser with JavaScript enabled
Note: We support and test the following browsers: Google Chrome, Firefox 3.0 and above, Safari, and Internet Explorer 9.0 and above.
Adobe Flash player
Note: This browser plug-in is required for visualizing networks on the network analysis tab. It can be downloaded from http://get.adobe.com/flashplayer/. This requirement is to be removed by mid-2013.
Java Runtime Environment
Note: This application is needed for launching the Integrative Genomics Viewer (IGV). It can be downloaded from http://www.java.com/getjava/.
Adobe PDF Reader
Note: This is necessary for viewing the Pathology Reports and for viewing many of the downloadable files. It can be downloaded from http://get.adobe.com/reader/.
Vector graphic editor
Note: This is necessary for visualizing and editing the SVG file of OncoPrints downloaded from the cBioPortal. Examples of software supporting SVG are Adobe Illustrator (http://www.adobe.com/products/illustrator.html) and Inkscape (http://inkscape.org/).
Instructions
The genomic data sets in the cBioPortal for Cancer Genomics (http://cbioportal.org) can be queried or downloaded by using an interactive Web interface or can be accessed programmatically. Users have the option of querying a single cancer study or querying across cancer studies. They can also view relevant genomic alterations in individual cancer samples.
Querying Individual Cancer Studies
In a single-cancer query, users can explore and visualize genomic alterations in a selected set of genes, including the relationship between alterations in these genes across all selected samples and the relationship between different data types for the same gene. There are four steps to performing a query of a single-cancer study (Fig. 1). The general process is described along with the specific query used to generate the results shown.
Users can select from one of more than 25 cancer studies. When selecting genomic profiles, mutations and CNAs are specified by default. When available, relative mRNA or miRNA expression or relative protein and phosphoprotein abundance data can also be selected. Protein and phosphoprotein data are based on reverse phase protein array (RPPA) experiments. For mRNA or miRNA data and protein and phosphoprotein data, z scores are precomputed from the expression values, and users can specify the threshold or use the default setting (2 SDs from the mean). The z scores for mRNA expression are determined for each sample by comparing a gene's mRNA expression to the distribution in a reference population that represents typical expression for the gene. If expression data are available for normal adjacent tissues, those data are used as the reference population; otherwise, expression values of all tumors that are diploid for the gene in question in the cancer study are used. The z scores for miRNA expression or protein abundance are determined for each sample by comparing with all samples with miRNA or protein data, respectively.
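The z-score calculation described above can be illustrated with a minimal sketch. It is not the portal's implementation: the function names, the sample identifiers, the expression values, and the use of the sample (n − 1) standard deviation are all assumptions made for illustration. The reference population here stands in for either normal adjacent tissue or tumors that are diploid for the gene.

```python
from statistics import mean, stdev


def expression_z_scores(expression, reference_ids):
    """Return {sample_id: z score} for one gene.

    expression    -- {sample_id: expression value} for every sample
    reference_ids -- samples forming the reference population
                     (e.g. tumors diploid for this gene)
    """
    reference_values = [expression[s] for s in reference_ids]
    mu = mean(reference_values)
    # Assumption: sample standard deviation; the source does not say which
    # estimator the portal uses.
    sigma = stdev(reference_values)
    if sigma == 0:
        raise ValueError("reference population has zero variance")
    return {s: (value - mu) / sigma for s, value in expression.items()}


def altered_samples(z_scores, threshold=2.0):
    """Split samples into over- and under-expressed at +/- threshold SDs."""
    up = sorted(s for s, z in z_scores.items() if z > threshold)
    down = sorted(s for s, z in z_scores.items() if z < -threshold)
    return up, down


# Hypothetical expression values for one gene (not real portal data).
expression = {"S1": 5.0, "S2": 5.2, "S3": 4.8, "S4": 5.1, "S5": 4.9,
              "S6": 9.0, "S7": 1.0}
diploid_samples = ["S1", "S2", "S3", "S4", "S5"]

z = expression_z_scores(expression, diploid_samples)
up, down = altered_samples(z)  # default threshold: 2 SDs from the mean
```

Raising or lowering `threshold` mirrors the user-adjustable z score threshold on the query page.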
When defining case sets for analysis, the default option is set to match the selected genomic profiles. For example, cases with sequencing data will be selected if querying for mutations only. However, the user can change this selection by choosing from the drop-down list of case sets defined by the available data (for example, tumors with mutations, CNA data, gene expression, or RPPA data) or by known tumor subtypes. Users may also input specific cases of interest by selecting “User-Defined Case List” or build a customized case set based on clinical attributes in the “Build Case Set” dialog.
When entering gene sets for analysis, users can manually enter HUGO gene symbols, Entrez Gene identifiers, and gene aliases or select from predefined gene sets or pathways of interest. If lists of recurrently altered genes are available for a given cancer study, for example, recurrently mutated genes from MutSig or genes with recurrent CNAs from GISTIC (11), then users can also select genes from these lists, either building the gene set from them or adding them to the set of manually entered genes.
The Onco Query Language (OQL) can be used to refine the query (Table 1). OQL can be used in single- and cross-cancer queries. Once OQL is used in the initial query, this refinement is reflected in results, such as the OncoPrint. Users can define alterations for four data types: CNAs, mutations, mRNA or miRNA expression changes, and protein or phosphoprotein abundance changes (Table 1). CNA and mutation events have discrete settings, whereas mRNA, miRNA, and protein abundance events have continuous settings. Expression values are converted to z scores to facilitate comparison and the definition of alteration thresholds.
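The distinction between discrete and continuous settings can be sketched as follows. This is an illustration only, not OQL and not the portal's code: the `GeneSampleData` record, the event labels, and the integer coding of copy-number calls (−2 for homozygous deletion, 2 for amplification) are assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GeneSampleData:
    """Hypothetical record for one gene in one sample; None = no data."""
    cna: Optional[int] = None          # assumed coding: -2 homdel ... 2 amp
    mutated: Optional[bool] = None     # discrete: mutation present or not
    mrna_z: Optional[float] = None     # continuous: expression z score
    protein_z: Optional[float] = None  # continuous: RPPA z score


def alteration_events(data: GeneSampleData, z_threshold: float = 2.0) -> List[str]:
    """List the alteration events called for this gene/sample pair."""
    events = []
    # Discrete data types: an event is either present or absent.
    if data.cna == 2:
        events.append("AMP")
    elif data.cna == -2:
        events.append("HOMDEL")
    if data.mutated:
        events.append("MUT")
    # Continuous data types: an event is called only past a threshold.
    for label, z in (("EXP", data.mrna_z), ("PROT", data.protein_z)):
        if z is None:
            continue
        if z > z_threshold:
            events.append(label + "_UP")
        elif z < -z_threshold:
            events.append(label + "_DOWN")
    return events


def is_altered(data: GeneSampleData, z_threshold: float = 2.0) -> bool:
    """A gene counts as altered in a sample if any event is called."""
    return bool(alteration_events(data, z_threshold))
```

For example, `alteration_events(GeneSampleData(cna=2, mrna_z=2.5))` yields an amplification and an increased-expression event, whereas the same record with `z_threshold=3.0` yields the amplification only.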
1. General: Select a cancer study from the drop-down menu.
   Specific example: Select “Glioblastoma (TCGA, Nature 2008).”
2. General: Select the genomic profiles.
   Specific example: Use the default setting with “Mutations” checked and “Copy Number data” checked and “Putative copy-number alterations (RAE, 203 cases)” selected.
   Note: Mutations and copy-number alterations are selected by default. Other options are presented when the data are available. For mRNA or miRNA data and protein and phosphoprotein data, the default z score threshold can be optionally modified to a user-defined positive value. When both microarray and RNA-Seq data are available, the RNA-Seq data set is preferred.
3. General: Select a patient/case set from the drop-down menu or using the options presented in “Build Case Set.”
   Specific example: Select “Tumors with sequence and aCGH data” from the drop-down menu.
   Note: To enter a user-defined case list, this option must be selected from the drop-down menu; then, enter the case IDs, separated by spaces, in the box that appears.
4. General: Enter genes of interest manually or by selecting from predefined lists.
   Specific example: Enter “CDKN2A CDK4 RB1” with spaces separating the genes and without any punctuation.
   Note: Queries may be refined using Onco Query Language (OQL) (Table 1).
5. General: Select the “Download Data” tab and select the desired data option to obtain a copy of the data in text format.
   Specific example: Perform the following query from the Download Data tab: Select “Glioblastoma (TCGA, Nature 2008),” “Mutations,” and “CDKN2A CDK4 RB1,” and press submit. Copy and paste the displayed data into a spreadsheet or choose “Save as” from the File menu in the browser.
   Note: Only data from one genomic profile can be selected for each download query.
Viewing and Interpreting the Results
On the basis of the query criteria, the portal classifies each gene in each sample as altered or not altered, and this classification is used for all analysis and visualizations in the portal, each of which is represented on a separate tab. We describe the results shown in each tab below, using example queries. The query parameters representing the first four steps outlined in the previous section are shown on the figure associated with each example.

Results Tab 1: OncoPrint
An OncoPrint is a concise and compact graphical summary of genomic alterations in multiple genes across a set of tumor samples. Rows represent genes, and columns represent samples. Glyphs and color coding are used to summarize distinct genomic alterations including mutations, CNAs (amplifications and homozygous deletions), and changes in gene expression or protein abundance. Additional details are available by mousing over the event indicated on the gene and include the case ID (each case represents a patient sample or cell line), linked to the patient view page. For mutation events, this also displays amino acid changes. By default, cases are sorted according to alterations. Users can also restore the original case order (alphabetical order by case ID for a predefined case list, or the order as entered for a customized case list). Users also have the option to remove unaltered cases from the visualization. By visualizing gene alterations across a set of cases, OncoPrints help identify trends such as mutual exclusivity or co-occurrence between genes within a gene set.
In addition to the OncoPrint, this results tab also includes information about the genes queried that is available in the Sanger Cancer Gene Census and links to the Gene database at NCBI.
We use the OncoPrint from a query for alterations in the retinoblastoma (RB) pathway genes CDKN2A (encoding the cyclin-dependent kinase inhibitor p16), CDK4 (encoding cyclin-dependent kinase 4), and RB1 in glioblastoma multiforme (GBM) as an example (Fig. 2). The OncoPrint shows that 65 cases (71%) have an alteration in at least one of the three genes, along with the frequency of alteration in each of the three selected genes. For CDKN2A, most of the alterations are homozygous deletions, and there are a few mutations. The alterations in CDK4 are amplifications. Events associated with RB1 include a deletion and several mutations (3). The alterations in these three genes are distributed in a nearly mutually exclusive way across samples, which can be statistically analyzed and visualized with the Mutual Exclusivity tab.
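The summary numbers reported with an OncoPrint (cases altered in at least one gene, per-gene alteration frequency) and the sorting of cases by alteration can be sketched in a few lines. The case IDs and alteration calls below are invented for illustration and are not the GBM data; the sort order is one plausible choice, not necessarily the ordering the portal applies.

```python
def summarize(alterations, genes, cases):
    """Return (number altered, fraction altered, per-gene frequency)."""
    altered_any = set().union(*(alterations.get(g, set()) for g in genes))
    per_gene = {g: len(alterations.get(g, set())) / len(cases) for g in genes}
    return len(altered_any), len(altered_any) / len(cases), per_gene


def sort_cases(alterations, genes, cases):
    """Order cases so alterations in earlier genes come first (assumed rule)."""
    def key(case):
        flags = tuple(0 if case in alterations.get(g, set()) else 1
                      for g in genes)
        return flags + (case,)
    return sorted(cases, key=key)


def text_oncoprint(alterations, genes, ordered_cases):
    """One row per gene, one column per case: 'X' altered, '.' not altered."""
    return {g: "".join("X" if c in alterations.get(g, set()) else "."
                       for c in ordered_cases)
            for g in genes}


# Hypothetical toy data: ten cases, three genes.
genes = ["CDKN2A", "CDK4", "RB1"]
cases = ["C%02d" % i for i in range(1, 11)]
alterations = {
    "CDKN2A": {"C01", "C02", "C03", "C04", "C05"},
    "CDK4": {"C06", "C07"},
    "RB1": {"C05", "C08"},
}

n_altered, fraction, per_gene = summarize(alterations, genes, cases)
ordered = sort_cases(alterations, genes, cases)
rows = text_oncoprint(alterations, genes, ordered)
for gene in genes:
    print(gene.ljust(8), rows[gene])
```

In this toy example only case C05 carries alterations in two genes; the remaining altered cases each carry one, which is the nearly mutually exclusive pattern the text describes.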
1. Perform the query as specified in Fig. 2. Once the “submit” button is pressed, the OncoPrint result is displayed automatically.
2. Use the horizontal scroll bar if the genes do not fit the window.
3. To make an OncoPrint more compact, there are three options available from the “Customize” button: (i) scale the OncoPrint by using the “Zoom” bar; (ii) remove