

Deep sequencing-based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms

Peter A. C. 't Hoen1,*, Yavuz Ariyurek1, Helene H. Thygesen1, Erno Vreugdenhil2, Rolf H. A. M. Vossen1, Renée X. de Menezes1, Judith M. Boer1, Gert-Jan B. van Ommen1 and Johan T. den Dunnen1

1The Center for Human and Clinical Genetics and the Leiden Genome Technology Center, Leiden University Medical Center and 2The Department of Medical Pharmacology from the Leiden/Amsterdam Center for Drug Research, Leiden, The Netherlands

*To whom correspondence should be addressed. Tel: +31 71 526 9421; Fax: +31 71 526 8285; Email: p.a.c.hoen@lumc.nl

Received August 12, 2008. Revised September 16, 2008. Accepted September 29, 2008.

ABSTRACT

The hippocampal expression profiles of wild-type mice and mice transgenic for C-doublecortin-like kinase were compared with Solexa/Illumina deep sequencing technology and five different microarray platforms. With Illumina's digital gene expression assay, we obtained 2.4 million sequence tags per sample, their abundance spanning four orders of magnitude. Results were highly reproducible, even across laboratories. With a dedicated Bayesian model, we found differential expression of 3179 transcripts with an estimated false-discovery rate of 8.5%. This number of differentially expressed transcripts is much higher than that found with microarrays. The overlap in differentially expressed transcripts found with deep sequencing and microarrays was most significant for Affymetrix. The changes in expression observed by deep sequencing were larger than those observed by microarrays or quantitative PCR. Relevant processes such as calmodulin-dependent protein kinase activity and vesicle transport along microtubules were found to be affected with deep sequencing but not with microarrays. While undetectable by microarrays, antisense transcription was found for 51% of all genes and alternative polyadenylation for 47%. We conclude that deep sequencing provides a major advance in robustness, comparability and richness of expression profiling data and is expected to boost collaborative, comparative and integrative genomics studies.

INTRODUCTION

Gene expression microarrays are at present the default technology for transcriptome analysis. Since they rely on sequence-specific probe hybridization, they suffer from background and cross-hybridization problems and measure only the relative abundances of transcripts (1). Moreover, only predefined sequences are detected. In contrast, tag-based sequencing methods like SAGE (Serial Analysis of Gene Expression) measure absolute abundance and are not limited by array content (2). However, laborious and costly cloning and sequencing steps have thus far greatly limited the use of SAGE. This has radically changed with the introduction of deep sequencing technology, enabling the simultaneous sequencing of up to millions of different DNA molecules. The shared idea behind the different deep sequencing approaches is the clonal detection of single DNA molecules at physically isolated locations (3–5). We used the Solexa/Illumina 1G Genome Analyzer, in which adapter sequences, ligated to both ends of the DNA molecule, are bound to a glass surface coated with complementary oligonucleotides. This is followed by solid-phase DNA amplification and sequencing-by-synthesis (6). The system yields millions of short reads (currently up to 36 bp), and is therefore very suitable for tag-based transcriptome sequencing. The technology is also referred to as Digital Gene Expression tag profiling (DGE), and is essentially an improved version of the earlier Massively Parallel Signature Sequencing (MPSS) technology (3,7).

The first steps of the procedure are similar to classical LONG-SAGE. Two restriction enzymes are used to generate tags, cutting at the most 3' CATG and 17 bp downstream of the first enzyme site. Unlike in classical SAGE, tags are neither concatenated nor cloned, but sequenced immediately. The unprecedented sequencing depth now enables the analysis of individual biological samples, while pooling of samples was previously the only affordable option in SAGE. Our results include a striking example of the intrinsic hazards of pooling in expression profiling.
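The tag definition above (the most 3' CATG site plus the 17 bp downstream of it) can be illustrated in silico. The following is a minimal sketch, not the authors' or Illumina's software: the function name `extract_dge_tag` and the example sequence are hypothetical, and the sketch simply reports no tag when fewer than 17 nt follow the site, whereas a real tag could run into the poly(A) tail.

```python
def extract_dge_tag(transcript, site="CATG", tag_len=17):
    """Return the 3'-most `site` plus the `tag_len` nt downstream, or None.

    Illustrative assumption: `transcript` is the sense-strand sequence of a
    transcript without its poly(A) tail.
    """
    seq = transcript.upper()
    pos = seq.rfind(site)  # index of the most 3' occurrence of the site
    if pos == -1:
        return None  # no CATG site, so no tag can be generated
    tag = seq[pos:pos + len(site) + tag_len]
    if len(tag) < len(site) + tag_len:
        return None  # fewer than 17 nt remain downstream of the site
    return tag


# Hypothetical transcript with two CATG sites; only the 3'-most one is used.
example = "AAACATGTTTTCATG" + "ACGTACGTACGTACGTA" + "GGG"
tag = extract_dge_tag(example)
```

The resulting tag is 21 nt long (4 nt of recognition site plus 17 nt), which matches the "CATG + 17-nt" tag sequences used for annotation in the Materials and Methods.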

The biological question addressed in the current study was the identification of transcripts differentially expressed in the hippocampus between wild-type and transgenic mice overexpressing a splice variant of the doublecortin-like kinase-1 (Dclk1) gene. This splice variant, C-doublecortin-like kinase (DCLK)-short, makes the kinase constitutively active (8), and causes subtle behavioral phenotypes (Schenk et al., in preparation). The exact same RNA samples have been analyzed before on five different genome-wide microarray expression profiling platforms (9), which detected few differences in expression between the two groups. We report here that DGE detects many more small, yet significant, differences between the two groups of mice, including those in antisense transcripts and transcripts with different 3'-untranslated regions (UTRs). Furthermore, we discuss the advantages of deep sequencing over microarray expression profiling.

MATERIALS AND METHODS

Samples

Wild-type male C57/BL6j and transgenic male mice overexpressing DCLK-short with a C57/BL6j background were individually housed 7 days prior to the start of the experiment. Animals were housed under standard conditions, with a 12 h/12 h light/dark cycle, and had access to food and water ad libitum. Wild-type (N = 4) and transgenic (N = 4) tissue samples were collected by taking the brain from the skull and quickly dissecting out both hippocampi. Dissection was performed at 0°C to prevent degradation of RNA. Hippocampi were put directly in pre-chilled tubes containing Trizol reagent (Invitrogen Life Technologies, Carlsbad, CA, USA). All animal treatments were approved by the Leiden University Animal Care and Use Committee (UDEC# 01022).

RNA extraction

After transfer to ice-cold Trizol, hippocampi were homogenized using a tissue homogenizer (Salm&Kipp, Breukelen, The Netherlands) and total RNA was isolated according to the manufacturer's protocol. After precipitation, RNA was purified with Qiagen's RNeasy kit with on-column DNase digestion. The quality of the RNA was assessed with the RNA 6000 Labchip kit in combination with the Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA), using the Eukaryote Total RNA Nano assay according to the manufacturer's instructions.

Sequence tag preparation

Sequence tag preparation was done with Illumina's Digital Gene Expression Tag Profiling Kit according to the manufacturer's protocol (version 2.1B). A schematic overview of the procedure is given in Supplementary Figure 1. One microgram of total RNA was incubated with oligo-dT beads to capture the polyadenylated RNA fraction. First- and second-strand cDNA synthesis were performed while the RNA was bound to the beads. While on the beads, samples were digested with NlaIII to retain a cDNA fragment from the most 3' CATG to the poly(A)-tail. Subsequently, the GEX adapter 1 was ligated to the free 5' end of the RNA, and a digestion with MmeI was performed, which cuts 17 bp downstream of the CATG site. At this point, the fragments detach from the beads. After dephosphorylation and phenol extraction, the GEX adapter 2 was ligated to the 3' end of the tag. A PCR amplification with 15 cycles using Phusion polymerase (Finnzymes) was performed with primers complementary to the adapter sequences to enrich the samples for the desired fragments. The resulting fragments of 85 bp were purified by excision from a 6% polyacrylamide TBE gel. The DNA was eluted from the gel debris with 1x NEBuffer 2 by gentle rotation for 2 h at room temperature. Gel debris was removed using a Spin-X Cellulose Acetate Filter (2 ml, 0.45 μm) and the DNA was precipitated by adding 10 μl of 3 M sodium acetate (pH 5.2) and 325 μl of ethanol (–20°C), followed by centrifugation at 14 000 r.p.m. for 20 min. After washing the pellet with 70% ethanol, the DNA was resuspended in 10 μl of 10 mM Tris–HCl, pH 8.5, and quantified with a Nanodrop 1000 spectrophotometer.

Sequencing using Solexa/Illumina Whole Genome Sequencer

Cluster generation was performed after applying 4 pM of each sample to the individual lanes of the Illumina 1G flowcell. After hybridization of the sequencing primer to the single-stranded products, 18 cycles of base incorporation were carried out on the 1G analyzer according to the manufacturer's instructions. Image analysis and basecalling were performed using the Illumina Pipeline, where sequence tags were obtained after purity filtering. This was followed by sorting and counting the unique tags. The raw data (tag sequences and counts) have been submitted to Gene Expression Omnibus (GEO) under series GSE10782.

Illumina DGE tag annotation

All tags were annotated using a database provided by Illumina. Briefly, a preprocessed database of all possible CATG + 17-nt tag sequences was created, using the mouse genome (mm8 version from the UCSC site) and the mouse transcriptome (all RefSeq, mRNA and EST sequences found in GenBank as of November 2006 and Unigene version Mm159). All tags were classified based on their location and orientation in the original sequence as outlined in Supplementary Table 1. The genome was used as a backbone for tag clustering, using tag per genome position as a unique key. The best possible ‘local’ annotation was chosen for each genome location. Finally, the best annotation for each distinct tag sequence was chosen based on the quality of the local annotation and the number of transcripts in that location. The total number of genome and transcriptome hits for each tag was also recorded. This nonredundant set of all tags (‘tophit’) could be used as a lookup table for annotating all experimental tags. Only perfect matches were considered, and no mismatches were allowed.
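Counting unique tags and annotating them against a nonredundant lookup table by exact match can be sketched as follows. This is an illustration under stated assumptions, not the Illumina pipeline: the function names `count_tags` and `annotate_tags`, the tag sequences and the annotation label `hypothetical_gene_A` are all invented for the example, and the real ‘tophit’ table carries more information per tag than a single label.

```python
from collections import Counter


def count_tags(reads):
    """Count how often each distinct tag sequence occurs (case-insensitive)."""
    return Counter(read.upper() for read in reads)


def annotate_tags(counts, tophit):
    """Attach an annotation to each tag by exact lookup only.

    `tophit` is a plain dict standing in for the lookup table (hypothetical
    contents). Tags without a perfect match get None; no mismatches are
    tolerated.
    """
    return {tag: (n, tophit.get(tag)) for tag, n in counts.items()}


# Hypothetical data: three reads of one tag and one read of a tag that
# differs from it, so it finds no perfect match in the table.
reads = [
    "CATGACGTACGTACGTACGTA",
    "catgacgtacgtacgtacgta",
    "CATGACGTACGTACGTACGTA",
    "CATGACGTACGTACGTACGTT",
]
tophit = {"CATGACGTACGTACGTACGTA": "hypothetical_gene_A"}
annotated = annotate_tags(count_tags(reads), tophit)
```

A dictionary lookup is used here because only perfect matches are considered, so no alignment or mismatch scoring is needed.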

The total set of all annotation tags could be separated into several groups: canonical transcriptomic tags: 3'-most tags from known transcripts (the 52 281 tags most expected in a DGE tag profiling experiment); noncanonical transcriptomic tags: all tags in the mouse genome that map to any known exon (both strands) but are not 3'-most or are derived from few ESTs only (1.6 million tags); tags derived from ribosomal (46 tags) and mitochondrial RNA (108 tags); REPEAT tags: tags that map to the genome more than 100 times (2900 tags); and tags that map to the genome but not to any known exon (17 million ‘just genome’ tags).
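The grouping above can be expressed as a small decision function. The sketch below is a simplification under explicit assumptions: the `TagHit` record, its field names and the order in which the rules are tested are hypothetical (the text does not state how overlapping categories are resolved), and the ‘derived from few ESTs only’ criterion for noncanonical tags is not modeled. Only the threshold of more than 100 genome hits for REPEAT tags is taken directly from the text.

```python
from dataclasses import dataclass


@dataclass
class TagHit:
    """Hypothetical per-tag summary; field names are illustrative only."""
    genome_hits: int            # number of perfect matches in the genome
    in_known_exon: bool         # maps to any known exon, either strand
    is_three_prime_most: bool   # 3'-most tag of a known transcript
    source: str = "nuclear"     # "ribosomal", "mitochondrial" or "nuclear"


def classify_tag(hit, repeat_threshold=100):
    """Assign a tag to one group; the rule order is an assumption."""
    if hit.source in ("ribosomal", "mitochondrial"):
        return hit.source
    if hit.genome_hits > repeat_threshold:
        return "repeat"  # maps to the genome more than 100 times
    if hit.in_known_exon:
        return "canonical" if hit.is_three_prime_most else "noncanonical"
    if hit.genome_hits > 0:
        return "just genome"  # genomic match outside any known exon
    return "unannotated"  # no perfect match at all


groups = [
    classify_tag(TagHit(1, True, True)),
    classify_tag(TagHit(1, True, False)),
    classify_tag(TagHit(250, True, True)),
    classify_tag(TagHit(3, False, False)),
    classify_tag(TagHit(1, True, True, source="mitochondrial")),
]
```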
