
Single Molecule, Real-Time Sequencing of Full-length cDNA Transcripts Uncovers Novel Alternatively Spliced Isoforms

Tyson A. Clark, Ting Hon, and Elizabeth Tseng
Pacific Biosciences, 1380 Willow Road, Menlo Park, CA 94025

Abstract

In higher eukaryotic organisms, the majority of multiexon genes are alternatively spliced. Different mRNA isoforms from the same gene can produce proteins with distinct properties, such as structure, function, or subcellular localization. Thus, the importance of understanding the full complement of transcript isoforms with potential phenotypic impact cannot be overstated. While microarrays and short-read NGS-based methods have become useful for studying transcriptomes, these technologies provide only short, fragmented views of transcripts, so accurate and complete reconstruction of splice variants remains a challenge.

The Iso-Seq™ protocol developed at PacBio offers the only solution for direct sequencing of full-length, single-molecule cDNA sequences to survey transcriptome isoform diversity, which is useful for gene discovery and annotation. Knowledge of the complete isoform repertoire is also key for accurate quantification of isoform abundance. Because most transcripts range from 1 to 10 kb, full-length cDNA copies of intact RNA molecules can be sequenced using SMRT® Sequencing (average read length: 10–15 kb) without fragmentation or post-sequencing assembly. Our open-source computational pipeline delivers high-quality, non-redundant sequences for unambiguous identification of alternative splicing events, alternative transcriptional start sites, polyA tails, and gene fusion events. The standard Iso-Seq protocol workflow, available to all researchers, is presented using a deep dataset of full-length cDNA sequences from the MCF-7 cancer cell line and multiple human tissues (brain, heart, and liver). Detected novel transcripts approaching 10 kb in length, as well as alternative splicing events, are highlighted. Even in extensively profiled samples, the method uncovered large numbers of novel alternatively spliced isoforms and previously unannotated genes.

Sample Prep Improvements

SageELF™ Size Fractionation

SageELF Allows For Collection of cDNA Molecules in 12 Fractions Across the Entire Size Distribution

Protocol Adjustments Improve Representation of Longer Transcripts

[Figure: cDNA amplification with Phusion, Kapa HiFi, and SeqAmp polymerases; size-selected fractions (from 1–2 kb up to 10–15 kb) for liver, heart, and brain]

cDNA Amplified with Kapa HiFi

Changing the PCR enzyme allows amplification of 5–10 kb transcripts from tissue samples that express substantial numbers of transcripts in that size range.

Sage Science’s BluePippin Size Fractionation

Amplified cDNAs after size selection on either SageELF or BluePippin.

Bioanalyzer® Traces of SageELF Size-Selected cDNA from Human Brain

Example Bioanalyzer trace of four size-selected Iso-Seq libraries

Full-Length Human Tissue Transcriptomes

PacBio Sequencing of Iso-Seq Libraries From 3 Human Tissues

Targeted Full-Length cDNA Sequencing

Sequencing of Full-Length RT-PCR Products Shows Differential Alternative Splicing Across Three Tissues

Sample Preparation Methods

Iso-Seq Sample Preparation Workflow
• Total RNA
• Optional PolyA Selection
• polyA+ RNA
• Reverse Transcription (Clontech SMARTer PCR cDNA Synthesis Kit)
• Full-length 1st Strand cDNA
• PCR Optimization
• Large-scale Amplification
• Amplified cDNA
• Size Selection (BluePippin™ System or Gel)
• Re-Amplification
• SMRTbell™ Template Preparation
• Optional Size Selection (BluePippin System)
• SMRT Sequencing

SageELF increases the flexibility of size selection and allows isolation of amplified cDNAs ranging from several hundred bp to more than 10 kb in size.
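After SMRT Sequencing, Iso-Seq-style analyses typically treat a read as full-length only when it carries the 5' primer, the 3' primer, and a polyA tail. The following minimal sketch implements that check, assuming exact primer matches near each read end. The primer sequences, the `window` and `min_polya` parameters, and the function names are hypothetical placeholders. They are not the SMARTer primers and not the Iso-Seq pipeline's code.

```python
import re

# Hypothetical placeholder primers; these are NOT the real SMARTer primer sequences.
FIVE_PRIME_PRIMER = "AAGGCCTTGA"
THREE_PRIME_PRIMER = "CTCAGTTGAC"  # as read on the sense strand, right after the polyA tail

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def classify(read: str, window: int = 50, min_polya: int = 8) -> dict:
    """Check a read (either orientation) for the three full-length signals.

    Assumes exact primer matches within `window` bases of each end; real
    tools tolerate sequencing errors, which this sketch does not.
    """
    polya_pattern = re.compile(f"A{{{min_polya},}}$")  # e.g. "A{8,}$"
    for strand, seq in (("+", read), ("-", revcomp(read))):
        has_five = FIVE_PRIME_PRIMER in seq[:window]
        tail = seq[-window:]
        idx = tail.rfind(THREE_PRIME_PRIMER)
        has_three = idx != -1
        # The polyA run must end immediately before the 3' primer.
        has_polya = has_three and polya_pattern.search(tail[:idx]) is not None
        if has_five or has_three:
            return {
                "strand": strand,
                "five_prime": has_five,
                "three_prime": has_three,
                "polyA": has_polya,
                "full_length": has_five and has_three and has_polya,
            }
    return {"strand": None, "five_prime": False, "three_prime": False,
            "polyA": False, "full_length": False}
```

Raw reads contain insertions and deletions, so a production classifier needs error-tolerant primer detection. Exact substring matching here only illustrates the three-signal rule.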

Overview of the dataset showing numbers of transcripts of various sizes and the number of isoforms per gene
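The two summaries in this overview, transcript counts per size range and isoforms per gene, can be tallied from a transcript table. The sketch below assumes a hypothetical input of `(transcript_id, gene_id, length_bp)` records with one record per non-redundant transcript. The non-overlapping size bins are illustrative choices and are not the poster's fraction boundaries.

```python
from collections import Counter

# Illustrative, non-overlapping size bins in kb (lower bound inclusive, upper exclusive).
# These are assumptions for the sketch, not the poster's actual bins.
SIZE_BINS_KB = [(0, 1), (1, 2), (2, 3), (3, 6), (6, 10), (10, None)]


def size_bin(length_bp: int) -> str:
    """Label a transcript length with its size bin."""
    kb = length_bp / 1000
    for lo, hi in SIZE_BINS_KB:
        if hi is None and kb >= lo:
            return f">={lo} kb"
        if hi is not None and lo <= kb < hi:
            return f"{lo}-{hi} kb"
    raise ValueError(f"negative length: {length_bp}")


def dataset_overview(transcripts):
    """Summarize (transcript_id, gene_id, length_bp) records.

    Assumes unique transcript IDs (a non-redundant set).
    Returns (transcripts per size bin, number of genes having k isoforms).
    """
    records = list(transcripts)
    per_size = Counter(size_bin(length) for _, _, length in records)
    isoforms_per_gene = Counter(gene for _, gene, _ in records)
    genes_by_isoform_count = Counter(isoforms_per_gene.values())
    return per_size, genes_by_isoform_count
```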


Full-Length Non-Redundant Transcript Sequences

PacBio sequencing of full-length RT-PCR products simplifies identification of alternatively spliced isoforms and allows for relative quantification of isoform abundance.
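With full-length reads, relative isoform quantification reduces to counting. Each read is assigned to one isoform, and the counts are normalized within each gene. Normalizing within each tissue as well allows splicing to be compared across the three tissues. The sketch below assumes the reads have already been assigned to isoforms. The tissue, gene, and isoform names are hypothetical.

```python
from collections import Counter


def isoform_fractions(reads):
    """Relative isoform abundance within each (tissue, gene).

    `reads` yields one (tissue, gene, isoform) tuple per full-length read.
    Assumes each read represents one cDNA molecule and ignores PCR and
    size-selection bias, so values are relative proportions, not absolute levels.
    """
    counts = Counter(reads)
    totals = Counter()
    for (tissue, gene, _isoform), n in counts.items():
        totals[(tissue, gene)] += n
    return {key: n / totals[key[:2]] for key, n in counts.items()}


# Hypothetical example: one gene observed in two tissues.
reads = ([("brain", "GENE_A", "iso1")] * 3
         + [("brain", "GENE_A", "iso2")] * 1
         + [("liver", "GENE_A", "iso2")] * 4)
for key, frac in sorted(isoform_fractions(reads).items()):
    print(key, round(frac, 2))
```

A shift in one isoform's fraction between tissues, such as iso2 here, is the kind of differential splicing signal the targeted RT-PCR data are used to reveal.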


RNA is converted into first strand cDNA using the Clontech SMARTer PCR cDNA Synthesis Kit followed by universal amplification. Amplified cDNA is size fractionated and converted into SMRTbell templates for sequencing on the PacBio® RS II.
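A SMRTbell template is circular, so the polymerase can read around a short insert several times. By contrast, an insert near 10 kb may be covered only about once by a read in the 10–15 kb average range cited above. A rough estimate is passes ≈ polymerase read length ÷ (insert + adapter), rounded down. This ignores reads that start partway through the insert. The 45 bp adapter length and the function name are assumptions for illustration.

```python
def estimated_full_passes(polymerase_read_bp: int, insert_bp: int,
                          adapter_bp: int = 45) -> int:
    """Rough count of complete passes over a circular SMRTbell insert.

    Each pass reads the insert once (alternating sense and antisense strands)
    plus one hairpin adapter. adapter_bp=45 is an assumed value, not taken
    from the poster.
    """
    if insert_bp <= 0 or adapter_bp < 0:
        raise ValueError("insert_bp must be positive and adapter_bp non-negative")
    return polymerase_read_bp // (insert_bp + adapter_bp)


# Hypothetical 12 kb polymerase read against inserts of increasing size.
for insert_kb in (1, 2, 5, 10):
    passes = estimated_full_passes(12_000, insert_kb * 1000)
    print(f"{insert_kb} kb insert: ~{passes} full pass(es)")
```

Because consensus accuracy generally improves with the number of passes, shorter inserts tend to yield more accurate consensus sequences than the longest transcripts do. This is one reason size fractionation helps keep long transcripts from being under-represented.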

Clontech® SMARTer® PCR cDNA Synthesis Kit

Size Distribution of Amplified cDNA From Multiple Tissues


Summary and Resources

Summary:
• The Iso-Seq method provides full-length cDNA sequences without the need for assembly.
• Improved sample prep and size-selection methods allow sequencing of transcripts up to 10 kb.
• Alternatively spliced transcripts can be easily identified from either whole-transcriptome or targeted sequencing.

PacBio human three-tissue dataset available here: http://blog.pacificbiosciences.com/2014/10/data-release-whole-human-transcriptome.html

PacBio MCF-7 transcriptome dataset available here: http://blog.pacificbiosciences.com/2013/12/data-release-human-mcf-7-transcriptome.html

Additional information and Iso-Seq protocols: http://www.pacb.com/applications/isoseq/index.html

Details on Iso-Seq data analysis can be found here: https://github.com/PacificBiosciences/cDNA_primer/wiki

Two examples of genes with differential alternative splicing across the three tissues

For Research Use Only. Not for use in diagnostic procedures. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell and Iso-Seq are trademarks of Pacific Biosciences of California, Inc. All other trademarks are the property of their respective owners. © 2015 Pacific Biosciences of California, Inc. All rights reserved.