Single Molecule, Real-Time Sequencing of Full-length cDNA Transcripts Uncovers Novel Alternatively Spliced Isoforms
Tyson A. Clark, Ting Hon, and Elizabeth Tseng
Pacific Biosciences, 1380 Willow Road, Menlo Park, CA 94025
Abstract
In higher eukaryotic organisms, the majority of multiexon genes are alternatively spliced. Different mRNA isoforms from the same gene can produce proteins that have distinct properties such as structure, function, or subcellular localization. Thus, the importance of understanding the full complement of transcript isoforms with potential phenotypic impact cannot be overstated. While microarrays and NGS-based methods have become useful for studying transcriptomes, these technologies yield short, fragmented sequences from which accurate, complete reconstruction of splice variants remains a challenge.
The Iso-Seq™ protocol developed at PacBio offers the only solution for direct sequencing of full-length, single-molecule cDNA sequences to survey transcriptome isoform diversity, which is useful for gene discovery and annotation. Knowledge of the complete isoform repertoire is also key for accurate quantification of isoform abundance. As most transcripts range from 1 to 10 kb, full-length cDNAs from intact RNA molecules can be sequenced using SMRT® Sequencing (avg. read length: 10-15 kb) without requiring fragmentation or post-sequencing assembly. Our open-source computational pipeline delivers high-quality, non-redundant sequences for unambiguous identification of alternative splicing events, alternative transcriptional start sites, alternative polyadenylation sites, and gene fusion events. The standard Iso-Seq protocol workflow available for all researchers is presented using a deep dataset of full-length cDNA sequences from the MCF-7 cancer cell line and multiple tissues (brain, heart, and liver). Detected novel transcripts approaching 10 kb and alternative splicing events are highlighted. Even in extensively profiled samples, the method uncovered large numbers of novel alternatively spliced isoforms and previously unannotated genes.
SageELF™ Size Fractionation
SageELF Allows For Collection of cDNA Molecules in 12 Fractions Across the Entire Size Distribution
Sample Prep Improvements
Protocol Adjustments Improve Representation of Longer Transcripts
[Figure: amplified cDNA by PCR enzyme (Phusion, Kapa Hifi, SeqAmp) and tissue (liver, heart, brain), with size fractions from 1-2 kb to 10-15 kb and size-marker labels at 500, 800, 1250, 2000, and 4000.]
cDNA Amplified with Kapa Hifi
Changing the PCR enzyme allows for amplification of transcripts in the 5-10 kb size range from tissue samples that have significant expression of transcripts in that size range.
Sage Science’s BluePippin Size Fractionation
Amplified cDNAs after size selection on either SageELF or BluePippin.
Bioanalyzer® Traces of SageELF Size-Selected cDNA from Human Brain
Example Bioanalyzer trace of four size-selected Iso-Seq libraries
Full-Length Human Tissue Transcriptomes
PacBio Sequencing of Iso-Seq Libraries From 3 Human Tissues
Targeted Full-Length cDNA Sequencing
Sample Preparation Methods
Sequencing of Full-Length RT-PCR Products Shows Differential Alternative Splicing Across Three Tissues
Iso-Seq Sample Preparation Workflow (starting material: total RNA)
SageELF increases the flexibility of size selection and allows for isolation of amplified cDNAs from several hundred bp up to more than 10 kb in size.
[Workflow diagram: total RNA → optional polyA selection → polyA+ RNA → reverse transcription (Clontech SMARTer PCR cDNA Synthesis Kit) → full-length 1st strand cDNA → PCR optimization → large-scale amplification → amplified cDNA → size selection (BluePippin™ System or gel) into 1-2 kb, 2-3 kb, and 3-6 kb fractions → re-amplification → SMRTbell™ template preparation → optional size selection (BluePippin System) of the 3-6 kb fraction → SMRT Sequencing.]
Overview of the dataset showing numbers of transcripts of various sizes and the number of isoforms per gene
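The two summaries named in this caption (transcript counts by size and isoforms per gene) can be tabulated from a list of transcripts in a few lines. The following is a minimal sketch, not the Iso-Seq pipeline itself; the transcript IDs, gene names, and lengths are hypothetical, and the 1 kb bin width is an assumption chosen only for illustration.

```python
from collections import Counter

def isoforms_per_gene(transcript_to_gene):
    """Count how many distinct transcripts are assigned to each gene."""
    return Counter(transcript_to_gene.values())

def isoform_count_histogram(transcript_to_gene):
    """Map 'isoforms per gene' -> 'number of genes with that many isoforms'."""
    return Counter(isoforms_per_gene(transcript_to_gene).values())

def length_histogram_kb(lengths_bp):
    """Count transcripts per 1 kb bin; key k covers lengths in [k, k+1) kb."""
    return Counter(length_bp // 1000 for length_bp in lengths_bp)

# Hypothetical transcript-to-gene assignments and lengths (illustration only).
transcript_to_gene = {
    "tx1": "GENE_A", "tx2": "GENE_A", "tx3": "GENE_A",
    "tx4": "GENE_B",
    "tx5": "GENE_C", "tx6": "GENE_C",
}
transcript_lengths_bp = [1500, 1800, 2500, 9700]

per_gene = isoforms_per_gene(transcript_to_gene)
per_gene_histogram = isoform_count_histogram(transcript_to_gene)
size_histogram = length_histogram_kb(transcript_lengths_bp)
```

Counting from a transcript-to-gene mapping assumes that redundant reads have already been collapsed into one sequence per isoform.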
Full-Length Non-Redundant Transcript Sequences
PacBio sequencing of full-length RT-PCR products simplifies identification of alternatively spliced isoforms and allows for relative quantification of isoform abundance.
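Because each full-length read spans an entire RT-PCR product, a read can be assigned to a single isoform, and relative abundance reduces to a ratio of read counts. The sketch below illustrates that arithmetic under stated assumptions: the tissue names follow the poster, but the isoform labels and counts are hypothetical, and no correction is made for amplification or loading bias between products of different lengths.

```python
def relative_isoform_abundance(read_counts):
    """Convert per-isoform full-length read counts into fractions of the total."""
    total = sum(read_counts.values())
    if total == 0:
        # No reads for this gene: report zero for every isoform.
        return {isoform: 0.0 for isoform in read_counts}
    return {isoform: count / total for isoform, count in read_counts.items()}

# Hypothetical read counts for one gene with two isoforms (illustration only).
counts_by_tissue = {
    "brain": {"isoform_1": 80, "isoform_2": 20},
    "heart": {"isoform_1": 25, "isoform_2": 75},
    "liver": {"isoform_1": 50, "isoform_2": 50},
}

abundance_by_tissue = {
    tissue: relative_isoform_abundance(counts)
    for tissue, counts in counts_by_tissue.items()
}
```

Comparing the resulting fractions across tissues is one simple way to flag differential alternative splicing for a targeted gene.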
RNA is converted into first strand cDNA using the Clontech SMARTer PCR cDNA Synthesis Kit followed by universal amplification. Amplified cDNA is size fractionated and converted into SMRTbell templates for sequencing on the PacBio® RS II.
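The size-fractionation step can be pictured as assigning each cDNA molecule to one or more length windows. The sketch below uses the 1-2, 2-3, 3-6, and 5-10 kb windows that appear on this poster; the half-open boundary handling and the function name are assumptions made for illustration, and real fractions are defined by the size-selection instrument rather than by exact cutoffs.

```python
# Size windows in kb as labeled on the poster; note that 3-6 kb and 5-10 kb overlap.
SIZE_FRACTIONS_KB = [(1, 2), (2, 3), (3, 6), (5, 10)]

def fractions_for_length(length_bp, fractions=SIZE_FRACTIONS_KB):
    """Return the labels of all size fractions whose window contains length_bp.

    Windows are treated as half-open [low, high) in kb, which is an assumption.
    """
    length_kb = length_bp / 1000
    return [
        f"{low}-{high} kb"
        for low, high in fractions
        if low <= length_kb < high
    ]

# Hypothetical cDNA lengths in bp (illustration only).
example_lengths_bp = [500, 1500, 2000, 5500, 9000]
assignments = {length: fractions_for_length(length) for length in example_lengths_bp}
```

Returning a list rather than a single label keeps the overlap between the 3-6 kb and 5-10 kb windows explicit: a 5.5 kb molecule falls in both.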
Size Distribution of Amplified cDNA From Multiple Tissues (Brain, Heart, Liver)
Summary and Resources
Summary:
• The Iso-Seq method provides full-length cDNA sequences without the need for assembly.
• Improved sample prep and size-selection methods allow for sequencing of transcripts up to 10 kb.
• Alternatively spliced transcripts can be easily identified from either whole-transcriptome or targeted sequencing.
PacBio human three-tissue dataset available here: http://blog.pacificbiosciences.com/2014/10/data-release-whole-human-transcriptome.html
PacBio MCF-7 transcriptome dataset available here: http://blog.pacificbiosciences.com/2013/12/data-release-human-mcf-7-transcriptome.html
Additional information and Iso-Seq protocols: http://www.pacb.com/applications/isoseq/index.html
Details on data analysis of Iso-Seq data can be found here: https://github.com/PacificBiosciences/cDNA_primer/wiki
[Figure: Two examples of genes with differential alternative splicing across the three tissues]
For Research Use Only. Not for use in diagnostic procedures. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell and Iso-Seq are trademarks of Pacific Biosciences of California, Inc. All other trademarks are the property of their respective owners. © 2015 Pacific Biosciences of California, Inc. All rights reserved.