A Next Generation Sequencing Panel for DNA Typing of Challenging Samples
Magdalena Bus Dept. of Immunology, Genetics and Pathology Science for Life Laboratory Uppsala University, Sweden
Challenging samples
Bones/teeth:
• Low copy number
• Highly degraded
• Contaminated
• Limited samples
Examples: Viking Age boat graves, Vasa warship, Viking Age mass graves, historical persons
Old Uppsala (Sweden), a Viking Age boat grave from 9th or 10th century.
Why Massively Parallel Sequencing?
Sanger sequencing of mtDNA (SNPs):
• Low throughput
• High cost
Fluorescent-based CE-STR typing:
• Detection of DNA fragment size: SNP variants cannot be detected
• Loss of larger size loci
Analyses of mixed DNA samples: a challenge!
Why Massively Parallel Sequencing?
• Thousands to millions of DNA targets can be sequenced in parallel
• Simultaneous analysis of multiple loci on autosomes, sex chromosomes, and the entire mtGenome
• Different types of markers: SNPs (single or microhaplotypes), STRs and InDels using the same technology
• Many markers: enough partial data can be obtained even for highly degraded samples
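The point about partial data can be illustrated with a random match probability computed over only the SNPs that were successfully typed. This is a minimal sketch, assuming Hardy-Weinberg proportions and independent loci; the SNP names, allele frequencies and genotypes are made up for illustration and are not taken from the panel.

```python
from math import prod


def genotype_frequency(p: float, genotype: str) -> float:
    """Hardy-Weinberg frequency of a biallelic SNP genotype.

    p is the population frequency of allele 'A'; genotype is 'AA', 'AB' or 'BB'.
    """
    q = 1.0 - p
    if genotype == "AA":
        return p * p
    if genotype == "AB":
        return 2.0 * p * q
    if genotype == "BB":
        return q * q
    raise ValueError(f"unknown genotype: {genotype}")


def random_match_probability(profile: dict, allele_freqs: dict):
    """Multiply genotype frequencies over the loci that were actually typed.

    profile maps SNP name -> genotype, or None when the locus dropped out.
    Returns (probability, number of typed loci).
    """
    typed = [(snp, g) for snp, g in profile.items() if g is not None]
    rmp = prod(genotype_frequency(allele_freqs[snp], g) for snp, g in typed)
    return rmp, len(typed)


# Hypothetical partial profile: one of four loci failed to give a genotype.
freqs = {"snp1": 0.5, "snp2": 0.5, "snp3": 0.5, "snp4": 0.5}
profile = {"snp1": "AB", "snp2": "AA", "snp3": None, "snp4": "BB"}
print(random_match_probability(profile, freqs))  # (0.03125, 3)
```

Each additional typed locus multiplies the probability by a factor below one, so a large panel can still be informative when many loci drop out.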
Commercially available NGS kits
• HID-Ion AmpliSeq™ Ancestry Panel (Ion PGM™ System, Ion Torrent): 165 autosomal Ancestry Informative Markers (AIMs); amplicon size range 120-130 bp
• ForenSeq™ DNA Signature Prep Kit (Illumina MiSeq): 27 global autosomal STRs, 7 X-STRs, 24 Y-STRs, 94 identity SNPs, 22 phenotypic SNPs, biogeographical SNPs; amplicon size range 61-462 bp
The aims:
• Simultaneous analysis of multiple loci in a single panel: SNPs, STRs, InDels
• Analysis of nuclear DNA and mitochondrial DNA in the same panel, with correction for copy number differences
• Development of an MPS panel for DNA extracted from historical, limited and highly degraded samples
Panel design
Target capture approach
Web-based tool for custom design of probe panels
Not PCR-based targeting: less bias
Design parameters:
- 150 bp paired-end reads
- Illumina MiSeq
- for FFPE samples (degraded DNA), down to 50 bp fragments
www.agilent.com
Description of panels
Panel 1: nuclear DNA markers
Most SNPs from the ALFRED database (The ALlele FREquency Database, http://alfred.med.yale.edu/alfred/snpSets.asp):
• 34-plex SNP for ID
• 52-plex SNP for ID
• 86-plex IISNPs
• 40 X- and Y-SNPs
• 39 eye- and hair color SNPs
• 135 SNPs from HID-Ion AmpliSeq™ panel
>300 SNPs
30 InDels
13 autosomal short STR targets
– Individual identification
– Ancestry information
– Eye- and hair color prediction
Panel 2: entire mitochondrial genome
DNA extraction
Control samples: high quality and quantity of DNA for evaluation
Aged samples: bones, teeth
DNA concentration: 0.233 – 4.680 ng/μL
Sequencing (Illumina) MiSeq
Template preparation
1. Digestion of genomic DNA with restriction enzymes
2. HaloPlex Target Enrichment:
• Incorporation of indexes and Illumina seq motifs and gDNA fragment circularization
• Capture target DNA-probe hybrids
4. PCR amplification
[Figure labels: Primer 1, Seq motif, Target DNA, Seq motif, Index, Primer 2]
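The first template preparation step, restriction digestion, can be sketched in silico as follows. This is only an illustration: the slides do not name the enzymes used, so the sketch borrows EcoRI (recognition site GAATTC, cutting after the first G) as a stand-in, works on a single strand, and uses a made-up input sequence.

```python
def digest(sequence: str, site: str, cut_offset: int) -> list[str]:
    """Cut `sequence` at every occurrence of `site`.

    cut_offset is the number of bases into the recognition site at which
    the strand is cut (1 for EcoRI, which cuts G^AATTC).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    # Drop empty fragments that arise when a site sits at the very start.
    return [f for f in fragments if f]


# Hypothetical 20-base sequence containing two EcoRI sites.
fragments = digest("AAGAATTCTTTTGAATTCGG", site="GAATTC", cut_offset=1)
print(fragments)  # ['AAG', 'AATTCTTTTG', 'AATTCGG']
```

In a capture design, fragment ends defined this way determine which probes can hybridise to both ends of a target fragment, which is why already fragmented (degraded) DNA is harder to capture.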
Control samples, two HaloPlex panels mixed in different ratios
Samples: A11, B8, C10, E14, F15, F20, G6, I8, J1, J5, K20
Increased information: microhaplotypes
6 samples, 5 haplotypes
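A microhaplotype is read off from single sequencing reads that span several closely spaced SNPs, so the alleles are phased directly. The sketch below shows one way this could be counted, assuming ungapped reads given as (0-based start, sequence) pairs; the positions, reads and the `min_reads` noise filter are hypothetical and do not reflect the analysis pipeline used for the slides.

```python
from collections import Counter


def call_microhaplotypes(reads, snp_positions, min_reads=2):
    """Count haplotypes observed on reads that cover every SNP position.

    reads: iterable of (start, sequence) with a 0-based reference start.
    snp_positions: 0-based reference positions of the SNPs in the locus.
    Haplotypes seen on fewer than min_reads reads are treated as noise.
    """
    counts = Counter()
    first, last = min(snp_positions), max(snp_positions)
    for start, seq in reads:
        end = start + len(seq)  # exclusive end on the reference
        if start <= first and last < end:
            haplotype = "".join(seq[p - start] for p in snp_positions)
            counts[haplotype] += 1
    return {h: n for h, n in counts.items() if n >= min_reads}


reads = [
    (0, "AAAAACAAAGAATAA"),  # spans all three SNPs: C, G, T
    (2, "AAACAAAGAATA"),     # spans all three SNPs: C, G, T
    (0, "AAAAATAAAGAACAA"),  # T, G, C
    (0, "AAAAATAAAGAACAA"),  # T, G, C
    (6, "AAAGAAT"),          # starts after the first SNP, ignored
    (0, "AAAAACAAAAAATAA"),  # C, A, T seen once, filtered as noise
]
print(call_microhaplotypes(reads, snp_positions=[5, 9, 12]))
# {'CGT': 2, 'TGC': 2}
```

Because each haplotype is one of several possible allele combinations, a microhaplotype locus can distinguish more individuals than any one of its SNPs on its own.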
Mixture analysis – epithelial cells and sperm cells in an unknown ratio
Epithelial cells (XX) Sperm cells (XY)
Mixture (XX/XY)
Average coverage: 130 reads
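One way to estimate the unknown ratio in an XX/XY mixture is to compare read depth on Y-chromosome targets with depth on autosomal targets. If m is the fraction of the DNA that comes from the male contributor, autosomal targets are present in two copies per genome and Y targets in m copies, so m ≈ 2 × (mean Y depth) / (mean autosomal depth). The sketch below assumes that coverage is proportional to copy number and that capture efficiency is equal across targets; neither assumption comes from the slides, and the depth values are invented.

```python
from statistics import mean


def male_dna_fraction(y_depths, autosomal_depths):
    """Estimate the male share of a two-person XX/XY mixture.

    y_depths: read depths at Y-chromosome targets.
    autosomal_depths: read depths at autosomal targets.
    The estimate is clamped to the interval [0, 1].
    """
    ratio = mean(y_depths) / mean(autosomal_depths)
    return max(0.0, min(1.0, 2.0 * ratio))


# Hypothetical depths: autosomal targets average 130 reads, Y targets 26.
estimate = male_dna_fraction(y_depths=[26, 26], autosomal_depths=[120, 130, 140])
print(round(estimate, 2))  # 0.4
```

In practice target-to-target variation in capture efficiency would need to be normalised, for example against single-source control samples, before such an estimate could be trusted.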
HVII in three samples
3 samples, 3 different mtDNA haplotypes
Positions 150/152/153
73 A/G
Challenging Viking-age samples (updated panel with more than 900 nSNPs)
Sample ID | DNA concentration | Maximum coverage mtDNA | # nDNA SNPs | Range of coverage | Average coverage
P | 0.233 ng/µL (total 14 ng) | 31 | 158 | 10 – 433 | 42.7
S | 2.19 ng/µL (total 131 ng) | 395 | 99 | 10 – 236 | 56.3
I | 0.982 ng/µL (total 59 ng) | 28 | 92 | 10 – 313 | 39.3
Required concentration: 5 ng/µL (250 ng in total)
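The per-sample figures in the table above (number of nDNA SNPs, range of coverage, average coverage) can be derived from per-SNP read depths along the following lines. The lower bound of 10 in every reported range suggests a minimum-depth cut-off of 10 reads, but that is an inference and not stated on the slides; the SNP names and depths in the example are invented.

```python
def summarise_coverage(depths: dict, min_depth: int = 10):
    """Summarise read depth over SNPs that reach min_depth.

    depths maps SNP name -> number of reads covering that SNP.
    Returns None when no SNP passes the threshold.
    """
    called = [d for d in depths.values() if d >= min_depth]
    if not called:
        return None
    return {
        "n_snps": len(called),
        "min": min(called),
        "max": max(called),
        "mean": round(sum(called) / len(called), 1),
    }


# Hypothetical depths for four SNPs; snp_c falls below the cut-off.
example = {"snp_a": 10, "snp_b": 433, "snp_c": 4, "snp_d": 57}
print(summarise_coverage(example))
# {'n_snps': 3, 'min': 10, 'max': 433, 'mean': 166.7}
```

Raising `min_depth` trades the number of reportable SNPs against confidence in each genotype, which matters most for low-input samples.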
Sample P
• At least 90 nDNA SNPs
• STRs: very low coverage
• Predicted to have had blue eyes, light hair
Damage of single bases: Sanger sequencing vs. MPS of mtDNA
Conclusions
• MPS using HaloPlex and MiSeq is promising for challenging sample analysis
• Test on high quality DNA, 5 ng/µL (225 ng): coverage of > 200 reads for most targets
• Over 90 nDNA SNPs detected for suboptimal input and highly degraded DNA: "only" 10 %, "only" 14 ng
• Nuclear and mitochondrial DNA in the same panel for limited samples: a promising, flexible strategy to save material
• More effort is needed to improve the methodology and obtain higher coverage