Supplementary Information
Scaffolding of a bacterial genome using MinION nanopore sequencing
Karlsson, E. 1§, Lärkeryd, A. 1§, Sjödin, A. 1,2, Forsman, M. 1 and Stenberg, P. 1,2,3*
1 Swedish Defence Research Agency, Umeå, Sweden
2 Department of Chemistry, Computational Life Science Cluster (CLiC), Umeå University, Umeå, Sweden
3 Molecular Biology, Umeå University, Umeå, Sweden
§ Equal contribution
* Correspondence: [email protected]
Supplementary Tables

Sequencing run          R7.3 (FSC996)    R7 (FSC1006)
Number of 2D reads      30423            20099
Number of bases         190148225        117183952
Mean length (bp)        6250             5830
Median length (bp)      6038             5580
Max length (bp)         34370            30578
Mapped bases            128164261        82872274

Supplementary Table S1. Sequence output (all 2D reads) from the MinION R7 (FSC1006 genome) and the R7.3 (FSC996 genome) sequencing runs.
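As a quick consistency check on Table S1, the sketch below recomputes the mean read length from the read and base counts and derives the share of bases that mapped. The numbers are copied from the table. The "mapped fraction" (mapped bases divided by total bases) is a derived quantity introduced here for illustration and is not reported in the table itself.

```python
# Values copied from Supplementary Table S1 (all 2D reads).
RUNS = {
    "R7.3 (FSC996)": {"reads": 30423, "bases": 190148225, "mapped_bases": 128164261},
    "R7 (FSC1006)": {"reads": 20099, "bases": 117183952, "mapped_bases": 82872274},
}


def summarize(run):
    """Return (mean read length in bp, fraction of bases mapped) for one run."""
    mean_length = run["bases"] / run["reads"]
    # Derived quantity, not part of the original table.
    mapped_fraction = run["mapped_bases"] / run["bases"]
    return mean_length, mapped_fraction


for name, run in RUNS.items():
    mean_length, mapped_fraction = summarize(run)
    print(f"{name}: mean length {mean_length:.0f} bp, "
          f"{mapped_fraction:.1%} of bases mapped")
```

The recomputed means round to the tabulated 6250 bp and 5830 bp, and roughly 67% (R7.3) and 71% (R7) of the sequenced bases are counted as mapped.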
Supplementary Figures
[Figure S1. Panel a, "Read lengths": fraction of total reads versus read length (kbp). Panel b, "Mapability": cumulative fraction of reads versus proportion of read aligned. Length categories in both panels: 0 - 4000 bp (n = 5721), 4000 - 8000 bp (n = 10436), 8000 - 30578 bp (n = 3942).]
Supplementary Figure S1. Quality of MinION (R7) sequencing reads from FSC1006. (a) Length distribution of the reads. MinION reads are divided into three length categories that are coloured separately. Note that the high number of MinION reads of about 3.5 kb originate from the ligation control fragment. (b) Mapability of MinION reads divided into the same length categories as in (a). Read alignment length is the fraction of the reads covered in the BLAST alignment against the reference genome.
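The "proportion of read aligned" in panel (b) can be illustrated with a minimal sketch. It assumes that each read has one or more BLAST alignments given as query start and end coordinates (1-based, inclusive, as in BLAST tabular output), and that overlapping alignments are merged so no base is counted twice. The function name and the merging rule are assumptions made for illustration; the caption does not state how multiple alignments per read were combined.

```python
def fraction_aligned(read_length, hsps):
    """Fraction of a read covered by at least one alignment.

    hsps: iterable of (qstart, qend) pairs, 1-based inclusive.
    Overlapping or adjacent intervals are merged before counting
    (an assumption, not a documented step of the original analysis).
    """
    if read_length <= 0:
        raise ValueError("read_length must be positive")

    covered = 0
    current_start = current_end = None
    # Normalise each pair so start <= end, then sweep in sorted order.
    for start, end in sorted((min(s, e), max(s, e)) for s, e in hsps):
        if current_end is None or start > current_end + 1:
            # Close the previous merged interval and open a new one.
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start + 1
    return covered / read_length


# Two overlapping alignments covering bases 1-600 of a 1000 bp read.
print(fraction_aligned(1000, [(1, 400), (301, 600)]))
```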
[Figure S2. Panels for MinION (FSC996) and PacBio (FSC996), each with sub-panels for deletions, insertions and substitutions, showing mean error/bp/read and coverage for each combination of BLAST MM and Gap parameters.]
Supplementary Figure S2. Error rates and genomic coverage both vary with BLAST parameters. Mean error rates (deletions, insertions and substitutions) per base pair per read and genomic coverage (calculated as the summed aligned length of all reads divided by the genome size) after mapping MinION (R7.3) and PacBio reads to the FSC996 reference genome using different BLAST parameters. MM=match and mismatch scores and Gap=gap opening and gap extension penalties. Note that for match and mismatch scores of 2 and -3 respectively, a gap opening penalty of 0 combined with a gap extension penalty of 2 is not allowed by BLAST. Therefore a gap extension penalty of 4 was used instead.
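The two quantities plotted in Figure S2 can be sketched as follows. Genomic coverage follows the caption's definition (summed aligned length of all reads divided by the genome size). For the error rates, the sketch assumes that each read's error count is divided by its aligned length and the resulting per-read rates are averaged; that normalisation, the `ReadAlignment` record and the toy numbers are assumptions for illustration, not values or code from the study.

```python
from dataclasses import dataclass


@dataclass
class ReadAlignment:
    """Per-read alignment summary (hypothetical record layout)."""
    aligned_length: int  # aligned bases of the read
    deletions: int
    insertions: int
    substitutions: int


def genomic_coverage(alignments, genome_size):
    """Summed aligned length of all reads divided by the genome size."""
    return sum(a.aligned_length for a in alignments) / genome_size


def mean_error_rates(alignments):
    """Mean over reads of (errors / aligned bp), per error type."""
    usable = [a for a in alignments if a.aligned_length > 0]
    if not usable:
        raise ValueError("no aligned reads")
    n = len(usable)
    return {
        kind: sum(getattr(a, kind) / a.aligned_length for a in usable) / n
        for kind in ("deletions", "insertions", "substitutions")
    }


# Made-up toy input, only to show the call pattern.
toy = [ReadAlignment(1000, 50, 30, 20), ReadAlignment(500, 10, 5, 5)]
print(genomic_coverage(toy, genome_size=1000))
print(mean_error_rates(toy))
```

Averaging per-read rates weights every read equally regardless of length; pooling all errors and dividing by the total aligned length would be a different, length-weighted statistic.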
[Figure S3. "Basecalling errors (Genome)": mean error/bp/read by error type (deletions, insertions, substitutions) for each sequence set: MinION (R7) - FSC1006, MinION (R7.3) - FSC996, PacBio - FSC1006 and PacBio - FSC996.]
Supplementary Figure S3. Error rates in the sequence reads generated by the two MinION (R7 and R7.3) and PacBio runs. Mean error rates (deletions, insertions and substitutions) per base pair per read across the FSC996 and FSC1006 genomes are shown.
[Figure S4. "Error rates in all 2D reads": mean error/bp/read for passed and failed 2D reads, by error type (indels, substitutions).]
Supplementary Figure S4. Boxplot showing the difference in rates of indels and substitutions between 2D MinION reads (R7.3) that passed and failed quality filtering. Thick black lines and boxes indicate median values and the interquartile range (25th to 75th percentile), respectively. Whiskers represent 1.5x the interquartile range and black dots denote outliers.
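The boxplot elements described in the caption can be computed as in the sketch below. It assumes the common convention in which whiskers extend to the most extreme data points lying within 1.5x the interquartile range of the box, and it uses the "inclusive" quartile method from Python's standard library. The plotting software used for the figure may define quartiles slightly differently, so this is an illustration of the convention and not a reproduction of the figure.

```python
import statistics


def boxplot_stats(values):
    """Median, quartiles, whisker ends and outliers for one box."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme points inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Made-up values; 100 falls outside the upper fence and is an outlier.
print(boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```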
[Figure S5. "MinION Basecalling errors": mean error/bp/read by error type (deletions, insertions, substitutions) for each region: Genome, GC, MonoA repeats, MonoT repeats, MonoG repeats, MonoC repeats, dinucleotide repeats and nonanucleotide repeats.]
Supplementary Figure S5. Error rates within different genomic regions in the sequence reads generated by MinION (R7.3) sequencing. Mean error rates (deletions, insertions and substitutions) per base pair per read in the whole genome (32% GC), high-GC regions (47.8% GC), monomer repeats (A, T, G and C), dimer repeats and nonamer repeats. All repeats are at least five repeat units long.
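One way to locate repeats of "at least five repeat units" is a regular expression with a backreference, sketched below. The function names are hypothetical, and the rule that a dimer or nonamer unit must not itself be a repeat of a shorter motif (so that a run of A is not also reported as an "AA" dimer repeat) is an assumption; the caption does not say how the repeat regions were actually identified.

```python
import re

MIN_UNITS = 5  # from the caption: at least five repeat units


def is_primitive(unit):
    """True if the unit is not itself a repeat of a shorter motif."""
    n = len(unit)
    return not any(n % k == 0 and unit[:k] * (n // k) == unit
                   for k in range(1, n))


def find_repeats(sequence, unit_length, min_units=MIN_UNITS):
    """Yield (start, end, unit) for tandem repeats, 0-based half-open.

    Matches are non-overlapping, so this is a simple scan and not an
    exhaustive repeat finder.
    """
    pattern = re.compile(
        r"([ACGT]{%d})\1{%d,}" % (unit_length, min_units - 1))
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        if is_primitive(unit):  # assumption: skip e.g. "AA" as a dimer
            yield match.start(), match.end(), unit


print(list(find_repeats("GGAAAAAGG", 1)))   # monomer repeat
print(list(find_repeats("ACACACACAC", 2)))  # dimer repeat
```

Calling `find_repeats(genome, 9)` would scan for nonamer repeats in the same way.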