Omixon Chicago ASHI2013 2

Thousand Genomes And HLA Typing By NGS: Hidden Treasures In Public Short Read Data

A. Bérces 1, E. Major 1, S. Juhos 1, K. Rigó 1, T. Hague 1, P. Gourraud 2

Department of Neurology – Omixon Biocomputing (www.omixon.com)

Introduction
One of the important goals of the 1000 Genomes (1KG) project was to find common mutations in diverse populations with the help of next-generation sequencing (NGS). For HLA genes, NGS promises to resolve phase information and opens up the possibility of large-scale HLA typing.

We present a sufficiently fast algorithm that uses 1KG whole-exome Illumina data to obtain HLA types for the HLA-A, B, C, DRB1 and DQB1 genes. For validation, we used the results of Sanger capillary sequencing-based HLA typing for over a thousand Coriell samples.

Methods
Whole-exome Illumina samples were filtered for reads that can be aligned with no or very few mismatches to the collection of alleles in the IMGT/HLA database. For HLA typing, these filtered reads were aligned to the exons in the IMGT/HLA reference allele sequences, allowing no or very few mismatches and soft clips at read ends. After alignment, allele candidates were ranked by coverage depth and coverage % (the extent of the exons covered). In the next step, we filtered the allele candidates using this coverage data and kept only those candidates that had a high enough number of reads covering the allele. Finally, we searched for allele pairs while optimizing for both coverage depth and coverage %. Allele pairs were reported when they had a high number of mapped reads and adequate exon coverage for both alleles at each locus.

Mistypings
For samples where reads cover only some of the exons, there are too many candidates and ambiguity is high. Since the targeting method was not specific to the HLA region but was intended to capture whole-exome sequences, this is a consequence of the sequencing strategy rather than of the typing algorithm.

Reads from similar genes or pseudogenes can be another source of discordance. Some of these mistypings can be corrected by excluding reads that map to more than one gene, but systematic cross-mappings will still remain.

[Figure panels: One SNP difference in MIC-A alleles; QC-failed sample; Comparing similar alleles]

Conclusion
Concordance is around 95% for MHC-I and 90% for MHC-II. Not all of the 2126 filtered samples gave reliable results, so quality check (QC) measures had to be included. For example:
• coverage % for exons 2 and 3 (or only for exon 2 for MHC-II genes) has to be over 80%
• read length has to be longer than 75 base pairs

The algorithm is capable of typing other genes such as MIC-A, MIC-B or KIR, although the determination of copy-number variations (e.g. for KIR) is unreliable. Six-digit precision is available for samples with at least a few thousand reads aligned to the reference allele.
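The candidate ranking, pair search and QC rules described on this poster can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Candidate` class, the `MIN_READS` cutoff and the pair score (distinct reads explained by the pair, then the weaker allele's exon coverage) are assumptions made for illustration. Only the 80% exon coverage and 75 bp read length thresholds come from the poster.

```python
from dataclasses import dataclass
from itertools import combinations_with_replacement

MIN_READS = 3             # hypothetical cutoff; the poster only says "high enough"
MIN_EXON_COVERAGE = 80.0  # percent, from the poster's QC rule
MIN_READ_LENGTH = 75      # base pairs, from the poster's QC rule


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of the reads aligned to one reference allele."""
    allele: str
    read_ids: frozenset   # IDs of the reads aligned to this allele
    exon_coverage: float  # coverage %: share of exon bases covered (0-100)


def rank_candidates(candidates):
    """Drop weakly supported alleles, then rank by read count and coverage %."""
    kept = [c for c in candidates if len(c.read_ids) >= MIN_READS]
    return sorted(kept, key=lambda c: (len(c.read_ids), c.exon_coverage),
                  reverse=True)


def best_pair(candidates):
    """Pick the pair that explains the most reads (assumed scoring)."""
    ranked = rank_candidates(candidates)
    if not ranked:
        return None

    def score(pair):
        a, b = pair
        return (len(a.read_ids | b.read_ids),
                min(a.exon_coverage, b.exon_coverage))

    # Pairs with replacement allow a homozygous call (same allele twice).
    return max(combinations_with_replacement(ranked, 2), key=score)


def passes_qc(coverage_by_exon, read_length, mhc_class):
    """Apply the two QC rules: exon coverage over 80%, reads longer than 75 bp."""
    required = (2, 3) if mhc_class == "I" else (2,)
    covered = all(coverage_by_exon.get(exon, 0.0) > MIN_EXON_COVERAGE
                  for exon in required)
    return covered and read_length > MIN_READ_LENGTH


# Toy data, invented for this example.
toy = [
    Candidate("A*01:01", frozenset({1, 2, 3, 4}), 100.0),
    Candidate("A*02:01", frozenset({5, 6, 7}), 95.0),
    Candidate("A*03:01", frozenset({1, 2}), 60.0),  # too few reads, filtered out
]
pair = best_pair(toy)
print([c.allele for c in pair])  # ['A*01:01', 'A*02:01']
print(passes_qc({2: 95.0, 3: 70.0}, 100, "I"))  # False: exon 3 is under 80%
```

A real typing tool would also need a penalty for weakly supported second alleles, since this simple score never prefers a homozygous call when any other candidate contributes additional reads.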

Concordance and exact match

Concordance

Gene        Total QC passed    Mistyped    Concordance %
HLA-A       621                30.5        95.1
HLA-B       673                34.5        94.9
HLA-C       714                32.5        95.4
HLA-DRB1    787                76.5        90.3
HLA-DQB1    817                110.5       86.5
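The concordance percentages in the table can be reproduced from the other two columns. The short Python check below assumes that concordance % is computed as (total QC passed − mistyped) / total QC passed; the poster does not state the formula, nor does it explain the fractional mistyped counts, but this formula matches every reported value to one decimal place.

```python
# Values copied from the concordance table: gene -> (total QC passed, mistyped)
rows = {
    "HLA-A": (621, 30.5),
    "HLA-B": (673, 34.5),
    "HLA-C": (714, 32.5),
    "HLA-DRB1": (787, 76.5),
    "HLA-DQB1": (817, 110.5),
}


def concordance_percent(total, mistyped):
    """Assumed formula: share of QC-passed samples that were not mistyped."""
    return 100.0 * (total - mistyped) / total


for gene, (total, mistyped) in rows.items():
    print(f"{gene}: {concordance_percent(total, mistyped):.1f}")
# HLA-A: 95.1
# HLA-B: 94.9
# HLA-C: 95.4
# HLA-DRB1: 90.3
# HLA-DQB1: 86.5
```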

[Figure: Concordance and exact match versus coverage %. Two panels, MHC-I and MHC-II, each showing a concordance series and an exact-match series; x-axis: coverage %, y-axis: 0 to 100.]

Contact: 1 Omixon Biocomputing, Budapest, Hungary 2 Department of Neurology, University of California San Francisco, San Francisco, CA, USA.

See related results in our recent publication in PLOS ONE by scanning this code

Corresponding author: [email protected]
