Identifying de novo mutations with GEMINI

Aaron Quinlan, University of Utah, quinlanlab.org

Please refer to the following GitHub Gist to find each command for this session. Commands should be copied and pasted from this Gist: https://gist.github.com/arq5x/9e1928638397ba45da2e#file-denovo-sh

Automated tools for disease inheritance models


Common options for disease model tools.


Why search for de novo mutations?


Brian O’Roak

High impact variants


Brian O’Roak

De novo mutations


How many de novo mutations should we expect?


De novo mutations (rough expectations)


In practice, it’s not so simple.


Brian O’Roak


Why are there so many artifacts?

• Prior probabilities - the more interesting something is, the less likely it is to be real
• If something can go wrong, it will.
  • Incorrect genotype assignment
  • Low coverage in one or more of the individuals in the family (especially the parents…why?)
  • Mismapping
  • Misalignment
  • Paralogy
  • Systematic artifacts
  • Somatic events

Detective work with GEMINI


The de_novo tool in GEMINI

http://gemini.readthedocs.org/en/latest/content/tools.html#de-novo-identifying-potential-de-novo-mutations

Create a GEMINI database from a VCF

Notes:
1. The VCF has been normalized and decomposed with VT.
2. The VCF has been annotated with VEP.

http://gemini.readthedocs.org/en/latest/content/preprocessing.html#step-1-split-left-align-and-trim-variants

$ curl https://s3.amazonaws.com/gemini-tutorials/trio.trim.vep.vcf.gz > trio.trim.vep.vcf.gz
$ curl https://s3.amazonaws.com/gemini-tutorials/denovo.ped > denovo.ped
$ gemini load --cores 4 \
      -v trio.trim.vep.vcf.gz \
      -t VEP \
      --skip-gene-tables --skip-cadd --skip-gerp-bp \
      -p denovo.ped \
      trio.trim.vep.denovo.db

Note: copy and paste the full command from the GitHub Gist to avoid errors.

Run time: ~8 minutes
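The -p option attaches family relationships to the database, which the inheritance tools rely on. As a rough illustration of what such a file describes, the sketch below builds a hypothetical trio in the usual six-column PED layout and checks which samples have both parents present. The sample IDs are invented for this example and are not taken from denovo.ped.

```shell
# Hypothetical trio; IDs are made up for illustration, not taken from denovo.ped.
# Columns: family_id, sample_id, paternal_id, maternal_id,
#          sex (1=male, 2=female), phenotype (1=unaffected, 2=affected)
ped_example=$(printf '%s\n' \
  'fam1 dad 0 0 1 1' \
  'fam1 mom 0 0 2 1' \
  'fam1 kid dad mom 1 2')

# List every sample whose father and mother are both present in the file.
complete_trios=$(printf '%s\n' "$ped_example" | awk '
  { seen[$2] = 1; father[$2] = $3; mother[$2] = $4 }
  END {
    for (s in father)
      if ((father[s] in seen) && (mother[s] in seen)) print s
  }')

echo "children with both parents present: $complete_trios"
```

A check like this on your own PED file is a cheap way to catch a misspelled parent ID before spending several minutes on a load.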

Normalization and decomposition are required preprocessing steps

Details can be found in the GEMINI documentation:
http://gemini.readthedocs.org/en/latest/content/preprocessing.html#preprocessing-and-loading-a-vcf-file-into-gemini

Variant normalization
http://genome.sph.umich.edu/wiki/File:Normalization_mnp.png

Variant decomposition
http://genome.sph.umich.edu/wiki/Vt#Decompose
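To make decomposition concrete: a multiallelic site lists several ALT alleles on one line, and decomposing it yields one record per ALT allele. The toy function below (a hypothetical helper, not part of vt or GEMINI) does this for the first five VCF columns of a made-up record. Real tools also rewrite the INFO and genotype fields, which this sketch ignores, so it is not a substitute for them.

```shell
# Toy illustration only: split a multiallelic record into one line per ALT allele.
# Real tools such as vt also rewrite INFO and genotype fields; this sketch does not.
decompose_toy() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         n = split($5, alts, ",")
         for (i = 1; i <= n; i++) print $1, $2, $3, $4, alts[i]
       }'
}

# Made-up record with columns CHROM, POS, ID, REF, ALT
decomposed=$(printf 'chr1\t100\t.\tA\tG,T\n' | decompose_toy)
printf '%s\n' "$decomposed"
```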

Running the de_novo tool

$ gemini de_novo trio.trim.vep.denovo.db
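Conceptually, a candidate de novo mutation is a site where the child carries an allele that neither parent appears to have. The simplest such pattern is a heterozygous child with two homozygous-reference parents. The sketch below applies that pattern to a made-up genotype table; it illustrates the idea only and is not GEMINI's implementation.

```shell
# Made-up genotype table: site, father, mother, child
# (0/0 = homozygous reference, 0/1 = heterozygous, 1/1 = homozygous alternate)
genotypes=$(printf '%s\n' \
  'site1 0/0 0/0 0/1' \
  'site2 0/1 0/0 0/1' \
  'site3 0/0 0/0 0/0' \
  'site4 0/0 0/1 1/1')

# Keep sites where both parents are homozygous reference and the child is heterozygous.
denovo_sites=$(printf '%s\n' "$genotypes" |
  awk '$2 == "0/0" && $3 == "0/0" && $4 == "0/1" { print $1 }')

echo "$denovo_sites"
```

Only site1 fits the pattern here; at site2 the child's allele is explained by inheritance from the father.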

Information overload

There are currently 115 columns in the variants table.

Perhaps a bit of overkill for a typical analysis

http://gemini.readthedocs.org/en/latest/content/database_schema.html#the-variants-table

Limit the attributes returned with the --columns option.

$ gemini de_novo \
      --columns "chrom, start, end, ref, alt, \
                 filter, qual, gene, impact" \
      trio.trim.vep.denovo.db

http://gemini.readthedocs.org/en/latest/content/tools.html#common-args-common-arguments

Better, but there are still so many (likely false) candidates.

$ gemini de_novo \
      --columns "chrom, start, end, ref, alt, \
                 filter, qual, gene, impact" \
      trio.trim.vep.denovo.db | wc -l

771 candidates!
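One caveat when counting with wc -l: it counts every line of output, so if the output starts with a header row, the number of candidates is one less than the line count. The helper below (a hypothetical function name, shown on a made-up table) counts data rows only, assuming the first line is a header. Check that assumption against your own output before relying on it.

```shell
# Count data rows only, assuming the first line of the input is a header.
count_rows() {
  tail -n +2 | wc -l | tr -d ' '
}

# Made-up two-candidate table standing in for tool output.
n=$(printf 'chrom\tstart\tend\nchr1\t100\t101\nchr2\t200\t201\n' | count_rows)
echo "$n"
```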

Causes of erroneous genotype predictions: lack of depth


Let’s enforce a minimum sequence depth for each subject: -d

$ gemini de_novo \
      --columns "chrom, start, end, ref, alt, \
                 filter, qual, gene, impact" \
      -d 15 \
      trio.trim.vep.denovo.db | wc -l

676 candidates
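The reasoning behind a depth threshold is that a genotype call is only trustworthy when enough reads cover the site. A parent covered by only a few reads can look homozygous reference simply because the alternate allele was never sampled, which makes an inherited variant look de novo. The sketch below applies a threshold to every family member in a made-up depth table; it illustrates the idea and is not GEMINI's implementation.

```shell
min_depth=15

# Made-up per-sample read depths: site, father, mother, child
depths=$(printf '%s\n' \
  'site1 40 38 35' \
  'site2 9 30 31' \
  'site3 22 14 50' \
  'site4 15 15 15')

# Keep sites where every family member has at least min_depth reads.
well_covered=$(printf '%s\n' "$depths" |
  awk -v d="$min_depth" '$2 >= d && $3 >= d && $4 >= d { print $1 }' |
  tr '\n' ' ')

echo "$well_covered"
```

In this toy table, site2 and site3 are dropped because one parent falls below the threshold even though the child is well covered.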

Causes of erroneous genotype predictions: low quality variants


Require that the mutation passes GATK QC with --filter

$ gemini de_novo \
      --columns "chrom, start, end, ref, alt, \
                 filter, qual, gene, impact" \
      -d 15 \
      --filter "filter is NULL" \
      trio.trim.vep.denovo.db | wc -l

55 candidates

Require that the mutation is likely to have functional consequence

$ gemini de_novo \
      --columns "chrom, start, end, ref, alt, \
                 filter, qual, gene, impact" \
      -d 15 \
      --filter "filter is NULL and impact_severity != 'LOW'" \
      trio.trim.vep.denovo.db | wc -l

13 candidates
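The --filter expression combines conditions the way an SQL WHERE clause does. To make the logic concrete, the sketch below applies the same two conditions to a made-up table with awk. The rows, the FILTER string on site3, and the use of the literal text NULL for a missing FILTER are all invented for illustration.

```shell
# Made-up rows: site, filter, impact_severity
variants=$(printf '%s\n' \
  'site1 NULL HIGH' \
  'site2 NULL LOW' \
  'site3 LowQual HIGH' \
  'site4 NULL MED')

# Keep rows with no FILTER flag and an impact severity other than LOW.
kept=$(printf '%s\n' "$variants" |
  awk '$2 == "NULL" && $3 != "LOW" { print $1 }' |
  tr '\n' ' ')

echo "$kept"
```

Both conditions must hold, so site2 fails on severity and site3 fails on the QC flag.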

Require that the mutation is not likely to be a known polymorphism

$ gemini de_novo \
      --columns "chrom, start, end, ref, alt, \
                 filter, qual, gene, impact" \
      -d 15 \
      --filter "filter is NULL \
                and is_coding = 1 and impact_severity != 'LOW' \
                and (aaf_1kg_eur