Identifying dominant gene candidates with GEMINI
Aaron Quinlan
University of Utah
quinlanlab.org
Please refer to the following Github Gist to find each command for this session. Commands should be copied and pasted from this Gist: https://gist.github.com/arq5x/9e1928638397ba45da2e#file-autosomal-dominant-sh
Autosomal dominant pedigrees
Jessica Chong
Create a PED file (done for you)
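For reference, a PED file for this trio might look like the sketch below. The family ID, sample IDs, and affected status (1805 and 4805 affected, 1847 unaffected) follow the output shown later in this session. The sex codes and the choice of 4805 as the child of 1847 and 1805 are assumptions made for illustration, and the file name example.dominant.ped is hypothetical so that the downloaded dominant.ped is not overwritten.

```shell
# Hypothetical PED for the trio. Sex and parent columns are assumptions;
# only the sample IDs and affected status come from the session output.
cat > example.dominant.ped <<'EOF'
#family_id sample_id paternal_id maternal_id sex phenotype
family1 1805 0 0 2 2
family1 1847 0 0 1 1
family1 4805 1847 1805 1 2
EOF

# List affected samples (phenotype == 2), skipping the header line.
affected=$(awk '!/^#/ && $6 == 2 {print $2}' example.dominant.ped | paste -sd, -)
echo "affected samples: $affected"
```

In the standard PED convention, phenotype 2 marks an affected sample and 1 an unaffected one; a parent ID of 0 means the parent is not in the file.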
Jessica Chong
Create a GEMINI database from a VCF
Notes:
1. The VCF has been normalized and decomposed with VT.
2. The VCF has been annotated with VEP.
http://gemini.readthedocs.org/en/latest/content/preprocessing.html#step-1-split-left-align-and-trim-variants
$ curl https://s3.amazonaws.com/gemini-tutorials/trio.trim.vep.vcf.gz > trio.trim.vep.vcf.gz
$ curl https://s3.amazonaws.com/gemini-tutorials/dominant.ped > dominant.ped
$ gemini load --cores 4 \
    -v trio.trim.vep.vcf.gz \
    -t VEP \
    --skip-gene-tables \
    -p dominant.ped \
    trio.trim.vep.dominant.db
Note: copy and paste the full command from the Github Gist to avoid errors
Running the autosomal_dominant tool
$ gemini autosomal_dominant \
    --columns "chrom, start, end, ref, alt, gene, impact, cadd_raw" \
    trio.trim.vep.dominant.db \
    | head \
    | column -t
Note: copy and paste the full command from the Github Gist
chrom  start      end        ref  alt  gene        impact             cadd_raw  variant_id  family_id  family_members  family_genotypes  samples    family_count
chr15  59508889   59508890   T    C    AC092756.1  mature_miRNA       -0.78     8480        family1    1805,1847,4805  T/C,T/T,T/C       1805,4805  1
chr15  84868493   84868494   G    C    AC103965.1  downstream         -0.37     9360        family1    1805,1847,4805  G/C,G/G,G/C       1805,4805  1
chr15  75966884   75966885   G    A    AC105020.1  upstream           -0.74     8906        family1    1805,1847,4805  G/A,G/G,G/A       1805,4805  1
chr15  84945184   84945185   C    T    AC136704.1  upstream           0.01      9370        family1    1805,1847,4805  C/T,C/C,C/T       1805,4805  1
chr15  84945511   84945512   C    T    AC136704.1  upstream           0.74      9371        family1    1805,1847,4805  C/T,C/C,C/T       1805,4805  1
chr15  84949950   84949951   C    G    AC136704.1  mature_miRNA       -0.43     9378        family1    1805,1847,4805  C/G,C/C,C/G       1805,4805  1
chr15  89398552   89398553   C    A    ACAN        non_syn_coding     2.57      9519        family1    1805,1847,4805  C/A,C/C,C/A       1805,4805  1
chr15  100649247  100649248  G    A    ADAMTS17    synonymous_coding  1.51      9856        family1    1805,1847,4805  G/A,G/G,G/A       1805,4805  1
chr15  100692844  100692845  A    G    ADAMTS17    non_syn_coding     -1.16     9858        family1    1805,1847,4805  A/G,A/A,A/G       1805,4805  1
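The family_genotypes column shows the pattern behind these candidates: both affected samples (1805 and 4805) are heterozygous, while the unaffected sample (1847) is homozygous for the reference allele. A minimal sketch of that check in awk follows, using three rows copied from the output above. It is an illustration of the pattern only, not GEMINI's implementation, and the file name example.genotypes.txt is hypothetical.

```shell
# Columns: ref, alt, then genotypes in the order 1805,1847,4805
# (values copied from three rows of the output above).
cat > example.genotypes.txt <<'EOF'
T C T/C,T/T,T/C
G C G/C,G/G,G/C
C A C/A,C/C,C/A
EOF

# Affected samples (1st and 3rd genotype) must be heterozygous;
# the unaffected sample (2nd genotype) must be homozygous reference.
consistent=$(awk '
  {
    ref = $1; alt = $2
    split($3, gt, ",")
    het = ref "/" alt
    homref = ref "/" ref
    if (gt[1] == het && gt[2] == homref && gt[3] == het) n++
  }
  END { print n + 0 }
' example.genotypes.txt)
echo "rows consistent with dominant inheritance: $consistent"
```

The sketch compares genotype strings literally, so it assumes the reference allele is written first in a heterozygous call, as it is in the rows shown.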
Time to start filtering
$ gemini autosomal_dominant \
    --columns "chrom, start, end, ref, alt, gene, impact, cadd_raw" \
    trio.trim.vep.dominant.db \
    | wc -l
Note: copy and paste the full command from the Github Gist
1515 candidates
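If the output starts with a header row, as in the table shown earlier in this session, then wc -l counts that line along with the variants. The sketch below uses a small mock file (hypothetical name example.out.tsv, with values copied from the earlier output) to show how the header could be skipped before counting; whether this matters for a given run depends on whether a header is actually printed.

```shell
# Mock output: one header line plus two variant rows.
printf 'chrom\tstart\tend\n' > example.out.tsv
printf 'chr15\t59508889\t59508890\n' >> example.out.tsv
printf 'chr15\t84868493\t84868494\n' >> example.out.tsv

# wc -l counts every line, including the header.
total_lines=$(wc -l < example.out.tsv | tr -d ' ')

# tail -n +2 starts at the second line, so only variant rows are counted.
variant_rows=$(tail -n +2 example.out.tsv | wc -l | tr -d ' ')

echo "lines: $total_lines, variant rows: $variant_rows"
```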
Again, use the --filter option
Exclude variants that failed GATK filters
$ gemini autosomal_dominant \
    --columns "chrom, start, end, ref, alt, gene, impact, cadd_raw" \
    --filter "filter is NULL" \
    trio.trim.vep.dominant.db \
    | wc -l
Note: copy and paste the full command from the Github Gist
1288 candidates
Further restrict variants with functional consequence
$ gemini autosomal_dominant \
    --columns "chrom, start, end, ref, alt, gene, impact, cadd_raw" \
    --filter "filter is NULL and impact_severity != 'LOW'" \
    trio.trim.vep.dominant.db \
    | wc -l
Note: copy and paste the full command from the Github Gist
347 candidates
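To see how much each filter narrows the list, the three counts reported so far (1515, 1288, 347) can be expressed as a share of the unfiltered total. The step labels in this sketch are hypothetical names chosen for readability.

```shell
# Candidate counts copied from the three steps above.
summary=$(printf '%s\n' "no_filter 1515" "gatk_pass 1288" "not_low_impact 347" |
  awk 'NR == 1 { base = $2 }
       { printf "%s\t%d\t%.1f%%\n", $1, $2, 100 * $2 / base }')
echo "$summary"
```

With these counts, about 85% of candidates pass the GATK filter and about 23% remain once low-impact variants are excluded.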
Use ESP and ExAC to focus on rare variants
$ gemini autosomal_dominant \
    --columns "chrom, start, end, ref, alt, gene, impact, cadd_raw" \
    --filter "filter is NULL and impact_severity != 'LOW' and (aaf_esp_ea =20
Note: copy and paste the full command from the Github Gist
chrom  start    end      ref  alt  gene        impact       gts.1805  gts.1847  gts.4805  gt_depths.1805  gt_depths.1847  gt_depths.4805
chr2   229933   229934   A    G    SH3YL1      intron       A/G       A/A       A/G       161             225             231
chr2   277249   277250   G    A    ACP1        downstream   G/A       G/G       G/A       183             237             234
chr2   905441   905442   A    T    AC113607.2  upstream     A/T       A/A       A/T       90              142             246
chr2   1636903  1636904  C    T    PXDN        UTR_3_prime  C/T       C/C       C/T       54              87              174
chr2   1642789  1642790  T    C    PXDN        intron       T/C       T/T       T/C       74              120             179
chr2   3653841  3653842  C    A    COLEC11     intron       C/A       C/C       C/A       123             223             199
chr2   3660868  3660869  G    A    COLEC11     intron       G/A       G/G       G/A       76              96              250
chr2   3661001  3661002  G    A    COLEC11     intron       G/A       G/G       G/A       70              86              182
chr2   6879884  6879885  T    A    LINC00487   intron       T/A       T/T       T/A       83              110             106
We have the inheritance model, now apply the annotation filters
$ gemini query \
    -q "SELECT chrom, start, end, ref, alt, gene, impact, \
        (gts).(*), (gt_depths).(*) \
        FROM variants \
        WHERE filter is NULL and impact_severity == 'HIGH' and (aaf_esp_ea