Satellite DNA genomic structure and heterochromatin organization
Amanda Larracuente, Department of Biology

Satellite DNA
•  Chromosome segregation
•  Heterochromatin formation
•  Nuclear organization
(Image: http://www.chrombios.com)

Satellite DNA: misregulation
•  Genomic instability and cancer (Zhu et al. 2011, Ting et al. 2011, Shamas 2011)
•  Senescence and aging (e.g. Swanson et al. 2013)
•  Chromosome mis-segregation (Rosic et al. 2014)

Satellite DNA: rapid evolution
•  Rapid turnover between species
•  Genetic incompatibilities
(Ferree and Prasad 2011)

Why don’t we know more about satellite DNA?
•  Low recombination
•  Difficult genomics
   1. Underrepresented among Sanger reads
   2. Assembly difficult/impossible
•  Mutation dynamics not well understood

Outline
I.  Heterochromatin organization
II.  Detailed satDNA structure

Drosophila genomes
[Figure: phylogeny of D. melanogaster and the simulans clade, with divergence times of ~2 Mya and ~0.24 Mya (photo: A. Karwath)]
•  D. melanogaster: PacBio P5C3 reads, ~95X (Kim et al. 2014)
•  simulans clade: PacBio P6C4 reads, ~115X, ~120X and ~85X

simulans clade assemblies
I.   Canu + hybrid DBG2OLC assemblies, merged with quickmerge → Merged-1
II.  Merged-1 + MHAP assembly, merged with quickmerge → Merged-2
III. Polishing with Quiver (×2) and Pilon → Final
(MHAP: Berlin et al. 2015; Canu: Koren et al. 2016; Chakraborty et al. 2016 NAR; Mahul Chakraborty, Ching-Ho Chang)

Assembly contiguity
[Figure: cumulative contig length vs. contig/scaffold rank for Dmel, Dsim, Dsec and Dmau; data sources: GenBank contig, GenBank scaffold, PacBio contig, Illumina contig]
•  Dmel panel: NG50 = 21.3 Mb
•  Dsim NG50 = 22.9 Mb; Dsec NG50 = 21.2 Mb; Dmau NG50 = 22.3 Mb
(Mahul Chakraborty)
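The NG50 values above summarize contiguity: NG50 is the length of the shortest contig in the smallest set of longest contigs whose combined length reaches half of the expected genome size (N50 uses half of the assembly size instead). The Python sketch below computes it; the contig lengths and genome size in the example are toy values, not data from these assemblies.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly, or None if the contigs never reach half the genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    half = genome_size / 2
    running = 0
    # Walk contigs from longest to shortest, accumulating assembled length.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return None  # the assembly covers less than half of the expected genome


if __name__ == "__main__":
    toy_contigs = [30, 25, 20, 10, 5]  # toy lengths in Mb
    print(ng50(toy_contigs, genome_size=120))  # prints 20
```

Because the denominator is a fixed genome size rather than the assembly's own length, NG50 lets assemblies of the same species be compared even when they differ in total assembled sequence.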

Genome organization
Few structural rearrangements
[Figure: comparison of chromosome arms X, 2L, 2R, 3L, 3R and 4 between Dmau, Dsim and Dmel]
(Ching-Ho Chang, Emerson Khost)

Heterochromatin organization
Pericentric gene-containing regions rearranged
[Figure: comparison of chromosome arms X, 2L, 2R, 3L, 3R and 4 between Dmau, Dsim and Dmel]
(Ching-Ho Chang, Emerson Khost)

Heterochromatin organization
Cytogenetic maps: FISH
[Figure: FISH karyotypes (chromosomes X, Y, 2, 3, 4) of D. melanogaster, D. simulans, D. sechellia and D. mauritiana on a time-scaled phylogeny (Mya); probes include Rsp and AAGAG]
Complex satellite DNA families: Rsp, Rsp-like, 1.688
(Ching-Ho Chang, Emerson Khost)

Underrepresented heterochromatin (Dmau)
Sequenced males
[Figure: histograms of read coverage (count vs. coverage) by region: A, X, Y, U]
•  Expected: ~100X for autosomes, ~50X for X, ~50X for Y
•  Observed: ~100X for autosomes, ~50X for X, ~31X for Y, ~29X for U
*P < 10^-16, Mann-Whitney U test
(Ching-Ho Chang)
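The expected values follow from ploidy in males: autosomes are diploid, while X and Y are each present in one copy, so a single-copy region should show about half the autosomal depth. A minimal Python sketch of that comparison follows, assuming ~50X per haploid copy (inferred from the expected values above) and treating U as single-copy, which is an assumption rather than something the slide states. The 0.8 cutoff is likewise hypothetical.

```python
from statistics import median

# Assumed depth contributed by one haploid copy (~100X autosomes / 2 copies).
PER_COPY_DEPTH = 50.0

# Copies per male genome; treating "U" as single-copy is an illustrative assumption.
COPIES_IN_MALES = {"A": 2, "X": 1, "Y": 1, "U": 1}


def coverage_ratio(window_depths, region):
    """Observed median depth divided by the depth expected from ploidy."""
    expected = PER_COPY_DEPTH * COPIES_IN_MALES[region]
    return median(window_depths) / expected


def is_underrepresented(window_depths, region, min_ratio=0.8):
    """Flag a region whose depth falls below min_ratio of expectation (hypothetical cutoff)."""
    return coverage_ratio(window_depths, region) < min_ratio


if __name__ == "__main__":
    # Toy per-window depths chosen to resemble the observed values on the slide.
    observed = {"A": [98, 100, 102], "X": [49, 50, 51], "Y": [30, 31, 32], "U": [28, 29, 30]}
    for region, depths in observed.items():
        print(region, round(coverage_ratio(depths, region), 2), is_underrepresented(depths, region))
```

The slide's significance value comes from a Mann-Whitney U test; in Python that test is available as `scipy.stats.mannwhitneyu` when SciPy is installed.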

Outline
I.  Heterochromatin organization
    –  Genes in pericentric regions reorganized
    –  satDNA reorganization
    –  Biased heterochromatic read recovery
II.  Detailed satDNA structure


Responder (Rsp) satellite
D. melanogaster (Larracuente 2014)
•  Dimeric structure: Left and Right monomers, 120 bp each (Wu et al. 1988)

Dynamic evolution of Rsp
[Figure: Rsp chromosomal locations across species: Ch 2; Ch 2 and Ch 3; Ch X; lost?]
(Larracuente 2014)

Rapid evolution of satellite DNA
•  Natural selection?
•  Neutral? Unequal crossing-over and gene conversion (expansion and contraction)
(Smith 1976; Dover 1982; Charlesworth et al. 1994; Stephan 1986)
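The neutral hypothesis rests on unequal crossing-over: when two repeat arrays pair out of register, one recombinant product gains units (expansion) and the other loses them (contraction), and repeated rounds also homogenize the array. The Python sketch below is a toy sister-chromatid model in the spirit of that idea, with no selection, gene conversion or point mutation; the function names and parameters are hypothetical.

```python
import random


def unequal_exchange(a, b, rng):
    """Cross arrays a and b at independently chosen breakpoints.

    Products are a[:i] + b[j:] and b[:j] + a[i:]; their total length is
    len(a) + len(b), but if i != j one product expands and the other contracts.
    """
    i = rng.randrange(1, len(a))  # breakpoint in a
    j = rng.randrange(1, len(b))  # breakpoint in b, possibly out of register
    return a[:i] + b[j:], b[:j] + a[i:]


def simulate(n_units=100, rounds=1000, seed=1):
    """Repeated unequal sister-chromatid exchange; one product is kept each round."""
    rng = random.Random(seed)
    array = [f"u{k}" for k in range(n_units)]  # every unit starts as a distinct label
    for _ in range(rounds):
        if len(array) < 2:
            break
        left, right = unequal_exchange(array, list(array), rng)
        array = rng.choice([left, right])
    return array


if __name__ == "__main__":
    final = simulate()
    # Length drifts; the number of distinct labels can only stay the same or fall,
    # because products are built solely from units already in the array.
    print(len(final), len(set(final)))
```

With identical sister copies, a breakpoint pair with i < j deletes units i to j-1 from the first product, while i > j duplicates units j to i-1, which is the expansion/contraction pair drawn on the slide.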

Satellite DNA assembly
D. melanogaster, ~95X PacBio (Kim et al. 2014)
•  PBcR-BLASR (Phillippy, Koren): Cel8.1
•  MHAP (Berlin et al. 2015): Cel8.2, Cel8.3
•  Canu (Koren et al. 2016)
•  Run over a grid of parameter values
•  Quiver + Pilon
(Photo: A. Karwath)
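Running an assembler over a grid of parameter values amounts to enumerating every combination and naming each run so the resulting assemblies can be compared side by side. The Python sketch below only builds the run list and picks the best-scoring run; the three parameter axes and their values are hypothetical placeholders, and it does not call MHAP or any other real tool.

```python
from itertools import product

# Hypothetical parameter axes; names and values are placeholders, not the settings used here.
GRID = {
    "kmer_size": [14, 16],
    "length_cutoff": [1000, 1500],
    "coverage_subset": ["20X", "30X"],
}


def grid_runs(grid, prefix="MHAP"):
    """Yield (run_name, params) for every combination of grid values."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        run_name = "_".join([prefix] + [str(v) for v in values])
        yield run_name, params


def best_run(scores):
    """Pick the run with the highest score from a {run_name: score} mapping."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for name, params in grid_runs(GRID):
        print(name, params)
```

The naming scheme mimics run labels such as MHAP_16_1500_20X, but what each field of that label means is an assumption here. Each resulting assembly would then be polished (Quiver + Pilon) before scoring.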

Comparing assemblies

Assembly name        # Rsp   # Rsp contigs   Rsp score
R6.03                  343               9        38.1
PBcR BLASR            1088               3       362.7
MHAP_16_1500_20X      1260               4       315.0
Canu 4%               1114               3       371

(Khost, Eickbush & Larracuente BioRxiv 2016)

Assembly validation
•  Read coverage
•  Long PCR
•  Genomic Southern blots
•  Pulsed-field gels
(Khost, Eickbush & Larracuente BioRxiv 2016)


Detailed organization of Rsp satellite
Concerted evolution: unequal exchange and gene conversion
[Figure: counts of annotated repeats vs. position (kb, 0 to 300) across the Rsp locus, including an (AAGAG)n block; classes: Bari1, Helitron, Jockey, LTR Retrotransposon, Mariner.Tc1, Non-LTR Retrotransposon, Rsp Left, Rsp Right, Rsp Trunc, Rsp Variant, Simple repeat, Transib]
(Khost, Eickbush & Larracuente BioRxiv 2016)
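The locus plots count annotated repeats along genomic position. One simple way to build such counts, sketched below in Python, is to bin each annotation's start coordinate into fixed windows; the 10-kb window and the (start, class) input format are assumptions, not the published method.

```python
from collections import Counter, defaultdict

BIN_KB = 10  # hypothetical window size


def bin_annotations(annotations, bin_kb=BIN_KB):
    """Count repeat annotations per window and repeat class.

    annotations: iterable of (start_bp, repeat_class) pairs (assumed format).
    Returns {window_index: Counter({repeat_class: count})}, where window k
    covers [k * bin_kb, (k + 1) * bin_kb) kb.
    """
    width_bp = bin_kb * 1000
    counts = defaultdict(Counter)
    for start_bp, repeat_class in annotations:
        counts[start_bp // width_bp][repeat_class] += 1
    return dict(counts)


if __name__ == "__main__":
    toy = [(500, "Rsp Left"), (1500, "Rsp Right"), (12000, "Rsp Left"), (12500, "Jockey")]
    for window, classes in sorted(bin_annotations(toy).items()):
        print(f"{window * BIN_KB}-{(window + 1) * BIN_KB} kb", dict(classes))
```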


Summary
–  Heterochromatin organization
–  Genomics of satellite structure
–  Detailed evolutionary history of satellites

Evolutionary dynamics within populations
Sequence diversity and abundance variation across populations
[Figure: copy number (count) by population for Rsp and the 260-bp satellite; populations: Beijing, Ithaca, Netherlands, Raleigh, Tasmania, Zimbabwe]
Data: DGRP & GDL Illumina genomes from Mackay et al. 2012, Grenier et al. 2015
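Satellite copy number can be estimated from Illumina data by converting satellite-derived reads into bases and dividing by the expected depth of one monomer copy. The Python sketch below shows that arithmetic under simplifying assumptions (reads fully within the satellite, depth measured over single-copy sequence, inbred lines); it is not necessarily how the counts in the figure were obtained, and it inherits the biased read recovery from heterochromatin noted in the outline.

```python
def estimate_copy_number(n_satellite_reads, read_length, monomer_length, single_copy_depth):
    """Approximate repeat copies per haploid genome (hypothetical helper).

    n_satellite_reads: reads assigned to the satellite
    read_length: bases per read
    monomer_length: length of one repeat unit, in bp
    single_copy_depth: mean read depth over single-copy sequence
    """
    if monomer_length <= 0 or single_copy_depth <= 0:
        raise ValueError("monomer_length and single_copy_depth must be positive")
    satellite_bases = n_satellite_reads * read_length
    # Each monomer copy is expected to be covered single_copy_depth times.
    return satellite_bases / (monomer_length * single_copy_depth)


if __name__ == "__main__":
    # Toy numbers: 24,000 reads of 100 bp, a 120-bp monomer, 20X single-copy depth.
    print(estimate_copy_number(24_000, 100, 120, 20))  # prints 1000.0
```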

Summary
–  Heterochromatin organization
–  Genomics of satellite structure
–  Detailed evolutionary history of satellites
–  Functional genomics


Satellite expression and regulation: Rsp satellite
[Figure: ovary small RNAs mapping to Rsp, + strand and - strand]
Data from: Pane et al. 2011
(Emerson Khost)

Summary
–  Heterochromatin organization
–  Genomics of satellite structure
–  Detailed evolutionary history of satellites
–  Functional genomics
–  Experimental manipulation


Acknowledgments
sim clade genomes: Ching-Ho Chang (U. Rochester), Mahul Chakraborty (UC Irvine), J.J. Emerson (UC Irvine), Kristi Montooth (U Nebraska), Colin Meiklejohn (U Nebraska)
Complex satellites: Emerson Khost (U. Rochester), Danna Eickbush (U. Rochester)
UR Center for Integrated Research Computing
Jeffrey Vedenayagam (NYU)
Funding

Join our group! Positions open in evolutionary genomics
•  NIH-funded postdoc
•  Graduate students
http://blogs.rochester.edu/Larracuente

260-bp satellite organization
[Figure: counts of annotated repeats vs. position (Kb, 600 to 900); classes: 1.688 family satellite, Helitron, Jockey, LTR Retrotransposon, Loa, Mariner.Tc1, Non-LTR Retrotransposon, Simple repeat]
(Khost, Eickbush & Larracuente BioRxiv 2016)

Y chromosome: ~40 Mb
•  D. melanogaster reference: ~4 Mb
•  D. simulans: ~30 Mb
(Ching-Ho Chang)

Rapid evolution of satellite DNA: Neutral?
•  Intrinsic mutational properties
•  Unequal crossing-over and gene conversion (expansion and contraction)
(Smith 1976; Dover 1982; Charlesworth et al. 1994; Stephan 1986)

Rapid evolution of satellite DNA: Selection?
•  Intragenomic conflict over germline transmission
•  Target of male drive: 95% transmission (Sandler et al. 1959)
•  Female meiotic drive (centromere drive): >50% vs. <50% transmission
(Walker 1971; Henikoff et al. 2001; Malik and Henikoff 2001)
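Drive is measured as a departure of the transmission ratio from the Mendelian 50%. A minimal Python sketch of an exact two-sided binomial test for that departure follows; the progeny counts in the example are hypothetical, chosen only to echo the 95% figure on the slide.

```python
from math import exp, lgamma, log


def binom_pmf(i, n, p):
    """Binomial probability of i successes in n trials, computed in log space (0 < p < 1)."""
    log_coeff = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return exp(log_coeff + i * log(p) + (n - i) * log(1 - p))


def transmission_test(k, n, p=0.5):
    """Return (transmission ratio k/n, exact two-sided binomial P value against p)."""
    if not 0 <= k <= n or n == 0:
        raise ValueError("need 0 <= k <= n and n > 0")
    probs = [binom_pmf(i, n, p) for i in range(n + 1)]
    observed = probs[k]
    # Two-sided P: total probability of outcomes no more likely than the observed one
    # (a small relative tolerance treats floating-point near-ties as ties).
    p_value = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return k / n, min(1.0, p_value)


if __name__ == "__main__":
    ratio, p_value = transmission_test(95, 100)  # hypothetical: 95 of 100 progeny carry the driver
    print(ratio, p_value)
```

Working in log space avoids overflow from large binomial coefficients; SciPy users can cross-check with `scipy.stats.binomtest`.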

What can we learn from this assembly?
•  Variation
•  Recombination: sequence diversity and abundance
What contributes to differences between individuals?
[Figure: copy number across DGRP (Raleigh) lines]


Unequal exchange in array center
[Figure: repeat counts vs. position (kb, 0 to 300) across the Rsp locus, with expansion and contraction of the array center]

Detailed organization of Rsp satellite
[Figure: two panels of repeat counts vs. position (kb) across the Rsp locus; classes: Bari1, Helitron, Jockey, LTR Retrotransposon, Mariner.Tc1, Non-LTR Retrotransposon, Rsp Left, Rsp Right, Rsp Trunc, Rsp Variant, Simple repeat, Transib]
(Khost, Eickbush & Larracuente BioRxiv 2016)

D. melanogaster complex satellites: Rsp, 1.688
(Emerson Khost)

Unusual locus composition in D. melanogaster
•  D. sechellia: Psec = 0.15
•  D. simulans: Psim = 0.18
•  D. mauritiana: Pmau = 0.45
(Larracuente 2014; Anthony Geneva; methods: Blomberg et al. 2003)

Rsp abundance variation in D. melanogaster
[Figure: Rsp copy number across DGRP lines Ral_208, Ral_380, Ral_379, Ral_391, Ral_362, Ral_313, Ral_350, Ral_427, Ral_399, Ral_358, Ral_40, Ral_437, Ral_375, Ral_357; R = 0.93]
Genome data from Mackay et al. 2012

Copy number variation in global populations
[Figure: copy number (count) by population for Rsp and the 260-bp satellite; populations: Beijing, Ithaca, Netherlands, Raleigh, Tasmania, Zimbabwe]

Rsp satellite activity: ovary small RNAs
[Figure: + strand and - strand small RNAs]
Data from: Pane et al. 2011
(Emerson Khost)

Y chromosome (D. melanogaster)
•  ~40 Mb
•  Heterochromatic
•  ~20 genes

Concerted evolution
[Figure: Rsp locus (relative locus position 0 to 170 kb; Left, Right and Variant/Trunc repeats) and 260-bp locus (relative locus position 0 to 75 kb)]