Multimax

Building Better Trees

Victor Hanson-Smith 2,3 and Joe Thornton

1,2

1 Dept. of Computer and Information Science, Univ. of Oregon
2 Center for Ecology and Evolutionary Biology, Univ. of Oregon
3 Howard Hughes Medical Institute

Modern evolutionary biology relies on building trees (phylogenies) from molecular sequence data using models of molecular evolution. These models include many free parameters that must be optimized. Virtually everyone optimizes their phylogeny using Unimax, but here we show that Multimax is more accurate, although it is also more computationally demanding.

[Figure: likelihood landscapes L(x) and L(x,y); horizontal axis x.]

Multimax moves multidimensionally across the parameter landscape and maximizes all parameters simultaneously. We implemented Multimax using a conjugate-gradient method [2].
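To make the idea of moving across all parameters at once concrete, here is a minimal Python sketch. It is not the authors' implementation. The two-parameter log-likelihood surface, the function names (`log_like`, `gradient`, `line_minimize`, `conjugate_gradient`), and the choice of a Polak-Ribiere update with a bisection line search are all assumptions made for illustration.

```python
import math

def log_like(p):
    """Toy log-likelihood (assumed): two correlated parameters, maximum at (1, 2)."""
    u, v = p[0] - 1.0, p[1] - 2.0
    return -(u * u + v * v + 1.8 * u * v)

def gradient(p):
    """Gradient of the NEGATIVE log-likelihood, so the search can minimize."""
    u, v = p[0] - 1.0, p[1] - 2.0
    return [2.0 * u + 1.8 * v, 2.0 * v + 1.8 * u]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def line_minimize(grad, p, d):
    """Step length t >= 0 at which the slope along direction d reaches zero."""
    def slope(t):
        return dot(grad([x + t * dx for x, dx in zip(p, d)]), d)
    lo, hi = 0.0, 1.0
    while slope(hi) < 0.0 and hi < 1e6:   # expand until past the minimum
        lo, hi = hi, 2.0 * hi
    for _ in range(60):                   # bisection on the slope
        mid = 0.5 * (lo + hi)
        if slope(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def conjugate_gradient(grad, start, tol=1e-8, max_iter=100):
    """Nonlinear conjugate gradient (Polak-Ribiere with restarts)."""
    p = list(start)
    g = grad(p)
    d = [-gi for gi in g]
    for _ in range(max_iter):
        if math.sqrt(dot(g, g)) < tol:
            break
        t = line_minimize(grad, p, d)
        p = [x + t * dx for x, dx in zip(p, d)]   # all parameters move together
        g_new = grad(p)
        diff = [a - b for a, b in zip(g_new, g)]
        beta = max(0.0, dot(g_new, diff) / dot(g, g))
        d = [-gn + beta * dx for gn, dx in zip(g_new, d)]
        if dot(d, g_new) >= 0.0:                  # not downhill: restart
            d = [-gn for gn in g_new]
        g = g_new
    return p

best = conjugate_gradient(gradient, [0.0, 0.0])
print(best, log_like(best))
```

Each step updates both coordinates along a single search direction, which is the property the text attributes to Multimax. A real phylogenetic likelihood has many more parameters and no closed-form gradient of this kind.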

Download our software: http://markov.uoregon.edu/software/phyesta



[Figure: Unimax and Multimax phylogenies of steroid hormone receptors; tip labels and branch support values are not reproduced here. Labeled clades include mineralocorticoid receptors, androgen receptors, progesterone receptors, estrogen ‘b’ receptors, and estrogen-related receptors.]

Unimax lnL = -33228

Multimax lnL = -33225

[Plot: log(L) versus seconds.]

Steroid hormone receptor neighbor topology estimation, after one round of branch length optimization.

Branch support: approximate likelihood ratio test chi-squared values.

[Figure: steroid hormone receptor phylogeny; labeled clades include estrogen ‘a’ receptors and glucocorticoid receptors.]

Phylogeny of 238 steroid hormone receptors. The alignment contains 359 amino acid sites; the best-fit model is JTT + 8 gamma categories.

Multimax: Multidimensional Maximization


Multimax changes our evolutionary interpretation of empirical data.

[Figure: Unimax and Multimax trees for the steroid hormone receptor alignment; tip labels and branch support values omitted.]

Multimax found a tree that is 3 log-likelihood units better than Unimax’s best tree, and different at nine branch placements.

Unimax: best lnL = -260005, mean lnL = -260017

Multimax: best swap lnL = -259902, mean lnL = -259912

[Figure: estimations of neighbor lnLs; the values 59, 88, and 7 appear in the figure without recoverable labels.]

Given a proposed tree (T), Unimax optimizes the parameters (b, Q, π) sequentially, one at a time [3]. Unimax assumes that the parameters are separable and unrelated, but this assumption is probably incorrect; it is made merely for computational convenience.
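The one-at-a-time strategy can be sketched as coordinate-wise maximization. The Python below is an illustration only: the two-parameter surface, the names `log_like`, `golden_max`, and `unimax_sketch`, and the use of golden-section search for each one-dimensional step are assumptions, not details taken from the poster or from reference [3].

```python
import math

def log_like(x, y):
    """Toy log-likelihood (assumed): correlated parameters, maximum at (1, 2)."""
    u, v = x - 1.0, y - 2.0
    return -(u * u + v * v + 1.8 * u * v)

def golden_max(f, lo, hi, tol=1e-10):
    """Maximize a unimodal 1-D function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d      # the maximum lies in [a, d]
        else:
            a = c      # the maximum lies in [c, b]
    return 0.5 * (a + b)

def unimax_sketch(start, bounds=(-10.0, 10.0), sweeps=5):
    """Optimize x with y held fixed, then y with x held fixed, and repeat."""
    x, y = start
    history = [log_like(x, y)]
    for _ in range(sweeps):
        x = golden_max(lambda t: log_like(t, y), *bounds)
        y = golden_max(lambda t: log_like(x, t), *bounds)
        history.append(log_like(x, y))
    return (x, y), history

(x, y), history = unimax_sketch((0.0, 0.0))
print(x, y)
print(history)
```

On this assumed surface the log-likelihood improves with every sweep, but progress per sweep is limited because the two parameters are correlated: fixing one parameter at a suboptimal value constrains where the other can go.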



Multimax and Unimax infer different branch lengths for the same tree, and this can affect our estimate of the best topology rearrangement. Although Unimax and Multimax start in the same space, they can lead us to different topologies.


Unimax: Unidimensional Maximization



[Plot axis: starting distance from true tree.]

an alignment of molecular sequences (d)

The goal of ML phylogenetic inference is to find values for T, b, Q, and π that maximize the function L.

a proposed topology (T) with branch lengths (b)

[Plot: Unimax tree error minus Multimax tree error; panels: pinnate trees, balanced trees.]

$$\pi = \{\pi_A, \pi_C, \pi_G, \pi_T\}$$

$$L(T, b, Q, \pi \mid d) = \prod_i P(d_i \mid T, b, Q, \pi)$$

Why?

$$Q = \begin{pmatrix} - & a & b & d \\ a & - & c & e \\ b & c & - & f \\ d & e & f & - \end{pmatrix}$$

(rows and columns ordered A, C, G, T)
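As an illustration of how the six symmetric rates a to f and the equilibrium frequencies π could combine into a rate matrix, the sketch below assumes the usual general time-reversible (GTR) construction, in which the rate from state i to state j is the symmetric rate multiplied by the frequency of j. The poster does not spell out this construction, so treat it, and the name `build_q`, as assumptions.

```python
def build_q(rates, pi):
    """Build a 4x4 rate matrix over A, C, G, T (assumed GTR-style construction).

    rates: dict with keys 'a'..'f', laid out as in the symmetric matrix above.
    pi:    dict of equilibrium frequencies keyed by nucleotide.
    """
    order = "ACGT"
    sym = {("A", "C"): "a", ("A", "G"): "b", ("A", "T"): "d",
           ("C", "G"): "c", ("C", "T"): "e", ("G", "T"): "f"}
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(order):
        for j, y in enumerate(order):
            if i != j:
                key = sym.get((x, y)) or sym[(y, x)]
                q[i][j] = rates[key] * pi[y]
        q[i][i] = -sum(q[i])   # diagonal makes each row sum to zero
    return q

pi = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}
rates = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0, "e": 5.0, "f": 6.0}
for row in build_q(rates, pi):
    print(row)
```

Under this construction every row sums to zero and the matrix satisfies detailed balance with respect to π, which is what makes the model time-reversible.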

I generated trees of different shapes and sizes, and simulated sequences (both nucleotides and amino acids) evolving on those trees to create several unique sequence alignments. For each replicate, I used Unimax and Multimax to infer an ML phylogeny.
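A simulation of this kind can be sketched in a few lines of Python. The sketch assumes the Jukes-Cantor (JC69) nucleotide model and a nested-tuple tree format; neither is stated on the poster, and the names `evolve` and `simulate` are hypothetical.

```python
import math
import random

def evolve(seq, t, rng):
    """Evolve a nucleotide sequence along a branch of length t under JC69."""
    p_change = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_change:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)

def simulate(tree, n_sites, seed=0):
    """tree: (name, branch_length, [children]); returns {tip name: sequence}."""
    rng = random.Random(seed)
    root_seq = "".join(rng.choice("ACGT") for _ in range(n_sites))
    alignment = {}

    def walk(node, parent_seq):
        name, length, children = node
        seq = evolve(parent_seq, length, rng)
        if children:
            for child in children:
                walk(child, seq)
        else:
            alignment[name] = seq   # only tips enter the alignment

    walk(tree, root_seq)
    return alignment

# A balanced four-taxon tree with assumed branch lengths.
balanced = ("root", 0.0, [
    ("AB", 0.1, [("A", 0.1, []), ("B", 0.1, [])]),
    ("CD", 0.1, [("C", 0.1, []), ("D", 0.1, [])]),
])
for name, seq in simulate(balanced, 40, seed=1).items():
    print(name, seq)
```

Because the generating tree is known, the tree inferred from each simulated alignment can be compared against it, which is what allows accuracy to be measured per replicate.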

[Legend: tree size (n taxa): 16, 32, 64, 128, 256, 512. Each point is one replicate dataset.]

We start with an alignment of related molecular sequences (d). The likelihood (L) of a proposed phylogenetic tree (T) with branch lengths (b), substitution rates (Q), and equilibrium frequencies (π) equals the product of the probability of observing each sequence site di, given T, b, Q, and π (see reference [1]).
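The product over sites can be illustrated with the smallest possible case. The sketch below assumes a two-sequence "tree" (a single branch of length t), the Jukes-Cantor (JC69) model in place of a general Q, and uniform frequencies π = 1/4; these simplifications and the function names are assumptions for illustration, not the poster's setup. It sums log site likelihoods, which is equivalent to taking the log of the product.

```python
import math

def jc69_prob(a, b, t):
    """P(b at one end | a at the other) across a branch of length t under JC69."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e

def log_likelihood(seq1, seq2, t):
    """log L = sum over sites i of log P(d_i | t), with pi = 1/4 for every base."""
    total = 0.0
    for a, b in zip(seq1, seq2):
        site_like = 0.25 * jc69_prob(a, b, t)   # pi_a * P(a -> b | t)
        total += math.log(site_like)
    return total

seq1 = "ACGTACGTAC"
seq2 = "ACGTACGAAC"   # differs from seq1 at one of ten sites
for t in (0.05, 0.1, 0.2):
    print(t, log_likelihood(seq1, seq2, t))
```

In this two-sequence case the branch length that maximizes the likelihood has a closed form, t = -(3/4) ln(1 - 4p/3), where p is the fraction of differing sites. With larger trees and richer models no such closed form exists, which is why numerical optimization is needed.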

Under simulated conditions, Multimax finds more accurate trees than Unimax, especially for large trees.

Maximum Likelihood (ML) Phylogenetic Inference

Further Reading

1. Felsenstein, Journal of Molecular Evolution, 1981. “Evolutionary Trees from DNA Sequences: A Maximum Likelihood Approach.”

2. Nocedal, Mathematics of Computation, 1980. “Updating Quasi-Newton Matrices with Limited Storage.”

3. Guindon et al., Systematic Biology, 2010. “New Algorithms and Methods to Estimate ML Phylogenies.”