Evolutionary diversification of the multimeric states of proteins

PNAS PLUS

Michael Lynch¹
Department of Biology, Indiana University, Bloomington, IN 47405

Contributed by Michael Lynch, June 12, 2013 (sent for review May 3, 2013)

EVOLUTION; CHEMISTRY

One of the most striking features of proteins is their common assembly into multimeric structures, usually homomers with even numbers of subunits all derived from the same genetic locus. However, although substantial structural variation for orthologous proteins exists within and among major phylogenetic lineages, in striking contrast to patterns of gene structure and genome organization, there appears to be no correlation between the level of protein structural complexity and organismal complexity. In addition, there is no evidence that protein architectural differences are driven by lineage-specific differences in selective pressures. Here, it is suggested that variation in the multimeric states of proteins can readily arise from stochastic transitions resulting from the joint processes of mutation and random genetic drift, even in the face of constant directional selection for one particular protein architecture across all lineages. Under the proposed hypothesis, on a long evolutionary timescale, the numbers of transitions from monomers to dimers should approximate the numbers in the opposite direction and similarly for transitions between higher-order structures.

Keywords: complex adaptation | oligomer | quaternary structure

Significance

Rather than operating as single units, most proteins assemble as multimers, usually with all subunits derived from the same gene. In contrast to patterns of gene structure and genome organization, which typically exhibit substantial increases in complexity from unicellular to multicellular organisms, the structural complexity of orthologous proteins appears roughly constant over the tree of life. The interfaces of multimers also often shift dramatically over evolutionary time. To explain these observations, a model is presented for the stochastic origin of variation in the multimeric states of proteins via the joint processes of mutation, random genetic drift, and constant directional selection.

Author contributions: M.L. designed research, performed research, analyzed data, and wrote the paper.

The author declares no conflict of interest.

¹E-mail: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1310980110/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1310980110

PNAS | Published online July 8, 2013 | E2821–E2828

Given the cellular basis of life, a fully synthetic theory of evolution will ultimately require an understanding of the population-genetic mechanisms influencing heritable changes in cell features. As much of cellular infrastructure is composed of proteins, such an enterprise should arguably begin with the evolutionary dynamics of protein structure. Remarkably, however, one of the most striking features of proteins across the tree of life is almost completely unexplored from an evolutionary perspective. Only a minority of proteins function as isolated units. Instead, most exist as symmetrical higher-order complexes composed of subunits encoded by the same locus. Depending on the numbers of subunits, such complexes are referred to as homodimers, homotrimers, homotetramers, etc. Understanding the emergence of such liaisons is a central issue in the nascent field of evolutionary cell biology.

Most attempts to explain the existence of multimers have started with the implicit assumption that their origin and retention are a consequence of adaptive evolution (1–5), and the resultant proposed advantages are not in short supply. First, it is generally easier to fold multiple small proteins than a single long one, although this does not explain the very large fraction of multimers retaining active sites within each subunit (as opposed to active sites being products of subunit interfaces). Second, the encounter rate of an enzyme and a small substrate is proportional to the effective radius of the enzyme (6, 7), and provided the catalytic site remains exposed, the elimination of extraneous protein surface may further enhance the frequency of productive encounters. Third, a smaller surface-area to volume ratio may reduce a protein’s vulnerability to denaturation or engagement in promiscuous interactions. Fourth, higher-order structures may reduce the sensitivity of catalytic sites to internal motions, thereby increasing substrate specificity; and oligomerization may protect otherwise unstable proteins from aggregation (8). Fifth, complexation offers increased opportunities for allosteric regulation of protein activity.

Although the large pool of oligomeric structures in today’s organisms cannot possibly be strongly maladaptive, this need not imply that they have arisen by or are currently maintained by adaptive processes. Indeed, despite the plausibility of many of the above hypotheses, empirical evidence for the adaptive value of alternative multimeric structures is essentially completely lacking, and a number of examples can be pointed to in which a more complex structure seemingly operates no more efficiently in its lineage than a simpler structure in others (9, 10). Moreover, the transition to an oligomerized state imposes clear challenges. First, to achieve a critical concentration of an active multimer, the expression of monomeric subunits must be raised to a high enough level to ensure an adequate number of encounters for successful complex assembly. This increase in subunit production will entail an energetic cost. Second, unless a newly emerging dimer has a symmetric interface and the correct subunit orientation, concatenations into indefinite filaments can arise. Numerous human disorders are known to result from inappropriate protein aggregation (11), and highly expressed proteins are especially vulnerable to promiscuous interactions (12, 13). Third, in diploid species, some dimerizing mutations may have deleterious effects in heterozygous individuals as a consequence of heterotypic complexation, a process that can greatly reduce the probability of establishment of the dimerizing allele (10).

Results

A Case for Stochastic Transitions Among Oligomeric States. Approximately two-thirds of the proteins for which well-curated structural data exist are known to assemble as homomers (4, 14). Moreover, although imbalances across protein families and/or organisms presumably cause biases in the current databases, the distributions of multimeric complexity (number of subunits) are quite similar across phylogenetic lineages (10, 15). Roughly two-thirds of multimers are dimers, ∼15% are tetramers, and the remaining classes exhibit progressively lower frequencies. Oddmers are noticeably underrepresented, with, for example, trimers having less than half the frequency of tetramers.

A less biased picture of the diversity of multimeric structures can be achieved by focusing on specific sets of proteins with functions conserved across the tree of life. The enzymes involved in glycolysis and the citric-acid cycle comprise two such groups for which substantial structural data exist. For these pathways, nearly every protein that has been studied in more than a few major lineages exhibits two or more oligomeric states, with variation often existing both within and among major lineages (Fig. 1). Multimers for these proteins are almost invariably homomeric rather than heteromeric (with subunits encoded by different loci), although there are certainly plenty of the latter for other loci in all domains of life (10). Moreover, in accordance with the broader analysis noted above, there is no association between the level of multimeric complexity and organismal complexity. For any given protein, the most complex structures are just as likely to be harbored by a prokaryote as by a metazoan or land plant. This lack of phylogenetic pattern in protein architectural features is striking when one considers the substantial differences in the population-genetic environments (rates of mutation, recombination, and random genetic drift) of these major classes of organisms and their likely roles in generating major differences in gene structure and genomic architecture (16). These kinds of observations motivate the question of whether the multimeric states of proteins are driven by idiosyncratic processes unique to individual lineages, perhaps even being free to drift in an effectively neutral fashion.

Further evidence in support of such stochastic turnover derives from numerous examples in which different taxa deploy the same numbers of subunits for a specific protein and yet use completely nonoverlapping interfaces for subunit binding (19–23). Approximately two-thirds of protein families containing homomers exhibit phylogenetic variation in the use of binding interfaces, with ∼4% using five or more different interfaces (24). Multimeric proteins with 10–40% amino acid sequence divergence between taxa use different interfaces ∼10% of the time, whereas those with >50% divergence are equally likely to use the same or different interfaces. It is difficult to explain such patterns unless taxa lose their multimeric structures (reverting to pure monomers) and then regain them in different manners. The few phylogenetic analyses of quaternary

[Fig. 1 graphic not reproduced. Columns: Eubacteria, Archaea, Uni. Euks., Land plants, Metazoans. Rows, Glycolysis: hexokinase, glucose 6-phosphate isomerase, phosphofructokinase, fructose bisphosphate aldolase, triosephosphate isomerase, glyceraldehyde phosphate dehydrogenase, phosphoglycerate kinase, phosphoglucomutase, enolase, pyruvate kinase. Rows, Citric-acid cycle: citrate synthase, isocitrate dehydrogenase, fumarase, malate dehydrogenase. Symbols: monomer, dimer, trimer, tetramer, hexamer, octamer.]

Fig. 1. The types of multimeric structures observed for enzymes of glycolysis and of the citric-acid cycle, obtained from information at the Braunschweig Enzyme Database (BRENDA) (17) and Protein Interfaces, Surfaces and Assemblies (PISA) (18) databases. Data are shown only for enzymes with structures known in at least four of the major groups (note that the unicellular eukaryotes are not monophyletic, but distributed over several major lineages). There has been no attempt to weight the observations according to the frequency observed, as the data are not evenly distributed within major groupings. Because protein structural data are very limited for plants, the paucity of variation within this lineage may simply be a sampling artifact. The lack of structural variation for glucose 6-phosphate isomerase is likely due to the fact that the active site of this enzyme is formed at the dimeric interface.


Fig. 2. Variation in the area of binding interfaces in different taxa as a function of the overall surface area of monomeric subunits. The data, given for enzymes of glycolysis and the citric-acid cycle, are derived from the PISA database (18) and are distributed over a broad range of prokaryotic and eukaryotic taxa. The average fraction of a multimer covered by the sum of its interfaces is 0.165, with a SD of 0.061 (sample size = 159). Diagonal lines denote points with equivalent proportions of the surfaces of monomeric subunits associated with interfaces.


$$\tilde{p}_i = \prod_{j=1}^{i-1} m_{j,j+1} \prod_{k=i+1}^{n} m_{k,k-1}, \tag{1}$$

where $n$ denotes the final allele in the series. For example, with four allelic classes, $\tilde{p}_1 = m_{2,1} m_{3,2} m_{4,3}$, $\tilde{p}_2 = m_{1,2} m_{3,2} m_{4,3}$, $\tilde{p}_3 = m_{1,2} m_{2,3} m_{4,3}$, and $\tilde{p}_4 = m_{1,2} m_{2,3} m_{3,4}$. The absolute probabilities ($\tilde{P}_i$), which sum to one, are obtained by dividing each $\tilde{p}_i$ by the normalization constant $C = \sum_{i=1}^{n} \tilde{p}_i$. The details leading to this and more general solutions are outlined in SI Text. Under the assumptions of the steady-state model, $\tilde{P}_i$ can be equivalently viewed as the proportion of time a specific lineage spends in state $i$ over a long evolutionary period or as the fraction of populations experiencing identical population-genetic environments that are expected to reside in class $i$ at any specific time. Further insight requires that the transition rates between adjacent classes be defined in terms of their underlying determinants. Letting $N$ be the population size, $\mu_{i,j}$ be the mutation rate from allele class $i$ to $j$ (where $j$ can be only $i-1$ or $i+1$), and $\phi_{i,j}$ be the probability of fixation of a newly arisen $j$ allele in a population predominated by allele $i$, the transition rates are equal to the products of the relevant numbers of new mutant alleles arising per generation and their probabilities of fixation; i.e., $m_{i,j} = 2N\mu_{i,j}\phi_{i,j}$, assuming diploidy. Letting class 1 denote the allelic category producing a monomeric protein and subsequent indexes denote dimeric classes with increasing interfacial binding stabilities, the conditions encouraging alternative oligomeric states can be ascertained by expanding and solving Eq. 1. In the following, we assume a constant upward mutation rate between classes, i.e., $\mu_{i,i+1} = u$ for all $i$, as might be closely approximated if the number of potential surface residues that can mutate to adhesive states is large.
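The product rule of Eq. 1 can be sketched in a few lines of Python. This is a minimal illustration, assuming the transition rates between adjacent classes are already known; the function names and the example rates are hypothetical and do not come from the paper.

```python
from math import prod

def transition_rate(N, mu, phi):
    """m = 2 * N * mu * phi for a diploid population (new mutants per generation times fixation probability)."""
    return 2 * N * mu * phi

def steady_state(up, down):
    """Steady-state probabilities for a linear chain of n allelic classes (Eq. 1).

    up[j]   is the rate from class j+1 to class j+2 (1-based class labels), length n-1.
    down[j] is the rate from class j+2 to class j+1, length n-1.
    """
    n = len(up) + 1
    # Unnormalized weight of class i: product of all rates pointing toward it,
    # upward rates from the classes below and downward rates from the classes above.
    weights = [prod(up[:i]) * prod(down[i:]) for i in range(n)]
    total = sum(weights)  # normalization constant C
    return [w / total for w in weights]

# Four allelic classes with arbitrary (hypothetical) rates.
probs = steady_state(up=[0.3, 0.2, 0.1], down=[0.4, 0.5, 0.6])
print(probs)
```

At the steady state the probability flux between each pair of neighbors balances, which is a quick way to check the output: `probs[i] * up[i]` should equal `probs[i + 1] * down[i]` for every `i`.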
On the other hand, the downward mutation rate is assumed to increase linearly with the allelic class, i.e., $\mu_{i,i-1} = (i-1)v$, in accordance with the increase in the number of adhesive residues at the interface


Transitions Between Monomeric and Dimeric Classes. Although the multimeric structures of proteins are typically viewed as discrete molecular states, the stability of a dimer depends in large part on the binding strength of the interface. Depending on the dissociation constant of the subunits, a mixture of monomers and dimers will exist within a cell, with the extremes being nearly 100% of one type or the other. The salient issue is that a gradient of different allelic classes can be expected between these extremes, and this is reflected in the wide range of interfacial surface areas observed in orthologous proteins in different lineages (Fig. 2), as well as in the widespread presence of “weak dimers” in nature (31). Dozens of studies have shown that dimeric interfaces can generally be obliterated with just one or two key amino acid substitutions (5, 27, 32). Likewise, single mutations are often sufficient to instigate a moderately stable (15, 33, 34) and, at least in some cases, advantageous interface (8).

These observations motivate a relatively simple model for the evolution of alternative structural states, with each allelic class having a finite probability of transitioning to an adjacent class. An alternative, but mathematically more challenging, approach would be to treat the magnitude of binding strength as a continuously distributed trait (35). Under both perspectives, an evolutionary lineage has the capacity to wander among alternative oligomeric states over a long timescale to a degree that depends on the joint forces of mutation, selection, and random genetic drift. This model assumes that active sites are contained within individual monomeric subunits rather than being constructed at interfaces, the latter condition being relatively rare. The effectiveness of selection will be modulated by the magnitude of genetic drift, but even in large populations, the force of selection can be balanced or overpowered by biased mutation pressure in the opposite direction.

We start with an allelic series with the lower terminal state denoting an effectively pure monomeric structure and the subsequent states separated by single mutations representing dimeric structures with progressively stronger interfaces, i.e., increasing tendencies of monomeric subunits to assemble into dimers (Fig. 3). Letting $m_{i,j}$ denote the rate of evolutionary transition from state $i$ to state $j$ and $\tilde{P}_i$ denote the long-term probability of allelic state $i$, at equilibrium the net probability flux into each state must be balanced by the net efflux. For a linear array of alleles, assuming nonzero transition rates between all adjacent classes, the equilibrium solution takes on a simple, intuitive form: the steady-state probability of allelic class $i$ is proportional to the product of all transition rates pointing toward the class from both directions (Eq. 1).


structures of proteins are consistent with such recurrent transitions (25–27). Although the strong tendency for proteins to form homomeric complexes may reflect some intrinsic selective advantage, such a bias may also exist for purely biophysical reasons. To be adhesive enough to ensure complexation, proteins must overcome the energetic cost of thermal motion, and the basic features of proteins are such that symmetrical interfaces are more likely to generate extremes of binding strength than random asymmetric interfaces (28, 29). One simple reason for such behavior is that any pair of adhesive residues will be present twice in a symmetric interface (1), ensuring that the very small tail of random interfaces with sufficient strength to overcome background thermal motion will be enriched with self-referential forms. The mere tendency for proteins of the same type to be colocalized within the cell further enhances the likelihood of self-interactions (30). Finally, because proteins are typically selected to have hydrophilic exteriors, a strong mutational bias exists in the direction of adhesivity. As will be seen below, even if the monomeric state is uniformly advantageous, unless the fitness benefit of such a state is substantially greater than the mutation bias toward aggregation, variation in the quaternary structure of proteins is expected to be common.

probability of fixation (which is no longer a constant $1/2N$). Letting $s_i$ denote the selective disadvantage of allele $i$, measured relative to a perfect fitness of 1.0, then $s_{i+1} < s_i$ implies that allele $i+1$ is beneficial compared with allele $i$. Assuming mutations with additive effects on fitness, application of Kimura’s (36) expression for the fixation probability of new mutations yields the convenient result $\phi_{i,i+1}/\phi_{i+1,i} = e^{4N(s_i - s_{i+1})}$ (37, 38), leading to a general expression for the steady-state frequencies

$$\tilde{P}_i = \frac{(u/v)^{i-1} e^{-4Ns_i}}{(i-1)!\,C}, \tag{3}$$

[Fig. 3 graphic not reproduced. Upper: a linear array of allelic classes 1, 2, 3, 4, ... connected by the upward transition coefficients m12, m23, m34 and the downward coefficients m21, m32, m43. Lower: frequency of allelic class (y axis, 0.0 to 0.5) plotted against allelic class (x axis), with curves for u/v = 0.1, 1.0, 2.0, 10.0, and 20.0.]

Fig. 3. (Upper) A general model for a linear array of oligomeric states. Class 1 represents the most extreme monomeric state, with all other classes denoting states of increasing interface stability. The transition coefficient $m_{i,j}$ denotes the rate at which a population makes a change from state $i$ to state $j$. (Lower) Under neutrality, the expected probability distribution of allelic states is Poisson, with a form that is independent of the size of the population. The mean allelic state is equal to $1 + (u/v)$ and the SD is $(u/v)^{0.5}$, where $u$ and $v$ denote the mutation rates in the upward and downward directions (the former being per molecule and the latter per relevant interfacial residue). This same relationship applies with selection promoting or opposing dimerization if $(u/v)e^{4Ns}$ or $(u/v)e^{-4Ns}$ is substituted for $(u/v)$.
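The Poisson form described in the Fig. 3 caption is easy to check numerically. The following Python sketch evaluates class probabilities for a composite pressure `lam`, taken here to be $(u/v)e^{4Ns}$ with $s = 0$ for neutrality and negative $s$ for selection against dimerization. All function names, the truncation at 100 classes, and the parameter values are illustrative assumptions.

```python
from math import exp, sqrt

def composite_pressure(u_over_v, N=0, s=0.0):
    """(u/v) * exp(4*N*s): s > 0 favors dimerization, s < 0 opposes it, s = 0 is neutral."""
    return u_over_v * exp(4 * N * s)

def class_probabilities(lam, max_class=100):
    """Poisson probabilities of allelic classes 1..max_class (class 1 is the monomer)."""
    probs = [exp(-lam)]               # class 1: e^{-lam}
    for i in range(1, max_class):
        # class i+1 weight = class i weight * lam / i, i.e. lam^i e^{-lam} / i!
        probs.append(probs[-1] * lam / i)
    return probs

def mean_and_sd(probs):
    """Mean and SD of the allelic class under the given probabilities."""
    classes = range(1, len(probs) + 1)
    mean = sum(c * p for c, p in zip(classes, probs))
    var = sum((c - mean) ** 2 * p for c, p in zip(classes, probs))
    return mean, sqrt(var)

for ratio in (0.01, 0.1, 1.0, 10.0):
    lam = composite_pressure(ratio)   # neutral case
    probs = class_probabilities(lam)
    print(ratio, probs[0], mean_and_sd(probs))
```

The probabilities are built up recursively rather than through explicit factorials, which avoids overflow when many classes are evaluated. The truncation is only adequate while `lam` is well below `max_class`.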

subject to loss. As $u$ is an aggregate mutation rate over a large number of sites, and $v$ is a fraction of the per-site mutation rate, we expect $u/v$ to be generally >1 and perhaps greatly so. Because any selective advantages of consecutive classes must eventually decline (as a state of molecular perfection is approached), with $\mu_{i,i+1}/\mu_{i,i-1}$ declining as well, the equilibrium frequencies of the upper classes must eventually diminish toward zero, in which case Eq. 1 characterizes a stochastic steady-state probability distribution of an effectively infinite array of alleles. Moreover, because each transition rate contains either a $u$ or a $v$, the final result depends only on the ratio of mutation rates independent of their absolute values.

It is instructive to first consider the simplest situation in which all allelic classes are selectively equivalent (i.e., all dimers operate with equal efficiencies as monomers). This is a plausible condition, as the majority of multimeric proteins retain single active sites within each subunit. The equilibrium probabilities of alternative states then depend on $u/v$ alone, independent of population size (because the number of new mutations is proportional to $2N$, whereas the probability of fixation of a neutral mutation is equal to its initial frequency, $1/2N$),

$$\tilde{P}_i = \frac{(u/v)^{i-1}}{(i-1)!\,C}. \tag{2}$$

With a normalization constant of $C = \sum_{i=0}^{\infty} (u/v)^i/i! = e^{u/v}$, the distribution of allelic types is Poisson, with the probability of the extreme monomeric state being simply $\tilde{P}_1 = e^{-u/v}$. For $u/v = 0.01$, 0.1, 1.0, and 10.0, respectively, this yields $\tilde{P}_1 = 0.99$, 0.90, 0.37, and 0.000045, implying that even in the absence of adaptive differences among allelic states, the probability of a dimeric structure $(1 - \tilde{P}_1)$ can be substantial. It also follows that for large $u/v$ substantial variation in the structure of interfaces is expected among lineages (Fig. 3), consistent with the observations noted above on variation for interfacial areas. In the presence of selection, these expressions must be modified, as each transition coefficient must be weighted by the


In Eq. 3, $C$ is again the normalization constant. The quantity $e^{-4Ns_i}$ has the useful interpretation of being equivalent to the ratio of fixation probabilities of deleterious and beneficial mutations with the same absolute effects.

Consider first the situation in which dimers are advantageous relative to monomers and become progressively perfected at higher states, such that $s_i = s_1 e^{-k(i-1)}$, with $s_1$ denoting the selective disadvantage of the monomer. For small fitness changes between alleles ($ki \ll 1$),

$$\tilde{P}_i = \frac{\left[(u/v)e^{4Ns}\right]^{i-1}}{(i-1)!\,C}, \tag{4}$$

where $s = ks_1$ is now the selective advantage of allele $i+1$ over allele $i$. As this expression is identical in form to Eq. 2, with $u/v$ simply being weighted by a constant $e^{4Ns}$, the expected Poisson distribution of allelic states is maintained. In this case, however, the steady-state probability of the monomeric state is $\exp[-(u/v)e^{4Ns}]$. Thus, the behavior of the system with unconditionally weak and positive selection for dimerization is a function of a single composite quantity $(u/v)e^{4Ns}$, which itself is determined by the directional magnitude of mutation pressure $(u/v)$, the selection differential between adjacent alleles ($s$), and the magnitude of random genetic drift $(1/2N)$. The ratio of the power of selection to the power of drift, $s/(1/2N) = 2Ns$, simply enters as an exponential pressure, pushing the system farther in the direction of dimerization than would be expected on the basis of mutation pressure alone. If the power of drift is substantially greater than that of selection, i.e., $4Ns \ll 1$, the distribution of allele frequencies will closely approximate the neutral expectation (Eq. 2).

The situation in which selection opposes dimerization can be evaluated in a parallel manner, and again with weak selection a Poisson distribution is retained for the equilibrium probabilities of alternative states, identical in form to Eq. 4 but with the sign of the exponential term now being negative. In other words, a regular form of weak selection against dimerization simply shifts the equilibrium distribution to the left of the neutral expectation, leading to a monomer probability of $\exp[-(u/v)e^{-4Ns}]$. Thus, the patterns in Fig. 3 apply to any constant regimen of directional selection by simply multiplying the mutation-pressure term $(u/v)$ by the selection-pressure term ($e^{4Ns}$ or $e^{-4Ns}$).

The more general model that allows the selective disadvantages of consecutive alleles to progressively approach zero is given by

$$\tilde{P}_i = \frac{(u/v)^{i-1} \exp\left\{4Ns_1\left[1 - e^{-k(i-1)}\right]\right\}}{(i-1)!\,C}. \tag{5}$$
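To see how far Eq. 5 departs from the Poisson form of Eq. 4, the two can be evaluated side by side. The Python sketch below normalizes both numerically over a truncated range of classes; the function names, the truncation at 200 classes, and the parameter values are assumptions chosen for illustration only.

```python
from math import exp

def normalized(weights):
    """Divide unnormalized weights by their sum (the constant C)."""
    total = sum(weights)
    return [w / total for w in weights]

def general_distribution(u_over_v, N, s1, k, max_class=200):
    """Eq. 5: selection weakens progressively as the allelic class increases."""
    weights = []
    mutation_term = 1.0  # (u/v)^(i-1) / (i-1)!, built up recursively
    for i in range(1, max_class + 1):
        selection_term = exp(4 * N * s1 * (1 - exp(-k * (i - 1))))
        weights.append(mutation_term * selection_term)
        mutation_term *= u_over_v / i
    return normalized(weights)

def linear_distribution(u_over_v, N, s1, k, max_class=200):
    """Eq. 4: constant selective advantage s = k * s1 between adjacent classes."""
    lam = u_over_v * exp(4 * N * k * s1)
    weights = [1.0]
    for i in range(1, max_class):
        weights.append(weights[-1] * lam / i)
    return normalized(weights)

def mean_class(probs):
    return sum(c * p for c, p in enumerate(probs, start=1))

# Hypothetical parameters: u/v = 10, N = 1000, s1 = 0.001.
for k in (0.001, 0.5):
    g = mean_class(general_distribution(10.0, 1000, 0.001, k))
    l = mean_class(linear_distribution(10.0, 1000, 0.001, k))
    print(k, g, l)
```

With a small `k` the two means should be nearly indistinguishable, whereas a large `k` pulls the mean of the general model well below the Poisson value.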

This expression is no longer a precise Poisson distribution of allelic probabilities and necessarily predicts a lower mean allelic state owing to the progressive reduction in the strength of selection with increasing allelic state. However, for the deviations from Eq. 4 to be large, the strength of selection must be sufficient to push the distribution of allelic states beyond the

[Fig. 4 graphic not reproduced. Allelic classes are connected by the transition coefficients m12, m21, m23, m32, m24, m42, m35, m53, m45, m54, m56, and m65.]

Fig. 4. A 2D array of oligomeric states, allowing for transitions to dimeric and tetrameric structures with increasing levels of interfacial stability. Only the first three columns are included in the model discussed in the text.

point at which the expected distribution under mutation pressure alone would be negligible. As the latter has mean and variance $u/v$, the linear approximation will then still hold very closely provided $k\left[(u/v) + 2\sqrt{u/v}\right] \ll 1$. For $u/v = 10$, these conditions are satisfied if $k \ll 0.06$. Thus, unless the incremental selective effects of multimerization are quite large or mutation pressure alone is capable of pushing the protein to an architecture very close to molecular perfection (e.g., to the point at which the reaction rate is limited only by diffusion), the general conclusions outlined above will be qualitatively unaffected. Because the average absolute selective effects of mutations (of any type) are generally <0.01 (39–41), these conditions may not be very restrictive.

Although a number of variants on the preceding model can be imagined, the qualitative conclusions seem unlikely to be altered greatly, and given the uncertainties about the mutational basis of interface establishment, more detailed mathematical models do not seem warranted. One matter of interest, however, is the nature of consecutive mutations leading to the formation of a stable interface. In the preceding analyses, the entire pool of mutations was treated as homogeneous, with their fitness effects depending only on their order of appearance, not on their physical nature. In principle, transitions to a dimeric state may be precipitated by a type of mutation (e.g., a major insertion or deletion) (27) that is fundamentally different from those accruing later. Indeed, given the protective environment of an interface, a multimeric condition is likely to encourage the accumulation of secondary mutations that would otherwise be rejected by selection, e.g., substitutions causing hydrophobic or electrostatic interactions that enhance interfacial binding strength but would be harmful if exposed on a protein’s surface. This type of reinforcement scenario is qualitatively consistent with numerous studies showing that laboratory-induced point mutations that prevent interfacial binding usually retain some catalytic ability while also experiencing a substantial reduction in structural stability (5), presumably a result of exposure to conditionally deleterious interfacial mutations. If we assume that reversions to the monomeric state require the elimination of all secondary mutations before loss of the initiating mutation, the main modification to the preceding model would be the need to reduce the mutation rate of each dimerized allele to a lower dimeric state by $v$. The consequences of more complex scenarios,


including a bottleneck in fitness for the lowest-level dimerizing alleles, can be readily evaluated by a simple modification of the transition coefficients in Eq. 1.

Transition to a Higher-Order Tetramer. Although tetramers can have cyclical structures, with each monomeric subunit presenting a different interface to each neighbor, such assemblies are relatively rare (<10% of all tetramers) (14). Instead, most homotetramers are dihedral in nature, i.e., simple dimers of dimers. Compelling evidence suggests that such structures become established through an intermediate state of symmetrical dimers, an evolutionary order reflected in the physical order of assembly/disassembly of complexes (in the vein of “ontogeny recapitulates phylogeny”) (14). Such a scenario can be incorporated into the scheme outlined above by adding a second dimension to the problem (Fig. 4), although the steady-state solutions are not as easily verbalized as in the two-state case (SI Text). As this added dimensionality can substantially increase the number of possible allelic states, we focus on the simplified situation in which there are just two classes of dimers and three of tetramers, with adjacent classes differing by the binding strength at one or both interfaces. This leads to a network of six allelic classes, with the probabilities of the monomeric, dimeric, and tetrameric states being, respectively, $\tilde{P}_m = \tilde{P}_1$, $\tilde{P}_d = \tilde{P}_2 + \tilde{P}_3$, and $\tilde{P}_t = \tilde{P}_4 + \tilde{P}_5 + \tilde{P}_6$ (Fig. 4).

Consider again a situation of weak selection, such that each additional binding-strength category increases fitness by some constant small amount $s$. Fitness relative to the monomer is then $1 + s$ for allele class 2 (the lowest-level dimer), $1 + 2s$ for classes 3 (highest-level dimer) and 4 (low-level tetramer), $1 + 3s$ for class 5 (intermediate tetramer), and $1 + 4s$ for class 6 (the highest-level tetramer). Assuming an upward mutation rate of $u$ in all cases (as above) and downward rates of $v$ for 2→1 and 5→3; $2v$ for 3→2, 4→2, and 5→4; and $4v$ for 6→5, the equilibrium solution (SI Text) reduces to expressions that are functions of the same composite quantity noted above, $\theta = (u/v)e^{4Ns}$, the product of the ratios of the mutation-rate and selection pressures in the upward and downward directions (Fig. 5). When $\theta$ is low, the lineage is expected to almost always reside in the monomeric state, but with $0.1 < \theta < 10.0$, the steady-state probabilities of all three multimeric states are at observable levels. Regardless of $\theta$, the steady-state probability of dimerized


Fig. 5. The equilibrium probabilities of dimeric and tetrameric states in a lineage, as a function of the joint upward and downward pressures from mutation and selection. The probability of a monomeric state is equal to 1.0 minus the sum of probabilities of dimers and tetramers. Here it is assumed that there is positive selection for multimerization.
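The equilibrium probabilities summarized in Fig. 5 can be approximated numerically. The Python sketch below rests on several assumptions that are not spelled out in the main text: the six classes are wired as implied by the transition coefficients of Fig. 4 (1↔2, 2↔3, 2↔4, 3↔5, 4↔5, 5↔6), every upward step raises fitness by the same $s$, and all rates are rescaled so that each upward rate equals $\theta$ and each downward rate equals its multiple of $v$ (1, 2, or 4). The balance equations are then solved by plain Gauss-Jordan elimination. Function names are hypothetical, and the paper's own solution is in SI Text.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def equilibrium(theta):
    """Steady-state probabilities of allelic classes 1..6 for composite pressure theta."""
    # Assumed wiring: upward edges all carry rate theta after rescaling.
    rates = {edge: theta for edge in [(1, 2), (2, 3), (2, 4), (3, 5), (4, 5), (5, 6)]}
    # Downward rates in units of v, as listed in the text.
    rates.update({(2, 1): 1, (3, 2): 2, (4, 2): 2, (5, 3): 1, (5, 4): 2, (6, 5): 4})
    n = 6
    a = [[0.0] * n for _ in range(n)]
    for (i, j), r in rates.items():
        a[j - 1][i - 1] += r  # flux into class j from class i
        a[i - 1][i - 1] -= r  # flux out of class i
    # One balance equation is redundant; replace it with sum(p) = 1.
    a[n - 1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    return solve(a, b)

def summarize(p):
    """Return (monomer, dimer, tetramer) probabilities from the six class probabilities."""
    return p[0], p[1] + p[2], p[3] + p[4] + p[5]

for theta in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(theta, summarize(equilibrium(theta)))
```

Under these assumptions the dimer probability peaks near $\theta = 1$ at roughly 0.42, and tetramers dominate once $\theta$ reaches 10.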


structures is always < 0:5, with θ > 10 being sufficient to maintain a lineage almost exclusively in the tetrameric state. Of course, this conclusion would be altered if tetrameric states were selected for (or against) much more strongly than dimeric states. The overall results also show that even if there is a substantial advantage to the tetrameric state, such structures will be rare if the rate of interface-disrupting mutation is sufficiently high to render θ < 1:0. The Rarity of Trimers. As noted above, the vast majority of multimeric enzymes contain even numbers of subunits, a disparity that may result from the special constraints on constructs involving odd numbers of subunits. As first pointed out by Monod et al. (1), subunit interfaces can be partitioned into two categories: isologous associations with each subunit deploying identical residues and heterologous associations with each subunit donating different sets of residues. As noted above, one advantage of an isologous interface is the “two-for-one” property of mutations. In addition, isologous interfaces have a higher propensity to yield closed structures, which minimize the sensitivity of the overall multimer to destabilizing forces. Although multimers of any number of subunits can form closed structures with heterologous interfaces, this is not possible with an odd number of subunits and isologous interfaces (Fig. 6). Consider a potential trimer with the monomeric subunits having two interfaces A and B; two subunits may have an isologous A-A interface, and the third subunit may join to make an isologous B-B interface, but this leaves a pair of nonmatching A and B interfaces unresolved. Thus, the assembly of symmetrical trimers, pentamers, and heptamers requires the use of heterologous interfaces, which likely require more mutations for establishment. In addition, oddmers carry the risk of concatenating into endless chains unless the angular orientation of the subunits promotes a closed structure. 
Fig. 6. The consequences of isologous and heterologous interfaces for the production of closed multimers with even and odd numbers of subunits. Note the mismatched interface in trimers when interfacial binding is isologous. (Panel labels: Isologous, Heterologous; Dimer, Trimer, Tetramer.)

To gain more formal insight into the challenges of establishing a trimer vs. a dimer, consider the simple scheme in Fig. 7, where it is assumed that the evolution of a trimer requires two mutations (one for each interface), with the first being potentially deleterious (with selective disadvantage δ). This model does not rule out the acquisition of further stabilizing mutations (as used in the discussion of dimers and tetramers above), but the focus here is on the initial establishment of the baseline structures. The rate of dimerizing mutations is $u_d$ and that of trimerizing mutations is $u_t$, and the rate of loss of such mutations is again designated as $v$. Relative to the monomeric state, the selection coefficients of dimers and trimers are designated $s_d$ and $s_t$, respectively. With this scheme, positive δ designates a deleterious intermediate en route to trimerization, and positive $s_d$ and $s_t$ imply advantageous multimers; negative signs would imply the opposite.

Assuming population sizes large enough that the power of selection against deleterious alleles is substantially greater than the power of drift ($4N\delta \gg 1$), each of the intermediate-state alleles will be kept at low frequency $u_t/\delta$ by selection–mutation balance and, although never going to fixation, will serve as launching pads for second-step trimerizing mutations. Thus, the rate of transition from the monomeric to the trimeric state is approximately equal to the rate at which second-step mutations arise in the small pool of segregating first-step mutations times their probability of fixation, $(2Nu_t)(2u_t/\delta)\phi(s_t)$, whereas that in the opposite direction is $(2Nv)[2v/(\delta + s_t)]\phi(-s_t)$ (42, 43). Using the approaches outlined above, the steady-state frequencies of monomers, dimers, and trimers are then

$$\tilde{P}_m = \frac{1}{C} \tag{6a}$$

$$\tilde{P}_d = \frac{(u_d/v)\,e^{4Ns_d}}{C} \tag{6b}$$

$$\tilde{P}_t = \frac{(u_t/v)^2\,[(\delta + s_t)/\delta]\,e^{4Ns_t}}{C}, \tag{6c}$$

where the normalization constant $C$ is the sum of the numerators. It follows that the expected ratio of trimers to dimers is

$$\frac{\tilde{P}_t}{\tilde{P}_d} = \frac{u_t^2}{v u_d} \cdot \frac{\delta + s_t}{\delta} \cdot e^{4N(s_t - s_d)}, \tag{7}$$

which reduces to $u_t^2/(v u_d)$ if dimers and trimers are selectively neutral. As the middle term in Eq. 7 is > 0 when the intermediate state is deleterious, it is clear that rarity of trimers requires a selective disadvantage relative to that of dimers ($s_t < s_d$) and/or a much lower stepwise rate of mutations to trimerizing structures than to dimers. The problem of complex closure noted above for heterologous interfaces is consistent with the second point, as this will restrict the mutational paths to trimers; and potential disadvantages of trimers (relative to dimers) include the increased concentration barrier for a three-particle encounter rate necessary for complexation and the added vulnerability of two interfaces to nonfunctionalizing mutations.

Fig. 7. Flow diagram for the situation in which dimers or trimers can evolve from a monomeric ancestral state. The mutation rates between alternative allelic states are given on the arrows, and the selection coefficients are denoted below the diagram. The two potential paths to a trimeric structure simply differ in the order of occurrence of the two mutations necessary for complexation.

Discussion

Although highly maladaptive modifications to a protein’s architecture are not expected to proliferate, this need not imply that features vital to survival and/or reproduction are immune to stochastic excursions between alternative states (38). The theory suggested above extends this point by illustrating how substantial variation in phenotypes might arise among lineages even in the presence of uniform directional selection across taxa. The greatest diversity is expected when the composite parameter θ is near 1.0, as this denotes situations in which any directional selection pressure is counterbalanced by mutation bias in the opposite direction, resulting in no directional tendency between alternative phenotypic states, even though the alternatives may not be selectively neutral. Under this view, a substantial fraction of phylogenetic variation in the multimeric states of proteins may exist not because of idiosyncrasies in modes of selection in different lineages, but as a simple outcome of the stochastic evolutionary dynamics that arise in finite populations when the combined pressures of mutation and selection are not overwhelmingly large in one direction.

If this hypothesis is correct and one had the capacity to sample a single evolutionary lineage over a very long period, individual proteins would be observed to occupy various multimeric states in frequencies reflecting the underlying transition probabilities. Although we do not have the luxury of making such observations, provided enough evolutionary time has elapsed for the tree of life to have reached the steady-state distribution, a corollary can be tested: regardless of the direction and magnitude of selection, the number of transitions from a monomeric to a dimeric state should equal that in the opposite direction, and the same symmetry should hold for dimer–tetramer transitions, etc. Although this prediction may seem counterintuitive, it arises simply from the fact that at equilibrium there must be an inverse relationship between the frequency of adjacent states and the transition rates between them. Although unequal reciprocal rates would imply nonequilibrium conditions, this would not rule out the types of processes outlined above. Unfortunately, owing to the huge imbalance in the taxa and proteins with existing structural data, such a test cannot yet be made, and if such efforts are to be pursued, it must be kept in mind that the phylogenetic depth for sampling must be substantially greater than the expected transition times between alternative states. For example, if the likely transition rate (the population-level rate of origin of quaternary structural mutations times the probability of fixation) between states is on the order of $10^{-8}$/y, a focus on a lineage that diverged more recently would be uninformative.

Of additional concern is the matter of effective population size. Even if the biases of mutation and selection pressures are similar in all lineages, effective population sizes vary by at least three orders of magnitude among lineages (16). This may generate variation in the transition probabilities between alternative multimeric states (except in the case of effective neutrality), with the states of lineages experiencing higher magnitudes of drift more closely reflecting the expectations under mutation bias alone. Related to this issue is the fact that individual lineages may experience substantial and prolonged changes in effective population sizes over time, potentially reducing the chances that they ever attain a meaningful equilibrium state, while also compromising the ability to phylogenetically reconstruct ancestral states from current-day features (44, 45).

Although the preceding analyses do not rule out the possibility that alternative multimeric states of proteins are effectively neutral with respect to each other, the overall model is agnostic on this matter. Indeed, a key point is that the steady-state distribution (and transition probabilities) for alternative states is a composite function of the power of mutation, selection, and drift [$\theta = (u/v)e^{4Ns}$ in the simplest case]. Given the extreme difficulties in directly estimating the scaled selection parameter ($4Ns$) for alternative states in an allelic series, this means that without an estimate of the mutation-bias parameter ($u/v$), it will be difficult to decipher the role played by selection from purely comparative observations. Unfortunately, owing to the secondary accumulation of conditionally deleterious mutations at interfaces, evaluating the consequences of loss-of-interface mutations in well-established multimers will generally be uninformative with respect to the intrinsic adaptive value of multimeric structures.

There are a number of ways in which the models outlined above might be modified. For example, the approach used treats populations as essentially pure states that stochastically undergo clean shifts to adjacent states at irregular intervals, with waiting times equal to the reciprocal of the transition rates. Such an approximation is expected to be quite appropriate for populations of sufficiently small size that the fixation of a novel allele occurs well before the emergence of more adaptive alleles at high frequencies. In large populations, however, multiple alleles will often be segregating simultaneously, and this raises the possibility of double mutants going to fixation before the intermediate state ever becomes common. As noted above for trimer evolution, in terms of transition rates, such an issue can become important when intermediate states are deleterious, as it allows populations to cross a valley in the fitness landscape without ever experiencing a reduction in mean fitness (42, 43, 46).

As presented, the model also ignores issues that may arise with diploidy. The central issue here is that products of early-stage alleles in the transition toward a multimer will initially always find themselves inside heterozygous cells containing ancestral-type alleles. This raises the question of how monomeric subunits with structural modifications interact with the structures produced by ancestral alleles. For example, it has often been argued that the domain-swapping model provides a simple single-mutation mechanism for the emergence of a dimeric molecule (30, 33). Under this model, a monomeric protein with internal binding between two domains becomes compromised by a deletion in the hinge region that prevents communication between domains of the same polypeptide chain, but is resolved when two separate chains can swap domains with each other. However, it remains to be resolved whether there are negative consequences within heterozygotes where the construction of endless concatamers between the two allelic products might result (10). Such an effect, which can greatly reduce the probability of fixation of a domain-swapping allele unless the effective population size is extraordinarily small, can be accommodated into the preceding model by modifying the expression for one or more fixation probabilities to allow for the behavior of underdominant alleles (10).

PNAS | Published online July 8, 2013 | E2827

ACKNOWLEDGMENTS. I thank D. Bolon, A. Dean, A. Panchenko, E. Shakhnovich, and B. Shraiman for helpful comments. This work has been supported by

National Institutes of Health Grant R01 GM036827 (to M.L. and W. K. Thomas), National Science Foundation (NSF) Grant EF-0827411 (to M.L.), and US Department of Defense Grant ONRBAA10-002 (to M.L., P. Foster, H. Tang, and S. Finkel). Support was also provided by NSF Grant PHY11-25915 to the Kavli Institute for Theoretical Physics.

1. Monod J, Wyman J, Changeux JP (1965) On the nature of allosteric transitions: A plausible model. J Mol Biol 12:88–118.
2. Marianayagam NJ, Sunde M, Matthews JM (2004) The power of two: Protein dimerization in biology. Trends Biochem Sci 29(11):618–625.
3. Ali MH, Imperiali B (2005) Protein oligomerization: How and why. Bioorg Med Chem 13(17):5013–5020.
4. Hashimoto K, Nishi H, Bryant S, Panchenko AR (2011) Caught in self-interaction: Evolutionary and functional mechanisms of protein homooligomerization. Phys Biol 8(3):035007.
5. Griffin MD, Gerrard JA (2012) The relationship between oligomeric state and protein function. Adv Exp Med Biol 747:74–90.
6. Smoluchowski MV (1917) Attempt of a mathematical theory of Koagulationskinetik colloidal solutions. Z Phys Chem 92:129–168.
7. Phillips R, Kondev J, Theriot J, Garcia H (2013) Physical Biology of the Cell (Garland Science, New York), 2nd Ed.
8. Bershtein S, Mu W, Shakhnovich EI (2012) Soluble oligomerization provides a beneficial fitness effect on destabilizing mutations. Proc Natl Acad Sci USA 109(13):4857–4862.
9. Lukes J, Archibald JM, Keeling PJ, Doolittle WF, Gray MW (2011) How a neutral evolutionary ratchet can build cellular complexity. IUBMB Life 63(7):528–537.
10. Lynch M (2012) The evolution of multimeric protein assemblages. Mol Biol Evol 29(5):1353–1366.
11. Chiti F, Dobson CM (2009) Amyloid formation by globular proteins under native conditions. Nat Chem Biol 5(1):15–22.
12. Semple JI, Vavouri T, Lehner B (2008) A simple principle concerning the robustness of protein complex activity to changes in gene expression. BMC Syst Biol 2:1.
13. Vavouri T, Semple JI, Garcia-Verdugo R, Lehner B (2009) Intrinsic protein disorder and interaction promiscuity are widely associated with dosage sensitivity. Cell 138(1):198–208.
14. Levy ED, Boeri Erba E, Robinson CV, Teichmann SA (2008) Assembly reflects evolution of protein complexes. Nature 453(7199):1262–1265.
15. Levy ED, Teichmann S (2013) Structural, evolutionary, and assembly principles of protein oligomerization. Prog Mol Biol Transl Sci 117:25–51.
16. Lynch M (2007) The Origins of Genome Architecture (Sinauer, Sunderland, MA).
17. Schomburg I, et al. (2013) BRENDA in 2013: Integrated reactions, kinetic data, enzyme function data, improved disease classification: New options and contents in BRENDA. Nucleic Acids Res 41(Database issue):D764–D772.
18. Krissinel E, Henrick K (2007) Inference of macromolecular assemblies from crystalline state. J Mol Biol 372(3):774–797.
19. Royer WE, Jr., Love WE, Fenderson FF (1985) Cooperative dimeric and tetrameric clam haemoglobins are novel assemblages of myoglobin folds. Nature 316(6025):277–280.
20. Kolatkar PR, Meador WE, Stanfield RL, Hackert ML (1988) Novel subunit structure observed for noncooperative hemoglobin from Urechis caupo. J Biol Chem 263(7):3462–3465.
21. Bourne Y, et al. (1996) Novel dimeric interface and electrostatic recognition in bacterial Cu,Zn superoxide dismutase. Proc Natl Acad Sci USA 93(23):12774–12779.
22. Vassylyev DG, et al. (2006) Crystal structure of the translocation ATPase SecA from Thermus thermophilus reveals a parallel, head-to-head dimer. J Mol Biol 364(3):248–258.
23. Atkinson SC, et al. (2012) Crystal, solution and in silico structural studies of dihydrodipicolinate synthase from the common grapevine. PLoS ONE 7(6):e38318.
24. Dayhoff JE, Shoemaker BA, Bryant SH, Panchenko AR (2010) Evolution of protein binding modes in homooligomers. J Mol Biol 395(4):860–870.
25. Archibald JM, Blouin C, Doolittle WF (2001) Gene duplication and the evolution of group II chaperonins: Implications for structure and function. J Struct Biol 135(2):157–169.
26. Hashimoto K, Madej T, Bryant SH, Panchenko AR (2010) Functional states of homooligomers: Insights from the evolution of glycosyltransferases. J Mol Biol 399(1):196–206.
27. Hashimoto K, Panchenko AR (2010) Mechanisms of protein oligomerization, the critical role of insertions and deletions in maintaining different oligomeric states. Proc Natl Acad Sci USA 107(47):20352–20357.
28. Lukatsky DB, Shakhnovich BE, Mintseris J, Shakhnovich EI (2007) Structural similarity enhances interaction propensity of proteins. J Mol Biol 365(5):1596–1606.
29. André I, Strauss CE, Kaplan DB, Bradley P, Baker D (2008) Emergence of symmetry in homooligomeric biological assemblies. Proc Natl Acad Sci USA 105(42):16148–16152.
30. Kuriyan J, Eisenberg D (2007) The origin of protein interactions and allostery in colocalization. Nature 450(7172):983–990.
31. Dey S, Pal A, Chakrabarti P, Janin J (2010) The subunit interfaces of weakly associated homodimeric proteins. J Mol Biol 398(1):146–160.
32. Bogan AA, Thorn KS (1998) Anatomy of hot spots in protein interfaces. J Mol Biol 280(1):1–9.
33. Bennett MJ, Choe S, Eisenberg D (1994) Domain swapping: Entangling alliances between proteins. Proc Natl Acad Sci USA 91(8):3127–3131.
34. Grueninger D, et al. (2008) Designed protein-protein association. Science 319(5860):206–209.
35. Zeldovich KB, Chen P, Shakhnovich EI (2007) Protein stability imposes limits on organism complexity and speed of molecular evolution. Proc Natl Acad Sci USA 104(41):16152–16157.
36. Kimura M (1962) On the probability of fixation of mutant genes in a population. Genetics 47:713–719.
37. Sella G, Hirsh AE (2005) The application of statistical physics to evolutionary biology. Proc Natl Acad Sci USA 102(27):9541–9546.
38. Lynch M (2012) Evolutionary layering and the limits to cellular perfection. Proc Natl Acad Sci USA 109(46):18851–18856.
39. Lynch M, et al. (1999) Spontaneous deleterious mutation. Evolution 53:645–663.
40. Eyre-Walker A, Keightley PD (2007) The distribution of fitness effects of new mutations. Nat Rev Genet 8(8):610–618.
41. Schneider A, Charlesworth B, Eyre-Walker A, Keightley PD (2011) A method for inferring the rate of occurrence and fitness effects of advantageous mutations. Genetics 189(4):1427–1437.
42. Weissman DB, Desai MM, Fisher DS, Feldman MW (2009) The rate at which asexual populations cross fitness valleys. Theor Popul Biol 75(4):286–300.
43. Lynch M, Abegg A (2010) The rate of establishment of complex adaptations. Mol Biol Evol 27(6):1404–1414.
44. Belle EM, Duret L, Galtier N, Eyre-Walker A (2004) The decline of isochores in mammals: An assessment of the GC content variation along the mammalian phylogeny. J Mol Evol 58(6):653–660.
45. Rho M, et al. (2009) Independent mammalian genome contractions following the KT boundary. Genome Biol Evol 1:2–12.
46. Iwasa Y, Michor F, Nowak MA (2004) Stochastic tunnels in evolutionary dynamics. Genetics 166(3):1571–1579.

E2828 | www.pnas.org/cgi/doi/10.1073/pnas.1310980110
