Canonical Approximation of Fitness Landscapes



Robert Happel (a) and Peter F. Stadler (a, b, *)

(a) Theoretische Biochemie, Institut für Theoretische Chemie, Universität Wien, Vienna, Austria

(b) Santa Fe Institute, Santa Fe, NM

(*) Mailing Address: Peter F. Stadler, Inst. f. Theoretische Chemie, Univ. Wien, Währingerstr. 17, A-1090 Wien, Austria. Phone: [43] 1 40480 665; Fax: [43] 1 40480 660; E-Mail: [email protected]

Keywords: Correlation Functions | Fourier Series | Landscapes | Laplace Operator | Random Walk | RNA Folding

R. Happel & P.F. Stadler: Approximation of Landscapes

Abstract

We present a method for approximating a fitness landscape as a superposition of "elementary" landscapes. Given the correlation function of the landscape in question, we show that the relative amplitudes of the contributions from p-ary interactions can be computed. As an application, we consider RNA free energy landscapes.

1. Fitness Landscapes

Since Sewall Wright's seminal paper [1], the notion of a fitness landscape underlying the dynamics of evolutionary optimization has proved to be one of the most powerful concepts in evolutionary theory. Implicit in this idea is a collection of genotypes arranged in an abstract metric space, in which each genotype is adjacent to all genotypes that can be reached from it by a single mutation, together with a value assigned to each genotype. Such a construction is by no means restricted to the theory of biological evolution: Hamiltonians of disordered systems such as spin glasses [2, 3], and the cost functions of combinatorial optimization problems [4], have the same mathematical structure. Eigen's [5] pioneering work on the molecular quasispecies has shown that the dynamics of evolutionary adaptation (optimization) on a landscape depends crucially on the detailed structure of the landscape itself. Extensive computer simulations, see, e.g., [6, 7, 8, 9, 10], have made it very clear that a complete understanding of the dynamics is impossible without a thorough investigation of the underlying landscape [11].

A theory of landscapes is based on three ingredients. The first two are a finite (but very large) set $V$ of configurations and a fitness function $f: V \to \mathbb{R}$. The third is a notion of neighborhood between configurations, which allows us to interpret $V$ as the vertex set of a graph $\Gamma$ whose edge set $E$ is defined by the neighborhood relation, i.e., two vertices are joined by an edge in $\Gamma$ if and only if they are neighbors. We will refer to $\Gamma$ as the configuration space of the landscape $f$. In this contribution we shall mostly consider configurations that are sequences of constant length $n$ over an alphabet of $\alpha$ letters. In this special case $\Gamma$ is called a sequence space [12]; for a 2-letter alphabet one speaks of a boolean hypercube. We remark that the theory outlined in section 2 also applies to more general classes of graphs; for the technical details we refer to [13, 14].
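As a concrete illustration of a configuration space, the following minimal Python sketch builds a small sequence space, assuming that two sequences are neighbors exactly when they differ at a single position (one point mutation). Since each of the $n$ positions can be changed to any of the $\alpha - 1$ other letters, every sequence then has $(\alpha - 1)n$ neighbors. The helper names `point_mutants` and `sequence_space` are hypothetical and chosen for this example only.

```python
from itertools import product

def point_mutants(seq, alphabet):
    """All sequences that differ from seq at exactly one position (Hamming distance 1)."""
    mutants = []
    for i, letter in enumerate(seq):
        for other in alphabet:
            if other != letter:
                mutants.append(seq[:i] + other + seq[i + 1:])
    return mutants

def sequence_space(n, alphabet):
    """Vertex set V and edge set E of the sequence space of length-n strings over alphabet."""
    vertices = ["".join(letters) for letters in product(alphabet, repeat=n)]
    edges = set()
    for x in vertices:
        for y in point_mutants(x, alphabet):
            edges.add(frozenset((x, y)))  # undirected edge {x, y}
    return vertices, edges

# RNA-like alphabet (alpha = 4 letters), sequences of length n = 3.
V, E = sequence_space(3, "ACGU")
print(len(V), len(E))  # prints: 64 288, i.e. 4**3 vertices and 64 * (4 - 1) * 3 / 2 edges

# Boolean hypercube: 2-letter alphabet, n = 4.
cube_V, cube_E = sequence_space(4, "01")
print(len(cube_V), len(cube_E))  # prints: 16 32
```

Storing each edge as a `frozenset` makes $\{x, y\}$ and $\{y, x\}$ the same undirected edge, matching the definition of $E$ as a set of neighbor pairs.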

2. Approximation Theory for Landscapes

2.1. Fourier Expansions and Elementary Landscapes

In the following we shall need an algebraic description of the configuration space $\Gamma$. The most useful algebraic representation of a graph is its adjacency matrix $\mathbf{A}$, whose entries are

$$A_{xy} \stackrel{\text{def}}{=} \begin{cases} 1 & \text{if } \{x, y\} \in E \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

i.e., $A_{xy} = 1$ if and only if the vertices $x$ and $y$ are neighbors. The number of neighbors of a vertex $x \in V$ is called the degree of $x$. Let $\mathbf{D}$ be the diagonal matrix of the vertex degrees. Sequence spaces are regular, i.e., all vertices have the same number of neighbors, namely $D = (\alpha - 1)n$. For regular graphs we have, of course, $\mathbf{D} = D\,\mathbf{I}$, where $\mathbf{I}$ is the identity matrix and $D$ is the common degree of all vertices. Instead of the adjacency matrix, it is often convenient to characterize a graph $\Gamma$ by its Laplacian

$$-\Delta \stackrel{\text{def}}{=} \mathbf{D} - \mathbf{A} \qquad (2)$$

For regular graphs this definition simplifies to $\Delta = \mathbf{A} - D\,\mathbf{I}$. The graph Laplacian $\Delta$ can be viewed as a generalization of the familiar differential operator $\Delta$ of continuous spaces; for the details we refer to [15, 16, 17, 18]. The operator $-\Delta$ is non-negative definite, and its smallest eigenvalue $\lambda_0 = 0$ has multiplicity 1 if and only if $\Gamma$ is connected. In what follows we assume that $\Gamma$ is connected. In this case every eigenvector with eigenvalue $\lambda = 0$ is constant on $V$.
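The following Python sketch (helper names hypothetical) builds the adjacency matrix (1) and $-\Delta = \mathbf{D} - \mathbf{A}$ from (2) for one small sequence space with $\alpha = 3$ and $n = 3$, and illustrates the properties stated above on that example. The non-negativity check relies on the standard identity $f^{T}(\mathbf{D} - \mathbf{A})f = \sum_{\{x,y\} \in E} (f(x) - f(y))^2$. The same identity explains the statement about connectedness: the form vanishes exactly when $f$ is constant on every connected component, so the eigenvalue 0 is simple if and only if $\Gamma$ is connected.

```python
import random
from itertools import product

def adjacency_matrix(n, alphabet):
    """Vertices (length-n strings) and adjacency matrix A of the sequence space, eq. (1)."""
    V = ["".join(t) for t in product(alphabet, repeat=n)]
    index = {x: k for k, x in enumerate(V)}
    A = [[0] * len(V) for _ in V]
    for x in V:
        for i in range(n):
            for a in alphabet:
                if a != x[i]:
                    y = x[:i] + a + x[i + 1:]  # point mutant of x at position i
                    A[index[x]][index[y]] = 1  # A_xy = 1 iff {x, y} is an edge
    return V, A

def minus_laplacian(A):
    """-Delta = D - A, with D the diagonal matrix of vertex degrees, eq. (2)."""
    N = len(A)
    deg = [sum(row) for row in A]
    return [[(deg[i] if i == j else 0) - A[i][j] for j in range(N)] for i in range(N)]

def quadratic_form(M, f):
    """f^T M f for a square matrix M given as a list of rows."""
    N = len(f)
    return sum(f[i] * M[i][j] * f[j] for i in range(N) for j in range(N))

n, alphabet = 3, "ACG"  # alpha = 3, so every vertex should have degree (3 - 1) * 3 = 6
V, A = adjacency_matrix(n, alphabet)
L = minus_laplacian(A)

degrees = {sum(row) for row in A}
print(degrees)  # prints: {6}, i.e. the graph is regular

# Each row of D - A sums to zero, so constant vectors are eigenvectors with eigenvalue 0.
print(all(sum(row) == 0 for row in L))  # prints: True

# f^T (D - A) f equals the sum of (f(x) - f(y))^2 over all edges {x, y}, hence is >= 0.
random.seed(0)
f = [random.uniform(-1.0, 1.0) for _ in V]
edge_sum = sum((f[i] - f[j]) ** 2
               for i in range(len(V)) for j in range(i + 1, len(V)) if A[i][j])
print(abs(quadratic_form(L, f) - edge_sum) < 1e-9)  # prints: True
```

Dense lists of lists are used only to keep the sketch dependency-free; for realistic sequence lengths $|V| = \alpha^n$ is far too large for explicit matrices, and one would apply $-\Delta$ through the neighbor relation instead.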


A series expansion in terms of a complete orthonormal system of eigenfunctions of the Laplace operator is commonly termed a Fourier expansion. Following [19], we adopt the same terminology here. Thus, let $f$ be a landscape on $\Gamma$ and let $\{\varphi_i\}$ denote a complete orthonormal set of eigenvectors of the graph Laplacian.¹ Then we call the expansion

$$f(x) = \sum_{i=1}^{|V|} a_i\, \varphi_i(x) \qquad (3)$$

the Fourier expansion of the landscape. A landscape is called elementary if it is of the form $f(x) = c + \varphi(x)$, where $c$ is an arbitrary constant and $\varphi$ is an eigenvector of the graph Laplacian belonging to an eigenvalue $\lambda > 0$, see [20, 14]. Elementary landscapes form an important class because the landscapes of the most intensively studied combinatorial optimization problems, such as the travelling salesman problem [21], the graph bipartitioning problem, and the graph coloring problem, are of this type [20].
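To make the expansion (3) and the definition of an elementary landscape concrete, the following Python sketch works on the boolean hypercube $\{+1, -1\}^n$. It assumes the orthonormal eigenbasis $\varphi_S(x) = 2^{-n/2} \prod_{i \in S} x_i$, i.e., the products of eq. (4) normalized to unit length; because the basis is orthonormal, each coefficient is simply $a_S = \sum_x f(x)\,\varphi_S(x)$. The elementary-landscape test follows the definition directly: the constant $c$ must equal the mean $\bar f$ of $f$ (eigenvectors with $\lambda > 0$ are orthogonal to the constant vector), so the code checks whether $-\Delta(f - \bar f) = \lambda (f - \bar f)$ holds for a single $\lambda > 0$. The functions `walsh`, `fourier_coefficients`, `apply_minus_laplacian`, and `is_elementary`, and the two example landscapes, are hypothetical illustrations.

```python
from itertools import combinations, product
from math import prod, sqrt

def hypercube(n):
    """All configurations x in {+1, -1}^n."""
    return list(product((1, -1), repeat=n))

def walsh(x, S):
    """Normalized basis function phi_S(x) = 2^(-n/2) * prod_{i in S} x_i; S = () is the constant."""
    return prod(x[i] for i in S) / sqrt(2 ** len(x))

def fourier_coefficients(f, n):
    """Coefficients a_S = sum_x f(x) phi_S(x) of expansion (3); valid since the basis is orthonormal."""
    X = hypercube(n)
    return {S: sum(f(x) * walsh(x, S) for x in X)
            for p in range(n + 1) for S in combinations(range(n), p)}

def apply_minus_laplacian(g, x):
    """((D - A) g)(x) = sum over the n neighbors y of x of (g(x) - g(y))."""
    total = 0.0
    for i in range(len(x)):
        y = x[:i] + (-x[i],) + x[i + 1:]  # neighbor obtained by flipping coordinate i
        total += g(x) - g(y)
    return total

def is_elementary(f, n, tol=1e-9):
    """True if f - mean(f) is an eigenvector of -Delta for a single eigenvalue lambda > 0."""
    X = hypercube(n)
    mean = sum(f(x) for x in X) / len(X)

    def g(x):
        return f(x) - mean

    pairs = [(g(x), apply_minus_laplacian(g, x)) for x in X]
    gx_ref, lg_ref = max(pairs, key=lambda pair: abs(pair[0]))
    if abs(gx_ref) < tol:
        return False  # f is constant: no eigenvalue lambda > 0 is involved
    lam = lg_ref / gx_ref
    return lam > tol and all(abs(lg - lam * gx) < tol for gx, lg in pairs)

def f_pair(x):
    # constant plus products of order p = 2 only
    return 5 + x[0] * x[1] - 2 * x[2] * x[3]

def f_mixed(x):
    # mixes a term of order 1 with a term of order 2
    return x[0] + x[1] * x[2]

a = fourier_coefficients(f_pair, 4)
print({S: v for S, v in a.items() if abs(v) > 1e-9})  # prints: {(): 20.0, (0, 1): 4.0, (2, 3): -8.0}
print(is_elementary(f_pair, 4), is_elementary(f_mixed, 4))  # prints: True False
```

Here `f_pair` contains only products of order $p = 2$ (plus a constant), so it is elementary with eigenvalue $\lambda = 4$, whereas `f_mixed` combines orders 1 and 2, which belong to different eigenvalues, and is therefore not elementary.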

For boolean hypercubes defined on the alphabet $\{+1, -1\}$, a Fourier basis can easily be determined explicitly. Let $x \in \{+1, -1\}^n$. It is then easy to verify that each product of exactly $p$ factors of the form

$$\varepsilon_{i_1 i_2 \dots i_p}(x) \stackrel{\text{def}}{=} x_{i_1} x_{i_2} \cdots x_{i_p} \quad \text{with } i_1 < i_2 < \dots < i_p \qquad (4)$$

is an eigenvector of $-\Delta$ belonging to the eigenvalue $\lambda_p = 2p$. In particular, Derrida's $p$-spin models

Hp(x)

def

===

X

i1