Path Weights, Networked Partial Correlations and their Application to the Analysis of Genetic Interactions

Alberto Roverato, Università di Bologna, Italy ([email protected])
Robert Castelo, Universitat Pompeu Fabra, Barcelona, Spain ([email protected])

September 30, 2015

arXiv:1510.02510v1 [q-bio.QM] 8 Oct 2015
Abstract

Gene coexpression is a common feature employed in predicting buffering relationships that explain genetic interactions, which constitute an important mechanism behind the robustness of cells to genetic perturbations. The complete removal of such buffering connections impacts the entire molecular circuitry, ultimately leading to cellular death. Coexpression is commonly measured through Pearson correlation coefficients. However, Pearson correlation values are sensitive to indirect effects and often partial correlations are used instead. Yet, partial correlation values convey no information on the (linear) influence of the association within the entire multivariate system or, in other words, of the represented edge within the entire network. Jones and West (2005) showed that covariance can be decomposed into the weights of the paths that connect two variables within the corresponding undirected network. Here we provide a precise interpretation of path weights and show that, in the particular case of single-edge paths, this interpretation leads to a quantity we call networked partial correlation, whose value depends on both the partial correlation between the intervening variables and their association with the rest of the multivariate system. We show that this new quantity correlates better with quantitative genetic interactions in yeast than classical coexpression measures.
Keywords: Covariance decomposition; Concentration matrix; Gene coexpression; Partial correlation; Undirected graphical model.
1 Introduction
The simultaneous expression of two genes, known as gene coexpression, is a proxy for the presence of a functional association between them. The high-throughput profiling of expression for thousands of genes in parallel provides multivariate data whose analysis with clustering techniques and graphical models has proven to be useful for exploring gene coexpression in terms of gene network representations of the data. This has been exploited in
a number of applications ranging from inferring function in poorly characterized genes to predicting buffering relationships behind genetic interactions (Eisen et al., 1998; Friedman, 2004; Wong et al., 2004). In graphical Gaussian models a key role is played by the partial covariance, because if a pair of variables is not joined by an edge in the network then the corresponding partial covariance is equal to zero (Lauritzen, 1996). Any partial covariance can be normalized to obtain a partial correlation that, in this context, is typically regarded as the natural measure of the strength of the association between genes encoded by the corresponding edge in the network. However, partial correlation is a measure of coexpression between genes, whereas the term genetic interaction is typically used to refer to a wide range of different biological meanings. For instance, interactions resulting from the simultaneous deletion of two genes, such as synthetic sick or lethal (SSL) genetic interactions (Tucker and Fields, 2003), are important for understanding how an organism tolerates random mutation and therefore its genetic robustness. In this context, it makes sense to use graphical Gaussian models because gene coexpression plays a fundamental role in SSL interactions (Nijman, 2011). On the other hand, the information provided by the value of a partial correlation is not sufficient to describe the strength of the interaction represented by the edge of the graph. Indeed, robustness depends on several different factors involving not only coexpression, but also the functional relationships between the pair of genes of interest and the remaining genes in the system. In the analysis of graphical Gaussian models, Jones and West (2005) associated a weight to every path in the network and showed that the covariance between two variables can be computed as the sum of the weights of all the paths joining the two variables.
Jones and West (2005) envisaged that this covariance decomposition would provide an alternative to the path coefficients of Wright (1921), which provide a covariance decomposition along the paths between two variables in the context of directed graphs (see also Chen and Pearl, 2015, for a recent review). The path weights of Jones and West (2005) allow one to compare paths with common endpoints and, therefore, to identify the relative contribution of such a path to the value of the corresponding covariance. However, it is not clear either how to interpret the value of a single path or how to compare two paths with different endpoints. In this paper we consider the covariance decomposition of Jones and West (2005) and provide an interpretation of the value taken by the weight of a path. We identify a class of paths, called minimal paths, whose weights have a straightforward interpretation. Furthermore, the weight of a minimal path can be normalized to compare paths with different endpoints within the same graph. Any single edge of the network can be regarded as a path and thus it has a weight associated with it. Hence, we can use such a weight as a measure of the strength of the interaction represented by the edge of the network. This is of special interest because any edge is a minimal path and, therefore, its weight has a clear interpretation. In fact, the interpretation of the edge weights suggests that we name these quantities networked partial covariances. We then provide a decomposition of the networked partial covariance into the sum of path weights of certain paths of the network. Since the covariance turns out to be a special case of the networked partial covariance, the latter result generalizes the covariance decomposition of Jones and West (2005). This suggests that, as far as undirected graphs are concerned, the networked partial covariance is a natural generalization of the covariance. The networked partial covariance can be normalized to obtain the networked partial correlation, whose value depends both on the corresponding partial correlation, and therefore on the coexpression of the genes, and on the strength of the association between the two genes and the remaining genes. We show that the latter provides a useful alternative to the partial correlation as a measure of edge-interaction whenever the relevance of the edge also depends on the connections existing between the endpoints of the edge and the other vertices in the network. This is the case, for instance, of robustness, and we present an application where we show that a biological measure of robustness in yeast is better predicted by the networked partial correlation than by either partial correlation or Pearson correlation. This paper is organized as follows. Sections 2 and 3 provide the required background on genetic interactions, undirected graphical models, path weights and the partial vector correlation coefficient. The interpretation of path weights is discussed in Section 4, whereas Section 5 introduces the networked partial covariance and correlation. The limited-order networked partial covariance and its decomposition are given in Section 6. Section 7 presents the application to yeast data and, finally, Section 8 contains a brief discussion.
2 Genetic interactions
The deletion of individual genes in model organisms, such as the budding yeast Saccharomyces cerevisiae, produces mutants that cannot express the knocked-out gene, constituting one of the primary tools in experimental genetics to elucidate gene function. However, the systematic culture of yeast single-gene mutants has revealed that the majority of its about 6000 genes are dispensable, because no sizeable effect on fitness can be observed among the corresponding mutants (Winzeler et al., 1999). An explanation for this observation is the presence of buffering relationships between pairs of genes, by which the absence of one gene is counterbalanced by the expression of its partner. The simultaneous deletion of two genes produces a so-called double-mutant organism. When the change in fitness of a double mutant significantly deviates from the expected change resulting from the combination of the two single-mutant fitness effects, then one concludes that there is a so-called genetic interaction between these two genes. The extreme case of this phenomenon, known as synthetic lethality, occurs when two single mutants are viable but the genetic interaction of the two knocked-out genes leads to cell death. This concept has been exploited in the field of cancer research to tackle resistance to chemotherapeutics by trying to target multiple oncogenes simultaneously (Luo et al., 2009).
Genetic interactions can be experimentally identified by measuring the deviation in fitness between the expected effect of combining two single mutants and the observed effect of the corresponding double mutant (Baryshnikova et al., 2010). Yet, producing an exhaustive catalog of single and double mutants to enable the exploration of all possible genetic interactions is only feasible in model organisms with a moderate number of genes, such as yeast. For this reason, it is important to have computational tools that enable predicting genetic interactions in larger model organisms (Eddy, 2006) or in non-model ones, such as human (Deshpande et al., 2013), where in addition to the reduced possibilities for genetic manipulation, the number of possible gene pairs can be 10-fold larger. Existing approaches to this problem attempt to predict genetic interactions using multiple biological features such as protein function and localization, homology relationships, protein-protein interactions and gene coexpression measured by Pearson correlation coefficients (Wong et al., 2004; Zhong and Sternberg, 2006; Conde-Pueyo et al., 2009; Deshpande et al., 2013). While gene coexpression is commonly identified using Pearson correlation coefficients, the marginal nature of this quantity often leads to spurious associations resulting from indirect effects and nonbiological sources of variation. To address this problem, we can employ partial correlation coefficients, which hold fixed the values of the rest of the variables, effectively adjusting for unwanted variation. However, partial correlation restricts the information it provides to the relationship between the two genes at hand. In the rest of this article, we are going to show that decomposing the covariance between two genes into the paths that connect them provides additional information that improves the use of gene coexpression as a proxy for the presence of a genetic interaction.
3 Notation and background

3.1 Undirected graphical models
Let X ≡ X_V be a random vector indexed by a finite set V = {1, . . . , p} so that, for A ⊆ V, X_A is the subvector of X indexed by A. The random vector X_V has probability distribution P_V; we denote the covariance matrix of X_V by Σ = Σ_{VV} = {σ_{uv}}_{u,v∈V} and the concentration (or precision) matrix by Σ^{-1} = K = {κ_{uv}}_{u,v∈V}. For B ⊆ V with A ∩ B = ∅ the partial covariance matrix

Σ_{AA·B} = Σ_{AA} − Σ_{AB} Σ_{BB}^{-1} Σ_{BA}

is the covariance matrix of X_A|X_B, that is, of the residual vector deriving from the linear least-squares predictor of X_A from X_B (see Whittaker, 1990, p. 134). Recall that, in the Gaussian case, Σ_{AA·B} coincides with the covariance matrix of the conditional distribution of X_A given X_B. We use the convention that we write Σ^{-1}_{AA} when the submatrix extraction is performed before the inversion, that is Σ^{-1}_{AA} = (Σ_{AA})^{-1} and, similarly, Σ^{-1}_{AA·B} = (Σ_{AA·B})^{-1}. We write Ā = V\A to denote the complement of a subset A with respect to V and recall that, from the rule for the inversion of a partitioned matrix, Σ^{-1}_{AA·Ā} = K_{AA} and, accordingly, Σ^{-1}_{AA} = K_{AA·Ā}.

An undirected graph with vertex set V is a pair G = (V, E) where E is a set of edges,
which are unordered pairs of vertices; formally E ⊆ V × V. The graphs we consider have no self-loops, that is {v, v} ∉ E for any v ∈ V. A path between x and y in G is a sequence π = ⟨x = v_1, . . . , v_k = y⟩ of k ≥ 2 distinct vertices such that {v_i, v_{i+1}} ∈ E for every i = 1, . . . , k − 1, and we denote by Π_{xy} the collection of all paths from x to y in G. A cycle is a path with the modification that x = y, and a tree is a connected graph with no cycles or, equivalently, an undirected graph with |Π_{xy}| = 1 for every x, y ∈ V, x ≠ y. We say that a graph G′ = (V′, E′) is a subgraph of G if V′ ⊆ V and E′ ⊆ E. The subgraph of G induced by A ⊆ V is the undirected graph G_A with vertex set A and edges E_A = {{u, v} ∈ E : u, v ∈ A}. We denote by V(π) ⊆ V and E(π) ⊆ E the set of vertices and edges of the path π, respectively, and by G(π) the subgraph of G corresponding to the path π; formally, G(π) = (V(π), E(π)). Furthermore, when clear from the context, in order to improve the readability of sub- and super-scripts we will set P ≡ V(π). We say that the concentration matrix K of X_V implies the graph G = (V, E) if every nonzero off-diagonal entry of K corresponds to an edge in G. The concentration graph model (Cox and Wermuth, 1996) with graph G is the family of multivariate normal distributions whose concentration matrix implies G. The latter model has also been called a covariance selection model (Dempster, 1972) and a graphical Gaussian model (Whittaker, 1990); we refer the reader to Lauritzen (1996) for details and discussion.
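The partial covariance and the partitioned-inverse identities above can be checked numerically. The following is a minimal NumPy sketch, not from the paper; the 4-variable covariance matrix is a hypothetical illustrative choice.

```python
# Numerical check of Sigma_{AA.B} and the identities Sigma_{AA.Abar}^{-1} = K_AA
# and Sigma_AA^{-1} = K_{AA.Abar}, on a hypothetical 4-variable system.
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4))
Sigma = M @ M.T + 4 * np.eye(4)   # positive definite covariance matrix
K = np.linalg.inv(Sigma)          # concentration (precision) matrix

A = [0, 1]       # subset A (0-based indices)
Abar = [2, 3]    # complement of A in V

def partial_cov(S, A, B):
    """Sigma_{AA.B} = Sigma_AA - Sigma_AB Sigma_BB^{-1} Sigma_BA."""
    SAA = S[np.ix_(A, A)]
    SAB = S[np.ix_(A, B)]
    SBB = S[np.ix_(B, B)]
    return SAA - SAB @ np.linalg.solve(SBB, SAB.T)

# Partitioned-inverse identity 1: (Sigma_{AA.Abar})^{-1} = K_AA
assert np.allclose(np.linalg.inv(partial_cov(Sigma, A, Abar)), K[np.ix_(A, A)])

# Partitioned-inverse identity 2: (Sigma_AA)^{-1} = K_{AA.Abar}
assert np.allclose(np.linalg.inv(Sigma[np.ix_(A, A)]), partial_cov(K, A, Abar))
```

Both identities are instances of the Schur-complement rule for inverting a partitioned matrix.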
3.2 Path weights
Let V be a finite set and G = (V, E) be an undirected graph. Furthermore, let π be a path from x to y in G and Γ ≡ Γ_{VV} a positive definite matrix indexed by the elements of V. We set

ω(π, Γ) ≡ (−1)^{|P|+1} |Γ_{PP}| ∏_{{u,v}∈E(π)} {Γ^{-1}}_{uv}    (1)
where |P| denotes the cardinality of P = V(π) whereas |Γ_{PP}| is the determinant of Γ_{PP}. Jones and West (2005) introduced (1) in an alternative formulation that relies on the equality

|Γ_{PP}| = |Θ_{P̄P̄}| / |Θ|    (2)

where Θ = Γ^{-1}, and with the convention that |Θ_{P̄P̄}| = 1 whenever P̄ = ∅.

Theorem 3.1 (Jones and West (2005)). Let K = Σ^{-1} be the concentration matrix of X_V. If K implies the graph G = (V, E) then for every x, y ∈ V it holds that

σ_{xy} = ∑_{π∈Π_{xy}} ω(π, Σ)    (3)

where it follows from (1) and (2) that

ω(π, Σ) = (−1)^{|P|+1} (|K_{P̄P̄}| / |K|) ∏_{{u,v}∈E(π)} κ_{uv}.    (4)
We call ω(π, Σ) in (4) the (full) path weight of π relative to X_V. Furthermore, we will refer to (3) as the covariance decomposition over G because it gives a decomposition of σ_{xy} into the sum of the path weights of all the paths connecting the two vertices in G. Accordingly, the weight ω(π, Σ) represents the contribution of the path π to the covariance σ_{xy}. It follows from equation (5) below that the term (−1)^{|P|+1} in (4) is such that ω(π, Σ) has the same sign as the product of the partial correlations corresponding to the edges of the path but, otherwise, there is no clear interpretation associated with the value taken by a path weight. Finally, we recall that another interesting decomposition of the covariance in Gaussian models in terms of walk-weights can be found in Malioutov et al. (2006) and references therein. Unlike paths, walks can cross an edge multiple times.
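The covariance decomposition (3) can be verified numerically by brute force on a small graph. The sketch below is not from the paper: the 4-vertex concentration matrix (with edges {0,1}, {1,2}, {0,2}, {2,3}) and the exhaustive path enumeration are hypothetical illustrative choices.

```python
# Verify Eq. (3): sigma_xy equals the sum of the path weights of Eq. (4)
# over all paths joining x and y, on a small hypothetical graph.
import numpy as np
from itertools import permutations

K = np.array([[ 1.0, -0.3, -0.3,  0.0],
              [-0.3,  1.0, -0.3,  0.0],
              [-0.3, -0.3,  1.0, -0.3],
              [ 0.0,  0.0, -0.3,  1.0]])   # implies edges {0,1},{1,2},{0,2},{2,3}
Sigma = np.linalg.inv(K)
edges = {(0, 1), (1, 2), (0, 2), (2, 3)}

def path_weight(path, K):
    """Eq. (4): omega(pi, Sigma) = (-1)^{|P|+1} (|K_PbarPbar|/|K|) prod kappa_uv."""
    Pbar = sorted(set(range(K.shape[0])) - set(path))
    detPbar = np.linalg.det(K[np.ix_(Pbar, Pbar)]) if Pbar else 1.0
    prod = np.prod([K[u, v] for u, v in zip(path, path[1:])])
    return (-1) ** (len(set(path)) + 1) * detPbar / np.linalg.det(K) * prod

def all_paths(x, y, edges, n):
    """Enumerate simple paths from x to y by brute force (fine for tiny graphs)."""
    und = {frozenset(e) for e in edges}
    paths = []
    for k in range(2, n + 1):
        for mid in permutations(set(range(n)) - {x, y}, k - 2):
            cand = (x, *mid, y)
            if all(frozenset(p) in und for p in zip(cand, cand[1:])):
                paths.append(cand)
    return paths

paths = all_paths(0, 3, edges, 4)       # here: <0,2,3> and <0,1,2,3>
total = sum(path_weight(p, K) for p in paths)
assert np.isclose(total, Sigma[0, 3])   # covariance decomposition, Eq. (3)
```

On this graph the two paths from 0 to 3 contribute with the same (positive) sign, as predicted by the sign rule discussed above, since every edge carries a positive partial correlation.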
3.3 The partial vector correlation coefficient
We denote by ρ_{xy} the correlation coefficient of the variables X_x and X_y, with x, y ∈ V. Furthermore, we write ρ_{xy·V\{x,y}} to denote the partial correlation coefficient of X_x and X_y given X_{V\{x,y}}, and recall that

ρ_{xy·V\{x,y}} = −κ_{xy} / √(κ_{xx} κ_{yy})    (5)
(see Lauritzen, 1996, p. 130). In the literature, different quantities have been introduced in order to provide a generalization of the concept of (partial) correlation from pairs of variables to pairs of vectors; see Mardia et al. (1979, Section 6.5.4), Timm (2002, p. 485) and Kim and Timm (2006, Section 5.6) for a review of measures of correlation between vectors. The rest of this section is devoted to a coefficient, called the partial vector correlation, which arises naturally in the theory of path weights developed in this paper. For a pair A, B ⊆ V, with A ∩ B = ∅, Hotelling (1936) introduced the vector alienation coefficient defined as λ_{(A)(B)} ≡ |Σ_{A∪B A∪B}| / (|Σ_{AA}| × |Σ_{BB}|). Notice that the sampling version of λ_{(A)(B)} is Wilks' lambda, used to test the independence of X_A and X_B under normality. Furthermore,

λ_{(A)(B)} = ∏_{i=1}^{r} (1 − ϱ_i²)    (6)

where ϱ_i, for i = 1, . . . , r, is the i-th canonical correlation between X_A and X_B and r = min(|A|, |B|); see Mardia et al. (1979, Section 6.5.4) and Timm (2002, p. 485). The vector alienation coefficient was used by Rozeboom (1965) to define the vector correlation coefficient given by

ρ_{(A)(B)} ≡ √(1 − λ_{(A)(B)})    (7)

and it is easy to check that, for A = {x}, ρ²_{(A)(B)} coincides with the square of the multiple correlation coefficient, so that if also B = {y} then ρ²_{(A)(B)} = ρ²_{xy} (see also Timm, 2002, p. 485).
Consider a subset C ⊆ V such that A ∩ C = B ∩ C = ∅. Rozeboom (1965) generalized (7) to the partial vector correlation coefficient as follows:

ρ_{(A)(B)·C} = √(1 − λ_{(A)(B)·C})    where    λ_{(A)(B)·C} = |Σ_{A∪B A∪B·C}| / (|Σ_{AA·C}| |Σ_{BB·C}|).    (8)
We remark that the covariance matrices we consider are assumed to be positive definite so that 0 ≤ ρ_{(A)(B)·C} < 1. Furthermore, ρ_{(A)(B)} = ρ_{(A)(B)·∅}, and we use the convention that ρ_{(A)(B)·C} = 0 whenever either A = ∅ or B = ∅. Note that, for A = {x} and B = {y}, it holds that ρ²_{(A)(B)·C} = ρ²_{xy·C}, that is, the square of the partial correlation.
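The determinant form of λ in (8), its canonical-correlation form in (6), and the scalar special cases of (5) and (8) can all be cross-checked numerically. The following is an illustrative NumPy sketch, not from the paper; the 6-variable covariance matrix and the index sets are hypothetical.

```python
# Cross-check Eqs. (5)-(8) on a hypothetical 6-variable covariance matrix.
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(6, 6))
Sigma = M @ M.T + 6 * np.eye(6)

def partial_cov(S, A, B):
    """Sigma_{AA.B}; with B empty, just the marginal covariance Sigma_AA."""
    if not B:
        return S[np.ix_(A, A)]
    return S[np.ix_(A, A)] - S[np.ix_(A, B)] @ np.linalg.solve(
        S[np.ix_(B, B)], S[np.ix_(B, A)])

def rho_vec(S, A, B, C=()):
    """Partial vector correlation rho_{(A)(B).C}, Eq. (8)."""
    C = list(C)
    lam = np.linalg.det(partial_cov(S, list(A) + list(B), C)) / (
        np.linalg.det(partial_cov(S, list(A), C)) *
        np.linalg.det(partial_cov(S, list(B), C)))
    return np.sqrt(1.0 - lam)

A, B, C = [0, 1], [2, 3], [4, 5]
r = rho_vec(Sigma, A, B, C)
assert 0.0 <= r < 1.0

# Eq. (6): lambda_(A)(B) = prod_i (1 - varrho_i^2), with the squared canonical
# correlations obtained as eigenvalues of Sigma_AA^{-1} Sigma_AB Sigma_BB^{-1} Sigma_BA.
SAA = Sigma[np.ix_(A, A)]; SBB = Sigma[np.ix_(B, B)]; SAB = Sigma[np.ix_(A, B)]
can2 = np.real(np.linalg.eigvals(np.linalg.solve(SAA, SAB) @ np.linalg.solve(SBB, SAB.T)))
lam_cc = np.prod(1.0 - can2)
lam_det = np.linalg.det(Sigma[np.ix_(A + B, A + B)]) / (np.linalg.det(SAA) * np.linalg.det(SBB))
assert np.isclose(lam_cc, lam_det)

# Scalar special case: for A = {x}, B = {y}, C = V \ {x,y}, rho_(A)(B).C equals
# |rho_{xy.C}|, with the partial correlation taken from Eq. (5).
K = np.linalg.inv(Sigma)
rho_xy = -K[0, 1] / np.sqrt(K[0, 0] * K[1, 1])
assert np.isclose(rho_vec(Sigma, [0], [1], [2, 3, 4, 5]), abs(rho_xy))
```

The last assertion also illustrates why ρ_{(A)(B)·C} is nonnegative: being a square root, it recovers the partial correlation only up to its absolute value.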
4 Interpretation and comparison of path weights
In this section we show that a path weight can be written as the ratio between a partial path weight and a simple function of a partial vector correlation coefficient. The analysis of these two components allows us to gain insight into the meaning of path weights and the kind of information they provide. Then, we identify a class of paths whose weights can be normalized to obtain a measure of association that can be used to compare paths that do not have the same endpoints. Let K = Σ^{-1} be the concentration matrix of X_V. If K implies the graph G = (V, E), and π is a path between x and y in G, then it follows that π is also a path in G_A = (A, E_A) for any A such that V(π) ⊆ A ⊆ V. On the other hand, the concentration matrix K_{AA} = Σ^{-1}_{AA·Ā} of X_A|X_Ā implies the subgraph G_A and, therefore, it makes sense to compute the weight of π with respect to the distribution of X_A|X_Ā, that is ω(π, Σ_{AA·Ā}), which we call the partial weight of π relative to X_A|X_Ā. More specifically, it follows from (1) and (2) that

ω(π, Σ_{AA·Ā}) = (−1)^{|P|+1} |Σ_{PP·Ā}| ∏_{{u,v}∈E(π)} κ_{uv} = (−1)^{|P|+1} (|K_{A\P A\P}| / |K_{AA}|) ∏_{{u,v}∈E(π)} κ_{uv}.
An immediate consequence of Theorem 3.1 is that ω(π, Σ_{AA·Ā}) represents the contribution of the path π to the partial covariance σ_{xy·Ā}.

Corollary 4.1. Let K = Σ^{-1} be the concentration matrix of X_V. If K implies the graph G = (V, E), then for every A ⊆ V and x, y ∈ A, x ≠ y, it holds that

σ_{xy·Ā} = ∑_{π∈Π_{xy}: V(π)⊆A} ω(π, Σ_{AA·Ā}).    (9)

Proof. The result follows by applying Theorem 3.1 to the distribution of X_A|X_Ā and noticing that the set of paths π ∈ Π_{xy} such that V(π) ⊆ A coincides with the set of all paths between x and y in G_A.
It is important to remark that equation (9) can be regarded as a generalization of the covariance decomposition in (3), as it coincides with the latter when A = V. Hence, we refer to (9) as the partial covariance decomposition over G. The theory of path weights we develop here relies on the connection existing between full path weights and partial path weights. The following result provides a recursive rule to compute the partial weight of a path π relative to X_A|X_Ā from the partial weight of π relative to X_{A′}|X_{Ā′} for any A′ such that A ⊆ A′. An interesting consequence of it is that the partial vector correlation coefficient arises naturally as an updating multiplicative constant.

Lemma 4.2. Let K = Σ^{-1} be the concentration matrix of X_V and, furthermore, assume that K implies the graph G = (V, E) and let π be a path in G. If A and A′ is a pair of sets such that V(π) ⊆ A ⊆ A′ ⊆ V then it holds that

ω(π, Σ_{AA·Ā}) = ω(π, Σ_{A′A′·Ā′}) × {1 − ρ²_{(P)(A′\A)·Ā′}}

where P = V(π). Hence,

sgn{ω(π, Σ_{AA·Ā})} = sgn{ω(π, Σ_{A′A′·Ā′})}    and    |ω(π, Σ_{AA·Ā})| ≤ |ω(π, Σ_{A′A′·Ā′})|

where, for nonzero weights, the latter inequality is an equality if and only if ρ_{(P)(A′\A)·Ā′} = 0.

Proof. See Appendix A.

By applying Lemma 4.2 with A′ = V we can see that ω(π, Σ_{AA·Ā}) = ω(π, Σ) × {1 − ρ²_{(P)(Ā)}}. This shows that for any A ⊆ V the partial weight of the path π relative to X_A|X_Ā can be obtained as the product of the full weight of π by a term that is a function of the linear association between the variables indexed by the vertices in the path P = V(π) and the variables indexed by Ā. Next, we notice that for a given path π one can compute a weight ω(π, Σ_{AA·Ā}) for every A such that P ⊆ A ⊆ V, thereby obtaining a collection of weights for π. It follows from Lemma 4.2 that all such weights have the same sign and that their values are bounded by ω(π, Σ_{PP·P̄}) and ω(π, Σ), so that |ω(π, Σ_{PP·P̄})| ≤ |ω(π, Σ_{AA·Ā})| ≤ |ω(π, Σ)| for every A such that P ⊆ A ⊆ V. Lemma 4.2 also allows us to write the connection between the above lower and upper bounds in the form

ω(π, Σ) = ω(π, Σ_{PP·P̄}) / (1 − ρ²_{(P)(P̄)}).    (10)
The ratio in (10) provides insight into the value taken by the weight ω(π, Σ) of the path π. It shows that ω(π, Σ) is obtained by combining the information provided by ω(π, Σ_{PP·P̄}) and ρ_{(P)(P̄)}. More concretely, it is computed by multiplying the partial path weight ω(π, Σ_{PP·P̄}), that is the smallest partial weight associated with the path π, by 1/{1 − ρ²_{(P)(P̄)}}. The latter quantity is (i) always ≥ 1 and (ii) an increasing function of ρ_{(P)(P̄)}. Furthermore, it is worth noticing that ω(π, Σ_{PP·P̄}) and ρ_{(P)(P̄)} provide two clearly distinct pieces of information. Specifically,

(a) ω(π, Σ_{PP·P̄}) is the weight of the path π (linearly) adjusted for all the variables not involved in the path. Hence, as far as linear relationships are concerned, this quantity provides no information on how the path π interacts with the variables in the rest of the network. This interpretation is even stronger in the case where the variables are jointly Gaussian, because ω(π, Σ_{PP·P̄}) is the weight of the path π computed in the conditional distribution of X_P|X_P̄.

(b) The vector correlation ρ_{(P)(P̄)} depends on π only through its vertex set P = V(π) and it is a measure of the strength of the association of the variables in the path with the remaining variables. For instance, when the variables indexed by V(π) are disconnected from the rest of the network it holds that ρ_{(P)(P̄)} = 0 and therefore ω(π, Σ) = ω(π, Σ_{PP·P̄}).

Finally, note that if we apply Corollary 4.1 with A = V(π) then we can write

σ_{xy·P̄} = ∑_{π′∈Π_{xy}: V(π′)⊆V(π)} ω(π′, Σ_{PP·P̄}),    (11)
where P = V(π). Hence, ω(π, Σ_{PP·P̄}) in (10) can be interpreted as the contribution of the path π to the partial covariance between X_x and X_y given all the variables that do not belong to the path. As we shall see below, Equation (10) takes an especially simple form when there exists a unique path between x and y involving only vertices in V(π).

Definition 4.1. We say that π ∈ Π_{xy} is a minimal path between x and y in G = (V, E) if and only if no proper subset of vertices A ⊂ V(π) can form a path between x and y in G.

Clearly, every path between x and y that has the smallest number of edges among all paths in Π_{xy} (a shortest path) is minimal. Furthermore, it is easy to see that a path π is minimal if and only if G_{V(π)} = G(π), and this provides a straightforward way to check for path minimality.

Proposition 4.3. Let K = Σ^{-1} be the concentration matrix of X_V. If K implies the graph G = (V, E) then for every minimal path π in G between x and y it holds that ω(π, Σ_{PP·P̄}) = σ_{xy·P̄} where P = V(π) and, consequently,

ω(π, Σ) = σ_{xy·P̄} / (1 − ρ²_{(P)(P̄)}).    (12)

Proof. This follows immediately from (10) and (11) and from the definition of minimal path, because if a path π satisfies Definition 4.1, then G_{V(π)} = G(π), and therefore, there is no other path π′ ∈ Π_{xy} such that π′ ≠ π and V(π′) ⊆ V(π).
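Both the minimality criterion G_{V(π)} = G(π) and Equation (12) lend themselves to a short numerical check. The sketch below is illustrative and not from the paper: the 4-vertex graph with edges {0,1}, {1,2}, {0,2}, {2,3} and its concentration matrix are hypothetical choices.

```python
# Check path minimality (Definition 4.1 via G_{V(pi)} = G(pi)) and Eq. (12)
# on a small hypothetical graph.
import numpy as np

K = np.array([[ 1.0, -0.3, -0.3,  0.0],
              [-0.3,  1.0, -0.3,  0.0],
              [-0.3, -0.3,  1.0, -0.3],
              [ 0.0,  0.0, -0.3,  1.0]])
Sigma = np.linalg.inv(K)
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]

def is_minimal(path, edges):
    """Minimal iff the subgraph induced by V(pi) has no edges beyond E(pi)."""
    und = {frozenset(e) for e in edges}
    P = set(path)
    return {e for e in und if e <= P} == {frozenset(p) for p in zip(path, path[1:])}

def partial_cov(S, A, B):
    if not B:
        return S[np.ix_(A, A)]
    return S[np.ix_(A, A)] - S[np.ix_(A, B)] @ np.linalg.solve(
        S[np.ix_(B, B)], S[np.ix_(B, A)])

pi = (0, 2, 3)                               # a shortest, hence minimal, path
assert is_minimal(pi, edges)
assert not is_minimal((0, 1, 2, 3), edges)   # the chord {0,2} breaks minimality

# Eq. (12): omega(pi, Sigma) = sigma_{xy.Pbar} / (1 - rho^2_{(P)(Pbar)})
P = sorted(set(pi)); Pbar = sorted(set(range(4)) - set(pi))
omega = ((-1) ** (len(P) + 1)
         * np.linalg.det(K[np.ix_(Pbar, Pbar)]) / np.linalg.det(K)
         * np.prod([K[u, v] for u, v in zip(pi, pi[1:])]))       # Eq. (4)
sigma_xy_Pbar = partial_cov(Sigma, [pi[0], pi[-1]], Pbar)[0, 1]
lam = np.linalg.det(Sigma) / (
    np.linalg.det(Sigma[np.ix_(P, P)]) * np.linalg.det(Sigma[np.ix_(Pbar, Pbar)]))
assert np.isclose(omega, sigma_xy_Pbar / lam)   # note 1 - rho^2_(P)(Pbar) = lambda
```

The final assertion uses the identity 1 − ρ²_{(P)(P̄)} = λ_{(P)(P̄)}, which follows directly from (8).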
Figure 1: Undirected graphs of Examples 4.1 and 5.1.

Equation (12) shows that the weight of a minimal path can be written as a straightforward combination of two well-established quantities: the partial covariance and the vector correlation. This is important for two main reasons: (i) it provides an intuitive interpretation of the weights of minimal paths; and (ii) it enables the direct comparison of the weights between different minimal paths, even with different endpoints. To achieve the latter one needs to normalize the quantity defined in Equation (12) as

ω(π, Σ) / √(σ_{xx·P̄} σ_{yy·P̄}) = ρ_{xy·P̄} / (1 − ρ²_{(P)(P̄)}).    (13)
Example 4.1. Consider the graph G in Figure 1(a). There are six paths in G between x and y, namely ⟨x, 1, 2, y⟩, ⟨x, 1, 2, 4, 5, y⟩, ⟨x, 2, y⟩, ⟨x, 2, 4, 5, y⟩, ⟨x, 3, 4, 5, y⟩ and ⟨x, 3, 4, 2, y⟩. They include two minimal paths, which are ⟨x, 2, y⟩ and ⟨x, 3, 4, 5, y⟩, and therefore, by Proposition 4.3,

ω(⟨x, 2, y⟩, Σ) = σ_{xy·1345} / (1 − ρ²_{(x2y)(1345)})    and    ω(⟨x, 3, 4, 5, y⟩, Σ) = σ_{xy·12} / (1 − ρ²_{(x345y)(12)}).

The paths ⟨1, 2, y⟩, between 1 and y, and ⟨3, 4, 5, y⟩, between 3 and y, are both minimal and their relevance can be meaningfully compared through the quantities

ρ_{1y·345x} / (1 − ρ²_{(12y)(345x)})    and    ρ_{3y·12x} / (1 − ρ²_{(345y)(12x)}).
In graphical modelling, the family of models identified by graphs which are trees plays a relevant role due to its computational tractability; see, among others, Edwards et al. (2010), Choi et al. (2011) and Højsgaard et al. (2012, Section 7.4). Since in a tree it holds that |Π_{xy}| = 1 for every x, y ∈ V with x ≠ y, every path in a tree is minimal and, therefore, all the path weights in a tree have the form (12). In fact, it is not difficult to show that minimal paths can be used to characterize trees, that is, a connected undirected graph G is a tree if and only if all its paths are minimal. In the next section we exploit the fact that if {x, y} ∈ E then ⟨x, y⟩ is a minimal path between x and y in G.
5 Networked partial covariance and correlation
The decomposition of the covariance over an undirected graph in (3) associates a weight to every path between x and y in G. Note that every edge {x, y} in the graph corresponds to a path ⟨x, y⟩ that we shall call a single-edge path. As with every other path, a single-edge path has a weight associated with it, which we compactly denote by ω_{xy·(V\{x,y})} ≡ ω(⟨x, y⟩, Σ). Since ⟨x, y⟩ is obviously a minimal path, it follows from Proposition 4.3 that

ω_{xy·(V\{x,y})} = σ_{xy·V\{x,y}} / (1 − ρ²_{(xy)(V\{x,y})}),    (14)

where we have used the suppressed notation ρ_{(xy)(B)} = ρ_{({x,y})(B)}, and we refer to this quantity as the networked partial covariance. Similarly to the interpretation of path weights given in the previous section, networked partial covariances can also be interpreted by looking separately at the numerator and the denominator of (14). The numerator is the partial covariance σ_{xy·V\{x,y}}. This is computed after X_x and X_y are linearly adjusted for the remaining variables in the network and, therefore, it does not provide any information on the strength of the linear association between the variables indexed by x and y and the remaining variables in the network. The weight in (14) includes this additional information by dividing the partial covariance by (1 − ρ²_{(xy)(V\{x,y})}). Clearly, σ_{xy·V\{x,y}} = 0 implies ω_{xy·(V\{x,y})} = 0 and, furthermore, ω_{xy·(V\{x,y})} and σ_{xy·V\{x,y}} have the same sign and it holds that |ω_{xy·(V\{x,y})}| ≥ |σ_{xy·V\{x,y}}|. If the pair of vertices {x, y} is disconnected from the rest of the network then ρ_{(xy)(V\{x,y})} = 0 and, consequently, ω_{xy·(V\{x,y})} = σ_{xy·V\{x,y}}. Hence ω_{xy·(V\{x,y})} can be regarded as an 'enlarged' version of the partial covariance that takes into account the association existing between X_{{x,y}} and the remaining variables in the network, and this motivates the name networked partial covariance. Just as covariances need to be normalized into correlations to enable their comparison, we provide below the normalized version of equation (14), which we shall call the networked partial correlation:

ψ_{xy·(V\{x,y})} ≡ ω_{xy·(V\{x,y})} / √(σ_{xx·V\{x,y}} σ_{yy·V\{x,y}}) = ρ_{xy·V\{x,y}} / (1 − ρ²_{(xy)(V\{x,y})}).    (15)
Note that although this is a normalized quantity, and therefore comparable between edges of the same graph, it may take values outside the interval [−1, 1].

Example 5.1. Consider the case where |V| = 9 and the concentration matrix K of X_V induces the graph G = (V, E) in Figure 1(b). More specifically, we take K to have unit diagonal and off-diagonal elements κ_{uv} = −0.4 for every {u, v} ∈ E and κ_{uv} = 0 otherwise. The simplified structure of the concentration matrix in this example makes it easy to appreciate the differences existing between partial correlations and networked partial correlations. Indeed, in this case, the partial correlations ρ_{uv·V\{u,v}} for {u, v} ∈ E put all the edges of the graph on an equal footing because they are all equal to 0.4. On the other hand, the
networked partial correlations, whose values are reported in Figure 1(b), are not constant and, in this case, their differences only depend on the structure of the graph. Indeed, the edge {8, 9} is disconnected from the rest of the vertices so that the values of its networked partial correlation and partial correlation coincide; i.e. ψ89·(V \{8,9}) = ρ89·V \{8,9} = 0.4. The edge {4, 5} has the largest number of connections with other vertices in the graph and, accordingly, its networked partial correlation takes the largest value ψ45·(V \{4,5}) = 1.7. More generally, in this example, the value of the networked partial correlation of every edge is proportional to the number of vertices adjacent to the edge.
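The behaviour described in Example 5.1 can be reproduced with a few lines of NumPy. The sketch below is not from the paper: since the full edge list of Figure 1(b) is not reproduced here, it uses two small hypothetical concentration matrices, one with a disconnected edge (mimicking {8, 9}) and one with a connected edge.

```python
# Networked partial correlation, Eq. (15), computed directly from a covariance matrix.
import numpy as np

def psi(Sigma, x, y):
    """psi_{xy.(V\{x,y})} = rho_{xy.rest} / (1 - rho^2_{(xy)(rest)})."""
    p = Sigma.shape[0]
    K = np.linalg.inv(Sigma)
    rho = -K[x, y] / np.sqrt(K[x, x] * K[y, y])     # partial correlation, Eq. (5)
    rest = [v for v in range(p) if v not in (x, y)]
    # 1 - rho^2_(xy)(rest) equals lambda_(xy)(rest), computed via determinants (Eq. (8))
    lam = np.linalg.det(Sigma) / (
        np.linalg.det(Sigma[np.ix_([x, y], [x, y])]) *
        np.linalg.det(Sigma[np.ix_(rest, rest)]))
    return rho / lam

# Two disconnected pairs, kappa = -0.4 on each edge: psi equals the partial correlation.
K = np.eye(4)
K[0, 1] = K[1, 0] = -0.4
K[2, 3] = K[3, 2] = -0.4
assert np.isclose(psi(np.linalg.inv(K), 0, 1), 0.4)

# A connected chain 0-1-2, kappa = -0.4 on both edges: the connection to vertex 2
# inflates the networked partial correlation of edge {0,1} above 0.4.
K2 = np.eye(3)
K2[0, 1] = K2[1, 0] = K2[1, 2] = K2[2, 1] = -0.4
assert psi(np.linalg.inv(K2), 0, 1) > 0.4
```

The two assertions mirror the two observations of the example: an edge disconnected from the rest of the graph keeps ψ equal to its partial correlation, while additional connections increase ψ.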
6 Limited-order networked partial covariance decomposition
In practical applications, it is common to deal with limited-order partial covariances, that is, partial covariances σ_{xy·Q} with Q ∪ {x, y} ⊂ V, rather than with full-order partial covariances σ_{xy·V\{x,y}}. Typically, this is due to the fact that there exist unobserved variables, possibly not explicitly considered in the analysis, or that the number of variables exceeds the sample size so that the sample covariance matrix does not have full rank, thereby making the computation of full-order partial covariances unfeasible; see Castelo and Roverato (2006), Zuo et al. (2014) and references therein. In this section, we formally introduce limited-order networked partial covariances and provide a rule to decompose these quantities over the paths between x and y in G_{V\Q}. In the case where Q = ∅ the limited-order networked partial covariance of X_x and X_y is computed on the distribution of X_{{x,y}} and it coincides with the covariance σ_{xy}. From this viewpoint, the decomposition of limited-order networked partial covariances over G_{V\Q} can be regarded as a generalization of the covariance decomposition of Jones and West (2005) given in Theorem 3.1. An interesting feature of the decomposition of the limited-order networked partial covariance given in Theorem 6.1 below is that it allows one to understand the connection existing between the weight of a path in a graph implied by a multivariate distribution and the weight of a path in a graph implied by a marginal distribution. More concretely, it shows how a path that exclusively intersects vertices indexing marginalized variables redistributes its weight among the paths connecting the variables of the marginal distribution. Consider the concentration matrix K = Σ^{-1} of X_V that implies the graph G = (V, E). For a subset Q ⊂ V\{x, y} we define the limited-order weight of the single-edge path ⟨x, y⟩ as the weight of ⟨x, y⟩ relative to X_{Q∪{x,y}}; formally, ω_{xy·(Q)} ≡ ω(⟨x, y⟩, Σ_{Q∪{x,y} Q∪{x,y}}).
Since it follows from Proposition 4.3 that
\[
\omega_{xy\cdot(Q)} = \frac{\sigma_{xy\cdot Q}}{1 - \rho^2_{(xy)(Q)}}, \tag{16}
\]
we refer to ω_{xy·(Q)} as a limited-order networked partial covariance. Accordingly, the corresponding limited-order networked partial correlation can be defined as
\[
\psi_{xy\cdot(Q)} = \frac{\omega_{xy\cdot(Q)}}{\sqrt{\sigma_{xx\cdot Q}\,\sigma_{yy\cdot Q}}} = \frac{\rho_{xy\cdot Q}}{1 - \rho^2_{(xy)(Q)}}. \tag{17}
\]
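Equations (16) and (17) lend themselves to a direct numeric check. The sketch below is an illustration only (the choice of x, y, Q and all variable names are ours, not the paper's): it computes ω_{xy·(Q)} from its definition as the weight of the single-edge path ⟨x, y⟩ in the marginal covariance matrix and compares it with the ratio in (16).

```python
import numpy as np

rng = np.random.default_rng(7)

# Random positive-definite covariance matrix on 5 variables.
A = rng.standard_normal((5, 5))
S = A @ A.T + 5 * np.eye(5)

x, y, Q = 0, 1, [2, 3]           # hypothetical choice of x, y and Q
M = [x, y] + Q                   # indices of the marginal X_{Q ∪ {x,y}}
S_xy = S[np.ix_([x, y], [x, y])]
S_M = S[np.ix_(M, M)]

# Definition: omega_{xy.(Q)} = omega(<x,y>, Sigma_MM) = -kappa*_{xy} |Sigma_{xy}|,
# where kappa* is the concentration matrix of the marginal distribution.
K_M = np.linalg.inv(S_M)
omega = -K_M[0, 1] * np.linalg.det(S_xy)

# Equation (16): sigma_{xy.Q} / (1 - rho^2_{(xy)(Q)}).
S_Q = S[np.ix_(Q, Q)]
S_xyQ = S[np.ix_([x, y], Q)]
S_xy_given_Q = S_xy - S_xyQ @ np.linalg.inv(S_Q) @ S_xyQ.T   # Schur complement
# vector alienation coefficient: 1 - rho^2_{(xy)(Q)} = |Sigma_M| / (|Sigma_xy| |Sigma_Q|)
alien = np.linalg.det(S_M) / (np.linalg.det(S_xy) * np.linalg.det(S_Q))
assert np.isclose(omega, S_xy_given_Q[0, 1] / alien)

# Equation (17): the limited-order networked partial correlation.
psi = omega / np.sqrt(S_xy_given_Q[0, 0] * S_xy_given_Q[1, 1])
```

Note that ψ_{xy·(Q)} inflates the partial correlation ρ_{xy·Q} by the factor 1/(1 − ρ²_{(xy)(Q)}), so the two coincide only when the pair (x, y) carries no linear association with X_Q.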
The following theorem shows how limited-order networked partial covariances can be decomposed over the paths of G_{V\Q}.

Theorem 6.1. Let K = Σ^{-1} be the concentration matrix of X_V. If K implies the graph G = (V, E) then for every x, y ∈ V and Q ⊆ V\{x, y} it holds that
\[
\omega_{xy\cdot(Q)} = \sum_{\pi\in\Pi_{xy};\; V(\pi)\subseteq V\setminus Q} \omega(\pi, \Sigma) \times \bigl(1 - \rho^2_{(P\setminus\{x,y\})(Q)\cdot\{x,y\}}\bigr), \tag{18}
\]
where P = V(π). Proof. See Appendix A.

We first notice that for Q = V\{x, y} equation (18) simplifies to ω_{xy·(V\{x,y})} = ω(⟨x, y⟩, Σ). More generally, for Q ⊂ V\{x, y}, Theorem 6.1 shows that the limited-order networked partial covariance ω_{xy·(Q)} can be computed by taking the corresponding (full-order) networked partial covariance and then adding to this quantity a value for every path between x and y that traverses only vertices associated with marginalized variables or, equivalently, for every path between x and y in G_{V\Q}. More precisely, every path π ∈ Π_{xy} such that V(π) ∩ Q = ∅ contributes to the value of ω_{xy·(Q)} with the proportion (1 − ρ²_{(P\{x,y})(Q)·{x,y}}) of its weight ω(π, Σ). Interestingly, any path with at least one endpoint not equal to x or y, as well as any path between x and y involving at least one vertex in Q, plays no role in the computation of ω_{xy·(Q)}. In order to make the rules for limited-order networked partial covariance decomposition more concrete, Appendix B gives a detailed description of the case where |V| = 4 and |Q| = 1.

When ρ_{(P\{x,y})(Q)·{x,y}} = 0 the quantity ω(π, Σ) fully contributes to the computation of ω_{xy·(Q)}. This happens, for instance, when Q = ∅, so that (18) simplifies to ω_{xy·(∅)} = Σ_{π∈Π_{xy}} ω(π, Σ) and, since σ_{xy} = ω_{xy·(∅)}, the covariance decomposition of Jones and West (2005) given in Theorem 3.1 turns out to be a special case of Theorem 6.1. Because any networked partial covariance is a path weight, an appealing feature of (18) is that both the term ω_{xy·(Q)} on the left-hand side and the terms ω(π, Σ) on the right-hand side are path weights. This confers consistency on (18), which can thus be regarded as a rule to update the weight of single-edge paths when the multivariate system is marginalized over some variables. Interestingly, the covariance decomposition of Jones and West (2005) also admits this kind of interpretation.
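Theorem 6.1 can be checked numerically on a small complete graph by enumerating the paths between x and y that avoid Q and accumulating their weights with the corresponding alienation factors. The sketch below does this under assumed toy data (p = 5 variables; all names are ours), using the determinant identity 1 − ρ²_{(A)(B)·C} = |Σ_{A∪B∪C}||Σ_C| / (|Σ_{A∪C}||Σ_{B∪C}|):

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(0)
p = 5
A = rng.standard_normal((p, p))
S = A @ A.T + p * np.eye(p)      # random positive-definite Sigma (complete graph G)
K = np.linalg.inv(S)

def dsub(idx):
    # determinant of the submatrix Sigma_{idx idx}; 1 for the empty set
    idx = sorted(idx)
    return np.linalg.det(S[np.ix_(idx, idx)]) if idx else 1.0

def path_weight(path):
    # omega(pi, Sigma) = (-1)^(|P|+1) |Sigma_PP| prod_{{u,v} in E(pi)} kappa_uv
    w = (-1) ** (len(path) + 1) * dsub(path)
    for u, v in zip(path, path[1:]):
        w *= K[u, v]
    return w

def alien(Aset, Bset, Cset):
    # 1 - rho^2_{(A)(B).C} = |S_{A∪B∪C}| |S_C| / (|S_{A∪C}| |S_{B∪C}|)
    return dsub(Aset | Bset | Cset) * dsub(Cset) / (dsub(Aset | Cset) * dsub(Bset | Cset))

x, y, Q = 0, 1, {2}              # hypothetical choice; variables 3 and 4 get marginalized
free = [v for v in range(p) if v not in {x, y} | Q]

# Right-hand side of (18): paths between x and y avoiding Q.
rhs = 0.0
for r in range(len(free) + 1):
    for mid in permutations(free, r):
        path = (x, *mid, y)
        rhs += path_weight(path) * alien(set(path) - {x, y}, Q, {x, y})

# Left-hand side: omega_{xy.(Q)} computed from (16).
M = [x, y] + sorted(Q)
S_M = S[np.ix_(M, M)]
cond = S_M[:2, :2] - S_M[:2, 2:] @ np.linalg.inv(S_M[2:, 2:]) @ S_M[2:, :2]
lhs = cond[0, 1] / alien({x, y}, Q, set())
assert np.isclose(lhs, rhs)
```

The same loop with Q = ∅ reproduces the Jones and West (2005) decomposition of σ_{xy}, since every alienation factor then equals 1.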
In Jones and West (2005) the covariance σxy is decomposed to investigate the relative role of paths in determining the relationship between x and y in the marginal distribution of X{x,y}. The updating rule (18) shows that, when the interest is in the relative role played by paths in determining the relationship between x and y in the distribution of X_{Q∪{x,y}}, then the networked partial covariance ω_{xy·(Q)} should be decomposed. This provides a motivation for the use of the networked partial covariance which, in this context, can be regarded as a natural generalization of the covariance. From this viewpoint, it is also worth noticing that, by multiplying the left- and right-hand sides of (18) by 1 − ρ²_{(xy)(Q)}, Theorem 6.1 can be restated so as to provide a rule to decompose σxy·Q.
However, consistency of interpretation between the left and the right side of the equation is lost in this case.

We close this section by describing the special case where Q = {z}, so that the system is marginalized over the variables indexed by V\(Q ∪ {x, y}). Since we look at the marginal distribution of X{x,y,z}, there are at most two paths between x and y, specifically ⟨x, y⟩ and ⟨x, z, y⟩, and Theorem 6.1 allows us to show how the paths between x and y in G confer their weights on the paths of X{x,y,z}. It follows from the direct application of Theorem 6.1 that
\[
\omega_{xy\cdot(z)} = \sum_{\pi\in\Pi_{xy};\; z\notin V(\pi)} \omega(\pi, \Sigma) \times \bigl(1 - \rho^2_{(z)(P\setminus\{x,y\})\cdot\{x,y\}}\bigr).
\]
Hence, ω_{xy·(z)} is equal to the weight of ⟨x, y⟩ in G, that is ω(⟨x, y⟩, Σ), plus the proportion (1 − ρ²_{(z)(P\{x,y})·{x,y}}) of the weight of every path π in G which does not involve the vertex z. On the other hand, if we exploit the fact that ω(⟨x, y⟩, Σ_{{x,y,z}{x,y,z}}) + ω(⟨x, z, y⟩, Σ_{{x,y,z}{x,y,z}}) = σ_{xy}, then we can write
\[
\omega\bigl(\langle x, z, y\rangle, \Sigma_{\{x,y,z\}\{x,y,z\}}\bigr)
= \sum_{\pi\in\Pi_{xy};\; z\in V(\pi)} \omega(\pi, \Sigma)
+ \sum_{\pi\in\Pi_{xy};\; z\notin V(\pi)} \omega(\pi, \Sigma) \times \rho^2_{(z)(P\setminus\{x,y\})\cdot\{x,y\}},
\]
which shows that the weight ω(⟨x, z, y⟩, Σ_{{x,y,z}{x,y,z}}) is equal to the sum of the weights of all the paths in G which involve the vertex z, including therefore the weight ω(⟨x, z, y⟩, Σ), plus the proportion ρ²_{(z)(P\{x,y})·{x,y}} of the weight of any path π in G which does not involve the vertex z. See Appendix B for more details in the case |V| = 4.
7 Identification of genetic interactions in yeast
Costanzo et al. (2010) generated quantitative genetic interaction profiles for about 75% of all genes in yeast in a systematic way using a technique called Synthetic Genetic Array (SGA) analysis. This technique enabled the quantification of the fitness effect of a double mutant with respect to the expected one calculated from the combination of two single mutants, producing a so-called SGA score for about 5 million gene pairs (Baryshnikova et al., 2010). Each of those SGA scores also has an associated p-value that captures the reliability of the SGA score, based on a combination of the observed variation across four experimental replicates with estimates of the background log-normal error distributions for the corresponding mutants (Costanzo et al., 2010). We downloaded those SGA scores and p-values for the roughly 5 million gene pairs, which involved 4457 genes. Here we use these SGA scores as a gold standard for the fitness effect of genetic interactions. To demonstrate the usefulness of networked partial correlations as a proxy for the strength of buffering relationships between genes in the context of identifying genetic interactions, we downloaded raw gene expression data produced by Brem and Kruglyak (2005)
from two yeast strains, a wild type (RM11-1a) and a lab strain (BY4716), which were crossed to generate n = 112 segregants. These raw gene expression data were processed as described by Tur et al. (2014), leading to a normalized gene expression data set formed by p = 6216 genes and n = 112 samples. The calculation of the networked partial correlation between two given genes from expression data involves the estimation of two quantities (see Eq. 15): (i) the partial correlation between these two genes; and (ii) the vector correlation between this pair of genes and the rest of the genes. Because in the expression data the number of genes (p) is much larger than the number of samples (n), i.e., p ≫ n, the calculation of these two quantities is not straightforward and requires the use of statistical methods specifically tailored to high-dimensional data where p ≫ n. To estimate partial correlation coefficients and their p-values for the null hypothesis of zero partial correlation, we used the empirical Bayes approach of Schäfer and Strimmer (2005) implemented in the R package GeneNet, which works by calculating a shrinkage estimate of the inverse covariance matrix. To estimate vector correlations we exploited (6) and used the sparse canonical correlation analysis technique of Witten et al. (2009) implemented in the R package PMA.
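The actual analysis relies on these R packages; purely as an illustration of the computation for p > n data, the numpy sketch below replaces the GeneNet estimator with a naive diagonal-shrinkage covariance (the intensity lam is an arbitrary assumed value, whereas GeneNet chooses it analytically, and the sparse CCA step of PMA is omitted in favor of an exact determinant-based vector correlation, which is only feasible for small p):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 50                    # p > n, as in the expression data

X = rng.standard_normal((n, p))  # synthetic "expression" matrix (samples x genes)
S = np.cov(X, rowvar=False)      # sample covariance, singular since rank(S) < p

# Shrinking toward the diagonal yields an invertible estimate; the shrinkage
# intensity below is fixed by hand (assumed value, for illustration only).
lam = 0.2
Sig = (1 - lam) * S + lam * np.diag(np.diag(S))
K = np.linalg.inv(Sig)

def networked_pcor(Sig, K, x, y):
    # (i) partial correlation of genes x and y given the rest
    rho = -K[x, y] / np.sqrt(K[x, x] * K[y, y])
    # (ii) 1 - rho^2_{(xy)(rest)} = |Sigma| / (|Sigma_xy| |Sigma_rest|),
    # computed via log-determinants for numerical stability
    rest = [v for v in range(Sig.shape[0]) if v not in (x, y)]
    s_all = np.linalg.slogdet(Sig)[1]
    s_xy = np.linalg.slogdet(Sig[np.ix_([x, y], [x, y])])[1]
    s_rest = np.linalg.slogdet(Sig[np.ix_(rest, rest)])[1]
    return rho / np.exp(s_all - s_xy - s_rest)

psi = networked_pcor(Sig, K, 0, 1)
```

Since the alienation coefficient lies in (0, 1], the networked partial correlation is never smaller in absolute value than the plain partial correlation, which is the inflation effect visible in Figure 2.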
Figure 2: Networked partial correlations on the y-axis as a function of the corresponding partial correlations on the x-axis, calculated from yeast expression data (Brem and Kruglyak, 2005). (a) Values for all pairs of genes in the expression data set. (b) Values for those pairs with significant partial correlations (FDR < 1%). Values for the filtered pairs shown in Figures 3 and 4 are highlighted with circles.

Between the 4457 genes forming pairs with SGA scores and SGA p-values and the 6216 genes with expression data, there were 4099 genes in common. We restricted the rest of the analysis to the 3,966,346 pairs formed by these 4099 genes. A comparison of the values of partial and networked partial correlations, shown in Figure 2(a), reveals that differences between these two quantities grow proportionally to their absolute value. Note that small values of partial correlation may still become large networked partial correlation values.

Figure 3: SGA scores as a function of the three different gene correlation measures: Pearson correlation (a), partial correlation (b) and networked partial correlation (c).

Positive and negative SGA scores have very different interpretations. While negative SGA scores indicate a more severe fitness defect than expected, positive ones identify double mutants with a less severe fitness defect than expected (Costanzo et al., 2010). One of the simpler buffering relationships behind a genetic interaction is the positive coexpression of two genes whose simultaneous deletion produces a double mutant with a severe fitness defect that in its extreme case can lead to synthetic lethality. For this reason, and to provide a meaningful comparison between SGA scores and the three correlation measures, we restricted the analysis to those gene pairs with negative and significant SGA scores (FDR < 1% on the corrected SGA p-value) and with positive and significant coexpression (FDR < 1% on the corrected p-value for zero partial correlation). There were 227 pairs meeting these criteria, which are shown in Figure 3, where their SGA score is plotted on the y-axis as a function of three different gene correlation measures on the x-axis, namely Pearson correlation (a), partial correlation (b) and networked partial correlation (c).
Figure 4: SGA scores as a function of the three gene correlation measures, in logarithmic scale. Fitted regression lines: Pearson correlation, y = −3.05 + 1.46x (r = 0.31, R² = 0.09, P = 2.92e−06); log partial correlation, y = 2.7 + 1.07x (r = 0.27, R² = 0.07, P = 2.92e−05); log networked partial correlation, y = 0.75 + 0.93x (r = 0.40, R² = 0.16, P = 6.71e−10).

The relationship between each pair of quantities shows a trend where larger correlation is associated with larger negative SGA scores, as expected. This trend, however, is clearly non-linear, probably due to the restriction of SGA scores to negative values, where a large fraction of pairs accumulates at SGA scores, or partial correlations, close to 0. To get a clearer picture of the behavior of these three correlation measures in relation to SGA scores, in Figure 4 we show the same values in logarithmic scale for absolute SGA scores, partial correlations and networked partial correlations. These plots reveal that there is a significant linear relationship between each of these three correlation measures and SGA scores. However, networked partial correlations are more strongly associated with SGA scores (R² = 0.16) than Pearson (R² = 0.09) and partial correlations (R² = 0.07). Moreover, a linear model including the three of them as independent variables yields networked partial correlations as a significant predictor (p < 0.001) while Pearson and partial correlations are not (p > 0.3). This evidence suggests that networked partial correlations can potentially improve the contribution of gene coexpression to the computational identification of genetic interactions.
8 Discussion
We have built on the theory of covariance decomposition introduced by Jones and West (2005) by clarifying the role played by path weights and generalizing the decomposition of the covariance, which is computed in a bivariate marginal distribution, to the decomposition of a networked partial covariance computed in an arbitrary marginal distribution. The latter identifies the networked partial covariance as a natural generalization of the covariance in network analysis. An important contribution of our work is the distinction of path weights associated with minimal paths, which has led to the introduction of the networked partial correlation. We have shown that this quantity can be more useful than classical coexpression measures to identify gene pairs involved in genetic interactions. The theory of path weights developed in this paper can be useful beyond the analysis of genetic interactions in yeast and, in particular, it may enable applications involving the comparison of paths across multivariate data sets. Open questions concern the normalization of non-minimal paths and the efficient estimation of path weights. The latter problem is especially relevant in large-scale covariance models where the number of variables is larger than the sample size, i.e., with p ≫ n.

We would like to remark that we have provided two different generalizations of the covariance decomposition of Jones and West (2005) given in Theorem 3.1. Specifically, Corollary 4.1 extends the covariance decomposition to the case where there exist selection variables, that is, hidden variables over which conditioning occurs, whereas Theorem 6.1 extends the covariance decomposition to the case where there exist latent variables, that is, hidden variables over which marginalization occurs. We deem the case dealing with latent variables more relevant because it clarifies the mechanism through which the weight of a path redistributes over the network as a consequence of marginalization.
We also notice that the two generalizations could be unified in an even more general framework involving both selection and latent variables.
A Proofs
Proof of Lemma 4.2. If we set C = V\A′ and B = A′\A, so that V = A ∪ B ∪ C and A′ = A ∪ B, then it holds that Σ^{-1}_{A′A′·C} = {Σ^{-1}}_{A′A′} = K_{A′A′}, and therefore
\[
\begin{aligned}
\omega(\pi, \Sigma_{A'A'\cdot\bar A'}) &= \omega(\pi, \Sigma_{A'A'\cdot C})\\
&= (-1)^{|P|+1}\,|\Sigma_{PP\cdot C}| \prod_{\{u,v\}\in E(\pi)} \kappa_{uv}\\
&= (-1)^{|P|+1}\,|\Sigma_{PP\cdot B\cup C}| \prod_{\{u,v\}\in E(\pi)} \kappa_{uv} \times \frac{|\Sigma_{PP\cdot C}|}{|\Sigma_{PP\cdot B\cup C}|}\\
&= \omega(\pi, \Sigma_{AA\cdot B\cup C})\,\frac{|\Sigma_{PP\cdot C}|}{|\Sigma_{PP\cdot B\cup C}|} \qquad (19)\\
&= \frac{\omega(\pi, \Sigma_{AA\cdot B\cup C})}{\lambda_{(P)(B)\cdot C}} \qquad (20)\\
&= \frac{\omega(\pi, \Sigma_{AA\cdot B\cup C})}{1 - \rho^2_{(P)(B)\cdot C}} \qquad (21)\\
&= \frac{\omega(\pi, \Sigma_{AA\cdot\bar A})}{1 - \rho^2_{(P)(A'\setminus A)\cdot\bar A'}},
\end{aligned}
\]
where (19) follows from the fact that P = V(π) ⊆ A ⊆ A′ and that Σ^{-1}_{AA·B∪C} = {Σ^{-1}}_{AA} = K_{AA}. Equation (20) holds true because
\[
\frac{|\Sigma_{PP\cdot B\cup C}|}{|\Sigma_{PP\cdot C}|} = \frac{|\Sigma_{P\cup B\,P\cup B\cdot C}|}{|\Sigma_{PP\cdot C}|\,|\Sigma_{BB\cdot C}|} = \lambda_{(P)(B)\cdot C},
\]
which is the vector alienation coefficient in (8), and therefore (21) follows from (8). Both the results concerning the sign and the inequality are a straightforward consequence of the fact that 0 ≤ ρ_{(P)(A′\A)·Ā′} < 1.
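The chain of equalities in this proof can also be checked numerically. The sketch below uses arbitrary assumed choices of A, B, C and π (all helper names are ours) and verifies on a random positive-definite Σ that ω(π, Σ_{A′A′·C}) = ω(π, Σ_{AA·B∪C})/(1 − ρ²_{(P)(B)·C}):

```python
import numpy as np

rng = np.random.default_rng(3)
p = 5
G = rng.standard_normal((p, p))
S = G @ G.T + p * np.eye(p)      # Sigma for X_V, V = {0, ..., 4}

def dsub(idx):
    # determinant of Sigma_{idx idx}; 1 for the empty set
    idx = sorted(idx)
    return np.linalg.det(S[np.ix_(idx, idx)]) if idx else 1.0

def cond_cov(A, C):
    # Sigma_{AA.C} via the Schur complement
    A, C = sorted(A), sorted(C)
    SAC = S[np.ix_(A, C)]
    return S[np.ix_(A, A)] - SAC @ np.linalg.inv(S[np.ix_(C, C)]) @ SAC.T

def weight(path, Sigma):
    # omega(pi, Sigma); the vertices of pi index rows/columns of Sigma directly
    K = np.linalg.inv(Sigma)
    P = sorted(set(path))
    w = (-1) ** (len(P) + 1) * np.linalg.det(Sigma[np.ix_(P, P)])
    for u, v in zip(path, path[1:]):
        w *= K[u, v]
    return w

# Assumed choices: A = {0,1,2}, B = A'\A = {3}, C = V\A' = {4},
# and pi = <0, 2, 1>, so that P = V(pi) = {0, 1, 2} ⊆ A.
path, B, C = (0, 2, 1), {3}, {4}
P = set(path)

lhs = weight(path, cond_cov({0, 1, 2, 3}, C))          # omega(pi, Sigma_{A'A'.C})
lam = dsub(P | B | C) * dsub(C) / (dsub(P | C) * dsub(B | C))  # 1 - rho^2_{(P)(B).C}
rhs = weight(path, cond_cov({0, 1, 2}, B | C)) / lam
assert np.isclose(lhs, rhs)
```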
Proof of Theorem 6.1. Let A = V\Q. Notice that the concentration matrix of the conditional distribution X_A | X_Q is equal to Σ^{-1}_{AA·Q} = K_{AA}. Then it follows from Corollary 4.1 that
\[
\sigma_{xy\cdot Q} = \sum_{\pi\in\Pi_{xy};\; V(\pi)\subseteq A} \omega(\pi, \Sigma_{AA\cdot Q}), \tag{22}
\]
where, for π ∈ Π_{xy} with V(π) ⊆ A,
\[
\omega(\pi, \Sigma_{AA\cdot Q}) = (-1)^{|P|+1}\,|\Sigma_{PP\cdot Q}| \prod_{\{i,j\}\in E(\pi)} \kappa_{ij}.
\]
If we divide both sides of (22) by
\[
1 - \rho^2_{(xy)(Q)} = \frac{|\Sigma_{QQ\cdot\{x,y\}}|}{|\Sigma_{QQ}|},
\]
Figure 5: Complete graph on 4 vertices. The grey part highlights the connection of vertex 4 with the rest of the graph.
then we obtain
\[
\omega_{xy\cdot(Q)} = \sum_{\pi\in\Pi_{xy};\; V(\pi)\subseteq A} \omega(\pi, \Sigma_{AA\cdot Q})\,\frac{|\Sigma_{QQ}|}{|\Sigma_{QQ\cdot\{x,y\}}|},
\]
where
\[
\begin{aligned}
\omega(\pi, \Sigma_{AA\cdot Q})\,\frac{|\Sigma_{QQ}|}{|\Sigma_{QQ\cdot\{x,y\}}|}
&= (-1)^{|P|+1}\,\frac{|\Sigma_{PP\cdot Q}|\,|\Sigma_{QQ}|}{|\Sigma_{QQ\cdot\{x,y\}}|} \prod_{\{i,j\}\in E(\pi)} \kappa_{ij}\\
&= (-1)^{|P|+1}\,\frac{|\Sigma_{QQ\cdot P}|\,|\Sigma_{PP}|}{|\Sigma_{QQ\cdot\{x,y\}}|} \prod_{\{i,j\}\in E(\pi)} \kappa_{ij}\\
&= \omega(\pi, \Sigma) \times \frac{|\Sigma_{QQ\cdot P}|}{|\Sigma_{QQ\cdot\{x,y\}}|}\\
&= \omega(\pi, \Sigma) \times \bigl(1 - \rho^2_{(P\setminus\{x,y\})(Q)\cdot\{x,y\}}\bigr),
\end{aligned}
\]
as required.
B Limited-order networked partial covariance decomposition on 4 vertices
For the graph in Figure 5 we focus on the decomposition of the covariance σ12, that is, on all the paths between vertices 1 and 2. It follows from Theorem 3.1 that σ12 can be computed as the sum of the 5 path weights given in Table 1, where we use the shorthand notation Σ12 to denote Σ_{{1,2}{1,2}}. We remark that this example also covers the case where the graph is not complete, because for a missing edge the corresponding entry of the concentration matrix is equal to zero and, consequently, the same is true for the weight of every path involving such an edge. Consider now the case where we marginalize over X4, so that Q = {3}. If we write Σ^{-1}_{Q∪{x,y}Q∪{x,y}} = Σ^{-1}_{123} = {κ*_{ij}}_{i,j∈{1,2,3}}, then the weights associated with the paths between 1 and 2 in the subgraph of the graph in Figure 5 induced by {1, 2, 3} are given in Table 2. It follows from Theorem 6.1 that
\[
\omega_{12\cdot(3)} = \omega_{12\cdot(34)} + \omega(\langle 1, 4, 2\rangle, \Sigma) \times (1 - \rho^2_{34\cdot 12}).
\]
Table 1: Weights of the paths between vertices 1 and 2 in the graph of Figure 5.

  Path               Path weight
  1 −− 2             ω_{12·(34)}          = −κ12 |Σ12|
  1 −− 3 −− 2        ω(⟨1, 3, 2⟩, Σ)      = +κ13 κ32 |Σ123|
  1 −− 4 −− 3 −− 2   ω(⟨1, 4, 3, 2⟩, Σ)   = −κ14 κ43 κ32 |Σ|
  1 −− 3 −− 4 −− 2   ω(⟨1, 3, 4, 2⟩, Σ)   = −κ13 κ34 κ42 |Σ|
  1 −− 4 −− 2        ω(⟨1, 4, 2⟩, Σ)      = +κ14 κ42 |Σ124|
Table 2: Weights of the paths between vertices 1 and 2 in the subgraph of the graph in Figure 5 induced by {1, 2, 3}, after marginalization over variable X4.

  Path               Path weight
  1 −− 2             ω_{12·(3)}              = −κ*12 |Σ12|
  1 −− 3 −− 2        ω(⟨1, 3, 2⟩, Σ123)      = +κ*13 κ*32 |Σ123|
If we exploit the fact that the sum of the five path weights in Table 1 is equal to the sum of the two path weights in Table 2, both being equal to σ12, then we can decompose the weight of the path ⟨1, 3, 2⟩ relative to X{1,2,3} as follows:
\[
\omega(\langle 1, 3, 2\rangle, \Sigma_{123}) = \omega(\langle 1, 3, 2\rangle, \Sigma) + \omega(\langle 1, 4, 3, 2\rangle, \Sigma) + \omega(\langle 1, 3, 4, 2\rangle, \Sigma) + \omega(\langle 1, 4, 2\rangle, \Sigma) \times \rho^2_{34\cdot 12}.
\]
We can conclude that the weight, relative to X{1,2,3}, of the path ⟨1, 2⟩ can be obtained by adding to the weight, relative to X{1,2,3,4}, of the path ⟨1, 2⟩ the proportion (1 − ρ²_{34·12}) of the weight of the path ⟨1, 4, 2⟩; notice that (1 − ρ²_{34·12}) = 1 if the edge {3, 4} does not belong to the graph. On the other hand, the weight, relative to X{1,2,3}, of the path ⟨1, 3, 2⟩ can be obtained by adding to the weight, relative to X{1,2,3,4}, of the path ⟨1, 3, 2⟩ the proportion ρ²_{34·12} of the weight of the path ⟨1, 4, 2⟩, with ρ²_{34·12} = 0 if {3, 4} does not belong to the graph, and, furthermore, the weights of all the remaining paths between 1 and 2 in the graph which involve the vertex 3.
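These identities can be verified numerically. In the sketch below (0-based indices, so vertices 1, 2, 3, 4 become 0, 1, 2, 3; a random positive-definite Σ plays the role of the one implied by Figure 5) the five weights of Table 1 and the two weights of Table 2 both sum to σ12, and the weight of ⟨1, 4, 2⟩ is redistributed in the stated proportions:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
S = A @ A.T + 4 * np.eye(4)      # Sigma for X_{1,2,3,4} (0-based indices 0..3)
K = np.linalg.inv(S)

def d(idx):                       # |Sigma_PP|
    idx = list(idx)
    return np.linalg.det(S[np.ix_(idx, idx)])

# Table 1: the five path weights between vertices 1 and 2 (indices 0 and 1).
w12   = -K[0, 1] * d([0, 1])                            # 1 -- 2
w132  = +K[0, 2] * K[2, 1] * d([0, 1, 2])               # 1 -- 3 -- 2
w1432 = -K[0, 3] * K[3, 2] * K[2, 1] * d([0, 1, 2, 3])  # 1 -- 4 -- 3 -- 2
w1342 = -K[0, 2] * K[2, 3] * K[3, 1] * d([0, 1, 2, 3])  # 1 -- 3 -- 4 -- 2
w142  = +K[0, 3] * K[3, 1] * d([0, 1, 3])               # 1 -- 4 -- 2

# Theorem 3.1 (Jones and West, 2005): the five weights sum to sigma_12.
assert np.isclose(w12 + w132 + w1432 + w1342 + w142, S[0, 1])

# Table 2: weights after marginalizing over X4, computed in Sigma_123.
S123 = S[np.ix_([0, 1, 2], [0, 1, 2])]
Ks = np.linalg.inv(S123)
w12_3  = -Ks[0, 1] * d([0, 1])                          # omega_{12.(3)}
w132_m = +Ks[0, 2] * Ks[2, 1] * np.linalg.det(S123)     # omega(<1,3,2>, Sigma_123)
assert np.isclose(w12_3 + w132_m, S[0, 1])

# The weight of <1,4,2> splits in the proportions (1 - rho^2_{34.12}) and rho^2_{34.12}.
rho2 = K[2, 3] ** 2 / (K[2, 2] * K[3, 3])   # squared partial corr. of X3, X4 given X1, X2
assert np.isclose(w12_3, w12 + w142 * (1 - rho2))
assert np.isclose(w132_m, w132 + w1432 + w1342 + w142 * rho2)
```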
References

Baryshnikova, A., M. Costanzo, Y. Kim, H. Ding, J. Koh, K. Toufighi, J.-Y. Youn, J. Ou, B.-J. San Luis, S. Bandyopadhyay, et al. (2010). Quantitative analysis of fitness and genetic interactions in yeast on a genome scale. Nat. Methods 7 (12), 1017–1024.

Brem, R. B. and L. Kruglyak (2005). The landscape of genetic complexity across 5,700 gene expression traits in yeast. Proc. Natl. Acad. Sci. U.S.A. 102 (5), 1572–1577.

Castelo, R. and A. Roverato (2006). A robust procedure for Gaussian graphical model search from microarray data with p larger than n. J. Mach. Learn. Res. 7, 2621–2650.

Chen, B. and J. Pearl (2015). Graphical tools for linear structural equation modeling. Psychometrika, to appear.

Choi, M. J., V. Y. Tan, A. Anandkumar, and A. S. Willsky (2011). Learning latent tree graphical models. J. Mach. Learn. Res. 12, 1771–1812.

Conde-Pueyo, N., A. Munteanu, R. V. Solé, and C. Rodríguez-Caso (2009). Human synthetic lethal inference as potential anti-cancer target gene detection. BMC Sys. Biol. 3 (1), 116.

Costanzo, M., A. Baryshnikova, J. Bellay, Y. Kim, E. D. Spear, C. S. Sevier, H. Ding, J. L. Koh, K. Toufighi, S. Mostafavi, et al. (2010). The genetic landscape of a cell. Science 327 (5964), 425–431.

Cox, D. R. and N. Wermuth (1996). Multivariate Dependencies: Models, Analysis and Interpretation. Chapman and Hall, London.

Dempster, A. P. (1972). Covariance selection. Biometrics 28 (1), 157–175.

Deshpande, R., M. K. Asiedu, M. Klebig, S. Sutor, E. Kuzmin, J. Nelson, J. Piotrowski, S. H. Shin, M. Yoshida, M. Costanzo, et al. (2013). A comparative genomic approach for identifying synthetic lethal interactions in human cancer. Cancer Res. 73 (20), 6128–6136.

Eddy, S. R. (2006). Total information awareness for worm genetics. Science 311 (5766), 1381.

Edwards, D., G. C. De Abreu, and R. Labouriau (2010). Selecting high-dimensional mixed graphical models using minimal AIC or BIC forests. BMC Bioinformatics 11 (1), 18.

Eisen, M. B., P. T. Spellman, P. O. Brown, and D. Botstein (1998). Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. U.S.A. 95 (25), 14863–14868.

Friedman, N. (2004). Inferring cellular networks using probabilistic graphical models. Science 303 (5659), 799–805.

Højsgaard, S., D. Edwards, and S. Lauritzen (2012). Graphical Models with R. Springer Science & Business Media.

Hotelling, H. (1936). Relations between two sets of variates. Biometrika 28 (3/4), 321–377.

Jones, B. and M. West (2005). Covariance decomposition in undirected Gaussian graphical models. Biometrika 92 (4), 779–786.

Kim, K. and N. Timm (2006). Univariate and Multivariate General Linear Models: Theory and Applications with SAS. CRC Press.

Lauritzen, S. L. (1996). Graphical Models. Oxford University Press.

Luo, J., M. J. Emanuele, D. Li, C. J. Creighton, M. R. Schlabach, T. F. Westbrook, K.-K. Wong, and S. J. Elledge (2009). A genome-wide RNAi screen identifies multiple synthetic lethal interactions with the Ras oncogene. Cell 137 (5), 835–848.

Malioutov, D. M., J. K. Johnson, and A. S. Willsky (2006). Walk-sums and belief propagation in Gaussian graphical models. J. Mach. Learn. Res. 7, 2031–2064.

Mardia, K. V., J. T. Kent, and J. M. Bibby (1979). Multivariate Analysis. London: Academic Press.

Nijman, S. M. (2011). Synthetic lethality: general principles, utility and detection using genetic screens in human cells. FEBS Lett. 585 (1), 1–6.

Rozeboom, W. W. (1965). Linear correlations between sets of variables. Psychometrika 30 (1), 57–71.

Schäfer, J. and K. Strimmer (2005). An empirical Bayes approach to inferring large-scale gene association networks. Bioinformatics 21 (6), 754–764.

Timm, N. H. (2002). Applied Multivariate Analysis. Springer-Verlag, New York.

Tucker, C. L. and S. Fields (2003). Lethal combinations. Nature Genet. 35 (3), 204–205.

Tur, I., A. Roverato, and R. Castelo (2014). Mapping eQTL networks with mixed graphical Markov models. Genetics 198 (4), 1377–1393.

Whittaker, J. (1990). Graphical Models in Applied Multivariate Analysis. John Wiley & Sons, Chichester.

Winzeler, E. A., D. D. Shoemaker, A. Astromoff, H. Liang, K. Anderson, B. Andre, R. Bangham, R. Benito, J. D. Boeke, H. Bussey, et al. (1999). Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science 285 (5429), 901–906.

Witten, D. M., R. Tibshirani, and T. Hastie (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10 (5), 515–534.

Wong, S. L., L. V. Zhang, A. H. Tong, Z. Li, D. S. Goldberg, O. D. King, G. Lesage, M. Vidal, B. Andrews, H. Bussey, et al. (2004). Combining biological networks to predict genetic interactions. Proc. Natl. Acad. Sci. U.S.A. 101 (44), 15682–15687.

Wright, S. (1921). Correlation and causation. J. Agric. Res. 20 (7), 557–585.

Zhong, W. and P. W. Sternberg (2006). Genome-wide prediction of C. elegans genetic interactions. Science 311 (5766), 1481–1484.

Zuo, Y., G. Yu, M. G. Tadesse, and H. W. Ressom (2014). Biological network inference using low order partial correlation. Methods 69 (3), 266–273.