Prediction of Protein Complexes Based on Protein Interaction Data and Functional Annotation Data Using Kernel Methods Shi-Hua Zhang1 , Xue-Mei Ning1 , Hong-Wei Liu2 , and Xiang-Sun Zhang1 1
Institute of Applied Mathematics, Academy of Mathematics and Systems Science Chinese Academy of Sciences, Beijing 100080, China {zsh, nxm}@amss.ac.cn,
[email protected] 2 School of Economics, Renmin University of China, Beijing 100872, China
[email protected]

Abstract. Prediction of protein complexes is a crucial problem in computational biology. The increasing amount of available genomic data can enhance the identification of protein complexes. Here we describe an approach for predicting protein complexes based on the integration of protein-protein interaction (PPI) data and protein functional annotation data. The basic idea is that proteins in a complex often interact with each other and that protein complexes exhibit high functional consistency, or even multiple functional consistency. We create a protein-protein relationship network (PPRN) via a kernel-based integration of these two types of genomic data. We then apply the MCODE algorithm to the PPRN to detect network clusters, which serve as computationally predicted protein complexes. We present the results of applying the approach to the yeast Saccharomyces cerevisiae. Comparison with well-known experimentally derived complexes and with the results of other methods verifies the effectiveness of our approach.
1 Introduction
Cellular organization and function are carried out through gene/protein interactions. With the ever-increasing variety of genomic data, such as DNA sequences, gene expression measurements, protein-protein interactions, and protein phylogenetic profiles, reconstructing biological machinery from these data is a crucial problem. A protein complex is a group of proteins that often interact with each other, forming a specific biochemical machine. However, despite recent advances in technologies for detecting protein interactions, only a very small fraction of the many possible protein complexes has been experimentally determined [1]. Prediction of protein complexes is therefore a key problem in computational biology. One line of such work operates on PPI networks [2,3,4]. Proteins in a complex often interact with each other, so protein complexes generally correspond to dense subgraphs of the PPI network. Recently, three network clustering approaches, namely the MCODE (Molecular Complex Detection) algorithm [2], restricted neighborhood search clustering (RNSC) [3], and the local clique merging algorithm (LCMA) [4], have been applied to predict protein complexes.

D.-S. Huang, K. Li, and G.W. Irwin (Eds.): ICIC 2006, LNBI 4115, pp. 514–524, 2006. © Springer-Verlag Berlin Heidelberg 2006
The MCODE algorithm uses connectivity values in PPI networks to identify complexes; a shortcoming is that the resulting clusters may be too sparse. RNSC partitions the PPI network using a cost function. It is a randomized algorithm, and it predicts relatively few complexes. LCMA, which is based on local clique merging, has been shown in preliminary results to be more efficient than the MCODE algorithm.

It should be noted that proteins in known complexes often share consistent functional annotations [2,3], so the relatively abundant functional annotation information can be employed to identify protein complexes. In ref. [3], functional homogeneity was used as a necessary condition for predicting protein complexes.

Kernel representation of heterogeneous genomic information has already proven to be a very useful tool in computational biology [5,6]. Each type of dataset can be represented by a kernel function, a real-valued function K(x, y) that defines the similarity between pairs of objects x and y (genes, proteins, and so on). Evaluating the kernel on all pairs of data objects yields a symmetric, positive semi-definite matrix K known as the kernel matrix. The distinguishing characteristic is that all types of data are represented in a unified framework even though they may differ in nature. Various kernels have been developed for integrating genomic data [5,6]. For example, the linear kernel and the Gaussian kernel are natural choices for datasets represented by vectors, while the diffusion kernel [7] has proven very effective for describing network data. In this study, we define a simple kernel for the protein functional annotation data that captures the functional consistency, or even multiple functional consistency, of protein complexes.
Here we propose an integrated approach that identifies protein complexes using protein interaction data and functional annotation information. We create an integrated protein-protein relationship network (PPRN) by using kernel methods to integrate these two types of genomic data. The MCODE network clustering algorithm is then applied to the PPRN to detect computationally derived complexes. The MCODE algorithm was developed for detecting complexes in protein interaction networks and can detect overlapping modules, so it is also suitable for finding complexes in our networks. We applied this approach to the yeast Saccharomyces cerevisiae. The computed protein complexes show good consistency with well-known yeast protein complexes. Comparison with other methods, such as the MCODE algorithm applied directly to the PPI network, shows the effectiveness of our approach.
2 Systems and Methods

2.1 Materials
We use yeast genomic data to predict protein complexes because yeast is currently the organism with the most comprehensive publicly available experimental datasets.

Protein Interaction Data. A physical interaction network of 4713 yeast Saccharomyces cerevisiae proteins containing 14848 protein interactions is used in our work.
The protein-protein interactions were downloaded from the DIP database as of July 2004 and predominantly include data from large-scale experiments [8,9,10,11].

Functional Annotation Data. To exploit the functional consistency of protein complexes, we use the functional annotation of Saccharomyces cerevisiae genes in the MIPS Functional Catalog (FunCat) database [12]. FunCat is an annotation scheme for the functional description of proteins from various organisms and consists of 28 main functional categories (or branches). The main branches exhibit a hierarchical, tree-like structure with up to six levels of increasing specificity, and 1307 functional categories are included in total. Here we use the functional annotation of the 4713 proteins at the second level, which comprises 68 categories, so that each protein corresponds to a 68-dimensional vector in which 1 or 0 indicates whether or not the protein belongs to a category.

Gold Standard Complex Data. To evaluate the effectiveness of our approach for predicting protein complexes, we compare the complexes predicted from the yeast data with the known protein complexes in the MIPS yeast complex database [13]. To filter out, to a certain extent, the experimentally predicted protein complexes from the dataset, we use only the manually annotated complexes derived from literature scanning, together with the known Gavin benchmark data [10], as our gold standard dataset. Finally, a set M of 439 yeast complexes is used as the set of known complexes. The largest complex contains 88 proteins, and the average complex size is 9.11.

2.2 Methods
The outline of our method is shown in Figure 1. The two genomic datasets are each represented by a kernel matrix. A protein-protein relationship network (PPRN) is then produced by integrating these two kernels, and a powerful tool for detecting network modules is applied to the PPRN. The resulting modules are our computationally detected protein complexes, which constitute the predicted complex set P. Validation of these complexes and comparison with related methods support our idea that functional annotation information is helpful for detecting protein complexes.

Fig. 1. The schematic diagram of our method for detection of protein complexes. (Flow: PPI network → K_PPI and functional annotation data → K_Fa; both kernels → PPRN network → MCODE → predicted protein complexes → match with well-known protein complexes → validation of protein complexes.)

Kernel Representation and Data Integration. Kernel representation is an efficient way to represent each type of genomic information uniformly [5,6]. The PPI network can be represented using the diffusion kernel [7]. Let A denote the adjacency matrix of the PPI network and D the diagonal matrix of node degrees, so that the Laplacian matrix of the network is L = D − A. The diffusion kernel is then defined as

\[ K = \operatorname{expm}(-\beta L), \tag{1} \]

where expm denotes the matrix exponential and β is a parameter that controls the degree of diffusion. The diffusion kernel is then normalized so that all of its diagonal elements equal one:

\[ K_{PPI}(i,j) = \frac{K_{ij}}{\sqrt{K_{ii} K_{jj}}}. \tag{2} \]
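The diffusion kernel of Eqs. (1) and (2) can be illustrated on a toy graph. The following is a minimal sketch, not the authors' implementation: the helper names (`laplacian`, `mat_mul`, `expm`, `diffusion_kernel`) and the three-node path graph are assumptions made for illustration, and the matrix exponential is computed with a simple scaling-and-squaring Taylor series so that the example needs only the standard library. In practice a library routine such as `scipy.linalg.expm` would normally be used instead.

```python
import math

def laplacian(adj):
    """Graph Laplacian L = D - A for a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
            for i in range(n)]

def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(a, terms=20):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    n = len(a)
    norm = max(sum(abs(x) for x in row) for row in a)
    # Halve the matrix s times so that the Taylor series converges quickly.
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[x / 2 ** s for x in row] for row in a]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, terms + 1):
        # term now holds scaled^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(s):
        # Undo the scaling: exp(A) = (exp(A / 2^s))^(2^s).
        result = mat_mul(result, result)
    return result

def diffusion_kernel(adj, beta):
    """Eq. (1) followed by the normalization of Eq. (2)."""
    lap = laplacian(adj)
    k = expm([[-beta * x for x in row] for row in lap])
    n = len(k)
    return [[k[i][j] / math.sqrt(k[i][i] * k[j][j]) for j in range(n)]
            for i in range(n)]

# Toy path graph 0 - 1 - 2 (hypothetical data, not taken from the paper).
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
k_ppi = diffusion_kernel(adj, beta=3.0)
```

In this toy graph the normalized kernel has ones on the diagonal, and the directly connected pair (0, 1) receives a larger value than the pair (0, 2), which is connected only through node 1. This shows how the diffusion kernel assigns similarity to proteins that do not interact directly.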
Functional annotation data are represented by means of a linear kernel:

\[ K_{fa}(i,j) = x_i \cdot x_j, \qquad K_{Fa} = \frac{K_{fa}}{\max(K_{fa})}, \tag{3} \]
where · denotes the inner product and max() denotes the maximal entry of the matrix. These two kernels measure the similarity of proteins with respect to each genomic dataset. A new kernel defined as the sum of the two kernels, scaled by one half,

\[ K_{Int} = \frac{K_{PPI} + K_{Fa}}{2}, \tag{4} \]

is a simple approach to data integration. Although more complex kernel operations can be employed to create new integration methods, this simple kernel has been widely used [5].

Protein-Protein Relationship Network (PPRN). Kernels describe an implicit similarity between proteins, so any protein kernel matrix K can define a weighted network G(V, E, W) or an unweighted network G(V, E) of protein-protein relationships. The node set V consists of all proteins, the matrix W holds the corresponding kernel values, which serve as the edge weights, and the edge set of the network is defined as

\[ E = \{ (i,j) \mid K_{ij} \ge c \}, \tag{5} \]
where c is a parameter that controls the density of the network. We refer to the network of the kernel $K_{Int}$ as our protein-protein relationship network (PPRN). We believe that a group of proteins whose mutual kernel values are large enough likely corresponds to a protein complex.

MCODE Algorithm. Bader and Hogue [2] developed a graph-theoretic clustering algorithm, the so-called MCODE algorithm (http://cbio.mskcc.org/~bader/software/mcode/index.html), which uses connectivity values in PPI networks to detect protein complexes. The algorithm weights each vertex according to its local neighborhood density and then traverses outward from a dense seed protein with a high weight, recursively including neighboring vertices whose weights satisfy a given threshold. Here we apply it to both our PPI network and our PPRN networks to evaluate our idea that functional annotation can improve the prediction of complexes.
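The construction of the PPRN from the two kernels (Eqs. 3 to 5) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the function names, the three toy annotation vectors, and the toy K_PPI matrix are hypothetical, and the sketch lists each undirected edge once and skips self-pairs, a convention the paper does not spell out.

```python
def linear_kernel(vectors):
    """K_fa(i, j) = x_i . x_j, divided by the largest entry (Eq. 3).

    Assumes at least one protein has at least one annotation, so the
    largest entry is positive.
    """
    n = len(vectors)
    k = [[sum(a * b for a, b in zip(vectors[i], vectors[j])) for j in range(n)]
         for i in range(n)]
    largest = max(max(row) for row in k)
    return [[x / largest for x in row] for row in k]

def integrate(k_ppi, k_fa):
    """K_Int = (K_PPI + K_Fa) / 2 (Eq. 4)."""
    n = len(k_ppi)
    return [[(k_ppi[i][j] + k_fa[i][j]) / 2 for j in range(n)] for i in range(n)]

def pprn_edges(k_int, c):
    """Edges (i, j) with i < j and K_ij >= c (Eq. 5); self-pairs are skipped."""
    n = len(k_int)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if k_int[i][j] >= c]

# Hypothetical binary annotation vectors for three proteins and four categories.
annotations = [[1, 1, 0, 0],
               [1, 1, 1, 0],
               [0, 0, 0, 1]]
# Hypothetical normalized diffusion kernel for the same three proteins.
k_ppi = [[1.0, 0.1, 0.3],
         [0.1, 1.0, 0.0],
         [0.3, 0.0, 1.0]]

k_fa = linear_kernel(annotations)
k_int = integrate(k_ppi, k_fa)
edges = pprn_edges(k_int, c=0.25)
```

With these toy values, proteins 0 and 1 share two annotation categories and are linked in the PPRN even though their interaction-based similarity is weak, whereas proteins 0 and 2 have a stronger interaction-based similarity but no shared annotation and fall below the threshold.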
3 Experiments and Results

3.1 Validation of Protein Complexes
We assess the precision of the results obtained by applying the MCODE algorithm to our PPRN networks using the evaluation metric of [2,4], which relies on the overlap score

\[ OS(p,m) = \frac{k^2}{n_1 \times n_2} \tag{6} \]
to determine whether a predicted complex p ∈ P matches a known complex m ∈ M, where k is the size of the overlap of p and m, and n1 and n2 are the sizes of p and m, respectively. A predicted complex p and a known complex m are considered to match if OS(p, m) ≥ 0.2, where 0.2 is an empirically determined threshold first used in [2] and also used in [4]. Following the notation in [4], we define the set of true positives (TP) as TP = {p | ∃m, OS(p, m) ≥ 0.2, p ∈ P, m ∈ M} and the set of false positives (FP) as FP = P − TP. Likewise, the set of false negatives (FN) is defined as FN = {m | ∀p, OS(p, m) < 0.2, p ∈ P, m ∈ M}, and the matched gold-standard complex set MS is defined as MS = M − FN, which contains the known complexes matched by predicted complexes. The recall (sensitivity) and precision (specificity) are then defined as |TP|/(|TP| + |FN|) and |TP|/(|TP| + |FP|), respectively. The so-called F-measure, which is defined as

\[ F = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{7} \]
and adopted in [4], is used to evaluate the performance of our approach. As the authors of [4] pointed out, because the set of known complexes is incomplete, the F-measure of each method should be taken only as a comparative measure rather than as an absolute value. To further test our approach, we consider another index, which measures the coverage of predicted protein complexes:

\[ Cov(p,m) = \frac{k}{n_2}, \qquad C_q = \{ m \mid \exists p,\ Cov(p,m) \ge q,\ p \in P,\ m \in M \}, \tag{8} \]

where q is a real number between 0 and 1, and the set $C_q$ contains the known complexes for which at least a fraction q of the members appear in a single predicted complex.

3.2 Experimental Results
Because known protein interaction data are noisy and incomplete, our approach aims to detect more complexes, with high recall and precision, by integrating functional annotation data with current protein interaction data. The effectiveness of the kernel methods comes from exploiting both the functional consistency of proteins and the implicit relationships among interacting proteins. The functional annotation information can compensate for missing interactions and correct some false interactions. Integrating the two types of genomic data can therefore make network clustering more robust than using the highly noisy protein interaction data alone. Figure 2 shows an example of a MIPS complex of size 18 and two matching complexes, both of size 13, predicted by the MCODE algorithm in the PPI and PPRN networks, respectively. Table 1 shows the functional annotation of the proteins in Figure 2 (only functional annotations shared by at least three proteins are shown). The complex predicted in the PPRN network (with c = 0.24, see below) is contained within the known complex, whereas only ten proteins of the complex predicted in the PPI network belong to it. All the proteins in the given MIPS complex (the first 18 proteins in Table 1) show high multiple functional consistency, while the three proteins (the last three in Table 1, labeled in black font) that are included in the complex predicted in the PPI network but not in the MIPS complex do not show such multiple consistent functional annotations. This supports our idea that known functional annotation information, and in particular functional annotation consistency, is helpful for detecting complexes.
Fig. 2. An example: the MIPS complex MIPS-420.50 (F0/F1 ATP synthase complex) and the matching complexes predicted in the PPI network and the PPRN network, respectively
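For the example in Fig. 2, the overlap score of Eq. (6) can be worked out directly from the sizes stated in the text: a known complex of 18 proteins, two predicted complexes of 13 proteins each, with 13 shared proteins for the PPRN prediction and 10 for the PPI prediction. The short sketch below uses a hypothetical helper name (`overlap_score`) and only these stated sizes.

```python
def overlap_score(k, n1, n2):
    """OS(p, m) = k^2 / (n1 * n2), Eq. (6)."""
    return k * k / (n1 * n2)

KNOWN_SIZE = 18      # size of the MIPS complex in Fig. 2
PREDICTED_SIZE = 13  # size of each predicted complex

# PPRN prediction: all 13 predicted proteins lie inside the known complex.
os_pprn = overlap_score(13, PREDICTED_SIZE, KNOWN_SIZE)
# PPI prediction: only 10 of the 13 predicted proteins lie inside it.
os_ppi = overlap_score(10, PREDICTED_SIZE, KNOWN_SIZE)

print(round(os_pprn, 3))  # 0.722
print(round(os_ppi, 3))   # 0.427
```

Both predictions exceed the matching threshold of 0.2, but the PPRN prediction has the clearly higher overlap score because it contains no proteins outside the known complex.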
Table 1. Functional annotation of proteins in Figure 2
Proteins (rows): Q0080, Q0085, Q0130, YOL077W-A, YLR295C, YBL099W, YBR039W, YDL004W, YDL181W, YDR298C, YDR322C-A, YDR377W, YJR121W, YKL016C, YML081C-A, YPL078C, YPL271W, YPR020W, YJL180C, YNL315C, YBR271W
FunCat categories (columns): 02.11, 02.13, 02.45.15, 14.10, 16.07, 20.01.01.01, 20.01.15, 20.03.22, 20.09.04, 34.01.01.03

Throughout the study, the diffusion kernel of the protein interaction network is computed with β = 3. We choose c = 0.25 and c = 0.24 empirically, producing two PPRN networks with 14423 and 16413 edges, respectively. We apply the MCODE algorithm to these two networks to predict protein complexes. For comparison, we also apply it to the original protein interaction network to show the effectiveness of integrating the two types of genomic information using kernel methods. The MCODE algorithm has two important parameters, w and f, which control the number and size of the resulting clusters. For different networks, the optimal results are produced by different parameter pairs [2]. We choose parameter pairs that optimize biological relevance. Some of the predicted complexes are of size 3, but a complex of size 3 is less statistically significant because it is easy to produce in a random graph. We therefore discuss two cases: one in which the predicted complex set includes complexes of size 3, and one in which it does not.

Table 2 and Figure 3 show the optimal results, in terms of the largest F-measure, for three values of w in the three networks. The results show that integrating functional annotation information with PPI data greatly improves the prediction results. More complexes are detected using the same network clustering method, and the F-measures of the two PPRN networks are clearly higher than that of the PPI network alone. For example, with w = 0.1 the F-measures of the two PPRN networks are 15.85%/22.62% higher than that of the PPI network in the first case and 32.66%/39.74% higher in the second.

We also test the coverage of predicted complexes, i.e., the degree to which entire complexes appear in the same predicted complex [16]. Figure 4 shows the large improvement of our results for varying values of q in the two cases. For example, in our two PPRN networks there are 139/140 gold-standard complexes (with w = 0.1 and f = 0) for which 50% or more of the members appear in the same predicted complex, compared with only 70 in the results predicted from the PPI network. King et al. [3] applied the RNSC algorithm to predict complexes from protein interaction networks.
However, they predicted only 45 complexes, which match 30 MIPS complexes. A recent method, the LCMA algorithm, which is based on local clique merging, has been reported to be more efficient than the MCODE algorithm. We do not make a direct comparison because their system is not available to us, and we emphasize that our contribution is to complement the incomplete PPI data with functional annotation information by means of kernel methods, which exploit the functional homogeneity of protein complexes, or even the multiple functional consistency of proteins. Like other methods, our approach also predicts complexes that do not match any complex in the current set of known complexes. Since the known complex set is largely incomplete, these unmatched complexes could well be real complexes, so the actual precision of our approach is likely to be higher than the current results indicate.

Recent studies have shown that biological networks (e.g., metabolic networks and physical interaction networks) exhibit the characteristics of scale-free networks, like many natural networks [14]. Here we examine the scale-free characteristic of the protein-protein relationship networks and the size distribution of the predicted complexes for the PPI network and the two PPRN networks. In the top row of Figure 5, plots A, B and C show that the probability P(k) that a node has degree k in these three networks follows a power law, P(k) ∝ k^{−γ}. In the bottom row of Figure 5, plots B and C show that the size distributions of the clusters (modules) of the two PPRN networks also clearly follow a power law, while that of the PPI network has a steep slope (γ = 3.56; see plot A).

Table 2. The protein-protein networks and their results, where P3 and P4 denote the sets of predicted complexes with minimum complex size 3 and 4, respectively

Protein-protein network   para(c, w, f)        |P3|   |TP|   |MS|   para(c, w, f)        |P4|   |TP|   |MS|
PPI (MCODE)               (no, 0.00, 0.00)     157    74     104    (no, 0.00, 0.20)     147    82     60
PPI (MCODE)               (no, 0.05, 0.00)     237    94     143    (no, 0.05, 0.10)     235    121    74
PPI (MCODE)               (no, 0.10, 0.00)     305    169    105    (no, 0.10, 0.25)     268    130    92
PPRN (Int+MCODE)          (0.25, 0.00, 0.00)   371    216    122    (0.25, 0.00, 0.10)   357    192    112
PPRN (Int+MCODE)          (0.25, 0.05, 0.00)   510    260    139    (0.25, 0.05, 0.00)   360    204    95
PPRN (Int+MCODE)          (0.25, 0.10, 0.00)   591    284    142    (0.25, 0.10, 0.00)   405    225    97
PPRN (Int+MCODE)          (0.24, 0.00, 0.30)   377    239    119    (0.24, 0.00, 0.30)   330    216    104
PPRN (Int+MCODE)          (0.24, 0.05, 0.00)   526    286    141    (0.24, 0.05, 0.00)   368    223    93
PPRN (Int+MCODE)          (0.24, 0.10, 0.00)   633    318    150    (0.24, 0.10, 0.00)   435    248    105

Fig. 3. Comparison of the F-measures obtained by applying the MCODE algorithm to the various networks with the selected parameters. (Panels A and B each show three bars for PPI, PPRN(0.25) and PPRN(0.24) on an axis from 0 to 0.5.)
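The F-measure improvements quoted for w = 0.1 can be checked from the counts in Table 2. The sketch below is an illustration under stated assumptions: it takes |FP| = |P| − |TP| and |FN| = |M| − |MS| with |M| = 439, as in the definitions of Section 3.1, and the function and variable names are hypothetical.

```python
M_SIZE = 439  # number of gold-standard complexes (Section 2.1)

def f_measure(n_pred, n_tp, n_ms, n_known=M_SIZE):
    """F-measure of Eq. (7) from |P|, |TP| and |MS|."""
    n_fp = n_pred - n_tp   # FP = P - TP
    n_fn = n_known - n_ms  # FN = M - MS
    precision = n_tp / (n_tp + n_fp)
    recall = n_tp / (n_tp + n_fn)
    return 2 * precision * recall / (precision + recall)

# (|P|, |TP|, |MS|) for w = 0.1, copied from Table 2.
rows_p3 = {"PPI": (305, 169, 105),
           "PPRN(0.25)": (591, 284, 142),
           "PPRN(0.24)": (633, 318, 150)}
rows_p4 = {"PPI": (268, 130, 92),
           "PPRN(0.25)": (405, 225, 97),
           "PPRN(0.24)": (435, 248, 105)}

def gains(rows):
    """Relative F-measure gain over the PPI network, in percent."""
    base = f_measure(*rows["PPI"])
    return {name: 100 * (f_measure(*counts) / base - 1)
            for name, counts in rows.items() if name != "PPI"}

gains_p3 = gains(rows_p3)  # predicted sets including complexes of size 3
gains_p4 = gains(rows_p4)  # predicted sets with minimum size 4
```

Under these assumptions the computed gains agree with the reported 15.85%/22.62% and 32.66%/39.74% to within a few hundredths of a percentage point, which suggests that the reported values were derived from rounded F-measures.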
Fig. 4. Complex coverage: |Cq| represents the number of complexes whose member proteins appear in the same predicted complex for various q values. The left figure plots the results with all the predicted complexes including complexes of size 3, while the right removing complexes of size 3. (Both panels plot |Cq| against q (%) for PPI, PPRN(0.25) and PPRN(0.24), with three series each.)
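The quantity |Cq| plotted in Fig. 4 follows directly from Eq. (8). The following minimal sketch uses hypothetical toy complexes whose single-letter protein names are placeholders, and a hypothetical function name (`coverage_set`).

```python
def coverage_set(predicted, known, q):
    """Known complexes m with Cov(p, m) = |p & m| / |m| >= q for some p (Eq. 8)."""
    return [m for m in known
            if any(len(p & m) / len(m) >= q for p in predicted)]

# Hypothetical toy complexes; each letter stands for one protein.
known = [frozenset("ABCD"), frozenset("EFG"), frozenset("HIJKL")]
predicted = [frozenset("ABCX"), frozenset("EH")]

sizes = {q: len(coverage_set(predicted, known, q)) for q in (0.2, 0.5, 0.75, 1.0)}
print(sizes)  # {0.2: 3, 0.5: 1, 0.75: 1, 1.0: 0}
```

As q increases, fewer known complexes are covered to the required degree, so |Cq| is non-increasing in q, which is the shape expected for the curves in Fig. 4.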
Fig. 5. The top row shows the degree distributions of the PPI network (A) and the two PPRN networks (B, C); the bottom row shows the size distributions of the clusters detected in these networks by the MCODE algorithm with parameters w = 0.1 and f = 0. (Axes: frequency (log) versus degree (log) in the top row; number of clusters (log) versus cluster size (log) in the bottom row; each panel is annotated with a slope value.)
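A slope of the kind annotated in Fig. 5 can be estimated by a least-squares fit on the log-log frequency plot. The paper does not state how its slopes were obtained, so the sketch below is only one plausible procedure, shown on a synthetic degree sequence; the function name `loglog_slope` and the data are hypothetical.

```python
import math
from collections import Counter

def loglog_slope(values):
    """Least-squares slope of log10(frequency) against log10(value).

    All values must be positive (e.g., nodes of degree zero are excluded).
    """
    counts = Counter(values)
    xs = [math.log10(v) for v in counts]
    ys = [math.log10(c) for c in counts.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic degree sequence whose frequencies are exactly 400 * k^-2.
degrees = [1] * 400 + [2] * 100 + [4] * 25 + [5] * 16
slope = loglog_slope(degrees)
```

For this synthetic sequence the fitted slope is −2, corresponding to γ = 2 in P(k) ∝ k^{−γ}. On real data, the estimate depends on how the distribution is binned and on the range of degrees included in the fit.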
4 Conclusion and Discussion
In this paper, we develop a method for predicting protein complexes based on the integration of two important types of genomic data (physical interaction data and protein functional annotation data) by means of kernel methods. Groups of genes/proteins that may correspond to functional modules have been detected extensively in physical interaction data [15]. However, it is often hard to conclude that these clusters/modules truly have such properties, in part because these data are very noisy and incomplete. Protein complexes have been predicted from protein interaction data by methods such as the MCODE algorithm [2], the RNSC algorithm [3] and the recent LCMA algorithm [4]. Molecular pathways/functional modules have also been detected by integrating physical interaction data with another important type of genomic data, gene expression data [16]. Here, we introduce functional annotation data to mitigate the limitations of the physical interaction data, and this approach makes good use of the functional consistency of protein complexes.

Kernel representation has proven to be very useful for various types of data, e.g., strings, trees, and networks. It has been widely used in bioinformatics, e.g., for the inference of biological networks [5]. In this study, we exploit the characteristics of kernel methods to combine these two datasets. The experimental results on yeast data show the effectiveness of the proposed method. Compared with the results obtained using protein interaction data alone, our predicted complexes match or contain more known, experimentally determined protein complexes. The additional novel predicted complexes may help biologists to detect new protein complexes experimentally. We conclude that combining these two data sources produces better results than using protein interaction data alone.
Acknowledgements

This work is partly supported by the Important Research Direction Project of CAS “Some Important Problem in Bioinformatics” and the National Natural Science Foundation of China under Grant No. 10471141.
References

1. Sear, R.P.: Specific Protein-Protein Binding in Many-component Mixtures of Proteins. Phys. Biol., 1(2004), 53-60
2. Bader, G.D., Hogue, C.W.: An Automated Method for Finding Molecular Complexes in Large Protein Interaction Networks. BMC Bioinformatics, 4(2003), 2
3. King, A.D., Pržulj, N., Jurisica, I.: Protein Complex Prediction via Cost-based Clustering. Bioinformatics, 20(2004), 3013-3020
4. Li, X.L., Tan, S.H., Foo, C.S., Ng, S.K.: Interaction Graph Mining for Protein Complexes Using Local Clique Merging. Genome Informatics, 16(2005), 260-269
5. Yamanishi, Y., Vert, J.P., Kanehisa, M.: Protein Network Inference from Multiple Genomic Data: a Supervised Approach. Bioinformatics, 20(2004), i363-i370
6. Lanckriet, G.R., De Bie, T., Cristianini, N., Jordan, M.I., Noble, W.S.: A Statistical Framework for Genomic Data Fusion. Bioinformatics, 20(2004), 2626-2635
7. Kondor, R.I., Lafferty, J.: Diffusion Kernels on Graphs and Other Discrete Input Spaces. In Proceedings of the 19th International Conference on Machine Learning, Morgan Kaufmann, University of New South Wales, Sydney, Australia, (2002), 315-322
8. Ito, T., Tashiro, K., Muta, S., Ozawa, R., Chiba, T., Nishizawa, M., Yamamoto, K., Kuhara, S., Sakaki, Y.: Toward a Protein-Protein Interaction Map of the Budding Yeast: a Comprehensive System to Examine Two-hybrid Interactions in All Possible Combinations between the Yeast Proteins. Proc. Natl Acad. Sci., USA, 97(2000), 1143-1147
9. Uetz, P., Giot, L., Cagney, G., Mansfield, T.A., Judson, R.S., Knight, J.R., Lockshon, D., Narayan, V., Srinivasan, M., Pochart, P., et al.: A Comprehensive Analysis of Protein-Protein Interactions in Saccharomyces Cerevisiae. Nature, 403(2000), 623-627
10. Gavin, A.C., Bosche, M., Krause, R., Grandi, P., Marzioch, M., Bauer, A., Schultz, J., Rick, J.M., Michon, A.M., Cruciat, C.M., et al.: Functional Organization of the Yeast Proteome by Systematic Analysis of Protein Complexes. Nature, 415(2002), 141-147
11. Ho, Y., Gruhler, A., Heilbut, A., Bader, G.D., Moore, L., Adams, S.L., Millar, A., Taylor, P., Bennett, K., Boutilier, K., et al.: Systematic Identification of Protein Complexes in Saccharomyces Cerevisiae by Mass Spectrometry. Nature, 415(2002), 180-183
12. Ruepp, A., Zollner, A., Maier, D., Albermann, K., Hani, J., Mokrejs, M., Tetko, I., Guldener, U., Mannhaupt, G., Munsterkotter, M., et al.: The FunCat, a Functional Annotation Scheme for Systematic Classification of Proteins from Whole Genomes. Nucleic Acids Res., 32(2004), 5539-5545
13. Mewes, H.W., Frishman, D., Guldener, U., Mannhaupt, G., Mayer, K., Mokrejs, M., Morgenstern, B., Munsterkotter, M., Rudd, S., Weil, B.: MIPS: a Database for Genomes and Protein Sequences. Nucleic Acids Res., 30(2002), 31-34
14. Barabási, A.-L., Oltvai, Z.N.: Network Biology: Understanding the Cell’s Functional Organization. Nature Rev. Genet., 5(2004), 101-114
15. Spirin, V., Mirny, L.A.: Protein Complexes and Functional Modules in Molecular Networks. Proc. Natl Acad. Sci., USA, 100(2003), 12123-12126
16. Segal, E., Wang, H., Koller, D.: Discovering Molecular Pathways from Protein Interaction and Gene Expression Data. Bioinformatics, 19(2003), i264-i272