
PROCOS: Computational Analysis of Protein–Protein Complexes

FLORIAN FINK,1 JOCHEN HOCHREIN,1 VINCENT WOLOWSKI,2 RAINER MERKL,3 WOLFRAM GRONWALD1

1 Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
2 Faculty of Mathematics and Computer Science, University of Hagen, Hagen, Germany
3 Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany

Received 9 November 2010; Revised 15 April 2011; Accepted 15 April 2011 DOI 10.1002/jcc.21837 Published online 31 May 2011 in Wiley Online Library (wileyonlinelibrary.com).

Abstract: One of the main challenges in protein–protein docking is a meaningful evaluation of the many putative solutions. Here we present a program (PROCOS) that calculates a probability-like measure to be native for a given complex. In contrast to scores often used for analyzing complex structures, the calculated probabilities offer the advantage of providing a fixed range of expected values. This will allow, in principle, the comparison of models corresponding to different targets that were solved with the same algorithm. Judgments are based on distributions of properties derived from a large database of native and false complexes. For complex analysis PROCOS uses these property distributions of native and false complexes together with a support vector machine (SVM). PROCOS was compared to the established scoring schemes of ZRANK and DFIRE. Employing a set of experimentally solved native complexes, high probability values above 50% were obtained for 90% of these structures. Next, the performance of PROCOS was tested on the 40 binary targets of the Dockground decoy set, on 14 targets of the RosettaDock decoy set and on 9 targets that participated in the CAPRI scoring evaluation. Again the advantage of using a probability-based scoring system becomes apparent and a reasonable number of near native complexes was found within the top ranked complexes. In conclusion, a novel fully automated method is presented that allows the reliable evaluation of protein–protein complexes. © 2011 Wiley Periodicals, Inc.

J Comput Chem 32: 2575–2586, 2011

Key words: protein–protein complex; docking; scoring; reranking; support vector machine

Introduction

Protein–Protein Interactions

Proteins are an essential part of nearly all cellular processes. One important aspect of proteins is their three-dimensional structure, which must be known to understand their function in detail. Most frequently, protein structures are determined by means of X-ray crystallography and NMR spectroscopy, leading to a rapidly growing number of solved structures. To date, more than 60,000 protein structures are deposited in the Protein Data Bank (PDB), available at www.rcsb.org.1 However, cellular functions are rarely carried out by single proteins but by complexes composed of several interacting proteins. It has been estimated that each protein has nine interaction partners on average.2 However, due to experimental complexity only a very small part of the deposited structures consists of protein–protein complexes. High-throughput methods for detecting protein interactions, like yeast two-hybrid assays or tandem affinity purification mass spectrometry, predict a large number of protein–protein interactions. These experimental approaches are supplemented by bioinformatic methods such as phylogenetic profiling, investigations of gene neighborhoods, and gene fusion analysis. Unfortunately, it is not possible to determine the structures of all these protein complexes by experimental methods due to limitations concerning large or transient complexes. In addition, the experimental structure determination of protein–protein complexes is in most cases a time-consuming and challenging process. For that reason, computational approaches like docking algorithms that predict the structure of these complexes are needed. During the last few years, considerable effort has been put into the development and application of docking algorithms; for a review see ref. 3. The success of docking algorithms has consistently improved over the last years, as measured by the CAPRI blind docking experiment.4, 5 Due to such efforts, on one hand the applicability of in silico created complexes is becoming more widely accepted, and on the other hand the various available docking algorithms can be objectively compared.

Correspondence to: W. Gronwald; e-mail: wolfram.gronwald@klinik.uni-regensburg.de
Contract/grant sponsor: Bavarian Genome Research Network (BAYGENE)



Fink et al.



Vol. 32, No. 12



Docking

All docking approaches assume that the native complex is near the global minimum of the energy landscape constituted by the set of all theoretically possible complex conformations of the interacting proteins. The main challenges of any docking algorithm can be divided into three separate elements: First, the possible docking orientations have to be enumerated at a sufficiently high resolution. Second, minor or even major structural changes that occur upon complex formation have to be considered. Third, from all putative solutions the near native ones have to be selected. For a reliable decision, a scoring function that distinguishes native(-like) from non-native(-like) docking solutions is necessary. In the following, we focus on optimizing the third task, the selection step. Usually, several factors are considered in the identification of near native models, including steric surface complementarity,6, 7 electrostatic interactions,8–10 hydrogen bonding,11 knowledge based pair-potentials,12, 13 desolvation energies14 and van der Waals interactions.15 It has been shown that evaluation of complexes can be improved considerably by combining the information of several analysis functions,16 and this is increasingly becoming common practice.17–20 Despite these efforts the selection step still remains a challenging task18, 21 and frequently, high scores are obtained for non-native solutions. One important drawback of many scoring approaches used today is that in most cases score values strongly depend on other factors, e.g., the size of the proteins and vary widely even for correct solutions from target to target. Therefore, it is difficult to define a priori thresholds up to which a docked complex should be retained for further analysis. In addition, it is not possible to directly compare the scores of different complexes with each other. 
Here, we describe with the PROtein COmplex analysis Server (PROCOS) a novel approach for the evaluation of both computationally and experimentally derived protein complexes that overcomes several of the limitations mentioned earlier. For complex analysis PROCOS uses a combination of well established analysis functions. Specifically, van der Waals and electrostatic energies and knowledge based pair potentials are used. As detailed later, the novel part of PROCOS concerns the way in which these functions together with a large database of native and false complexes are used to calculate a probability-like measure to be native for each given complex.

Methods

The underlying idea of PROCOS is to classify complexes based on Bayes’s theorem,22 which is used to calculate the probability p that a complex with a global score value S belongs to the class of native complexes N [eq. (1)]. Here, p(N) and p(F) denote the a priori probabilities that a complex belongs to the class of native complexes (N) or to the class of false ones (F). In addition, estimates of the probability distributions DN and DF of the scores S for the classes N and F are required for obtaining the probabilities p(S|N) and p(S|F), respectively. Although it is possible to formulate a priori assumptions on these distributions, the extraction of this information from known complex structures is more robust. Therefore, native complexes were taken from a database, termed the Mintz database in the following, which contains 2541 nonhomologous native protein

Journal of Computational Chemistry

complexes.23 A meaningful antipode of false complexes was taken from CAPRI scoring data as detailed later. For each of the native and false complexes the values of three analysis functions were calculated: intermolecular electrostatic energy (e), intermolecular van der Waals energy (v), and the score of an intermolecular amino acid based pair-potential (k).24 The e, v, and k values obtained for each complex were used to train a support vector machine (SVM) with two classes. In this case, the property S is related to the position of an individual complex relative to the separating hyperplane of the SVM model. Next, using these data, probability distributions were obtained for the two classes N and F. Figure 1 gives an overview of the procedure which is detailed later. Finding reasonable values for the a priori probabilities p(N) and p(F) that a complex belongs to the class of native complexes N or to the class of false complexes F is a difficult task that depends on several factors such as the docking algorithm used, the system under investigation, etc. As an approximation, we used p(N) = p(F) = 0.5. This does, of course, not at all reflect the real proportion between the amount of true solutions and all theoretically possible conformations. However, it would be meaningless to select some other arbitrarily chosen values as long as there are no facts available resulting in more reasonable estimates for the priors. This affects the results in a way that our so-called “probabilities” are not real probabilities to be native structures. To obtain somewhat more realistic priors, one could scan the solutions of typical docking runs for the fraction of native and non-native complexes. For example, near native and false complexes of the recent CAPRI scoring competitions could be used for this purpose. This would lead to priors p(N) = 0.062 and p(F) = 0.938. 
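The effect of the prior choice discussed above can be illustrated with a minimal sketch of the Bayes step in eq. (1). The function name and the likelihood values `0.3` and `0.1` are hypothetical and chosen only for illustration; the two prior settings (0.5 and 0.062) are the ones named in the text.

```python
def posterior_native(p_s_given_n, p_s_given_f, prior_n=0.5):
    """Probability-like measure p(N|S) following eq. (1).

    p_s_given_n, p_s_given_f: class-conditional densities p(S|N) and p(S|F)
    evaluated at the observed score S (hypothetical inputs in this sketch).
    prior_n: a priori probability p(N); p(F) is taken as 1 - p(N).
    """
    prior_f = 1.0 - prior_n
    numerator = prior_n * p_s_given_n
    denominator = numerator + prior_f * p_s_given_f
    if denominator == 0.0:
        raise ValueError("both class-conditional densities are zero at S")
    return numerator / denominator


# Hypothetical densities at some score S: natives three times as likely as false complexes.
equal_priors = posterior_native(0.3, 0.1, prior_n=0.5)
capri_priors = posterior_native(0.3, 0.1, prior_n=0.062)
print(equal_priors, capri_priors)
```

With equal priors the example yields 0.75, whereas the CAPRI-derived prior pulls the same evidence down to roughly 0.17, which shows why the reported values are probability-like measures rather than true probabilities.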
However, it should be noted that these are no general values and therefore, in this contribution priors of p(N) = p(F) = 0.5 were used. For our approach it is necessary to obtain a reasonable set of false complexes. Creating this set, one cannot simply join two proteins in an arbitrary way, since the resulting complexes would be extremely unrealistic. For a realistic set, false complexes are needed that do not exist in nature but are, nevertheless, optimized in a way that they could theoretically exist. As a possible solution to this problem we took already existing decoys from targets of the last CAPRI scoring competitions (T29, T32, T35, T36, T37_1, T38, T39, T40_CA, T41) that were generated by many different predictor groups using a variety of different algorithms. Of those, for each target 25% arbitrarily chosen complexes (2194 structures) that were marked as incorrect according to the CAPRI criteria were used for the calculation of the probability distributions of the false complexes. This approach ensures that the resulting distributions are not biased towards a single algorithm used for calculating the structures. In addition, to provide roughly the same amount of structures for the computation of the probability distributions of the classes N and F, only a subset of the incorrect CAPRI structures was used. Note, that for targets 37 and 40 two evaluations were performed by CAPRI. For T37 this was done due to high symmetry between the two chains in the ligand of T37 and their close proximity to each other and the interface. For target 40 there are two possible interfaces at opposite sides of the receptor (see CAPRI homepage for details25 ). However, to not overuse the structures of these targets, they were used only once for the generation of probability distributions. The so obtained probability distributions for the false complexes should represent


DOI 10.1002/jcc


a meaningful antipode to the group of the native complexes. The remaining 75% of false complexes together with the CAPRI complexes that were marked as being at least acceptable were later used for testing the algorithm. Using the above definitions the calculation of the probability p that a complex with a global score value S belongs to the class of native complexes N is described by the following equation:

$$p(N \mid S) = \frac{p(N) \cdot p(S \mid N)}{p(N) \cdot p(S \mid N) + p(F) \cdot p(S \mid F)} \tag{1}$$

Complex Analysis

A central part of the present work is the design of a global probability-like measure deduced from a variety of individual analysis functions. Currently, we include intermolecular electrostatic energies, intermolecular van der Waals energies and a term based on knowledge based amino acid pair-potentials. However, the algorithm easily allows the addition of further functions. A good overview of discriminating features that could be used for the analysis of protein–protein interactions is given in the recent review of Ezkurdia et al.26 Note, that the energy functions used within PROCOS mainly aim at the analysis of complexes that only contain a moderate number of clashes. Before the individual energy terms are computed within PROCOS, all hydrogens present are first removed followed by the addition of polar and nonpolar hydrogens using the program REDUCE3.10.27 This ensures a comparable protonation state and atom nomenclature for all pdb-files investigated.

Electrostatic Energy

The electrostatic energy of the complex is the sum of the individual electrostatic energies of all intermolecular atom pairs in the complex. The used function (2) is similar to the one used by the molecular dynamics program CNS28 and the docking program HADDOCK,20 which is based on CNS.

$$E_{elec} = \sum_{n,m} C \, \frac{q_n q_m}{\varepsilon_0 R} \left( 1 - \frac{R^2}{R_{off}^2} \right) \tag{2}$$

where n and m enumerate the atoms of the first and second protein, respectively; q is the charge of the atom. The individual partial charges are similar to those taken by HADDOCK2.0 (see HADDOCK distribution, file “topallhdg5.3.pro”); C is a scaling factor as it is also used in CNS and HADDOCK; the dielectric constant ε0 is set to one; R denotes the distance between the atoms. The term in brackets ensures that the electrostatic energy approaches zero at a cutoff value of Roff = 8.5 Å. This cutoff saves computation time and the introduced error is negligible.

Van der Waals Energy

The van der Waals energy is a combination of the Pauli repulsion and the van der Waals attraction. Similar to the electrostatic energy, it is calculated as a sum over all intermolecular atom pairs. As described for the electrostatic energy the used function (3) is similar to the one used by CNS and HADDOCK.

$$E_{vdw} = \sum_{n,m} 4\varepsilon \left[ \left( \frac{\sigma}{R} \right)^{12} - \left( \frac{\sigma}{R} \right)^{6} \right] SW(R, R_{on}, R_{off}) \tag{3}$$

with

$$SW = \begin{cases} 0 & \text{if } R > R_{off} \\ \dfrac{\left( R^2 - R_{off}^2 \right)^2 \cdot \left( R^2 - R_{off}^2 - 3 \left( R^2 - R_{on}^2 \right) \right)}{\left( R_{off}^2 - R_{on}^2 \right)^3} & \text{if } R_{off} > R > R_{on} \\ 1 & \text{if } R < R_{on} \end{cases} \tag{4}$$

where ε and σ parametrize the Lennard-Jones potential of identical atom types. Between different atom types, the following combination rule is used: $\sigma_{ij} = (\sigma_{ii} + \sigma_{jj})/2$ and $\varepsilon_{ij} = \sqrt{\varepsilon_{ii} \varepsilon_{jj}}$. The individual values were taken from the literature and are similar to those used by HADDOCK2.0 (see HADDOCK distribution, file “toppar/parallhdg5.3.pro”).20, 29 Ron and Roff were set to 6.5 Å and 8.5 Å, respectively.

Pair-Potential

A knowledge based potential for the occurrence of amino acids in protein interfaces has been deduced from the nonhomologous complexes of the Mintz database.24 A score Sinter (aa1, aa2) has been calculated for each intermolecular amino acid pair in contact at an interface, according to the following equation:

$$S_{inter}(aa_1, aa_2) = \log \left( \frac{f_{pair}(aa_1, aa_2)}{f_{surface}(aa_1) \, f_{surface}(aa_2)} \right) \tag{5}$$
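The three analysis functions of eqs. (2) to (5) can be sketched per atom pair or residue pair as follows. This is a minimal illustration, not the PROCOS implementation: all function names are hypothetical, the scaling factor `c=1.0` is a placeholder rather than the CNS constant, the logarithm base in eq. (5) is assumed to be natural, and the switching function is written in the form commonly used for CNS-style switches, with signs arranged so that it runs continuously from 1 at Ron to 0 at Roff (an assumption about the intended form of eq. (4)).

```python
import math

R_ON = 6.5   # Å, start of the switching region (value from the text)
R_OFF = 8.5  # Å, cutoff for both energy terms (value from the text)


def electrostatic_pair(q_n, q_m, r, c=1.0, eps0=1.0, r_off=R_OFF):
    """Eq. (2) for one intermolecular atom pair; c=1.0 is a placeholder."""
    if r >= r_off:
        return 0.0
    return c * q_n * q_m / (eps0 * r) * (1.0 - r**2 / r_off**2)


def switch(r, r_on=R_ON, r_off=R_OFF):
    """Switching function SW of eq. (4), assumed continuous at both ends."""
    if r >= r_off:
        return 0.0
    if r <= r_on:
        return 1.0
    numerator = (r_off**2 - r**2) ** 2 * (r_off**2 + 2.0 * r**2 - 3.0 * r_on**2)
    return numerator / (r_off**2 - r_on**2) ** 3


def vdw_pair(eps_ii, sigma_ii, eps_jj, sigma_jj, r):
    """Eq. (3) for one atom pair, using the combination rule from the text."""
    sigma = 0.5 * (sigma_ii + sigma_jj)   # arithmetic mean of sigma
    eps = math.sqrt(eps_ii * eps_jj)      # geometric mean of epsilon
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6) * switch(r)


def pair_log_odds(f_pair, f_surface_1, f_surface_2):
    """Eq. (5): log-odds score of one contacting amino acid pair (natural log assumed)."""
    return math.log(f_pair / (f_surface_1 * f_surface_2))


# Hypothetical inputs: opposite unit charges at 4 Å, identical atom types at the
# Lennard-Jones minimum, and a pair observed twice as often as expected.
print(electrostatic_pair(1.0, -1.0, 4.0))
print(vdw_pair(0.1, 3.0, 0.1, 3.0, 3.0 * 2 ** (1.0 / 6.0)))
print(pair_log_odds(0.02, 0.1, 0.1))
```

Per-complex values would then be obtained by summing the pair terms over all intermolecular atom pairs (energies) or over all contacting amino acid pairs (pair-potential).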

Sinter is a typical log-odds-score. In (5) the denominator models the frequency of finding a contacting pair given the occurrence of amino acids at the surface of proteins. The numerator is the observed frequency deduced from protein interfaces in the Mintz database. Therefore, Sinter (aa1, aa2) is positive if a contacting pair of amino acids occurs in interfaces more frequently than expected, given the amino acid frequencies for the protein surface. Analogously, the score is negative if the amino acid pair is found less frequently than expected. For example, Sinter (Val, Trp) is 3.1 and Sinter (Glu, Asp) is −1.9, which are among the most extreme values. In the following, the pair-potential score of a complex is the sum of the scores of all contacting amino acid pairs.12

Visualization Through Probability Distribution Plots

Figure 1. Overview of the work-flow to obtain probability distributions for native and false protein complexes: Protein complexes from the Mintz database23 are used as native complexes. False complexes were taken from data of the CAPRI scoring competition according to the definitions provided by CAPRI. For all complexes three different analysis functions were used, namely van der Waals energies, electrostatic energies, and amino acid based pair-potentials. Resulting values were rescaled for reasons of data comparison. A support vector machine (SVM) was trained with the different scores and a measure related to the distance of every complex to the separating hyperplane was calculated. This data was used to calculate a new set of probability distributions for the two classes N and F. The data flow of native and false complexes is represented by blue and red arrows, respectively.

As the functions above are very diverse in their physical meaning, rescaling of the individual functions was performed for easier visual inspection. Therefore, in all data and each analysis function the point where the value of the analysis function reached zero was set to the origin. Going in the direction of more favorable values, a maximum number of 1000 was assigned to the point where the probability density values for the distributions of the native and false complexes both approached a value of zero, i.e., they were both below 0.1% of the largest obtained probability density value of this analysis function. Using the step size that was derived in the calculation above and the same cutoff criteria, rescaling was also performed in the opposite direction. Using the rescaled data, probability distributions were obtained for each analysis function for the groups of native and false complexes. Curves were calculated and smoothed using the following Kernel Density Estimator:

$$D(x) = \sum_{n} \frac{1}{\sigma_n \sqrt{2\pi}} \exp \left( -\frac{1}{2} \left( \frac{x - \mu_n}{\sigma_n} \right)^2 \right). \tag{6}$$
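A minimal sketch of the estimator in eq. (6) is given here, under the assumption that the neighbors of a data point are its nearest values in sorted order, taken symmetrically on both sides, and that σn enters the Gaussian as a standard deviation. The function name `neighbor_kde` and the small toy data set are hypothetical; the sum is left unnormalized, as in eq. (6).

```python
import math
import statistics


def neighbor_kde(data, m):
    """Build D(x) as a sum of Gaussians, one per data point (eq. 6).

    For each point, mean and standard deviation are computed from the point
    and up to m neighbors in sorted order (m // 2 on each side; assumption).
    """
    points = sorted(data)
    half = m // 2
    params = []
    for i in range(len(points)):
        window = points[max(0, i - half): i + half + 1]
        mu = statistics.mean(window)
        sigma = max(statistics.pstdev(window), 1e-9)  # guard against zero width
        params.append((mu, sigma))

    def density(x):
        return sum(
            math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
            for mu, sigma in params
        )

    return density


# Toy data: a tight cluster near 0.2 plus one distant value.
d = neighbor_kde([0.0, 0.1, 0.2, 0.3, 0.4, 5.0], m=2)
print(d(0.2), d(2.5))
```

A larger `m` widens each Gaussian and therefore smooths the curve more strongly; dividing the result by the number of data points would turn the sum into a normalized density.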

From every data-point n and its m neighbors the mean µn and the variance σn were calculated. These values were used to derive a Gauss function for the corresponding data point and all Gaussians were added to produce the density. The parameter m (number of neighbors) determines the degree of smoothing and was set to 200. The resulting rescaled probability distributions are shown in Figure 2. Analysis of the diagrams showed that in all cases distinct differences were obtained between the distributions of the native and false complexes. For comparison, distributions obtained from near native complex structures of the latest CAPRI scoring competitions were also included. Note that the latter distributions were not used for any calculations.

Calculation of Probabilities with an SVM

To combine the three calculated scores to one global probability measure, an SVM was trained using the libSVM library.30 For training, the e, v, and k values obtained from the complexes of the Mintz database and from the selected false complexes of the CAPRI set were used. In all cases a radial basis function kernel was utilized. The standard output of an SVM is a yes/no answer. In our case, the SVM decides whether the complex belongs to the group of the native complexes or not. However, as mentioned before, the aim of PROCOS is to calculate a probability-like measure that a complex belongs to the class of native complexes. For this, after training, a measure related to the distance of every complex to the separating hyperplane is computed. This measure is called the decision value. Based on this data, probability distributions DN and DF are calculated as described earlier. Figure 3 shows the corresponding distributions for native and false complexes. For a newly investigated complex the e, v, and k values are calculated, and based on this data, the position relative to the separating hyperplane is calculated according to the previously trained model. Using the position relative to the separating hyperplane and the distributions DN and DF shown in Figure 3, probabilities p(S|N) and p(S|F) are estimated. With eq. (1) and the a priori probabilities p(N) and p(F), the PROCOS probability-like measure p(N|S) that this complex belongs to the class of native complexes is computed.

Figure 2. Probability distribution plots for electrostatic energy (top left), van der Waals energy (top right) and amino acid based pair-potentials (bottom). The curves for the native complexes (DN) are plotted in blue, those for the false complexes (DF) in red. For reasons of comparison also distributions obtained for the near native structures of the CAPRI test data are included (green). All values are rescaled, see Methods section for details.

Figure 3. Probability distributions of the obtained SVM model. The distributions of native (DN) and false (DF) complexes are plotted in blue and red, respectively.

User Interface

We have implemented a web interface, PROCOS (http://compdiag.uni-regensburg.de/procos), that allows the analysis of binary protein complexes. For data processing single or multiple models are uploaded as a pdb-file. After parsing the input data, values of the analysis functions are calculated and displayed together with the corresponding probability distribution plots and the actual values marked by colored bars within it. This data is provided as additional information to the calculated probability measures.

Validation

To validate the performance of PROCOS, results were compared to those from ZRANK 2.0 and DFIRE. ZRANK is a well established program for the analysis and reranking of docked complexes.31 Like PROCOS, it uses a combination of different scoring terms, namely van der Waals, electrostatic and desolvation energies, whereas DFIRE uses an all-atom knowledge-based potential for decision making.13 Here, the newest version dDFIRE was used.32 DFIRE is available as web server (http://sparks.informatics.iupui.edu/yueyang/DFIRE/dDFIRE-service) as well as a stand-alone program.

Results and Discussion

Analysis of Native Complexes

One of the potential goals of PROCOS includes the comparison between different targets. Therefore, for complexes of similar quality but stemming from different source proteins, comparable values should be obtained. To further investigate this claim we chose to analyze a set of experimentally solved native complexes, since for most of these structures a comparably high quality can be assumed. For this task, 95 native complexes were selected in a random manner from the PDB. All of them were, on the sequence level, less than 25% identical to an entry in our training set. These complexes were analyzed by PROCOS, ZRANK, and DFIRE. Results show that PROCOS yielded for 87 of these complexes probability-like values p(N|S) between 100% and 50% and only for 8 of them lower values. The average probability value obtained for all native test structures amounts to 85.2%. In addition to the global probability-like values, PROCOS provides the values of the individual analysis functions to allow for a detailed evaluation of the results. Analyzing the complex showing the lowest probability value of 7.9%, it became apparent that this complex shows very high van der Waals energies, indicating a possible problem with the experimental structure determination. There were a few other complexes with low probability values; in these cases this is mostly due to an unfavorable pair potential score.

Figure 4. Receiver operating characteristics (ROC) obtained for the Dockground decoy set employing PROCOS (blue), ZRANK (red), and DFIRE (yellow).


This data clearly shows the advantage of using a probability based analysis scheme, since the values obtained for a set of very different complexes are directly comparable with each other. The probability values between 100% and 50% that were obtained by PROCOS for native structures provide, for the analysis of docking runs, an expected upper range of values that may be reached using high resolution docking algorithms. When using more conventional scoring schemes like ZRANK and DFIRE, for the same set of 95 native complex structures a range of scoring values between −814 and −14 (ZRANK) and −234 and 301 (DFIRE) is obtained. These values are potentially more difficult to interpret than the probability-like values obtained by PROCOS. To obtain these results, it was not necessary to divide the complexes into different groups, e.g., enzyme-inhibitor, antibody-antigen and others, as proposed in the literature33, 34 and adapted by several scoring approaches (e.g., refs. 17, 19). This enhances the usability and general applicability of PROCOS.

Investigated Decoy Sets

The main application of PROCOS is the analysis of the many models generated by a docking approach. We tested PROCOS’ performance on three different sets of test data. First, decoys from all 40 binary targets of the Dockground decoy set35 that are listed in Tables 1 and 3 were investigated. For all complexes the sequence identity between targets is less than 30%. Decoys were generated using the GRAMM-X FFT docking method, where top scoring predictions were subjected to conjugate gradient energy minimization using a smoothed Lennard-Jones potential.36 The decoy set contains for each target the 100 lowest energy non-native structures and at least one near-native structure. Second, decoys generated employing RosettaDock37 were obtained from the website of the Gray lab (http://graylab.jhu.edu/docking/decoys/unbound_global.tgz). For each target the top 200 structures from global searches starting from unbound starting structures with rebuilt sidechains were available. To ensure that a reasonable amount of near native structures was available, decoys for each target were screened for near native structures, leading to the selection of the decoys of 14 enzyme/inhibitor complexes, listed in Table 4. Next, decoys of the last CAPRI scoring competition were selected for testing. For each target decoys were generated by many different predictor groups using a large variety of algorithms. Note that for the CAPRI targets 25% of the complexes that were marked as incorrect were used for training of PROCOS. Therefore, these complexes were excluded from testing. For CAPRI targets T36 and T38 no near native structures were present and therefore, these targets were also excluded, leading to the selection of nine targets (T29, T32, T35, T37_1, T37_2, T39, T40_CA, T40_CB, and T41). Due to high symmetry between the two chains in the ligand of T37 and their close proximity to each other and the interface, two assessments were performed.
For target 40 there are two possible interfaces at opposite sides of the receptor. Therefore, two examinations were performed (see CAPRI homepage for details25 ). In the following evaluation we distinguished only between two groups of complexes: incorrect structures and near native structures. Those that are acceptable or better according to criteria proposed by CAPRI5 are termed as near native structures. All structures were reranked by PROCOS, ZRANK, and DFIRE. The results are shown in Tables 1–6 and Figure 4 and are discussed in the following.


Analysis of Near Native Structures

First, results obtained for the experimentally solved native structures were compared to those of the near native structures from docking. When looking at the average probability values that were obtained by PROCOS for the near native structures of the various complexes of the Dockground decoy set (Table 1), a range between 55.9% and 0.3% with an average value over all targets of 10.4% ± 11.8% (Table 2) can be seen. The first obvious observation is that the average values of the near native structures of the Dockground decoys are considerably smaller than those calculated for the experimentally solved native complex structures. As noted earlier, for 87 out of the 95 experimentally solved structures probability values between 100% and 50% were achieved. This is a clear indication that the average quality of the near native Dockground structures is still well below the quality of experimentally solved structures. When analyzing the Rosetta decoys, considerably higher average values of 65.1% ± 20.6% were obtained by PROCOS for the near native structures (Table 2). These values are already in the range of those obtained for the experimental structures. These data indicate a considerable influence of the docking algorithm used. To further investigate the average quality of near native structures obtained from docking runs, the CAPRI test data was employed. Here, average values of 27.5% ± 23.8% were computed by PROCOS for the near native structures. These figures are somewhat in between those obtained for the Dockground and Rosetta data, which might be explained by the fact that the CAPRI data were generated by numerous different algorithms. In addition, probability distributions for the near native CAPRI decoys were computed. These curves are indicated in green in Figure 2.
Analysis of Figure 2 shows that for the van der Waals and electrostatic energies as well as for the pair-potentials the green curves are somewhere between the curves of the native and false complexes. Especially for the pair-potentials, the curve of the near native complexes is far apart from the curve of the native ones and almost identical to the curve obtained for the false complex structures. This data is a further indication that considerable differences often still exist between experimentally solved native complex structures and structures termed near native that were generated by docking. This observation is probably one of the main reasons why scoring approaches in many cases have difficulty singling out near native structures.

Threshold Selection

One of the main problems when analyzing docking runs is the selection of an appropriate threshold for the used scoring function to pick structures that should be considered for further analysis. We analyzed whether it is possible to achieve such a goal with PROCOS. When comparing the average PROCOS probability-like values for the near native structures of the Dockground decoy set (Table 1) with those of the worst 25% solutions, with a corresponding range between 0.1 and 0.2%, it is apparent that the probability-like values for these two sets of structures do not overlap. Therefore, setting of a global threshold to safely remove a considerable subset of the wrong solutions seems feasible. The advantage of such a global threshold is that it may be selected a priori, independent of the investigated target.
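The idea of a target-independent cutoff can be sketched as a simple filter over PROCOS probability-like values. The function name, the decoy names, their values, and the threshold of 0.25% are all hypothetical; the source does not prescribe a specific threshold.

```python
def apply_global_threshold(decoys, threshold_percent):
    """Keep decoys whose probability-like value (in %) reaches the threshold.

    decoys: list of (name, procos_percent) tuples.
    Returns the retained decoys, best first.
    """
    kept = [d for d in decoys if d[1] >= threshold_percent]
    return sorted(kept, key=lambda d: d[1], reverse=True)


# Hypothetical decoys from two different targets, scored on the same scale.
decoys = [
    ("targetA_decoy_001", 3.4),
    ("targetA_decoy_002", 0.2),
    ("targetB_decoy_001", 20.6),
    ("targetB_decoy_002", 0.1),
]
for name, value in apply_global_threshold(decoys, threshold_percent=0.25):
    print(name, value)
```

Because the values share a fixed range, the same threshold can be applied to decoys from different targets, which would not be possible with raw scores that vary with protein size.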


Table 1. Analysis of Average Results of the Dockground Decoy Set.

            Average results,              Average results,
            near native solutions         25% worst solutions
Complex      PROCOS    ZRANK    DFIRE      PROCOS    ZRANK    DFIRE
1avw_A_B      20.62   623.10   −20.27        0.21  1522.81   −14.00
1bui_A_C      20.85   455.49   −10.27        0.17  1805.63   −10.02
1bui_B_C       1.07   623.70   −16.36        0.18  1814.42   −12.01
1bvn_P_T      24.40  1050.17   −30.70        0.20  1707.82   −16.88
1cho_E_I      29.81   283.83   −20.73        0.20  1047.46   −12.60
1dfj_E_I      36.36  3253.77   −11.83        0.10  2786.44    −8.64
1e96_A_B       0.30   627.08   −18.29        0.19  1597.88    −9.51
1ewy_A_C       6.17   356.14   −15.42        0.17  1801.21   −11.13
1f6m_A_C       0.28   970.85   −12.21        0.20  1878.92   −11.41
1fm9_A_D      18.30   903.57   −28.02        0.17  1841.94    −9.92
1g6v_A_K       1.00   399.81   −10.85        0.18  1566.24   −10.41
1gpq_A_D      11.01   431.02   −14.61        0.22  1220.22    −9.22
1gpw_A_B       1.24  1672.01   −20.09        0.18  1944.96    −8.77
1he1_A_C       2.48   799.12   −14.87        0.15  1791.32   −10.07
1he8_A_B      55.93   118.37   −10.28        0.11  2489.49   −11.42
1ku6_A_B       2.30   733.64   −13.19        0.19  1564.13   −12.77
1ma9_A_B       8.71  1269.32   −28.27        0.10  2864.80   −15.72
1nbf_A_D      14.35   220.88   −10.09        0.14  1913.58    −4.78
1oph_A_B       3.40   510.61   −14.54        0.14  2016.66   −11.83
1ppf_E_I      22.21    37.68   −18.52        0.24  1277.76   −11.47
1r0r_E_I       1.85   476.91   −17.25        0.23  1304.52   −10.23
1s6v_A_B       2.29   261.11    −8.82        0.15  1704.70    −4.80
1t6g_A_C       7.86  1142.53   −28.15        0.20  1843.32   −18.91
1tmq_A_B       5.08   990.57   −27.09        0.20  1621.02   −15.85
1tx6_A_I       1.11  1223.19   −18.24        0.18  2158.75   −18.00
1u7f_A_B      17.99   366.85   −17.46        0.20  1729.67   −15.33
1ugh_E_I       3.53  1388.10   −26.31        0.20  1636.86    −8.07
1w1i_A_F       1.06   539.27    −8.38        0.10  2546.49   −10.50
1wq1_R_G       0.30  1151.19   −14.41        0.16  2426.31   −10.40
1xd3_A_B      17.12   625.74   −17.48        0.19  1685.97    −6.01
1yvb_A_I       8.11    65.21   −19.16        0.21  1630.27   −11.49
2a5t_A_B       0.26  1583.81   204.80        0.12  2394.10   213.73
2bkr_A_B       0.77  1217.61   −14.90        0.19  1638.06    −6.19
2btf_A_P      18.89   584.26   −18.13        0.13  1799.41   −12.04
2ckh_A_B       6.94   160.95   −12.39        0.18  1528.02    −4.72
2fi4_E_I      11.64   649.15   −14.86        0.22  1144.50   −12.47
2goo_A_C       0.27   714.86   −22.68        0.21  1205.66   −15.85
2sni_E_I       8.35   627.15   −15.50        0.22  1317.28    −9.56
3fap_A_B       9.53   201.55   −11.76        0.23  1087.03   −10.39
3sic_E_I      12.24   155.81   −17.21        0.19  1381.82   −13.95

For all targets, the average scores (ZRANK and DFIRE) and probability values in % (PROCOS) of the near native solutions according to CAPRI criteria were calculated. The corresponding values are shown in the first three columns. The data contained in the three columns to the right was calculated by taking the mean of the 25% worst solutions of a target according to the measures calculated by PROCOS, ZRANK, and DFIRE.
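The overlap argument behind threshold selection can be illustrated with the smallest and largest per-target averages read from Table 1. The following is a minimal sketch and not part of PROCOS: the names `Range` and `overlaps` are illustrative assumptions, and the check operates on per-target averages, not on individual decoys.

```python
from typing import NamedTuple


class Range(NamedTuple):
    """Closed interval [low, high] of per-target averages (illustrative helper)."""
    low: float
    high: float


def overlaps(a: Range, b: Range) -> bool:
    """Return True if the two closed intervals share at least one value."""
    return a.low <= b.high and b.low <= a.high


# Minimum and maximum of each Table 1 column over the 40 Dockground targets.
# PROCOS entries are probability-like values in %, ZRANK and DFIRE are scores.
near_native = {
    "PROCOS": Range(0.26, 55.93),
    "ZRANK": Range(37.68, 3253.77),
    "DFIRE": Range(-30.70, 204.80),
}
worst_quarter = {
    "PROCOS": Range(0.10, 0.24),
    "ZRANK": Range(1047.46, 2864.80),
    "DFIRE": Range(-18.91, 213.73),
}

# A target-independent cut on the averages exists only if the ranges are disjoint.
separable = {m: not overlaps(near_native[m], worst_quarter[m]) for m in near_native}

for method, ok in separable.items():
    print(f"{method}: ranges of target averages are disjoint = {ok}")
```

Under these inputs only the PROCOS ranges are disjoint, so any cut placed between 0.24% and 0.26% separates the two sets of target averages; individual decoys may still fall on either side of such a cut.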

The corresponding average ZRANK scores for the near native solutions of the various targets of the Dockground decoy set (Table 1) lie between 37.7 and 3253.8, with an average of 736.6 ± 588.0; for the 25% worst solutions they range from 1047.5 to 2864.8, with an average of 1755.9 ± 441.4 (Table 2). For DFIRE, a range between −30.7 and 204.8 with an average of −11.6 ± 35.6 is computed for the near native solutions, while for the 25% worst solutions the range is between −18.9 and 213.7 with a corresponding average of −5.6 ± 35.7 (Tables 1 and 2). For DFIRE in particular, considerable overlap exists between the score values obtained for the near native structures and the 25% worst structures, which makes setting a target-independent threshold for selection purposes more difficult.

Next, it was investigated whether it is possible to set a threshold for selection purposes independent of the docking method used. As mentioned earlier, all decoy sets investigated were obtained by different methods. Table 2 summarizes the average score values and standard deviations obtained for the near native solutions and the 25% worst solutions, for all targets of one decoy set combined. In general, the trend between near native structures and 25% worst solutions that was observed for the Dockground decoy set is also visible for the other decoy sets. However, for PROCOS as well as for ZRANK and DFIRE, the results strongly depend on the method used for decoy computation. The lowest, most favorable scores (ZRANK and DFIRE) and the highest probability values (PROCOS) were obtained for the decoys generated by RosettaDock, while for the decoys generated by GRAMM-X considerably higher scores and lower probability values, respectively, were obtained. The values obtained for the CAPRI test data lie in between. This is true for both the near native structures and the 25% worst solutions. Further analysis shows that, using either PROCOS, ZRANK, or DFIRE, the average values obtained for the 25% worst solutions of the Rosetta data are more favorable than the average values calculated for the near native structures of the Dockground set. These results clearly show that for the a priori setting of thresholds one has to take the docking method used into account.

Fink et al., Vol. 32, No. 12, Journal of Computational Chemistry

Table 2. Analysis of Average Results with Respect to Used Docking Algorithm.

Average results, near native solutions
Decoy set    PROCOS           ZRANK               DFIRE
Dockground   10.40 ± 11.78    736.65 ± 587.98     −11.62 ± 35.57
Rosetta      65.05 ± 20.60    −195.98 ± 40.34     −751.72 ± 303.64
CAPRI        27.49 ± 23.84    147.80 ± 407.77     −593.18 ± 318.18

Average results, 25% worst solutions
Decoy set    PROCOS           ZRANK               DFIRE
Dockground   0.18 ± 0.04      1755.94 ± 441.38    −5.59 ± 35.73
Rosetta      23.86 ± 12.83    −94.11 ± 20.28      −747.63 ± 303.78
CAPRI        0.26 ± 0.04      1109.01 ± 590.77    −160.10 ± 165.68

For all used targets of a given decoy set, average score values and corresponding standard deviations are given according to the measures calculated by PROCOS, ZRANK, and DFIRE. The Dockground decoy set was generated using the GRAMM-X docking approach, the Rosetta decoy set was obtained by RosettaDock, and the decoys of the CAPRI targets were generated by a multitude of different algorithms. Targets that were used for calculating average values are summarized in Tables 3–5 for the Dockground, Rosetta, and CAPRI targets, respectively.

Influence of Priors

As explained in the Methods section, a PROCOS probability value cannot yet be interpreted as a real probability that a given complex structure is close to its native form, owing to the issue of defining appropriate priors. Currently PROCOS uses the approximation p(N) = p(F) = 0.5 in eq. (1). It was also investigated to which degree the results are influenced by this choice. Employing the values obtained from the data of the CAPRI scoring competition, 0.062 for p(N) and 0.938 for p(F), the calculations were repeated for target 1dfj_E_I of the Dockground decoy set. The average values obtained by PROCOS for the near native and the 25% worst solutions are then 6.17% and 0.01%, respectively. These probability-like values would of course be nearer to real probabilities that native complexes are found, but would depend significantly on the arbitrarily chosen dataset they were derived from. A comparison with the previous values of 36.36% for the near native structures and 0.10% for the 25% worst solutions (Table 1) shows that the selection of the priors causes a relatively large shift, but that in both cases a clear gap is visible between the near native structures and the 25% worst solutions. Also, the selection of the priors has only a negligible influence on the ranking performance of PROCOS (data not shown). As a consequence, we chose to keep the approximation p(N) = p(F) = 0.5.
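The effect of the priors can be illustrated with a small sketch. It rests on an assumption: eq. (1) is not reproduced in this section, so the code supposes a standard two-class Bayes posterior, for which a value computed with equal priors can be converted to another prior through its odds. The function name `reweight` is hypothetical and this is not the PROCOS implementation.

```python
def reweight(p_equal: float, prior_native: float) -> float:
    """Convert a posterior obtained with p(N) = p(F) = 0.5 to a new prior p(N).

    Assumption: a two-class Bayes posterior, so that with equal priors the
    odds p / (1 - p) equal the likelihood ratio of native versus false.
    """
    if not 0.0 <= p_equal <= 1.0:
        raise ValueError("p_equal must lie in [0, 1]")
    if not 0.0 < prior_native < 1.0:
        raise ValueError("prior_native must lie in (0, 1)")
    if p_equal == 1.0:
        return 1.0  # infinite odds stay at certainty for any prior
    likelihood_ratio = p_equal / (1.0 - p_equal)
    weighted = likelihood_ratio * prior_native
    return weighted / (weighted + (1.0 - prior_native))


# Hypothetical per-structure values (as fractions, not %), re-weighted with the
# CAPRI-derived prior p(N) = 0.062 quoted in the text.
values = [0.90, 0.3636, 0.001]
shifted = [reweight(v, 0.062) for v in values]

for before, after in zip(values, shifted):
    print(f"{100 * before:6.2f}% -> {100 * after:6.2f}%")
```

Under this assumption the mapping is strictly increasing, so the order of the structures is unchanged, which is in line with the observation that the priors barely affect the ranking. The mapping is also nonlinear, so it would have to be applied to each structure before averaging; re-weighting an average does not give the average of the re-weighted values.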

Reranking of Decoys

As shown earlier, setting a global threshold for the selection of decoys for further analysis is often difficult; in many applications, the 10 or 20 best structures according to their scores are selected instead. The results of the rerankings by PROCOS, ZRANK, and DFIRE for the Dockground decoy set are given in Table 3, which shows the number of near native solutions found in the top 10, top 20, and top 50 structures. For method comparison, the number of near native structures within the 10 top ranked structures was counted. If two methods found the same number, the following line of Table 3 (top 20 ranked solutions) was evaluated. If no distinction was achieved by the top two lines, the compared methods were considered equal. Inspection of Table 3 shows that, in comparison with ZRANK, PROCOS performs better in 23 cases, both methods perform equally in 3 cases, and ZRANK outperforms PROCOS in 14 cases. In comparison with DFIRE, PROCOS performs better in 29 cases, equally in 4 cases, and worse in 7 cases. Adding up the near native structures found in the top 10 ranked structures of all targets, PROCOS detects 159 structures, ZRANK 105, and DFIRE 98 (Table 6).

To further investigate the performance of the different methods, receiver operating characteristics (ROCs) were calculated for the Dockground decoy set. Analysis of the ROC curves in Figure 4 shows that PROCOS obtained an area under the curve (AUC) of 0.71, followed by DFIRE with an AUC of 0.62 and ZRANK with an AUC of 0.58. These data show that for the Dockground decoy set PROCOS performs quite well in terms of reranking decoys.

For the 14 targets of the Rosetta decoy set, results are shown in Table 4, and methods were compared as described earlier. For these targets PROCOS performs better than ZRANK in 1 case, while the opposite is true in 10 cases, and in 3 cases both methods perform equally.
The comparison with DFIRE shows for 4 targets a better performance of PROCOS, while in 5 cases DFIRE outperforms

Table 3. Analysis of Dockground Targets. PROCOS Complex Top 10 Top 20 Top 50 Near natives

4 5 8

8 9 13

0 0 2

0 6 9

8 9 10

5 6 8

0 1 4

2 3 5

0 1 3 10 of 110

1 2 3 18 of 110

3 5 7 10 of 110

2 5 9 13 of 110

0 2 3 10 of 110

2 2 7 12 of 110

2 5 7

0 1 2 11 of 110

0 0 0

9 12 12

0 0 4 10 of 110

1 4 9 10 of 110

0 0 0 10 of 109

2 4 7 13 of 110

9 14 17

3 3 6

3 5 9 13 of 110

0 0 0

0 5 6

7 7 8

6 8 10 10 of 110

1 3 7

0 0 2

2 2 2

2 4 4 4 of 104

8 13 13

1 2 6

4 7 9

6 8 9 11 of 110

1 1 8

1 1 1

6 9 9

2 2 6 10 of 110

0 0 4

2 5 7

6 8 9

3 4 9 10 of 110

0 0 2

9 19 47

4 6 8

2 2 4 10 of 110

1 2 7 10 of 110

2 3 8 8 of 108

1 1 1 1 of 101

4 5 9 10 of 110

4 8 25 66 of 110

0 0 0

4 5 7

0 0 1 12 of 110

1 6 10

5 6 8

5 6 10

6 10 11 11 of 110

0 0 0

6 6 8

3 4 6

5 7 9 10 of 110

0 0 0

4 8 8

4 6 7

4 4 6 10 of 110

3 7 9 10 of 110

3 5 6 10 of 110

3 4 7 10 of 110

0 0 3

3 6 10

6 8 10 10 of 110

0 1 8

0 0 0

0 1 8

1tmq_A_B 9 19 48

4 4 6

0 1 4 10 of 110

7 10 10

1w1i_A_F 10 12 12

3 3 4

2 3 4 4 of 104

0 0 0

2a5t_A_B 4 6 10

0 0 0

0 0 0 1 of 101

0 0 1

2fi4_E_I 2 7 9

3 4 11

2 2 6 11 of 110

0 0 2

3sic_E_I 0 0 0

4 6 9

6 7 10 10 of 110

Reranking results of all 40 binary targets from the Dockground decoy set that were generated using the GRAMM-X docking approach. For every target the number of near native structures in the top 10, top 20, and top 50 ranked solutions is shown, and the last line contains the total number of near native structures for each target.

0 0 4

1ppf_E_I

3fap_A_B 0 3 6

10 12 12

1ku6_A_B

2ckh_A_B 0 1 8

0 2 6 12 of 110

1gpq_A_D

1yvb_A_I 8 10 10

DFIRE

1ewy_A_C

1ugh_E_I

2sni_E_I 0 0 7

7 8 10

1t6g_A_C

2btf_A_P 0 1 10

0 0 0

1oph_A_B

1x3d_A_B 0 0 4

1 4 6 10 of 110

ZRANK 1bvn_P_T

1he8_A_B

1u7f_A_B 0 0 0

PROCOS

1g6v_A_K

1s6v_A_B 4 8 12

DFIRE

1e96_A_B

1nbf_A_D 6 8 9

ZRANK 1bui_B_C

1he1_A_C

2goo_A_C 0 0 4

PROCOS

1fm9_A_D

2bkr_A_B

Complex Top 10 Top 20 Top 50 Near natives

6 7 7

1wq1_R_G

Complex Top 10 Top 20 Top 50 Near natives

8 12 15

1tx6_A_I

Complex Top 10 Top 20 Top 50 Near natives

5 7 10 15 of 110

DFIRE

1dfj_E_I

1r0r_E_I

Complex Top 10 Top 20 Top 50 Near natives

5 8 10

1ma9_A_B

Complex Top 10 Top 20 Top 50 Near natives

0 0 5

1gpw_A_B

Complex Top 10 Top 20 Top 50 Near natives

2 3 5 10 of 110

ZRANK 1bui_A_C

1f6m_A_C

Complex Top 10 Top 20 Top 50 Near natives

PROCOS

1cho_E_I

Complex Top 10 Top 20 Top 50 Near natives

DFIRE

1avw_A_B

Complex Top 10 Top 20 Top 50 Near natives

ZRANK

0 1 4

Table 4. Analysis of Rosetta Targets. PROCOS Complex Top 10 Top 20 Top 50 Near natives

0 0 3

0 1 6

2 3 5 6 of 200

6 7 7 7 of 200

0 0 0

0 0 0 6 of 200

0 0 0

2 3 10

0 0 1 44 of 200

DFIRE

PROCOS

5 9 18 29 of 200

0 1 4

4 11 15

6 12 15 15 of 200

6 12 15

6 13 30

5 12 18

8 14 18 18 of 200

DFIRE

PROCOS

ZRANK

2 5 10 39 of 200

0 0 1

6 12 33

8 15 33 49 of 200

1 2 10

0 0 0

0 0 2 3 of 200

1 3 13

4 9 14 20 of 200

0 0 0

1ugh 10 15 26

10 19 47

10 20 40 71 of 200

2sni 0 1 2

DFIRE

1cse

1tgs

2sic 0 0 0

ZRANK 1cgi

1mah

1tab 0 0 0

ZRANK 1brs

2ptc

Complex Top 10 Top 20 Top 50 Near natives

PROCOS

1fss

Complex Top 10 Top 20 Top 50 Near natives

DFIRE

1acb

Complex Top 10 Top 20 Top 50 Near natives

ZRANK

10 20 48

1stf 1 4 6

3 8 15

10 12 16 17 of 200

3 7 13

2tec 0 0 2

0 0 1

1 2 9 18 of 200

1 2 9

Results achieved by PROCOS, ZRANK, and DFIRE in terms of reranking Rosetta targets. For every target the number of near native structures in the top 10, top 20, and top 50 ranked solutions is shown, and the last line contains the total number of near native structures for each target.

Table 5. Analysis of CAPRI Targets. PROCOS Complex Top 10 Top 5% Top 10% Top 25% Top 50% Near natives

1 58 86 92 108

PROCOS

0 28 69 99 132 143 of 1607

0 0 0 1 5

0 0 1 1 2 7 of 843

ZRANK

DFIRE

PROCOS

T32 0 26 66 109 137

0 0 0 0 6

T37_2

Complex Top 10 Top 5% Top 10% Top 25% Top 50% Near natives

DFIRE

T29

Complex Top 10 Top 5% Top 10% Top 25% Top 50% Near natives

ZRANK

0 0 0 0 6 15 of 386

0 0 0 0 0

0 0 0 0 0 4 of 1049

DFIRE

PROCOS

T35 0 0 1 9 15

0 0 0 0 0

T39 0 0 0 5 6

ZRANK

0 0 0 0 0 2 of 351

2 4 8 81 202

7 29 63 180 245 346 of 1603

0 0 0 2 2

0 0 1 9 20

0 0 5 9 20 34 of 843

1 8 36 128 186 295 of 893

0 17 34 95 147

2 52 79 91 106

2 35 59 84 96 123 of 1603

4 14 19 43 178

Results achieved by PROCOS, ZRANK, and DFIRE in terms of reranking recent CAPRI targets. The number of near native structures found in the top 10 ranked structures is given in the first line (this is what would have been submitted in a CAPRI participation). Since the number of available decoys greatly varies between targets, the number of near native structures in the top 5%, top 10%, top 25%, and top 50% are provided for better comparison between targets. The last line shows the total number of near native structures together with the total number of decoys.

1 3 3 13 25

T40_CB

T41 5 22 40 101 192

DFIRE

T37_1

T40_CA 0 0 0 0 0

ZRANK

6 23 33 54 67

Table 6. Total Number of Near Native Structures Found in the Top 10 Structures of All Targets.

Detected near native structures
Decoy set    PROCOS   ZRANK   DFIRE
Dockground      159     105      98
Rosetta          37      62      32
CAPRI            10      10      11

Total number of near native structures found by PROCOS, ZRANK, and DFIRE in the top 10 structures of all targets of the Dockground, Rosetta and CAPRI decoy sets.
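The two reranking measures used in this section, the number of near native structures among the top N ranked decoys (summed over targets in Table 6) and the area under the ROC curve, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and all example values are hypothetical, higher values are taken as better for PROCOS and lower values as better for ZRANK and DFIRE, and the AUC is computed from its pairwise-ranking definition, which may differ in detail from the procedure behind Figure 4.

```python
from typing import List, Sequence


def rank_decoys(scores: Sequence[float], higher_is_better: bool) -> List[int]:
    """Return decoy indices ordered from best to worst."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=higher_is_better)


def count_top_n(scores: Sequence[float], is_near_native: Sequence[bool],
                n: int, higher_is_better: bool) -> int:
    """Number of near native decoys among the n best ranked decoys."""
    order = rank_decoys(scores, higher_is_better)
    return sum(is_near_native[i] for i in order[:n])


def roc_auc(scores: Sequence[float], is_near_native: Sequence[bool],
            higher_is_better: bool) -> float:
    """AUC as the fraction of (near native, other) pairs ranked correctly.

    Tied pairs count as half a correct pair.
    """
    sign = 1.0 if higher_is_better else -1.0
    pos = [sign * s for s, y in zip(scores, is_near_native) if y]
    neg = [sign * s for s, y in zip(scores, is_near_native) if not y]
    if not pos or not neg:
        raise ValueError("need at least one near native and one other decoy")
    correct = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return correct / (len(pos) * len(neg))


# Hypothetical six-decoy target: probability-like values (%) and a ZRANK-like score.
labels = [True, False, True, False, False, True]
procos_like = [62.0, 0.2, 35.5, 0.1, 12.0, 0.3]
zrank_like = [-50.0, 900.0, 10.0, 1500.0, -5.0, 1200.0]

top2_procos = count_top_n(procos_like, labels, 2, higher_is_better=True)
top2_zrank = count_top_n(zrank_like, labels, 2, higher_is_better=False)
auc_procos = roc_auc(procos_like, labels, higher_is_better=True)
auc_zrank = roc_auc(zrank_like, labels, higher_is_better=False)
```

Summing `count_top_n(..., n=10, ...)` over all targets of a decoy set gives one entry of a table such as Table 6.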

PROCOS. For 5 targets both methods work equally. When analyzing the total number of near native structures found in the top 10 structures of all targets, ZRANK detects 62 structures, followed by PROCOS with 37 structures and DFIRE with 32 structures (Table 6). These results show that for the Rosetta decoys ZRANK shows the best performance in terms of reranking, while PROCOS and DFIRE perform almost equally, with a slight advantage for PROCOS when considering the total number of near native decoys in the top 10 structures.

For the decoys of the CAPRI scoring competition, the results of the rerankings obtained by PROCOS, ZRANK, and DFIRE are given in Table 5. As for the other decoy sets, the number of near native structures found in the top 10 ranked structures is given in the first line. Since the number of available decoys varies greatly between targets, the numbers of near native structures in the top 5%, top 10%, top 25%, and top 50% are provided to allow for a meaningful comparison between different targets. Note that the decoys for targets 36 and 38 contained no near native structures and were therefore omitted from Table 5. As noted in the description of the different decoy sets, two evaluations were performed for target T37 and for target T40. As described earlier, the first two lines of Table 5 were evaluated for method comparison. Analysis of the results shows that PROCOS outperforms ZRANK for 4 targets, whereas ZRANK performs better for 1 target, and for the remaining 5 targets PROCOS and ZRANK performed equally well. In comparison with DFIRE, PROCOS performs better in 3 cases, worse in 2 cases, and for the remaining 4 targets both methods performed equally. Inspecting the total number of near native structures found in the top 10 structures of all targets shows that DFIRE detects 11 structures, followed by both ZRANK and PROCOS with 10 each (Table 6). For targets T35 and T39 neither PROCOS, nor ZRANK, nor DFIRE performed well; however, for T35 only two and for T39 only four near native structures were available.

Combining all three investigated decoy sets, a total of 63 targets was analyzed. For 45 of these targets both PROCOS and ZRANK found at least one near native structure within the top 10 structures, while for DFIRE this was true in 26 cases. A summary of the results discussed previously shows that PROCOS compares favorably with the other methods investigated in a considerable number of cases. However, it also becomes clear that none of the methods is superior to the others in all cases. For the Dockground decoy set, a clear advantage for PROCOS is visible on average, while for the structures calculated by RosettaDock ZRANK performs quite well. For each of the CAPRI targets, decoys were generated by a number of different approaches. Here a slight advantage for PROCOS is visible.
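The pairwise comparison rule applied throughout this section (compare the counts in the first table line, fall back to the second line on a tie, otherwise call the methods equal) can be written down compactly. The sketch below is illustrative only: the target names and counts are hypothetical and are not taken from Tables 3 to 5, and `compare` and `tally` are assumed helper names.

```python
from collections import Counter
from typing import Dict, Tuple

# (near natives in the first table line, near natives in the second table line),
# e.g. (top 10, top 20) for the Dockground and Rosetta sets.
Counts = Tuple[int, int]


def compare(a: Counts, b: Counts) -> str:
    """Return 'better', 'worse' or 'equal' for method a relative to method b."""
    for count_a, count_b in zip(a, b):  # first line decides, second breaks ties
        if count_a != count_b:
            return "better" if count_a > count_b else "worse"
    return "equal"


def tally(results_a: Dict[str, Counts], results_b: Dict[str, Counts]) -> Counter:
    """Count over all targets how often method a is better, worse or equal."""
    return Counter(compare(results_a[t], results_b[t]) for t in results_a)


# Hypothetical counts for four targets and two methods.
method_a = {"target1": (4, 5), "target2": (0, 1), "target3": (2, 3), "target4": (0, 0)}
method_b = {"target1": (2, 6), "target2": (0, 3), "target3": (2, 3), "target4": (1, 1)}

summary = tally(method_a, method_b)
print(dict(summary))
```

Because only the first two lines are consulted, a method that places more near native structures in the top 50 but none in the top 20 is still counted as equal or worse under this rule.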

Conclusion

In our opinion, the PROCOS probability-like values are easier to interpret than the score values obtained by other approaches. One advantage of the PROCOS values is that they lie, by definition, within well-defined limits between 0 and 100% and are therefore more general in nature than scores. In addition, they open the possibility of comparing the docking results of different targets that were generated by the same docking method. The analysis of native structures has shown that for structures of comparably high quality, as can be assumed for native structures, PROCOS consistently yields high probability values irrespective of the investigated target. For the analysis of docking applications, these values might serve as an expected upper range of values that may be reached in some cases using a high-resolution docking algorithm. A considerable subset of false complexes can be eliminated from further analysis by setting an appropriate threshold a priori. As detailed earlier, this threshold has to be set with respect to the docking method used. This is in part because the functions used for calculating the van der Waals and electrostatic energies are sensitive to small structural changes. In addition, it was shown that PROCOS performs well in terms of reranking existing decoys, as exemplified on the different decoy sets.

PROCOS is freely available as an easy-to-use web server. Processing a PDB file containing one or several models of a protein complex, it calculates for each model a probability-like measure that the structure belongs to the class of native complex structures. To support the user's decision, the computed values are visualized in a plot that represents the probability distributions of the training data. In future developments we expect further improvements from adding discriminatory features such as hydrophobicity or solvent accessible surface area, as mentioned in ref. 26. Owing to the modular concept of PROCOS, this can easily be achieved.

Acknowledgments

The authors thank Joël Janin and Marc Lensink for providing access to the CAPRI scoring data. They also thank Tully Ernst for carefully reading the manuscript.

References

1. Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. Nucleic Acids Res 2000, 28, 235.
2. Aloy, P.; Russell, R. B. Nat Biotech 2004, 22, 1317.
3. Ritchie, D. W. Curr Prot Pep Sci 2008, 9, 1.
4. Janin, J.; Wodak, S. Structure 2007, 15, 755.
5. Lensink, M. F.; Méndez, R.; Wodak, S. Proteins 2007, 69, 704.
6. Katchalski-Katzir, E.; Shariv, I.; Eisenstein, M.; Friesem, A. A.; Aflalo, C.; Vakser, I. A. Proc Natl Acad Sci USA 1992, 89, 2195.
7. Walls, P. H.; Sternberg, M. J. J Mol Biol 1992, 228, 277.
8. Gabb, H. A.; Jackson, R. M.; Sternberg, M. J. J Mol Biol 1997, 272, 106.
9. Mandell, J. G.; Roberts, V. A.; Pique, M. E.; Kotlovyi, V.; Mitchell, J. C.; Nelson, E.; Tsigelny, I.; TenEyck, L. F. Protein Eng 2001, 14, 105.
10. Heifetz, A.; Katchalski-Katzir, E.; Eisenstein, M. Protein Sci 2002, 11, 571.
11. Meyer, M.; Wilson, P.; Schomburg, D. J Mol Biol 1996, 264, 199.
12. Moont, G.; Gabb, H. A.; Sternberg, M. J. Proteins 1999, 35, 364.
13. Zhang, C.; Liu, S.; Zhou, H.; Zhou, Y. Protein Sci 2004, 13, 400.
14. Fernández-Recio, J.; Totrov, M.; Skorodumov, C.; Abagyan, R. Proteins 2005, 58, 134.
15. Camacho, C. J.; Gatchell, D. W.; Kimura, S. R.; Vajda, S. Proteins 2000, 40, 525.
16. Murphy, J.; Gatchell, D. W.; Prasad, J. C.; Vajda, S. Proteins 2003, 53, 840.
17. Li, C. H.; Ma, X. H.; Shen, L. Z.; Chang, S.; Chen, W. Z.; Wang, C. X. Biophys Chem 2007, 129, 1.
18. Müller, W.; Sticht, H. Proteins 2007, 67, 98.
19. Martin, O.; Schomburg, D. Proteins 2008, 70, 1367.
20. Dominguez, C.; Boelens, R.; Bonvin, A. M. J. J. J Am Chem Soc 2003, 125, 1731.
21. Kastritis, P. L.; Bonvin, A. M. J. J. J Proteome Res 2010, 9, 2216.
22. Cornfield, J. Biometrics 1969, 25, 617.
23. Mintz, S.; Shulman-Peleg, A.; Wolfson, H. J.; Nussinov, R. Proteins 2005, 61, 6.
24. Wolowski, V. R. Computational analysis of protein–protein complexes related to knowledge based predictions of interaction, Master thesis, Department of Computer Science, University of Hagen, Germany, 2008.
25. Capri homepage. Available at http://www.ebi.ac.uk/msd-srv/capri/.
26. Ezkurdia, I.; Bartoli, L.; Fariselli, P.; Casadio, R.; Valencia, A.; Tress, M. L. Brief Bioinform 2009, 10, 233.
27. Word, J. M.; Lovell, S. C.; Richardson, J. S.; Richardson, D. C. J Mol Biol 1999, 285, 1735.
28. Brünger, A. T.; Adams, P. D.; Clore, G. M.; DeLano, W. L.; Gros, P.; Grosse-Kunstleve, R. W.; Jiang, J. S.; Kuszewski, J.; Nilges, M.; Pannu, N. S.; Read, R. J.; Rice, L. M.; Simonson, T.; Warren, G. L. Acta Cryst D 1998, 54, 905.
29. deVries, S. J.; vanDijk, A. D. J.; Krzeminski, M.; vanDijk, M.; Thureau, A.; Hsu, V.; Wassenaar, T.; Bonvin, A. M. J. J. Proteins 2007, 69, 726.
30. Chang, C. C.; Lin, C. J. http://www.csie.ntu.edu.tw/~cjlin/libsvm 2001.
31. Pierce, B.; Weng, Z. Proteins 2007, 67, 1078.
32. Yang, Y.; Zhou, Y. Proteins 2008, 72, 793.
33. Mintseris, J.; Wiehe, K.; Pierce, B.; Anderson, R.; Chen, R.; Janin, J.; Weng, Z. Proteins 2005, 60, 214.
34. Chen, R.; Mintseris, J.; Janin, J.; Weng, Z. Proteins 2003, 52, 88.
35. Liu, S.; Gao, Y.; Vakser, I. A. Bioinformatics 2008, 24, 2634.
36. Tovchigrechko, A.; Vakser, I. A. Proteins 2005, 60, 296.
37. Gray, J. J.; Moughon, S.; Wang, C.; Schueler-Furman, O.; Kuhlman, B.; Rohl, C. A.; Baker, D. J Mol Biol 2003, 331, 281.