Simulation of conformational changes occurring ... - Semantic Scholar

Report 1 Downloads 82 Views
Computational Biology and Chemistry 31 (2007) 196–206

Simulation of conformational changes occurring when a protein interacts with its receptor S. Costantini a,b , G. Colonna b , A.M. Facchiano a,b,∗ a

Laboratory of Bioinformatics and Computational Biology, Institute of Food Science, CNR, via Roma 52 A/C, 83100 Avellino, Italy b CRISCEB, Research Center of Computational and Biotechnological Sciences, Second University of Naples, via Costantinopoli 16, 80138 Naples, Italy Received 5 September 2006; accepted 26 March 2007

Abstract In order to simulate the conformational changes occurring when a protein interacts with its receptor, we firstly evaluated the structural differences between the experimental unbound and bound conformations for selected proteins and created theoretical complexes by replacing, in each experimental complex, the protein-bound with the protein-unbound chain. The theoretical models were then subjected to additional modeling refinements to improve the side chain geometry. Comparing the theoretical and experimental complexes in term of structural and energetic factors is resulted that the refined theoretical complexes became more similar to the experimental ones. We applied the same procedure within an homology modeling experiment, using as templates the experimental structures of human interleukin-1␤ (IL-1␤) unbound and bound with its receptor, to build models of the homologous proteins from mouse and trout in unbound and bound conformations and to simulate the interaction with the related receptors. Our results suggest that homology modeling techniques are sensitive to differences between bound and unbound conformations, and that modeling with accuracy the side chains in the complex improves the interaction and molecular recognition. Moreover, our refinement procedure could be used in protein–protein interaction studies and, also, applied in conjunction with rigid-body docking when is not available the protein-bound conformation. © 2007 Elsevier Ltd. All rights reserved. Keywords: Molecular modeling; Protein–protein interaction; Conformational changes; Protein–receptor complex; Interleukin-1 beta

1. Introduction It is well known that proteins may assume different stable conformations, depending on the different environment or their interaction with other molecules. In fact, the enzymes adapt their conformation to the substrate when it is recognized, and the monomeric proteins modify their conformation when subjected to oligomerization. The differences between two stable conformations of the same protein may be very subtle, as well as, in the extreme case of the prion protein, a change from a “mainly alpha

Abbreviations: 3D, three-dimensional; ASA, accessible surface area; CASP, comparative assessment of structure prediction; FGF, fibroblast growth factor; FGFR, fibroblast growth factor receptor; G-CSF, granulocyte colony stimulating factor; IL-1␤, interleukin-1 beta; IL-1R, IL-1␤ receptor; PDB, Protein Data Bank; RMSD, root mean square deviation; TGF, transforming growth factor ∗ Corresponding author at: Institute of Food Science, CNR, via Roma 52 A/C, 83100 Avellino, Italy. Tel.: +39 0825 299625; fax: +39 0825 299813. E-mail address: [email protected] (A.M. Facchiano). 1476-9271/$ – see front matter © 2007 Elsevier Ltd. All rights reserved. doi:10.1016/j.compbiolchem.2007.03.010

helix” to a “mainly beta sheet” architecture may occur. In any case, the structural modification is driven by the thermodynamics rules, and we can consider that, when a protein interacts with another molecule, it modifies its “protein-unbound” conformation to assume a “protein-bound” conformation, more suitable for the interaction. Computational tools for protein docking simulations, recently reviewed (Vajda and Camacho, 2004), allow to set up rigid-body docking, flexible docking, molecular dynamics and free energy calculations. The Critical Assessment of Protein Interactions (CAPRI) competitions have been devoted to the problem of predicting protein–protein interactions (Mendez et al., 2005). In recent editions, some of the targets were protein structures to be created by homology modeling, on the basis of templates selected by the management committee (Janin, 2005; Camacho, 2005; Rajamani et al., 2004). This added one more difficulty to the competition, being the final prediction of protein–protein interaction affected by a previous prediction of the protein conformation. The comparative assessment of tech-

S. Costantini et al. / Computational Biology and Chemistry 31 (2007) 196–206

niques of protein structure prediction (CASP) has confirmed that “models competitive with experiment can be produced for proteins with sequences very similar to that of a known structure” (Moult et al., 2003, 2005). Therefore, the comparative modeling strategy may be applied for creating models to be used in further protein–protein interaction studies. Good models are easily obtained when high sequence identity exists between the target and the template protein, but in the case of very low sequence identity, homology modeling can produce wrong models, as a consequence of wrong sequence alignments (Moult et al., 2003, 2005; Venclovas, 2001, 2003; Tress et al., 2005). Therefore, sequence identity of at least 20–40% is the lowest limit to apply homology modeling with success. In addition the researcher may have often to choose the template among more structures of the same protein, because frequently a protein has been solved in different conditions, and in particular, unbound as well as bound conformations. In theory, if we need a model of a target protein for simulating the interaction with its receptor, the best template model to use should be the complexed with the receptor. However, it could be possible that only an unbound conformation is available. Can it be used with success? To answer this question, we firstly studied the conformational differences between the unbound and bound structures, and, then, how homology modeling strategy is able to simulate these differences. To this scope, we selected from the PDB some proteins, for which both the protein-unbound and protein-bound structures are available and investigated their conformational differences by comparing them and evaluating how the interaction with the receptor is affected by these differences. We have also evaluated how the unbound protein conformation can be used to simulate the protein–receptor complex and to improve its conformation by side chains modeling. Finally, we applied the homology modeling strategy to predict the unbound and bound conformations, and the protein–receptor complexes, of mouse and trout interleukin-1␤ (IL-1␤), by using as templates the experimental conformations of human IL-1␤, in the unbound state and complexed with its receptor. The mouse and trout IL-1␤ have 78 and 34% of sequence identity with the human template, respectively, and so they are examples ranging from the highest to the lowest limits of application of comparative modeling. We simulated the interaction of mouse and trout IL-1␤ with the respective receptors and observed how subtle differences in

197

the conformations of the two templates are still present in the modeled proteins. Moreover, we verified that the conformational changes occurring in the interaction of IL-1␤ with its receptor can be reproduced by comparative modeling techniques and that the molecular recognition could be improved by modeling with accuracy the protein side chains in the complex. 2. Methods 2.1. Selection of protein models and simulation of protein–receptor complexes The searching tools of the PDB web site were used to select proteins having an experimental protein–receptor complex structure deposited, as well as an experimental structure of the protein-unbound. The proteins selected and the related PDB codes are reported in Table 1. In order to evaluate how the protein-unbound molecular structure is suitable to the interaction with the receptor, each experimental model of the protein–receptor complex was modified by replacing the protein chain with the protein-unbound model. Insight II package (Accelrys, Inc.) was used to superimpose the chain of the protein-unbound model to the corresponding chain in the experimental complex, which was then removed. After this assembly procedure, each theoretical complex was subjected to three different strategies of modeling refinement aimed to improve the side chain geometry: (1) an optimization procedure (see below), which generates the complex COMPL-A; (2) an optimization procedure followed by enhanced modeling of side chain conformations of both protein and receptor, which generates the complex COMPL-B; and (3) the enhanced modeling of side chain conformations of complexed protein and receptor, followed by optimization of the complex, which generates the complex COMPL-C. The optimization procedure was based on the standard structure optimization protocol of InsightII package, i.e. 500 steps of energy minimization under conjugate gradient algorithm applied in vacuo (dielectric constant = 1.00). The enhanced modeling of side chains of amino acids in theoretical complexes was performed by SCWRL3 program (Canutescu et al., 2003), that uses a backbone-dependent rotamer library (Dunbrack, 2002) and an energy function based on the log probabilities of these rotamers, and a simple repulsive steric energy

Table 1 Comparison of ”protein-unbound“and ”protein-bound“conformations

Human Fgf-1 Human Fgf-2 Human G-CSF Human IL-1␤ Human IL-4 Human IL-10 Human TGF-beta3a

CATH classification class – architecture-topology

Protein-unbound PDB code

Protein-bound with the receptor PDB code

RMSD (C-alpha)

RMSD (all-atoms)

Mainly beta – trefoil–trefoil Mainly beta – trefoil–trefoil Mainly alpha – Up-down bundle–4 helix bundle Mainly beta – trefoil–trefoil Mainly alpha – Up-down bundle–4 helix bundle Mainly alpha – Up-down bundle–4 helix bundle Mainly beta – ribbon–cystine knot cytokines

2AFG A 2BFH B 1GNC 1IOB 1HIK 2ILK A 1TGK

1EVT A 1CVS A 1CD9 A 1ITB A 1IAR A 1J7V L 1KTZ A

0.45 0.37 5.14 1.08 1.13 1.56 2.19

0.91 1.21 5.64 1.78 2.06 2.19 2.90

RMSD values (expressed in Angstroms) are referred to the comparison of the “protein-unbound” and “protein-bound” conformations. a The TGF-beta3 chain in the PDB entry 1KTZ lacks the 55–72 segment. The RMSD values for the two fragments are the following: 13–54 fragment, RMSD = 2.52 (C-alpha) and 3.50 (all-atoms); 73–112 fragment, RMSD = 1.53 (C-alpha) and 1.75 (all-atoms).

198

S. Costantini et al. / Computational Biology and Chemistry 31 (2007) 196–206

term. In order to compare complexes similarly optimized, the optimization procedure was applied also to the experimental structures of protein–receptor complexes, to generate complexes COMPL-D. The CVFF force field within the Discover module of Insight II was used to assign potentials and charges. To compare the reliability of the results obtained with two different force fields, the same optimization procedure was applied by using AMBER force field. 2.2. Analysis and modeling of IL-1β protein structure and IL-1β/IL-1R complexes Protein sequences of IL-1␤ and IL-1R were taken from public databases (human IL-1␤: GenBank M15840; mouse IL-1␤: GenBank M15131; trout IL-1␤: GenBank AJ223954; human IL-1R: GenBank X16896, SwissProt P14778; mouse IL-1R: GenBank M29752, SwissProt P13504; trout IL-1R: GenBank AJ295296, TrEMBL Q8AXU1). The 3D models of IL-1␤ were created by homology modeling in agreement to the procedure used in our laboratory (Caporale et al., 1999, 2003; Facchiano et al., 2001; Scapigliati et al., 2004; Marabotti et al., 2004; Marabotti and Facchiano, 2005; Costantini et al., 2005; Buonocore et al., 2006), based on the application of a number of bioinformatics tools. Briefly, we used BLAST (Altschul et al., 1990) for sequence similarity searches, CLUSTALW (Thompson et al., 1994) and CONSENSUS (Prasad et al., 2004) for multiple alignments, MODELER (Sali and Blundell, 1993) implemented in the Quanta molecular simulation package (Accelrys, Inc.) to build full-atom protein models, PROCHECK (Laskowski et al., 1993) to evaluate their stereo chemical quality, DSSP (Kabsch and Sander, 1983) for secondary structure assignment to 3D models, NACCESS (Hubbard et al., 1991) for the evaluation of solvent accessibility, Insight II (Accelrys, Inc.) for molecular superimposition and energy calculations, HBplus (McDonald and Thornton, 1994) to evaluate the presence of putative Hbonds. All software was used without modification of standard settings. The theoretical IL-1␤/IL-1R complexes in trout and mouse were created on the reference of the experimental complex between human IL-1␤ and its receptor (PDB code: 1ITB) (Scapigliati et al., 2004). These complexes were then subjected to same modeling refinement strategies to improve the side chain conformations and same optimization procedures were applied to the other protein–receptor complexes.

The “Protein–Protein Interaction Server” and the program NACCESS (Hubbard et al., 1991) were used to identify also the amino acids at the protein–receptor interface. 3. Results 3.1. Comparisons of the protein-unbound and protein-bound experimental conformations We selected seven proteins from the PDB, for which both the protein-unbound structure and the protein-bound with receptor structure are available as experimentally solved. The conformational changes, occurring when the protein interacts with its receptor, were analyzed by structural superimpositions, obtain˚ ing RMSD values (see Table 1) ranging from 0.45 to 2.19 A for the alpha-carbons, with the exception of the G-Csf, for ˚ was observed. This higher difference which a value >5 A may be related to the experimental methods and conditions used to solve the protein-unbound structure (i.e. NMR) and the protein–receptor complex (i.e. RX diffraction), whereas also in the case of IL-1␤ the experimental structures were obtained with different methods, but RMSD value was lower. By excluding the G-Csf values, the average of the RMSD ˚ for the alpha-carbons and 1.84 A ˚ for allvalues is 1.13 A atoms. We created theoretical protein–receptor complexes for each protein by substituting the protein chain in the experimental model of the complex with the protein-unbound experimental model (see Section 2 for details). Each theoretical complex was subjected to three additional modeling phases and named as reported in Section 2. The optimization procedure, applied to the experimental and theoretical complexes, was repeated using CVFF and AMBER force fields. The differences between each COMPL-D and three related theoretical complexes (i.e. COMPL-A, COMPL-B, COMPL-C) were evaluated by structural superposition obtaining RMSD values shown in Fig. 1. The COMPL-C complexes were always more similar to the experimental reference in terms of both the backbone and all-atom conformations. The RMSD val-

2.3. Evaluation of protein–protein interaction In order to compare the protein–protein interaction in complexes, we computed the following parameters: (i) the binding free energies, by using the program DCOMPLEX (Liu et al., 2004) and FOLDX server (Schymkowitz et al., 2005); (ii) the interface accessible surface area (ASA) of the complexes, evaluated with the ”Protein–Protein Interaction Server“(Jones and Thornton, 1996); and (iii) the presence of putative Hbonds, by using the program HBplus (McDonald and Thornton, 1994).

Fig. 1. Comparison of experimental and theoretical complexes, optimized using CVFF force field, in terms of RMSD values. Bar graphs represent RMSD values (for all-atoms), expressed in Angstroms, obtained comparing the “COMPL-D” with “COMPL-A” (dotted bars) or “COMPL-B” (blank bars) or “COMPL-C” (horizontal line bars) conformations.

S. Costantini et al. / Computational Biology and Chemistry 31 (2007) 196–206 Table 2 Evaluation of protein–receptor interaction in the theoretical and experimental complexes optimized using CVFF force field Complexes FGF1/receptor1 COMPL-D COMPL-C COMPL-B COMPL-A FGF2/receptor1 COMPL-D COMPL-C COMPL-B COMPL-A G-CSF/receptor COMPL-D COMPL-C COMPL-B COMPL-A IL-1␤/IL-1R COMPL-D COMPL-C COMPL-B COMPL-A IL-4/IL-4R COMPL-D COMPL-C COMPL-B COMPL-A IL-10/IL-10R COMPL-D COMPL-C COMPL-B COMPL-A TGF-beta3/tbr-2 COMPL-D COMPL-C COMPL-B COMPL-A

˚ 2) ASA (A

Number of H-bonds

1167 1146 1126 1094

17 18 16 16

1359 1350 1344 1342

23 18 12 18

1017 1136 1052 1121

25 16 6 18

2337 2258 2183 2158

36 31 28 31

902 895 699 835

21 14 5 9

990 956 929 839

19 12 5 10

575 608 508 535

10 7 6 6

Accessible surface area (ASA) values have been evaluated with the “Protein–Protein Interaction Server” (Jones and Thornton, 1996). H-bonds have been evaluated with the program HBplus (McDonald and Thornton, 1994).

ues, obtained superposing the experimental complexes with the COMPL-A and COMPL-B complexes, were equal for back˚ bone (data not shown) and differ in the range of 0.1–0.5 A for side chain conformations, because each COMPL-B complex was obtained from COMPL-A by modeling only the side chains. Then, to compare the experimental and theoretical complexes in terms of the protein–receptor interaction, we computed for each complex binding free energy by DCOMPLEX and FOLDX (Fig. 2), surface area of interaction and number of the inter-chain H-bonds (Table 2). The experimental complex (i.e. COMPL-D) had a more negative value of free binding energy (except than for the free energy values evaluated with DCOMPLEX and FOLDX for the FGF1/receptor 1 and G-Csf/receptor complexes, respectively), an higher value of interface ASA (except for G-Csf/receptor complex), and an higher number of H-bonds. The COMPL-C complexes had the energy term always more favourable to the interaction than that

199

evaluated in other two theoretical complexes (i.e. COMPL-A and COMPL-B) and in the FGF1/receptor 1 and G-Csf/receptor complexes the value of binding free energy is more negative than that computed in the related COMPL-D complexes. Instead, the COMPL-B complexes showed the worst protein–receptor interaction in terms of free binding energy, of interface ASA value and of H-bonds number. The same results were obtained using AMBER force field in the optimization procedure (see Table S1 and Figs. S1 and S2 in the Supplementary data). These results indicated that the conformational changes occurring in the transition from the protein-unbound to the protein-bound can be appreciated by the evaluation of these computed parameters, and that modeling with accuracy the side chains of the proteins in the theoretical complex can improve the interaction between each protein and its receptor. In fact, just one poorly modeled side chain could be the difference between detecting and missing protein interaction, as recently reported (Camacho, 2005). 3.2. Comparisons among the two experimental models of human IL-1β We investigated in more details one of the protein–receptor systems, i.e. the IL-1␤ molecular structures and their differences. The reasons for the choice of this protein–receptor system will be described in detail under Section 4. Two different experimental models of human IL-1␤ were available in PDB, i.e. the protein-unbound (PDB code: 1IOB) and bound with its receptor (PDB code: 1ITB, chain A) structures. We compared these two conformations in terms of structural superimposition, as already shown in Table 1, and of their secondary structures (see Fig. 3) characterized by a mainly beta architecture. Although differences in the overall content of secondary structure were less evident, the secondary structure was differently assigned for 27 amino acids, which means 18% of the whole protein, and 63% of them were at the interface with the receptor. The bound form had a lower content of beta structure and an higher content of coil structure. By considering that the algorithm used for the evaluation of secondary structure is based on backbone H-bonds, these differences in secondary structure reflect backbone modifications occurring when human IL-1␤ interacts with its receptor, mainly in the region of the protein–protein interface, while effects related to the side chains conformations should be evaluated by other more detailed analyses. We also evaluated what IL-1␤ amino acids are at the interface with the receptor in the experimental complex and in the three theoretical complexes obtained as described in the previous paragraph, in order to analyze how the different conformations of specific amino acids at the interface may affect the interaction. As an example, we show in Fig. 4 the Lys92-IL-1␤ environment in the experimental and in the theoretical complexes. It is evident that, in COMPL-D, Lys92 is very close to Asp253 of the ˚ and H-bonds and electrostatic interactions are receptor (1.6 A), possible. On the contrary, the distances increase in the COMPLA and COMPL-B complexes, in agreement with the computed loss of H-bonds (see Table 2). In COMPL-C, Lys92 of IL-1␤

200

S. Costantini et al. / Computational Biology and Chemistry 31 (2007) 196–206

Fig. 2. Binding free energies for the complexes, optimized using CVFF force field. Bars represent the differences of binding free energy (in kcal/mole) for theoretical complexes in comparison to the experimental complex, computed by using the program DCOMPLEX (A) and FOLDX server (B).

S. Costantini et al. / Computational Biology and Chemistry 31 (2007) 196–206

201

Fig. 3. Comparison of human IL-1␤ secondary structures. The sequence of the human IL-1␤ is reported in the first row. Amino acids in bold are involved into the binding with the receptor, as evidenced by the analysis of the 1ITB experimental model of the human complex. The other rows report secondary structure elements assigned with the DSSP program for the two experimental models (1IOB and 1ITB).

˚ like in the forms H-bond with Asp253 of the receptor (1.8 A) experimental complex. These results indicate that the refinement modeling procedure, applied to the side chains of complexed protein and receptor, improves the molecular interaction.

3.3. Modeling of mouse IL-1β and comparison to the experimental model Is it possible to apply homology modeling, to predict the “unbound” and “bound” conformation of a given protein? Their

Fig. 4. Details of the interaction IL-1␤/receptor. The possible interaction between Lys92-IL-1␤ and Asp253-IL-1R in the theoretical complexes and in the experimental complex (optimized using CVFF force field) is highlighted. The backbone of the receptor is shown as blue ribbon. The backbone of the IL-1␤ in COMPL-A is shown in red (panel A), in COMPL-B is shown in cyan (panel B), in COMPL-C is shown in purple (panel C) and in COMPL-D is shown in green (panel D). Evidenced amino acids are in stick-and-ball representation with standard atom colours.

202

S. Costantini et al. / Computational Biology and Chemistry 31 (2007) 196–206

Fig. 5. Comparison of mouse IL-1␤ secondary structures. The sequence of the mouse IL-1␤ is reported in the first row. Amino acids in bold are involved into the binding with the receptor by homology to the human IL-1␤ and reported in Figs. 1 and 2. The other rows report secondary structure elements assigned with the DSSP program for the two theoretical models (m-ThF and m-ThC) and the experimental model 2MIB.

differences will affect the simulation of the protein–receptor complex? We tried to answer this question by simulating IL1␤/IL-1R complexes with protein structures obtained by a homology modeling experiment. Mouse, as well as trout (next paragraph) proteins were used being their similarity to the human proteins at the boundaries of the identity range considered suitable for homology modeling studies. The BLAST pairwise alignment between mouse and human sequences indicated amino acid identity of 78% with only one gap insertion (Scapigliati et al., 2004) (Fig. S3 in Supplementary data). On the basis of this alignment of sequences, we created two models of mouse IL-1␤, each based on a different human model as template. We named these two models as m-ThF (i.e. mouse Theoretical “Free” model) and m-ThC (i.e. mouse Theoretical “Complexed” model), corresponding to the features of the human template used. The differences between the experimental mouse IL-1␤ model (PDB code: 2MIB) and the two theoretical models were evaluated by comparison of secondary structure elements (Fig. 5) and by structural superposition and related RMSD values (Table 3A). The RMSD values indicated that the experimental model of mouse IL-1␤, obtained in absence of the receptor, differs at a similar extent from both m-ThF and m-ThC mod˚ for alpha-carbons, 3.37 and 3.43 A ˚ for els (1.16 and 1.27 A all-atoms, respectively). The two theoretical models had some differences in their secondary structure elements. In the whole sequence of 149 amino acids, the secondary structure was differently assigned for 24 amino acids, which means 16% of the protein, and 58% of them were located at the interface with the receptor. Differences in the total content of secondary structure were less evident. The m-ThF model had an increase in beta structure and a decrease in coil structure. Similarly to the human models, the majority of these differences regarded IL-1␤ amino

acids putatively involved into the binding with the receptor, and an higher content of coil structure was observed for the m-ThC. 3.4. Interaction between mouse IL-1β and its receptor Recently, we have applied comparative modeling strategy to model the mouse IL-1 receptor (IL-1R) by using the experimental structure of human IL-1R as reference model (Scapigliati et al., 2004). The model was submitted to the PDB, and accepted (mouse IL-1R PDB code: 1OU3) having passed the required processing for validation. In the present work, on the basis of the experimental structure of the human IL-1␤/IL-1R comTable 3 ˚ between pairs of IL-1␤ models RMS deviations (A)

(A) Mouse IL-1␤ models C-alpha 2MIB m-ThF m-ThC All-atoms 2MIB m-ThF m-ThC

(B) Trout IL-1␤ models C-alpha t-ThF t-ThC All-atoms t-ThF t-ThC

2MIB

m-ThF

m-ThC

0 1.16 1.27

1.16 0 0.97

1.27 0.97 0

0 3.37 3.43

3.37 0 1.7

3.43 1.7 0

t-ThF

t-ThC

0 1.79

1.79 0

0 2.52

2.52 0

S. Costantini et al. / Computational Biology and Chemistry 31 (2007) 196–206

plex (PDB code: 1ITB), we simulated the interaction of the two obtained models of mouse IL-1␤ with the model of their receptor. We used as “reference” the complex generated with the m-ThC model, i.e. the mouse IL-1␤ model obtained using as template the bound structure of human IL-1␤, and named it as m-COMPL-D for similarity to the names previously used. The complex obtained using m-ThF, i.e. the mouse IL1␤ model obtained using as template the “unbound” human IL-1␤ structure, was subjected to same additional modeling phases applied to theoretical protein–receptor complexes. So, we named m-COMPL-A the complex after optimization procedure, m-COMPL-B the complex optimized and then subjected to modeling of side chains, and m-COMPL-C the complex subjected to modeling of the residue side chains and then to optimization procedure. The differences between the reference m-COMPL-D and the other three complexes (i.e. m-COMPL-A, m-COMPLB, m-COMPL-C), were evaluated by structural superposition obtaining RMSD values (Fig. 6A). The m-COMPL-C complex is more similar to m-COMPL-D in terms of both the backbone and all-atom conformations and the related RMSD values ˚ respectively. The RMSD values, obtained are 0.97 and 1.27 A, superposing m-COMPL-D with m-COMPL-A and m-COMPL˚ for side chain B, are equal for backbone and differ of 0.1 A conformations. For each simulated complex, the binding free energies have been evaluated using DCOMPLEX and FOLDX programs. In Fig. 6B and C, the differences in energies between m-COMPLD and other complexes are shown. The energy term is more favourable to the interaction in the m-COMPL-C, in which the side chain conformations of protein and receptor models complexed were modeled before optimization procedure. We calculated the interface ASA for all mouse complexes (Table 4). The value of interface ASA in m-COMPL-C resulted higher than that in other complexes. Moreover, by analyzing the inter-chain H-bonds (Table 4), we find the higher number of H-bonds in m-COMPL-D, followed by m-COMPL-C. Similar results were obtained also optimizing the complexes with AMBER force field (see Table S2 and Fig. S4 in Supplementary data).

Table 4 Evaluation of interaction in the theoretical IL-1␤/IL-1R complexes optimized using CVFF force field Complex Mouse m-COMPL-D m-COMPL-C m-COMPL-B m-COMPL-A Trout t-COMPL-D t-COMPL-C t-COMPL-B t-COMPL-A

˚ 2) ASA (A

H-bonds

2259 2275 2220 2222

32 28 8 26

2765 2579 2261 2394

41 34 8 30

203

Fig. 6. Mouse and Trout complexes, optimized using CVFF force field. (A) Bar graphs represent RMSD values, expressed in Angstroms, obtained comparing all-atom of m-COMPL-D with m-COMPL-A (dotted bars) or m-COMPL-B (blank bars) or m-COMPL-C (horizontal line bars), and of t-COMPL-D with t-COMPL-A (dotted bars) or t-COMPL-B (blank bars) or t-COMPL-C (horizontal line bars). (B) Bars represent the differences in binding free energy (in kcal/mole), computed by using the program DCOMPLEX, for complexes in mouse and in trout in comparison with m-COMPL-D and t-COMPL-D, respectively. (C) Bars represent the differences in binding free energy (in kcal/mole), computed by using the program FOLDX, for complexes in mouse and in trout in comparison with m-COMPL-D and t-COMPL-D, respectively.

3.5. Modeling of trout IL-1β Trout IL-1␤ has 34% of amino acid identity with the human sequence. A previous multiple alignment of the whole family of IL-1␤ sequences was used to obtain the best alignment of trout and human sequences (Scapigliati et al., 2004). On the basis of this alignment of sequences, two models of trout IL-1␤ were created, each on a different human model as template. These two models were named as trout Theoretical Free model (t-ThF) and trout Theoretical Complexed model (t-ThC), corresponding to the features of the human template.

204

S. Costantini et al. / Computational Biology and Chemistry 31 (2007) 196–206

Fig. 7. Comparison of trout IL-1␤ secondary structures. The sequence of the trout IL-1␤ is reported in the first row. Amino acids in bold are involved into the binding with the receptor. The other rows report secondary structure elements assigned with the DSSP program for the two theoretical models (t-ThF and t-ThC).

The differences between the two models were evaluated by comparison of secondary structure elements (Fig. 7) and by structural superposition and related RMSD values (Table 3B). ˚ and The RMSD values observed for the alpha-carbons (1.79 A) ˚ for all-atoms (2.52 A) suggested that differences exist at both backbone and side chain levels. Some differences were observed in their secondary structure. In the whole sequence of 166 amino acids, the secondary structure is differently assigned for 23 amino acids, 35% of them putatively located at the interface with the receptor. Differences in the total content of secondary structure were less evident. The t-ThF model had a higher content of beta structure and a lower content of coil structure, just as observed for the mouse models. 3.6. Interaction between trout IL-1β and its receptor Recently, we have applied comparative modeling strategy to model the trout IL-1 receptor (IL-1R) by using the experimental structure of human IL-1R as reference model (Scapigliati et al., 2004). This model was submitted to the PDB, and accepted (trout IL-1R PDB code: 1OU1) having passed the required processing for validation. We simulated the interaction of the two models of trout IL-1␤ with this receptor model. As for the mouse example, we named t-COMPL-D the complex obtained using t-ThC as IL-1␤ conformation. The complex, obtained using t-ThF, was subjected to same additional modeling phases, as for mouse theoretical complexes. We named t-COMPL-A the complex after optimization procedure, t-COMPL-B the complex optimized and, then, subjected to modeling of side chains, and t-COMPL-C the complex subjected to modeling of the residue side chains and then to optimization procedure. The differences between t-COMPL-D and other three complexes (i.e. t-COMPL-A, t-COMPL-B, t-COMPL-C) were evaluated by structural superposition obtaining RMSD values (Fig. 6A). The t-COMPL-C complex is more similar to t-COMPL-D and the related RMSD values, obtained comparing their backbone ˚ respectively. The RMSD and side chains, are 1.57 and 2.00 A,

values obtained for t-COMPL-A and t-COMPL-B were equal for ˚ for side chain conformations. backbone and different of 0.2A The binding free energies in each complex were evaluated and compared with that in t-COMPL-D (Fig. 6B and C) using DCOMPLEX and FOLDX programs. The energy term is more favourable to the interaction in the complex t-COMPL-D, in which it is present the IL-1␤ modeled by using as template the bound form of human IL-1␤. The value of interface ASA (Table 4) resulted higher in tCOMPL-D than in other complexes. Indeed, we have analyzed the inter-chain H-bonds in the trout complexes (Table 4) and the higher number of H-bonds was observed in t-COMPL-D. The optimization procedure was repeated also using AMBER force field, as in the case of mouse complexes, and the similar results were obtained (see Table S2 and Fig. S4 in Supplementary data). 4. Discussion Comparative modeling, or homology modeling, is considered the most reliable method to predict the 3D structure of a protein. It is based on the assumption that homologous proteins share the same structural organization, so that the known 3D structure of a protein can be used as template to model the structure of an homologous protein. The modeling of the backbone is considered right when the right sequence alignment is performed, but the modeling of side chains represents a task to be still improved (Cozzetto and Tramontano, 2005; Tramontano and Morea, 2003). From assessment of predictions, submitted for the CASP6 comparative modeling category, it is resulted that the accuracy of side chain modeling is improving. The SCWRL3.0 program “did well in rotamer accuracy” (Tress et al., 2005), and it was also considered the best program to build the side chains (Wallner and Elofsson, 2005). Modeling the side chains with accuracy should improve the recognition of protein–protein binding and the results of the rigid-body docking. Recently, it has been also reported that molecular dynamics is able to predict

S. Costantini et al. / Computational Biology and Chemistry 31 (2007) 196–206

the rotamer conformation of the anchor side chains important for the molecular recognition even if good results were obtained for complexes in which did not undergo significant structural rearrangements upon binding (Camacho, 2005; Rajamani et al., 2004). In our work, we explored the opportunity to model by homology the structure of a protein, and to use this model to simulate the interaction with the receptor. The best template to use is the conformation of the protein in the complex with the receptor, but when this model is not available, the unbound conformation can be used to create a starting model. After the assembly of the complex, the refinement of side chains followed by optimization procedure made the complexes more similar to the experimental ones in terms of conformations and protein–receptor interaction. This finding was not dependent by the force field used for energy calculations and optimization procedure, as we measured these parameters on complexes optimized with both CVFF and AMBER force field (the latter results are reported as Supplementary data). It is interesting to note that the modeling of mouse and trout IL-1␤ was performed on the basis of sequence similarity to the human templates ranging from highest to the lowest limits (78 and 34%, respectively) of application of comparative modeling, and in such cases the comparative modeling needs to be applied only after a careful alignment of the sequences. Therefore, the results related to the protein–receptor interaction confirmed that the models were of good quality and in agreement with expectable features and experimental results, as reported in a recent article (Scapigliati et al., 2004). Moreover, our results suggest a possible modeling procedure to apply when a model should be generated by using the unbound protein as template. Differences between unbound and bound protein conformations affect mainly the interface between protein and receptor, and the bound conformation can be better simulated with careful application of modeling and computational techniques. Our modeling procedure could represent an interesting tool, in conjunction with rigid-body docking, for further protein–protein docking studies. Acknowledgements This work was partially supported by CNR-Bioinformatics Project and Fondazione A. Buzzati-Traverso. Appendix A. Supplementary data Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.compbiolchem. 2007.03.010. References Altschul, S.F., Gish, W., Miller, W., Myers, E., Lipman, D.J., 1990. Basic local alignment search tool. J. Mol. Biol. 215, 403–410. Buonocore, F., Randelli, E., Bird, S., Secombes, C.J., Costantini, S., Facchiano, A., Mazzini, M., Scapigliati, G., 2006. The CD8alpha from sea bass (Dicentrarchus labrax L.): cloning, expression and 3D modeling. Fish Shellfish Immunol. 20, 637–646.

205

Camacho, C.J., 2005. Modeling side-chains using molecular dynamics improve recognition of binding region in CAPRI targets. Proteins 60, 245– 251. Canutescu, A.A., Shelenkov, A.A., Dunbrach Jr., R.L., 2003. A graphtheory algorithm for rapid protein side-chain prediction. Protein Sci. 12, 2001–2014. Caporale, C., Caruso, C., Facchiano, A., Nobile, M., Leonardi, L., Bertini, L., Colonna, G., Buonocore, V., 1999. Probing the modelled structure of wheatwin1 by controlled proteolysis and sequence analysis of unfractionated digestion mixtures. Proteins 36, 192–204. Caporale, C., Facchiano, A., Bertini, L., Leopardi, L., Chiosi, G., Buonocore, V., Caruso, C., 2003. Comparing the modeled structures of PR-4 proteins from wheat. J. Mol. Model. 9, 9–15. Costantini, S., Colonna, G., Rossi, M., Facchiano, A.M., 2005. Modeling of HLA-DQ2 and simulations of its interaction with gluten peptides to explain molecular recognition in celiac disease. J. Mol. Graph. Model. 23, 419– 431. Cozzetto, D., Tramontano, A., 2005. Relationship between multiple sequence alignments and quality of protein comparative models. Proteins 58, 151– 157. Dunbrack Jr., R.L., 2002. Rotamer libraries in the 21st century. Curr. Opin. Struct. Biol. 12, 431–440. Facchiano, A.M., Stiuso, P., Chiusano, M.L., Caraglia, M., Giuberti, G., Marra, M., Abruzzese, A., Colonna, G., 2001. Homology modeling of the human eukaryotic initiation factor 5A (eIF-5A). Protein Eng. 14, 881–890. Hubbard, S.J., Campbell, S.F., Thornton, J.M., 1991. Molecular recognition. Conformational analysis of limited proteolytic sites and serine proteinase protein inhibitors. J. Mol. Biol. 220, 507–530. Janin, J., 2005. The targets of CAPRI rounds 3–5. Proteins 60, 170–175. Jones, S., Thornton, J.M., 1996. Principles of protein–protein interactions derived from structural studies. Proc. Natl. Acad. Sci. U.S.A. 93, 13– 20. Kabsch, W., Sander, C., 1983. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577–2637. Laskowski, R.A., MacArthur, M.W., Moss, D.S., Thornton, J.M., 1993. PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Cryst. 26, 283–291. Liu, S., Zhang, C., Zhou, H., Zhou, Y., 2004. A physical reference state unifies the structure-derived potential of mean force for protein folding and binding. Proteins 56, 93–101. Marabotti, A., D’Auria, S., Rossi, M., Facchiano, A.M., 2004. Theoretical model of the three-dimensional structure of a sugar binding protein from Pyrococcus horikoshii: structural analysis and sugar binding simulations. Biochem. J. 280, 677–684. Marabotti, A., Facchiano, A.M., 2005. Homology modeling studies on human galactose-1-phosphate uridylyltransferase and on its galactosemia-related mutant Q188R provide an explanation of molecular effects of the mutation on homo- and heterodimers. J. Med. Chem. 48, 773–779. McDonald, I.K., Thornton, J.M., 1994. Satisfying hydrogen bonding potential in proteins. J. Mol. Biol. 238, 777–793. Mendez, R., Leplae, R., Lensink, M.F., Wodak, S.J., 2005. Assessment of CAPRI predictions in rounds 3–5 shows progress in docking procedures. Proteins 60, 150–169. Moult, J., Fidelis, K., Rost, B., Hubbard, T., Tramontano, A., 2005. Critical assessment of methods of protein structure prediction (CASP)-Round 6. Proteins 61, 3–7. Moult, J., Fidelis, K., Zemla, A., Hubbard, T., 2003. Critical assessment of methods of protein structure prediction (CASP)-Round V. Proteins 53, 334– 339. Prasad, J.C., Vajda, S., Camacho, C.J., 2004. Consensus alignment server for reliable comparative modeling with distant templates. Nucleic Acids Res. 32, W50–W54. Rajamani, D., Thiel, S., Vajda, S., Camacho, C.J., 2004. Anchor residues in protein–protein interactions. Proc. Natl. Acad. Sci. U.S.A. 101, 11287–11292. Sali, A., Blundell, T.L., 1993. Comparative protein modeling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779–815.

206

S. Costantini et al. / Computational Biology and Chemistry 31 (2007) 196–206

Scapigliati, G., Costantini, S., Colonna, G., Facchiano, A., Buonocore, F., Boss`u, P., Holland, J.W., Secombes, C.J., 2004. Modeling offish interleukin 1 and its receptor. Dev. Comp. Immunol. 28, 429–441. Schymkowitz, J., Borg, J., Stricher, F., Nys, R., Rousseau, F., Serrano, L., 2005. The FoldX web server: an online force field. Nucleic Acids Res. 33, W382–W388. Thompson, J.D., Higgins, D.G., Gibson, T.J., 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680. Tramontano, A., Morea, V., 2003. Exploiting evolutionary relationships for predicting protein structures. Biotechnol. Bioeng. 84, 756–762.

Tress, M., Ezkurdia, I., Grana, O., Lopez, G., Valencia, A., 2005. Assessment of predictions submitted for the CASP6 comparative modeling category. Proteins 61, 27–45. Vajda, S., Camacho, C.J., 2004. Protein–protein docking: is the glass half-full or half-empty? Trends Biotechnol. 22, 110–116. Venclovas, C., 2001. Comparative modeling of CASP4 target proteins: combining results of sequence search with three-dimensional structure assessment. Proteins 53, 47–54. Venclovas, C., 2003. Comparative modeling in CASP5: progress is evident, but alignment errors remain a significant hindrance. Proteins 53, 380–388. Wallner, B., Elofsson, A., 2005. All are not equal: a benchmark of different homology modeling programs. Protein Sci. 14, 1315–1327.