Parallel Algorithms in Molecular Biology

Robert L. Martino*, Tieng K. Yap, and Edward B. Suh
Computational Bioscience and Engineering Laboratory
Division of Computer Research and Technology
National Institutes of Health
Building 12A, Room 2033, 12 South Drive MSC 5624
Bethesda, Maryland 20892, U.S.A.
*Presenting and corresponding author
E-mail: [email protected]   Voice: 301-496-1112   Fax: 301-402-2867

Abstract

Scalable parallel computer architectures provide the computational performance needed for advanced computing problems in molecular biology. Many scientific challenges in molecular biology carry computational requirements that must be met before scientific progress can be made. We have developed a number of parallel algorithms and techniques useful in determining biological structure and function. Two example applications are the alignment of multiple DNA and protein sequences using speculative computation, and the calculation of the solvent accessible surface area of proteins, which is used to predict the three-dimensional conformation of these molecules from their primary structure. Timing results demonstrate substantial performance improvements of the parallel implementations over conventional sequential systems. Because the developed methods allow molecular biologists to perform computational tasks that would not otherwise be possible, we continue to develop parallel algorithms useful to this important scientific field.

1 Introduction

High performance parallel computers provide the computational rates necessary for advanced biomedical computing problems [1,2]. Molecular biologists can greatly reduce the time it takes to complete computationally intensive tasks and can take new approaches to processing their data. This may allow the inclusion of more data in a calculation, the determination of a more accurate result, a reduction in the time needed to complete a long computation, or the implementation of a new algorithm or more realistic model. With proper computer network connections and interactive user interfaces, parallel computing is readily available to a biomedical researcher in the laboratory or clinic at the investigator's computer workstation. This availability makes it possible to include this powerful computational resource as another tool in the research process. We have implemented a high-speed communication network based on Asynchronous Transfer Mode (ATM) switch technology to facilitate the use of parallel computing in these biomedical environments.

Many scientific challenges in molecular biology carry computational requirements that must be met before scientific progress can be made. We present the following two examples where we have developed new parallel algorithms and techniques to reduce the time required to complete computationally intensive tasks.

2 Multiple Sequence Alignment Using Speculative Computation

Pairwise sequence alignment is an important tool for locating similarity patterns between two biological (DNA or protein) sequences. This analytical tool has been used successfully to predict the function, structure, and evolution of biological sequences. As more sequences were generated by the biomedical community, multiple sequence alignment came to be used to obtain a better understanding of biological sequences. Although the rigorous dynamic programming algorithm used for pairwise sequence alignment can be extended to align more than two sequences, it is impractical to apply it to more than three sequences, since its memory space and computation time are proportional to the product of the sequence lengths.

The Berger-Munson iterative improvement algorithm [3] has been used successfully to perform multiple sequence alignment. The core of this algorithm is as follows:

    best_score = initial_score();
    while (stop criterion is not met) {
        current_score = calculate(seq, gap_positions);     /* step 1 */
        flag = decide(current_score, best_score);          /* step 2 */
        seq = modify(seq, flag, gap_positions);            /* step 3 */
    }

To implement a parallel version of this algorithm, we separated the computational process into three steps. In step 1, the n input sequences are first randomly partitioned into two groups, and the alignment score between these two groups of sequences is calculated; the new gap positions are also saved for use in step 3. In step 2, a decision flag is set to A (accepted) if the new resulting alignment is accepted; otherwise, it is set to R (rejected). A new alignment is accepted if the current score is higher than the current best score. In step 3, if the decision flag is set to A, the gap positions determined in step 1 are used to modify the current alignment and the best score is updated. The modified or unmodified alignment is then used as the input for the next iteration. This iterative improvement continues until the stop criterion is met, which we define as follows: after q consecutive rejected iterations, where q is the number of all possible partitions, the process stops.
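
To make the accept/reject bookkeeping and the stop criterion concrete, the following is a minimal, runnable C sketch of the sequential iteration. The alignment calculation itself is replaced by a stub, score_partition(), and the value of q is arbitrary; this is an illustration of the control flow only, not the actual program.

    /*
     * Sequential Berger-Munson-style iteration (control flow only).
     * score_partition() stands in for step 1 (align the two groups obtained
     * from one random partition and return the score); steps 2 and 3 are the
     * accept/reject decision and the alignment update.
     */
    #include <stdio.h>
    #include <stdlib.h>

    /* Stub for step 1: score of the alignment produced by one random partition. */
    static double score_partition(void)
    {
        return (double)rand() / RAND_MAX;
    }

    int main(void)
    {
        const int q = 64;                 /* number of possible partitions (stop window) */
        int consecutive_rejects = 0;

        srand(1);
        double best_score = score_partition();          /* initial alignment score */

        while (consecutive_rejects < q) {               /* stop criterion */
            double current_score = score_partition();   /* step 1: align one partition */
            if (current_score > best_score) {           /* step 2: accept (flag = A)   */
                best_score = current_score;             /* step 3: keep new alignment  */
                consecutive_rejects = 0;
            } else {                                    /* step 2: reject (flag = R)   */
                consecutive_rejects++;                  /* step 3: keep old alignment  */
            }
        }
        printf("best score after convergence: %.3f\n", best_score);
        return 0;
    }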

234

The Berger-Munson algorithm is highly sequential because of a loop-carried dependence between iterations. Iteration i depends on iteration (i-1), since step 3 may modify the alignment during the (i-1)th iteration and the modified alignment must be used by the ith iteration. In addition, the three steps within each iteration depend on each other: step 1 uses seq, which may be modified by step 3; step 2 uses current_score, which is produced by step 1; and step 3 uses the flag variable set in step 2 and the gap positions generated in step 1. These dependencies make it difficult to implement a parallel version of this algorithm while preserving the behavior of the original sequential version.

Speculative computation [4] has been applied efficiently to parallelize sequential algorithms such as simulated annealing, an algorithm similar to the Berger-Munson algorithm. By applying speculative computation to the parallelization of the Berger-Munson algorithm, we obtained an efficient, scalable implementation [5] whose parallel alignment is guaranteed to be identical to the sequential one. The basic concept of speculative computation is to speculate on future solutions based on the current input parameters; with p processors we can speculate on (p-1) future solutions. In this application, we can speculate on the alignments for the next (p-1) iterations based on the current alignment. In the original Berger-Munson algorithm, the final alignment is obtained by performing a sequence of alignments between two groups of sequences, and each iteration is accepted only if its alignment score is higher than the current best score; otherwise, it is rejected. We stated earlier that the ith iteration may depend on the (i-1)th iteration. To be exact, the ith iteration depends on the (i-1)th iteration only if the (i-1)th iteration accepted a new alignment; otherwise, it depends only on the last accepted iteration. Our parallel speculative computation approach is based on the recognition that a consecutive sequence of rejected iterations are not dependent on each other and can therefore be done in parallel. We thus speculate that the (p-1) preceding iterations will be rejected so that they can be done in parallel. If the speculations are correct, the computation time can be reduced by a factor of p.
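
The following is a minimal sketch, in C with an OpenMP pragma, of the speculation scheme just described; it is an illustration under our own assumptions, not the implementation reported in [5]. Each of the P candidate iterations is scored against the same current alignment (i.e., assuming the earlier ones in the window are rejected); the candidates are then scanned in sequential order and the first one that improves the best score is accepted, discarding the later speculative results, which preserves the behavior of the sequential algorithm. The function score_partition() and the integer placeholder for the "alignment" are stand-ins introduced here for illustration.

    /*
     * Illustrative sketch of speculative computation for an iterative
     * accept/reject algorithm.  Compile with -fopenmp to evaluate the P
     * speculative candidates in parallel; without it the code still runs
     * sequentially and produces the same result.
     */
    #include <stdio.h>

    #define P 8            /* number of processors / speculative iterations */
    #define Q 50           /* stop after (at least) Q consecutive rejections */

    /* Stub: score obtained by re-aligning the current alignment under
     * candidate partition `trial`.  Pure function, safe to call in parallel. */
    static double score_partition(long alignment, long trial)
    {
        unsigned long x = ((unsigned long)alignment * 1000003UL
                           + (unsigned long)trial) * 2654435761UL;
        x ^= x >> 16;
        return (double)(x % 10000UL) / 10000.0;   /* pseudo-score in [0,1) */
    }

    int main(void)
    {
        long   alignment  = 0;       /* placeholder for the current alignment */
        double best_score = 0.0;
        long   trial      = 0;
        int    rejects    = 0;

        while (rejects < Q) {
            double score[P];

            /* Speculative step: all P candidates use the same current
             * alignment, i.e. we assume the earlier ones will be rejected. */
            #pragma omp parallel for
            for (int k = 0; k < P; k++)
                score[k] = score_partition(alignment, trial + k);

            /* Sequential commit: accept the first improving candidate and
             * discard the speculative work done beyond it.                */
            int accepted = -1;
            for (int k = 0; k < P; k++) {
                if (score[k] > best_score) { accepted = k; break; }
            }
            if (accepted >= 0) {
                best_score = score[accepted];
                alignment  = trial + accepted;   /* "apply" the accepted move */
                trial     += accepted + 1;
                rejects    = 0;
            } else {
                trial   += P;                    /* all P speculations were correct */
                rejects += P;
            }
        }
        printf("best score %.3f after %ld trials\n", best_score, trial);
        return 0;
    }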

235

To evaluate the performance of our approach, we used it to improve alignments generated manually by experts, Kabat et al. [6], and automatically by a popular program, CLUSTAL V [7]. Three different groups of immunoglobulin sequences with varying lengths and numbers of sequences were selected from the Kabat Database (Beta Release 5.0), which is maintained by Kabat et al. [6]. CLLC is the chicken immunoglobulin lambda light chains V-region group, HHC3 is the human immunoglobulin heavy chains subgroup III V-region group, and MKL5 is the mouse immunoglobulin kappa light chains subgroup V V-region group, the largest group in this database. The average sequence length is about the same for all three groups; however, the number of sequences in the third group is about twice that of the second, which in turn is about twice that of the first.

Speedup w.r.t. Kabat Alignment

    Number of Processors    CLLC    HHC3    MKL5
    1                        1.0     1.0     1.0
    2                        1.8     1.9     1.9
    4                        3.4     3.4     3.8
    8                        6.4     6.2     7.3
    16                      11.6    11.4    14.1
    32                      19.5    20.8    28.3
    64                      29.5    38.1    53.1

Speedup w.r.t. CLUSTAL Alignment

    Number of Processors    CLLC    HHC3    MKL5
    1                        1.0     1.0     1.0
    2                        1.8     1.8     1.9
    4                        3.3     3.3     3.7
    8                        6.1     5.9     7.1
    16                      10.7    10.7    13.6
    32                      17.0    19.4    26.3
    64                      23.8    35.1    50.7

The above tables show the speedup factors obtained for the three groups of sequences on varying numbers of processors. The speedup factor is defined as the ratio of the total run time of the sequential version of the program to the total run time of the parallel version. These tables demonstrate that significant speedups were obtained for all three groups of sequences. From these tables, we can make three observations. First, we obtained the best speedup factors with the largest group, MKL5. Second, we obtained better efficiencies with smaller numbers of processors, where efficiency is defined as the ratio of the speedup factor to the number of processors. Third, we achieved higher speedup factors when improving the Kabat alignments than when improving the CLUSTAL V alignments.


The results of the first two observations are as expected. The first observation is due to the larger number of partitions (a larger search space) in MKL5, and the second to less communication and lower rates of incorrect speculation; with a larger number of processors we had to speculate further into the future, which resulted in a higher error rate. The third observation reflects the fact that the Kabat alignments were already better than the CLUSTAL V alignments, so there were more rejections when improving the Kabat alignments than when improving the CLUSTAL V alignments.

3 Calculation of the Solvent Accessible Surface Area of a Protein

Since it was first noted that the information needed to determine how a protein will fold resides within its amino acid sequence, the protein folding prediction problem has been one of the most important unsolved problems in biochemistry. Understanding the three-dimensional structure of proteins is important for studying their function in living systems and for designing new proteins for biological and medical purposes. The amino acid sequences of proteins are being discovered at an explosive rate, but experimental procedures for determining their three-dimensional structure, such as X-ray crystallography and nuclear magnetic resonance spectroscopy, are slow, costly, and complex. A need therefore exists for theoretical and computational techniques that can predict the structure from the sequence.

We developed parallel computing methods for simulating protein folding so that more possible conformations can be considered and a more realistic energy function can be computed. This work involves strategies for searching through a large number of possible structures representing different energy states. The computationally intensive parts of a simulation are the long search through the great number of possible conformations and the computation of the free energy of the structures considered during the simulation. We use a potential energy function that is the sum of three energy terms. The first term is an effective potential based on the phi,psi angle probabilities of the protein's main chain. The second term is due to hydrogen bonding between the residues of the protein. The time to calculate these two terms is short. The third term is the hydrophobic potential, which is proportional to the solvent accessible surface area (ASA) of the protein molecule. The calculation of the ASA of the numerous structures considered in a simulation requires parallel computer performance.
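
Before turning to the surface calculation itself, the following minimal C sketch shows how the three energy terms combine; it is an illustration under our own assumptions, not the actual energy code. The functions phi_psi_energy(), hbond_energy(), and solvent_accessible_area(), the hydrophobic coefficient, and all numeric values are placeholders; only the way the terms are summed, with the hydrophobic term proportional to the ASA, reflects the text.

    /*
     * Sketch of the three-term potential energy described above.
     * All component functions are stubs; only the combination of terms
     * (with the hydrophobic term proportional to the ASA) follows the text.
     */
    #include <stdio.h>

    typedef struct { double x, y, z, vdw_radius; } Atom;

    /* Placeholder energy terms (fast to compute in the real code). */
    static double phi_psi_energy(const Atom *a, int n) { (void)a; (void)n; return -12.0; }
    static double hbond_energy(const Atom *a, int n)   { (void)a; (void)n; return  -7.5; }

    /* Placeholder for the Lee and Richards ASA calculation (the expensive term). */
    static double solvent_accessible_area(const Atom *a, int n) { (void)a; (void)n; return 4200.0; }

    static double potential_energy(const Atom *atoms, int n, double hydrophobic_coeff)
    {
        double e_phipsi = phi_psi_energy(atoms, n);            /* main-chain phi/psi term */
        double e_hbond  = hbond_energy(atoms, n);              /* hydrogen-bond term      */
        double asa      = solvent_accessible_area(atoms, n);   /* solvent accessible area */
        return e_phipsi + e_hbond + hydrophobic_coeff * asa;   /* hydrophobic term ~ ASA  */
    }

    int main(void)
    {
        Atom atoms[1] = { { 0.0, 0.0, 0.0, 1.7 } };   /* toy one-atom "protein" */
        printf("E = %.2f\n", potential_energy(atoms, 1, 0.01));
        return 0;
    }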


We have used the Lee and Richards algorithm [8] to determine the ASA of a protein. With this algorithm, a protein molecule is treated as a set of interlocking spheres, one for each atom of the protein. The radius of the sphere for an atom j is given by the sum of the van der Waals radius of atom j and the radius of the chosen solvent. In our work, the Lee and Richards set of van der Waals radii is used and the solvent radius R_s is set to 1.4 angstroms, corresponding to water.

The entire set of interlocking spheres is sectioned by a set of parallel planes perpendicular to the z-axis with a predetermined spacing DeltaZ. The intersection of each sphere with a given plane appears as a circle. The arcs of a circle that lie within other circles are eliminated, so the drawing in any one plane becomes the trace of the accessible surface on the given section. If any part of the circle for a given atom is drawn, the atom is accessible, and the summed length of all arcs for the atom is a measure of the ASA of the atom in that plane. The total ASA of the atom is given by numerical integration of the summed arc lengths appearing in all sections.

An evaluation of the equation for determining the ASA of a protein reveals that the ASA calculation can be performed on individual atoms separately and independently, provided the locations of the neighboring atoms of each atom are known. We can therefore use a data replication method in which each processor keeps a complete list of the atomic coordinates of all the atoms in the protein structure, so that no communication is required between processors until the end of the computation. Based on these observations, we implemented the following parallel algorithm [9]:

1. Read the atomic coordinates and broadcast all N atomic coordinates to each processor, allowing each processor to keep a complete list of the atomic coordinates.

2. Partition the atoms, which represent the computing workload, among the n processors: for each processor i, 1 <= i <= n, determine the set of atoms for which it must compute the ASA; these atoms are defined by the range (start[i], end[i]) for processor i, where 1 <= start[i], end[i] <= N.

3. In parallel, on all processors i: first, find the neighboring atoms n_j of each atom j assigned to processor i, that is, for start[i] <= j <= end[i]; then compute the ASA of each assigned atom from its neighbor list, as sketched below.
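
The following is a minimal sketch, in C with MPI, of the data replication scheme in steps 1-3; it is an illustration under our own assumptions, not the implementation reported in [9]. The per-atom surface calculation is reduced to a stub, asa_of_atom(), standing in for the Lee and Richards slice-and-arc computation; the coordinates are fabricated rather than read from a file; the atoms are split into contiguous ranges of roughly equal size rather than by any particular load-balancing scheme; and the final MPI_Reduce is our assumption for how the per-processor results are collected at the end of the computation.

    /*
     * Sketch of the parallel ASA calculation with data replication:
     *   1. broadcast all atomic coordinates to every processor,
     *   2. partition the atoms into contiguous ranges start..end,
     *   3. each processor computes the ASA of its own atoms only,
     * followed by a final reduction of the partial sums.
     * Compile with an MPI compiler, e.g.  mpicc asa_sketch.c
     */
    #include <stdio.h>
    #include <mpi.h>

    #define N_ATOMS 1000
    #define SOLVENT_RADIUS 1.4            /* angstroms, water */

    typedef struct { double x, y, z, radius; } Atom;

    /* Stub for the Lee-Richards slice-and-arc computation for one atom:
     * here it simply returns the full sphere area, ignoring occlusion by
     * the neighboring atoms that the real code would examine.           */
    static double asa_of_atom(const Atom *atoms, int n, int j)
    {
        (void)n;   /* the real computation would use the neighbor atoms */
        double r = atoms[j].radius;
        return 4.0 * 3.14159265358979 * r * r;
    }

    int main(int argc, char **argv)
    {
        int rank, size;
        static Atom atoms[N_ATOMS];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Step 1: processor 0 reads (here: fabricates) the coordinates and
         * broadcasts them, so every processor holds the complete list.     */
        if (rank == 0) {
            for (int j = 0; j < N_ATOMS; j++) {
                atoms[j].x = j * 1.5; atoms[j].y = 0.0; atoms[j].z = 0.0;
                atoms[j].radius = 1.7 + SOLVENT_RADIUS;  /* vdW radius + solvent */
            }
        }
        MPI_Bcast(atoms, (int)sizeof(atoms), MPI_BYTE, 0, MPI_COMM_WORLD);

        /* Step 2: contiguous, nearly equal ranges of atoms per processor. */
        int chunk = (N_ATOMS + size - 1) / size;
        int start = rank * chunk;
        int end   = start + chunk;
        if (end > N_ATOMS) end = N_ATOMS;

        /* Step 3: each processor computes the ASA of its own atoms, with no
         * communication until the final reduction.                         */
        double local_asa = 0.0;
        for (int j = start; j < end; j++)
            local_asa += asa_of_atom(atoms, N_ATOMS, j);

        double total_asa = 0.0;
        MPI_Reduce(&local_asa, &total_asa, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("total ASA (stub) = %.1f square angstroms\n", total_asa);

        MPI_Finalize();
        return 0;
    }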